I recently had a server with some bandwidth limitations (tested using scp and rsync -P), and I wondered whether the problem was the data being transferred or the server's link speed.
The simplest way to debug and verify TCP performance is to install iperf3 and run an iperf3 speed test between the server and my computer. On the server, you run iperf3 -s, and on my computer, iperf3 -c [server ip].
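In the simplest case, with the default port open, that's all there is to it:

```
# On the server: start iperf3 in server mode (listens on TCP port 5201).
iperf3 -s

# On my computer: run a TCP throughput test against the server.
iperf3 -c [server ip]
```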
But iperf3 requires port 5201 (by default) to be open on the server, and in many cases, especially if the server is inside a restricted environment and only accessible through SSH (e.g. through a bastion, or limited to SSH connectivity only), you won't be able to get that port opened.
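(A quick way to confirm the port really is blocked, assuming a netcat build that supports the common -z and -v flags:)

```
# Check whether TCP port 5201 on the server is reachable at all.
# -z: just test the connection, don't send data; -v: verbose output.
nc -zv [server ip] 5201
```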
So in my case, I wanted to run iperf through an SSH tunnel. This isn't ideal, because you're testing the TCP performance through an encrypted connection. But in this case both the server and my computer are extremely new/fast, so I'm not too worried about the overhead lost to the connection encryption, and my main goal was to get a performance baseline.
Without further ado, here's how I set up the SSH-tunneled iperf3 run:
On my machine, I set up a tunnel for port 7001:
$ ssh -p [ssh port on server] -L7001:localhost:7001 jeffgeerling@[server ip]
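As an aside, if you'd rather not tie the tunnel to an interactive shell, OpenSSH's -N (no remote command) and -f (background after authentication) flags keep the forward running on its own; same placeholders as above:

```
# Hold the port forward open without starting a remote shell,
# backgrounding the ssh process once the tunnel is up.
ssh -p [ssh port on server] -N -f -L7001:localhost:7001 jeffgeerling@[server ip]
```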
Then, SSH'ed into the server, I started an instance of iperf3 listening on port 7001:
$ iperf3 -s -p 7001
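If you want the server side to clean up after itself, iperf3's -1 (--one-off) flag handles a single client connection and then exits:

```
# Serve exactly one iperf3 client on port 7001, then exit.
iperf3 -s -p 7001 -1
```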
Finally, on my machine, I ran some iperf3 tests:
$ iperf3 -c localhost -p 7001
Connecting to host localhost, port 7001
[  7] local ::1 port [port here] connected to ::1 port 7001
[ ID] Interval           Transfer     Bitrate
[  7]   0.00-1.00   sec  23.9 MBytes   200 Mbits/sec
[  7]   1.00-2.00   sec  20.9 MBytes   176 Mbits/sec
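By default, iperf3 measures throughput from the client to the server; to test the opposite direction over the same tunnel, the -R flag reverses the flow so the server sends:

```
# Reverse mode: the server transmits, so this measures
# throughput in the other direction through the same tunnel.
iperf3 -c localhost -p 7001 -R
```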
This answer on the Unix Stack Exchange was helpful in writing this post, and it also goes one level deeper, showing how to tunnel through a bastion server instead of just SSHing directly to the server.
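For that bastion case, here's a minimal sketch using OpenSSH's ProxyJump (-J) flag, with bastion.example.com standing in for a hypothetical jump host:

```
# Open the same local forward, but route the SSH connection
# through a bastion first (ProxyJump requires OpenSSH 7.3+).
ssh -J youruser@bastion.example.com -L7001:localhost:7001 jeffgeerling@[server ip]
```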
Comments
I'd be curious whether you saw a noticeable speed difference using stunnel compared to ssh. Years ago we ran into a libssh read function limit when doing rsync over ssh and found stunnel didn't have that limitation (allowing almost full line speed, taking into account the speed of light over distance).
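(For anyone curious, a minimal stunnel pairing for the same test might look something like this; the hostnames, ports, and cert path are placeholders, not a tested config.)

```
; client-side stunnel.conf: accept plain TCP locally, forward over TLS
client = yes
[iperf]
accept = 127.0.0.1:7001
connect = server.example.com:7002

; server-side stunnel.conf: terminate TLS, hand off to the local iperf3
[iperf]
accept = 7002
connect = 127.0.0.1:7001
cert = /etc/stunnel/stunnel.pem
```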
Running iperf through SSH means you include the SSH overhead in the test. And it will put a considerable workload on the SSH servers. I would love to see the top output during the exercise.
I ran top and atop, and CPU usage never went above 12-13% on either machine (one was an Intel Xeon Gold, 1-2 generations behind; the other was my MacBook Air M2).
Wouldn't that be 100% of one core? If so, it would still indicate a CPU bottleneck, because encryption is not that parallel. Last time I checked, there was a huge difference across the cipher suites. Here's an edited output from an example run I had:
```
% for cipher in $(ssh -Q cipher); do echo $cipher; ssh -F /dev/null -c $cipher HOST "cat seqwrite.1.0" | pv > /dev/null; done
3des-cbc
2GiB 0:01:40 [20.3MiB/s]
blowfish-cbc
2GiB 0:00:25 [81.9MiB/s]
cast128-cbc
2GiB 0:00:27 [74.9MiB/s]
aes128-cbc
2GiB 0:00:08 [ 240MiB/s]
aes192-cbc
2GiB 0:00:08 [ 233MiB/s]
aes256-cbc
2GiB 0:00:09 [ 212MiB/s]
aes128-ctr
2GiB 0:00:05 [ 361MiB/s]
aes192-ctr
2GiB 0:00:06 [ 336MiB/s]
aes256-ctr
2GiB 0:00:05 [ 364MiB/s]
[email protected]
2GiB 0:00:04 [ 437MiB/s]
[email protected]
2GiB 0:00:04 [ 426MiB/s]
[email protected]
2GiB 0:00:11 [ 172MiB/s]
```
Different distros also seem to have shipped different cipher defaults over time.
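If you want to check what your own setup does, assuming a reasonably recent OpenSSH, you can inspect both the cipher list your client offers and the cipher a connection actually negotiates:

```
# List the ciphers the client would offer for a given host.
ssh -G [server ip] | grep -i '^ciphers'

# Run a trivial remote command with verbose output and pull out
# the cipher actually negotiated for the connection.
ssh -v [server ip] true 2>&1 | grep 'cipher:'
```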