I noticed on one of the CI servers I'm running that the .ssh/known_hosts
file had ballooned up to over 1,000,000 lines!
Looking into the root cause (I tailed the file until I could track down a few jobs that ran every minute), I found the following line in a setup script:
ssh-keyscan -t rsa github.com >> /var/lib/jenkins/.ssh/known_hosts
"This can't be good!" I told myself, and I decided to add a condition to make it idempotent (that is, able to be run once or one million times while only effecting a change the first time it's run; basically, a way to change something only if the change is required):
if ! grep -q "^github.com" /var/lib/jenkins/.ssh/known_hosts; then
    ssh-keyscan -t rsa github.com >> /var/lib/jenkins/.ssh/known_hosts
fi
Now the host key for github.com is scanned only the first time the script runs, and it is stored in known_hosts once instead of millions of times!
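Note: The above test won't work if you have HashKnownHosts enabled, which is the default on Debian 9, at least. With that option the hostnames in known_hosts are stored hashed, so grep never sees the literal string github.com. You should use the test ssh-keygen -H -F github.com instead.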
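As a rough sketch of that approach, the check could be wrapped in a small shell function. The function name add_host_key_once and its two arguments are made up for illustration and are not part of the original setup script. The -H flag is left out here because it only changes how matching keys are printed, not whether ssh-keygen -F finds them.

```shell
# Hypothetical helper (name and arguments are illustrative, not from the
# original setup script): append a host key only if the host is not listed yet.
add_host_key_once() {
    host="$1"          # e.g. github.com
    known_hosts="$2"   # e.g. /var/lib/jenkins/.ssh/known_hosts

    # ssh-keygen -F exits non-zero when the host is not found, and it
    # matches both plain and hashed entries.
    if ! ssh-keygen -F "$host" -f "$known_hosts" >/dev/null 2>&1; then
        ssh-keyscan -t rsa "$host" >> "$known_hosts"
    fi
}
```

Calling add_host_key_once github.com /var/lib/jenkins/.ssh/known_hosts from the setup script should then leave the file unchanged on every run after the first, whether or not the entries are hashed.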
Comments
You don't actually need the `-H` parameter to ssh-keygen when searching a hashed file with `-F`.
Arguably, the test should *always* be `ssh-keygen -F github.com` instead of grep, so you never need to know whether the file is hashed or not.