Fixing curl install failures with Ansible on Red Hat-derivative OSes

Over the past few months, I've noticed some of my automation failing on Red Hat-derivative OSes like Rocky Linux and AlmaLinux. The cause is the curl-minimal package included in some distros, which conflicts with the full curl package if you try to install it.

Unfortunately, the fix is a little unusual, and it only exists in Ansible's dnf module, not in the more cross-compatible package module.

The error I was seeing boiled down to that package conflict between curl and curl-minimal.
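
As far as I can tell, the dnf-module-only option in play here is allowerasing, which lets dnf remove the conflicting curl-minimal package as part of the transaction. A minimal sketch, assuming ansible-core 2.13 or later (when the dnf module gained allowerasing):

- name: Install the full curl package, letting dnf remove curl-minimal.
  ansible.builtin.dnf:
    name: curl
    state: present
    # This option exists in the dnf module but not in the generic package module.
    allowerasing: true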

Installing Ansible on a RISC-V computer

Ansible runs on Python, and Python runs on... well pretty much everything. Including newer RISC-V machines.

But Ansible has a lot of dependencies, and some of them have caused frustration from time to time even on x86 and Arm (dependency issues are just a way of life once you enter dependency hell). In this case, though, for the past few months I've had no luck installing Ansible from PyPI (the Python Package Index) on any RISC-V system using pip install ansible.

I prefer installing this way (rather than compiling from source or from system packages) because it generally gets the latest version of Ansible, with an easy upgrade/downgrade path. It's also easy to add ansible to a Python requirements.txt file and install it alongside other package dependencies.

Regardless, the cryptography library, which requires a Rust compiler to build whenever a prebuilt wheel isn't available for a given platform, has made it difficult to install Ansible from pip.

Newer versions of Ansible don't work with RHEL 8

Red Hat Enterprise Linux 8 is supported until 2029, and that distribution ships Python 3.6 as its system Python. Ansible has long been stuck between a rock and a hard place supporting certain modules (especially packaging modules like dnf/yum on RHEL and its derivatives), because the Python bindings for those packaging modules only support the system Python.

Users are getting errors like:

/bin/sh: /usr/bin/python3: No such file or directory
The module failed to execute correctly, you probably need to set the interpreter.\nSee stdout/stderr for the exact error.

...or...

SyntaxError: future feature annotations is not defined

As ansible-core evolves, its maintainers don't want to support old, insecure versions of Python forever; Python 3.6 fell out of security support back in 2021!
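
For what it's worth, that SyntaxError is what Python 3.6 prints when it hits the from __future__ import annotations statement used by modules in newer ansible-core releases, which dropped Python 3.6 support on managed nodes in version 2.17. One stopgap, while your controller still runs ansible-core 2.16 or earlier, is pinning the target interpreter explicitly in inventory instead of relying on discovery; a hypothetical sketch (the hostname is a placeholder):

# Hypothetical inventory snippet for RHEL 8 hosts: point Ansible at the
# platform-python that RHEL 8 always ships, rather than /usr/bin/python3.
all:
  children:
    rhel8:
      hosts:
        rhel8-host.example.com:
      vars:
        ansible_python_interpreter: /usr/libexec/platform-python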

Saying a lot while saying nothing at all about Ansible AWX

A few days ago, the post Upcoming Changes to the AWX Project came across my feed. An innocuous title, but sometimes community-impacting changes are buried in posts like this. So, as an interested Ansible user, I read through the post.

In 1,610 words, almost nothing of substance was written.

There was a lot about how it's not 2014 anymore, so 2014-era architecture no longer suits AWX. Then, a big bold disclaimer at the bottom:

Before we conclude, we should be clear about what will not happen.

  • We are not changing the Ansible project
  • We are not adjusting our OSS license structure

Ultimately, we need to make some changes to the way our systems work and our projects are structured. Not a rewrite but a refactoring and restructuring of how some of the core components connect and communicate with each other.

Ansible Galaxy error 'Unable to compare role versions'

Ansible Galaxy was recently updated to the 'Next Generation' (Galaxy-NG) codebase.

There are some growing pains, as a lot of Galaxy NG was built up around Collections, and Ansible role support was written into the codebase over the past year or so, after it became obvious Galaxy roles would not be deprecated.

Unfortunately, one of the major issues right now, which I'm seeing pop up in many places, is an error that occurs when installing Galaxy roles for a playbook (e.g. when you run ansible-galaxy install to download a role), for any role that has had a new version released in the past few weeks.

You wind up with an error complaining that Ansible is 'Unable to compare role versions'.
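
For context, the installs that trigger this are ordinary role installs from a requirements file; a hypothetical example (the version pin is just a placeholder), installed with ansible-galaxy install -r requirements.yml:

# Hypothetical requirements.yml; the version shown is a placeholder.
roles:
  - name: geerlingguy.docker
    version: "1.2.3"  # omit version to fetch the latest release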

Docker and systemd, getting rid of dreaded 'Failed to connect to bus' error

The following error has been the bane of my existence for the past few months:

TASK [geerlingguy.containerd : Ensure containerd is started and enabled at boot.] ***
fatal: [instance]: FAILED! => {
  "changed": false,
  "cmd": "/bin/systemctl",
  "msg": "Failed to connect to bus: No such file or directory",
  "rc": 1,
  "stderr": "Failed to connect to bus: No such file or directory",
  "stderr_lines": [
    "Failed to connect to bus: No such file or directory"
  ],
  "stdout": "",
  "stdout_lines": []
}

I use Molecule with my Ansible roles and playbooks so they're tested in an identical environment both locally and in GitHub Actions CI. And many of my roles and playbooks need to test whether systemd services are configured and running correctly.

But Docker recently switched from cgroups v1 to cgroups v2, and that started this 'Failed to connect to bus' business. systemd relies on some container configuration that was easy enough to supply in the past, just by running containers with a few extra options.
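
Here's a sketch of the kind of Molecule platform settings involved; the image name is just an example and the exact options in the original post may differ, but the cgroups-related pieces are the read-write cgroup mount, the host cgroup namespace, and privileged mode:

# molecule.yml (excerpt) - a sketch, with an illustrative image name.
platforms:
  - name: instance
    image: geerlingguy/docker-rockylinux8-ansible:latest
    command: ""
    volumes:
      - /sys/fs/cgroup:/sys/fs/cgroup:rw
    cgroupns_mode: host
    privileged: true
    pre_build_image: true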

Automating my Homelab with Ansible (AnsibleFest 2022)

At AnsibleFest 2022, I presented Ansible for the Homelab.

Jeff Geerling's Homelab Rack in 2022

In the presentation, I gave a tour of my homelab, highlighting its growth from a modem and a 5-port switch to a full 24U rack with a petabyte of storage and multiple 10 gigabit switches!

Then I spent some time discussing how various components are automated using Ansible, mostly using open source projects on GitHub.

Unfortunately for attendees, the room my session was in was packed, and a lot of people who wanted to see it were turned away.

Clearing Cloudflare and Nginx caches with Ansible

Since being DDoSed continuously earlier this year, I've set up extra caching in front of my site. Originally I just had Nginx's proxy cache, but that topped out around 100 Mbps of continuous bandwidth and maybe 5,000-10,000 requests per second on my little DigitalOcean VPS.

So then I added Cloudflare's proxy caching service on top, and now I've been able to handle months with 5-10 TB of traffic (with multiple spikes of hundreds of megabits per second).

That's great, but caching comes with a tradeoff: any time I post a new article, update an old one, or a post receives a comment, it can take anywhere from 10 to 30 minutes before that change is reflected for end users.

I used to use Varnish, and with Varnish, you could configure cache purges directly from Drupal, so if any operation occurred that would invalidate cached content, Drupal could easily purge just that content from Varnish's cache.
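
To give a flavor of what automating a Cloudflare purge from Ansible can look like, here's a hypothetical task using the uri module against Cloudflare's v4 purge_cache endpoint; the variable names and the purged URL are placeholders, and this is my sketch rather than the exact tasks from the post:

# Hypothetical cache purge task; cloudflare_zone_id and cloudflare_api_token
# are assumed to be defined elsewhere (e.g. in vault-encrypted vars).
- name: Purge updated URLs from Cloudflare's cache.
  ansible.builtin.uri:
    url: "https://api.cloudflare.com/client/v4/zones/{{ cloudflare_zone_id }}/purge_cache"
    method: POST
    headers:
      Authorization: "Bearer {{ cloudflare_api_token }}"
      Content-Type: application/json
    body_format: json
    body:
      files:
        - "https://example.com/some-updated-page"
    status_code: 200
  delegate_to: localhost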

apt_key deprecated in Debian/Ubuntu - how to fix in Ansible

2023 Update: Ansible now has the ansible.builtin.deb822_repository module, which can add keys and repositories in a single task. It's a little more complex than the old way, and requires Ansible 2.15 or later. See some common deb822_repository examples here; for example, the Jenkins tasks below can be consolidated into one (though the structure of the templated vars would need reworking):

- name: Add Jenkins repo using key from URL.
  ansible.builtin.deb822_repository:
    name: jenkins
    types: [deb]
    uris: "https://pkg.jenkins.io/debian-stable"
    suites: [binary/]
    signed_by: https://pkg.jenkins.io/debian-stable/jenkins.io-2023.key
    state: present
    enabled: true

For many packages, like Elasticsearch, Docker, or Jenkins, you need to install a trusted GPG key on your system before you can install from the official package repository.
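
The general replacement pattern for apt_key, sketched here with Jenkins as the example (this is my sketch, not necessarily the exact tasks from the post; the keyring path is illustrative), is to download the key into a keyring file and point the repository entry at it with signed-by:

- name: Ensure the apt keyrings directory exists.
  ansible.builtin.file:
    path: /etc/apt/keyrings
    state: directory
    mode: '0755'

- name: Download the Jenkins apt signing key to the keyring directory.
  ansible.builtin.get_url:
    url: https://pkg.jenkins.io/debian-stable/jenkins.io-2023.key
    dest: /etc/apt/keyrings/jenkins.asc
    mode: '0644'

- name: Add the Jenkins repository, referencing the key with signed-by.
  ansible.builtin.apt_repository:
    repo: "deb [signed-by=/etc/apt/keyrings/jenkins.asc] https://pkg.jenkins.io/debian-stable binary/"
    state: present
    filename: jenkins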

Using an Ansible playbook with an SSH bastion / jump host

I've set this up a number of times, but I just realized I've never documented it on my blog, so I thought I'd finally do that.

I have a set of servers that are running on a private network. That network is connected to the Internet through a single reverse proxy / 'bastion' host.

But I still want to be able to manage the servers on the private network behind the bastion from outside.

Method 1 - Inventory vars

The first way to do it with Ansible is to describe, in Ansible's inventory, how to connect through the proxy server. This is helpful for a project that might be run from various workstations or servers that don't share the same SSH configuration, since the connection details are stored alongside the playbook, in the inventory.

In my Ansible project, I had an inventory file describing the private servers and how to reach them through the bastion.
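
A hedged sketch of that kind of inventory, written as a YAML inventory file (the group name, host addresses, user, and bastion hostname are all placeholders), uses ansible_ssh_common_args with an SSH ProxyCommand through the bastion:

# Hypothetical inventory.yml; hosts and bastion address are placeholders.
all:
  children:
    private_servers:
      hosts:
        10.0.100.1:
        10.0.100.2:
      vars:
        ansible_user: admin
        # Hop through the bastion to reach the private hosts.
        ansible_ssh_common_args: '-o ProxyCommand="ssh -W %h:%p -q admin@bastion.example.com"'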