[Short Tip] Handling “can’t concat str to bytes” error in Ansible’s uri module

Ansible Logo

When working with web services, especially REST APIs, Ansible can be surprisingly helpful when you need to automate them or want to integrate them into your automation.

However, today I ran into a strange Python bug while I tried to use the uri module:

An exception occurred during task execution. To see the full traceback, use -vvv. The error was: TypeError: can't concat str to bytes
fatal: [localhost]: FAILED! => {"changed": false, "content": "", "elapsed": 0, "msg": "Status code was -1 and not [200]: An unknown error occurred: can't concat str to bytes", "redirected": false, "status": -1, "url": "https://www.ansible.com"}

This drove me almost nuts because it happened on all kinds of machines I tested, even with Ansible’s devel upstream version. It was even independent of the service I targeted – the error happened way earlier. And a playbook to showcase this was suspiciously short and simple:

---
- name: Show concat str byte error
  hosts: localhost
  connection: local
  gather_facts: no

  tasks:
    - name: call problematic URL call
      uri:
        url: "https://www.ansible.com"
        method: POST
        body:
          name: "myngfw"

When I was about to file an issue on Ansible’s Github page I thought again: this was too simple, and I couldn’t imagine that I was the only one hitting this problem. I realized that the error had to be on my side. And that meant that something was missing.

And indeed: the body_format option was not explicitly stated, so Ansible assumed “raw”, while my body data was provided in JSON format. A simple

        body_format: json

solved my problems.
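For reference, the complete fixed task then looks like this – the task from the playbook above, with only the body_format line added:

    - name: call problematic URL call
      uri:
        url: "https://www.ansible.com"
        method: POST
        body_format: json
        body:
          name: "myngfw"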

Never dare to ask me how long it took me to figure this one out. And that from the person who wrote an entire howto about how to provide a payload with the Ansible URI module….


[Howto] Three commands to update Fedora

Fedora Logo Bubble

These days, using Fedora Workstation, multiple commands are necessary to update all the software on the system: not everything is installed as RPMs anymore – and some systems hardly use RPMs at all anyway.

Background

In the past all updates of a Fedora system were easily applied with one single command:

$ yum update

Later on, yum was replaced by DNF, but the idea stayed the same:

$ dnf update

Simple, right? But not these days: Fedora recently added capabilities to install and manage code in other ways: Flatpak packages are not managed by DNF. Also, many firmware updates are managed via the dedicated management tool fwupd. And last but not least, Fedora Silverblue does not support DNF at all.

GUI solution Gnome Software – one tool to rule them all…

To properly update your Fedora system you have to check multiple sources. But before we dive into detailed CLI commands there is a simple way to do that all in one go: The Gnome Software tool does that for you. It checks all sources and just provides the available updates in its single GUI:

The above screenshot highlights that Gnome Software just shows available updates and can manage those. The user does not even know where those come from.

If we have a closer look at the configured repositories in Gnome Software we see that it covers main Fedora repositories, 3rd party repositories, flatpaks, firmware and so on:

Using the GUI alone is sufficient to take care of all update routines. However, if you want to know and understand what happens underneath it is good to know the separate CLI commands for all kinds of software resources. We will look at them in the rest of the post.

System packages

Each and every system is made up of at least a basic set of software: the kernel, a system for managing services like systemd, core libraries like libc and so on. With Fedora used as a Workstation system there are two ways to manage system packages, because there are two totally different spins of Fedora: the normal one, traditionally based on DNF and thus composed of RPM packages, and the new Fedora Silverblue, based on immutable OSTree system images.

Traditional: DNF

Updating a RPM based system via DNF is easy:

$ dnf upgrade
[sudo] password for liquidat: 
Last metadata expiration check: 0:39:20 ago on Tue 18 Jun 2019 01:03:12 PM CEST.
Dependencies resolved.
================================================================================
 Package                      Arch       Version             Repository    Size
================================================================================
Installing:
 kernel                       x86_64     5.1.9-300.fc30      updates       14 k
 kernel-core                  x86_64     5.1.9-300.fc30      updates       26 M
 kernel-modules               x86_64     5.1.9-300.fc30      updates       28 M
 kernel-modules-extra         x86_64     5.1.9-300.fc30      updates      2.1 M
[...]

This is the traditional way to keep a Fedora system up2date. It has been used for years and is well known to everyone.

And in the end it is analogous to the way Linux distributions have been kept up2date for ages now, only the command differs from system to system (apt-get, etc.).

Silverblue: OSTree

With the recent rise of container technologies the idea of immutable systems became prominent again. With Fedora Silverblue there is an implementation of that approach as a Fedora Workstation spin.

[Unlike] other operating systems, Silverblue is immutable. This means that every installation is identical to every other installation of the same version. The operating system that is on disk is exactly the same from one machine to the next, and it never changes as it is used.

Silverblue’s immutable design is intended to make it more stable, less prone to bugs, and easier to test and develop. Finally, Silverblue’s immutable design also makes it an excellent platform for containerized apps as well as container-based software development. In each case, apps and containers are kept separate from the host system, improving stability and reliability.

https://docs.fedoraproject.org/en-US/fedora-silverblue/

Since we are dealing with immutable images here, another tool to manage them is needed: OSTree. Basically OSTree is a set of libraries and tools which helps to manage images and snapshots. The idea is to provide a basic system image to all, and all additional software on top in sandboxed formats like Flatpak.

Unfortunately, not all tools can be packaged as Flatpak: especially command line tools are currently hardly usable at all as Flatpaks. Thus there is a way to install and manage RPMs on top of the OSTree image, but still baked right into it: rpm-ostree. In fact, on Fedora Silverblue, all images and RPMs baked into it are managed by it.

Thus updating the system and all related RPMs needs the command rpm-ostree update:

$ rpm-ostree update
⠂ Receiving objects: 98% (4653/4732) 4,3 MB/s 129,7 MB 
Receiving objects: 98% (4653/4732) 4,3 MB/s 129,7 MB... done
Checking out tree 209dfbe... done
Enabled rpm-md repositories: fedora-cisco-openh264 rpmfusion-free-updates rpmfusion-nonfree fedora rpmfusion-free updates rpmfusion-nonfree-updates
rpm-md repo 'fedora-cisco-openh264' (cached); generated: 2019-03-21T15:16:16Z
rpm-md repo 'rpmfusion-free-updates' (cached); generated: 2019-06-13T10:31:33Z
rpm-md repo 'rpmfusion-nonfree' (cached); generated: 2019-04-16T21:53:39Z
rpm-md repo 'fedora' (cached); generated: 2019-04-25T23:49:41Z
rpm-md repo 'rpmfusion-free' (cached); generated: 2019-04-16T20:46:20Z
rpm-md repo 'updates' (cached); generated: 2019-06-17T18:09:33Z
rpm-md repo 'rpmfusion-nonfree-updates' (cached); generated: 2019-06-13T11:00:42Z
Importing rpm-md... done
Resolving dependencies... done
Checking out packages... done
Running pre scripts... done
Running post scripts... done
Running posttrans scripts... done
Writing rpmdb... done
Writing OSTree commit... done
Staging deployment... done
Freed: 50,2 MB (pkgcache branches: 0)
Upgraded:
  gcr 3.28.1-3.fc30 -> 3.28.1-4.fc30
  gcr-base 3.28.1-3.fc30 -> 3.28.1-4.fc30
  glib-networking 2.60.2-1.fc30 -> 2.60.3-1.fc30
  glib2 2.60.3-1.fc30 -> 2.60.4-1.fc30
  kernel 5.1.8-300.fc30 -> 5.1.9-300.fc30
  kernel-core 5.1.8-300.fc30 -> 5.1.9-300.fc30
  kernel-devel 5.1.8-300.fc30 -> 5.1.9-300.fc30
  kernel-headers 5.1.8-300.fc30 -> 5.1.9-300.fc30
  kernel-modules 5.1.8-300.fc30 -> 5.1.9-300.fc30
  kernel-modules-extra 5.1.8-300.fc30 -> 5.1.9-300.fc30
  plymouth 0.9.4-5.fc30 -> 0.9.4-6.fc30
  plymouth-core-libs 0.9.4-5.fc30 -> 0.9.4-6.fc30
  plymouth-graphics-libs 0.9.4-5.fc30 -> 0.9.4-6.fc30
  plymouth-plugin-label 0.9.4-5.fc30 -> 0.9.4-6.fc30
  plymouth-plugin-two-step 0.9.4-5.fc30 -> 0.9.4-6.fc30
  plymouth-scripts 0.9.4-5.fc30 -> 0.9.4-6.fc30
  plymouth-system-theme 0.9.4-5.fc30 -> 0.9.4-6.fc30
  plymouth-theme-spinner 0.9.4-5.fc30 -> 0.9.4-6.fc30
Run "systemctl reboot" to start a reboot

Desktop applications: Flatpak

Installing software – especially desktop related software – on Linux is a major pain for distributors, users and developers alike. One attempt to solve this is the flatpak format, see also Flatpak – a solution to the Linux desktop packaging problem.

Basically Flatpak is a distribution independent packaging format targeted at desktop applications. It does come along with sandboxing capabilities and the packages usually have hardly any dependencies at all besides a common set provided to all of them.

Flatpak also provides its own repository format, so Flatpak packages can come with their own repository and be released and updated independently of a distribution release cycle.

In fact, this is what happens with the large Flatpak community repository flathub.org: all packages installed from there can be updated via the Flathub repos fully independently of Fedora – which also means independently of the Fedora security teams, btw….
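If Flathub is not configured yet, it can be added as a remote with a single command – this is the standard command documented on flathub.org:

$ flatpak remote-add --if-not-exists flathub https://flathub.org/repo/flathub.flatpakrepo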

So Flatpak makes developing and distributing desktop programs much easier – and provides a tool for that. Meet flatpak!

$ flatpak update
Looking for updates…

        ID                                            Arch              Branch            Remote            Download
 1. [✓] org.freedesktop.Platform.Locale               x86_64            1.6               flathub            1.0 kB / 177.1 MB
 2. [✓] org.freedesktop.Platform.Locale               x86_64            18.08             flathub            1.0 kB / 315.9 MB
 3. [✓] org.libreoffice.LibreOffice.Locale            x86_64            stable            flathub            1.0 MB / 65.7 MB
 4. [✓] org.freedesktop.Sdk.Locale                    x86_64            1.6               flathub            1.0 kB / 177.1 MB
 5. [✓] org.freedesktop.Sdk.Locale                    x86_64            18.08             flathub            1.0 kB / 319.3 MB

Firmware

And there is firmware: the binary blobs that keep some of our hardware running and which are often – unfortunately – closed source.

A lot of Kernel related firmware is managed as system packages and thus part of the system image or packaged via RPM. But device related firmware (laptops, docking stations, and so on) is often only provided in Windows executable formats and difficult to handle.

Luckily, recently the Linux Vendor Firmware Service (LVFS) gained quite some traction as the default way for many vendors to make their device firmware consumable to Linux users:

The Linux Vendor Firmware Service is a secure portal which allows hardware vendors to upload firmware updates.

This site is used by all major Linux distributions to provide metadata for clients such as fwupdmgr and GNOME Software.

https://fwupd.org/

End users can take advantage of this with a tool dedicated to identifying devices and managing the necessary firmware blobs for them: meet fwupdmgr!

$ fwupdmgr update
No upgrades for 20L8S2N809 System Firmware, current is 0.1.31: 0.1.25=older, 0.1.26=older, 0.1.27=older, 0.1.29=older, 0.1.30=older
No upgrades for UEFI Device Firmware, current is 184.65.3590: 184.55.3510=older, 184.60.3561=older, 184.65.3590=same
No upgrades for UEFI Device Firmware, current is 0.1.13: 0.1.13=same
No releases found for device: Not compatible with bootloader version: failed predicate [BOT01.0[0-3]_* regex BOT01.04_B0016]

In the above example there were no updates available – but multiple devices are supported and thus were checked.
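If you want to check what fwupd knows about your hardware before applying anything, a typical sequence looks roughly like this (output omitted):

$ fwupdmgr refresh       # fetch the latest firmware metadata from LVFS
$ fwupdmgr get-devices   # list devices known to fwupd
$ fwupdmgr get-updates   # show available firmware updates
$ fwupdmgr update        # apply them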

Forgot something? Gnome extensions…

The above examples cover the major ways to manage various bits of code. But they do not cover all cases, so for the sake of completeness I’d like to highlight a few more here.

For example, Gnome extensions can be installed as RPM, but can also be installed via extensions.gnome.org. In that case the installation is done via a browser plugin.

The same is true for browser plugins themselves: they can be installed independently and extend the usage of the web browser. Think of the Chrome Web Store here, or Firefox Add-ons.

Conclusion

Keeping a system up2date was easier in the past – with a single command. However, at the same time that meant that those systems were limited by what RPM could actually deliver.

With the additional ways to update systems there is an additional burden on the system administrator, but at the same time there is much more software and firmware available through these channels – code which was not available in the old RPM-only times at all. And with Silverblue an entirely new paradigm of system management has arrived – again something which would not have been possible with RPM at all.

At the same time it needs to be kept in mind that these are pure desktop systems – and there Gnome Software helps by being the single pane of glass.

So I fully understand if some people are a bit grumpy about the new needs for multiple tools. But I think the advantages by far outweigh the disadvantages.
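For reference, on a traditional DNF-based Workstation the CLI commands covered in this post can be chained in a small shell snippet – a rough sketch, not a replacement for Gnome Software, and not applicable to Silverblue:

#!/bin/bash
# update RPM packages, Flatpaks and firmware in one go
sudo dnf upgrade --refresh
flatpak update
fwupdmgr refresh
fwupdmgr update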

Ansible and Ansible Tower special variables

Ansible Logo

Ansible and Ansible Tower provide a powerful variable system. At the same time, some variables are reserved to one or the other and cannot be set by users, but they can still be helpful. This post lists all reserved and magic variables as well as important keywords.

Ansible Variables

Variables in Ansible are a powerful tool to influence and control your automation execution. In fact, I’ve dedicated a fair share of posts to the topic over the years:

The official documentation of Ansible variables is also quite comprehensive.

The variable system is in fact so powerful that Ansible uses it itself. There are certain variables which are reserved, the so called magic variables.

The given documentation lists many of them – but it is missing the Tower ones. For that reason this post lists all magic variables in Ansible and Ansible Tower with references to more information.

Note that the variables and keywords might differ between Ansible versions. The lists provided here are for Ansible 2.8, which is the current release and also shipped in Fedora – and for Tower 3.4/3.5.

Reserved & Magic Variables

Magic Variables

The following list shows true magic variables. They are reserved internally and are overwritten by Ansible if needed. A “(D)” highlights that the variable is deprecated.

ansible_check_mode
ansible_dependent_role_names
ansible_diff_mode
ansible_forks
ansible_inventory_sources
ansible_limit
ansible_loop
ansible_loop_var
ansible_play_batch (D)
ansible_play_hosts (D)
ansible_play_hosts_all
ansible_play_role_names
ansible_playbook_python
ansible_role_names
ansible_run_tags
ansible_search_path
ansible_skip_tags
ansible_verbosity
ansible_version
group_names
groups
hostvars
inventory_hostname
inventory_hostname_short
inventory_dir
inventory_file
omit
play_hosts (D)
ansible_play_name
playbook_dir
role_name
role_names
role_path

Source: docs.ansible.com/ansible/latest/reference_appendices/special_variables.html
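To give an idea how some of these are typically used: hostvars, groups and inventory_hostname are often combined to access data of other hosts. A minimal task sketch, assuming a group named webservers and gathered facts:

- name: show the default IPv4 address of the first webserver
  debug:
    msg: "{{ hostvars[groups['webservers'][0]]['ansible_default_ipv4']['address'] }}"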

Facts

Facts are not magic variables because they are not internal. But they are collected during facts gathering or execution of the setup module, so it helps to keep them in mind. There are two “main” variables related to facts, and a lot of other variables depending on what the managed node has to offer. Since those are different from system to system, it is tricky to list them all. But they can be easily identified by the leading ansible_.

ansible_facts
ansible_local
ansible_*

Sources: docs.ansible.com/ansible/latest/reference_appendices/special_variables.html & docs.ansible.com/ansible/latest/user_guide/playbooks_variables.html
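A quick way to see facts in action is a small play with a debug task – a minimal sketch, assuming fact gathering is enabled:

- name: show some gathered facts
  hosts: all
  tasks:
    - name: print distribution information
      debug:
        msg: "{{ inventory_hostname }} runs {{ ansible_facts['distribution'] }} {{ ansible_facts['distribution_version'] }}"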

Connection Variables

Connection variables control the way Ansible connects to target machines: what connection plugin to use, etc.

ansible_become_user
ansible_connection
ansible_host
ansible_python_interpreter
ansible_user

Source: docs.ansible.com/ansible/latest/reference_appendices/special_variables.html
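These are usually set per host or group in the inventory – for example in an INI-style inventory (host names and values here are purely illustrative):

[webservers]
web01 ansible_host=192.0.2.10 ansible_user=deploy ansible_python_interpreter=/usr/bin/python3
web02 ansible_host=192.0.2.11 ansible_connection=ssh ansible_become_user=root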

Ansible Tower

Tower has its own set of magic variables which are used internally to control the execution of the automation. Note that those variables can optionally start with awx_ instead of tower_.

tower_job_id
tower_job_launch_type
tower_job_template_id
tower_job_template_name
tower_user_id
tower_user_name
tower_schedule_id
tower_schedule_name
tower_workflow_job_id
tower_workflow_job_name

Source: docs.ansible.com/ansible-tower/latest/html/userguide/job_templates.html
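Since Tower provides these variables to the job, they can be referenced directly in playbooks, for example for logging or notifications – a sketch that only yields values when the playbook is actually run from Tower/AWX:

- name: show Tower job context
  debug:
    msg: "Job {{ tower_job_id }} of template {{ tower_job_template_name }} was launched by {{ tower_user_name }}"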

Keywords

Keywords are strictly speaking not variables. In fact, you can even set a variable named like a keyword. Instead, they are the parts of a playbook that make a playbook work: think of the keys hosts, tasks, name or even the parameters of a module.

It is just important to keep those keywords in mind – and it certainly helps when you name your variables in a way that they are not mixed up with keywords by chance.
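To make that a bit more tangible, here is a tiny play in which almost every key – name, hosts, become, vars, tasks, when, tags and so on – is a keyword, while my_app is an ordinary variable (a sketch with made-up values):

- name: tiny keyword example
  hosts: all
  become: yes
  vars:
    my_app: httpd
  tasks:
    - name: ensure the application package is present
      package:
        name: "{{ my_app }}"
        state: present
      when: ansible_facts['os_family'] == "RedHat"
      tags:
        - packages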

The following lists show all keywords by where they can appear. Note that some keywords are listed multiple times because they can be used at different places.

Play

any_errors_fatal
become
become_flags
become_method
become_user
check_mode
collections
connection
debugger
diff
environment
fact_path
force_handlers
gather_facts
gather_subset
gather_timeout
handlers
hosts
ignore_errors
ignore_unreachable
max_fail_percentage
module_defaults
name
no_log
order
port
post_tasks
pre_tasks
remote_user
roles
run_once
serial
strategy
tags
tasks
vars
vars_files
vars_prompt

Source: docs.ansible.com/ansible/latest/reference_appendices/playbooks_keywords.html

Role

any_errors_fatal
become
become_flags
become_method
become_user
check_mode
collections
connection
debugger
delegate_facts
delegate_to
diff
environment
ignore_errors
ignore_unreachable
module_defaults
name
no_log
port
remote_user
run_once
tags
vars
when

Source: docs.ansible.com/ansible/latest/reference_appendices/playbooks_keywords.html

Block

always
any_errors_fatal
become
become_flags
become_method
become_user
block
check_mode
collections
connection
debugger
delegate_facts
delegate_to
diff
environment
ignore_errors
ignore_unreachable
module_defaults
name
no_log
port
remote_user
rescue
run_once
tags
vars
when

Source: docs.ansible.com/ansible/latest/reference_appendices/playbooks_keywords.html

Task

action
any_errors_fatal
args
async
become
become_flags
become_method
become_user
changed_when
check_mode
collections
connection
debugger
delay
delegate_facts
delegate_to
diff
environment
failed_when
ignore_errors
ignore_unreachable
local_action
loop
loop_control
module_defaults
name
no_log
notify
poll
port
register
remote_user
retries
run_once
tags
until
vars
when
with_<lookup_plugin>

Source: docs.ansible.com/ansible/latest/reference_appendices/playbooks_keywords.html

[Howto] Using include directive with ssh client configuration

An SSH client configuration makes accessing servers much easier and more convenient. Until recently the configuration was done in one single file, which could be problematic. But newer versions support includes to read configuration from multiple places.

SSH is the default way to access servers remotely – Linux and other UNIX systems, and recently Windows as well.

One feature of the OpenSSH client is to configure often used parameters for SSH connections in a central config file, ~/.ssh/config. This comes in especially handy when multiple remote servers require different parameters: varying ports, other user names, different SSH keys, and so on. It also provides the possibility to define aliases for host names to avoid the necessity to type in the FQDN each time. Since such a configuration is directly read by the SSH client, other tools which use the SSH client in the background – like Ansible – can benefit from the configuration as well.

A typical configuration of such a config file can look like this:

Host webapp
    HostName webapp.example.com
    User mycorporatelogin
    IdentityFile ~/.ssh/id_corporate
    IdentitiesOnly yes
Host github
    HostName github.com
    User mygithublogin
    IdentityFile ~/.ssh/id_ed25519
Host gitlab
    HostName gitlab.corporate.com
    User mycorporatelogin
    IdentityFile ~/.ssh/id_corporate
Host myserver
    HostName myserver.example.net
    Port 1234
    User myuser
    IdentityFile ~/.ssh/id_ed25519
Host aws-dev
    HostName 12.24.33.66
    User ec2-user
    IdentityFile ~/.ssh/aws.pem
    IdentitiesOnly yes
    StrictHostKeyChecking no
Host azure-prod
    HostName 4.81.234.19
    User azure-prod
    IdentityFile ~/.ssh/azure_ed25519

While this is very handy and helps a lot to maintain sanity even with very different and strange SSH configurations, a single huge file is hard to manage.

Cloud environments for example change constantly, so it makes sense to update/rebuild the configuration regularly. There are scripts out there, but they either just overwrite existing configuration, or work entirely on an extra file which has to be referenced in each SSH client call with ssh -F .ssh/aws-config, or they require marked sections in the .ssh/config like "### AZURE-SSH-CONFIG BEGIN ###". All attempts are either clumsy or error prone.

Another use case is where parts of the SSH configuration are managed by configuration management systems or by software packages, for example by a company – again that requires changes to a single file and might alter or remove existing configuration for your other services and servers. After all, it is not uncommon to use your more-or-less private Github account for your company work, so that you have mixed entries in your .ssh/config.

The underlying problem of managing more complex software configurations in single files is not unique to OpenSSH, but more or less common across many software stacks which are configured in text files. Recently it has become more and more common to write software in a way that configuration is not read from a single file, but that all files from a certain directory are read in. Examples for this include:

  • sudo with the directory /etc/sudoers.d/
  • Apache with /etc/httpd/conf.d
  • Nginx with /etc/nginx/conf.d/
  • Systemd with /etc/systemd/system.conf.d/*
  • and so on…

Initially such an approach was not possible with the SSH client configuration in OpenSSH, but a bug was reported – even including a patch – quite some years ago. Luckily, almost three years ago OpenSSH version 7.3 was released and that version did come with the solution:

* ssh(1): Add an Include directive for ssh_config(5) files.


https://www.openssh.com/txt/release-7.3

So now it is possible to add one or even multiple files and directories from where additional configuration can be loaded.

Include

Include the specified configuration file(s). Multiple pathnames may be specified and each pathname may contain glob(7) wildcards and, for user configurations, shell-like `~’ references to user home directories. Files without absolute paths are assumed to be in ~/.ssh if included in a user configuration file or /etc/ssh if included from the system configuration file. Include directive may appear inside a Match or Host block to perform conditional inclusion.

https://www.freebsd.org/cgi/man.cgi?ssh_config(5)

The following .ssh/config file defines a sub-directory from where additional configuration can be read in:

$ cat ~/.ssh/config 
Include ~/.ssh/conf.d/*

Underneath ~/.ssh/conf.d there can be additional files, each containing one or more host definitions:

$ ls ~/.ssh/conf.d/
corporate.conf
github.conf
myserver.conf
aws.conf
azure.conf
$ cat ~/.ssh/conf.d/aws.conf
Host aws-dev
    HostName 12.24.33.66
    User ec2-user
    IdentityFile ~/.ssh/aws.pem
    IdentitiesOnly yes
    StrictHostKeyChecking no

This feature made managing SSH configuration much easier for me, even though I only have a few use cases and mainly use it to keep a simple overview of things. For more flexible (aka cloud based) setups this is crucial and can make things way easier.

Note that the additional config files should only contain host definitions! General SSH configuration should live inside ~/.ssh/config and should come before the Include directive: any configuration provided after a “Host” keyword is interpreted as part of that exact host definition – until the next host block or until the next “Match” keyword.
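A simple layout following that advice could look like this (the options are just an example):

$ cat ~/.ssh/config
# general defaults first
ServerAliveInterval 60
ServerAliveCountMax 3

# then read in all host specific definitions
Include ~/.ssh/conf.d/*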

[Howto] Adding SSH keys to Ansible Tower via tower-cli [Update]

Ansible Logo

The tool tower-cli is often used to pre-configure Ansible Tower in a scripted way. It provides a convenient way to bootstrap a Tower configuration. But adding SSH keys as machine credentials is far from easy.

Bootstrapping Ansible Tower can become necessary for testing and QA environments where the same setup is created and destroyed multiple times. Other use cases are when multiple Tower installations need to be configured in the same way or share at least a larger part of the configuration.

One of the necessary tasks in such setups is to create machine credentials in Ansible Tower so that Ansible is able to connect properly to a target machine. In a Linux environment, this is often done via SSH keys.

However, tower-cli calls the Tower API in the background – and JSON POST data need to be in one line. But SSH keys come in multiple lines, so providing the file via a $(cat ssh_file) does not work:

tower-cli credential create --name "Example Credentials" \
                     --organization "Default" --credential-type "Machine" \
                     --inputs="{\"username\":\"ansible\",\"ssh_key_data\":\"$(cat .ssh/id_rsa)\",\"become_method\":\"sudo\"}"

Multiple workarounds can be found on the net, like manually editing the file to remove the new lines or creating a dedicated variables file containing the SSH key. There is even a bug report discussing that.

But for my use case I needed to read an existing SSH file directly, and did not want to add another manual step or create an additional variables file. The trick is a rather complex piece of SED:

$(sed -E ':a;N;$!ba;s/\r{0,1}\n/\\n/g' /home/ansible/.ssh/id_rsa)

This basically reads in the entire file (instead of just line by line), removes the new lines and replaces them with \n. To be precise:

  • we first create a label "a"
  • append the next line to the pattern space ("N")
  • find out if this is the last line or not ("$!"), and if not
  • branch back to label a ("ba")
  • after that, we search for line breaks ("\r{0,1}\n" – an optional carriage return followed by a newline)
  • and replace them with the literal string "\n"

Note that this needs to be accompanied with proper line endings and quotation marks. The full call of tower-cli with the sed command inside is:

tower-cli credential create --name "Example Credentials" \
                     --organization "Default" --credential-type "Machine" \
                     --inputs="{\"username\":\"ansible\",\"ssh_key_data\":\"$(sed -E ':a;N;$!ba;s/\r{0,1}\n/\\n/g' /home/ansible/.ssh/id_rsa)\n\",\"become_method\":\"sudo\"}"

Note all the escaped quotation marks.

Update

Another way to add the keys is to provide YAML in the shell command:

tower-cli credential create --name "Example Credentials" \
                     --organization "Default" --credential-type "Machine" \
                     --inputs='username: ansible
become_method: sudo
ssh_key_data: |
'"$(sed 's/^/    /' /home/ansible/.ssh/id_rsa)"

This method is appealing since the corresponding sed call is a little bit easier to understand. But make sure to indent the variables exactly as shown above.

Thanks to @ericzolf of the Red Hat Automation Community of Practice for hinting me at that solution. If you are interested in the Red Hat Communities of Practice, you can read more about them in the blog “Communities of practice: Straight from the open source”.

Of debugging Ansible Tower and underlying cloud images

Ansible Logo

Recently I was experimenting with Tower’s isolated nodes feature – but somehow it did not work in my environment. Debugging told me a lot about Ansible Tower – and also why you should not trust arbitrary cloud images.

Background – Isolated Nodes

Ansible Tower has a nice feature called “isolated nodes”. Those are dedicated Tower instances which can manage nodes in separated environments – basically an Ansible Tower Proxy.

An Isolated Node is an Ansible Tower node that contains a small piece of software for running playbooks locally to manage a set of infrastructure. It can be deployed behind a firewall/VPC or in a remote datacenter, with only SSH access available. When a job is run that targets things managed by the isolated node, the job and its environment will be pushed to the isolated node over SSH, where it will run as normal.

Ansible Tower Feature Spotlight: Instance Groups and Isolated Nodes

Isolated nodes are especially handy when you setup your automation in security sensitive environments. Think of DMZs here, of network separation and so on.

I was fooling around with a clustered Tower installation on RHEL 7 VMs in a cloud environment when I ran into trouble though.

My problem – Isolated node unavailable

Isolated nodes – like instance groups – have a status inside Tower: if things are problematic, they are marked as unavailable. And this is what happened with my instance isonode.remote.example.com running in my lab environment:

Ansible Tower showing an instance node as unavailable

I tried to turn it “off” and “on” again with the button in the control interface. That made the node available, and it was even able to execute jobs – but it quickly became unavailable again soon after.

Analysis

So what happened? The Tower logs showed a Python error:

# tail -f /var/log/tower/tower.log
fatal: [isonode.remote.example.com]: FAILED! => {"changed": false,
"module_stderr": "Shared connection to isonode.remote.example.com
closed.\r\n", "module_stdout": "Traceback (most recent call last):\r\n
File \"/var/lib/awx/.ansible/tmp/ansible-tmp-1552400585.04
-60203645751230/AnsiballZ_awx_capacity.py\", line 113, in <module>\r\n
_ansiballz_main()\r\n  File \"/var/lib/awx/.ansible/tmp/ansible-tmp
-1552400585.04-60203645751230/AnsiballZ_awx_capacity.py\", line 105, in
_ansiballz_main\r\n    invoke_module(zipped_mod, temp_path,
ANSIBALLZ_PARAMS)\r\n  File \"/var/lib/awx/.ansible/tmp/ansible-tmp
-1552400585.04-60203645751230/AnsiballZ_awx_capacity.py\", line 48, in
invoke_module\r\n    imp.load_module('__main__', mod, module, MOD_DESC)\r\n
File \"/tmp/ansible_awx_capacity_payload_6p5kHp/__main__.py\", line 74, in
<module>\r\n  File \"/tmp/ansible_awx_capacity_payload_6p5kHp/__main__.py\",
line 60, in main\r\n  File
\"/tmp/ansible_awx_capacity_payload_6p5kHp/__main__.py\", line 27, in
get_cpu_capacity\r\nAttributeError: 'module' object has no attribute
'cpu_count'\r\n", "msg": "MODULE FAILURE\nSee stdout/stderr for the exact
error", "rc": 1}

PLAY RECAP *********************************************************************
isonode.remote.example.com : ok=0    changed=0    unreachable=0    failed=1  

Apparently a Python function was missing. If we check the code we see that indeed in line 27 of file awx_capacity.py the function psutil.cpu_count() is called:

def get_cpu_capacity():
    env_forkcpu = os.getenv('SYSTEM_TASK_FORKS_CPU', None)
    cpu = psutil.cpu_count()

Support for this function was added in version 2.0 of psutil:

2014-03-10
Enhancements
424: [Windows] installer for Python 3.X 64 bit.
427: number of logical and physical CPUs (psutil.cpu_count()).

psutil history

Note the date here: 2014-03-10 – pretty old! I checked the version of the installed package, and indeed the version was pre-2.0:

$ rpm -q --queryformat '%{VERSION}\n' python-psutil
1.2.1

To be really sure and also to ensure that there was no weird function backporting, I checked the function call directly on the Tower machine:

# python
Python 2.7.5 (default, Sep 12 2018, 05:31:16) 
[GCC 4.8.5 20150623 (Red Hat 4.8.5-36)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import inspect
>>> import psutil as module
>>> functions = inspect.getmembers(module, inspect.isfunction)
>>> functions
[('_assert_pid_not_reused', <function _assert_pid_not_reused at
0x7f9eb10a8d70>), ('_deprecated', <function deprecated at 0x7f9eb38ec320>),
('_wraps', <function wraps at 0x7f9eb414f848>), ('avail_phymem', <function
avail_phymem at 0x7f9eb0c32ed8>), ('avail_virtmem', <function avail_virtmem at
0x7f9eb0c36398>), ('cached_phymem', <function cached_phymem at
0x7f9eb10a86e0>), ('cpu_percent', <function cpu_percent at 0x7f9eb0c32320>),
('cpu_times', <function cpu_times at 0x7f9eb0c322a8>), ('cpu_times_percent',
<function cpu_times_percent at 0x7f9eb0c326e0>), ('disk_io_counters',
<function disk_io_counters at 0x7f9eb0c32938>), ('disk_partitions', <function
disk_partitions at 0x7f9eb0c328c0>), ('disk_usage', <function disk_usage at
0x7f9eb0c32848>), ('get_boot_time', <function get_boot_time at
0x7f9eb0c32a28>), ('get_pid_list', <function get_pid_list at 0x7f9eb0c4b410>),
('get_process_list', <function get_process_list at 0x7f9eb0c32c08>),
('get_users', <function get_users at 0x7f9eb0c32aa0>), ('namedtuple',
<function namedtuple at 0x7f9ebc84df50>), ('net_io_counters', <function
net_io_counters at 0x7f9eb0c329b0>), ('network_io_counters', <function
network_io_counters at 0x7f9eb0c36500>), ('phymem_buffers', <function
phymem_buffers at 0x7f9eb10a8848>), ('phymem_usage', <function phymem_usage at
0x7f9eb0c32cf8>), ('pid_exists', <function pid_exists at 0x7f9eb0c32140>),
('process_iter', <function process_iter at 0x7f9eb0c321b8>), ('swap_memory',
<function swap_memory at 0x7f9eb0c327d0>), ('test', <function test at
0x7f9eb0c32b18>), ('total_virtmem', <function total_virtmem at
0x7f9eb0c361b8>), ('used_phymem', <function used_phymem at 0x7f9eb0c36050>),
('used_virtmem', <function used_virtmem at 0x7f9eb0c362a8>), ('virtmem_usage',
<function virtmem_usage at 0x7f9eb0c32de8>), ('virtual_memory', <function
virtual_memory at 0x7f9eb0c32758>), ('wait_procs', <function wait_procs at
0x7f9eb0c32230>)]

Searching for a package origin

So how to solve this issue? My first idea was to get this working by updating the entire code part to the multiprocessing library:

# python
Python 2.7.5 (default, Sep 12 2018, 05:31:16) 
[GCC 4.8.5 20150623 (Red Hat 4.8.5-36)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import multiprocessing
>>> cpu = multiprocessing.cpu_count()
>>> cpu
4

But while I was filing a bug report I wondered why RHEL shipped such an ancient library. After all, RHEL 7 was released in June 2014, and psutil had cpu_count available since early 2014! And indeed, a quick search for the package via the Red Hat package search showed a weird result: python-psutil was never part of base RHEL 7! It was only shipped as part of some very, very old OpenStack channels:

access.redhat.com package search, results for python-psutil

Newer OpenStack channels in fact come along with newer versions of python-psutil.

So how did this outdated package end up on this RHEL 7 image? Why was it never updated?

The cloud image is to blame! The package was installed on it – most likely during the creation of the image: python-psutil is needed for OpenStack Heat, so I assume that these RHEL 7 images were once created via OpenStack and then used as the default image in this demo environment.

And after the initial creation of the image the Heat packages were forgotten. In the meantime the image was updated to newer RHEL versions, snapshots were created as new defaults and so on. But since the package in question was never part of the main RHEL repos, it was never changed or removed. It just stayed there. Waiting, apparently, for me 😉

Conclusion

This issue showed me how tricky cloud images can be. Think about your own cloud images: have you really checked all of them and verified that no package, no start up script, no configuration was changed from the Linux distribution vendor’s base setup?

With RPMs this is still manageable: you can track whether packages are installed which are not present in the existing channels. But did someone install something with pip? Or any other way?
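On a DNF-based system, one way to spot such leftovers is to list installed packages which are not available from any enabled repository:

$ dnf list extras

Anything showing up there deserves a closer look – it either came from a repository which is no longer configured, or was installed manually.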

Take my case: an outdated version of a library was called instead of a much, much more recent one. If there had been a serious security issue with the library in the meantime, I would have been exposed although my update management did not report any library to be updated.

I learned my lesson to be more critical with cloud images, checking them in more detail in the future to avoid having nasty surprises during production. And I can just recommend that you do that as well.

[Howto] Launch traefik as a docker container in a secure way

Traefik is a great reverse proxy solution, and a perfect tool to direct traffic in container environments. However, to do that, it needs access to docker – and that is very dangerous and must be tightly secured!

The problem: access to the docker socket

Containers offer countless opportunities to improve the deployment and management of services. However, having multiple containers on one system, often re-deploying them on the fly, requires a dynamic way of routing traffic to them. Additionally, there might be reasons to have a front end reverse proxy to sort the traffic properly anyway.

In comes traefik – “the cloud native edge router”. Among many supported backends it knows how to listen to docker and create dynamic routes on the fly when new containers come up.

To do so traefik needs access to the docker socket. Many people decide to just provide that as a volume to traefik. This usually does not work because SELinux prevents it for a reason. The apparent workaround for many is to run traefik in a privileged container. But that is a really bad idea:

Docker currently does not have any Authorization controls. If you can talk to the docker socket or if docker is listening on a network port and you can talk to it, you are allowed to execute all docker commands. […]
At which point you, or any user that has these permissions, have total control on your system.

http://www.projectatomic.io/blog/2014/09/granting-rights-to-users-to-use-docker-in-fedora/

The solution: a docker socket proxy

But there are ways to securely provide traefik the access it needs – without exposing too many permissions. One way is to provide limited access to the docker socket via TCP through another container which cannot be reached from the outside that easily.

Meet Tecnativa’s docker-socket-proxy:

What?
This is a security-enhanced proxy for the Docker Socket.
Why?
Giving access to your Docker socket could mean giving root access to your host, or even to your whole swarm, but some services require hooking into that socket to react to events, etc. Using this proxy lets you block anything you consider those services should not do.

https://github.com/Tecnativa/docker-socket-proxy/blob/master/README.md

It is a container which connects to the docker socket and exports the API features in a secured and configurable way via TCP. At container startup it is configured with booleans to which API sections access is granted.

So basically you set up a docker proxy to support your proxy for docker containers. Well…

How to use it

The docker socket proxy is a container itself. Thus it needs to be launched as a privileged container with access to the docker socket. Also, it must not publish any ports to the outside. Instead it should run on a dedicated docker network shared with the traefik container. The Ansible code to launch the container that way is for example:

- name: ensure privileged docker socket container
  docker_container:
    name: dockersocket4traefik
    image: tecnativa/docker-socket-proxy
    log_driver: journald
    env:
      CONTAINERS: 1
    state: started
    privileged: yes
    exposed_ports:
      - 2375
    volumes:
      - "/var/run/docker.sock:/var/run/docker.sock:z"
    networks:
      - name: dockersocket4traefik_nw

Note the env right in the middle: that is where the exported permissions are configured. CONTAINERS: 1 provides access to container-relevant information. There are also SERVICES: 1 and SWARM: 1 to manage access to docker services and swarm.

Traefik needs to have access to the same network. Also, the traefik configuration needs to point to the docker container via tcp:

[docker]
endpoint = "tcp://dockersocket4traefik:2375"

Conclusion

This setup is surprisingly easy to get working. And it allows traefik to access the docker socket for the things it needs without exposing critical permissions that could be used to take over the system. At the same time, full access to the docker socket is restricted to a non-public container, which makes it harder for attackers to exploit it.

If you have a simple container setup and use Ansible to start and stop the containers, I’ve written a role to get the above mentioned setup running.