[Short Tip] Exit bad/broken/locked ssh sessions

Sometimes SSH connections lock up: due to a weird SSH server configuration or bad connectivity on your side, the connection suddenly breaks. You cannot send any more commands via the SSH session – the terminal just doesn’t react.

And that includes the typical exit shortcuts: Ctrl+z or Ctrl+d do not work anymore. So you are only left with the choice to close the terminal – right? In fact, no, you can just exit the SSH session.

The trick is:
Enter+~+.
That is: press Enter, then ~, then . – the escape character is only recognized at the beginning of a line, which is why the sequence starts with Enter.

Why does this work? Because it is one of the defined escape sequences:

The supported escapes (assuming the default ‘~’) are:
~. Disconnect.
~^Z Background ssh.
~# List forwarded connections.
~& Background ssh at logout when waiting for forwarded connection / X11 sessions to terminate.

https://man.openbsd.org/ssh#EXIT_STATUS
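By the way: within a running session you can always press Enter followed by ~? to get the full list of escapes your client supports. And if the tilde conflicts with your setup – for example in nested SSH sessions – the escape character can be changed. A small sketch using the documented -e option and the EscapeChar client option; the host name is just an example:

# use '%' instead of '~' as the escape character for this one connection
ssh -e '%' user@host.example.com

# or permanently, in ~/.ssh/config:
Host host.example.com
    EscapeChar %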

To many of you this is probably nothing new – but I never knew that, even after years of using SSH on a daily basis, so I had the urge to share this.


[Howto] Rebasing Fedora Silverblue – even from Rawhide to Fedora 30


I recently switched to Fedora Silverblue, the immutable desktop version of Fedora. With Silverblue, rebasing is easy – even when I had to downgrade from Rawhide to a stable release!

Fedora Silverblue is an interesting attempt at providing an immutable operating system – targeted at desktop users. Using it on a daily basis helps me to get more familiar with the toolset and the ideas behind it, which are also used in other projects like Fedora Atomic or Fedora CoreOS.

When Fedora 30 was released I decided to give it a try, went to the Silverblue download page – and unfortunately picked the wrong image: the one for Rawhide.

Rawhide is the rolling release/development branch of Fedora, and is way too unstable for my daily usage. But I only discovered this after I had already installed it and spent quite some time customizing it.

But Silverblue is an immutable distribution, so switching to a previous version should be no problem, right? And in fact, yes, it is very easy!

Silverblue supports rebasing, switching between different branches. To get a list of available branches, first list the name of the remote source, and afterwards query the available references/branches:

[liquidat@heisenberg ~]$ ostree remote list
fedora
[liquidat@heisenberg ~]$ ostree remote refs fedora
[...]
fedora:fedora/30/x86_64/silverblue
fedora:fedora/30/x86_64/testing/silverblue
fedora:fedora/30/x86_64/updates/silverblue
[...]
fedora:fedora/rawhide/x86_64/silverblue
fedora:fedora/rawhide/x86_64/workstation

The list is quite long and contains multiple operating system versions.

In my case I was on the rawhide branch and tried to rebase to version 30. That however failed:

[liquidat@heisenberg ~]$ rpm-ostree rebase fedora/30/x86_64/silverblue
1 metadata, 0 content objects fetched; 569 B transferred in 4 seconds
Checking out tree 7420c3a... done
Enabled rpm-md repositories: rawhide
Updating metadata for 'rawhide'... done
rpm-md repo 'rawhide'; generated: 2019-05-13T08:01:20Z
Importing rpm-md... done
⠁  
Forbidden base package replacements:
  libgcc 9.1.1-1.fc30 -> 9.1.1-1.fc31 (rawhide)
  libgomp 9.1.1-1.fc30 -> 9.1.1-1.fc31 (rawhide)
This likely means that some of your layered packages have requirements on newer or older versions of some base packages. `rpm-ostree cleanup -m` may help. For more details, see: https://gith[...]
Resolving dependencies... done
error: Some base packages would be replaced

The problem was that I had installed additional packages in the meantime. Note that there are multiple ways to install packages in Silverblue:

– Flatpak apps: this is the primary way that apps get installed on Silverblue.
– Containers: which can be installed and used for development purposes.
– Toolbox containers: a special kind of container that are tailored to be used as a software development environment.

The other method of installing software on Silverblue is package layering. This is different from the other methods, and goes against the general principle of immutability. Package layering adds individual packages to the Silverblue system, and in so doing modifies the operating system.

https://docs.fedoraproject.org/en-US/fedora-silverblue/getting-started/

While Flatpak in itself is a pretty cool solution to the Linux desktop packaging problem, it usually comes with sandboxed environments, making it less usable for integrated tools and libraries.

For that reason it is still possible to install RPMs on top, in a layered form. This however might result in dependency issues when the underlying image is supposed to change.
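Layering itself is a single command. A quick sketch – the package name is just an example; like a rebase, the change only becomes active with the next deployment:

# layer a single RPM on top of the immutable base image
$ rpm-ostree install vim
# activate the new deployment
$ systemctl reboot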

This is exactly what happened here: I had additional software installed, which depended on some specific versions of the underlying image. So I had to remove those:

[liquidat@heisenberg ~]$ rpm-ostree uninstall fedora-workstation-repositories golang pass vim zsh

Afterwards it was easy to rebase the entire system onto a different branch or – in my case – a different version of the same branch:

[liquidat@heisenberg ~]$ rpm-ostree rebase fedora/30/x86_64/silverblue
1 metadata, 0 content objects fetched; 569 B transferred in 2 seconds
Staging deployment... done
Freed: 47,4 MB (pkgcache branches: 0)
  liberation-fonts-common 1:2.00.3-3.fc30 -> 1:2.00.5-1.fc30
  [...]
Downgraded:
  GConf2 3.2.6-26.fc31 -> 3.2.6-26.fc30
  [...]
Removed:
  fedora-repos-rawhide-31-0.2.noarch
  [...]
Added:
  PackageKit-gstreamer-plugin-1.1.12-5.fc30.x86_64
  [...]
Run "systemctl reboot" to start a reboot

And that’s it – after a short systemctl reboot the machine was back, running Fedora 30. And since ostree works with images, the reboot went smoothly and quickly; long sessions of installing/updating software during shutdown or reboot are not necessary with such a setup!
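If you want to double check which image you are actually running after the reboot, rpm-ostree lists the deployments – the booted one is marked – and rpm-ostree rollback switches back to the previous deployment if a rebase did not work out:

$ rpm-ostree status
$ rpm-ostree rollback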

In conclusion I must say that I am pretty impressed – both by the concept and by its execution, and by how well Silverblue works in day-to-day use even as a desktop. My next step will be to test it on a laptop on the go, and see if other problems come up there.

[Howto] Using include directive with ssh client configuration

An SSH client configuration makes accessing servers much easier and more convenient. Until recently the configuration had to live in one single file, which could be problematic. But newer versions support includes to read configuration from multiple places.

SSH is the default way to access servers remotely – Linux and other UNIX systems, and more recently Windows as well.

One feature of the OpenSSH client is to configure often used parameters for SSH connections in a central config file, ~/.ssh/config. This comes in especially handy when multiple remote servers require different parameters: varying ports, other user names, different SSH keys, and so on. It also provides the possibility to define aliases for host names to avoid the necessity to type in the FQDN each time. Since such a configuration is read directly by the SSH client, other tools which use the SSH client in the background – like Ansible – can benefit from it as well.

A typical configuration of such a config file can look like this:

Host webapp
    HostName webapp.example.com
    User mycorporatelogin
    IdentityFile ~/.ssh/id_corporate
    IdentitiesOnly yes
Host github
    HostName github.com
    User mygithublogin
    IdentityFile ~/.ssh/id_ed25519
Host gitlab
    HostName gitlab.corporate.com
    User mycorporatelogin
    IdentityFile ~/.ssh/id_corporate
Host myserver
    HostName myserver.example.net
    Port 1234
    User myuser
    IdentityFile ~/.ssh/id_ed25519
Host aws-dev
    HostName 12.24.33.66
    User ec2-user
    IdentityFile ~/.ssh/aws.pem
    IdentitiesOnly yes
    StrictHostKeyChecking no
Host azure-prod
    HostName 4.81.234.19
    User azure-prod
    IdentityFile ~/.ssh/azure_ed25519

While this is very handy and helps a lot to maintain sanity even with very different and strange SSH configurations, a single huge file is hard to manage.

Cloud environments for example change constantly, so it makes sense to update/rebuild the configuration regularly. There are scripts out there, but they either just overwrite the existing configuration, or work entirely on an extra file which has to be referenced in each SSH client call with ssh -F .ssh/aws-config, or they require marked sections in .ssh/config like "### AZURE-SSH-CONFIG BEGIN ###". All these attempts are either clumsy or error prone.

Another use case is when parts of the SSH configuration are managed by configuration management systems or by software packages, for example by a company – again that requires changes to a single file and might alter or remove existing configuration for your other services and servers. After all, it is not uncommon to use your more-or-less private GitHub account for your company work, so you end up with mixed entries in your .ssh/config.

The underlying problem of managing more complex software configurations in single files is not unique to OpenSSH, but more or less common across many software stacks which are configured in text files. Thus it has become more and more common to write software in a way that configuration is not read from a single file, but from all files in a certain directory. Examples of this include:

  • sudo with the directory /etc/sudoers.d/
  • Apache with /etc/httpd/conf.d
  • Nginx with /etc/nginx/conf.d/
  • Systemd with /etc/systemd/system.conf.d/*
  • and so on…

Initially such an approach was not possible with the SSH client configuration in OpenSSH, but a corresponding bug report – even including a patch – was filed quite some years ago. Luckily, almost three years ago OpenSSH version 7.3 was released, and that version came with the solution:

* ssh(1): Add an Include directive for ssh_config(5) files.


https://www.openssh.com/txt/release-7.3

So now it is possible to add one or even multiple files and directories from where additional configuration can be loaded.

Include

Include the specified configuration file(s). Multiple pathnames may be specified and each pathname may contain glob(7) wildcards and, for user configurations, shell-like `~’ references to user home directories. Files without absolute paths are assumed to be in ~/.ssh if included in a user configuration file or /etc/ssh if included from the system configuration file. Include directive may appear inside a Match or Host block to perform conditional inclusion.

https://www.freebsd.org/cgi/man.cgi?ssh_config(5)

The following .ssh/config file defines a sub-directory from where additional configuration can be read in:

$ cat ~/.ssh/config 
Include ~/.ssh/conf.d/*

Underneath ~/.ssh/conf.d there can be additional files, each containing one or more host definitions:

$ ls ~/.ssh/conf.d/
corporate.conf
github.conf
myserver.conf
aws.conf
azure.conf
$ cat ~/.ssh/conf.d/aws.conf
Host aws-dev
    HostName 12.24.33.66
    User ec2-user
    IdentityFile ~/.ssh/aws.pem
    IdentitiesOnly yes
    StrictHostKeyChecking no

This feature made managing SSH configuration much easier for me – and I only have a few use cases and mainly use it to keep a simple overview of things. For more flexible (aka cloud based) setups this is crucial and can make things way easier.
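As an example, a small script can now regenerate just the cloud specific part without ever touching the rest of the configuration. This is only a rough sketch assuming a configured AWS CLI – the host naming scheme, user and key path are made up for illustration:

#!/bin/bash
# rewrite only the AWS part of the SSH client configuration
out=~/.ssh/conf.d/aws.conf
echo "# generated on $(date)" > "$out"
i=0
for ip in $(aws ec2 describe-instances \
    --query "Reservations[].Instances[].PublicIpAddress" --output text); do
    i=$((i+1))
    {
        echo "Host aws-node-$i"
        echo "    HostName $ip"
        echo "    User ec2-user"
        echo "    IdentityFile ~/.ssh/aws.pem"
    } >> "$out"
done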

Note that the additional config files should only contain host definitions! General SSH configuration should be inside ~/.ssh/config and should be before the include directive: any configuration provided after a “Host” keyword is interpreted as part of that exact host definition – until the next host block or until the next “Match” keyword.
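Putting both together, a complete ~/.ssh/config could look like this – ServerAliveInterval just serves as an example of a general option that applies to all hosts:

# general defaults – must come before the Include and before any Host block
ServerAliveInterval 60
Include ~/.ssh/conf.d/*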

[Howto] ara – making Ansible runs easier to read and understand

ara is a simple web server showing detailed information about Ansible runs. It is helpful in understanding and troubleshooting them.

Background

Ansible runs, especially on the command line, only provide limited information. Details about used variables, the timing of each task or other data are only available via additional plugins, and the details provided by those are usually narrowed down to a specific use case.

A better way to provide information about Ansible runs is to collect the data and provide them in a web framework. That is what Ansible Tower (or AWX, the upstream project to Tower) does for example: collecting detailed data and providing them in the jobs overview.

But there are situations where a fully fledged Tower is too much, or where a comparative overview of the various runs is needed. This is where ara comes in:

ARA Records Ansible playbook runs and makes the recorded data available and intuitive for users and systems.
It makes your Ansible playbooks easier to understand and troubleshoot.

https://ara.recordsansible.org/

ara was originally developed by people from the OpenStack community, and still has strong ties with it today. It is not a replacement for Ansible Tower, since it does not manage the execution at all. It complements the information and overview part, and in a way competes more with the logging solutions which can be connected to Ansible Tower.

How to install

The installation of ara is pretty straightforward and described in the documentation: the software is basically installed via pip, afterwards the server can be started as a locally running instance. The connection between Ansible and ara is done via action and callback plugins.

The installation of the ara package is quickly done. Note that on systems with both Python 2 and 3 you need to pick the right pip version:

$ pip3 install --user ara
...
$ python3 -m ara.setup.action_plugins
/home/liquidat/.local/lib/python3.7/site-packages/ara/plugins/actions
$ python3 -m ara.setup.callback_plugins
/home/liquidat/.local/lib/python3.7/site-packages/ara/plugins/callbacks

Notice that the binaries end up in ~/.local/bin. If that is not part of the $PATH variable, the server executable to start ara needs to be addressed directly, like ~/.local/bin/ara-manage runserver:

$ ~/.local/bin/ara-manage runserver
 * Serving Flask app "ara" (lazy loading)
 * Environment: production
   WARNING: Do not use the development server in a production environment.
   Use a production WSGI server instead.
 * Debug mode: off
2019-05-06 02:45:49,156 INFO werkzeug:  * Running on http://127.0.0.1:9191/ (Press CTRL+C to quit)
2019-05-06 02:45:55,915 INFO werkzeug: 127.0.0.1 - - [06/May/2019 02:45:55] "GET / HTTP/1.1" 302 -

The web page can be accessed by pointing a web browser to http://127.0.0.1:9191/. Since Ansible is not yet connected to ara, no data are shown at this point.

As mentioned, a callback plugin is used to connect ara to Ansible. There are different ways to tell Ansible to use a callback plugin; the easiest is to set up an ansible.cfg with the appropriate data:

$ python3 -m ara.setup.ansible | tee -a ansible.cfg                                                                                        
[defaults]
callback_plugins=/home/rwolters/.local/lib/python3.7/site-packages/ara/plugins/callbacks
action_plugins=/home/rwolters/.local/lib/python3.7/site-packages/ara/plugins/actions

Note here that this appends a new section named [defaults]. If your ansible.cfg already has a section called [defaults], merge the entries manually – for example like in the following sketch – and afterwards call a few playbooks to check the results.
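A minimal sketch of such a manually merged ansible.cfg – the inventory entry stands in for whatever was already present in the file, the plugin paths are those printed by the setup command:

[defaults]
# pre-existing entry, kept as-is
inventory = ./hosts
# entries added for ara
callback_plugins=/home/rwolters/.local/lib/python3.7/site-packages/ara/plugins/callbacks
action_plugins=/home/rwolters/.local/lib/python3.7/site-packages/ara/plugins/actions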

ara provides easy access to all existing runs, making it possible to easily compare different runs with each other. At the same time detailed information is provided for individual runs, making it easy to figure out what actually happened.

Summary

ara is an interesting attempt at better displaying the information from Ansible runs. It helps to analyze what is happening in each run, where problems might be hidden, and so on.

If you already use Ansible Tower, the information is available to you anyway. If you like the way it is presented in ara, you can even use both at the same time.

[Howto] Adding SSH keys to Ansible Tower via tower-cli [Update]


The tool tower-cli is often used to pre-configure Ansible Tower in a scripted way. It provides a convenient way to bootstrap a Tower configuration. But adding SSH keys as machine credentials is far from easy.

Bootstrapping Ansible Tower can become necessary in testing and QA environments where the same setup is created and destroyed multiple times. Other use cases are when multiple Tower installations need to be configured in the same way or at least share a larger part of the configuration.

One of the necessary tasks in such setups is to create machine credentials in Ansible Tower so that Ansible is able to connect properly to a target machine. In a Linux environment, this is often done via SSH keys.

However, tower-cli calls the Tower API in the background – and the JSON POST data needs to be on one line. But SSH keys span multiple lines, so providing the file via $(cat ssh_file) does not work:

tower-cli credential create --name "Example Credentials" \
                     --organization "Default" --credential-type "Machine" \
                     --inputs="{\"username\":\"ansible\",\"ssh_key_data\":\"$(cat .ssh/id_rsa)\",\"become_method\":\"sudo\"}"

Multiple workarounds can be found on the net, like manually editing the file to remove the new lines or creating a dedicated variables file containing the SSH key. There is even a bug report discussing that.

But for my use case I needed to read an existing SSH file directly, and did not want to add another manual step or create an additional variables file. The trick is a rather complex piece of SED:

$(sed -E ':a;N;$!ba;s/\r{0,1}\n/\\n/g' /home/ansible/.ssh/id_rsa)

This basically reads in the entire file (instead of just line by line), removes the new lines and replaces them with \n. To be precise:

  • we first create a label "a"
  • append the next line to the pattern space ("N")
  • find out if this is the last line or not ("$!"), and if not
  • branch back to label a ("ba")
  • after that, we search for line endings ("\r{0,1}\n", i.e. a newline with an optional carriage return in front)
  • and replace them with the literal two-character string "\n"
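The effect is easy to check with GNU sed on a harmless two-line dummy input (not the actual key, obviously):

$ printf 'line1\nline2\n' | sed -E ':a;N;$!ba;s/\r{0,1}\n/\\n/g'
line1\nline2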

Note that this needs to be accompanied by proper line endings and quotation marks. The full call of tower-cli with the sed command inside is:

tower-cli credential create --name "Example Credentials" \
                     --organization "Default" --credential-type "Machine" \
                     --inputs="{\"username\":\"ansible\",\"ssh_key_data\":\"$(sed -E ':a;N;$!ba;s/\r{0,1}\n/\\n/g' /home/ansible/.ssh/id_rsa)\n\",\"become_method\":\"sudo\"}"

Note all the escaped quotation marks.

Update

Another way to add the keys is to provide yaml in the shell command:

tower-cli credential create --name "Example Credentials" \
                     --organization "Default" --credential-type "Machine" \
                     --inputs='username: ansible
become_method: sudo
ssh_key_data: |
'"$(sed 's/^/    /' /home/ansible/.ssh/id_rsa)"

This method is appealing since the corresponding sed call is a little bit easier to understand. But make sure to indent the variables exactly as shown above.

Thanks to @ericzolf of the Red Hat Automation Community of Practice for hinting me at that solution. If you are interested in the Red Hat Communities of Practice, you can read more about them in the blog post “Communities of practice: Straight from the open source”.

[Howto] Fix ldap “protocol error” in Gitea (and other Go based apps)

I prefer self-hosted solutions for some tasks. But this also means that I have to troubleshoot problems on my own. Recently a go-ldap error gave me a headache. Here is the analysis of the protocol error – and how to solve it.

For certain projects I prefer a self-hosted Git server. Solutions like Gitea, the fast developing and thriving fork of Gogs, make this painless and easy – especially in a containerized environment.

My users are managed in FreeIPA, and Gitea connects to it via LDAP. And this is a constant source of trouble: Gitea is written in Go, and the Go LDAP libraries seem to be far from perfect.

For example, after a recent update of my environment, login at Gitea stopped working:

[...gitea/models/user.go:1544 SyncExternalUsers()] [E] LDAP Search failed unexpectedly! (LDAP Result Code 2 "Protocol Error": )

The FreeIPA server indeed showed malformed requests at the same time:

[170978469] fd=112 slot=112 connection from 172.18.0.6 to 172.18.0.2
[171199824] op=0 BIND dn="uid=system,cn=sysaccounts,cn=etc,dc=bayz,dc=de" method=128 version=3
[223472706] op=0 RESULT err=0 tag=97 nentries=0 etime=0.0052434415 dn="uid=system,cn=sysaccounts,cn=etc,dc=bayz,dc=de"
[223738210] op=1 SRCH base="cn=users,cn=accounts,dc=bayz,dc=de" scope=2 filter="(&(objectClass=person)(uid=rwo))" attrs=ALL
[225467030] op=1 RESULT err=0 tag=101 nentries=1 etime=0.0001797298
[226078299] op=2 BIND dn="uid=rwo,cn=users,cn=accounts,dc=bayz,dc=de" method=128 version=3
[278423889] op=2 RESULT err=0 tag=97 nentries=0 etime=0.0052380180 dn="uid=rwo,cn=users,cn=accounts,dc=bayz,dc=de"
[278705323] op=3 SRCH base="(null)" scope=2 filter="(&(objectClass=person)(uid=rwo))", invalid attribute request
[278722888] op=3 RESULT err=2 tag=101 nentries=0 etime=0.0000084787
[279051788] op=-1 fd=112 closed - B1

Since LDAP login still worked fine with other tools, I assumed a problem in the new Gitea version and filed a bug report. Other users with the same problem joined soon after, but no one was able to provide a solution.

After some research I figured out that the problem appeared to be related to an update of the FreeIPA server: a security update in the underlying 389 directory server led to protocol errors when empty attributes were part of the request.

An updated version of the go-ldap library was supposed to fix this – and indeed, after Gitea updated the library other users reported that the issue was fixed for them.

However, not for me: I still had the problem, and got frustrated over this for weeks.

It took me another evening of research until I found the important missing detail: a Grafana user had the same problem, and the updated library did not help there either. But reducing the number of empty attributes – by providing values for default attributes that would otherwise be empty – did the trick:

For me the error occurs when I have less than 4 attributes.

https://github.com/grafana/grafana/issues/14432#issuecomment-452025337

In the end they figured out that sending one empty attribute was okay, but not two. With this information, the fix for my problem was easy: I previously had not set the attributes for First Name and Surname. I added those and was immediately able to log in again.

Gitea with LDAP attributes
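If you want to verify on the directory side which attributes a user actually carries before mapping them in Gitea, a plain ldapsearch helps. The base DN below matches the logs above; the server URI is just an example, and depending on the server's ACLs you may have to bind with -D/-W instead of anonymously:

$ ldapsearch -x -H ldap://ipa.bayz.de \
    -b "cn=users,cn=accounts,dc=bayz,dc=de" "(uid=rwo)" givenName sn mail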

So: if you ever run into the same problem with Go and LDAP, check if you are indeed sending more than one empty attribute!

This was the second LDAP problem I encountered using Gitea. Due to this experience I do not really feel comfortable with using this combination – and will not be surprised if it breaks again with the next update.

On the other hand LDAP is a rather complicated protocol and probably a bit of an overkill for a simple use case like this. If I ever re-do my setup I might switch over to mail-based authentication.

Of debugging Ansible Tower and underlying cloud images


Recently I was experimenting with Tower’s isolated nodes feature – but somehow it did not work in my environment. Debugging told me a lot about Ansible Tower – and also why you should not trust arbitrary cloud images.

Background – Isolated Nodes

Ansible Tower has a nice feature called “isolated nodes”. Those are dedicated Tower instances which can manage nodes in separated environments – basically an Ansible Tower Proxy.

An Isolated Node is an Ansible Tower node that contains a small piece of software for running playbooks locally to manage a set of infrastructure. It can be deployed behind a firewall/VPC or in a remote datacenter, with only SSH access available. When a job is run that targets things managed by the isolated node, the job and its environment will be pushed to the isolated node over SSH, where it will run as normal.

Ansible Tower Feature Spotlight: Instance Groups and Isolated Nodes

Isolated nodes are especially handy when you set up your automation in security sensitive environments. Think of DMZs, network separation and so on.

I was fooling around with a clustered Tower installation on RHEL 7 VMs in a cloud environment when I ran into trouble, though.

My problem – Isolated node unavailable

Isolated nodes – like instance groups – have a status inside Tower: if things are problematic, they are marked as unavailable. And this is what happened with my instance isonode.remote.example.com running in my lab environment:

Ansible Tower showing an instance node as unavailable

I tried to turn it “off” and “on” again with the button in the control interface. That made the node available, it was even able to execute jobs – but it quickly became unavailable again soon after.

Analysis

So what happened? The Tower logs showed a Python error:

# tail -f /var/log/tower/tower.log
fatal: [isonode.remote.example.com]: FAILED! => {"changed": false,
"module_stderr": "Shared connection to isonode.remote.example.com
closed.\r\n", "module_stdout": "Traceback (most recent call last):\r\n
File \"/var/lib/awx/.ansible/tmp/ansible-tmp-1552400585.04
-60203645751230/AnsiballZ_awx_capacity.py\", line 113, in <module>\r\n
_ansiballz_main()\r\n  File \"/var/lib/awx/.ansible/tmp/ansible-tmp
-1552400585.04-60203645751230/AnsiballZ_awx_capacity.py\", line 105, in
_ansiballz_main\r\n    invoke_module(zipped_mod, temp_path,
ANSIBALLZ_PARAMS)\r\n  File \"/var/lib/awx/.ansible/tmp/ansible-tmp
-1552400585.04-60203645751230/AnsiballZ_awx_capacity.py\", line 48, in
invoke_module\r\n    imp.load_module('__main__', mod, module, MOD_DESC)\r\n
File \"/tmp/ansible_awx_capacity_payload_6p5kHp/__main__.py\", line 74, in
<module>\r\n  File \"/tmp/ansible_awx_capacity_payload_6p5kHp/__main__.py\",
line 60, in main\r\n  File
\"/tmp/ansible_awx_capacity_payload_6p5kHp/__main__.py\", line 27, in
get_cpu_capacity\r\nAttributeError: 'module' object has no attribute
'cpu_count'\r\n", "msg": "MODULE FAILURE\nSee stdout/stderr for the exact
error", "rc": 1}

PLAY RECAP *********************************************************************
isonode.remote.example.com : ok=0    changed=0    unreachable=0    failed=1  

Apparently a Python function was missing. If we check the code we see that indeed in line 27 of file awx_capacity.py the function psutil.cpu_count() is called:

def get_cpu_capacity():
    env_forkcpu = os.getenv('SYSTEM_TASK_FORKS_CPU', None)
    cpu = psutil.cpu_count()

Support for this function was added in version 2.0 of psutil:

2014-03-10
Enhancements
424: [Windows] installer for Python 3.X 64 bit.
427: number of logical and physical CPUs (psutil.cpu_count()).

psutil history

Note the date here: 2014-03-10 – pretty old! I checked the version of the installed package, and indeed it was pre-2.0:

$ rpm -q --queryformat '%{VERSION}\n' python-psutil
1.2.1
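The same information can also be queried from Python directly – which additionally works for libraries that were not installed via RPM but via pip:

$ python -c 'import psutil; print(psutil.__version__)'
1.2.1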

To be really sure and also to ensure that there was no weird function backporting, I checked the function call directly on the Tower machine:

# python
Python 2.7.5 (default, Sep 12 2018, 05:31:16) 
[GCC 4.8.5 20150623 (Red Hat 4.8.5-36)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import inspect
>>> import psutil as module
>>> functions = inspect.getmembers(module, inspect.isfunction)
>>> functions
[('_assert_pid_not_reused', <function _assert_pid_not_reused at
0x7f9eb10a8d70>), ('_deprecated', <function deprecated at 0x7f9eb38ec320>),
('_wraps', <function wraps at 0x7f9eb414f848>), ('avail_phymem', <function
avail_phymem at 0x7f9eb0c32ed8>), ('avail_virtmem', <function avail_virtmem at
0x7f9eb0c36398>), ('cached_phymem', <function cached_phymem at
0x7f9eb10a86e0>), ('cpu_percent', <function cpu_percent at 0x7f9eb0c32320>),
('cpu_times', <function cpu_times at 0x7f9eb0c322a8>), ('cpu_times_percent',
<function cpu_times_percent at 0x7f9eb0c326e0>), ('disk_io_counters',
<function disk_io_counters at 0x7f9eb0c32938>), ('disk_partitions', <function
disk_partitions at 0x7f9eb0c328c0>), ('disk_usage', <function disk_usage at
0x7f9eb0c32848>), ('get_boot_time', <function get_boot_time at
0x7f9eb0c32a28>), ('get_pid_list', <function get_pid_list at 0x7f9eb0c4b410>),
('get_process_list', <function get_process_list at 0x7f9eb0c32c08>),
('get_users', <function get_users at 0x7f9eb0c32aa0>), ('namedtuple',
<function namedtuple at 0x7f9ebc84df50>), ('net_io_counters', <function
net_io_counters at 0x7f9eb0c329b0>), ('network_io_counters', <function
network_io_counters at 0x7f9eb0c36500>), ('phymem_buffers', <function
phymem_buffers at 0x7f9eb10a8848>), ('phymem_usage', <function phymem_usage at
0x7f9eb0c32cf8>), ('pid_exists', <function pid_exists at 0x7f9eb0c32140>),
('process_iter', <function process_iter at 0x7f9eb0c321b8>), ('swap_memory',
<function swap_memory at 0x7f9eb0c327d0>), ('test', <function test at
0x7f9eb0c32b18>), ('total_virtmem', <function total_virtmem at
0x7f9eb0c361b8>), ('used_phymem', <function used_phymem at 0x7f9eb0c36050>),
('used_virtmem', <function used_virtmem at 0x7f9eb0c362a8>), ('virtmem_usage',
<function virtmem_usage at 0x7f9eb0c32de8>), ('virtual_memory', <function
virtual_memory at 0x7f9eb0c32758>), ('wait_procs', <function wait_procs at
0x7f9eb0c32230>)]

Searching for a package origin

So how to solve this issue? My first idea was to get this working by updating the code in question to use the multiprocessing library instead:

# python
Python 2.7.5 (default, Sep 12 2018, 05:31:16) 
[GCC 4.8.5 20150623 (Red Hat 4.8.5-36)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import multiprocessing
>>> cpu = multiprocessing.cpu_count()
>>> cpu
4

But while I was filing a bug report I wondered why RHEL shipped such an ancient library. After all, RHEL 7 was released in June 2014, and psutil had cpu_count available since early 2014! And indeed, a quick search via the Red Hat package search showed a weird result: python-psutil was never part of base RHEL 7! It was only shipped as part of some very, very old OpenStack channels:

access.redhat.com package search, results for python-psutil

Newer OpenStack channels in fact come along with newer versions of python-psutil.

So how did this outdated package end up on this RHEL 7 image? Why was it never updated?

The cloud image is to blame! The package was installed on it – most likely during the creation of the image: python-psutil is needed for OpenStack Heat, so I assume that these RHEL 7 images were once created via OpenStack and then used as the default image in this demo environment.

And after the initial creation of the image the Heat packages were forgotten. In the meantime the image was updated to newer RHEL versions, snapshots were created as new defaults and so on. But since the package in question was never part of the main RHEL repos, it was never changed or removed. It just stayed there. Waiting, apparently, for me 😉

Conclusion

This issue showed me how tricky cloud images can be. Think about your own cloud images: have you really checked all of them and verified that no package, no start-up script, no configuration deviates from the Linux distribution vendor’s base setup?

With RPMs this is still manageable: you can track whether packages are installed which are not present in the configured channels – see the sketch below. But did someone install something with pip? Or in any other way?
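A quick way to spot such packages on RHEL and friends is to let yum compare the installed set against all enabled repositories, and to at least get an inventory of what pip installed:

# installed packages that are not available in any enabled repository
$ yum list extras
# Python packages installed via pip
$ pip list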

Take my case: an outdated version of a library was called instead of a much, much more recent one. If there had been a serious security issue with the library in the meantime, I would have been exposed even though my update management did not report any library to be updated.

I learned my lesson to be more critical with cloud images, checking them in more detail in the future to avoid having nasty surprises during production. And I can just recommend that you do that as well.