[Howto] Using toolbox in Fedora / RHEL 8 for easy management of CLI tools

Running CLI tools like ansible often requires a specific environment with dependencies on the core operating system libraries. That makes it hard to run different versions in parallel – or test the newest updates. And it might clutter the OS. Toolbox offers simple container management to avoid these shortcomings.

Running CLI tools like ansible often requires a specific environment with dependencies on the core operating system libraries. That makes it hard to run different versions in parallel – or test the newest updates. And it might clutter the OS. Toolbox offers simple container management to avoid these shortcomings.

The recent development of Linux distributions has seen a shift away from all-purpose distributions towards stable core distributions with limited packages and additional sand-boxed tooling running on top to enable management of applications. One of the most advanced distributions here is for sure Fedora Silverblue, but even the enterprise distribution Red Hat Enterprise Linux 8 brings a lot of changes which aim into the right direction. Technologies in this context are for example rpm-ostree for the management of immutable OS images and Flatpak for the management of GUI applications. Additionally, RHEL 8 comes along with so called app-streams – and of course there is always the option of using containers with for example podman.

In this blog post I want to focus on the last one: using containers to manage your CLI tools, thus keeping them independent of your operating system packaging and libraries. With Fedora and RHEL, there is tooling provided which makes this even easier: Toolbox.

The rational

The basic idea for using containers, and especially Toolbox, is similar to the one about Flatpak: it solves many problems of the Linux packaging problem. This means essentially:

  • Independence from OS libraries and their versions
  • Sand-boxing, meaning better protection of the OS
  • Multi-version support
  • Less OS clutter through isolated installation of dependencies
  • Easy to recreate environments (think of “works on my machine”)
  • Immutable environments possible

Think of it that way: with complex applications, behavior sometimes depends on certain versions of some libraries. When those are managed by the OS packaging system, it is hard to keep them up2date or just in the same version across multiple machines, not to speak about multiple distributions. Also, I don’t want my OS to be cluttered with weird dependencies which I might not even trust just to justify a weird application’s requirements. And I might want to install different versions of a tool to test them, – with different libraries as well, which is often impossible with OS package management.

Toolbox

In comes Toolbox:

Toolbox is a tool that offers a familiar package based environment for developing and debugging software that runs fully unprivileged using Podman.

The toolbox container is a fully mutable container; when you see yum install ansible for example, that’s something you can do inside your toolbox container, without affecting the base operating system.

Toolbox on Github

While Toolbox is particularly interesting for immutable systems like Fedora Silverblue, it even makes sense to run it on other distributions. I started using it on my regular Fedora for example just to have certain tools available in certain versions for tests.

And why use Toolbox, and not just the usual container tools? Toolbox takes care of volume mounting and all the other necessary bits of container management, and enables you to just use a very basic set of commands to create – and reuse – your tool containers. It is simpler and easier than always typing in fully fledged podman or docker commands all the time.

You can read more about Toolbox in the Fedora Silverblue Toolbox docs or the Red Hat Enterprise Linux 8 Toolbox docs.

Getting started

It is very easy to get started with Toolbox. First, it needs to be installed on the system. For example, on Fedora 31, this can be done via:

$ sudo dnf install toolbox

After that, you are good to go. Since the idea is to have re-usable containers, let’s create the first. In my example I want to have a container with the newest Ansible version to run some automation. So we just create a new container called ansible:

$ toolbox create --container ansible
Image required to create toolbox container.
Download registry.fedoraproject.org/f31/fedora-toolbox:31 (500MB)? [y/N]: y
Created container: ansible

As you see, a base image for my distribution was downloaded, and the container created. Next, let’s access it and look around:

$ toolbox enter --container ansible

Welcome to the Toolbox; a container where you can install and run
all your tools.

 - Use DNF in the usual manner to install command line tools.
 - To create a new tools container, run 'toolbox create'.

For more information, see the documentation.

⬢[liquidat@toolbox ~]$

We are greeted with a short message and then dropped to a shell. Note the bubble at the start of the command prompt – a nice touch to differentiate if you are inside a toolbox or not. Next, let’s look at our environment:

⬢[liquidat@toolbox ~]$ pwd
/home/liquidat
⬢[liquidat@toolbox ~]$ ls
bin  development  documents  downloads  ...
⬢[liquidat@toolbox ~]$ ls /
README.md  bin  boot  dev  etc  home  lib  lib64  lost+found  media  mnt  opt  proc  root  run  sbin  srv  sys  tmp  usr  var
⬢[liquidat@toolbox ~]$ cat /README.md 
# Toolbox — Unprivileged development environment

[Toolbox](https://github.com/debarshiray/toolbox) is a tool that offers a
[...]

As you see, the toolbox has actual access to the file system. That way we can use the tools just like normal shell tools, interact with things we have in our environment. However, at the same time we have limited access to the root system since we see the container root system (as identified by the readme), not the host root system.

Getting my first tool ready

As mentioned I’d like to have a container with the newest Ansible. Let’s install it:

⬢[liquidat@toolbox ~]$ pip install --user ansible
Collecting ansible
Using cached https://files.pythonhosted.org/packages/ae/b7/c717363f767f7af33d90af9458d5f1e0960db9c2393a6c221c2ce97ad1aa/ansible-2.9.6.tar.gz
Collecting jinja2 (from ansible)
[...]
Running setup.py install for ansible … done
Successfully installed MarkupSafe-1.1.1 PyYAML-5.3 ansible-2.9.6 cffi-1.14.0 cryptography-2.8 jinja2-2.11.1 pycparser-2.20
⬢[liquidat@toolbox ~]$ ansible --version
ansible 2.9.6
config file = /home/liquidat/.ansible.cfg
configured module search path = ['/home/liquidat/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
ansible python module location = /home/liquidat/.local/lib/python3.7/site-packages/ansible
executable location = /home/liquidat/.local/bin/ansible
python version = 3.7.6 (default, Jan 30 2020, 09:44:41) [GCC 9.2.1 20190827 (Red Hat 9.2.1-1)]

As you see, Ansible was properly installed. And with this we are already done – we have our first tool ready, name “ansible”.

Using our tool

Now let’s assume I use the container for some things, exit it – and want to reuse it later on. This is no problem at all, since that is exactly what Toolbox was built for. And we have a name, which makes it fairly easy to remember how to access it. But even if we do not remember the name, we can easily list all available tools:

$ toolbox list
IMAGE ID      IMAGE NAME                                        CREATED
64e68e194389  registry.fedoraproject.org/f31/fedora-toolbox:31  2 weeks ago

CONTAINER ID  CONTAINER NAME  CREATED         STATUS             IMAGE NAME
8ec117845e06  ansible         47 minutes ago  Up 47 minutes ago  registry.fedoraproject.org/f31/fedora-toolbox:31
$ toolbox enter -c ansible
⬢[liquidat@toolbox ~]$ ansible --version
ansible 2.9.6
  config file = /home/liquidat/.ansible.cfg
  configured module search path = ['/home/liquidat/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /home/liquidat/.local/lib/python3.7/site-packages/ansible
  executable location = /home/liquidat/.local/bin/ansible
  python version = 3.7.6 (default, Jan 30 2020, 09:44:41) [GCC 9.2.1 20190827 (Red Hat 9.2.1-1)]

As you see the container is in the same state as we left it: Ansible is still installed in the proper way, and ready to be used. And we can do this now with all kinds of other tools: be it another version of Ansible, or even some daemon we want to experiment with. It can all be easily installed and run and re-used, without worrying of cluttering the OS, or having the wrong library versions installed, or not being able to update some library because of a system dependency.

Summary

Toolbox is an interesting approach to simplify container management to fool around with CLI based tools. If you have an immutable environment like Fedora Silverblue, it might become a crucial piece in your daily operations since it is a pain to install additional packages on top of Silverblue’s ostree infrastructure. But even for “normal” distributions it is worth a try!

[Howto] Get a Python virtual environment running on RHEL 8

RHEL 8 has a new way how Python is installed and handled. How do you use it properly then, especially when multiple versions are installed? Read on to learn how to properly set up a virtual environment nevertheless.

RHEL 8 has a new way how Python is installed and handled. How do you use it properly then, especially when multiple versions are installed? Read on to learn how to properly set up a virtual environment nevertheless.

Red Hat Enterprise Linux 8 was released in May this year – and comes with a lot of changes. Think of a really modern OS here. Among those changes is also that Python is, well different: it is included, for sure. But at the same time, it isn’t.

The important piece is anyway that, when you work with Python in development environments or for example when you are dealing with Ansible, it makes sense to run everything in a Python virtual environment.

Here is how this can be best done in RHEL 8:

First, install the Python 3.6 appstream:

$ sudo yum install -y python36

Afterwards, set up a python virtual environment:

$ python3.6 -m venv myvirtual_venv

And that’s it already. Activate it with:

$ source myvirtual_venv/bin/activate

In case you are dealing with SELinux bindings, it might make sense to link those into your virtual environment:

$ cd myvirtual_venv/lib/python3.6/site-packages/
$ ln -s /usr/lib64/python3.6/site-packages/selinux
$ ln -s /usr/lib64/python3.6/site-packages/_selinux.cpython-36m-x86_64-linux-gnu.so

When in the future different versions of Python are offered via appstreams, make sure to pick the right selinux bindings when you link them into your virtual environment.

Another way to work with selinux libs is to create the virtual environment by using system packages:

$ python3.6 -m venv --system-site-packages myvirtual_venv

[Short Tip] Handling “can’t concat str to bytes” error in Ansible’s uri module

Ansible Logo

When working with web services, especially REST APIs, Ansible can be of surprising help when you need to automate those, our want to integrate them into your automation.

However, today I run into a strange Python bug while I tried to use the uri module:

An exception occurred during task execution. To see the full traceback, use -vvv. The error was: TypeError: can't concat str to bytes
fatal: [localhost]: FAILED! => {"changed": false, "content": "", "elapsed": 0, "msg": "Status code was -1 and not [200]: An unknown error occurred: can't concat str to bytes", "redirected": false, "status": -1, "url": "https://www.ansible.com"}

This drove me almost nuts because it happened on all kinds of machines I tested, even with Ansible’s devel upstream version. It was even independent of the service I targeted – the error happened way earlier. And a playbook to showcase this was suspiciously short and simple:

---                                 
- name: Show concat str byte error  
  hosts: localhost                  
  connection: local                 
  gather_facts: no                  
                                    
  tasks:                            
    - name: call problematic URL call
      uri:                          
        url: "https://www.ansible.com"
        method: POST                
        body:                       
          name: "myngfw"

When I was about to fill an issue at Ansible’s Github page I thought again and wondered that this is too simple: I couldn’t imagine that I was the only one hitting this problem. I realized that the error had to be on my side. And thus meant that something was missing.

And indeed: the body_format option was not explicitly stated, so Ansible assumed “raw”, while my body data were provided in json format. A simple

        body_format: json

solved my problems.

Never dare to ask me how long it took me to figure this one out. And that from the person who write and entire how to about how to provide payload with the Ansible URI module….

Of debugging Ansible Tower and underlying cloud images

Recently I was experimenting with Tower’s isolated nodes feature – but somehow it did not work in my environment. Debugging told me a lot about Ansible Tower – and also why you should not trust arbitrary cloud images.

Ansible Logo

Recently I was experimenting with Tower’s isolated nodes feature – but somehow it did not work in my environment. Debugging told me a lot about Ansible Tower – and also why you should not trust arbitrary cloud images.

Background – Isolated Nodes

Ansible Tower has a nice feature called “isolated nodes”. Those are dedicated Tower instances which can manage nodes in separated environments – basically an Ansible Tower Proxy.

An Isolated Node is an Ansible Tower node that contains a small piece of software for running playbooks locally to manage a set of infrastructure. It can be deployed behind a firewall/VPC or in a remote datacenter, with only SSH access available. When a job is run that targets things managed by the isolated node, the job and its environment will be pushed to the isolated node over SSH, where it will run as normal.

Ansible Tower Feature Spotlight: Instance Groups and Isolated Nodes

Isolated nodes are especially handy when you setup your automation in security sensitive environments. Think of DMZs here, of network separation and so on.

I was fooling around with a clustered Tower installation on RHEL 7 VMs in a cloud environment when I run into trouble though.

My problem – Isolated node unavailable

Isolated nodes – like instance groups – have a status inside Tower: if things are problematic, they are marked as unavailable. And this is what happened with my instance isonode.remote.example.com running in my lab environment:

Ansible Tower showing an instance node as unavailable

I tried to turn it “off” and “on” again with the button in the control interface. It made the node available, it was even able to executed jobs – but it became quickly unavailable soon after.

Analysis

So what happened? The Tower logs showed a Python error:

# tail -f /var/log/tower/tower.log
fatal: [isonode.remote.example.com]: FAILED! => {"changed": false,
"module_stderr": "Shared connection to isonode.remote.example.com
closed.\r\n", "module_stdout": "Traceback (most recent call last):\r\n
File \"/var/lib/awx/.ansible/tmp/ansible-tmp-1552400585.04
-60203645751230/AnsiballZ_awx_capacity.py\", line 113, in <module>\r\n
_ansiballz_main()\r\n  File \"/var/lib/awx/.ansible/tmp/ansible-tmp
-1552400585.04-60203645751230/AnsiballZ_awx_capacity.py\", line 105, in
_ansiballz_main\r\n    invoke_module(zipped_mod, temp_path,
ANSIBALLZ_PARAMS)\r\n  File \"/var/lib/awx/.ansible/tmp/ansible-tmp
-1552400585.04-60203645751230/AnsiballZ_awx_capacity.py\", line 48, in
invoke_module\r\n    imp.load_module('__main__', mod, module, MOD_DESC)\r\n
File \"/tmp/ansible_awx_capacity_payload_6p5kHp/__main__.py\", line 74, in
<module>\r\n  File \"/tmp/ansible_awx_capacity_payload_6p5kHp/__main__.py\",
line 60, in main\r\n  File
\"/tmp/ansible_awx_capacity_payload_6p5kHp/__main__.py\", line 27, in
get_cpu_capacity\r\nAttributeError: 'module' object has no attribute
'cpu_count'\r\n", "msg": "MODULE FAILURE\nSee stdout/stderr for the exact
error", "rc": 1}

PLAY RECAP *********************************************************************
isonode.remote.example.com : ok=0    changed=0    unreachable=0    failed=1  

Apparently a Python function was missing. If we check the code we see that indeed in line 27 of file awx_capacity.py the function psutil.cpu_count() is called:

def get_cpu_capacity():
    env_forkcpu = os.getenv('SYSTEM_TASK_FORKS_CPU', None)
    cpu = psutil.cpu_count()

Support for this function was added in version 2.0 of psutil:

2014-03-10
Enhancements
424: [Windows] installer for Python 3.X 64 bit.
427: number of logical and physical CPUs (psutil.cpu_count()).

psutil history

Note the date here: 2014-03-10 – pretty old! I check the version of the installed package, and indeed the version was pre-2.0:

$ rpm -q --queryformat '%{VERSION}\n' python-psutil
1.2.1

To be really sure and also to ensure that there was no weird function backporting, I checked the function call directly on the Tower machine:

# python
Python 2.7.5 (default, Sep 12 2018, 05:31:16) 
[GCC 4.8.5 20150623 (Red Hat 4.8.5-36)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import inspect
>>> import psutil as module
>>> functions = inspect.getmembers(module, inspect.isfunction)
>>> functions
[('_assert_pid_not_reused', <function _assert_pid_not_reused at
0x7f9eb10a8d70>), ('_deprecated', <function deprecated at 0x7f9eb38ec320>),
('_wraps', <function wraps at 0x7f9eb414f848>), ('avail_phymem', <function
avail_phymem at 0x7f9eb0c32ed8>), ('avail_virtmem', <function avail_virtmem at
0x7f9eb0c36398>), ('cached_phymem', <function cached_phymem at
0x7f9eb10a86e0>), ('cpu_percent', <function cpu_percent at 0x7f9eb0c32320>),
('cpu_times', <function cpu_times at 0x7f9eb0c322a8>), ('cpu_times_percent',
<function cpu_times_percent at 0x7f9eb0c326e0>), ('disk_io_counters',
<function disk_io_counters at 0x7f9eb0c32938>), ('disk_partitions', <function
disk_partitions at 0x7f9eb0c328c0>), ('disk_usage', <function disk_usage at
0x7f9eb0c32848>), ('get_boot_time', <function get_boot_time at
0x7f9eb0c32a28>), ('get_pid_list', <function get_pid_list at 0x7f9eb0c4b410>),
('get_process_list', <function get_process_list at 0x7f9eb0c32c08>),
('get_users', <function get_users at 0x7f9eb0c32aa0>), ('namedtuple',
<function namedtuple at 0x7f9ebc84df50>), ('net_io_counters', <function
net_io_counters at 0x7f9eb0c329b0>), ('network_io_counters', <function
network_io_counters at 0x7f9eb0c36500>), ('phymem_buffers', <function
phymem_buffers at 0x7f9eb10a8848>), ('phymem_usage', <function phymem_usage at
0x7f9eb0c32cf8>), ('pid_exists', <function pid_exists at 0x7f9eb0c32140>),
('process_iter', <function process_iter at 0x7f9eb0c321b8>), ('swap_memory',
<function swap_memory at 0x7f9eb0c327d0>), ('test', <function test at
0x7f9eb0c32b18>), ('total_virtmem', <function total_virtmem at
0x7f9eb0c361b8>), ('used_phymem', <function used_phymem at 0x7f9eb0c36050>),
('used_virtmem', <function used_virtmem at 0x7f9eb0c362a8>), ('virtmem_usage',
<function virtmem_usage at 0x7f9eb0c32de8>), ('virtual_memory', <function
virtual_memory at 0x7f9eb0c32758>), ('wait_procs', <function wait_procs at
0x7f9eb0c32230>)]

Searching for a package origin

So how to solve this issue? My first idea was to get this working by updating the entire code part to the multiprocessor lib:

# python
Python 2.7.5 (default, Sep 12 2018, 05:31:16) 
[GCC 4.8.5 20150623 (Red Hat 4.8.5-36)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import multiprocessing
>>> cpu = multiprocessing.cpu_count()
>>> cpu
4

But while I was filling a bug report I wondered why RHEL shipped such an ancient library. After all, RHEL 7 was released in June 2014, and psutil had cpu_count available since early 2014! And indeed, a quick search for the package via the Red Hat package search showed a weird result: python-psutil was never part of base RHEL 7! It was only shipped as part of some very, very old OpenStack channels:

access.redhat.com package search, results for python-psutil

Newer OpenStack channels in fact come along with newer versions of python-psutil.

So how did this outdated package end up on this RHEL 7 image? Why was it never updated?

The cloud image is to blame! The package was installed on it – most likely during the creation of the image: python-psutil is needed for OpenStack Heat, so I assume that these RHEL 7 images where once created via OpenStack and then used as the default image in this demo environment.

And after the initial creation of the image the Heat packages were forgotten. In the meantime the image was updated to newer RHEL versions, snapshots were created as new defaults and so on. But since the package in question was never part of the main RHEL repos, it was never changed or removed. It just stayed there. Waiting, apparently, for me 😉

Conclusion

This issue showed me how tricky cloud images can be. Think about your own cloud images: have you really checked all all of them and verified that no package, no start up script, no configuration was changed from the Linux distribution vendor’s base setup?

With RPMs this is still manageable, you can track if packages are installed which are not present in the existing channels. But did someone install something with pip? Or any other way?

Take my case: an outdated version of a library was called instead of a much, much more recent one. If there would have been a serious security issue with the library in the meantime, I would have been exposed although my update management did not report any library to be updated.

I learned my lesson to be more critical with cloud images, checking them in more detail in the future to avoid having nasty surprises during production. And I can just recommend that you do that as well.

[Short Tip] Provide dictionaries as default in Ansible variables

Ansible Logo

Ansible uses the Jinja2 template engine to handle variables. This includes the default filter, which sets a default value if a referenced variable is not explicitly defined somewhere else.

With Ansible it might happen that instead of a skalar variable a key-value is needed, a dictionary. If you just paste the plain text in there, you might run into trouble:

fatal: [test.example.com]: FAILED! => {"changed": false, "msg": "argument env is of type and we were unable to convert to dict: dictionary requested, could not parse JSON or key=value"}

The key-value pair needs to be properly formatted:

"{{ my_variable|default({'key':'value'}) }}"

Thanks to @bcoca for his post about this.