[Howto] Using Ansible and Ansible Tower with shared Roles

Ansible Logo

Roles are a neat way in Ansible to make playbooks and everything related to them re-usable. If used with Tower, they can be even more powerful.

(I published this post originally at ansible.com/blog .)

Roles are an essential part of Ansible, and help in structuring your automation content. The idea is to have clearly defined roles for dedicated tasks. During your automation code, the roles will be called by the Ansible Playbooks.

Since roles usually have a well defined purpose, they make it easy to reuse your code for yourself, but also in your team. And you can even share roles with the global community. In fact, the Ansible community created Ansible Galaxy as a central place to display, search and view Ansible roles from thousands of people.

So what does a role look like? Basically it is a predefined structure of folders and files to hold your automation code. There is a folder for your templates, a folder to keep files with tasks, one for handlers, another one for your default variables, and so on:

tasks/ 
handlers/ 
files/ 
templates/ 
vars/ 
defaults/ 
meta/

In folders which contain Ansible code – like tasks, handlers, vars, defaults – there are main.yml files. Those contain the relevant Ansible bits. In case of the tasks directory, they often include other yaml files within the same directory. Roles even provide ways to test your automation code – in an automated fashion, of course.

This post will show how roles can be shared with others, be used in your projects and how this works with Red Hat Ansible Tower.

Share Roles via Repositories

Roles can be part of your project repository. They usually sit underneath a dedicated roles/ directory. But keeping roles in your own repository makes it hard to share them with others, to be reused and improved by them. If someone works on a different team, or on a different project, they might not have access to your repository – or they may use their own anyway. So even if you send them a copy of your role, they could add it to their own repository, making it hard to exchange improvements, bug fixes and changes across totally different repositories.

For that reason, a better way is to keep a role in its own repository. That way it can be easily shared and improved. However, to be available to a playbook, the role still needs to be included. Technically there are multiple ways to do that.

For example there can be a global roles directory outside your project where all roles are kept. This can be referenced in ansible.cfg. However, this requires that all developer setups and also the environment in which the automation is finally executed have the same global directory structure. This is not very practical.

When Git is used as the version control system, there is also the possibility of importing roles from other repositories via Git submodules, or even using Git subtrees. However, this requires quite some knowledge about advanced Git features by each and everyone using it – so it is far from simple.

The best way to make shared roles available to your playbooks is to use a function built into Ansible itself: by using the command ansible-galaxy , ansible galaxy can read a file specifying which external roles need to be imported for a successful Ansible run: requirements.yml. It lists external roles and their sources. If needed, it can also point to a specific version:

# from GitHub
- src: https://github.com/bennojoy/nginx 
# from GitHub, overriding the name and specifying a tag 
- src: https://github.com/bennojoy/nginx 
  version: master 
  name: nginx_role 
# from Bitbucket 
- src: git+http://bitbucket.org/willthames/git-ansible-galaxy 
  version: v1.4 # from galaxy 
- src: yatesr.timezone

The file can be used via the command ansible-galaxy. It reads the file and downloads all specified roles to the appropriate path:

ansible-galaxy install -r roles/requirements.yml 
- extracting nginx to /home/rwolters/ansible/roles/nginx 
- nginx was installed successfully 
- extracting nginx_role to 
/home/rwolters/ansible/roles/nginx_role 
- nginx_role (master) was installed successfully 
...

The output also highlights when a specific version was downloaded. You will find a copy of each role in your roles/directory – so make sure that you do not accidentally add the downloaded roles to your repository! The best option is to add them to the .gitignore file.

This way, roles can be imported into the project and are available to all playbooks while they are still shared via a central repository. Changes to the role need to be made in the dedicated repository – which ensures that no light-minded and project specific changes are done in the role.

At the same time the version attribute in requirements.ymlensures that the used role can be pinned to a certain release tag value, commit hash, or branch name. This is useful in case the development of a role is quickly moving forward, but your project has longer development cycles.

Using Roles in Ansible Tower

If you use automation on larger, enterprise scales you most likely will start using Ansible Tower sooner or later. So how do roles work with Ansible Tower? In fact – just like mentioned above. Each time Ansible Tower checks out a project it looks for a roles/requirements.yml. If such a file is present, a new version of each listed role is copied to the local checkout of the project and thus available to the relevant playbooks.

That way shared roles can easily be reused in Ansible Tower – it is built in right from the start!

Best Practices and Things to Keep in Mind

There are a few best practices around sharing of Ansible roles that make your life easier. The first is the naming and location of the roles directory. While it is possible to name the directory any way via the roles_path in ansible.cfg, we strongly recommend to stick to the directory name roles, sitting in the root of your project directory. Do not choose another name for it or move it to some subdirectory.

The same is true for requirements.yml: have one requirements.yml only, and keep it at roles/requirements.yml. While it is technically possible to have multiple files and spread them across your project, this will not work when the project is imported into Ansible Tower.

Also, if the roles are not only shared among multiple users, but are also developed with others or not by you at all, it might make sense to pin the role to the actual commit you’ve tested your setup against. That way you will avoid unwanted changes in the role behaviour.

More Information

Find, reuse, and share the best Ansible content on Ansible Galaxy.

Learn more about roles on Ansible Docs.

Advertisements

Ansible and Ansible Tower special variables

Ansible Logo

Ansible and Ansible Tower provide a powerful variable system. At the same time, there are some variables reserved to one or the other, which cannot be used by others, but can be helpful. This post lists all reserved and magic variables and also important keywords.

Ansible Variables

Variables in Ansible are a powerful tool to influence and control your automation execution. In fact, I’ve dedicated a fare share of posts to the topic over the years:

The official documentation of Ansible variables is also quite comprehensive.

The variable system is in fact so powerful that Ansible uses it itself. There are certain variables which are reserved, the so called magic variables.

The given documentation lists many of them – but is missing the Tower ones. For that reason this post list all magic variables in Ansible and Ansible Tower with references to more information.

Note that the variables and keywords might be different for different Ansible versions. The lists provided here are for Ansible 2.8 which is the current release and als shipped in Fedora – and Tower 3.4/3.5.

Reserved & Magic Variables

Magic Variables

The following list shows true magic variables. They are reserved internally and are overwritten by Ansible if needed. A “(D)” highlights that the variable is deprecated.

ansible_check_mode
ansible_dependent_role_names
ansible_diff_mode
ansible_forks
ansible_inventory_sources
ansible_limit
ansible_loop
ansible_loop_var
ansible_play_batch (D)
ansible_play_hosts (D)
ansible_play_hosts_all
ansible_play_role_names
ansible_playbook_python
ansible_role_names
ansible_run_tags
ansible_search_path
ansible_skip_tags
ansible_verbosity
ansible_version
group_names
groups
hostvars
inventory_hostname
inventory_hostname_short
inventory_dir
inventory_file
omit
play_hosts (D)
ansible_play_name
playbook_dir
role_name
role_names
role_path

Source: docs.ansible.com/ansible/latest/reference_appendices/special_variables.html

Facts

Facts are not magic variables because they are not internal. But they are collected during facts gathering or execution of the setup module, so it helps to keep them in mind. There are two “main” variables related to facts, and a lot of other variables depending on what the managed node has to offer. Since those are different from system to system, it is tricky to list them all. But they can be easily identified by the leading ansible_.

ansible_facts
ansible_local
ansible_*

Sources: docs.ansible.com/ansible/latest/reference_appendices/special_variables.html & docs.ansible.com/ansible/latest/user_guide/playbooks_variables.html

Connection Variables

Connection variables control the way Ansible connects to target machines: what connection plugin to use, etc.

ansible_become_user
ansible_connection
ansible_host
ansible_python_interpreter
ansible_user

Soource: docs.ansible.com/ansible/latest/reference_appendices/special_variables.html

Ansible Tower

Tower has its own set of magic variables which are used internally to control the execution of the automation. Note that those variables can optionally start with awx_ instead of tower_.

tower_job_id
tower_job_launch_type
tower_job_template_id
tower_job_template_name
tower_user_id
tower_user_name
tower_schedule_id
tower_schedule_name
tower_workflow_job_id
tower_workflow_job_name

Source: docs.ansible.com/ansible-tower/latest/html/userguide/job_templates.html

Keywords

Keywords are strictly speaking not variables. In fact, you can even set a variable named as a key word. Instead, they are the parts of a playbook that make a playbook work: think of the keys hosts, tasks, name or even the parameters of a module.

It is just important to keep those keywords in mind – and it certainly helps when you name your variables in a way that they are not mixed up with keywords by chance.

The following lists shows all keywords by where they can appear. Note that some keywords are listed multiple times because they can be used at different places.

Play

any_errors_fatal
become
become_flags
become_method
become_user
check_mode
collections
connection
debugger
diff
environment
fact_path
force_handlers
gather_facts
gather_subset
gather_timeout
handlers
hosts
ignore_errors
ignore_unreachable
max_fail_percentage
module_defaults
name
no_log
order
port
post_tasks
pre_tasks
remote_user
roles
run_once
serial
strategy
tags
tasks
vars
vars_files
vars_prompt

Source: docs.ansible.com/ansible/latest/reference_appendices/playbooks_keywords.html

Role

any_errors_fatal
become
become_flags
become_method
become_user
check_mode
collections
connection
debugger
delegate_facts
delegate_to
diff
environment
ignore_errors
ignore_unreachable
module_defaults
name
no_log
port
remote_user
run_once
tags
vars
when

Source: docs.ansible.com/ansible/latest/reference_appendices/playbooks_keywords.html

Block

always
any_errors_fatal
become
become_flags
become_method
become_user
block
check_mode
collections
connection
debugger
delegate_facts
delegate_to
diff
environment
ignore_errors
ignore_unreachable
module_defaults
name
no_log
port
remote_user
rescue
run_once
tags
vars
when

Source: docs.ansible.com/ansible/latest/reference_appendices/playbooks_keywords.html

Task

action
any_errors_fatal
args
async
become
become_flags
become_method
become_user
changed_when
check_mode
collections
connection
debugger
delay
delegate_facts
delegate_to
diff
environment
failed_when
ignore_errors
ignore_unreachable
local_action
loop
loop_control
module_defaults
name
no_log
notify
poll
port
register
remote_user
retries
run_once
tags
until
vars
when
with_<lookup_plugin>

Source: docs.ansible.com/ansible/latest/reference_appendices/playbooks_keywords.html

[Howto] Using include directive with ssh client configuration

A SSH client configuration makes accessing servers much easier and more convenient. Until recently the configuration was done in one single file which could be problematic. But newer versions support includes to read configuration from multiple places.

SSH is the default way to access servers remotely – Linux and other UNIX systems, and since recently Windows as well.

One feature of the OpenSSH client is to configure often used parameters for SSH connections in a central config file, ~/.ssh/config. This comes in especially handy when multiple remote servers require different parameters: varying ports, other user names, different SSH keys, and so on. It also provides the possibility to define aliases for host names to avoid the necessity to type in the FQDN each time. Since such a configuration is directly read by the SSH client other tools wich are using the SSH client in the background – like Ansible – can benefit from the configuration as well.

A typical configuration of such a config file can look like this:

Host webapp
    HostName webapp.example.com
    User mycorporatelogin
    IdentityFile ~/.ssh/id_corporate
    IdentitiesOnly yes
Host github
    HostName github.com
    User mygithublogin
    IdentityFile ~/.ssh/id_ed25519
Host gitlab
    HostName gitlab.corporate.com
    User mycorporatelogin
    IdentityFile ~/.ssh/id_corporate
Host myserver
    HostName myserver.example.net
    Port 1234
    User myuser
    IdentityFile ~/.ssh/id_ed25519
Host aws-dev
    HostName 12.24.33.66
    User ec2-user
    IdentityFile ~/.ssh/aws.pem
    IdentitiesOnly yes
    StrictHostKeyChecking no
Host azure-prod
    HostName 4.81.234.19
    User azure-prod
    IdentityFile ~/.ssh/azure_ed25519

While this is very handy and helps a lot to maintain sanity even with very different and strange SSH configurations, a single huge file is hard to manage.

Cloud environments for example change constantly, so it makes sense to update/rebuild the configuration regularly. There are scripts out there, but they either just overwrite existing configuration, or do entirely work on an extra file which is referenced in each SSH client call with ssh -F .ssh/aws-config, or they require to mark sections in the .ssh/config like "### AZURE-SSH-CONFIG BEGIN ###". All attempts are either clumsy or error prone.

Another use case is where parts of the SSH configuration is managed by configuration management systems or by software packages for example by a company – again that requires changes to a single file and might alter or remove existing configuration for your others services and servers. After all, it is not uncommon to use your more-or-less private Github account for your company work so that you have mixed entries in your .ssh/config.

The underneath problem of managing more complex software configurations in single files is not unique to OpenSSH, but more or less common across many software stacks which are configured in text files. Recently it became more and more common to write software in a way that configuration is not read as a single file, but that all files from a certain directory are read in. Examples for this include:

  • sudo with the directory /etc/sudoers.d/
  • Apache with /etc/httpd/conf.d
  • Nginx with /etc/nginx/conf.d/
  • Systemd with /etc/systemd/system.conf.d/*
  • and so on…

Initially such an approach was not possible with the SSH client configuration in OpenSSH but there was a bug reported even including a patch quite some years ago. Luckily, almost three years ago OpenSSH version 7.3 was released and that version did come with the solution:

* ssh(1): Add an Include directive for ssh_config(5) files.


https://www.openssh.com/txt/release-7.3

So now it is possible to add one or even multiple files and directories from where additional configuration can be loaded.

Include

Include the specified configuration file(s). Multiple pathnames may be specified and each pathname may contain glob(7) wildcards and, for user configurations, shell-like `~’ references to user home directories. Files without absolute paths are assumed to be in ~/.ssh if included in a user configuration file or /etc/ssh if included from the system configuration file. Include directive may appear inside a Match or Host block to perform conditional inclusion.

https://www.freebsd.org/cgi/man.cgi?ssh_config(5)

The following .ssh/config file defines a sub-directory from where additional configuration can be read in:

$ cat ~/.ssh/config 
Include ~/.ssh/conf.d/*

Underneath ~/.ssh/conf.d there can be additional files, each containing one or more host definitions:

$ ls ~/.ssh/conf.d/
corporate.conf
github.conf
myserver.conf
aws.conf
azure.conf
$ cat ~/.ssh/conf.d/aws.conf
Host aws-dev
    HostName 12.24.33.66
    User ec2-user
    IdentityFile ~/.ssh/aws.pem
    IdentitiesOnly yes
    StrictHostKeyChecking no

This feature made managing SSH configuration for me much easier, and I only have few use cases and mainly require it to keep a simple overview over things. For more flexible (aka cloud based) setups this is crucial and can make things way easier.

Note that the additional config files should only contain host definitions! General SSH configuration should be inside ~/.ssh/config and should be before the include directive: any configuration provided after a “Host” keyword is interpreted as part of that exact host definition – until the next host block or until the next “Match” keyword.

[Howto] ara – making Ansible runs easier to read and understand

Ara is a simple web server showing detailed information about Ansible runs. It is helpful in understanding and troubleshooting Ansible runs.

Background

Ansible runs, especially on the command line, do only provide limited information. Details about used variables, the timing of each task or other information are only available using additional plugins, but the details provided by them are usually narrowed to a use case.

A better way to provide information about Ansible runs is to collect the data and provide them in a web framework. That is what Ansible Tower (or AWX, the upstream project to Tower) does for example: collecting detailed data and providing them in the jobs overview.

But there are situations where a fully fledged Tower is too much, or where a comparing overview of the various runs is needed. This is where ara comes in:

ARA Records Ansible playbook runs and makes the recorded data available and intuitive for users and systems.
It makes your Ansible playbooks easier to understand and troubleshoot.

https://ara.recordsansible.org/

ara was originally developed by people of the OpenStack community, and still today has strong ties with it. It does not replace Ansible Tower at all, since it does not manage the execution at all. It complements the information and overview part, and in a way more competes with the logging solutions which can be connected to Ansible Tower.

How to install

The installation of ara is pretty straight forward and described in the documentation: the software is basically installed via pip, afterwards the server can be started as a local running instance. The connection between Ansible and ara is done via action and callback plugins.

The installation of the ara package is quickly done. Note that on systems with both Python 2 and 3 you need to pick the right pip version:

$ pip3 install --user ara
...
$ python3 -m ara.setup.action_plugins                                                                                                   /home/liquidat/.local/lib/python3.7/site-packages/ara/plugins/actions
$ python3  -m ara.setup.callback_plugins                                                                                                /home/liquidat/.local/lib/python3.7/site-packages/ara/plugins/callbacks

Notice that the binaries end up in ~/.local/bin. If that is not part of the $PATH variable, the server executable to start ara needs to be addressed directly, like ~/.local/bin/ara-manage runserver:

$ ~/.local/bin/ara-manage runserver                                                                                                      * Serving Flask app "ara" (lazy loading)
 * Environment: production
   WARNING: Do not use the development server in a production environment.
   Use a production WSGI server instead.
 * Debug mode: off
2019-05-06 02:45:49,156 INFO werkzeug:  * Running on http://127.0.0.1:9191/ (Press CTRL+C to quit)
2019-05-06 02:45:55,915 INFO werkzeug: 127.0.0.1 - - [06/May/2019 02:45:55] "GET / HTTP/1.1" 302 -

The web page can be accessed by pointing a web browser towards http://127.0.0.1:9191/. Since Ansible is not connected yet to ara no data are shown:

As mentioned, to connect ara to Ansible a callback plugin is used. There are different ways available to tell Ansible to use a callback plugin, the easiest is to set up a ansible.cfg with the appropriate data:

$ python3 -m ara.setup.ansible | tee -a ansible.cfg                                                                                        
[defaults]
callback_plugins=/home/rwolters/.local/lib/python3.7/site-packages/ara/plugins/callbacks
action_plugins=/home/rwolters/.local/lib/python3.7/site-packages/ara/plugins/actions

Note here that this creates a new section named [defaults]. Check if your ansible.cfg already has a section called [defaults] and if so merge the entries manually. Now call a few playbooks and check the results:

ara provides easy access to all existing runs, making it possible to easily compare different runs with each other. At the same time detailed information are provided for individual runs, making it easy to figure out what actually happened.

Summary

ara is an interesting attempt at better displaying the information from Ansible runs. It helps analyzing what is happening in each run, where problems might be hidden and so on.

If you use Ansible Tower already the information are available to you anyway. If you like the way how it is presented in ara you can even use both at the same time.

[Howto] Adding SSH keys to Ansible Tower via tower-cli [Update]

Ansible Logo

The tool tower-cli is often used to pre-configure Ansible Tower in a scripted way. It provides a convenient way to boot-strap a Tower configuration. But adding SSH keys as machine credentials is far from easy.

Boot-strapping Ansible Tower can become necessary for testing and QA environments where the same setup is created and destroyed multiple times. Other use cases are when multiple Tower installations need to be configured in the same way or share at least a larger part of the configuration.

One of the necessary tasks in such setups is to create machine credentials in Ansible Tower so that Ansible is able to connect properly to a target machine. In a Linux environment, this is often done via SSH keys.

However, tower-cli calls the Tower API in the background – and JSON POST data need to be in one line. But SSH keys come in multiple lines, so providing the file via a $(cat ssh_file) does not work:

tower-cli credential create --name "Example Credentials" \
                     --organization "Default" --credential-type "Machine" \
                     --inputs="{\"username\":\"ansible\",\"ssh_key_data\":\"$(cat .ssh/id_rsa)\",\"become_method\":\"sudo\"}"

Multiple workarounds can be found on the net, like manually editing the file to remove the new lines or creating a dedicated variables file containing the SSH key. There is even a bug report discussing that.

But for my use case I needed to read an existing SSH file directly, and did not want to add another manual step or create an additional variables file. The trick is a rather complex piece of SED:

$(sed -E ':a;N;$!ba;s/\r{0,1}\n/\\n/g' /home/ansible/.ssh/id_rsa)

This basically reads in the entire file (instead of just line by line), removes the new lines and replaces them with \n. To be precise:

  • we first create a label "a"
  • append the next line to the pattern space ("N")
  • find out if this is the last line or not ("$!"), and if not
  • branch back to label a ("ba")
  • after that, we search for the new lines ("\r{0,1}")
  • and replace them with the string for a new line, "\n"

Note that this needs to be accompanied with proper line endings and quotation marks. The full call of tower-cli with the sed command inside is:

tower-cli credential create --name "Example Credentials" \
                     --organization "Default" --credential-type "Machine" \
                     --inputs="{\"username\":\"ansible\",\"ssh_key_data\":\"$(sed -E ':a;N;$!ba;s/\r{0,1}\n/\\n/g' /home/ansible/.ssh/id_rsa)\n\",\"become_method\":\"sudo\"}"

Note all the escaped quotations marks.

Update

Another way to add the keys is to provide yaml in the shell command:

tower-cli credential create --name "Example Credentials" \
                     --organization "Default" --credential-type "Machine" \
                     --inputs='username: ansible
become_method: sudo
ssh_key_data: |
'"$(sed 's/^/    /' /home/ansible/.ssh/id_rsa)"

This method is appealing since the corresponding sed call is a little bit easier to understand. But make sure to indent the variables exactly like shown above.

Thanks to the @ericzolf of the Red Hat Automation Community of Practice hinting me to that solution. If you are interested in the Red Hat Communities of Practice, you can read more about them in the blog “Communities of practice: Straight from the open source”.

Of debugging Ansible Tower and underlying cloud images

Ansible Logo

Recently I was experimenting with Tower’s isolated nodes feature – but somehow it did not work in my environment. Debugging told me a lot about Ansible Tower – and also why you should not trust arbitrary cloud images.

Background – Isolated Nodes

Ansible Tower has a nice feature called “isolated nodes”. Those are dedicated Tower instances which can manage nodes in separated environments – basically an Ansible Tower Proxy.

An Isolated Node is an Ansible Tower node that contains a small piece of software for running playbooks locally to manage a set of infrastructure. It can be deployed behind a firewall/VPC or in a remote datacenter, with only SSH access available. When a job is run that targets things managed by the isolated node, the job and its environment will be pushed to the isolated node over SSH, where it will run as normal.

Ansible Tower Feature Spotlight: Instance Groups and Isolated Nodes

Isolated nodes are especially handy when you setup your automation in security sensitive environments. Think of DMZs here, of network separation and so on.

I was fooling around with a clustered Tower installation on RHEL 7 VMs in a cloud environment when I run into trouble though.

My problem – Isolated node unavailable

Isolated nodes – like instance groups – have a status inside Tower: if things are problematic, they are marked as unavailable. And this is what happened with my instance isonode.remote.example.com running in my lab environment:

Ansible Tower showing an instance node as unavailable

I tried to turn it “off” and “on” again with the button in the control interface. It made the node available, it was even able to executed jobs – but it became quickly unavailable soon after.

Analysis

So what happened? The Tower logs showed a Python error:

# tail -f /var/log/tower/tower.log
fatal: [isonode.remote.example.com]: FAILED! => {"changed": false,
"module_stderr": "Shared connection to isonode.remote.example.com
closed.\r\n", "module_stdout": "Traceback (most recent call last):\r\n
File \"/var/lib/awx/.ansible/tmp/ansible-tmp-1552400585.04
-60203645751230/AnsiballZ_awx_capacity.py\", line 113, in <module>\r\n
_ansiballz_main()\r\n  File \"/var/lib/awx/.ansible/tmp/ansible-tmp
-1552400585.04-60203645751230/AnsiballZ_awx_capacity.py\", line 105, in
_ansiballz_main\r\n    invoke_module(zipped_mod, temp_path,
ANSIBALLZ_PARAMS)\r\n  File \"/var/lib/awx/.ansible/tmp/ansible-tmp
-1552400585.04-60203645751230/AnsiballZ_awx_capacity.py\", line 48, in
invoke_module\r\n    imp.load_module('__main__', mod, module, MOD_DESC)\r\n
File \"/tmp/ansible_awx_capacity_payload_6p5kHp/__main__.py\", line 74, in
<module>\r\n  File \"/tmp/ansible_awx_capacity_payload_6p5kHp/__main__.py\",
line 60, in main\r\n  File
\"/tmp/ansible_awx_capacity_payload_6p5kHp/__main__.py\", line 27, in
get_cpu_capacity\r\nAttributeError: 'module' object has no attribute
'cpu_count'\r\n", "msg": "MODULE FAILURE\nSee stdout/stderr for the exact
error", "rc": 1}

PLAY RECAP *********************************************************************
isonode.remote.example.com : ok=0    changed=0    unreachable=0    failed=1  

Apparently a Python function was missing. If we check the code we see that indeed in line 27 of file awx_capacity.py the function psutil.cpu_count() is called:

def get_cpu_capacity():
    env_forkcpu = os.getenv('SYSTEM_TASK_FORKS_CPU', None)
    cpu = psutil.cpu_count()

Support for this function was added in version 2.0 of psutil:

2014-03-10
Enhancements
424: [Windows] installer for Python 3.X 64 bit.
427: number of logical and physical CPUs (psutil.cpu_count()).

psutil history

Note the date here: 2014-03-10 – pretty old! I check the version of the installed package, and indeed the version was pre-2.0:

$ rpm -q --queryformat '%{VERSION}\n' python-psutil
1.2.1

To be really sure and also to ensure that there was no weird function backporting, I checked the function call directly on the Tower machine:

# python
Python 2.7.5 (default, Sep 12 2018, 05:31:16) 
[GCC 4.8.5 20150623 (Red Hat 4.8.5-36)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import inspect
>>> import psutil as module
>>> functions = inspect.getmembers(module, inspect.isfunction)
>>> functions
[('_assert_pid_not_reused', <function _assert_pid_not_reused at
0x7f9eb10a8d70>), ('_deprecated', <function deprecated at 0x7f9eb38ec320>),
('_wraps', <function wraps at 0x7f9eb414f848>), ('avail_phymem', <function
avail_phymem at 0x7f9eb0c32ed8>), ('avail_virtmem', <function avail_virtmem at
0x7f9eb0c36398>), ('cached_phymem', <function cached_phymem at
0x7f9eb10a86e0>), ('cpu_percent', <function cpu_percent at 0x7f9eb0c32320>),
('cpu_times', <function cpu_times at 0x7f9eb0c322a8>), ('cpu_times_percent',
<function cpu_times_percent at 0x7f9eb0c326e0>), ('disk_io_counters',
<function disk_io_counters at 0x7f9eb0c32938>), ('disk_partitions', <function
disk_partitions at 0x7f9eb0c328c0>), ('disk_usage', <function disk_usage at
0x7f9eb0c32848>), ('get_boot_time', <function get_boot_time at
0x7f9eb0c32a28>), ('get_pid_list', <function get_pid_list at 0x7f9eb0c4b410>),
('get_process_list', <function get_process_list at 0x7f9eb0c32c08>),
('get_users', <function get_users at 0x7f9eb0c32aa0>), ('namedtuple',
<function namedtuple at 0x7f9ebc84df50>), ('net_io_counters', <function
net_io_counters at 0x7f9eb0c329b0>), ('network_io_counters', <function
network_io_counters at 0x7f9eb0c36500>), ('phymem_buffers', <function
phymem_buffers at 0x7f9eb10a8848>), ('phymem_usage', <function phymem_usage at
0x7f9eb0c32cf8>), ('pid_exists', <function pid_exists at 0x7f9eb0c32140>),
('process_iter', <function process_iter at 0x7f9eb0c321b8>), ('swap_memory',
<function swap_memory at 0x7f9eb0c327d0>), ('test', <function test at
0x7f9eb0c32b18>), ('total_virtmem', <function total_virtmem at
0x7f9eb0c361b8>), ('used_phymem', <function used_phymem at 0x7f9eb0c36050>),
('used_virtmem', <function used_virtmem at 0x7f9eb0c362a8>), ('virtmem_usage',
<function virtmem_usage at 0x7f9eb0c32de8>), ('virtual_memory', <function
virtual_memory at 0x7f9eb0c32758>), ('wait_procs', <function wait_procs at
0x7f9eb0c32230>)]

Searching for a package origin

So how to solve this issue? My first idea was to get this working by updating the entire code part to the multiprocessor lib:

# python
Python 2.7.5 (default, Sep 12 2018, 05:31:16) 
[GCC 4.8.5 20150623 (Red Hat 4.8.5-36)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import multiprocessing
>>> cpu = multiprocessing.cpu_count()
>>> cpu
4

But while I was filling a bug report I wondered why RHEL shipped such an ancient library. After all, RHEL 7 was released in June 2014, and psutil had cpu_count available since early 2014! And indeed, a quick search for the package via the Red Hat package search showed a weird result: python-psutil was never part of base RHEL 7! It was only shipped as part of some very, very old OpenStack channels:

access.redhat.com package search, results for python-psutil

Newer OpenStack channels in fact come along with newer versions of python-psutil.

So how did this outdated package end up on this RHEL 7 image? Why was it never updated?

The cloud image is to blame! The package was installed on it – most likely during the creation of the image: python-psutil is needed for OpenStack Heat, so I assume that these RHEL 7 images where once created via OpenStack and then used as the default image in this demo environment.

And after the initial creation of the image the Heat packages were forgotten. In the meantime the image was updated to newer RHEL versions, snapshots were created as new defaults and so on. But since the package in question was never part of the main RHEL repos, it was never changed or removed. It just stayed there. Waiting, apparently, for me 😉

Conclusion

This issue showed me how tricky cloud images can be. Think about your own cloud images: have you really checked all all of them and verified that no package, no start up script, no configuration was changed from the Linux distribution vendor’s base setup?

With RPMs this is still manageable, you can track if packages are installed which are not present in the existing channels. But did someone install something with pip? Or any other way?

Take my case: an outdated version of a library was called instead of a much, much more recent one. If there would have been a serious security issue with the library in the meantime, I would have been exposed although my update management did not report any library to be updated.

I learned my lesson to be more critical with cloud images, checking them in more detail in the future to avoid having nasty surprises during production. And I can just recommend that you do that as well.

[Howto] Launch traefik as a docker container in a secure way

Traefik is a great reverse proxy solution, and a perfect tool to direct traffic in container environments. However, to do that, it needs access to docker – and that is very dangerous and must be tightly secured!

The problem: access to the docker socket

Containers offer countless opportunities to improve the deployment and management of services. However, having multiple containers on one system, often re-deploying them on the fly, requires a dynamic way of routing traffic to them. Additionally, there might be reasons to have a front end reverse proxy to sort the traffic properly anyway.

In comes traefik – “the cloud native edge router”. Among many supported backends it knows how to listen to docker and create dynamic routes on the fly when new containers come up.

To do so traefik needs access to the docker socket. Many people decide to just provide that as a volume to traefik. This usually does not work because SELinux prevents it for a reason. The apparent workaround for many is to run traefik in a privileged container. But that is a really bad idea:

Docker currently does not have any Authorization controls. If you can talk to the docker socket or if docker is listening on a network port and you can talk to it, you are allowed to execute all docker commands. […]
At which point you, or any user that has these permissions, have total control on your system.

http://www.projectatomic.io/blog/2014/09/granting-rights-to-users-to-use-docker-in-fedora/

The solution: a docker socket proxy

But there are ways to securely provide traefik the access it needs – without exposing too much permissions. One way is to provide limited access to the docker socket via tcp via another container which cannot be reached from the outside that easily.

Meet Tecnativa’s docker-socket-proxy:

What?
This is a security-enhaced proxy for the Docker Socket.
Why?
Giving access to your Docker socket could mean giving root access to your host, or even to your whole swarm, but some services require hooking into that socket to react to events, etc. Using this proxy lets you block anything you consider those services should not do.

https://github.com/Tecnativa/docker-socket-proxy/blob/master/README.md

It is a container which connects to the docker socket and exports the API features in a secured and configurable way via TCP. At container startup it is configured with booleans to which API sections access is granted.

So basically you set up a docker proxy to support your proxy for docker containers. Well…

How to use it

The docker socket proxy is a container itself. Thus it needs to be launched as a privileged container with access to the docker socket. Also, it must not publish any ports to the outside. Instead it should run on a dedicated docker network shared with the traefik container. The Ansible code to launch the container that way is for example:

- name: ensure privileged docker socket container
  docker_container:
    name: dockersocket4traefik
    image: tecnativa/docker-socket-proxy
    log_driver: journald
    env:
      CONTAINERS: 1
    state: started
    privileged: yes
    exposed_ports:
      - 2375
    volumes:
      - "/var/run/docker.sock:/var/run/docker.sock:z"
    networks:
      - name: dockersocket4traefik_nw

Note the env right in the middle: that is where the exported permissions are configured. CONTAINERS: 1  provides access to container relevant information. There are also SERVICES: 1 and SWARM: 1 to manage access to docker services and swarm.

Traefik needs to have access to the same network. Also, the traefik configuration needs to point to the docker container via tcp:

[docker]
endpoint = "tcp://dockersocket4traefik:2375"

Conclusion

This setup works surprisingly easy. And it allows traefik to access the docker socket for the things it needs without exposing critical permissions to take over the system. At the same time, full access to the docker socket is restricted to a non-public container, which makes it harder for attackers to exploit it.

If you have a simple container setup and use Ansible to start and stop the containers, I’ve written a role to get the above mentioned setup running.