[Short Tip] Flatten nested dict/list structures in Ansible with json_query

A few days ago I was asked how to best deal with structures in Ansible which are mixing dictionaries and lists. json_query can help here!

Basically, the following example was provided, and the question remained how to deal with it, for example how to flatten it:

    myhash:
      cloud1:
        region1:
          - name: "city1"
          - size: "large"
          - param: "alpha"
        region2:
          - name: "city2"
          - size: "small"
          - param: "beta"
      cloud2:
        region1:
          - name: "city1"
          - size: "large"
          - param: "gamma"

I wondered for quite a while how to deal with this. After all, dict2items only handles dicts and fails when it reaches the lists in there. I also fooled around with the map filter, but most of my attempts required prior knowledge of the data structure: they only worked when I explicitly provided “cloud1.region1” or similar.

The solution was the json_query filter: it is based on JMESPath and can deal with the structure shown above by using list and object projections:

  tasks:
  - name: Projections using json_query
    debug:
      msg: "Item value is: {{ item }}"
    loop: "{{ myhash|json_query(projection_query)|list }}"
    vars:
      projection_query: "*.*[]"

And indeed, the loop does create a simplified output of all the elements in this nested structure:

TASK [Projections using json_query] **********************************************************
ok: [localhost] => (item=[{'name': 'city1'}, {'size': 'large'}, {'param': 'alpha'}]) => {
    "msg": "Item value is: [{'name': 'city1'}, {'size': 'large'}, {'param': 'alpha'}]"
}
ok: [localhost] => (item=[{'name': 'city2'}, {'size': 'small'}, {'param': 'beta'}]) => {
    "msg": "Item value is: [{'name': 'city2'}, {'size': 'small'}, {'param': 'beta'}]"
}
ok: [localhost] => (item=[{'name': 'city1'}, {'size': 'large'}, {'param': 'gamma'}]) => {
    "msg": "Item value is: [{'name': 'city1'}, {'size': 'large'}, {'param': 'gamma'}]"
}

Of course, some knowledge is still needed to make this work: you need to know if you are projecting on a list or on a dictionary. So if your data structure changes on that level between executions, you might need something else.
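To make the projection a bit less magical, here is a rough plain-Python equivalent of what the query "*.*[]" does with this particular structure. It is only a sketch for illustration, not how JMESPath is implemented, and the function name project_values_twice is made up for this example:

```python
# Sample data copied from the example above.
myhash = {
    "cloud1": {
        "region1": [{"name": "city1"}, {"size": "large"}, {"param": "alpha"}],
        "region2": [{"name": "city2"}, {"size": "small"}, {"param": "beta"}],
    },
    "cloud2": {
        "region1": [{"name": "city1"}, {"size": "large"}, {"param": "gamma"}],
    },
}


def project_values_twice(data):
    """Mimic the JMESPath expression "*.*[]" for a dict of dicts.

    "*"  -> take the values of the outer dict (the clouds)
    ".*" -> for each cloud, take the values of the inner dict (the regions)
    "[]" -> flatten the resulting list of lists by one level
    """
    flattened = []
    for cloud in data.values():
        if not isinstance(cloud, dict):
            # An object projection skips anything that is not a dict.
            continue
        flattened.extend(cloud.values())
    return flattened


regions = project_values_twice(myhash)
for item in regions:
    print("Item value is:", item)
```

The loop prints one line per region, and each item is still the list of single-key dicts, just like in the Ansible output above.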

Image by Andrew Martin from Pixabay

[Howto] Using the new Podman API

Podman is a daemonless container engine to develop, run and manage OCI containers. In a recent version the API was rewritten and now offers a REST interface as well as a docker compatible endpoint.

In case you have never heard of Podman before, it is certainly worth a look. Besides offering a more secure drop-in replacement for many docker functions, it can also manage pods and thus provides a container experience more aligned with what Kubernetes uses. It can even understand Kubernetes yaml (see podman-play-kube), easing the transition from single host container development over to fully fledged container management environments. Last but not least, it is among the tools supporting the newest features in the container space, like cgroups v2.

Background: Podman API

Of course Podman is not perfect – due to the focus on Kubernetes yaml there is no support for docker-compose files (though alternatives exist), networking and routing based on names is not as simple as on Docker (read more about Podman container networking) and last but not least, the API was different – making it hard to migrate solutions dependent on the docker API.

This changed: recently, a new API was merged:

The new API is a simpler implementation based on HTTP/REST. We provide two basic groups of endpoints. The first one is for libpod; the second is for Docker compatibility, to ease adoption. 

New API coming for Podman

So how can I access the new API and fool around with it?

If you are familiar with Podman, or read carefully, the first question is: where is this API running if Podman is daemonless? And in fact, an API service needs to be started explicitly:

$ podman system service --timeout 5000

This starts the API on a UNIX socket. Other options, like listening on a TCP socket or running without a timeout, are also possible; the documentation provides examples.

How to use the Docker API endpoint

Let’s use the Docker API endpoint. To talk to a UNIX socket based REST API a recent curl (version >= 7.40) is quite helpful:

$ curl --unix-socket /$XDG_RUNTIME_DIR/podman/podman.sock http://localhost/images/json
[{"Containers":1,"Created":1583300892,"Id":"8c2e0da7c436e45be5ebf2adf26b41d13939190bd186214a4d45c30485071f9f","Labels":{"license":"MIT","name":"fedora","vendor":"Fedora Project","version":"31"},"ParentId":...

Note that here we are talking to the rootless Podman API, thus the UNIX domain socket is in the user runtime directory. Also, localhost has to be provided in the URL for very recent curl versions, otherwise curl does not output anything!

The answer is a JSON listing, which is not easily readable. Simplify it with the help of Python (and silence curl info with the silent flag):

$ curl -s --unix-socket /$XDG_RUNTIME_DIR/podman/podman.sock http://localhost/containers/json|python -m json.tool
[
    {
        "Id": "4829e030ab1beb83db07dbc5e51481cb66562f57b79dd9eb3069dfcde91019ed",
        "Names": [
            "/87faf76aea6a-infra"
...
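If you prefer a script over curl, the same request can be sketched in Python with nothing but the standard library. The class and function names below are my own invention for this example, and the fallback runtime directory /run/user/1000 is just an assumption for the case that XDG_RUNTIME_DIR is not set:

```python
import http.client
import json
import os
import socket


class UnixSocketHTTPConnection(http.client.HTTPConnection):
    """HTTPConnection that talks to a UNIX domain socket instead of TCP."""

    def __init__(self, socket_path, timeout=5):
        # "localhost" only ends up in the Host header, like in the curl call.
        super().__init__("localhost", timeout=timeout)
        self.socket_path = socket_path

    def connect(self):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        sock.connect(self.socket_path)
        self.sock = sock


def podman_socket_path(runtime_dir):
    """Rootless socket location as used in the curl examples above."""
    return os.path.join(runtime_dir, "podman", "podman.sock")


def api_get(socket_path, url_path):
    """Send a GET request over the socket and decode the JSON answer."""
    conn = UnixSocketHTTPConnection(socket_path)
    try:
        conn.request("GET", url_path)
        response = conn.getresponse()
        return json.loads(response.read().decode("utf-8"))
    finally:
        conn.close()


if __name__ == "__main__":
    # Assumption: fall back to a typical runtime dir if the variable is unset.
    runtime_dir = os.environ.get("XDG_RUNTIME_DIR", "/run/user/1000")
    path = podman_socket_path(runtime_dir)
    try:
        # Same request as the curl call for /containers/json.
        print(json.dumps(api_get(path, "/containers/json"), indent=4))
    except OSError as exc:
        print("Could not reach the Podman API socket at", path, "-", exc)
```

The API service has to be running for this to return anything; otherwise the script only reports that the socket could not be reached.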

So what can you do with the API? Podman tries to recreate most of the docker API, so you can basically use the docker API documentation to see what should be possible. Note though that not all API endpoints are supported since Podman does not provide all functions Docker offers.

How to use the Podman API endpoint

As mentioned the API does provide two endpoints: the Docker endpoint, and a Podman specific endpoint. This second API is necessary for multiple reasons: first, Podman has functions which are alien to Docker and thus not part of the Docker API. The pod function is the most notable here. Another reason is that an independent API enables the Podman developers to further innovate in their own way and velocity, and to change the API when needed or wanted.

The API for Podman can be reached via curl as mentioned above. However, there are two notable differences: first, the Podman endpoint is marked via an additional “libpod” string in the API URI, and second, the Podman API is always versioned. To list the images as shown above, but via Podman’s own API, the following call is necessary:

$ curl -s --unix-socket /$XDG_RUNTIME_DIR/podman/podman.sock http://localhost/v1.24/libpod/images/json
[{"Id":"8c2e0da7c436e45be5ebf2adf26b41d13939190bd186214a4d45c30485071f9f","RepoTags":["registry.fedoraproject.org/fedora:latest"],"Created":1583300892,"Size":199632198,"Labels":{"license":"MIT","name":"fedora","vendor":"Fedora ...

For pods, the endpoint is for example /pods instead of /images:

$ curl -s --unix-socket /$XDG_RUNTIME_DIR/podman/podman.sock http://localhost/v1.24/libpod/pods/json|python -m json.tool
[
    {
        "Cgroup": "user.slice",
        "Containers": [
            {
                "Id": "1510dca23d2d15ae8be1eeadcdbfb660cbf818a69d5780705cd6535d97a4a578",
                "Names": "wonderful_ardinghelli",
                "Status": "running"
            },
            {
                "Id": "6c05c20a42e6987ac9f78b277a9d9152ab37dd05e3bfd5ec9e675979eb93bf0e",
                "Names": "eff81a37b4b8-infra",
                "Status": "running"
            }
        ],
        "Created": "2020-04-19T21:45:17.838549003+02:00",
        "Id": "eff81a37b4b85e92916613239001cddc2ba42f3595236586f7462492be0ac5fc",
        "InfraId": "6c05c20a42e6987ac9f78b277a9d9152ab37dd05e3bfd5ec9e675979eb93bf0e",
        "Name": "testme",
        "Namespace": "",
        "Status": "Running"
    }
]
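Once the answer is JSON, it is easy to boil it down to what you actually care about. The following sketch takes the pod listing from above (trimmed to the fields it needs) and prints one line per pod, leaving out the infra container. The function name summarize_pods is made up for this example:

```python
import json

# Trimmed copy of the /pods/json answer shown above.
pods_json = """
[
    {
        "Containers": [
            {
                "Id": "1510dca23d2d15ae8be1eeadcdbfb660cbf818a69d5780705cd6535d97a4a578",
                "Names": "wonderful_ardinghelli",
                "Status": "running"
            },
            {
                "Id": "6c05c20a42e6987ac9f78b277a9d9152ab37dd05e3bfd5ec9e675979eb93bf0e",
                "Names": "eff81a37b4b8-infra",
                "Status": "running"
            }
        ],
        "InfraId": "6c05c20a42e6987ac9f78b277a9d9152ab37dd05e3bfd5ec9e675979eb93bf0e",
        "Name": "testme",
        "Status": "Running"
    }
]
"""


def summarize_pods(pods):
    """Return one line per pod: name, status and its non-infra containers."""
    lines = []
    for pod in pods:
        infra_id = pod.get("InfraId")
        workload = [
            container["Names"]
            for container in pod.get("Containers", [])
            if container.get("Id") != infra_id
        ]
        lines.append(
            "{} ({}): {}".format(pod["Name"], pod["Status"], ", ".join(workload))
        )
    return lines


for line in summarize_pods(json.loads(pods_json)):
    print(line)
```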

Currently there is no documentation of the API available, or at least none at the level of the current Docker API documentation. But hopefully that will change soon.

Takeaways

Podman providing a Docker API is a great step for people who depend on the Docker API but nevertheless want to switch to Podman. Providing a unique but simple to consume REST API for Podman itself is equally great, because it makes it easy to integrate Podman processes into existing tools and frameworks.

Just don’t forget that the API is still in development!

Featured image by Magnascan from Pixabay

Getting Started with Ansible Security Automation: Investigation Enrichment

Last November we introduced Ansible security automation as our answer to the lack of integration across the IT security industry. Let’s have a closer look at one of the scenarios where Ansible can facilitate typical operational challenges of security practitioners.

A big portion of security practitioners’ daily activity is dedicated to investigative tasks. Enrichment is one of those tasks, and it can be both repetitive and time-consuming, making it a perfect candidate for automation. Streamlining these processes can free up analysts to focus on more strategic tasks, accelerate the response in time-sensitive situations and reduce human errors. However, in many large organizations, the multiple security solutions involved in these activities are not integrated with each other. On top of that, different teams may be in charge of different aspects of IT security, sometimes with no processes in common.

That often leads to manual work and interaction between people of different teams which can be error-prone and above all, slow. So when something suspicious happens and further attention is needed, security teams spend a lot of valuable time operating on many different security solutions and coordinating work with other teams, instead of focusing on the suspicious activity directly.

In this blog post we have a closer look at how Ansible can help to overcome these challenges and support investigation enrichment activities. In the following example we’ll see how Ansible can be used to enable programmatic access to information like logs coming from technologies that may not be integrated into a SIEM. As an example we’ll use enterprise firewalls and intrusion detection and protection systems (IDPS).

Simple Demo Setup

To showcase the aforementioned scenario we created a simplified, very basic demo setup. It includes two security solutions providing information about suspicious traffic, as well as a SIEM: we use a Check Point Next Generation Firewall (NGFW) and a Snort IDPS as the security solutions providing information. The SIEM to gather and analyze that data is IBM QRadar.

Also, from a machine called “attacker” we will simulate a potential attack pattern on the target machine on which the IDPS is running.

Roland blog 1

This is just a basic demo setup. A real world setup of an Ansible security automation integration would look different, and can feature other vendors and technologies.

Logs: crucial, but distributed

Now imagine you are a security analyst in an enterprise. You were just informed of an anomaly in an application, showing suspicious log activities. For example, we have a little demo where we curl a certain endpoint of the web server which we conveniently called “web_attack_simulation”:

$ sudo grep web_attack /var/log/httpd/access_log
172.17.78.163 - - [22/Sep/2019:15:56:49 +0000] "GET /web_attack_simulation HTTP/1.1" 200 22 "-" "curl/7.29.0"
...
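To get a quick overview of who is hitting that endpoint, a few lines of Python are enough. This is only a sketch: the regular expression assumes log lines in the format shown above, and the function name clients_requesting is made up for this example:

```python
import re

# Start of an access log line: client IP, identity, user, timestamp, request.
LOG_PATTERN = re.compile(
    r'^(?P<client>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*"'
)


def clients_requesting(lines, path_fragment):
    """Count requests per client IP whose path contains path_fragment."""
    hits = {}
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match and path_fragment in match.group("path"):
            client = match.group("client")
            hits[client] = hits.get(client, 0) + 1
    return hits


# The log line from the demo above.
sample = [
    '172.17.78.163 - - [22/Sep/2019:15:56:49 +0000] '
    '"GET /web_attack_simulation HTTP/1.1" 200 22 "-" "curl/7.29.0"',
]
print(clients_requesting(sample, "web_attack"))
```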

As a security analyst you know that anomalies can be the sign of a potential threat. You have to determine whether this is a false positive that can simply be dismissed, or an actual threat which requires a series of remediation activities to be stopped. Thus you need to collect more data points, for example from the firewall and the IDPS. Going through the logs of the firewall and the IDPS manually takes a lot of time. In large organizations, the security analyst might not even have the necessary access rights and needs to contact the teams responsible for the enterprise firewall and the IDPS respectively, asking them to go through their logs manually, check for anomalies and reply with the results. This could imply a phone call, a ticket, long explanations, necessary exports or other actions consuming valuable time.

It is common in large organisations to centralise event management on a SIEM and use it as the primary dashboard for investigations. In our demo example the SIEM is QRadar, but the steps shown here are valid for any SIEM. To properly analyze security-related events there are multiple steps necessary: the security technologies in question – here the firewall and the IDPS – need to be configured to stream their logs to the SIEM in the first place. But the SIEM also needs to be configured to help ensure that those logs are parsed in the correct way and meaningful events are generated. Doing this manually is time-intensive and requires in-depth domain knowledge. Additionally it might require privileges a security analyst does not have.

But Ansible allows security organizations to create pre-approved automation workflows in the form of playbooks. Those can even be maintained centrally and shared across different teams to enable security workflows at the press of a button. 

Why don’t we add those logs to QRadar permanently? This could create alert fatigue, where too much data in the system generates too many events, and analysts might miss the crucial events. Additionally, sending all logs from all systems easily consumes a huge amount of cloud resources and network bandwidth.

So let’s write such a playbook to first configure the log sources to send their logs to the SIEM. We start the playbook with Snort and configure it to send all logs to the IP address of the SIEM instance:

---
- name: Configure snort for external logging
  hosts: snort
  become: true
  vars:
    ids_provider: "snort"
    ids_config_provider: "snort"
    ids_config_remote_log: true
    ids_config_remote_log_destination: "192.168.3.4"
    ids_config_remote_log_procotol: udp
    ids_install_normalize_logs: false

  tasks:
    - name: import ids_config role
      include_role:
        name: "ansible_security.ids_config"

Note that here we only have one task, which imports an existing role. Roles are an essential part of Ansible, and help in structuring your automation content. Roles usually encapsulate the tasks and other data necessary for a clearly defined purpose. In the case of the above shown playbook, we use the role ids_config, which manages the configuration of various IDPS. It is provided as an example by the ansible-security team. This role, like the others mentioned in this blog post, is provided as guidance to help customers who may not be accustomed to Ansible to become productive faster. The roles are not necessarily meant as a best practice or a reference implementation.

Using this role we only have to set a few parameters; the domain knowledge of how to configure Snort itself is hidden away. Next, we do the very same thing with the Check Point firewall. Again an existing role is re-used, log_manager:

- name: Configure Check Point to send logs to QRadar
  hosts: checkpoint

  tasks:
    - include_role:
        name: ansible_security.log_manager
        tasks_from: forward_logs_to_syslog
      vars:
        syslog_server: "192.168.3.4"
        checkpoint_server_name: "gw-2d3c54"
        firewall_provider: checkpoint

With these two snippets we are already able to reach out to two security solutions in an automated way and reconfigure them to send their logs to a central SIEM.

We can also automatically configure the SIEM to accept those logs and sort them into corresponding streams in QRadar:

- name: Add Snort log source to QRadar
  hosts: qradar
  collections:
    - ibm.qradar

  tasks:
    - name: Add snort remote logging to QRadar
      qradar_log_source_management:
        name: "Snort rsyslog source - 192.168.14.15"
        type_name: "Snort Open Source IDS"
        state: present
        description: "Snort rsyslog source"
        identifier: "ip-192-168-14-15"

- name: Add Check Point log source to QRadar
  hosts: qradar
  collections:
    - ibm.qradar

  tasks:
    - name: Add Check Point remote logging to QRadar
      qradar_log_source_management:
        name: "Check Point source - 192.168.23.24"
        type_name: "Check Point FireWall-1"
        state: present
        description: "Check Point log source"
        identifier: "192.168.23.24"

Here we do use Ansible Content Collections: the new method of distributing, maintaining and consuming automation content. Collections can contain roles, but also modules and other code necessary to enable automation of certain environments. In our case the collection for example contains a role, but also the necessary modules and connection plugins to interact with QRadar.

Without any further intervention by the security analyst, Check Point logs start to appear in the QRadar log overview. Note that so far no logs are sent from Snort to QRadar: Snort does not know yet that this traffic is noteworthy! We will come to this in a few moments.

roland blog 2

Remember, taking the perspective of a security analyst: now we have more data at our disposal. We have a better understanding of what could be the cause of the anomaly in the application behaviour. The logs from the firewall show who is sending traffic to whom. But this is still not enough data to fully qualify what is going on.

Fine-tuning the investigation

Given the data at your disposal you decide to implement a custom signature on the IDPS to get alert logs if a specific pattern is detected.

In a typical situation, implementing a new rule would require another interaction with the security operators in charge of Snort who would likely have to manually configure multiple instances. But luckily we can again use an Ansible Playbook to achieve the same goal without the need for time consuming manual steps or interactions with other team members.

There is also the option to have a set of playbooks for customer specific situations pre-created. Since the language of Ansible is YAML, even team members with little knowledge can contribute to the playbooks, making it possible to have agreed upon playbooks ready to be used by the analysts.

Again we reuse a role, ids_rule. Note that this time some understanding of Snort rules is required to make the playbook work. Still, the actual knowledge of how to manage Snort as a service across various target systems is shielded away by the role.

---
- name: Add Snort rule
  hosts: snort
  become: yes

  vars:
    ids_provider: snort

  tasks:
    - name: Add snort web attack rule
      include_role:
        name: "ansible_security.ids_rule"
      vars:
        ids_rule: 'alert tcp any any -> any any (msg:"Attempted Web Attack"; uricontent:"/web_attack_simulation"; classtype:web-application-attack; sid:99000020; priority:1; rev:1;)'
        ids_rules_file: '/etc/snort/rules/local.rules'
        ids_rule_state: present
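If the Snort rule syntax is new to you, it can help to take the rule apart. The following sketch splits the rule from the playbook into its header (action, protocol, source, destination) and its options. It is a naive helper written for this post, not part of Snort or of the role, and it assumes that no option value contains a semicolon, which is true for this rule but not for Snort rules in general:

```python
def split_snort_rule(rule):
    """Naively split a Snort rule into its header and a dict of options.

    Assumption: option values contain no ';', so splitting on ';' is safe.
    """
    header, _, option_text = rule.partition("(")
    option_text = option_text.rstrip().rstrip(")")
    options = {}
    for chunk in option_text.split(";"):
        chunk = chunk.strip()
        if not chunk:
            continue
        # Only split on the first ':' so values may contain colons.
        key, _, value = chunk.partition(":")
        options[key.strip()] = value.strip().strip('"')
    return header.strip(), options


# The rule used in the playbook above.
rule = (
    'alert tcp any any -> any any (msg:"Attempted Web Attack"; '
    'uricontent:"/web_attack_simulation"; '
    "classtype:web-application-attack; sid:99000020; priority:1; rev:1;)"
)

header, options = split_snort_rule(rule)
print(header)
for key, value in options.items():
    print(key, "=", value)
```

The header says what to do and for which traffic, while the options carry the message shown in the alert, the pattern to look for and the rule ID.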

Finish the offense

Moments after the playbook is executed, we can check in QRadar if we see alerts. And indeed, in our demo setup this is the case:

roland blog 3

With this information on hand, we can now finally check all offenses of this type, and verify that they are all coming only from one single host – here the attacker.

From here we can move on with the investigation. For our demo we assume that the behavior is intentional, and thus close the offense as false positive.

Rollback!

Last but not least, there is one step which is often overlooked, but is crucial: rolling back all the changes! After all, as discussed earlier, sending all logs into the SIEM all the time is resource-intensive.

With Ansible the rollback is quite easy: basically the playbooks from above can be reused. They just need to be slightly altered to not create log streams, but remove them again. That way, the entire process can be fully automated and at the same time made as resource friendly as possible.

Takeaways and where to go next

It happens that the job of a CISO and her team is difficult even if they have in place all necessary tools, because the tools don’t integrate with each other. When there is a security threat, an analyst has to perform an investigation, chasing all relevant pieces of information across the entire infrastructure, consuming valuable time to understand what’s going on and ultimately perform any sort of remediation.

Ansible security automation is designed to help enable integration and interoperability of security technologies to support security analysts’ ability to investigate and remediate security incidents faster.

As next steps there are plenty of resources to follow up on the topic:

Credits

This post was originally released on ansible.com/blog: GETTING STARTED WITH ANSIBLE SECURITY AUTOMATION: INVESTIGATION ENRICHMENT

Header image by Alexas_Fotos from Pixabay.

Ansible and Ansible Tower special variables

Ansible and Ansible Tower provide a powerful variable system. At the same time, there are some variables reserved to one or the other, which cannot be used by others, but can be helpful. This post lists all reserved and magic variables and also important keywords.

Ansible Variables

Variables in Ansible are a powerful tool to influence and control your automation execution. In fact, I’ve dedicated a fair share of posts to the topic over the years:

The official documentation of Ansible variables is also quite comprehensive.

The variable system is in fact so powerful that Ansible uses it itself. There are certain variables which are reserved, the so-called magic variables.

The given documentation lists many of them – but is missing the Tower ones. For that reason this post lists all magic variables in Ansible and Ansible Tower with references to more information.

Note that the variables and keywords might be different for different Ansible versions. The lists provided here are for Ansible 2.8, which is the current release and also shipped in Fedora, and for Tower 3.4/3.5.

Reserved & Magic Variables

Magic Variables

The following list shows true magic variables. They are reserved internally and are overwritten by Ansible if needed. A “(D)” highlights that the variable is deprecated.

ansible_check_mode
ansible_dependent_role_names
ansible_diff_mode
ansible_forks
ansible_inventory_sources
ansible_limit
ansible_loop
ansible_loop_var
ansible_play_batch (D)
ansible_play_hosts (D)
ansible_play_hosts_all
ansible_play_role_names
ansible_playbook_python
ansible_role_names
ansible_run_tags
ansible_search_path
ansible_skip_tags
ansible_verbosity
ansible_version
group_names
groups
hostvars
inventory_hostname
inventory_hostname_short
inventory_dir
inventory_file
omit
play_hosts (D)
ansible_play_name
playbook_dir
role_name
role_names
role_path

Source: docs.ansible.com/ansible/latest/reference_appendices/special_variables.html

Facts

Facts are not magic variables because they are not internal. But they are collected during facts gathering or execution of the setup module, so it helps to keep them in mind. There are two “main” variables related to facts, and a lot of other variables depending on what the managed node has to offer. Since those are different from system to system, it is tricky to list them all. But they can be easily identified by the leading ansible_.

ansible_facts
ansible_local
ansible_*

Sources: docs.ansible.com/ansible/latest/reference_appendices/special_variables.html & docs.ansible.com/ansible/latest/user_guide/playbooks_variables.html

Connection Variables

Connection variables control the way Ansible connects to target machines: what connection plugin to use, etc.

ansible_become_user
ansible_connection
ansible_host
ansible_python_interpreter
ansible_user

Source: docs.ansible.com/ansible/latest/reference_appendices/special_variables.html

Ansible Tower

Tower has its own set of magic variables which are used internally to control the execution of the automation. Note that those variables can optionally start with awx_ instead of tower_.

tower_job_id
tower_job_launch_type
tower_job_template_id
tower_job_template_name
tower_user_id
tower_user_name
tower_schedule_id
tower_schedule_name
tower_workflow_job_id
tower_workflow_job_name

Source: docs.ansible.com/ansible-tower/latest/html/userguide/job_templates.html
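Since each of these variables can show up under either prefix, a script that consumes them (for example a custom filter or a callback) might want to accept both. A minimal sketch, with a made-up helper name and made-up sample values:

```python
def tower_var(job_vars, name, default=None):
    """Look up a Tower job variable under the tower_ or the awx_ prefix."""
    for prefix in ("tower_", "awx_"):
        key = prefix + name
        if key in job_vars:
            return job_vars[key]
    return default


# Hypothetical variables as a job might see them; the values are invented.
job_vars = {"awx_job_id": 42, "tower_user_name": "admin"}

print(tower_var(job_vars, "job_id"))
print(tower_var(job_vars, "user_name"))
print(tower_var(job_vars, "schedule_name", default="(not scheduled)"))
```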

Keywords

Keywords are strictly speaking not variables. In fact, you can even set a variable with the same name as a keyword. Instead, they are the parts of a playbook that make a playbook work: think of the keys hosts, tasks, name or even the parameters of a module.

It is just important to keep those keywords in mind – and it certainly helps when you name your variables in a way that they are not mixed up with keywords by chance.
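A simple way to catch such mix-ups early is to compare your variable names against the keyword lists. The following is just a sketch: the set only contains a handful of the keywords from this post, and the function name is made up, so extend both as needed:

```python
# A small subset of the playbook keywords listed in this post.
PLAYBOOK_KEYWORDS = {
    "become",
    "connection",
    "environment",
    "hosts",
    "name",
    "port",
    "tags",
    "tasks",
    "vars",
    "when",
}


def colliding_names(variable_names):
    """Return the variable names that are identical to a known keyword."""
    return sorted(set(variable_names) & PLAYBOOK_KEYWORDS)


# Hypothetical variable names from an inventory or vars file.
my_variables = ["port", "app_port", "environment", "db_name"]
print(colliding_names(my_variables))
```

In this example, port and environment are flagged, while app_port and db_name are fine.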

The following lists show all keywords by where they can appear. Note that some keywords are listed multiple times because they can be used at different places.

Play

any_errors_fatal
become
become_flags
become_method
become_user
check_mode
collections
connection
debugger
diff
environment
fact_path
force_handlers
gather_facts
gather_subset
gather_timeout
handlers
hosts
ignore_errors
ignore_unreachable
max_fail_percentage
module_defaults
name
no_log
order
port
post_tasks
pre_tasks
remote_user
roles
run_once
serial
strategy
tags
tasks
vars
vars_files
vars_prompt

Source: docs.ansible.com/ansible/latest/reference_appendices/playbooks_keywords.html

Role

any_errors_fatal
become
become_flags
become_method
become_user
check_mode
collections
connection
debugger
delegate_facts
delegate_to
diff
environment
ignore_errors
ignore_unreachable
module_defaults
name
no_log
port
remote_user
run_once
tags
vars
when

Source: docs.ansible.com/ansible/latest/reference_appendices/playbooks_keywords.html

Block

always
any_errors_fatal
become
become_flags
become_method
become_user
block
check_mode
collections
connection
debugger
delegate_facts
delegate_to
diff
environment
ignore_errors
ignore_unreachable
module_defaults
name
no_log
port
remote_user
rescue
run_once
tags
vars
when

Source: docs.ansible.com/ansible/latest/reference_appendices/playbooks_keywords.html

Task

action
any_errors_fatal
args
async
become
become_flags
become_method
become_user
changed_when
check_mode
collections
connection
debugger
delay
delegate_facts
delegate_to
diff
environment
failed_when
ignore_errors
ignore_unreachable
local_action
loop
loop_control
module_defaults
name
no_log
notify
poll
port
register
remote_user
retries
run_once
tags
until
vars
when
with_<lookup_plugin>

Source: docs.ansible.com/ansible/latest/reference_appendices/playbooks_keywords.html

Of debugging Ansible Tower and underlying cloud images

Recently I was experimenting with Tower’s isolated nodes feature – but somehow it did not work in my environment. Debugging told me a lot about Ansible Tower – and also why you should not trust arbitrary cloud images.

Background – Isolated Nodes

Ansible Tower has a nice feature called “isolated nodes”. Those are dedicated Tower instances which can manage nodes in separated environments – basically an Ansible Tower Proxy.

An Isolated Node is an Ansible Tower node that contains a small piece of software for running playbooks locally to manage a set of infrastructure. It can be deployed behind a firewall/VPC or in a remote datacenter, with only SSH access available. When a job is run that targets things managed by the isolated node, the job and its environment will be pushed to the isolated node over SSH, where it will run as normal.

Ansible Tower Feature Spotlight: Instance Groups and Isolated Nodes

Isolated nodes are especially handy when you set up your automation in security-sensitive environments. Think of DMZs here, of network separation and so on.

I was fooling around with a clustered Tower installation on RHEL 7 VMs in a cloud environment when I ran into trouble, though.

My problem – Isolated node unavailable

Isolated nodes – like instance groups – have a status inside Tower: if things are problematic, they are marked as unavailable. And this is what happened with my instance isonode.remote.example.com running in my lab environment:

Ansible Tower showing an instance node as unavailable

I tried to turn it “off” and “on” again with the button in the control interface. That made the node available, and it was even able to execute jobs, but it became unavailable again soon after.

Analysis

So what happened? The Tower logs showed a Python error:

# tail -f /var/log/tower/tower.log
fatal: [isonode.remote.example.com]: FAILED! => {"changed": false,
"module_stderr": "Shared connection to isonode.remote.example.com
closed.\r\n", "module_stdout": "Traceback (most recent call last):\r\n
File \"/var/lib/awx/.ansible/tmp/ansible-tmp-1552400585.04
-60203645751230/AnsiballZ_awx_capacity.py\", line 113, in <module>\r\n
_ansiballz_main()\r\n  File \"/var/lib/awx/.ansible/tmp/ansible-tmp
-1552400585.04-60203645751230/AnsiballZ_awx_capacity.py\", line 105, in
_ansiballz_main\r\n    invoke_module(zipped_mod, temp_path,
ANSIBALLZ_PARAMS)\r\n  File \"/var/lib/awx/.ansible/tmp/ansible-tmp
-1552400585.04-60203645751230/AnsiballZ_awx_capacity.py\", line 48, in
invoke_module\r\n    imp.load_module('__main__', mod, module, MOD_DESC)\r\n
File \"/tmp/ansible_awx_capacity_payload_6p5kHp/__main__.py\", line 74, in
<module>\r\n  File \"/tmp/ansible_awx_capacity_payload_6p5kHp/__main__.py\",
line 60, in main\r\n  File
\"/tmp/ansible_awx_capacity_payload_6p5kHp/__main__.py\", line 27, in
get_cpu_capacity\r\nAttributeError: 'module' object has no attribute
'cpu_count'\r\n", "msg": "MODULE FAILURE\nSee stdout/stderr for the exact
error", "rc": 1}

PLAY RECAP *********************************************************************
isonode.remote.example.com : ok=0    changed=0    unreachable=0    failed=1  

Apparently a Python function was missing. If we check the code we see that indeed in line 27 of file awx_capacity.py the function psutil.cpu_count() is called:

def get_cpu_capacity():
    env_forkcpu = os.getenv('SYSTEM_TASK_FORKS_CPU', None)
    cpu = psutil.cpu_count()

Support for this function was added in version 2.0 of psutil:

2014-03-10
Enhancements
424: [Windows] installer for Python 3.X 64 bit.
427: number of logical and physical CPUs (psutil.cpu_count()).

psutil history

Note the date here: 2014-03-10 – pretty old! I checked the version of the installed package, and indeed the version was pre-2.0:

$ rpm -q --queryformat '%{VERSION}\n' python-psutil
1.2.1
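When comparing version strings like this in a script, keep in mind that plain string comparison can go wrong (for example "10.0" sorts before "2.0"). A small sketch that compares the numeric parts instead, assuming the version consists of numbers and dots only, as is the case here:

```python
def version_tuple(version):
    """Turn a purely numeric version string like "1.2.1" into (1, 2, 1)."""
    return tuple(int(part) for part in version.split("."))


installed = version_tuple("1.2.1")  # what rpm reported above
required = version_tuple("2.0")     # first release with psutil.cpu_count()

if installed >= required:
    print("psutil is new enough")
else:
    print("psutil is too old for cpu_count()")
```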

To be really sure and also to ensure that there was no weird function backporting, I checked the function call directly on the Tower machine:

# python
Python 2.7.5 (default, Sep 12 2018, 05:31:16) 
[GCC 4.8.5 20150623 (Red Hat 4.8.5-36)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import inspect
>>> import psutil as module
>>> functions = inspect.getmembers(module, inspect.isfunction)
>>> functions
[('_assert_pid_not_reused', <function _assert_pid_not_reused at
0x7f9eb10a8d70>), ('_deprecated', <function deprecated at 0x7f9eb38ec320>),
('_wraps', <function wraps at 0x7f9eb414f848>), ('avail_phymem', <function
avail_phymem at 0x7f9eb0c32ed8>), ('avail_virtmem', <function avail_virtmem at
0x7f9eb0c36398>), ('cached_phymem', <function cached_phymem at
0x7f9eb10a86e0>), ('cpu_percent', <function cpu_percent at 0x7f9eb0c32320>),
('cpu_times', <function cpu_times at 0x7f9eb0c322a8>), ('cpu_times_percent',
<function cpu_times_percent at 0x7f9eb0c326e0>), ('disk_io_counters',
<function disk_io_counters at 0x7f9eb0c32938>), ('disk_partitions', <function
disk_partitions at 0x7f9eb0c328c0>), ('disk_usage', <function disk_usage at
0x7f9eb0c32848>), ('get_boot_time', <function get_boot_time at
0x7f9eb0c32a28>), ('get_pid_list', <function get_pid_list at 0x7f9eb0c4b410>),
('get_process_list', <function get_process_list at 0x7f9eb0c32c08>),
('get_users', <function get_users at 0x7f9eb0c32aa0>), ('namedtuple',
<function namedtuple at 0x7f9ebc84df50>), ('net_io_counters', <function
net_io_counters at 0x7f9eb0c329b0>), ('network_io_counters', <function
network_io_counters at 0x7f9eb0c36500>), ('phymem_buffers', <function
phymem_buffers at 0x7f9eb10a8848>), ('phymem_usage', <function phymem_usage at
0x7f9eb0c32cf8>), ('pid_exists', <function pid_exists at 0x7f9eb0c32140>),
('process_iter', <function process_iter at 0x7f9eb0c321b8>), ('swap_memory',
<function swap_memory at 0x7f9eb0c327d0>), ('test', <function test at
0x7f9eb0c32b18>), ('total_virtmem', <function total_virtmem at
0x7f9eb0c361b8>), ('used_phymem', <function used_phymem at 0x7f9eb0c36050>),
('used_virtmem', <function used_virtmem at 0x7f9eb0c362a8>), ('virtmem_usage',
<function virtmem_usage at 0x7f9eb0c32de8>), ('virtual_memory', <function
virtual_memory at 0x7f9eb0c32758>), ('wait_procs', <function wait_procs at
0x7f9eb0c32230>)]
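The dump above is long to scan by eye. As a minimal sketch, the same question can be put to the module directly. `has_callable` is a hypothetical helper name, and the stand-in module only exists so that the example runs without psutil installed:

```python
import types


def has_callable(module, name):
    """Return True if `module` exposes a callable attribute called `name`."""
    return callable(getattr(module, name, None))


# Stand-in that mimics an old psutil without cpu_count (illustration only).
old_psutil = types.ModuleType("psutil")
old_psutil.cpu_percent = lambda: 0.0

print(has_callable(old_psutil, "cpu_count"))    # False
print(has_callable(old_psutil, "cpu_percent"))  # True
```

On a real machine you would pass the imported psutil module instead of the stand-in.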

Searching for a package origin

So how to solve this issue? My first idea was to get this working by switching that part of the code to the multiprocessing library:

# python
Python 2.7.5 (default, Sep 12 2018, 05:31:16) 
[GCC 4.8.5 20150623 (Red Hat 4.8.5-36)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import multiprocessing
>>> cpu = multiprocessing.cpu_count()
>>> cpu
4
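A less invasive variant of that idea would keep psutil where it works and fall back to the standard library only when cpu_count is missing. The following is a sketch under that assumption, not the actual AWX code; `detect_cpu_count` is a hypothetical name:

```python
import multiprocessing


def detect_cpu_count(psutil_module=None):
    """Prefer psutil.cpu_count(); fall back to multiprocessing.cpu_count()."""
    counter = getattr(psutil_module, "cpu_count", None)
    if callable(counter):
        count = counter()
        if count:  # psutil may return None when the count is undetermined
            return count
    return multiprocessing.cpu_count()


try:
    import psutil  # optional; may be missing or older than 2.0
except ImportError:
    psutil = None

print(detect_cpu_count(psutil))
```

Passing the module in as a parameter keeps the function easy to test against an old or a missing psutil.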

But while I was filing a bug report I wondered why RHEL shipped such an ancient library. After all, RHEL 7 was released in June 2014, and psutil had cpu_count available since early 2014! And indeed, a quick search for the package via the Red Hat package search showed a weird result: python-psutil was never part of base RHEL 7! It was only shipped as part of some very, very old OpenStack channels:

access.redhat.com package search, results for python-psutil

Newer OpenStack channels in fact come along with newer versions of python-psutil.

So how did this outdated package end up on this RHEL 7 image? Why was it never updated?

The cloud image is to blame! The package was installed on it – most likely during the creation of the image: python-psutil is needed for OpenStack Heat, so I assume that these RHEL 7 images were once created via OpenStack and then used as the default image in this demo environment.

And after the initial creation of the image the Heat packages were forgotten. In the meantime the image was updated to newer RHEL versions, snapshots were created as new defaults and so on. But since the package in question was never part of the main RHEL repos, it was never changed or removed. It just stayed there. Waiting, apparently, for me 😉

Conclusion

This issue showed me how tricky cloud images can be. Think about your own cloud images: have you really checked all of them and verified that no package, no startup script, no configuration was changed from the Linux distribution vendor’s base setup?

With RPMs this is still manageable: you can track whether packages are installed which are not present in any of the configured channels. But did someone install something with pip? Or in any other way?
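The RPM side of that check boils down to a set difference between what is installed and what the configured channels offer. A minimal sketch with made-up sample data follows; `packages_outside_repos` is a hypothetical helper, and on a real system the installed names could come from `rpm -qa --queryformat '%{NAME}\n'`:

```python
def packages_outside_repos(installed, available):
    """Return installed package names that no configured repo offers."""
    return sorted(set(installed) - set(available))


# Made-up sample data; real lists would come from the package tools.
installed = ["bash", "python-psutil", "openssl"]
available = ["bash", "openssl", "vim-enhanced"]

print(packages_outside_repos(installed, available))  # ['python-psutil']
```

On RHEL 7, `yum list extras` reports this kind of leftover package directly, so the sketch is mainly useful when you want to script the comparison yourself.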

Take my case: an outdated version of a library was called instead of a much, much more recent one. If there had been a serious security issue with the library in the meantime, I would have been exposed, even though my update management did not report any library as needing an update.

I learned my lesson: be more critical of cloud images and check them in more detail in the future to avoid nasty surprises in production. I can only recommend that you do the same.