[Short Tip] What not to forget when controlling Windows Servers via Ansible Tower

Ansible does support Windows with an entire set of modules. Thus it is also possible to run Ansible playbooks targeting Windows systems right from Ansible Tower. However, since Windows is managed via WinRM and not SSH, the appropriate variables must be set in the inventory definition of the machine:

[Screenshot: Windows host variables in the Tower inventory]

The given screenshot shows the variables for Ansible 1.9. For 2.0 and above the variables are a bit different. Also, keep in mind that you might need to create an additional set of credentials.
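
As a rough sketch, the corresponding variables in a plain inventory file might look like the following – shown with the Ansible 2.x names; in 1.9 the user, password and port variables were still called ansible_ssh_user, ansible_ssh_pass and ansible_ssh_port (host name and credentials here are placeholders):

[windows]
winserver.example.com

[windows:vars]
ansible_connection=winrm
ansible_port=5986
ansible_user=Administrator
ansible_password=SecretPasswordHere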

[Howto] Introduction to Ansible variables

To become more flexible, Ansible offers the possibility to use variables in loops, but also to use information the target system provides.

Background

Ansible uses variables to enable more flexibility in playbooks and roles. They can be used to loop through a set of given values, to access various information like the hostname of a system, and to replace certain strings in templates with system specific values. Variables can be provided through the inventory, via variable files, overwritten on the command line, and set in Tower.

But note that variable names have some restrictions in Ansible:

Variable names should consist only of letters, numbers, and underscores. Variables should always start with a letter.

If the variable in question is a dictionary – a set of key/value pairs – it can be referenced both by bracket and by dot notation:

foo.bar
foo['bar']

The question is however: how to use variables? And how to get them in the first place?

Variables and loops

Loops are arguably among the most common use cases of variables in general – and the same is true for Ansible. While loops do not use externally provided variables, I’d like to start with them to give a first idea of how variables are used. To copy a set of files it is either possible to write one task for each file or to simply loop through them:

tasks:
  - name: copy files
    copy: src={{ item }} dest=/tmp/{{ item }}
    with_items:
      - alpha
      - beta

This already shows the basics: variables can be used in module arguments, and they are referenced via curly brackets {{ }}.

Variables and templates

More importantly, variables can be used to substitute keys in configuration files with system or run-time specific values. Imagine a configuration file containing the actual host name, simply copied over from a central source to the clients. In the case of one hundred machines there would be one hundred copies of the configuration file on the server, all almost identical except for the corresponding host name. That is a waste of time and space.

It’s better to have one copy of the configuration file, with a variable as placeholder instead of the host names – this is where templates come into play:

$ cat template.j2
My host name is {{ ansible_hostname }}.

The Ansible module to use templates and activate variable substitution is the template module:

tasks:
  - name: copy template
    template: src=template.j2 dest="/tmp/abcapp.conf"

When this task is run against a set of systems, they all get a file called abcapp.conf containing the individual host name of the given system.

This is again a simple example, but shows the basics: in templates variables are also referenced by {{ }}. Templates can become much more sophisticated, but I will try to cover that in another post.

Using variables in conditions

Variables can be used in conditions, thus ensuring that certain tasks are only run when, for example, a given variable is set to a certain value on a host:

tasks:
  - name: install Apache on Solaris
    pkg5: name=web/server/apache-24
    when: ansible_os_family == "Solaris"

  - name: install Apache on RHEL
    yum:  name=httpd
    when: ansible_os_family == "RedHat"

In this case, the first task is only applied on Solaris servers, while the second is only run on Red Hat Enterprise Linux machines.

Getting variables from the system

Using variables is one thing, but they need to be defined first. So far I have covered some use cases of variables, but not where they come from.

Ansible already defines a rich set of variables, individual for each system: whenever Ansible is run on a system, all facts and information about the system are gathered and set as variables. The available variables can be output via the setup module:

$ ansible neon -m setup
neon | success >> {
    "ansible_facts": {
        "ansible_all_ipv4_addresses": [
            "192.168.122.203"
        ], 
        "ansible_all_ipv6_addresses": [
            "fe80::5054:ff:feba:9db3"
        ], 
        "ansible_architecture": "x86_64", 
        "ansible_bios_date": "04/01/2014", 
...
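
If only certain facts are of interest, the output can be narrowed down with the setup module’s filter parameter, for example:

$ ansible neon -m setup -a "filter=ansible_all_ipv4_addresses"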

All these variables can be used in templates, but also in the playbooks themselves as mentioned above in the conditions example. Roles can take advantage of these variables as well, of course.

Getting variables from the command line

Another way to define variables is to call Ansible playbooks with the option --extra-vars:

$ ansible-playbook --extra-vars "cli_var=production" system-setup.yml

The reference is again via the {{ }} brackets:

$ cat template.j2
environment: {{ cli_var }}.

Setting variables in playbooks

A more direct way is to define variables right in the playbook, with the key vars:

...
hosts: all 
vars:
  play_var: bar 

tasks:
...

This can be extended by including an actual YAML file containing variables:

...
hosts: all
vars_files:
  - setupvariables.yml

tasks:
...

Including further files with more variables comes in extremely handy when the variables for each system are kept in a system specific file, like $HOSTNAME.yml, because the included variable file can again be a variable:

...
hosts: all
vars_files:
  - "{{ ansible_hostname }}.yml"

tasks:
...

In this case, the playbook reads in specific variables for each corresponding host – this way it is possible to set, for example, different virtual host names for Apache servers, or different backup systems in the case of separate data centers.

Setting variables in the inventory

Sometimes it might make more sense to define specific variables in the already set up inventory:

[clients]
helium invent_var=helium_123
neon invent_var=bar

The inventory more or less treats any argument which is not Ansible specific as a host variable.

It is also possible to set variables for entire host groups:

[clients]
helium
neon

[clients:vars]
invent_var=group-foo
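
Such inventory variables are afterwards referenced like any other variable – a quick sketch using the debug module:

tasks:
  - name: show inventory variable
    debug: msg="invent_var is set to {{ invent_var }}"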

Setting variables in the inventory makes sense when different teams use the same set of Ansible roles and playbooks but have different machine setups which need specific treatment at each location.

Setting variables on a system: local facts

Variables can also be set by placing specialized files on the system: whenever Ansible accesses a remote system it checks for the directory /etc/ansible/facts.d, and all files therein ending in .fact are read. The files can be of various formats – INI, JSON, or even executables which return JSON. For example:

$ cat /etc/ansible/facts.d/variables.fact 
[system]
foo=bar
dim=dum

The variables can be displayed via the setup module:

$ ansible neon -m setup
...
      "type": "loopback"
  }, 
  "ansible_local": {
      "variables": {
          "system": {
              "dim": "dum", 
              "foo": "bar"
          }
     }
  }, 
  "ansible_machine": "x86_64", 
...
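
In playbooks and templates these local facts are then referenced via the ansible_local key, followed by the file name and the section – for example:

$ cat template.j2
foo is set to {{ ansible_local.variables.system.foo }}.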

Using the results of tasks: registered variables

The results of a task during a playbook run can also be stored inside a variable – together with conditionals this enables a playbook to react to the results of the given task with other tasks.

For example, say an httpd service needs to run, and if it does not, the entire server should be powered down immediately to ensure that no data corruption takes place. The corresponding playbook tries to start the httpd service, ignores errors, and instead analyses the registered result, powering down the machine if the service could not be started:

---
- name: register example
  hosts: all 
  sudo: yes

  tasks:
    - name: start service
      service: name=httpd state=started
      ignore_errors: True
      register: service_result

    - name: shutdown
      command: "shutdown -h +1m"
      when: service_result | failed
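
By the way, to see what exactly a registered variable contains, the debug module comes in handy – a quick sketch:

- name: show result
  debug: var=service_result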

Another example would be to trigger a function to remove the host from the loadbalancer, or to check if the database is available and, if it is not, immediately close down the front firewall.

Accessing variables of other hosts

Last but not least it might be interesting to access the variables (read: facts) of other hosts. This can of course only be done if these facts are actually available – for example because Ansible was run against those hosts before, or because fact caching is enabled.

If they are available, the data can be accessed via the hostvars key, indexed by the host name as given in the inventory.

Groups from within the inventory can also be accessed. That makes it possible to loop through a list of hosts in a group and for example gather all IP addresses or host names or other data.

The following template first accesses the host name of the machine “tower” and afterwards collects all the epoch data of all machines:

Tower name is: {{ hostvars['tower']['ansible_hostname'] }}
The epochs of the clients are:
{% for host in groups['clients'] %}
- {{ hostvars[host]['ansible_date_time']['epoch'] }}
{% endfor %}

The loop in this code is an actual statement of the Jinja2 template engine; the engine also offers if statements and other common operations.

This comes in handy in simple cases like filling /etc/hosts, but can also empower admins to fill in, for example, the data of a loadbalancer or a firewall configuration.
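
As a minimal sketch of the /etc/hosts case – assuming facts of all hosts in the group have been gathered – such a template could combine the default IPv4 address fact with the host name:

{% for host in groups['clients'] %}
{{ hostvars[host]['ansible_default_ipv4']['address'] }} {{ hostvars[host]['ansible_hostname'] }}
{% endfor %}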

Conclusion

Variables are a very powerful feature within Ansible to enrich the functionality it provides. Together with templates and the Jinja2 template engine and language the possibilities are almost endless. Sooner or later every admin will leave the simple Ansible calls and playbooks behind and start diving into variables, templates, and loops through the variables of other hosts – to make system automation even easier.

[Short Tip] verify YAML in Shell via Python one-liner [Update]

Today the question came up how to verify YAML files easily. Of course, there are many very good online parsers. But I was wondering if it is possible to do it simply in Bash/ZSH, using a Python one-liner. Here is the code:

$ python -c 'import yaml,sys;yaml.safe_load(sys.stdin)' < yamltest.txt

It throws an exception if the file is not proper (read: parseable) YAML. Otherwise it just returns with a 0 exit code.

Please note that I am not sure how tolerant yaml.safe_load is. And note that PyYAML needs to be installed.
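
To save typing, the one-liner can also be wrapped in a small shell function, e.g. in ~/.zshrc – the name yamlcheck is of course just my choice:

yamlcheck() {
    python -c 'import yaml,sys;yaml.safe_load(sys.stdin)' < "$1" && echo "valid"
}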

Update:
Updated to avoid cat abuse – save the kittens! Thanks to ichor!

So you think offline systems need no updates?

Often customers run offline systems and claim that such machines do not need updates since they are offline. But this is a fallacy: updates do not only close security holes but also deliver bug fixes – and they can be crucial.

Background

Recently a customer approached me with questions regarding an upgrade of a server. During the discussion, the customer mentioned that the system never got upgrades:

“It is an offline system, there is no need.”

That’s a common misconception. And a dangerous one.

Many people think that updates are only important to fix security issues, and that bugfixes are not really worth considering – after all, the machine works, right?

Wrong!

Software is never perfect. Errors happen. And while security issues might be uncomfortable, bugs in the program code can be a much more serious issue than “mere” security problems.

Example One: Xerox

To pick an example: almost every company out there has one type of system which hardly ever gets updated: copy machines. These days they are connected to the internet and can e-mail scanned documents. Still, they are usually never updated – after all, it just works, right?

In 2013 it was discovered that many Xerox WorkCentres had a serious software bug causing them to alter scanned numbers. It took quite some weeks and analysis until a software update finally fixed the issue. During that time it turned out that the bug was at least 8 years old. So millions and millions of faulty scans had been produced over the years – and in some cases the originals were destroyed in the meantime. The impact can hardly be estimated, but it is surely huge and will accompany us for a long time. And it was estimated that even today many scanners are still not patched – because it is not common to patch such systems. Offline, right?

So yes, a security issue might expose your data to the world. But it’s worse when the data is wrong to begin with.

Example two: Jails

Another example hit the news just recently: the Washington State Department of Corrections in the US released inmates too early – due to a software bug. Again the bug had been present for years and years, releasing inmates too early all the time.

Example three: Valve

While Valve’s systems are by definition mostly online, the Valve Steam for Linux bug showed that all kinds of software can contain, well, all kinds of bugs: if you moved the folder of your Steam client, it could actually delete your entire (home) directory. Just like that. And again: this bug did not strike all the time, but only in certain situations and after quite some time.

# if $STEAMROOT is unexpectedly empty, this expands to rm -rf "/"*
rm -rf "$STEAMROOT/"*

Example four: Office software

Imagine you have a bug in your calculation software – so that numbers are not processed or displayed correctly. The possible implications are endless. Two famous bugs which show that bugfixes are worth considering are the MS Office multiplication bug from 2007 and the MS Office sum bug from a year later.

Example five: health

Yet another example surfaced in 2000, when a treatment planning system at a radiotherapy department was found to calculate wrong treatment times for patients, so that patients were exposed to much more radiation than was good for them. It took quite some time until the bug was discovered – too late for some patients whose

“deaths were probably radiation related”.

Conclusion

So, yes, security issues are harmful. They must be taken seriously, and a solid, well designed security concept should be applied: multiple layers, different zones, role based access, frequent updates, and so on.

But systems which are secured by air gaps need to be updated as well. The above mentioned examples show bugs not only in highly specific applications, but also in software components used in thousands and millions of machines. So administrators should at least spend a few seconds reading into each update and check whether it is relevant. Otherwise you might corrupt your data over years and years without realizing it – until it is too late.

[Howto] Adapting Ansible Galaxy roles for Solaris

It is pretty easy to manage Solaris with Ansible. However, the Ansible roles available at Ansible Galaxy usually target Linux based OS only. Luckily, adapting them is rather simple.

Background

As mentioned earlier, Solaris machines can be managed via Ansible pretty well: it works out of the box, and many already existing modules are incredibly helpful in managing Solaris installations.

At the same time, the Ansible Best Practices guide strongly recommends using roles to organize your IT with Ansible. Many roles are already available at Ansible Galaxy, a central repository for various roles written by the community, ready to be used by the admin in need.

However, Ansible Galaxy only recently added support for Solaris. There are currently hardly any roles with Solaris platform support available.

Luckily expanding existing Ansible roles towards Solaris is not that hard.

Example: Apache role

For example, the Apache role from geerlingguy is one of the highest rated roles on Ansible Galaxy. It installs Apache, starts the service, has support for vhosts and custom ports, and is above all pretty well documented. Yet, there is no Solaris support right now… although geerlingguy just accepted a pull request, so it won’t be long until the new version surfaces at Ansible Galaxy.

The best way to adapt a given role to another OS is to extend the current role for the additional OS – in contrast to deleting the original OS support and replacing it with new, again OS specific configuration. This keeps the role re-usable on other OSes and enables the community to maintain and improve a shared, common role.

With a bit of knowledge about how services are started and stopped on Linux as well as on Solaris, one major difference quickly comes up: on Linux, the name of the controlled service is usually exactly the same as the name of the binary behind the service, and the same string is also part of the path to the program’s files and, for example, to the configuration files. On Solaris that is often not the case!

So it is best to check whether the given role starts or stops a service at any point, whether a variable is used there, and whether this variable is also used somewhere else – for example to create a path name or to identify a binary.

The given example indeed controls a service. Thus we add another variable, the service name:

tasks/main.yml
@@ -41,6 +41,6 @@
 - name: Ensure Apache has selected state and enabled on boot.
   service:
-    name: "{{ apache_daemon }}"
+    name: "{{ apache_service }}"
     state: "{{ apache_state }}"
     enabled: yes

Next, we need to add the new variable to the existing OS support:

vars/Debian.yml
@@ -1,4 +1,5 @@
 ---
+apache_service: apache2
 apache_daemon: apache2
 apache_daemon_path: /usr/sbin/
 apache_server_root: /etc/apache2
vars/RedHat.yml
@@ -1,4 +1,5 @@
 ---
+apache_service: httpd
 apache_daemon: httpd
 apache_daemon_path: /usr/sbin/
 apache_server_root: /etc/httpd

Now would be a good time to test the role – it should still work on the already supported platforms.

The next step is to add the necessary variables for Solaris. The best way is to copy an already existing variable file and to modify it afterwards to fit Solaris:

vars/Solaris.yml
@@ -0,0 +1,19 @@
+---
+apache_service: apache24
+apache_daemon: httpd
+apache_daemon_path: /usr/apache2/2.4/bin/
+apache_server_root: /etc/apache2/2.4/
+apache_conf_path: /etc/apache2/2.4/conf.d
+
+apache_vhosts_version: "2.2"
+
+__apache_packages:
+  - web/server/apache-24
+  - web/server/apache-24/module/apache-ssl
+  - web/server/apache-24/module/apache-security
+
+apache_ports_configuration_items:
+  - regexp: "^Listen "
+    line: "Listen {{ apache_listen_port }}"
+  - regexp: "^#?NameVirtualHost "
+    line: "NameVirtualHost *:{{ apache_listen_port }}"

This specific role provides two task files to set up and configure each supported platform. The easiest way to create these two files for a new platform is again to copy existing ones and to modify them afterwards according to the specifics of Solaris.

The configuration looks like:

tasks/configure-Solaris.yml
@@ -0,0 +1,19 @@
+---
+- name: Configure Apache.
+  lineinfile:
+    dest: "{{ apache_server_root }}/conf/{{ apache_daemon }}.conf"
+    regexp: "{{ item.regexp }}"
+    line: "{{ item.line }}"
+    state: present
+  with_items: apache_ports_configuration_items
+  notify: restart apache
+
+- name: Add apache vhosts configuration.
+  template:
+    src: "vhosts-{{ apache_vhosts_version }}.conf.j2"
+    dest: "{{ apache_conf_path }}/{{ apache_vhosts_filename }}"
+    owner: root
+    group: root
+    mode: 0644
+  notify: restart apache
+  when: apache_create_vhosts

The setup file thus can look like:

tasks/setup-Solaris.yml
@@ -0,0 +1,6 @@
+---
+- name: Ensure Apache is installed.
+  pkg5:
+    name: "{{ item }}"
+    state: installed
+  with_items: apache_packages

Last but not least, the platform support must be activated in the tasks/main.yml file:

tasks/main.yml
@@ -15,6 +15,9 @@
 - include: setup-Debian.yml
   when: ansible_os_family == 'Debian'
 
+- include: setup-Solaris.yml
+  when: ansible_os_family == 'Solaris'
+
 # Figure out what version of Apache is installed.
 - name: Get installed version of Apache.
   shell: "{{ apache_daemon_path }}{{ apache_daemon }} -v"

When you now run the role on a Solaris machine, it should install Apache right away.
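
A minimal playbook to test the adapted role could look like the following – the host name solaris11 is just an assumption:

---
- hosts: solaris11
  sudo: yes
  roles:
    - geerlingguy.apache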

Conclusion

Adapting a given role from Ansible Galaxy for Solaris is rather easy – if the given role is already prepared for multi OS support. In such cases adding support for another OS is a trivial task.

If the role is not prepared for multi OS support, try to get in contact with the developers – they often appreciate feedback and multi OS support pull requests.

[Short Tip] Debug Spamassassin within Amavisd

Filtering e-mail for spam and viruses can be done efficiently with Amavisd-New. Besides its own technologies to identify and filter out spam, it can also make use of Spamassassin and its results. However, since Amavisd starts Spamassassin itself, it is sometimes hard to debug when something is not working.

For example in a recent case I saw that the Bayes database was not used when checking for spam characteristics, although the database was properly trained with ham and spam.

Thus first I checked Spamassassin itself:

$ su -s /bin/bash mailuser -c "spamassassin -D -t < ExampleSpam.eml 2>&1"  | tee sa.out

That worked well, the Bayes database was queried, results were shown.

Next, I added $sa_debug = '1,all'; to the Amavisd configuration and ran Amavisd in debug mode:

$ amavisd -c /etc/amavisd/amavisd.conf debug

And that showed the problem: one of the Bayes files had wrong permissions. After fixing those, the filter ran properly again.

[Howto] Solaris 11 on KVM

Recently I had to test a few things on Solaris 11 and wondered how well it works virtualized with KVM. It does – with a few tweaks.

Preface

Testing various different versions of operating systems is easy these days thanks to virtualization. However, I’m mainly used to Linux variants and hardly ever install any other kind of UNIX based OS. Thus I was curious if an installation of Solaris 11 on KVM / libvirt works.

For the test I actually used virt-manager since it does provide neat defaults during the VM setup. But the same comments and lessons learned are true for the command line tool as well.

Setting up the VM

virt-manager usually does not provide Solaris as an operating system type by default in the VM setup dialog. You first have to click on “OS Type”, “Show all OS options” as shown here:
[Screenshot: virt-manager Solaris guest picker]

Note that a Solaris 11 guest should have at least 2 GB of RAM, otherwise the installation and also booting might take very long or run into their very own problems.
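
For those preferring the command line over virt-manager, a roughly equivalent virt-install call might look like this – the ISO path, VM name and disk size are assumptions:

$ sudo virt-install --name solaris11 --ram 2048 --vcpus 2 \
    --cdrom /var/lib/libvirt/images/sol-11-text-x86.iso \
    --disk size=20 --os-variant solaris11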

The installation runs through – although quite some errors clutter the screen (see below).

Errors and problems

As soon as the machine is started several error messages are shown:

WARNING: /pci@0,0/pci1af4,1100@6,1 (uhci1): No SOF interrupts have been received, this USB UHCI host controller is unusable
WARNING: /pci@0,0/pci1af4,1100@6,2 (uhci2): No SOF interrupts have been received, this USB UHCI host controller is unusable

This shows that something is wrong with the interrupts and thus with the “hardware” of the machine – or at least with the way the guest machine discovers the hardware.

Additionally, even if DHCP is configured, the machine is unable to obtain the networking configuration. A fixed IP address and gateway do not help here, either. The host system might even report that it provides DHCP data, but the guest system continues to request these:

Dez 23 11:11:05 liquidat dnsmasq-dhcp[13997]: DHCPDISCOVER(virbr0) 52:54:00:31:31:4b
Dez 23 11:11:05 liquidat dnsmasq-dhcp[13997]: DHCPOFFER(virbr0) 192.168.122.205 52:54:00:31:31:4b
Dez 23 11:11:09 liquidat dnsmasq-dhcp[13997]: DHCPDISCOVER(virbr0) 52:54:00:31:31:4b
Dez 23 11:11:09 liquidat dnsmasq-dhcp[13997]: DHCPOFFER(virbr0) 192.168.122.205 52:54:00:31:31:4b
Dez 23 11:11:17 liquidat dnsmasq-dhcp[13997]: DHCPDISCOVER(virbr0) 52:54:00:31:31:4b
Dez 23 11:11:17 liquidat dnsmasq-dhcp[13997]: DHCPOFFER(virbr0) 192.168.122.205 52:54:00:31:31:4b
...

Also, when the machine is shutting down and ready to be powered off, the CPU usage spikes to 100 %.

The solution: APIC

The solution for the “hardware” problems mentioned above and also for the networking trouble is to deactivate an APIC feature inside the VM: x2APIC, an extension of Intel’s programmable interrupt controller. Some more details about the problem can be found in the Red Hat Bugzilla entry #1040500.

To apply the fix, the virtual machine definition needs to be edited to disable the feature: the XML definition can be changed with the command sudo virsh edit, with the machine name as command line option; the change needs to be made in the cpu section as shown below. Make sure the VM is stopped before the changes are made.

$ sudo virsh edit krypton
...
  <cpu mode='custom' match='exact'>
    <model fallback='allow'>Broadwell</model>
    <feature policy='disable' name='x2apic'/>
  </cpu>
$ sudo virsh start krypton

After these changes Solaris does not report any interrupt problems anymore and DHCP works without flaws. Note however that the CPU still spikes at power off. If anyone knows a solution to that problem I would be happy to hear about it and add it to this post.