Containers are meant to keep processes contained. But there are ways to gather information about the host – like the actual execution environment you are running in.
Containers are pretty good at keeping everyone and everything inside their boundaries – thanks to SELinux, namespaces and so on. But they are not perfect. A recent Azure security flaw made me aware of a nice trick using the /proc/ file system to figure out which container runtime a container is running on.
The idea is that the started container inherits some /proc/ entries – among them the entry /proc/self/. If we are able to load a malicious container, we can use this knowledge to execute the host's container runtime binary and thereby get information about the runtime.
As an example, let’s take a Fedora container image with podman (and all its library dependencies) installed. We can run it and check the version of crun inside:
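(The original listing is not reproduced here; the gist is the following sketch, where the image name is a placeholder for a Fedora image with podman and its libraries installed. The container command is /proc/self/exe, which at the moment of execution still points to the runtime binary on the host.)

❯ podman run --rm fedora-with-podman /proc/self/exe --version

In this setup the version printed is the one of the runtime on the host, not of anything shipped inside the image – which is exactly the information we are after.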
With this nice little feature we now have insight into the version used – and as shown in the detailed write-up of the Azure security problem mentioned above, this insight can be crucial to identify and use the right CVEs to start lateral movement.
Of course this requires us to be able to load containers of our choice, and to have the right libraries inside the container. This example is simplified because I knew about the target system and there was a container available with everything I needed. For a more realistic attack container image, check out whoc.
Cilium is a networking plugin for Kubernetes based on eBPF. If you want to give it a try, Minikube is a good option to get started.
Background
I just started with Isovalent – and since I am very much a beginner regarding everything related to Kubernetes I decided to get some hands-on experience with the technology I am going to work with for the foreseeable future.
Isovalent’s offering is an Enterprise version of Cilium which basically manages and secures connections between containers and adds observability to them. It all runs on eBPF and thus is pretty performant. eBPF can run sandboxed programs in Linux kernel space without the need to recompile the kernel; a tiny bit like a “Kernel VM”. I always wanted to get my hands dirty with eBPF anyway, and Cilium is a very good way to approach it. But where to start? The answer is: with a small Kubernetes setup based on Minikube, a tiny Kubernetes distribution for testing and fooling around which leaves your main system almost unchanged.
Preparing the environment
Minikube runs in a tightly confined environment of its own so as not to disturb your other systems. This abstraction is done via containers or VMs realized via so-called “drivers”. Drivers are available for Docker, VMware, KVM, Podman and others. I decided to go with the KVM driver, so the virtualization bits need to be installed first:
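(The commands are not preserved in this copy; on Fedora the setup roughly comes down to the following.)

❯ sudo dnf install @virtualization
❯ sudo systemctl enable --now libvirtd
❯ sudo usermod --append --groups libvirt (whoami)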
Note that the last of the above commands only works in Nushell and has to be slightly adjusted for Bash or Zsh.
Next we have to install Minikube itself:
❯ curl -LO https://storage.googleapis.com/minikube/releases/latest/minikube-latest.x86_64.rpm
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 15.1M 100 15.1M 0 0 8304k 0 0:00:01 0:00:01 --:--:-- 8300k
❯ sudo rpm -Uvh minikube-latest.x86_64.rpm
Verifying... ################################# [100%]
Preparing... ################################# [100%]
Updating / installing...
1:minikube-1.22.0-0 ################################# [100%]
Also, to install and manage Cilium easily it makes sense to use the Cilium CLI. Unfortunately the CLI is currently not available as an RPM package for Fedora, so we have to download the archive and extract the binary to /usr/local/bin:
❯ curl -L --remote-name-all https://github.com/cilium/cilium-cli/releases/latest/download/cilium-linux-amd64.tar.gz{,.sha256sum}
[...]
❯ sha256sum --check cilium-linux-amd64.tar.gz.sha256sum
cilium-linux-amd64.tar.gz: OK
❯ sudo tar xzvfC cilium-linux-amd64.tar.gz /usr/local/bin
Starting Minikube with CNI
We now need to start up our Kubernetes cluster, and we need to do it in a way that lets us install and use Cilium in it. So we set the network plugin to CNI:
❯ minikube start --network-plugin=cni
😄 minikube v1.22.0 on Fedora 34
✨ Automatically selected the kvm2 driver. Other choices: podman, ssh
💾 Downloading driver docker-machine-driver-kvm2:
> docker-machine-driver-kvm2....: 65 B / 65 B [----------] 100.00% ? p/s 0s
> docker-machine-driver-kvm2: 11.47 MiB / 11.47 MiB 100.00% 12.50 MiB p/s
❗ With --network-plugin=cni, you will need to provide your own CNI. See --cni flag as a user-friendly alternative
💿 Downloading VM boot image ...
> minikube-v1.22.0.iso.sha256: 65 B / 65 B [-------------] 100.00% ? p/s 0s
> minikube-v1.22.0.iso: 242.95 MiB / 242.95 MiB [ 100.00% 20.05 MiB p/s 12s
👍 Starting control plane node minikube in cluster minikube
🔥 Creating kvm2 VM (CPUs=2, Memory=6000MB, Disk=20000MB) ...
🐳 Preparing Kubernetes v1.21.2 on Docker 20.10.6 ...
▪ Generating certificates and keys ...
▪ Booting up control plane ...
▪ Configuring RBAC rules ...
🔎 Verifying Kubernetes components...
▪ Using image gcr.io/k8s-minikube/storage-provisioner:v5
🌟 Enabled addons: storage-provisioner, default-storageclass
🏄 Done! kubectl is now configured to use "minikube" cluster and "default" namespace by default
Installing Cilium into Kubernetes
Since the Cilium CLI is already installed, it is fairly easy to install Cilium into the cluster itself. The installation is done into the current kubectl context, so make sure you are running in the right context, for example by checking kubectl get nodes. Afterwards, fire up the installation:
❯ cilium install
🔮 Auto-detected Kubernetes kind: minikube
✨ Running "minikube" validation checks
✅ Detected minikube version "1.22.0"
ℹ️ using Cilium version "v1.10.2"
🔮 Auto-detected cluster name: minikube
🔮 Auto-detected IPAM mode: cluster-pool
🔮 Auto-detected datapath mode: tunnel
🔮 Custom datapath mode: tunnel
🔑 Generating CA...
2021/07/13 14:09:33 [INFO] generate received request
2021/07/13 14:09:33 [INFO] received CSR
2021/07/13 14:09:33 [INFO] generating key: ecdsa-256
2021/07/13 14:09:33 [INFO] encoded CSR
2021/07/13 14:09:33 [INFO] signed certificate with serial number 122640105911298337607907666763746132599853501126
🔑 Generating certificates for Hubble...
2021/07/13 14:09:33 [INFO] generate received request
2021/07/13 14:09:33 [INFO] received CSR
2021/07/13 14:09:33 [INFO] generating key: ecdsa-256
2021/07/13 14:09:33 [INFO] encoded CSR
2021/07/13 14:09:33 [INFO] signed certificate with serial number 459020519400202498147292503280351877404424824247
🚀 Creating Service accounts...
🚀 Creating Cluster roles...
🚀 Creating ConfigMap...
🚀 Creating Agent DaemonSet...
🚀 Creating Operator Deployment...
⌛ Waiting for Cilium to be installed...
⌛ Waiting for Cilium to become ready before restarting unmanaged pods...
♻️ Restarting unmanaged pods...
♻️ Restarted unmanaged pod kube-system/coredns-558bd4d5db-8s4f6
♻️ Restarted unmanaged pod kubernetes-dashboard/dashboard-metrics-scraper-7976b667d4-ctq4p
♻️ Restarted unmanaged pod kubernetes-dashboard/kubernetes-dashboard-6fcdf4f6d-5wkbx
✅ Cilium was successfully installed! Run 'cilium status' to view installation health
The installation went through flawlessly. But does it really work? As mentioned in the last line of the above listing, we can check the status of Cilium easily:
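(The status listing itself is omitted here; the command is as simple as it gets and reports whether the Cilium agent, the operator and the managed pods are healthy.)

❯ cilium status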
We can even get one step further and check the connectivity of the cluster – after all, Cilium is all about proper networking:
❯ cilium connectivity test
ℹ️ Single-node environment detected, enabling single-node connectivity test
ℹ️ Monitor aggregation detected, will skip some flow validation steps
[...]
..
✅ All 11 tests (76 actions) successful, 0 tests skipped, 0 scenarios skipped.
As you can see, Cilium creates a set of pods and a service in a dedicated namespace and runs tests on them afterwards.
Interacting with the Cilium agent
Let’s have a first look at our installed Cilium environment by running a few commands on the local Cilium agent. First we have to figure out the name of the actual Cilium pod:
❯ minikube kubectl -- -n kube-system get pods -l k8s-app=cilium
NAME READY STATUS RESTARTS AGE
cilium-8hx2v 1/1 Running 0 35m
With the name of the pod we can now reach into the pod and execute the Cilium command right inside, for example querying the list of endpoints:
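(The listing is too wide to reproduce here; the command follows the same exec pattern used throughout this post.)

❯ minikube kubectl -- -n kube-system exec cilium-8hx2v -- cilium endpoint list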
This list is long, detailed and only really makes sense on a wide monitor. But it already tells us a lot about the current enforcement of ingress and egress policies (here they are not enforced yet).
But there is more: since Cilium is eBPF based, we can go one layer deeper and, for example, look at the policy-related eBPF maps:
❯ minikube kubectl -- -n kube-system exec cilium-8hx2v -- cilium bpf policy get --all
Defaulted container "cilium-agent" out of: cilium-agent, ebpf-mount (init), clean-cilium-state (init)
/sys/fs/bpf/tc/globals/cilium_policy_00208:
POLICY DIRECTION LABELS (source:key[=value]) PORT/PROTO PROXY PORT BYTES PACKETS
Allow Ingress reserved:unknown ANY NONE 16959 183
Allow Ingress reserved:host ANY NONE 1098509 4452
Allow Egress reserved:unknown ANY NONE 393706 4204
[...]
Note that the policy number is related to the endpoint ID in the Cilium endpoint list above.
We now have a running Cilium setup which can be used to run tests and examples!
Next: write and enforce policies, add observability
Doing a policy enforcement test goes beyond the scope of this blog post – but it is certainly worth a look in the future. Also, with all the data already shown above, it makes sense to do a deep dive into the topic of observation in the future.
The same is true for observability: if you wonder how deep the rabbit hole really is, there is Hubble, which provides serious observability into the Kubernetes network, services and security, comes with a UI and can be quickly installed since it is tightly integrated with Cilium.
And if you have stories to share around eBPF, Cilium and similar topics I am finally getting an idea of what you are talking about. 😉
In my new position I will be a technical marketing manager and thus working on technical content, messaging and enablement. With Cilium Enterprise, Isovalent offers an eBPF-based solution for Kubernetes networking, observability, and security – and since I am rather new to Kubernetes, I expect a steep learning curve.
I am looking forward to the challenges ahead of me, and will drop a blog post about it once in a while =)
Last November we introduced Ansible security automation as our answer to the lack of integration across the IT security industry. Let’s have a closer look at one of the scenarios where Ansible can facilitate typical operational challenges of security practitioners.
A big portion of security practitioners’ daily activity is dedicated to investigative tasks. Enrichment is one of those tasks, and it can be both repetitive and time-consuming, making it a perfect candidate for automation. Streamlining these processes can free up analysts to focus on more strategic tasks, accelerate the response in time-sensitive situations and reduce human errors. However, in many large organizations the multiple security solutions involved in these activities are not integrated with each other. Additionally, different teams may be in charge of different aspects of IT security, sometimes with no processes in common.
That often leads to manual work and interaction between people of different teams, which can be error-prone and, above all, slow. So when something suspicious happens and further attention is needed, security teams spend a lot of valuable time operating many different security solutions and coordinating work with other teams, instead of focusing on the suspicious activity directly.
In this blog post we take a closer look at how Ansible can help to overcome these challenges and support investigation enrichment activities. In the following example we’ll see how Ansible can be used to enable programmatic access to information like logs coming from technologies that may not be integrated into a SIEM. As examples we’ll use an enterprise firewall and an intrusion detection and prevention system (IDPS).
Simple Demo Setup
To showcase the aforementioned scenario we created a simplified, very basic demo setup. This setup includes two security solutions providing information about suspicious traffic, as well as a SIEM: we use a Check Point Next Generation Firewall (NGFW) and a Snort IDPS as the security solutions providing information. The SIEM to gather and analyze those data is IBM QRadar.
Also, from a machine called “attacker” we will simulate a potential attack pattern on the target machine on which the IDPS is running.
This is just a basic demo setup, a real world setup of an Ansible security automation integration would look different, and can feature other vendors and technologies.
Logs: crucial, but distributed
Now imagine you are a security analyst in an enterprise. You were just informed of an anomaly in an application, showing suspicious log activities. For example, we have a little demo where we curl a certain endpoint of the web server which we conveniently called “web_attack_simulation”:
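(The listing is missing in this copy; in essence the simulated attack is just a request against that endpoint, with a placeholder standing in for the demo web server's address.)

❯ curl http://<demo-web-server>/web_attack_simulation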
As a security analyst you know that anomalies can be the sign of a potential threat. You have to determine if this is a false positive that can simply be dismissed, or an actual threat which requires a series of remediation activities to be stopped. Thus you need to collect more data points – for example from the firewall and the IDPS. Going through the logs of the firewall and IDPS manually takes a lot of time. In large organizations, the security analyst might not even have the necessary access rights and needs to contact the teams responsible for the enterprise firewall and the IDPS respectively, asking them to manually go through the respective logs, check for anomalies on their own and then reply with the results. This could imply a phone call, a ticket, long explanations, necessary exports or other actions consuming valuable time.
It is common in large organizations to centralize event management in a SIEM and use it as the primary dashboard for investigations. In our demo example the SIEM is QRadar, but the steps shown here are valid for any SIEM. To properly analyze security-related events multiple steps are necessary: the security technologies in question – here the firewall and the IDPS – need to be configured to stream their logs to the SIEM in the first place. But the SIEM also needs to be configured to help ensure that those logs are parsed in the correct way and meaningful events are generated. Doing this manually is time-intensive and requires in-depth domain knowledge. Additionally it might require privileges a security analyst does not have.
But Ansible allows security organizations to create pre-approved automation workflows in the form of playbooks. Those can even be maintained centrally and shared across different teams to enable security workflows at the press of a button.
Why don’t we just add those logs to QRadar permanently? Because this could create alert fatigue, where too much data in the system generates too many events, and analysts might miss the crucial ones. Additionally, sending all logs from all systems easily consumes a huge amount of cloud resources and network bandwidth.
So let’s write such a playbook to first configure the log sources to send their logs to the SIEM. We start the playbook with Snort and configure it to send all logs to the IP address of the SIEM instance:
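(The playbook did not survive in this copy; the following is a sketch based on the purpose of the ids_config role, and the exact role variable names are assumptions. The SIEM address is the one reused in the Check Point play below.)

---
- name: Configure Snort to send logs to the SIEM
  hosts: snort
  become: true
  vars:
    ids_provider: snort
    # assumption: these ids_config variables switch on remote logging to the SIEM
    ids_config_provider: snort
    ids_config_remote_log: true
    ids_config_remote_log_destination: "192.168.3.4"
  tasks:
    - name: Import the ids_config role
      include_role:
        name: ansible_security.ids_config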
Note that here we only have one task, which imports an existing role. Roles are an essential part of Ansible, and help in structuring your automation content. Roles usually encapsulate the tasks and other data necessary for a clearly defined purpose. In the case of the playbook shown above, we use the role ids_config, which manages the configuration of various IDPS. It is provided as an example by the ansible-security team. This role, like the others mentioned in this blog post, is provided as guidance to help customers that may not be accustomed to Ansible become productive faster. They are not necessarily meant as a best practice or a reference implementation.
Using this role we only have to provide a few parameters; the domain knowledge of how to configure Snort itself is hidden away. Next, we do the very same thing with the Check Point firewall. Again an existing role is re-used, log_manager:
- name: Configure Check Point to send logs to QRadar
  hosts: checkpoint
  tasks:
    - include_role:
        name: ansible_security.log_manager
        tasks_from: forward_logs_to_syslog
      vars:
        syslog_server: "192.168.3.4"
        checkpoint_server_name: "gw-2d3c54"
        firewall_provider: checkpoint
With these two snippets we are already able to reach out to two security solutions in an automated way and reconfigure them to send their logs to a central SIEM.
We can also automatically configure the SIEM to accept those logs and sort them into corresponding streams in QRadar:
- name: Add Snort log source to QRadar
  hosts: qradar
  collections:
    - ibm.qradar
  tasks:
    - name: Add snort remote logging to QRadar
      qradar_log_source_management:
        name: "Snort rsyslog source - 192.168.14.15"
        type_name: "Snort Open Source IDS"
        state: present
        description: "Snort rsyslog source"
        identifier: "ip-192-168-14-15"

- name: Add Check Point log source to QRadar
  hosts: qradar
  collections:
    - ibm.qradar
  tasks:
    - name: Add Check Point remote logging to QRadar
      qradar_log_source_management:
        name: "Check Point source - 192.168.23.24"
        type_name: "Check Point FireWall-1"
        state: present
        description: "Check Point log source"
        identifier: "192.168.23.24"
Here we do use Ansible Content Collections: the new method of distributing, maintaining and consuming automation content. Collections can contain roles, but also modules and other code necessary to enable automation of certain environments. In our case the collection for example contains a role, but also the necessary modules and connection plugins to interact with QRadar.
Without any further intervention by the security analyst, Check Point logs start to appear in the QRadar log overview. Note that so far no logs are sent from Snort to QRadar: Snort does not know yet that this traffic is noteworthy! We will come to this in a few moments.
Remember, taking the perspective of a security analyst: now we have more data at our disposal and a better understanding of what could be the cause of the anomaly in the application behaviour. The logs from the firewall show who is sending traffic to whom. But this is still not enough data to fully qualify what is going on.
Fine-tuning the investigation
Given the data at your disposal you decide to implement a custom signature on the IDPS to get alert logs if a specific pattern is detected.
In a typical situation, implementing a new rule would require another interaction with the security operators in charge of Snort, who would likely have to manually configure multiple instances. But luckily we can again use an Ansible Playbook to achieve the same goal without the need for time-consuming manual steps or interactions with other team members.
There is also the option to have a set of playbooks pre-created for customer-specific situations. Since the language of Ansible is YAML, even team members with little Ansible experience can contribute to the playbooks, making it possible to have agreed-upon playbooks ready to be used by the analysts.
Again we reuse a role, ids_rule. Note that this time some understanding of Snort rules is required to make the playbook work. Still, the actual knowledge of how to manage Snort as a service across various target systems is shielded away by the role.
---
- name: Add Snort rule
  hosts: snort
  become: yes
  vars:
    ids_provider: snort
  tasks:
    - name: Add snort web attack rule
      include_role:
        name: "ansible_security.ids_rule"
      vars:
        ids_rule: 'alert tcp any any -> any any (msg:"Attempted Web Attack"; uricontent:"/web_attack_simulation"; classtype:web-application-attack; sid:99000020; priority:1; rev:1;)'
        ids_rules_file: '/etc/snort/rules/local.rules'
        ids_rule_state: present
Finish the offense
Moments after the playbook is executed, we can check in QRadar if we see alerts. And indeed, in our demo setup this is the case:
With this information on hand, we can now finally check all offenses of this type, and verify that they are all coming only from one single host – here the attacker.
From here we can move on with the investigation. For our demo we assume that the behavior is intentional, and thus close the offense as false positive.
Rollback!
Last but not least, there is one step which is often overlooked, but is crucial: rolling back all the changes! After all, as discussed earlier, sending all logs into the SIEM all the time is resource-intensive.
With Ansible the rollback is quite easy: basically the playbooks from above can be reused; they just need to be slightly altered to not create the log streams, but to remove them again, as sketched below. That way, the entire process can be fully automated and at the same time made as resource-friendly as possible.
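For the QRadar side, for example, the log source playbook from above stays the same and only its state parameter flips from present to absent:

- name: Remove Snort log source from QRadar
  hosts: qradar
  collections:
    - ibm.qradar
  tasks:
    - name: Remove snort remote logging from QRadar
      qradar_log_source_management:
        name: "Snort rsyslog source - 192.168.14.15"
        type_name: "Snort Open Source IDS"
        state: absent          # the only change compared to the playbook above
        description: "Snort rsyslog source"
        identifier: "ip-192-168-14-15"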
Takeaways and where to go next
The job of a CISO and their team is often difficult even if they have all the necessary tools in place, because the tools don’t integrate with each other. When there is a security threat, an analyst has to perform an investigation, chasing all relevant pieces of information across the entire infrastructure, consuming valuable time to understand what’s going on and to ultimately perform any sort of remediation.
Ansible security automation is designed to help enable integration and interoperability of security technologies to support security analysts’ ability to investigate and remediate security incidents faster.
As next steps, there are plenty of resources available to follow up on the topic.
Traefik is a great reverse proxy solution, and a perfect tool to direct traffic in container environments. However, to do that, it needs access to docker – and that is very dangerous and must be tightly secured!
The problem: access to the docker socket
Containers offer countless opportunities to improve the deployment and management of services. However, having multiple containers on one system, often re-deploying them on the fly, requires a dynamic way of routing traffic to them. Additionally, there might be reasons to have a front end reverse proxy to sort the traffic properly anyway.
In comes traefik – “the cloud native edge router”. Among many supported backends it knows how to listen to docker and create dynamic routes on the fly when new containers come up.
To do so traefik needs access to the docker socket. Many people decide to just provide the socket as a volume to traefik. This usually does not work because SELinux prevents it, and for good reason. The apparent workaround for many is to run traefik in a privileged container. But that is a really bad idea:
Docker currently does not have any Authorization controls. If you can talk to the docker socket or if docker is listening on a network port and you can talk to it, you are allowed to execute all docker commands. […] At which point you, or any user that has these permissions, have total control on your system.
But there are ways to securely provide traefik the access it needs – without handing out too many permissions. One way is to provide limited access to the docker socket via TCP through another container which cannot be reached from the outside that easily.
What? This is a security-enhanced proxy for the Docker Socket. Why? Giving access to your Docker socket could mean giving root access to your host, or even to your whole swarm, but some services require hooking into that socket to react to events, etc. Using this proxy lets you block anything you consider those services should not do.
It is a container which connects to the docker socket and exposes the docker API in a secured and configurable way via TCP. At container startup it is configured with boolean environment variables that define which API sections access is granted to.
So basically you set up a docker proxy to support your proxy for docker containers. Well…
How to use it
The docker socket proxy is a container itself. Thus it needs to be launched as a privileged container with access to the docker socket. Also, it must not publish any ports to the outside. Instead it should run on a dedicated docker network shared with the traefik container. The Ansible code to launch the container that way is for example:
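(The original snippet is not preserved here; what follows is a sketch using the docker_container module. The container name, the network name and the restart policy are choices made for this example, the image is the docker-socket-proxy project quoted above.)

- name: Ensure the docker socket proxy is running
  docker_container:
    name: dockersocket
    image: tecnativa/docker-socket-proxy
    privileged: true
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    env:
      # grant access to the container-related parts of the API only; values must be strings
      CONTAINERS: "1"
    networks:
      # dedicated network shared only with the traefik container, no published ports
      - name: securenet
    restart_policy: unless-stopped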
Note the env section right in the middle: that is where the exported permissions are configured. CONTAINERS: 1 provides access to container-relevant information. There are also SERVICES: 1 and SWARM: 1 to manage access to docker services and swarm.
Traefik needs to have access to the same network. Also, the traefik configuration needs to point to the docker socket proxy container via TCP:
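(The snippet is missing in this copy; with traefik v2's YAML file format the relevant part of the static configuration would look roughly like this, with names matching the sketch above and 2375 being the port the socket proxy listens on.)

providers:
  docker:
    # talk to the socket proxy over the shared docker network instead of the socket itself
    endpoint: "tcp://dockersocket:2375"
    exposedByDefault: false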
This setup works surprisingly well, and it allows traefik to access the docker socket for the things it needs without exposing permissions that could be used to take over the system. At the same time, full access to the docker socket is restricted to a non-public container, which makes it harder for attackers to exploit.
If you have a simple container setup and use Ansible to start and stop the containers, I’ve written a role to get the above mentioned setup running.