Sometimes during a container or Kubernetes assessment, we get requested to review whether a runtime security tool that a client uses is sufficient for their threat model. This often means reviewing a custom seccomp-bpf profile, AppArmor config, or third party tool that tries to enforce the isolation between a container and the host. There’s two approaches we usually take during these gigs:
There are ways to harden your runtime like dropping capabilities when a container starts but when you have a goal of re-enforcing the isolation between the container and the host towards using a container more like a sandbox, the main option you have is syscall filtering/monitoring. That’s monitoring and blocking which syscalls are allowed to go from the container to the host.
If you look at some of the takes on container runtimes like Nabla, they’re taking a different approach to hardening but effectively doing the same thing – blocking syscalls from a container that could affect the host.
Falco recently became an open source project and is now officially supported by the CNCF which is great. I’ve tried not to gush too hard over Sysdig in the past but I’ve liked their tools.
Falco is a runtime security tool designed to leverage Sysdig’s events monitoring library to add a layer of security on top of your containers and your host/Nodes. It uses a combination of syscall and network filtering that matches against a rule list that you can define. Compared to seccomp-bpf or AppArmour, this can let you create a more expressive policy about what you want to allow or block in your container environment.
Here are some examples of the command the log generated:
/ # sudo cat /etc/shadow Warning Sensitive file opened for reading by non-trusted program (user=root program=cat command=cat /etc/shadow file=/etc/shadow parent=sudo gparent=zsh ggparent=<NA> gggparent=<NA> container_id=host image=<NA>)
Start a privileged container, mount the host volume, and attach to it
/ # docker run -it --privileged -v /:/host busybox:latest Notice A shell was spawned in a container with an attached terminal (user=root incomplete (id=b08eb99c4c6b) shell=sh parent=<NA> cmdline=sh terminal=34816 container_id=b08eb99c4c6b image=incomplete) Notice Privileged container started (user=<NA> command=container:b08eb99c4c6b eager_ride (id=b08eb99c4c6b) image=busybox:latest) docker run -it --privileged -v /:/host busybox:latest Notice Container with sensitive mount started (user=root command=container:ea83b093eb14 nifty_darwin (id=ea83b093eb14) image=busybox:latest mounts=/:/host::true:rslave)
The last command above is fun because you’re mounting the host volume inside the container and it’s privileged so there’s not much separation from the container to the host. Once you have a shell, running the following will effectively let you take over the host:
/ # chroot /host # cat /etc/shadow
I use this as an example because Falco’s default rules do not alert on you using
chroot and it might be worth
thinking about why. It’s already alerted you that a priv’d container is started,
and that there’s a sensitive host mount, so why keep bugging the user every time.
If you compare this writing a seccomp-bpf profile for Docker with the same effect,
it would be a pretty complex policy that detects
a container running as privileged, determines that it’s a sensitive volume mount,
and decides how to handle it.
In fact, the seccomp-bpf version of preventing this attack boils down to something like this:
"names": [ "chroot" ], "action": "SCMP_ACT_KILL", "args": , "comment": "",
And the default Docker seccomp profile does this (in a slightly different way than my example) so why doesn’t Falco?
The short answer I think is is, “Go for it”. Falco needs to be customized for your environment no matter what so go ahead and start dropping syscalls with a rule like:
- rule: drop_chroot_attempts desc: Custom rule to prevent the chroot syscall from being used condition: syscall.type = chroot and not proc.name in (docker, sysdig, dragent) output: "Yo, somebody's been chrooting. (user=%user.name command=%proc.cmdline container=%container.i d)" priority: WARNING
Now if we run the same commands as before, we can now get more context
/ # docker run -it --privileged -v /:/host busybox:latest Notice A shell was spawned in a container with an attached terminal (user=root incomplete (id=b08eb99c4c6b) shell=sh parent=<NA> cmdline=sh terminal=34816 container_id=b08eb99c4c6b image=incomplete) Notice Privileged container started (user=<NA> command=container:b08eb99c4c6b eager_ride (id=b08eb99c4c6b) image=busybox:latest) / # docker run -it --privileged -v /:/host busybox:latest Notice Container with sensitive mount started (user=root command=container:ea83b093eb14 nifty_darwin (id=ea83b093eb14) image=busybox:latest mounts=/:/host::true:rslave) // # chroot /host Warning Yo, somebody's been chrooting. (user=root command=chroot /host container=1003f9513b76) Warning Yo, somebody's been chrooting. (user=root command=chroot /host container=1003f9513b76)
(someone else can explain to me why there are 2 separate syscalls but that’s besides the point)
My point isn’t that the default policy not blocking the
chroot syscall is a security flaw,
it’s that Falco right now has a different use case than other
runtime security products or tools. Falco is a runtime security product AND a monitoring tool.
If you filter on a syscall in Falco, you’re prone to get a lot more alerts with very little
context. Compare it to seccomp – for seccomp-bpf, you’re blocking syscalls and maybe trying to stop container runtime 0-days.
For Falco, you’re watching for common activities and reacting to them.
One of the challenges that you get when you’re writing rules at a higher level than just
blocking syscalls, is there’s going to be more bypasses and edge cases you have to be concerned about.
When coming up with a mitigation for an attack like wanting to prevent a
compromising the host,
the question becomes “Should I block the chroot syscall or should
I block the scenario where a chroot could cause damage?” I think for Falco, the
expected practice is block the attributes of the activity (privileged mode, host mounts, TTY started)
and not just the syscalls so that your team can make more educated decisions about
the context of the issue.
I’ll go through a few examples that that can bypass Falco’s default rules:
Falco can detect when a privileged container is started as one might expect. It does this with a rule. You can see below for detecting priv’d containers it’s first a condition that references a macro which references a list. This could be interpreted as “If something starts a privileged container that’s no one of the ones from my list of approved images, alert.”
- list: falco_privileged_images items: [ docker.io/sysdig/agent, docker.io/sysdig/falco, docker.io/sysdig/sysdig, gcr.io/google_containers/kube-proxy, docker.io/calico/node, docker.io/rook/toolbox, docker.io/cloudnativelabs/kube-router, docker.io/mesosphere/mesos-slave, docker.io/docker/ucp-agent, sematext_images, k8s.gcr.io/kube-proxy ] - macro: falco_privileged_containers condition: (openshift_image or user_trusted_containers or container.image.repository in (trusted_images) or container.image.repository in (falco_privileged_images) or container.image.repository startswith istio/proxy_ or container.image.repository startswith quay.io/sysdig) - rule: Launch Privileged Container desc: Detect the initial process started in a privileged container. Exceptions are made for known trusted images. condition: > container_started and container and container.privileged=true and not falco_privileged_containers and not user_privileged_containers output: Privileged container started (user=%user.name command=%proc.cmdline %container.info image=%container.image.repository:%container.image.tag) priority: INFO tags: [container, cis, mitre_privilege_escalation, mitre_lateral_movement]
Since we can’t run our
busybox image as privileged anymore (or without someone finding us), lets figure out
a way around it by tagging it to match the “falco_privileged_containers” macro which allows the string “k8s.gcr.io/kube-proxy”
as an image name.
busybox as “k8s.gcr.io/kube-proxy”
docker tag busybox:latest k8s.gcr.io/kube-proxy
Start the container again
docker run -it --privileged -v /:/host k8s.gcr.io/kube-proxy
Success! This will not produce any alerts about you running a privileged container.
It will however create an alert that says
Notice Container with sensitive mount started (user=root command=container:ea83b093eb14 nifty_darwin (id=ea83b093eb14) image=busybox:latest mounts=/:/host::true:rslave)
Lets see if we can’t bypass the sensitive mounts alert then. Here’s the rules:
- list: falco_sensitive_mount_images items: [ docker.io/sysdig/agent, docker.io/sysdig/falco, docker.io/sysdig/sysdig, gcr.io/google_containers/hyperkube, gcr.io/google_containers/kube-proxy, docker.io/calico/node, docker.io/rook/toolbox, docker.io/cloudnativelabs/kube-router, docker.io/consul, docker.io/datadog/docker-dd-agent, docker.io/datadog/agent, docker.io/docker/ucp-agent, docker.io/gliderlabs/logspout, docker.io/netdata/netdata, docker.io/google/cadvisor, docker.io/prom/node-exporter ] - macro: falco_sensitive_mount_containers condition: (user_trusted_containers or container.image.repository in (trusted_images) or container.image.repository in (falco_sensitive_mount_images) or container.image.repository startswith quay.io/sysdig)
busybox as “quay.io/sysdig”
docker tag busybox:latest quay.io/sysdig
Start the container again
docker run -it --privileged -v /:/host quay.io/sysdig
Success! With the bonus that the string “quay.io/sysdig” allows us to run as both privileged and with a sensitive volume.
Now we’re getting into it.
- list: shell_binaries items: [ash, bash, csh, ksh, sh, tcsh, zsh, dash] - macro: shell_procs condition: proc.name in (shell_binaries) - rule: Terminal shell in container desc: A shell was used as the entrypoint/exec point into a container with an attached terminal. condition: > spawned_process and container and shell_procs and proc.tty != 0 and container_entrypoint output: > A shell was spawned in a container with an attached terminal (user=%user.name %container.info shell=%proc.name parent=%proc.pname cmdline=%proc.cmdline terminal=%proc.tty container_id=%container.id image=%container.image.repository) priority: NOTICE tags: [container, shell, mitre_execution]
If you walk up from the bottom you see that it’s basically saying “If it’s a spawned process in a container, and it’s one of the shells from my list, and it creates a TTY and it’s coming from runc/docker-runc then alert.”
Can we just rename our shell to from “bash” to “antish” and bypass it?
FROM ubuntu RUN cp /bin/bash /bin/antish
Build your Dockerfile and name stomp it again
docker build -t quay.io/sysdig .
Start the container again
docker run -it --privileged -v /:/host quay.io/sysdig
Yes! We’ve now bypassed the terminal alert, the sensitive volume alert, and the privileged container alert. If these rules were changed from just warning the user about the activity to actually blocking them, it’d be more interesting but I think the point is made.
To be clear, I’m NOT finding flaws or beating up on Falco. I’m demonstrating how Falco works, some of its intended uses, and to my pentesting friends, what normally goes into an assessment in an environment where Falco is enabled. In the real world, the environments are much more complex and so are the bypasses. We would normally assess these rules the same way if it was an Aqua, Twistlock, or Neuvector product – what do the rules do, and how can we evade them.
Some of these methods are going into a talk I’m hoping to get accepted that shares examples and testing methodology for NCC Group’s container practice.