antiTree | posts and projects
posted on Jul 07, 2025
  • seccomp-diff extracts the real seccomp filters straight from a running container
  • Reverse engineering BPF taught me more about containers and syscalls than I expected
  • seccomp good, seccomp at scale hard

Ever wonder if the seccomp profile Docker or Kubernetes thinks it’s applying is the same one actually filtering syscalls inside your container? Sure, we all do, right? Well, I wrote seccomp-diff, a tool that digs into a live process using ptrace, extracts the seccomp BPF bytecode, and lets you compare it with the default seccomp profile applied in the cluster or with other running containers.

Problem that it solves: Foot Shooting

If you’re building a “sandbox” or “hardened container” or whatever you want to call a container with a restricted runtime, many security guides tell you to build your own custom seccomp profile from scratch. Setting seccomp profiles is undeniably a great way to harden your container in principle, but in practice, it loads up two guns to shoot at your foot.

One problem (among many) is that what your container actually applies may not match what you expect. Differences in kernel version, container runtime version, and capabilities can mean the applied filter misses some common issues or bypasses what you were expecting entirely. For example, there have been a bunch of security issues related to io_uring, but whether or not you block those syscalls is not easy to see in your seccomp profiles unless you go looking for those specific system calls.
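To make the io_uring example concrete, here is a minimal sketch of checking what a JSON profile says about those syscalls. It assumes the Docker/OCI-style layout (`defaultAction` plus a `syscalls` list of `names` and `action`); the helper name `syscall_actions` and the sample profile are made up for illustration. It ignores argument conditions and capability or architecture gating, and it only tells you what the JSON claims, not what the kernel is enforcing.

```python
import json

# The three io_uring entry points on Linux.
IO_URING_SYSCALLS = ("io_uring_setup", "io_uring_enter", "io_uring_register")


def syscall_actions(profile, names):
    """Return {name: action}, falling back to the profile's defaultAction.

    Simplification: if a syscall appears in several rules, the last rule
    listed wins here. Argument filters and includes/excludes are ignored.
    """
    actions = {name: profile["defaultAction"] for name in names}
    for rule in profile.get("syscalls", []):
        for name in rule.get("names", []):
            if name in actions:
                actions[name] = rule["action"]
    return actions


# Hypothetical profile: default deny, with a short allow list.
example_profile = json.loads("""
{
  "defaultAction": "SCMP_ACT_ERRNO",
  "syscalls": [
    {"names": ["read", "write", "io_uring_setup"], "action": "SCMP_ACT_ALLOW"}
  ]
}
""")

result = syscall_actions(example_profile, IO_URING_SYSCALLS)
for name, action in result.items():
    print(f"{name}: {action}")
```

In this made-up profile, `io_uring_setup` is quietly allowed inside a longer allow list, which is exactly the kind of thing that is easy to miss when reading the JSON by eye.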

This is why seccomp-diff goes straight to the source: it inspects the process itself, gives you a list of all the system calls that are allowed, and provides some background information about each one. The output is the ground truth and lets you be accountable for the system calls that you’re allowing.
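For a feel of what "inspecting the process itself" involves, here is a rough sketch of pulling a filter out of a process with the kernel's `PTRACE_SECCOMP_GET_FILTER` request via ctypes. This is not seccomp-diff's actual code; the function names are my own, and it assumes Linux, root (CAP_SYS_ADMIN), a tracer that is not itself running under seccomp, and a target that is a normal process rather than an individual thread.

```python
import ctypes
import os
import struct

# ptrace request numbers on Linux.
PTRACE_ATTACH = 16
PTRACE_DETACH = 17
PTRACE_SECCOMP_GET_FILTER = 0x420C

libc = ctypes.CDLL(None, use_errno=True)
libc.ptrace.argtypes = [ctypes.c_long, ctypes.c_int,
                        ctypes.c_void_p, ctypes.c_void_p]
libc.ptrace.restype = ctypes.c_long


def _ptrace(request, pid, addr, data):
    """Call ptrace(2) and raise OSError on failure."""
    ctypes.set_errno(0)
    result = libc.ptrace(request, pid, addr, data)
    if result == -1:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return result


def dump_seccomp_filter(pid, index=0):
    """Return filter number `index` of `pid` as (code, jt, jf, k) tuples.

    Index 0 is the most recently installed filter. The kernel reports
    ENOENT once the index runs past the last filter.
    """
    _ptrace(PTRACE_ATTACH, pid, None, None)
    try:
        os.waitpid(pid, 0)  # wait for the tracee to stop
        # With a NULL buffer the call only returns the instruction count.
        count = _ptrace(PTRACE_SECCOMP_GET_FILTER, pid, index, None)
        buf = ctypes.create_string_buffer(count * 8)  # 8 bytes per sock_filter
        _ptrace(PTRACE_SECCOMP_GET_FILTER, pid, index, buf)
        # struct sock_filter: u16 code, u8 jt, u8 jf, u32 k
        return list(struct.iter_unpack("=HBBI", buf.raw))
    finally:
        _ptrace(PTRACE_DETACH, pid, None, None)
```

Calling `dump_seccomp_filter(pid)` in a loop with increasing `index` until it raises would walk every filter stacked on the process.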

Features in a Nutshell

  • Extract all the seccomp profiles applied across all containers in your cluster
  • Disassemble the bytecode into readable rules
  • Diff two containers to spot syscall differences at a glance
  • Handy web UI and CLI tools for ops teams and security researchers
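The diffing idea boils down to comparing two maps of syscall name to action. A minimal sketch, assuming each container's filter has already been reduced to such a map (the function name, the default action, and the sample data are all hypothetical, not taken from the tool):

```python
def diff_profiles(left, right, default="SCMP_ACT_ERRNO"):
    """Return (syscall, left_action, right_action) rows where the two differ.

    `left` and `right` map syscall names to actions. A syscall missing
    from a map is assumed to get `default`.
    """
    rows = []
    for name in sorted(set(left) | set(right)):
        left_action = left.get(name, default)
        right_action = right.get(name, default)
        if left_action != right_action:
            rows.append((name, left_action, right_action))
    return rows


# Hypothetical data for two containers.
container_a = {"read": "SCMP_ACT_ALLOW", "io_uring_setup": "SCMP_ACT_ALLOW"}
container_b = {"read": "SCMP_ACT_ALLOW", "ptrace": "SCMP_ACT_LOG"}

changes = diff_profiles(container_a, container_b)
for name, a, b in changes:
    print(f"{name}: {a} -> {b}")
```

Syscalls that behave the same in both containers drop out, so only the differences are left to review.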

I’ve also thrown in a seccomp-dump tool for those who don’t care about containers:

 sudo python seccomp_dump.py --dump 436762
l0000: 20 00 00 00000004        A = [4](ARCH)
l0001: 15 00 04 c000003e        IF ARCH != X86_64: 6(l0006)
l0002: 20 00 00 00000000        A = [0](SYSCALL)
l0003: 35 00 01 40000000        jlt #0x40000000, l5
l0004: 15 00 01 ffffffff        IF SYSCALL != 0xffffffff: KILL(l0006)
l0005: 06 00 00 7ffc0000        RETURN LOG
l0006: 06 00 00 00000000        RETURN KILL
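If you want to read a dump like this by hand, each row is one classic BPF instruction: an opcode, a jump-if-true offset, a jump-if-false offset, and a 32-bit constant. Below is a small sketch of a decoder that handles only the opcodes appearing in this dump plus a couple of related jumps. It is my own illustration, not the code inside seccomp_dump.py, and its output format differs from the tool's.

```python
# Conditional jump opcodes (BPF_JMP | op | BPF_K) and their comparison.
JUMPS = {0x15: "==", 0x25: ">", 0x35: ">=", 0x45: "&"}
# Offsets into struct seccomp_data that the load instruction reads.
FIELDS = {0: "syscall", 4: "arch"}
# seccomp return actions, keyed by the top 16 bits of the return value.
RET_ACTIONS = {
    0x80000000: "KILL_PROCESS", 0x00000000: "KILL_THREAD",
    0x00030000: "TRAP", 0x00050000: "ERRNO", 0x7FC00000: "USER_NOTIF",
    0x7FF00000: "TRACE", 0x7FFC0000: "LOG", 0x7FFF0000: "ALLOW",
}


def disassemble(program):
    """Turn (code, jt, jf, k) tuples into readable lines."""
    lines = []
    for pc, (code, jt, jf, k) in enumerate(program):
        if code == 0x20:  # BPF_LD | BPF_W | BPF_ABS
            field = FIELDS.get(k, "data[%d]" % k)
            text = f"A = {field}"
        elif code in JUMPS:
            # Jump offsets are relative to the next instruction.
            text = (f"if A {JUMPS[code]} {k:#x} "
                    f"goto l{pc + 1 + jt:04d} else l{pc + 1 + jf:04d}")
        elif code == 0x06:  # BPF_RET | BPF_K
            text = "return " + RET_ACTIONS.get(k & 0xFFFF0000, "unknown")
        else:
            text = f"unknown opcode {code:#04x}"
        lines.append(f"l{pc:04d}: {text}")
    return lines


# The seven instructions from the dump above.
program = [
    (0x20, 0, 0, 0x00000004),
    (0x15, 0, 4, 0xC000003E),
    (0x20, 0, 0, 0x00000000),
    (0x35, 0, 1, 0x40000000),
    (0x15, 0, 1, 0xFFFFFFFF),
    (0x06, 0, 0, 0x7FFC0000),
    (0x06, 0, 0, 0x00000000),
]

lines = disassemble(program)
print("\n".join(lines))
```

Read this way, the filter kills anything that is not x86_64 (0xc000003e), logs syscall numbers below 0x40000000 (the x32 ABI bit), and kills the rest unless the number is 0xffffffff.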

Lessons Learned

This tool was built for a talk at the last ShmooCon in 2025. It was a huge effort to build, and at least initially no LLMs were used to help me out, so I learned a ton about decoding seccomp’s BPF bytecode and how containers really interact with the kernel. It’s one thing to say I know how a container is created; it’s another to watch the syscalls go from containerd, to the shim, to runc, and eventually to your entrypoint.

If you’ve read this far, let me point out something ironic as a reward. I am starting with the premise that seccomp in JSON is hard to manage… but what format do you think I return to the web interface to populate the visual diff? That’s right:

This is a tool that bypasses seccomp JSON to extract the seccomp bytecode directly from the process, and then returns a seccomp JSON file. The irony is not lost.

I’m hoping by Summercamp this year I’ll be ready to wrap all of this up into some research around teams that are building custom seccomp profiles.