PodSecurityPolicy (PSP) was one of the first reliable security controls introduced
by Kubernetes. It was an Admission Controller that simply checked whether or not the Pods
being deployed met the minimum level of security expected for that cluster. It was a critical
component in securing a cluster but… it will soon be removed… maybe in a few months.
You may already know this and there’s nothing new in this post if you do. But I’ve told customers, coworkers, my parents, Walmart Greeters… anyone who would listen, that PSPs are dying, and we should all get ahead of it. Mostly because I want them to be as shocked as I was when I first found out.
So here’s the full story and the current state as I understand it:
Some of the most common examples of situations where a PSP could help:
Originally, PSPs were the cornerstone for organizations running multi-tenant clusters: those that gave their developers permission to manage their own workloads and namespaces but still wanted to limit the risk of a Pod being compromised or of one tenant’s workload affecting another’s.
The PSP admission controller was fine, but it wasn’t without its problems.
What’s wrong with PSPs that they should be removed? There are a few weaknesses in the implementation and there are a few weaknesses in the Admission Controller model itself. Here’s a list I’ve been able to compile with the help of @Raesene and the documentation compiled by @TimAllClair via SIG-Auth.
PSPs are applied to the Pod and its creator. When a user creates a Pod, the user is the subject. When a user creates a Deployment, a controller creates the Pod, so the controller is the subject.
So to what subjects does a PSP apply? The user? The controller? It’s both and it’s even more difficult to determine than my example. It is (usually) back to the original rule: PSPs are applied to the Pod and its creator. But there are just so many edge cases where the creator is unclear.
I think the bigger annoyance is that future APIs will only expand this confusion, so it will be less and less obvious which subjects a PSP applies to. New APIs, custom controllers, combinations of different configurations – how do you get ahead of that? I’m not sure there’s a good answer in the current Kubernetes design.
As Tim Allclair succinctly writes: “[the] Dual model weakens security”.
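The dual model can be sketched in a few lines. What follows is a simplified Python simulation, not the real admission code: the `Grant` type, the subject names, and `policies_for_pod` are all hypothetical, and it assumes (as the PSP documentation describes) that a policy is usable when either the requesting user or the Pod’s service account is authorized to `use` it.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class Grant:
    """Hypothetical stand-in for an RBAC rule granting `use` on a PSP."""
    subject: str
    policy: str


def policies_for_pod(requester: str, service_account: str, grants: List[Grant]) -> Set[str]:
    """Return the policies usable by EITHER the requester or the Pod's service account."""
    subjects = {requester, service_account}
    return {g.policy for g in grants if g.subject in subjects}


grants = [
    Grant("user:alice", "restricted"),
    Grant("serviceaccount:team-a:default", "privileged"),
]

# Alice creates the Pod directly: her grant and the service account's both count.
direct = policies_for_pod("user:alice", "serviceaccount:team-a:default", grants)

# Alice creates a Deployment: a controller creates the Pod, so the requester changes.
via_controller = policies_for_pod(
    "serviceaccount:kube-system:replicaset-controller",  # illustrative name only
    "serviceaccount:team-a:default",
    grants,
)

print(sorted(direct))          # ['privileged', 'restricted']
print(sorted(via_controller))  # ['privileged']
```

In this toy setup the service account’s grant makes the `privileged` policy reachable on both paths, even though Alice herself was only granted `restricted`. That is one way the set of subjects ends up being harder to reason about than it first looks.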
What happens when there is a PSP admission controller and then you write another admission controller with contrasting opinions? The common example is something like the Istio sidecar injector, which can modify the PodSpec to add a container. If the dynamically injected container needs more privs than what your PSP wants, it could bypass the PSP.
I’ve had conversations with @ChaosDatumz and @nfFrenchie about the idea of a weaponized, malicious dynamic admission controller. I hope in the near future someone sees the value in building this, if just for the lolz during CTFs.
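To make the ordering problem concrete, here is a toy Python pipeline. It is only a sketch and makes no claim about how the API server actually orders its plugins: `passes_policy`, `inject_sidecar`, the image names, and the "no added capabilities" rule are all assumptions made up for illustration.

```python
import copy

ALLOWED_CAPABILITIES = set()  # toy policy: containers may not add any Linux capabilities


def passes_policy(pod: dict) -> bool:
    """Toy policy check: every container's added capabilities must be allowed."""
    for container in pod["spec"]["containers"]:
        added = container.get("securityContext", {}).get("capabilities", {}).get("add", [])
        if not set(added) <= ALLOWED_CAPABILITIES:
            return False
    return True


def inject_sidecar(pod: dict) -> dict:
    """Hypothetical mutating step that appends a sidecar needing NET_ADMIN."""
    mutated = copy.deepcopy(pod)
    mutated["spec"]["containers"].append({
        "name": "sidecar",
        "image": "example.invalid/sidecar",
        "securityContext": {"capabilities": {"add": ["NET_ADMIN"]}},
    })
    return mutated


pod = {"spec": {"containers": [{"name": "app", "image": "example.invalid/app"}]}}

# Ordering A: policy check first, mutation second. The check never sees the sidecar.
allowed_a = passes_policy(pod)
final_a = inject_sidecar(pod)

# Ordering B: mutation first, policy check second. The check sees the final Pod.
final_b = inject_sidecar(pod)
allowed_b = passes_policy(final_b)

print(allowed_a, passes_policy(final_a))  # True False
print(allowed_b)                          # False
```

Under ordering A the Pod is admitted and then changed into something the policy would have rejected. Under ordering B the policy blocks the injector’s own output, which is the other half of the "contrasting opinions" problem.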
All these controls are great for a Docker runtime but what about gVisor, or Windows, or new CRIs coming out? The PSP controller would have to expand to support different ones going forward.
You’ll notice that the PodSecurityPolicySpec has references to:
These are fine for Linux-based runtimes but have no impact on others. But that doesn’t mean running in one of those runtimes is inherently more or less secure, it’s just that you have inconsistent controls on them.
During the Kubernetes audit performed by Trail of Bits, one of the first findings noted was a bypass of the PSP hostPath restrictions by way of PersistentVolumeClaims. It says:
“As currently implemented, the PodSecurityPolicy is not granular enough to provide protections for PersistentVolumeClaim volumes. The hostPath volume supports the ability to specify allowed paths for a given Pod to mount. This restriction is not available for the PersistentVolumeClaim, and does not propagate to the hostPath PersistentVolume.” - TOB-K8S-038
This hasn’t been fixed because a PersistentVolumeClaim is technically not a Pod issue. This is slightly beating up on PSPs because, in fact, PSPs are doing their job: controlling the PodSpec and nothing else. PersistentVolumeClaims are out of scope for PSPs, and therefore the restriction is unenforceable.
There’s a whole slew of security controls you could want that aren’t only in a Pod. What users want is something that prevents a misconfiguration in any type of Kubernetes object that would cause a breakout. But what PSPs actually provide is a single measure of defense on the Pods.
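The gap described in the audit finding can be illustrated with a small Python sketch. It is a simplification under stated assumptions: `volumes_allowed` is a hypothetical check, the prefix match is cruder than real path matching, and the Pod, claim, and PersistentVolume objects are trimmed-down examples. Only the `volumes`, `hostPath`, and `persistentVolumeClaim` field names are taken from the Kubernetes API.

```python
from typing import List


def volumes_allowed(pod: dict, allowed_prefixes: List[str]) -> bool:
    """Toy Pod-level check: hostPath volumes must sit under an allowed prefix."""
    for volume in pod["spec"].get("volumes", []):
        host_path = volume.get("hostPath")
        if host_path is None:
            # PVCs and other volume types carry no path in the PodSpec,
            # so a Pod-only check has nothing to compare against.
            continue
        if not any(host_path["path"].startswith(prefix) for prefix in allowed_prefixes):
            return False
    return True


# A Pod that mounts a host directory directly.
direct_pod = {"spec": {"volumes": [{"name": "etc", "hostPath": {"path": "/etc"}}]}}

# A Pod that reaches the host through a claim instead.
pvc_pod = {"spec": {"volumes": [
    {"name": "data", "persistentVolumeClaim": {"claimName": "sneaky-claim"}},
]}}

# The PersistentVolume bound to that claim is a separate object the check never reads.
persistent_volume = {"spec": {"hostPath": {"path": "/"}}}

print(volumes_allowed(direct_pod, ["/data"]))  # False
print(volumes_allowed(pvc_pod, ["/data"]))     # True
```

The second Pod passes even though the volume behind its claim points at the host’s root, because the path lives on a different object than the one being validated.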
How do you test a PSP before it goes into production? The PSP admission controller fails closed: when you go to create a Pod and there is no PSP bound to it, the Pod just won’t start. This sounds great for security, but it means you can’t enable the PSP admission controller by default without also adding a default, unconfined PSP. (Many service providers do this, but it’s too much of a risk to say that all of Kubernetes should.) As a result, you’re not getting 100% test coverage without a lot of extra work and resources to handle the cases where a Pod doesn’t start because of an incorrect PSP.
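The fail-closed behavior, and why an unconfined default ends up being added, can be sketched as follows. This is a toy model, not the real controller: `admit`, the two policy functions, and the flat Pod dictionaries are hypothetical.

```python
from typing import Callable, Dict, Tuple

Policy = Callable[[dict], bool]


def admit(pod: dict, policies: Dict[str, Policy]) -> Tuple[bool, str]:
    """Fail closed: a Pod is admitted only if some policy explicitly accepts it."""
    for name, accepts in policies.items():
        if accepts(pod):
            return True, name
    return False, "no policy accepted this Pod"


def restricted(pod: dict) -> bool:
    """Toy policy that rejects privileged Pods."""
    return not pod.get("privileged", False)


def unconfined(pod: dict) -> bool:
    """Toy catch-all policy that accepts everything."""
    return True


plain = {"name": "web", "privileged": False}
risky = {"name": "debug", "privileged": True}

# Controller enabled with no policies at all: even a harmless Pod is rejected.
print(admit(plain, {}))

# Only a restrictive policy: the risky Pod is rejected.
print(admit(risky, {"restricted": restricted}))

# Add an unconfined default so nothing breaks: now the risky Pod is admitted too.
print(admit(risky, {"restricted": restricted, "unconfined": unconfined}))
```

The last call is the trade-off in miniature: the default that keeps workloads starting is the same one that lets the risky Pod through.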
We could build a PSP validator for CI/CDs, but this goes back to how it’s difficult to get full test coverage – how can we make sure that we’ve found all possible subjects able to create a Pod, how do we know when a PSP will be applied if other mutating admission controllers are changing the objects? It goes on and on.
There’s just no way to test a PSP with certainty.
Yes, the Pod Security Policy API (currently in Beta) is dying. Or will be changed. Or will be replaced. This scramble is what we’re talking about.
@Raesene was the first one to tell me about this as he has a lot of friends in the Kubernetes community. If memory serves, it was over two years ago that he mentioned PSPs being deprecated at which point I said something like “pff!”. I had just started seeing organizations start using it and I was pushing it hard as a general recommendation – there’s no way that it would be removed when it was just starting to get adopted.
This last year at the Container Security Summit, @TimAllClair led a discussion about the issues with PSPs and their future. At that time, there were no obvious answers.
In it, he and the rest of SIG-Auth summarize some of the solutions:
As of today, we’ve seen some documentation on what a replacement PSP might look like but we’re getting close to needing a decision before it’s removed completely.
The Pod Security Policy API is still in Beta and as a result, it should adhere to Kubernetes’s API lifecycle policy that if a Beta feature cannot be moved to Stable (GA) then it should be removed. This is outlined in the Kubernetes blog post, Moving Forward From Beta.
“If you’re using Kubernetes, there’s a good chance that you’re using a beta feature. …If you’re using or generating Kubernetes manifests that use beta APIs like Ingress, you’ll need to plan to revise those. The current APIs are going to be deprecated following a schedule (the 9 months I mentioned earlier) and after a further 9 months those deprecated APIs will be removed. At that point, to stay current with Kubernetes, you should already have migrated.” - Kubernetes.io
That means if all happens without a plan, the PSP admission controller would be removed in Kubernetes 1.22, which is likely to be released this coming summer.
And then let me back up from such a dramatic statement to say that we should all think about PSPs’ demise and plan for it the best we can, but the Kubernetes community has worked, and I believe will continue to work, to provide guidance on what would be required to build a Pod Security Policy replacement-ish-thingy. I could imagine a scenario where a SIG decides to provide guidance on what a “secure pod” is supposed to look like, like the Pod Security Standards they’ve already written, and leaves it at that.
It’s not like PSPs will be ripped out of your servers and we all take to the streets in revolt! Most managed providers have already come up with a plan to either extend their support or, in the case of Azure, provide an implementation of OPA Gatekeeper as part of AKS.
I can’t predict what’s going to happen, but in the last SIG-Auth meeting, the newly minted SIG-Security group (currently chaired by @iancoldwater and @tabbysable) chimed in to offer to take ownership of this difficult problem. I think they should. And in doing so I think they should remove PSPs completely and provide specific guidance on an officially suggested solution such as OPA Gatekeeper. It could be one of their first big wins as a group.
But in the meantime, here are the options that I think most organizations are considering:
We’ve seen many large organizations build their own custom Admission Controllers and custom policy systems. (I mention large organizations not because it’s technically difficult, but because it often needs a team that is willing to take on the burden of maintenance and support.) At first glance, this seems like a very scary undertaking for an everyday operator, and it is. But there’s also some additional value in going this route. I might even recommend it.
For one, you know exactly what the admission controller is going to do and how to manage it, and you may even be able to integrate it with an existing business system that provides similar policy logic.
Another reason is there’s no end to what kind of customizations you can make to have it fit in with your needs. A dynamic admission controller is just arbitrary code that gets executed when an object is created. Some of them work by modifying the object itself to include certain annotations and configurations like the Istio admission controller. Others simply read the object spec and return a Pass/Fail grade.
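As a rough idea of how small the Pass/Fail kind can be, here is a minimal Python sketch of the decision logic only. It assumes the `admission.k8s.io/v1` AdmissionReview request and response shape; the function name `review`, the "no privileged containers" rule, and the sample request are made up for illustration, and the HTTPS server and webhook registration a real controller needs are left out.

```python
import json


def review(admission_review: dict) -> dict:
    """Validate a Pod from an AdmissionReview request and build the response."""
    request = admission_review["request"]
    pod = request["object"]

    # Collect the names of containers that ask for privileged mode.
    offenders = [
        container["name"]
        for container in pod["spec"].get("containers", [])
        if container.get("securityContext", {}).get("privileged", False)
    ]

    # The response must echo the request uid and carry the Pass/Fail verdict.
    response = {"uid": request["uid"], "allowed": not offenders}
    if offenders:
        response["status"] = {
            "message": "privileged containers are not allowed: " + ", ".join(offenders)
        }

    return {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": response,
    }


sample = {
    "apiVersion": "admission.k8s.io/v1",
    "kind": "AdmissionReview",
    "request": {
        "uid": "hypothetical-uid-123",
        "object": {"spec": {"containers": [
            {"name": "app", "securityContext": {"privileged": True}},
        ]}},
    },
}

print(json.dumps(review(sample)["response"], indent=2))
```

A mutating controller would follow the same request and response pattern but return a patch to the object instead of only a verdict.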
It’s a running joke with my coworkers that whenever you ask “How do you secure X in Kubernetes?”, the answer is always a custom admission controller of some sort.
This is by far the most popular and reasonable choice for the everyday operator. OPA is under heavy development and has buy-in from a lot of organizations. I’ve seen it used most often as a way to get a handle on what’s being deployed into a shared cluster, and not just as an alternative to PSPs.
Some use OPA Gatekeeper as basically a regex version of a PSP but it is becoming more versatile. You can define which image repos to pull from and come up with complicated constraints to apply to your cluster. It’s customizable enough to build a lot of security controls while not needing to write your own admission controllers from scratch.
In reality, many (most) organizations that I’ve worked with have plans to use a Kubernetes security product of some sort. Whether it’s the Sysdigs, Twistlocks, StackRoxes, or Aquas of the world (no, I don’t have a preference), they usually all have some kind of controller similar to what a PSP could provide, but these products differ vastly in their goals. Some even provide an actual Admission Controller and do exactly what a PSP does. Some offer a runtime solution watching Pods at the syscall level.
I usually recommend staying away from any of these products until you’ve determined that native solutions (e.g. OPA, K-Rail) don’t fit your business needs; otherwise you run into a Chesterton’s Fence situation.
It may seem impossible to stay up to speed on Kubernetes, let alone its security controls. But joining in some of the SIGs can keep your interests focused.
Many thanks to @Raesene and @TimAllclair, who helped point me towards information and gave me some feedback. Any opinions interpreted as snarky are my own, of course, not theirs.
Also, don’t forget to check out @Raesene et al.’s KubeCon panel. You can probably even ask them about PSPs if you want. :)
UPDATE 11/28/2020: Thanks to [@jaybeale](https://twitter.com/jaybeale) and @sethsec for pointing out I was calling it “OPA Gateway” instead of OPA Gatekeeper.