nodeSelector errors when using kube-bench

I’ve recently begun dipping my toes into the world of Kubernetes security, which is a giant can of worms that can feel slightly overwhelming at first. One tool which is particularly useful, at least as an introduction, is the kube-bench tool by Aqua Security. The tool can run against Kubernetes master and worker nodes, checking how closely they match the CIS Kubernetes benchmark. This has been one starting point for me as I try and understand the various security settings you can apply to Kubernetes clusters and how they fit together.

I did run into a minor issue when setting up the benchmark on a test cluster, and thought it might be worth a note for anyone who is similarly getting to grips with Kubernetes. For running kube-bench inside a Kubernetes cluster, the README suggests launching it against master nodes like so:

kubectl run --rm -i -t kube-bench-master --image=aquasec/kube-bench:latest --restart=Never --overrides="{ \"apiVersion\": \"v1\", \"spec\": { \"hostPID\": true, \"nodeSelector\": { \"kubernetes.io/role\": \"master\" }, \"tolerations\": [ { \"key\": \"node-role.kubernetes.io/master\", \"operator\": \"Exists\", \"effect\": \"NoSchedule\" } ] } }" -- master --version 1.8

This didn’t work for me. The command hung, and the Kubernetes dashboard showed the error “No nodes are available that match all of the predicates: MatchNodeSelector (58), PodToleratesNodeTaints (4)”.
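If you’re not using the dashboard, the same scheduling failure should also be visible from the command line (in another terminal, since the kubectl run above stays attached; the pod name comes from that command):

kubectl get pods
kubectl describe pod kube-bench-master

The Events section at the end of the describe output shows the pod stuck in Pending with a FailedScheduling event carrying the same predicate message.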

The reason for this error (in my case) was the part of the kubectl command that restricts the benchmark to master nodes. The command above selects master nodes using the nodeSelector syntax, which here is told to look for the label kubernetes.io/role with the value master. However, the cluster I was working on doesn’t have any nodes with that label - I’m not sure whether this is a quirk of that particular cluster or whether the default labels have changed since the README was written.
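A quick way to check which labels your nodes actually carry (assuming you have permission to read node objects) is:

kubectl get nodes --show-labels

In my cluster, that would have shown straight away that no node carried kubernetes.io/role=master.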

How did I identify and fix this? I ran kubectl describe nodes (it’s worth mentioning I have admin access to this cluster). This can output a huge amount of information depending on your cluster size, so it might be worth storing the output in a file.
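In practice that can be as simple as the following (nodes.txt is just an example filename):

kubectl describe nodes > nodes.txt
grep -i master nodes.txt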

I then grepped the output for any mentions of master, and discovered that all the nodes in the file which carried the node-role.kubernetes.io/master taint (the taint that the toleration in the command above is written for) also had the label key type set, with a value of controller. As an optimistic punt, I changed the command above to:

kubectl run --rm -i -t kube-bench-master --image=aquasec/kube-bench:latest --restart=Never --overrides="{ \"apiVersion\": \"v1\", \"spec\": { \"hostPID\": true, \"nodeSelector\": { \"type\": \"controller\" }, \"tolerations\": [ { \"key\": \"node-role.kubernetes.io/master\", \"operator\": \"Exists\", \"effect\": \"NoSchedule\" } ] } }" -- master --version 1.8

This worked!
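If you want to confirm in advance which nodes a given nodeSelector will match, you can query for them directly - here using the label from my cluster, so substitute your own:

kubectl get nodes -l type=controller

Anything listed there is a node the kube-bench pod can be scheduled onto, taints permitting.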

In the general case, it doesn’t seem as if you can rely on nodes having particular default labels. There’s more documentation about the nodeSelector option here - it notes that “The value of these [default] labels is cloud provider specific and is not guaranteed to be reliable.” So if you’re having trouble with a similar error, I would suggest following the same steps as I did and figuring out which labels are common to your master nodes (or creating a label that is common to them - see the sketch below). I would be interested to know whether there is a more elegant general solution to running a container on only the master nodes, especially as I only have a basic working knowledge of Kubernetes at the moment.
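For the “create a label” route, a minimal sketch might look like this - the node name and the label key and value are purely illustrative, so substitute your own:

kubectl label nodes my-master-node kube-bench-target=master

You could then point the nodeSelector in the overrides at "kube-bench-target": "master" rather than relying on a default label that may or may not exist.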

*****
Written by Feroz Salam on 11 September 2018