Most Kubernetes clusters are deployed insecure by default and hardened reactively — after an incident, a failed audit, or a pentest finding. This checklist is the proactive version. Work through it top-to-bottom on any cluster you run in production.
Items are grouped by area. Each item includes the command or YAML needed to implement it, not just a description.
1. RBAC
1.1 — Audit existing ClusterRoleBindings with cluster-admin
kubectl get clusterrolebindings \
  -o jsonpath='{range .items[?(@.roleRef.name=="cluster-admin")]}{.metadata.name}{"\t"}{range .subjects[*]}{.kind}{": "}{.name}{"\n"}{end}{end}'
Every non-system principal bound to cluster-admin is a risk. Replace with a least-privilege ClusterRole scoped to actual needs.
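As a rough sketch of what a replacement could look like, assume an on-call group that only needs to inspect nodes and namespaces. The group name oncall-readers and the role name node-namespace-viewer are hypothetical; substitute the resources and verbs the principal actually uses.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: node-namespace-viewer        # hypothetical name
rules:
  - apiGroups: [""]
    resources: ["nodes", "namespaces"]
    verbs: ["get", "list", "watch"]  # read-only
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: oncall-node-namespace-viewer # hypothetical name
subjects:
  - kind: Group
    name: oncall-readers             # hypothetical group from your identity provider
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: node-namespace-viewer
  apiGroup: rbac.authorization.k8s.io
```

Once the scoped binding is in place and verified, the old cluster-admin binding for that principal can be deleted.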
1.2 — Never bind cluster-admin to service accounts
Service accounts used by applications should have the minimum permissions required. Use the Kubernetes RBAC Generator to create properly scoped roles.
# Reader role for a monitoring namespace
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: monitoring-reader
  namespace: monitoring
rules:
  - apiGroups: [""]
    resources: ["pods", "services", "endpoints"]
    verbs: ["get", "list", "watch"]
  - apiGroups: ["apps"]
    resources: ["deployments", "replicasets"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: monitoring-reader
  namespace: monitoring
subjects:
  - kind: ServiceAccount
    name: prometheus
    namespace: monitoring
roleRef:
  kind: Role
  name: monitoring-reader
  apiGroup: rbac.authorization.k8s.io
1.3 — Disable automounting of service account tokens
Unless a pod needs to call the Kubernetes API, disable token automounting:
spec:
  automountServiceAccountToken: false
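Set this in the pod spec of every Deployment whose application code does not call the Kubernetes API. Also set it on the ServiceAccount object itself, so that pods which forget the field still get no token (a value set on the pod takes precedence over the ServiceAccount):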
apiVersion: v1
kind: ServiceAccount
metadata:
  name: myapp
automountServiceAccountToken: false
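For a Deployment, the field belongs in the pod template rather than at the top level of the Deployment spec. A minimal sketch, assuming the myapp ServiceAccount above; the labels and image reference are placeholders:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: 2
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      serviceAccountName: myapp
      automountServiceAccountToken: false   # pod-level setting, lives under template.spec
      containers:
        - name: app
          image: registry.example.com/myapp:1.0.0   # placeholder image
```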
1.4 — Use namespaced Roles over ClusterRoles wherever possible
A ClusterRole bound with a ClusterRoleBinding grants its permissions in every namespace. If your app only runs in one namespace, a namespaced Role + RoleBinding limits the blast radius to that namespace.
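When a suitable ClusterRole already exists, it does not have to be copied into a Role: a RoleBinding that references a ClusterRole grants those permissions only inside the binding's namespace. The sketch below reuses the built-in view ClusterRole; the dashboard ServiceAccount is a hypothetical example.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: dashboard-view        # hypothetical name
  namespace: monitoring
subjects:
  - kind: ServiceAccount
    name: dashboard           # hypothetical service account
    namespace: monitoring
roleRef:
  kind: ClusterRole           # cluster-scoped role definition...
  name: view                  # ...but granted only in the monitoring namespace
  apiGroup: rbac.authorization.k8s.io
```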
2. Network Policies
By default, every pod in a Kubernetes cluster can talk to every other pod. That is the wrong default for production, so the first step is to lock it down.
2.1 — Default deny all ingress and egress per namespace
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: production
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
Apply this to every namespace that runs application workloads. Use the Kubernetes Network Policy Generator to scaffold allow-rules on top.
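One allow-rule is needed almost everywhere: denying all egress also blocks DNS lookups, so pods can no longer resolve service names. A minimal sketch of a DNS exception follows. It assumes cluster DNS runs in kube-system with the k8s-app: kube-dns label, which is common for CoreDNS but should be checked on your cluster (setups with a node-local DNS cache need a different rule).

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress      # hypothetical name
  namespace: production
spec:
  podSelector: {}             # applies to every pod in the namespace
  policyTypes:
    - Egress
  egress:
    - to:
        # namespaceSelector and podSelector in the same list item are ANDed
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns   # assumption: label used by your DNS pods
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
```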
2.2 — Explicitly allow only required service-to-service traffic
# Allow the API service to reach the database on port 5432
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-api-to-db
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: postgres
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: api
      ports:
        - protocol: TCP
          port: 5432
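If the namespace also denies egress by default, an ingress rule on the database is only half of the path: the api pods need a matching egress rule before the connection can be opened. A sketch of that counterpart, reusing the same labels (the policy name is hypothetical):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-api-egress-to-db   # hypothetical name
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: api                   # the client side of the connection
  policyTypes:
    - Egress
  egress:
    - to:
        - podSelector:
            matchLabels:
              app: postgres
      ports:
        - protocol: TCP
          port: 5432
```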
2.3 — Restrict egress to known external endpoints
If a pod only needs to reach your internal services and a specific external API, lock egress to that:
spec:
  podSelector:
    matchLabels:
      app: payment-service
  policyTypes:
    - Egress
  egress:
    - to:
        - ipBlock:
            cidr: 10.0.0.0/8       # Internal range (adjust to your pod and service CIDRs)
    - to:
        - ipBlock:
            cidr: 203.0.113.10/32  # Payment gateway IP
      ports:
        - port: 443
3. Pod Security
3.1 — Enable Pod Security Admission
PodSecurityPolicy was removed in Kubernetes 1.25. Use Pod Security Admission (PSA) instead, which is configured by labeling namespaces:
# Enforce restricted policy in production namespace
kubectl label namespace production \
  pod-security.kubernetes.io/enforce=restricted \
  pod-security.kubernetes.io/enforce-version=latest
The restricted profile blocks running as root, privilege escalation, and hostPath volumes, and it requires containers to drop all capabilities.
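Enforcing restricted on a namespace with existing workloads can reject pods on their next rollout. One way to stage the change, sketched here for a hypothetical staging namespace, is to enforce the looser baseline profile while the warn and audit modes report what restricted would reject:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: staging   # hypothetical namespace
  labels:
    # Reject pods that violate baseline
    pod-security.kubernetes.io/enforce: baseline
    pod-security.kubernetes.io/enforce-version: latest
    # Show a client-side warning for pods that would violate restricted
    pod-security.kubernetes.io/warn: restricted
    pod-security.kubernetes.io/warn-version: latest
    # Add an audit annotation for the same violations
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/audit-version: latest
```

Once the warnings stop appearing, the enforce label can be raised to restricted.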
3.2 — Set explicit security context on all containers
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 1000
    runAsGroup: 3000
    fsGroup: 2000
    seccompProfile:
      type: RuntimeDefault
  containers:
    - name: app
      securityContext:
        allowPrivilegeEscalation: false
        readOnlyRootFilesystem: true
        capabilities:
          drop:
            - ALL
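A read-only root filesystem often breaks applications that write temporary files. A common workaround, shown here as a sketch, is to mount an emptyDir at the paths the application writes to. The /tmp path, the size limit, and the image reference are assumptions to adapt.

```yaml
spec:
  containers:
    - name: app
      image: registry.example.com/myapp:1.0.0   # placeholder image
      securityContext:
        allowPrivilegeEscalation: false
        readOnlyRootFilesystem: true
        capabilities:
          drop:
            - ALL
      volumeMounts:
        - name: tmp
          mountPath: /tmp        # assumption: the app only writes under /tmp
  volumes:
    - name: tmp
      emptyDir:
        sizeLimit: 64Mi          # caps how much scratch space the pod can use
```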
3.3 — Set resource limits on every container
A container without limits can consume all the CPU and memory on whichever node it lands on, starving its neighbors. That makes any compromised pod a DoS vector.
resources:
  requests:
    cpu: "100m"
    memory: "128Mi"
  limits:
    cpu: "500m"
    memory: "512Mi"
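Per-container settings are easy to forget, so a namespace-level default can act as a safety net. The sketch below uses a LimitRange to apply the same values to any container in the namespace that does not declare its own; the object name is hypothetical and the numbers are only carried over from the example above.

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: container-defaults   # hypothetical name
  namespace: production
spec:
  limits:
    - type: Container
      defaultRequest:        # used when a container sets no requests
        cpu: "100m"
        memory: "128Mi"
      default:               # used when a container sets no limits
        cpu: "500m"
        memory: "512Mi"
```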
3.4 — Never use hostNetwork, hostPID, or hostIPC
These settings place the pod in the host's network, PID, or IPC namespace instead of giving it its own. If such a pod is compromised, the attacker has direct access to host-level network interfaces, the host process list, or host IPC. Audit:
kubectl get pods --all-namespaces -o json | \
  jq -r '.items[] | select(.spec.hostNetwork==true or .spec.hostPID==true or .spec.hostIPC==true) | "\(.metadata.namespace)/\(.metadata.name)"'
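An audit only finds what already exists; an admission policy can stop new pods from requesting host namespaces in the first place. The following is a sketch of a Kyverno ClusterPolicy for that purpose, assuming Kyverno is installed. The policy and rule names are hypothetical, and it is worth running with validationFailureAction: Audit first, because some system components legitimately use the host network.

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: disallow-host-namespaces   # hypothetical name
spec:
  validationFailureAction: Enforce
  background: true                 # also report on existing pods
  rules:
    - name: host-namespaces
      match:
        any:
          - resources:
              kinds: [Pod]
      validate:
        message: "hostNetwork, hostPID, and hostIPC must be unset or false"
        pattern:
          spec:
            # =(field) means: if the field is present, it must match the value
            =(hostPID): "false"
            =(hostIPC): "false"
            =(hostNetwork): "false"
```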
4. Secrets Management
4.1 — Never store secrets in ConfigMaps
ConfigMaps are not encrypted at rest by default, are not redacted in logs or kubectl output, and are readable by any principal with get permission on configmaps, which is usually granted far more widely than access to Secrets.
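The alternative is a Secret that the pod references by key. A minimal sketch, in which the namespace, pod name, image, and environment variable name are all illustrative, and the placeholder value stands in for something set outside of version control:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: db-credentials
  namespace: production
type: Opaque
stringData:
  password: "<set-out-of-band>"   # placeholder, never commit a real value
---
apiVersion: v1
kind: Pod
metadata:
  name: api                        # hypothetical pod
  namespace: production
spec:
  containers:
    - name: api
      image: registry.example.com/api:1.0.0   # placeholder image
      env:
        - name: DB_PASSWORD        # hypothetical variable name
          valueFrom:
            secretKeyRef:
              name: db-credentials
              key: password
```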
4.2 — Enable encryption at rest for etcd
# /etc/kubernetes/encryption-config.yaml
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
  - resources:
      - secrets
    providers:
      - aescbc:
          keys:
            - name: key1
              secret: <base64-encoded-32-byte-key>
      - identity: {}
Pass to kube-apiserver: --encryption-provider-config=/etc/kubernetes/encryption-config.yaml
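On clusters where the API server runs as a static pod, the flag alone is not enough: the config file also has to be mounted into the container. The fragment below is a sketch that assumes a kubeadm-style layout with the manifest at /etc/kubernetes/manifests/kube-apiserver.yaml; the volume name is hypothetical and existing flags and mounts stay as they are.

```yaml
# Fragment of the kube-apiserver static pod manifest (kubeadm layout assumed)
spec:
  containers:
    - name: kube-apiserver
      command:
        - kube-apiserver
        - --encryption-provider-config=/etc/kubernetes/encryption-config.yaml
        # ...existing flags unchanged
      volumeMounts:
        - name: encryption-config       # hypothetical volume name
          mountPath: /etc/kubernetes/encryption-config.yaml
          readOnly: true
  volumes:
    - name: encryption-config
      hostPath:
        path: /etc/kubernetes/encryption-config.yaml
        type: File
```

Encryption only applies to objects written after the change, so existing Secrets need to be rewritten once, for example with kubectl get secrets --all-namespaces -o json | kubectl replace -f -.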
4.3 — Rotate secrets regularly and use external secret stores
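For production clusters, use External Secrets Operator to sync secrets from Vault, AWS Secrets Manager, or GCP Secret Manager. This keeps the source of truth, and rotation, outside the cluster. The operator still writes the synced value into a regular Kubernetes Secret, so it does end up in etcd and encryption at rest remains necessary.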
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: db-credentials
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: vault-backend
    kind: ClusterSecretStore
  target:
    name: db-credentials
  data:
    - secretKey: password
      remoteRef:
        key: production/db
        property: password
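The ExternalSecret above refers to a ClusterSecretStore named vault-backend, which has to exist separately. A sketch of what it might look like for Vault with Kubernetes auth follows. The server URL, KV mount path, Vault role, and service account are all assumptions, and the field layout should be checked against the External Secrets Operator version you run.

```yaml
apiVersion: external-secrets.io/v1beta1
kind: ClusterSecretStore
metadata:
  name: vault-backend
spec:
  provider:
    vault:
      server: "https://vault.example.com"   # assumption: your Vault address
      path: "secret"                         # assumption: KV mount path
      version: "v2"                          # KV secrets engine version
      auth:
        kubernetes:
          mountPath: "kubernetes"            # assumption: Vault auth mount
          role: "external-secrets"           # assumption: Vault role name
          serviceAccountRef:
            name: "external-secrets"         # assumption: operator service account
            namespace: "external-secrets"
```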
5. Image Security
5.1 — Pin image tags to digests in production
image: nginx:latest is a footgun: tags are mutable, so the same manifest can pull different images over time. Pin to a digest instead:
# Get the digest
docker pull nginx:1.27.3
docker inspect nginx:1.27.3 --format='{{index .RepoDigests 0}}'
# nginx@sha256:abc123...
Then use: image: nginx@sha256:abc123...
5.2 — Scan images in CI before deploying
# GitHub Actions step
- name: Scan image
  uses: aquasecurity/trivy-action@master  # pin to a release tag or commit SHA in real pipelines
  with:
    image-ref: myapp/frontend:${{ github.sha }}
    format: table
    exit-code: "1"
    severity: CRITICAL,HIGH
5.3 — Use a private registry with admission webhooks
Only allow images from your internal registry. Use a ValidatingAdmissionWebhook (e.g., Kyverno or OPA Gatekeeper) to enforce this:
# Kyverno policy — deny images not from internal registry
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: restrict-image-registries
spec:
  validationFailureAction: Enforce
  rules:
    - name: validate-registries
      match:
        any:
          - resources:
              kinds: [Pod]
      validate:
        message: "Images must be from registry.example.com"
        pattern:
          spec:
            containers:
              - image: "registry.example.com/*"
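A pattern that only checks spec.containers leaves init containers and ephemeral containers free to pull from anywhere. A sketch of a validate block that covers all three, using Kyverno's =(...) conditional anchors so that pods without those fields still pass:

```yaml
# Drop-in replacement for the validate block of the rule
validate:
  message: "Images must be from registry.example.com"
  pattern:
    spec:
      # Checked only when the field is present on the pod
      =(ephemeralContainers):
        - image: "registry.example.com/*"
      =(initContainers):
        - image: "registry.example.com/*"
      containers:
        - image: "registry.example.com/*"
```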
6. Audit Logging
6.1 — Enable API server audit logging
Add to kube-apiserver flags:
--audit-log-path=/var/log/kubernetes/audit.log
--audit-log-maxage=30
--audit-log-maxbackup=10
--audit-log-maxsize=100
--audit-policy-file=/etc/kubernetes/audit-policy.yaml
6.2 — Define a meaningful audit policy
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
  # Drop noisy system checks (listed first: the first matching rule wins)
  - level: None
    users: ["system:kube-proxy"]
    verbs: ["watch"]
    resources:
      - group: ""
        resources: ["endpoints", "services"]
  # Log all secret access at Metadata level so secret values never reach the log
  - level: Metadata
    resources:
      - group: ""
        resources: ["secrets"]
  # Log exec and port-forward into pods ("get" covers WebSocket-based clients)
  - level: RequestResponse
    verbs: ["create", "get"]
    resources:
      - group: ""
        resources: ["pods/exec", "pods/portforward"]
  # Minimal logging for reads
  - level: Metadata
    verbs: ["get", "list", "watch"]
6.3 — Ship audit logs to a tamper-resistant external store
Audit logs stored only on the API server node can be altered or deleted if that node is compromised. Ship them to an external log store or SIEM (Loki, Elasticsearch, Splunk) as they are written.
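Besides tailing the log file with an agent, the API server can push events directly to an HTTP endpoint through its webhook audit backend, enabled with --audit-webhook-config-file. The file uses kubeconfig format. In this sketch the collector URL, file path, and certificate paths are all hypothetical.

```yaml
# /etc/kubernetes/audit-webhook.yaml (hypothetical path), kubeconfig format
apiVersion: v1
kind: Config
clusters:
  - name: audit-sink
    cluster:
      server: https://audit-collector.example.com/k8s-audit   # hypothetical endpoint
      certificate-authority: /etc/kubernetes/pki/audit-ca.crt # hypothetical path
users:
  - name: kube-apiserver
    user:
      # Client certificate the API server presents to the collector
      client-certificate: /etc/kubernetes/pki/audit-client.crt
      client-key: /etc/kubernetes/pki/audit-client.key
contexts:
  - name: default
    context:
      cluster: audit-sink
      user: kube-apiserver
current-context: default
```

The log backend from 6.1 and the webhook backend can run side by side, which leaves a local copy for debugging while the external copy serves as the record of truth.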
Quick Wins Summary
If you implement nothing else from this list, do these five things today:
- Audit and remove unnecessary cluster-admin bindings
- Apply default-deny-all NetworkPolicy to production namespaces
- Enable runAsNonRoot: true and allowPrivilegeEscalation: false on all containers
- Enable etcd encryption at rest for Secrets
- Enable API server audit logging with a policy that captures secret access and pod exec events