Kubernetes HPA Generator
Generate a Kubernetes HorizontalPodAutoscaler manifest targeting CPU and/or memory utilisation. Supports custom metrics and scale-down stabilisation.
Kubernetes HorizontalPodAutoscaler
HPA periodically checks pod metrics (every 15 seconds by default) and adjusts the replica count of a Deployment, StatefulSet, or other scalable resource so that the observed average utilisation moves toward a target value.
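As a minimal sketch of what such a manifest contains (not this generator's actual code), the Python function below builds an `autoscaling/v2` HorizontalPodAutoscaler for a Deployment as a dict and prints it as JSON, which `kubectl apply -f` also accepts. The function name `make_hpa` and the `my-app` names are hypothetical.

```python
import json


def make_hpa(name, deployment, min_replicas, max_replicas,
             cpu_percent=None, memory_percent=None):
    """Build an autoscaling/v2 HorizontalPodAutoscaler manifest as a dict."""
    if cpu_percent is None and memory_percent is None:
        raise ValueError("set at least one of cpu_percent or memory_percent")
    metrics = []
    # One Resource metric per requested target; utilisation is measured
    # as a percentage of the containers' resource requests.
    for resource, percent in (("cpu", cpu_percent), ("memory", memory_percent)):
        if percent is not None:
            metrics.append({
                "type": "Resource",
                "resource": {
                    "name": resource,
                    "target": {"type": "Utilization", "averageUtilization": percent},
                },
            })
    return {
        "apiVersion": "autoscaling/v2",
        "kind": "HorizontalPodAutoscaler",
        "metadata": {"name": name},
        "spec": {
            "scaleTargetRef": {
                "apiVersion": "apps/v1",
                "kind": "Deployment",
                "name": deployment,
            },
            "minReplicas": min_replicas,
            "maxReplicas": max_replicas,
            "metrics": metrics,
        },
    }


if __name__ == "__main__":
    # Hypothetical names: HPA "my-app" scaling Deployment "my-app".
    print(json.dumps(make_hpa("my-app", "my-app", 2, 10, cpu_percent=70), indent=2))
```

When both CPU and memory targets are set, the HPA computes a desired replica count for each metric and uses the largest.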
How it works
desired_replicas = ceil(current_replicas × (current_metric / target_metric))

Here current_metric is the average across the target's pods. For example, if 3 pods are running at an average of 90% CPU and the target is 70%: desired = ceil(3 × 90/70) = ceil(3.86) = 4 pods.
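The calculation can be sketched in Python. This is a simplified model: it assumes the controller's default 10% tolerance (no change when the ratio is within 0.1 of 1.0) and ignores minReplicas/maxReplicas clamping and the special handling of missing or not-yet-ready pods. The function name is hypothetical.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric, tolerance=0.1):
    """Apply the HPA ratio formula to an average metric value.

    current_metric is the average across pods (e.g. 90 for 90% CPU).
    If the ratio is within `tolerance` of 1.0, the count is left unchanged.
    """
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


print(desired_replicas(3, 90, 70))  # 4: ceil(3 × 90/70) = ceil(3.86)
print(desired_replicas(3, 72, 70))  # 3: ratio 1.03 is within tolerance
print(desired_replicas(4, 30, 70))  # 2: ceil(4 × 30/70) = ceil(1.71)
```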
Scale-up vs scale-down
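| | Scale-up | Scale-down |
|---|---|---|
| Default stabilisation window | 0s (acts immediately) | 300s |
| Default rate limit | +100% or +4 pods per 15s, whichever is larger | Up to 100% of replicas per 15s |
| Behaviour | Aggressive | Conservative |
| Why | Avoid overload | Avoid thrashing |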
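The defaults in the table correspond to the `behavior` field of an `autoscaling/v2` HPA. The sketch below writes them out as a Python dict and, as a simplification, computes the most replicas a single 15s period allows under the default scale-up policies (it ignores how the controller tracks changes across overlapping periods). The helper names are hypothetical.

```python
import copy
import math

# Default behaviour of an autoscaling/v2 HPA when `behavior` is omitted.
DEFAULT_BEHAVIOR = {
    "scaleUp": {
        "stabilizationWindowSeconds": 0,
        "selectPolicy": "Max",
        "policies": [
            {"type": "Percent", "value": 100, "periodSeconds": 15},
            {"type": "Pods", "value": 4, "periodSeconds": 15},
        ],
    },
    "scaleDown": {
        "stabilizationWindowSeconds": 300,
        "policies": [
            {"type": "Percent", "value": 100, "periodSeconds": 15},
        ],
    },
}


def behavior_with_scale_down_window(seconds):
    """Return a copy of the defaults with a custom scale-down window."""
    # The API accepts 0..3600 seconds for stabilizationWindowSeconds.
    if not 0 <= seconds <= 3600:
        raise ValueError("stabilizationWindowSeconds must be 0-3600")
    behavior = copy.deepcopy(DEFAULT_BEHAVIOR)
    behavior["scaleDown"]["stabilizationWindowSeconds"] = seconds
    return behavior


def scale_up_limit(current, behavior=DEFAULT_BEHAVIOR):
    """Most replicas allowed after one period, starting from `current`
    (simplified: ignores changes already made earlier in the period)."""
    limits = []
    for policy in behavior["scaleUp"]["policies"]:
        if policy["type"] == "Percent":
            limits.append(math.ceil(current * (1 + policy["value"] / 100)))
        else:  # "Pods": add a fixed number of replicas
            limits.append(current + policy["value"])
    # selectPolicy "Max" picks the most permissive policy.
    return max(limits)
```

From 3 replicas the Percent policy allows 6 and the Pods policy allows 7, so up to 7; from 10 replicas, doubling to 20 wins.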
Custom metrics (KEDA)
For queue depth, request rate, or external metrics, consider KEDA (Kubernetes Event-Driven Autoscaler):
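```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: my-app-scaler        # hypothetical name
spec:
  scaleTargetRef:
    name: my-app             # hypothetical Deployment to scale
  triggers:
    - type: prometheus
      metadata:
        # Hypothetical Prometheus URL; replace with your own server's address
        serverAddress: http://prometheus.monitoring.svc:9090
        query: sum(rate(http_requests_total[2m]))
        threshold: "100"
```
Prerequisites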
```bash
# Install metrics-server (required for CPU/memory HPA)
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

# Verify that metrics are available
kubectl top pods
kubectl top nodes
```
Frequently Asked Questions
What's the difference between HPA and VPA?
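HPA (HorizontalPodAutoscaler) adds or removes pod replicas. VPA (VerticalPodAutoscaler) adjusts the CPU/memory requests of existing pods. HPA suits workloads that can spread load across replicas, typically stateless apps; VPA is useful for single-replica stateful workloads or when you can't scale horizontally. They can be used together, but the VPA project advises against having both act on CPU or memory, since each would react to the other's changes; an HPA driven by custom or external metrics can run alongside VPA.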
Why does HPA need metrics-server?
HPA reads resource metrics (CPU, memory) through the Kubernetes resource metrics API (metrics.k8s.io), which metrics-server provides. Without it, HPA cannot observe CPU or memory utilisation and will not scale on those metrics. Install it with: kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
What does the stabilisation window prevent?
Without a scale-down stabilisation window, HPA could remove replicas as soon as load dips briefly, then add them back when traffic returns, causing thrashing. With the default 300s window, HPA scales down only to the highest replica count it recommended during the previous 5 minutes, so load must stay low for that long before replicas are removed.
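A simplified Python model of this behaviour, assuming a 300s scale-down window, a 0s scale-up window, and hypothetical timestamps and recommendations: scale-ups take effect immediately, while scale-downs never go below the highest recommendation seen in the last 300 seconds. The class name is hypothetical.

```python
from collections import deque


class ScaleDownStabilizer:
    """Simplified model of HPA scale-down stabilisation (scale-up window 0s)."""

    def __init__(self, window=300):
        self.window = window
        self.history = deque()  # (timestamp_seconds, recommended_replicas)

    def stabilize(self, now, current, recommendation):
        """Return the replica count to apply at time `now`."""
        self.history.append((now, recommendation))
        # Forget recommendations older than the window.
        while self.history[0][0] < now - self.window:
            self.history.popleft()
        if recommendation > current:
            return recommendation  # scale up immediately
        # Scale down only as far as the highest recent recommendation allows.
        highest = max(r for _, r in self.history)
        return min(current, highest)


stabilizer = ScaleDownStabilizer(window=300)
replicas = 6
# Hypothetical (time, recommendation) samples: load drops at t=60.
for t, rec in [(0, 6), (60, 2), (120, 2), (240, 2), (420, 2)]:
    replicas = stabilizer.stabilize(t, replicas, rec)
    print(f"t={t}s recommended={rec} applied={replicas}")
```

Here the recommendation drops to 2 at t=60, but replicas are only removed at t=420, once the recommendation of 6 recorded at t=0 has aged out of the 300s window.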
What's a good CPU target percentage?
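70% is a common starting point: it leaves about 30% headroom before pods reach their requested CPU, while still triggering scale-out early enough. Utilisation is measured against each container's CPU request, not its limit, so the pods must declare CPU requests. Setting the target too high (>85%) means you scale out too late and requests queue. Setting it too low (<50%) wastes compute on pods that are mostly idle.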
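To see what the target means in practice, this sketch assumes a hypothetical Deployment whose pods each request 500m CPU and whose total load is a steady 2100m spread evenly across pods. It computes the per-pod usage at which each target is reached and the replica count that keeps average utilisation at or below the target. All names and numbers are illustrative.

```python
def ceil_div(a, b):
    """Integer ceiling division (avoids floating-point rounding)."""
    return -(-a // b)


def cpu_threshold_millicores(request_m, target_percent):
    """Average per-pod CPU usage (millicores) at which the target is reached."""
    return request_m * target_percent / 100


def steady_state_replicas(total_usage_m, request_m, target_percent):
    """Replicas needed to keep average utilisation at or below the target,
    assuming the load is spread evenly across pods."""
    return ceil_div(total_usage_m * 100, request_m * target_percent)


# Hypothetical workload: 500m CPU request per pod, 2100m total usage.
for target in (50, 70, 85):
    print(
        f"target {target}%: scale out above "
        f"{cpu_threshold_millicores(500, target):.0f}m per pod, "
        f"needs {steady_state_replicas(2100, 500, target)} replicas"
    )
```

Lower targets trigger scale-out earlier but run more pods for the same load: 9 replicas at 50% versus 5 at 85% in this example.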