Kubernetes Autoscaler Threshold Calculator
Calculate the right HPA (Horizontal Pod Autoscaler) CPU and memory thresholds for your Kubernetes workloads. Get absolute target values in millicores and MiB.
Kubernetes HPA Configuration Guide
The Horizontal Pod Autoscaler (HPA) automatically scales Deployment replicas based on CPU, memory, or custom metrics. Correct threshold configuration is critical — wrong values cause either constant scale-flapping or under-utilized pods.
HPA v2 Spec Example
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: AverageValue
          averageValue: 350m # 70% of 500m limit
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300

Scaling Algorithm
desired_replicas = ceil(current_replicas × (current_metric / target_metric))

If 3 pods average 700m CPU with a 350m target: ceil(3 × (700/350)) = 6 replicas.
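The same formula also drives scale-in, and the result is bounded by the replica limits. As a worked sketch, assuming the 350m target and maxReplicas of 10 from the spec example (the pod counts and observed averages here are illustrative):

```latex
\begin{aligned}
\text{scale-in:} \quad & \left\lceil 6 \times \tfrac{200}{350} \right\rceil = \lceil 3.43 \rceil = 4 \\
\text{capped:} \quad & \left\lceil 8 \times \tfrac{700}{350} \right\rceil = 16 \;\rightarrow\; \min(16,\ 10) = 10
\end{aligned}
```

In the second case the HPA stops at 10 replicas even though the formula asks for 16, so per-pod CPU stays above target.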
Common Mistakes
1. Target on limits, not requests: HPA `type: Utilization` targets a percentage of _requests_, not limits. If your requests are lower than your limits, you'll scale out sooner than a limit-based calculation predicts.
2. No stabilization window: Scale-down uses a 5-minute stabilization window by default. If it has been shortened or set to 0, scale-down becomes immediate and replicas yo-yo, so keep a window configured.
3. CPU limits too low: If the CPU limit sits at or below the HPA target, throttling caps usage before it can exceed the target, so scale-out never triggers and the app just runs slow instead.
4. Max replicas too low: If maxReplicas is too small, HPA can't scale to meet demand. Monitor `kube_horizontalpodautoscaler_status_current_replicas == kube_horizontalpodautoscaler_spec_max_replicas` for this condition.
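The max-replicas check in the last item can be turned into an alert. The following is a minimal sketch of a Prometheus rules file, assuming Prometheus is scraping kube-state-metrics; the group name, alert name, 15-minute duration, and severity label are illustrative choices, not values from this guide.

```yaml
# Sketch of a Prometheus alerting rule (assumes kube-state-metrics is scraped).
groups:
  - name: hpa-capacity            # hypothetical group name
    rules:
      - alert: HPAAtMaxReplicas   # hypothetical alert name
        # Fires for each HPA whose current replica count equals its configured maximum.
        expr: |
          kube_horizontalpodautoscaler_status_current_replicas
            == kube_horizontalpodautoscaler_spec_max_replicas
        for: 15m                  # illustrative: ignore brief peaks
        labels:
          severity: warning
        annotations:
          summary: "HPA {{ $labels.namespace }}/{{ $labels.horizontalpodautoscaler }} has been at maxReplicas for 15 minutes"
```

An HPA that sits at its maximum for a sustained period is a hint to raise maxReplicas or revisit per-pod resources.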
Frequently Asked Questions
What is the right CPU target for HPA?
70% is a common production starting point. It leaves roughly 30% headroom for load to grow while new replicas start, which reduces flapping. If your app has very spiky load, use 60%. If load is predictable and gradual, 80% is acceptable.
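To see what these percentages mean as absolute targets, here is the arithmetic for the 500m base value used in the spec example (the base is an assumption; substitute your own figure):

```latex
\begin{aligned}
\text{spiky load (60\%):} \quad & 0.60 \times 500\,\text{m} = 300\,\text{m} \\
\text{default (70\%):} \quad & 0.70 \times 500\,\text{m} = 350\,\text{m} \\
\text{gradual load (80\%):} \quad & 0.80 \times 500\,\text{m} = 400\,\text{m}
\end{aligned}
```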
Why does HPA use absolute millicores instead of percentage?
HPA v2 supports both: `type: Utilization` (percentage of request, not limit) and `type: AverageValue` (absolute millicores). The calculator outputs absolute values because they're more predictable — `type: Utilization` depends on your requests being set accurately.
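For comparison with the `type: AverageValue` form, the percentage form might look like the sketch below. It assumes each pod requests 500m CPU, which is a hypothetical value chosen to match the 350m example.

```yaml
# Fragment of spec.metrics in an autoscaling/v2 HorizontalPodAutoscaler.
# Assumes each pod requests 500m CPU (hypothetical).
metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70   # 70% of the 500m request = 350m per pod
```

If the request is later changed to 250m, this same manifest would scale at 175m per pod, whereas an `averageValue: 350m` target would be unaffected.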
Why is memory-based autoscaling less reliable than CPU-based?
Memory is not compressible: a pod using 400 MiB of a 512 MiB limit isn't necessarily overloaded, it may just be holding memory. Many apps never release memory to the OS even when idle, so HPA scales out unnecessarily and may not scale back in. Use memory-based HPA only for apps with predictable memory growth patterns.
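Where memory-based scaling does fit the workload, a metrics entry could look like this sketch. The 400Mi target is illustrative only, not a recommendation.

```yaml
# Fragment of spec.metrics in an autoscaling/v2 HorizontalPodAutoscaler.
metrics:
  - type: Resource
    resource:
      name: memory
      target:
        type: AverageValue
        averageValue: 400Mi   # illustrative absolute target per pod
```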
How do I prevent HPA scale flapping?
Set a stabilization window: `behavior.scaleDown.stabilizationWindowSeconds: 300` (5 minutes). This prevents rapid scale-in after a brief traffic drop. Also check that the kube-controller-manager flags `--horizontal-pod-autoscaler-sync-period` (default 15s) and `--horizontal-pod-autoscaler-downscale-stabilization` (default 5m) are appropriate for your cluster.
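Beyond the window itself, `behavior` can also rate-limit how fast replicas are removed or added. The following is a sketch under assumed values: the policy numbers are illustrative and should be tuned per workload.

```yaml
# Fragment of spec.behavior in an autoscaling/v2 HorizontalPodAutoscaler.
behavior:
  scaleDown:
    stabilizationWindowSeconds: 300   # use the highest recommendation from the last 5 minutes
    policies:
      - type: Percent
        value: 50                     # illustrative: remove at most 50% of replicas...
        periodSeconds: 60             # ...per 60-second period
  scaleUp:
    stabilizationWindowSeconds: 0     # react to load increases immediately
    policies:
      - type: Pods
        value: 4                      # illustrative: add at most 4 pods...
        periodSeconds: 15             # ...per 15-second period
```

Keeping scale-up fast and scale-down slow is a common asymmetric choice: a late scale-out costs latency, while a late scale-in only costs a few minutes of extra capacity.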