K8sCalc


Kubernetes Autoscaler Threshold Calculator

Calculate the right HPA (Horizontal Pod Autoscaler) CPU and memory thresholds for your Kubernetes workloads. Get absolute target values in millicores and MiB.

Kubernetes HPA Configuration Guide

The Horizontal Pod Autoscaler (HPA) automatically scales Deployment replicas based on CPU, memory, or custom metrics. Correct threshold configuration is critical — wrong values cause either constant scale-flapping or under-utilized pods.

HPA v2 Spec Example

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: AverageValue
        averageValue: 350m   # 70% of a 500m CPU request
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
```
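The `averageValue` in the spec above can be derived from a per-pod resource figure and a target percentage. The following is a minimal sketch in Python, assuming a 500m CPU request and a hypothetical 512 MiB memory request; the function name `absolute_target` is illustrative and not part of any Kubernetes API.

```python
def absolute_target(request: int, target_percent: int) -> int:
    """Return the absolute per-pod target for a request and a target percentage.

    `request` is in millicores for CPU or MiB for memory. The result is
    rounded down to a whole unit.
    """
    if request <= 0:
        raise ValueError("request must be positive")
    if target_percent <= 0:
        raise ValueError("target_percent must be positive")
    return request * target_percent // 100


cpu_target = absolute_target(500, 70)      # CPU request of 500m, 70% target
memory_target = absolute_target(512, 80)   # assumed 512 MiB request, 80% target

print(f"averageValue: {cpu_target}m")      # averageValue: 350m
print(f"averageValue: {memory_target}Mi")  # averageValue: 409Mi
```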

Scaling Algorithm

desired_replicas = ceil(current_replicas × (current_metric / target_metric))

If 3 pods average 700m CPU with a 350m target: ceil(3 × (700/350)) = 6 replicas.
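The formula can be sketched in Python as follows. The clamping to `min_replicas` and `max_replicas` mirrors the `minReplicas` and `maxReplicas` fields of the spec. The `tolerance` parameter is an assumption based on the controller's default of 0.1, under which ratios within 10% of 1.0 cause no scaling; the function name and signature are illustrative.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=None, tolerance=0.1):
    """Compute ceil(current_replicas * current_metric / target_metric)."""
    ratio = current_metric / target_metric
    # Assumed default tolerance: skip scaling when the ratio is close to 1.0.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Keep the result inside the configured replica bounds.
    desired = max(desired, min_replicas)
    if max_replicas is not None:
        desired = min(desired, max_replicas)
    return desired


print(desired_replicas(3, 700, 350))                   # 6
print(desired_replicas(3, 700, 350, max_replicas=5))   # 5, capped by max_replicas
print(desired_replicas(4, 360, 350))                   # 4, within tolerance
```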

Common Mistakes

  1. Target on limits, not requests: HPA `type: Utilization` targets a percentage of _requests_, not limits. If your requests are lower than your limits, you'll scale out sooner than expected.
  2. No stabilization window: The default scale-down stabilization window is 300 seconds. Setting it to 0 or a very short value makes scale-down effectively immediate and causes yo-yo scaling, so keep a window that covers your typical traffic dips.
  3. CPU limits too low: A throttled pod cannot use more CPU than its limit. If the limit is at or below the HPA target, average usage never exceeds the target, so HPA never scales out and the app just runs slow instead.
  4. Max replicas too low: If maxReplicas is too small, HPA can't scale to meet demand. Monitor `kube_horizontalpodautoscaler_status_current_replicas == kube_horizontalpodautoscaler_spec_max_replicas` (kube-state-metrics) for this condition.
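Some of these mistakes can be caught with simple arithmetic before a config is deployed. The sketch below is a hypothetical checker, not an existing tool: the function name, parameters, and message wording are assumptions. It reads mistake 3 as "the CPU limit is at or below the absolute target", which is one concrete case where throttling keeps usage under the target.

```python
def hpa_warnings(request_m, limit_m, target_m, current_replicas, max_replicas):
    """Return warnings for a CPU-based HPA; all CPU values are millicores per pod."""
    warnings = []
    pct_of_request = 100 * target_m / request_m
    pct_of_limit = 100 * target_m / limit_m

    if limit_m <= target_m:
        # Mistake 3: throttling caps usage at the limit, which is not above the target.
        warnings.append(
            f"limit {limit_m}m is at or below the {target_m}m target; "
            "usage cannot exceed the target, so HPA will not scale out"
        )
    elif limit_m > request_m:
        # Mistake 1: the same target reads differently against requests and limits.
        warnings.append(
            f"{target_m}m is {pct_of_request:.0f}% of the request "
            f"but only {pct_of_limit:.0f}% of the limit"
        )
    if current_replicas >= max_replicas:
        # Mistake 4: there is no room left to scale out.
        warnings.append("current replicas equal maxReplicas; HPA cannot scale out further")
    return warnings


print(hpa_warnings(250, 500, 175, current_replicas=3, max_replicas=10))
print(hpa_warnings(200, 300, 350, current_replicas=10, max_replicas=10))
print(hpa_warnings(500, 500, 350, current_replicas=3, max_replicas=10))  # []
```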

Frequently Asked Questions

What is the right CPU target for HPA?

70% is a common production starting point. It leaves about 30% of each pod's CPU request as headroom to absorb load while new replicas start, so short spikes do not immediately saturate pods. If your app has very spiky load, use 60%. If load is predictable and gradual, 80% is acceptable.
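The trade-off between these targets can be put in numbers. This is a rough sketch that assumes a pod saturates at 100% of its CPU request (that is, request equals limit); the function name `headroom` is illustrative.

```python
def headroom(target_percent):
    """Return (spare capacity in % of request, load growth in % a pod at target can absorb)."""
    spare_percent = 100 - target_percent
    growth_percent = 100 * spare_percent / target_percent
    return spare_percent, growth_percent


for target in (60, 70, 80):
    spare, growth = headroom(target)
    print(f"target {target}%: {spare}% spare, absorbs about {growth:.0f}% more load per pod")
```

Under that assumption, a lower target buys more time for new replicas to start at the cost of running more pods for the same load.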

Why does the calculator output absolute millicores instead of a percentage?

HPA v2 supports both: `type: Utilization` (percentage of request, not limit) and `type: AverageValue` (absolute millicores). The calculator outputs absolute values because they're more predictable — `type: Utilization` depends on your requests being set accurately.
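The two target types differ only in the `target` block of a metric entry. The sketch below builds both forms as plain Python dicts using the `autoscaling/v2` field names `averageValue` and `averageUtilization`; the helper `cpu_metric` and its parameters are hypothetical.

```python
def cpu_metric(request_m, target_percent, absolute=True):
    """Build one entry for the HPA `metrics` list as a plain dict."""
    if absolute:
        # Absolute per-pod average, independent of later request changes.
        target = {
            "type": "AverageValue",
            "averageValue": f"{request_m * target_percent // 100}m",
        }
    else:
        # Percentage of the pod's CPU request.
        target = {"type": "Utilization", "averageUtilization": target_percent}
    return {"type": "Resource", "resource": {"name": "cpu", "target": target}}


print(cpu_metric(500, 70))                  # AverageValue form, 350m
print(cpu_metric(500, 70, absolute=False))  # Utilization form, 70
```

If the request is later lowered from 500m to 250m, the `Utilization` form starts scaling at 175m per pod, while the `AverageValue` form stays at 350m.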

Why is memory-based autoscaling less reliable than CPU-based?

Memory is not compressible: a pod using 400 MiB of a 512 MiB limit is not necessarily overloaded, it may just be holding memory. Many apps never release memory to the OS even when idle, so usage stays high after load drops. This causes HPA to scale out unnecessarily and can keep it from scaling back in. Use memory-based HPA only for apps with predictable memory growth patterns.

How do I prevent HPA scale flapping?

Set a stabilization window: `behavior.scaleDown.stabilizationWindowSeconds: 300` (5 minutes, which is also the default). This prevents rapid scale-in after a brief traffic drop. Also check that the kube-controller-manager flags `--horizontal-pod-autoscaler-sync-period` (default 15s) and `--horizontal-pod-autoscaler-downscale-stabilization` (default 5m) suit your workload.
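The effect of the window can be illustrated with a simplified model. For scale-down, the controller uses the highest recommendation seen within the window, so a brief dip does not remove pods. The sketch below assumes one recommendation per 15s sync period and no scale-up delay; the function name and data layout are illustrative, not the controller's actual code.

```python
def stabilized_replicas(current, recommendations, now, window_seconds=300):
    """Apply a scale-down stabilization window to raw HPA recommendations.

    recommendations: list of (timestamp_seconds, desired_replicas), oldest first,
    with the newest entry taken at time `now`.
    """
    latest = recommendations[-1][1]
    # Only recommendations made within the window count.
    in_window = [r for t, r in recommendations if now - t <= window_seconds]
    highest = max(in_window)

    if latest > current:
        return latest      # scale-up is not delayed in this sketch
    if highest < current:
        return highest     # every recent recommendation asks for fewer pods
    return current         # a recent recommendation still wants the current size


# Load drops at t=30s: raw recommendations fall from 6 to 2, sampled every 15s.
history = [(0, 6), (15, 6)] + [(t, 2) for t in range(30, 331, 15)]

print(stabilized_replicas(6, history[:3], now=30))  # 6: the window still contains a 6
print(stabilized_replicas(6, history, now=330))     # 2: nothing above 2 in the last 300s
```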
