K8sCalc


Kubernetes HPA Generator

Generate a Kubernetes HorizontalPodAutoscaler manifest targeting CPU and/or memory utilisation. Supports custom metrics and scale-down stabilisation.

Kubernetes HorizontalPodAutoscaler

HPA periodically checks pod metrics (every 15 seconds by default) and adjusts the replica count of a Deployment, StatefulSet, or other scalable workload to keep average utilisation close to a target.
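For illustration, here is a minimal autoscaling/v2 manifest of the kind described above. It is a sketch, not the generator's exact output: the Deployment name my-app, the replica bounds, and the 70% CPU / 80% memory targets are all assumptions. When several metrics are listed, HPA computes a desired replica count for each and uses the largest.

```yaml
# Hypothetical example: names, bounds and targets are illustrative assumptions
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa              # hypothetical HPA name
spec:
  scaleTargetRef:               # the workload whose replica count HPA adjusts
    apiVersion: apps/v1
    kind: Deployment
    name: my-app                # hypothetical Deployment name
  minReplicas: 2                # never scale below this
  maxReplicas: 10               # never scale above this
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # % of the pods' CPU requests
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80   # % of the pods' memory requests
```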

How it works

desired_replicas = ceil(current_replicas × (current_metric / target_metric))

If 3 pods are averaging 90% CPU utilisation and the target is 70%: desired = ceil(3 × 90/70) = ceil(3.86) = 4 pods
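Two more cases under the same formula, with illustrative numbers: a scale-down, and a reading close enough to the target that HPA leaves the replica count alone. The second case relies on Kubernetes' default tolerance of 0.1 on the ratio current_metric / target_metric; without it, the formula alone would give ceil(4 × 75/70) = ceil(4.29) = 5 pods.

```latex
\begin{aligned}
&\text{Scale-down: 4 pods at } 35\%,\ \text{target } 70\%: && \left\lceil 4 \times \tfrac{35}{70} \right\rceil = \lceil 2 \rceil = 2 \text{ pods} \\
&\text{Within tolerance: 4 pods at } 75\%,\ \text{target } 70\%: && \left| \tfrac{75}{70} - 1 \right| \approx 0.071 \le 0.1 \;\Rightarrow\; \text{stay at 4 pods}
\end{aligned}
```

The scale-down result is a recommendation only; it is still subject to the scale-down stabilisation window.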

Scale-up vs scale-down

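| | Scale-up | Scale-down |
|---|---|---|
| Default timing | No stabilisation window; policies evaluated per 15s period | 300s stabilisation window |
| Behaviour | Aggressive | Conservative |
| Why | Avoid overload | Avoid thrashing |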
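The defaults in the table correspond to the following behavior section, written out explicitly as documented for autoscaling/v2; omitting behavior has the same effect. It is a fragment to place under the spec of a HorizontalPodAutoscaler, and editing these values is how scale-up and scale-down speed are tuned.

```yaml
# Fragment for the spec of an autoscaling/v2 HPA: the documented default behaviour, written out
behavior:
  scaleUp:
    stabilizationWindowSeconds: 0     # react to rising load immediately
    policies:
      - type: Percent
        value: 100                    # allow adding up to 100% of current replicas...
        periodSeconds: 15             # ...per 15s period
      - type: Pods
        value: 4                      # or up to 4 pods per 15s period
        periodSeconds: 15
    selectPolicy: Max                 # apply whichever policy allows the larger change
  scaleDown:
    stabilizationWindowSeconds: 300   # use the highest recommendation from the last 5 minutes
    policies:
      - type: Percent
        value: 100                    # may remove up to 100% of current replicas per 15s (minReplicas still applies)
        periodSeconds: 15
```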

Custom metrics (KEDA)

For queue depth, request rate, or other external metrics, consider KEDA (Kubernetes Event-driven Autoscaling), which creates and manages an HPA for the target workload from a ScaledObject:

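```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: my-app-scaler          # hypothetical name
spec:
  scaleTargetRef:
    name: my-app               # hypothetical Deployment to scale
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring.svc:9090   # assumed Prometheus address
        query: sum(rate(http_requests_total[2m]))
        threshold: "100"
```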
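Without KEDA, an autoscaling/v2 HPA can also scale on a per-pod custom metric directly, provided a custom metrics adapter (for example prometheus-adapter) exposes that metric through the custom metrics API. In this sketch the metric name http_requests_per_second and the target of 100 per pod are assumptions; the adapter must already publish a metric under that name.

```yaml
# Fragment for spec.metrics of an autoscaling/v2 HPA.
# Assumes a custom metrics adapter publishes the hypothetical per-pod metric below.
metrics:
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second   # hypothetical metric name
      target:
        type: AverageValue
        averageValue: "100"              # target average value per pod
```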

Prerequisites

```bash
# Install metrics-server (required for CPU/memory HPA)
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

# Verify that resource metrics are available
kubectl top pods
kubectl top nodes
```

Frequently Asked Questions

What's the difference between HPA and VPA?

HPA (HorizontalPodAutoscaler) adds or removes pod replicas. VPA (VerticalPodAutoscaler) sets the CPU/memory requests of the pods themselves. HPA suits workloads that can spread load across replicas, typically stateless services; VPA is useful for single-replica or stateful workloads that cannot scale horizontally. They can be used together, but avoid having both act on the same CPU or memory metric, since each would keep reacting to the other's changes.
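One way to combine them without conflict is to run VPA in recommendation-only mode while HPA does the scaling: with updateMode "Off", VPA computes suggested requests but never changes running pods. A minimal sketch, assuming the VPA components are installed in the cluster and a Deployment named my-app (hypothetical) exists.

```yaml
# Assumes the VPA CRDs and recommender are installed; my-app is a hypothetical Deployment
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Off"    # only publish recommendations; never evict or resize pods
```

The recommendations appear in the object's status and can be inspected with kubectl describe vpa my-app-vpa.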

Why does HPA need metrics-server?

HPA reads resource metrics (CPU, memory) from the resource metrics API (metrics.k8s.io), which metrics-server provides. Without it, HPA cannot observe utilisation and will not scale on CPU or memory. Install it with: kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

What does the stabilisation window prevent?

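Without a scale-down stabilisation window, HPA could remove replicas as soon as load dips briefly, for example between bursts of traffic, and then add them back when traffic returns, causing thrashing (replica counts flapping up and down). With the default 300s window, HPA looks at all replica recommendations from the last 5 minutes and scales down only to the highest of them, so replicas are reduced only after load has stayed low for the whole window.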

What's a good CPU target percentage?

70% is a common starting point. It leaves 30% headroom relative to each pod's CPU request while still triggering scale-out early enough to absorb load while new pods start. Setting it too high (above 85%) means you scale out too late and requests queue. Setting it too low (below 50%) wastes compute on pods that are mostly idle.
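Utilisation targets are percentages of each container's resource request, not its limit, so the target workload must set requests for every resource the HPA tracks; if a container lacks the relevant request, HPA cannot compute utilisation for that metric and takes no action on it. A sketch of the relevant part of a Deployment's pod template, with illustrative values: with a 500m CPU request and a 70% target, HPA aims to keep average usage near 350m per pod.

```yaml
# Fragment of a Deployment's pod template (spec.template.spec); values are illustrative
containers:
  - name: app                 # hypothetical container name
    image: my-app:1.0         # hypothetical image
    resources:
      requests:
        cpu: 500m             # with a 70% target, HPA aims for ~350m average per pod
        memory: 256Mi         # memory utilisation is measured against this request
```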