Kubernetes HPA Generator

Generate a Kubernetes HorizontalPodAutoscaler manifest targeting CPU and/or memory utilisation. Supports custom metrics and scale-down stabilisation.

Generator

Inputs

HPA Name

Namespace

Deployment Name

The Deployment this HPA manages.

Min Replicas

Max Replicas

CPU Target Utilisation

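Scale out when average CPU utilisation across pods, measured as a percentage of each pod's CPU request, exceeds this threshold. The target Deployment's containers must therefore declare CPU requests.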

Also scale on Memory

Scale-down stabilisation window

Prevents thrashing — HPA won't scale down until this window passes with consistently low load.

Output — hpa.yaml

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Percent
          value: 50
          periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
        - type: Percent
          value: 100
          periodSeconds: 15
        - type: Pods
          value: 4
          periodSeconds: 15
      selectPolicy: Max
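
When "Also scale on Memory" is enabled, a second entry would typically be appended to the metrics list. A minimal sketch, assuming an 80% memory target (an illustrative value, not necessarily the generator's default):

```yaml
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    # Assumed example: memory target of 80% of each pod's memory request
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
```

With more than one metric, HPA calculates a desired replica count for each and uses the largest.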

Kubernetes HorizontalPodAutoscaler

The HPA controller periodically checks pod metrics (every 15 seconds by default) and adjusts the replica count of a Deployment (or StatefulSet) to keep the observed value close to a target utilisation.

How it works

desired_replicas = ceil(current_replicas × (current_metric / target_metric))

If 3 pods are running at 90% CPU and target is 70%: desired = ceil(3 × 90/70) = ceil(3.86) = 4 pods

Scale-up vs scale-down

	Scale-up	Scale-down
Default stabilisation window	0s (policies use 15s periods)	300s
Behaviour	Aggressive	Conservative
Why	Avoid overload	Avoid thrashing

Custom metrics (KEDA)

For queue depth, request rate, or other external metrics, consider KEDA (Kubernetes Event-driven Autoscaling), which creates and manages an HPA for you:

yaml

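apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: my-app-scaler
  namespace: default
spec:
  scaleTargetRef:
    name: my-app              # Deployment to scale
  triggers:
    - type: prometheus
      metadata:
        # Placeholder address; point this at your own Prometheus service
        serverAddress: http://prometheus.monitoring.svc:9090
        query: sum(rate(http_requests_total[2m]))
        threshold: "100"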
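
A plain HPA can also target a per-pod custom metric without KEDA, provided a custom metrics API adapter (for example prometheus-adapter) is installed in the cluster. A minimal sketch of the metrics entry, assuming a hypothetical metric named http_requests_per_second is exposed by that adapter:

```yaml
  metrics:
    - type: Pods
      pods:
        metric:
          # Hypothetical metric name; it must match what your adapter exposes
          name: http_requests_per_second
        target:
          type: AverageValue
          # Scale so that each pod averages about 100 requests per second
          averageValue: "100"
```

This fragment replaces or extends the metrics list in an autoscaling/v2 HorizontalPodAutoscaler spec.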

Prerequisites

bash

# Install metrics-server (required for CPU/memory HPA)
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

# Verify that metrics are being collected
kubectl top pods
kubectl top nodes

Key Terms

kubeadm

A tool for bootstrapping Kubernetes clusters. It automates the setup of control plane components and joining worker nodes, following Kubernetes best practices.

etcd

A distributed key-value store used by Kubernetes to store all cluster state and configuration. etcd is the single source of truth for the entire cluster.

cert-manager

A Kubernetes controller for automating TLS certificate management. cert-manager can issue certificates from Let's Encrypt, Vault, or internal CAs, and automatically renews them.

Helm

A package manager for Kubernetes. Helm charts bundle Kubernetes manifests into reusable packages with configurable values, versioned and published to chart repositories.

Frequently Asked Questions

What's the difference between HPA and VPA?

HPA (HorizontalPodAutoscaler) changes the number of pod replicas. VPA (VerticalPodAutoscaler) adjusts the CPU/memory requests of existing pods. HPA suits workloads that can spread load across replicas; VPA is useful for single-replica stateful workloads or when you can't scale horizontally. They can be used together, but not on the same resource metric: if HPA scales on CPU utilisation while VPA changes CPU requests, the two will work against each other.
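
One cautious way to combine the two is to run VPA in recommendation-only mode, restricted to memory, while HPA scales on CPU. A minimal sketch, assuming the VPA components are installed in the cluster and reusing the my-app Deployment name from the generator output (my-app-vpa is a hypothetical name):

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa            # hypothetical name
  namespace: default
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    # "Off" only publishes recommendations; pods are never evicted or resized
    updateMode: "Off"
  resourcePolicy:
    containerPolicies:
      - containerName: "*"
        # Leave CPU to the HPA; VPA only looks at memory
        controlledResources: ["memory"]
```

The recommendations then appear in the VPA object's status and can be applied to the Deployment manually.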

Why does HPA need metrics-server?

HPA reads resource metrics (CPU, memory) via the Kubernetes metrics API, which is provided by metrics-server. Without it, HPA cannot observe utilisation and will not scale. Install it with: kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

What does the stabilisation window prevent?

Without a scale-down stabilisation window, HPA could remove replicas during a brief dip in load and then add them back when traffic returns, causing thrashing. With the default 300s window, HPA acts on the highest replica recommendation from the previous 5 minutes, so it only scales down once load has stayed low for that long.

What's a good CPU target percentage?

70% is a common starting point. Utilisation is measured against each pod's CPU request, so a 70% target triggers scale-out while pods still have headroom relative to what they requested. Setting it too high (>85%) means you scale out too late and requests queue. Too low (<50%) means you waste compute on pods that are mostly idle.
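
A target percentage only has meaning relative to the container's CPU request in the Deployment. A sketch of the matching container fragment, with assumed request values chosen purely for illustration:

```yaml
    spec:
      containers:
        - name: my-app
          image: my-app:latest    # placeholder image
          resources:
            requests:
              # Assumed value: with a 70% target, HPA scales out once
              # average usage across pods exceeds 350m (70% of 500m)
              cpu: 500m
              memory: 256Mi
```

If a container has no CPU request, HPA cannot compute utilisation for that pod and the CPU metric will not drive scaling.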

Related Calculators

HPA Thresholds K8s Node Sizing

Related Generators

K8s Deployment K8s PDB K8s StatefulSet

Related Guides

CI/CD for Kubernetes with GitHub Actions: A Complete Guide (2026)

ArgoCD vs Flux: Choosing a GitOps Tool for Kubernetes in 2026

Hetzner vs DigitalOcean for Kubernetes in 2026: An Honest Comparison
