Kubernetes Uptime SLA Calculator

Convert SLA percentages to real downtime. See exactly how many minutes of downtime 99.9%, 99.95%, and 99.99% actually allow per year, month, week, and day.

Understanding SLA Nines

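SLA targets are expressed as the percentage of time the system is available. The "nines" shorthand counts the 9s in that percentage: 99.9% is "three nines", 99.99% is "four nines", and each extra nine cuts the allowed downtime by a factor of ten.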

Downtime Budget Reference

SLA	Downtime/year	Downtime/month	Downtime/day
99.0%	3.65 days	7.3 hours	14.4 min
99.5%	1.83 days	3.6 hours	7.2 min
99.9%	8.76 hours	43.8 min	1.44 min
99.95%	4.38 hours	21.9 min	43.2 sec
99.99%	52.6 min	4.4 min	8.6 sec
99.999%	5.26 min	26.3 sec	0.86 sec
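
The figures in the table can be reproduced with a few lines of Python. This is a minimal sketch, not the calculator's actual implementation: the function name `downtime_budget` is hypothetical, and it assumes a 365-day year with a month taken as one twelfth of that year, which matches the rounded values above.

```python
# Period lengths in seconds. Assumption: a 365-day year, and a "month"
# defined as 1/12 of that year (730 hours).
PERIOD_SECONDS = {
    "year": 365 * 24 * 3600,
    "month": 365 * 24 * 3600 / 12,
    "week": 7 * 24 * 3600,
    "day": 24 * 3600,
}


def downtime_budget(sla_percent: float) -> dict[str, float]:
    """Return the allowed downtime in seconds for each period."""
    if not 0 <= sla_percent <= 100:
        raise ValueError("sla_percent must be between 0 and 100")
    unavailable_fraction = 1 - sla_percent / 100
    return {name: secs * unavailable_fraction for name, secs in PERIOD_SECONDS.items()}


budget = downtime_budget(99.9)
print(f"year:  {budget['year'] / 3600:.2f} hours")
print(f"month: {budget['month'] / 60:.1f} minutes")
print(f"week:  {budget['week'] / 60:.1f} minutes")
print(f"day:   {budget['day'] / 60:.2f} minutes")
```

Returning seconds keeps the function unit-neutral. The caller picks hours, minutes, or seconds depending on how small the budget is.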

Kubernetes SLA Layers

When every layer must be up for the service to work and the layers fail independently, your overall SLA is the product of all layers:

Total SLA = control_plane_sla × worker_sla × network_sla × storage_sla

If each layer is 99.99%, your total is 0.9999⁴ ≈ 0.9996, or 99.96%. Four layers of four nines still give only three nines overall.
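
The multiplication above can be sketched as follows. The helper name `composite_sla` is hypothetical, the layer names simply mirror the formula, and the calculation assumes that layers fail independently and that all of them are required for the service to be up.

```python
import math


def composite_sla(layer_slas: dict[str, float]) -> float:
    """Multiply per-layer availabilities (in percent) into a total percent.

    Assumes serial dependencies and independent failures.
    """
    return 100 * math.prod(sla / 100 for sla in layer_slas.values())


layers = {
    "control_plane": 99.99,
    "worker": 99.99,
    "network": 99.99,
    "storage": 99.99,
}
total = composite_sla(layers)
print(f"total SLA: {total:.2f}%")
```

Swapping one value for a weaker layer, for example a 99.9% network, shows how quickly the weakest layer dominates the total.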

What Breaks 99.9%

› A single unplanned node reboot: ~5 min (if pods reschedule immediately)
› A botched rolling update: 10–30 min
› etcd quorum loss: hours (requires manual recovery)
› Network partition: depends on failure-detection timeouts plus recovery time
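
To put these durations in perspective, the sketch below compares an incident against the monthly budget for a given SLA. The function name `budget_used` is hypothetical, the month is assumed to be one twelfth of a 365-day year, and the incident durations are the rough figures from the list above, not measurements.

```python
MINUTES_PER_MONTH = 365 * 24 * 60 / 12  # assumption: 1/12 of a 365-day year


def budget_used(sla_percent: float, incident_minutes: float) -> float:
    """Fraction of the monthly downtime budget that one incident consumes."""
    budget_minutes = MINUTES_PER_MONTH * (1 - sla_percent / 100)
    return incident_minutes / budget_minutes


# Illustrative durations taken from the list above.
incidents = {
    "unplanned node reboot": 5,
    "botched rolling update (worst case)": 30,
}
for name, minutes in incidents.items():
    share = budget_used(99.9, minutes)
    print(f"{name}: {share:.0%} of the monthly 99.9% budget")
```

A value above 1.0 means a single incident has already broken the target for that month.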

PodDisruptionBudget for Planned Maintenance

yaml

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  minAvailable: 2   # keep at least 2 pods up during node drain
  selector:
    matchLabels:
      app: my-app
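
How much room a `minAvailable` budget leaves depends on the number of healthy pods at the time of the drain. The sketch below is a simplified model of that arithmetic, not the Kubernetes controller's code: the function name `allowed_disruptions` is hypothetical, and percentages and `maxUnavailable` are not handled.

```python
def allowed_disruptions(healthy_pods: int, min_available: int) -> int:
    """Voluntary evictions that a minAvailable budget permits right now."""
    return max(0, healthy_pods - min_available)


# With minAvailable: 2, three healthy replicas allow one eviction at a time.
print(allowed_disruptions(healthy_pods=3, min_available=2))

# With only two replicas the budget allows none, so a node drain will block.
print(allowed_disruptions(healthy_pods=2, min_available=2))
```

In practice this means the workload needs more replicas than `minAvailable`, otherwise planned maintenance cannot evict any pod.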

Key Terms

kubeadm

A tool for bootstrapping Kubernetes clusters. It automates the setup of control plane components and joining worker nodes, following Kubernetes best practices.

etcd

A distributed key-value store used by Kubernetes to store all cluster state and configuration. etcd is the single source of truth for the entire cluster.

cert-manager

A Kubernetes controller for automating TLS certificate management. cert-manager can issue certificates from Let's Encrypt, Vault, or internal CAs, and automatically renews them.

Helm

A package manager for Kubernetes. Helm charts bundle Kubernetes manifests into reusable packages with configurable values, versioned and published to chart repositories.

Frequently Asked Questions

What does 99.9% uptime actually mean?

99.9% uptime allows 8.76 hours of downtime per year, which works out to about 43.8 minutes per month or 10 minutes per week. It sounds high but is a fairly modest target. Most Kubernetes clusters running on self-managed infrastructure realistically achieve 99.9–99.95%.
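
The same relationship can be run in reverse to see what availability a cluster actually delivered. This is a minimal sketch with a hypothetical function name, `observed_availability`, and an assumed 30-day measurement window.

```python
def observed_availability(downtime_minutes: float, period_days: float = 30) -> float:
    """Availability in percent, given total downtime over a measurement window."""
    total_minutes = period_days * 24 * 60
    if not 0 <= downtime_minutes <= total_minutes:
        raise ValueError("downtime must be between 0 and the window length")
    return 100 * (1 - downtime_minutes / total_minutes)


# Illustrative input: 43.2 minutes of downtime in a 30-day window.
print(f"{observed_availability(43.2):.2f}%")
```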

What SLA does a managed Kubernetes service guarantee?

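Major cloud providers typically guarantee 99.9–99.95% for the managed control plane. GKE, EKS, and AKS all offer around 99.95% for the API server, although the exact figure varies by pricing tier and cluster topology, so check the provider's current SLA. Note that worker node uptime is covered separately and depends on your instance types. Spot/preemptible nodes can be reclaimed by the provider at short notice, so do not count them toward your SLA.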

Can I achieve 99.99% on self-hosted Kubernetes?

Yes, but it requires significant investment: multi-region or multi-AZ control plane, redundant load balancers, automated failover, and rigorous change management. Most self-hosted setups targeting 99.99% use a dedicated HA etcd cluster, 3+ control plane nodes across AZs, and active-active load balancers (Keepalived + HAProxy).
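
The reason for running three or more control plane or etcd nodes can be illustrated with a quorum calculation. This is a sketch under strong assumptions: the function name `quorum_availability` is hypothetical, the per-node availability of 99.9% is an illustrative number, and node failures are treated as independent, which correlated failures (a shared AZ, a bad rollout) will violate.

```python
from math import comb


def quorum_availability(node_availability: float, nodes: int) -> float:
    """Probability that a majority of nodes is up, assuming independent failures."""
    quorum = nodes // 2 + 1
    return sum(
        comb(nodes, up)
        * node_availability**up
        * (1 - node_availability) ** (nodes - up)
        for up in range(quorum, nodes + 1)
    )


# Assumed per-node availability of 99.9%, purely for illustration.
for n in (1, 3, 5):
    print(f"{n} node(s): {quorum_availability(0.999, n):.6%}")
```

The model only covers node failures. Operator error and botched changes are not captured, which is why the answer above also stresses change management.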

How does planned maintenance affect my SLA budget?

Planned maintenance counts against your SLA unless you have zero-downtime rolling upgrades. For 99.9% (8.76 hrs/year, or about 526 minutes), a single 30-minute maintenance window consumes about 5.7% of your annual budget. Use rolling updates, PodDisruptionBudgets, and the Kubernetes upgrade path planner to minimise disruption.
