etcd Backup CronJob Generator

Generate a Kubernetes CronJob that automatically takes etcd snapshots and uploads them to S3. Includes the Secret, ServiceAccount, and complete CronJob YAML.

Automated etcd Backup Strategy

etcd is the only stateful component of a Kubernetes control plane. If you lose etcd data without a backup, you lose the entire cluster state — all Deployments, Services, Secrets, PVCs, and CRDs.

Why Automated Backups

Manual backups get skipped. Automated CronJobs run reliably even when your team is focused elsewhere. The generated CronJob:

  • Runs on a schedule you control
  • Uses etcdctl to take a verified snapshot
  • Uploads to S3 with a date-stamped filename
  • Prunes old backups to control storage costs
  • Logs every step for auditability
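The steps in the list above can be sketched as a small shell script. This is a minimal illustration and not the generator's actual output: the bucket name, key prefix, file naming scheme, and the helper names `snapshot_key`, `expired_keys`, and `run_backup` are all assumptions, the etcd TLS flags are left out for brevity, and the cutoff calculation relies on GNU `date`.

```shell
#!/bin/sh
# Hypothetical backup flow; every name below is illustrative.
set -eu

BUCKET="${BUCKET:-my-etcd-backups}"      # assumed bucket name
PREFIX="${PREFIX:-etcd}"                 # assumed key prefix
RETENTION_DAYS="${RETENTION_DAYS:-7}"

# Timestamped log line for each step.
log() { printf '%s %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$*"; }

# Date-stamped object key, e.g. etcd/etcd-snapshot-20240131T020000Z.db
snapshot_key() { printf '%s/etcd-snapshot-%s.db' "$PREFIX" "$1"; }

# Read object names on stdin and print those whose embedded date
# (YYYYMMDD) is older than the cutoff given as $1.
expired_keys() {
  cutoff="$1"
  while IFS= read -r key; do
    stamp="${key##*etcd-snapshot-}"      # drop everything up to the date
    stamp="${stamp%%T*}"                 # keep only YYYYMMDD
    case "$stamp" in
      [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9])
        if [ "$stamp" -lt "$cutoff" ]; then printf '%s\n' "$key"; fi ;;
    esac
  done
}

run_backup() {
  stamp="$(date -u +%Y%m%dT%H%M%SZ)"
  file="/tmp/etcd-snapshot-${stamp}.db"
  log "saving snapshot to ${file}"
  etcdctl snapshot save "$file"          # TLS flags omitted in this sketch
  log "checking snapshot"
  etcdctl snapshot status "$file"
  log "uploading"
  aws s3 cp "$file" "s3://${BUCKET}/$(snapshot_key "$stamp")"
  log "pruning backups older than ${RETENTION_DAYS} days"
  cutoff="$(date -u -d "${RETENTION_DAYS} days ago" +%Y%m%d)"   # GNU date syntax
  aws s3 ls "s3://${BUCKET}/${PREFIX}/" | awk '{print $4}' | expired_keys "$cutoff" |
    while IFS= read -r old; do
      log "deleting ${old}"
      aws s3 rm "s3://${BUCKET}/${PREFIX}/${old}"
    done
  log "done"
}

# Run only when asked, so the helpers can also be sourced on their own.
if [ "${RUN_BACKUP:-0}" = "1" ]; then run_backup; fi
```

Pruning by the date embedded in the object name keeps the retention rule independent of S3 metadata; an S3 lifecycle rule on the bucket would be an alternative way to expire old objects.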

Snapshot Verification

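The CronJob runs etcdctl snapshot status after taking the snapshot and before uploading it. The command opens the snapshot file and reports its hash, revision, total key count, and size, so a truncated or unreadable file is caught before it reaches S3. On etcd 3.5 and later, this subcommand is deprecated in etcdctl in favor of etcdutl snapshot status. A corrupted backup is worse than no backup because it gives false confidence.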
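One way to gate the upload on the check is a small helper that fails when the file is missing, empty, or rejected by the status command. This is a sketch under assumptions: `verify_snapshot` and the `SNAPSHOT_TOOL` variable are hypothetical names, and the tool defaults to `etcdctl` but can be switched to `etcdutl` on newer etcd releases.

```shell
# Hypothetical helper: succeed only if the snapshot file looks usable.
# SNAPSHOT_TOOL is an assumed variable; set it to etcdutl on etcd 3.5+.
verify_snapshot() {
  file="$1"
  if [ ! -s "$file" ]; then
    echo "snapshot missing or empty: $file" >&2
    return 1
  fi
  # Prints hash, revision, total keys, and size; non-zero exit on failure.
  "${SNAPSHOT_TOOL:-etcdctl}" snapshot status "$file" --write-out=table
}
```

A caller would then upload only on success, for example `verify_snapshot /tmp/snapshot.db && aws s3 cp /tmp/snapshot.db s3://bucket/snapshot.db`.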

Recovery Point Objective

With daily backups, your worst-case data loss (the recovery point objective, RPO) is 24 hours of cluster changes. For critical clusters, back up every 6 hours (0 */6 * * *), which cuts the worst case to 6 hours. Taking a snapshot takes roughly 10 seconds for most clusters and has negligible impact on etcd performance.

S3 Retention Formula

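storage_cost_per_month = snapshot_size_gb × backups_per_day × retention_days × 0.0119

Here 0.0119 is the assumed storage price in € per GB per month, and the product of the first three terms is the amount of data held once the retention window is full.

Example: a 500 MB (0.5 GB) snapshot taken once a day and kept for 7 days occupies 0.5 × 1 × 7 = 3.5 GB, which costs 3.5 × €0.0119 ≈ €0.04/month.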
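The same arithmetic can be scripted to compare schedules. The sketch below is illustrative only: `monthly_cost` is a hypothetical helper, and the price argument is whatever per-GB monthly rate applies to your storage provider.

```shell
# monthly_cost SIZE_GB BACKUPS_PER_DAY RETENTION_DAYS PRICE_PER_GB_MONTH
# Hypothetical helper that applies the retention formula above.
monthly_cost() {
  awk -v s="$1" -v n="$2" -v d="$3" -v p="$4" \
    'BEGIN { printf "%.3f\n", s * n * d * p }'
}

monthly_cost 0.5 1 7 0.0119   # 500 MB snapshot, daily, kept 7 days
monthly_cost 0.5 4 7 0.0119   # same snapshot taken every 6 hours
```

Moving from daily to 6-hourly backups multiplies the stored volume, and therefore the cost, by four for the same retention window.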

etcdctl Authentication

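etcd in kubeadm clusters requires mutual TLS (mTLS) for client connections. The CronJob mounts the PKI directory from the host (/etc/kubernetes/pki/etcd) and authenticates with the healthcheck-client certificate and key. Note that kubeadm does not enable etcd's role-based access control by default, so any client certificate signed by the etcd CA, including this one, has full read and write access. The certificate alone therefore does not restrict the job to read-only operations.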
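A wrapper that adds the TLS flags to every etcdctl call might look like the sketch below. The function name `etcdctl_tls` and the two variables are assumptions made for illustration; the file names are the ones kubeadm normally places in the etcd PKI directory, and the endpoint assumes etcd listens on the local node.

```shell
# Assumed defaults for a kubeadm control plane node.
ETCD_PKI="${ETCD_PKI:-/etc/kubernetes/pki/etcd}"
ETCD_ENDPOINT="${ETCD_ENDPOINT:-https://127.0.0.1:2379}"

# Hypothetical wrapper: run etcdctl with the mTLS client credentials.
etcdctl_tls() {
  etcdctl \
    --endpoints="$ETCD_ENDPOINT" \
    --cacert="$ETCD_PKI/ca.crt" \
    --cert="$ETCD_PKI/healthcheck-client.crt" \
    --key="$ETCD_PKI/healthcheck-client.key" \
    "$@"
}
```

For example, `etcdctl_tls endpoint health` checks connectivity, and `etcdctl_tls snapshot save /tmp/snapshot.db` takes the snapshot.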

Frequently Asked Questions

Why does the CronJob run on control plane nodes?

etcd only runs on control plane nodes. The backup script uses the etcd PKI certificates at /etc/kubernetes/pki/etcd/ which only exist on control plane nodes. The generated CronJob uses a nodeSelector and toleration to ensure it schedules on a control plane node.

How do I verify the backup is working?

Check the CronJob: `kubectl -n kube-system get cronjobs`. Check recent job runs: `kubectl -n kube-system get jobs`. Check logs: `kubectl -n kube-system logs job/etcd-backup-<timestamp>`. Also verify the S3 bucket has new files after the scheduled time.
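Rather than waiting for the next scheduled run, a one-off job can be created from the CronJob's template. This is a sketch that assumes the CronJob is named `etcd-backup` (inferred from the job name pattern above); `run_backup_now` is a hypothetical helper name.

```shell
NS="${NS:-kube-system}"
CRONJOB="${CRONJOB:-etcd-backup}"   # assumed CronJob name

# Hypothetical helper: trigger a manual run, wait for it, print its logs.
run_backup_now() {
  job="${CRONJOB}-manual-$(date +%s)"
  kubectl -n "$NS" create job "$job" --from="cronjob/${CRONJOB}"
  kubectl -n "$NS" wait --for=condition=complete --timeout=300s "job/${job}"
  kubectl -n "$NS" logs "job/${job}"
}
```

If the wait step times out, `kubectl -n kube-system describe job <name>` usually shows whether the pod failed to schedule or the script itself failed.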

How do I restore etcd from an S3 backup?

1. Download the snapshot: `aws s3 cp s3://bucket/snapshot.db /tmp/etcd-restore.db`.
2. Stop the API server: move /etc/kubernetes/manifests/kube-apiserver.yaml out of the manifests directory. Keep the file, because you need it again in step 5.
3. Restore: `etcdctl snapshot restore /tmp/etcd-restore.db --data-dir /var/lib/etcd-restored`.
4. Swap the data directories and restart etcd.
5. Move the API server manifest back into /etc/kubernetes/manifests/.
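The restore steps above can be sketched as a script. Several things here are assumptions rather than part of the original procedure: a kubeadm cluster with a single control plane node, etcd running as a static pod whose manifest is etcd.yaml, a hypothetical parking directory for the manifests, and a fixed wait for the kubelet to stop the containers. On etcd 3.5 and later, `etcdutl snapshot restore` replaces the etcdctl subcommand. Treat it as an outline to adapt, and rehearse it on a test cluster first.

```shell
MANIFESTS="${MANIFESTS:-/etc/kubernetes/manifests}"
PARKED="${PARKED:-/etc/kubernetes/manifests-stopped}"   # hypothetical parking dir
DATA_DIR="${DATA_DIR:-/var/lib/etcd}"
RESTORE_DIR="${RESTORE_DIR:-/var/lib/etcd-restored}"

# Hypothetical helper: restore_etcd S3_URI [LOCAL_FILE]
restore_etcd() {
  snapshot_uri="$1"
  local_db="${2:-/tmp/etcd-restore.db}"

  # 1. Download the snapshot.
  aws s3 cp "$snapshot_uri" "$local_db"

  # 2. Stop the API server (and etcd) by parking their static pod manifests.
  mkdir -p "$PARKED"
  mv "$MANIFESTS/kube-apiserver.yaml" "$MANIFESTS/etcd.yaml" "$PARKED/"
  sleep "${STOP_WAIT_SECONDS:-20}"   # assumed grace period for the kubelet

  # 3. Restore the snapshot into a fresh data directory.
  etcdctl snapshot restore "$local_db" --data-dir "$RESTORE_DIR"

  # 4. Swap data directories, keeping the old one as a fallback.
  mv "$DATA_DIR" "${DATA_DIR}.bak-$(date +%s)"
  mv "$RESTORE_DIR" "$DATA_DIR"
  mv "$PARKED/etcd.yaml" "$MANIFESTS/"

  # 5. Bring the API server back.
  mv "$PARKED/kube-apiserver.yaml" "$MANIFESTS/"
}
```

Keeping the previous data directory under a `.bak-` name makes it possible to roll back if the restored snapshot turns out to be the wrong one.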

Can I use this with Hetzner Object Storage?

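Yes. Hetzner Object Storage is S3-compatible, and the AWS CLI used in the CronJob works with it when pointed at a custom endpoint. Set the S3 endpoint to the Hetzner endpoint for your bucket's location, as shown in the Hetzner Console (typically of the form https://<location>.your-objectstorage.com), and set the region to the matching location code. Do not use an amazonaws.com endpoint or an AWS region such as eu-central-1, since those point at AWS rather than Hetzner. Generate S3 credentials in the Hetzner Console under Object Storage → Access Keys.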
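With the AWS CLI, a non-AWS endpoint is selected with the `--endpoint-url` option. The sketch below is illustrative: `s3_upload` is a hypothetical helper, and the default endpoint is an assumed example for one Hetzner location that should be replaced with the value shown for your bucket. Credentials are read from the usual `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables.

```shell
# Assumed example endpoint; replace with the one shown for your bucket.
S3_ENDPOINT="${S3_ENDPOINT:-https://fsn1.your-objectstorage.com}"

# Hypothetical helper: s3_upload LOCAL_FILE S3_URI
s3_upload() {
  aws s3 cp "$1" "$2" --endpoint-url "$S3_ENDPOINT"
}
```

For example, `s3_upload /tmp/snapshot.db s3://bucket/snapshot.db` sends the file to the configured endpoint instead of AWS.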