etcd Backup CronJob Generator
Generate a Kubernetes CronJob that automatically takes etcd snapshots and uploads them to S3. Includes the Secret, ServiceAccount, and complete CronJob YAML.
Automated etcd Backup Strategy
etcd is the only stateful component of a Kubernetes control plane. If you lose etcd data without a backup, you lose the entire cluster state: all Deployments, Services, Secrets, PersistentVolumeClaim objects (though not the volume data itself), and CRDs.
Why Automated Backups
Manual backups get skipped. Automated CronJobs run reliably even when your team is focused elsewhere. The generated CronJob:
- Runs on a schedule you control
- Uses etcdctl to take a verified snapshot
- Uploads to S3 with a date-stamped filename
- Prunes old backups to control storage costs
- Logs every step for auditability
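The date-stamped naming and pruning steps listed above can be sketched as two pure functions. This is a minimal sketch, not the generator's actual script: the key format `etcd-snapshot-YYYYMMDD-HHMMSS.db` and the function names are hypothetical, and listing or deleting objects in S3 is left to whatever client the CronJob uses.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical naming scheme: <prefix>/etcd-snapshot-YYYYMMDD-HHMMSS.db (UTC).
KEY_FORMAT = "etcd-snapshot-%Y%m%d-%H%M%S.db"


def snapshot_key(prefix: str, taken_at: datetime) -> str:
    """Build a date-stamped object key; `taken_at` should be timezone-aware."""
    stamp = taken_at.astimezone(timezone.utc).strftime(KEY_FORMAT)
    return f"{prefix.rstrip('/')}/{stamp}"


def keys_to_prune(keys: list[str], now: datetime, retention_days: int) -> list[str]:
    """Return keys whose embedded timestamp is older than the retention window.

    Keys that do not match KEY_FORMAT are never selected, so unrelated
    objects under the same prefix are left alone.
    """
    cutoff = now - timedelta(days=retention_days)
    stale = []
    for key in keys:
        name = key.rsplit("/", 1)[-1]
        try:
            taken_at = datetime.strptime(name, KEY_FORMAT).replace(tzinfo=timezone.utc)
        except ValueError:
            continue  # not one of our snapshots
        if taken_at < cutoff:
            stale.append(key)
    return sorted(stale)
```

Parsing the timestamp out of the key, rather than trusting the object's upload time, means a re-uploaded or copied snapshot is still pruned according to when it was actually taken.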
Snapshot Verification
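After taking the snapshot, the CronJob runs `etcdctl snapshot status`, which reads the file and reports its hash, revision, key count, and size. If the file is truncated or unreadable, the command fails, so a damaged snapshot is caught before it is uploaded. A corrupted backup is worse than no backup: it gives false confidence.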
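A minimal sketch of that verification gate in Python, assuming `etcdctl` is on the PATH and that its JSON output uses the etcd 3.x field names `hash`, `revision`, `totalKey`, and `totalSize`. The function names and the "no keys means reject" rule are illustrative choices, not the generator's exact logic. On etcd 3.5 and later, `etcdutl snapshot status` is the preferred replacement for the etcdctl subcommand.

```python
import json
import subprocess


def status_command(snapshot_path: str) -> list[str]:
    """argv for `etcdctl snapshot status` with machine-readable JSON output."""
    return ["etcdctl", "snapshot", "status", snapshot_path, "--write-out=json"]


def check_snapshot_status(status_json: str) -> dict:
    """Parse the JSON status and refuse snapshots that look empty.

    Assumes the etcd 3.x JSON field names hash, revision, totalKey, totalSize.
    """
    status = json.loads(status_json)
    for field in ("hash", "revision", "totalKey", "totalSize"):
        if field not in status:
            raise ValueError(f"missing field {field!r} in snapshot status")
    if status["totalKey"] <= 0 or status["totalSize"] <= 0:
        raise ValueError("snapshot reports no keys; refusing to upload it")
    return status


def verify_snapshot(snapshot_path: str) -> dict:
    """Run etcdctl; check=True raises CalledProcessError on a non-zero exit."""
    result = subprocess.run(
        status_command(snapshot_path), capture_output=True, text=True, check=True
    )
    return check_snapshot_status(result.stdout)
```

The upload step would only run if `verify_snapshot` returns without raising.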
Recovery Point Objective
With daily backups, your worst-case data loss is 24 hours. For critical clusters, use 6-hourly backups (0 */6 * * *), which cuts the worst case to 6 hours. For most clusters a snapshot takes on the order of 10 seconds and has little impact on etcd performance, but it streams the whole database, so measure it on large clusters.
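The worst-case data loss is simply the longest gap between two consecutive runs. This simplified sketch derives it from a cron expression, assuming a single fixed minute and `*` in the day, month, and weekday fields; it only understands hour fields of the form `*`, `*/N`, `a,b,c`, or a single hour, and the function names are hypothetical.

```python
def run_hours(hour_field: str) -> list[int]:
    """Expand a simple cron hour field into the hours (0-23) it fires at.

    Ranges such as "1-5" or "1-23/2" are deliberately not handled.
    """
    if hour_field == "*":
        return list(range(24))
    if hour_field.startswith("*/"):
        return list(range(0, 24, int(hour_field[2:])))
    return sorted({int(part) for part in hour_field.split(",")})


def worst_case_loss_hours(schedule: str) -> int:
    """Longest gap in hours between consecutive runs of a daily-repeating schedule."""
    hours = run_hours(schedule.split()[1])
    # Gaps between runs within one day...
    gaps = [later - earlier for earlier, later in zip(hours, hours[1:])]
    # ...plus the wrap-around from the last run today to the first run tomorrow.
    gaps.append(hours[0] + 24 - hours[-1])
    return max(gaps)
```

Unevenly spaced runs are the reason to take the maximum rather than 24 divided by the run count: `0 2,5 * * *` runs twice a day, yet leaves a 21-hour gap.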
S3 Retention Formula
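$$\text{storage\_cost} = \text{snapshot\_size\_gb} \times \text{backups\_per\_day} \times \text{retention\_days} \times 0.0119$$

where 0.0119 is the storage price in € per GB per month. Example: a 500 MB (0.5 GB) snapshot, 1 backup/day, 7-day retention = 3.5 GB stored × €0.0119 ≈ €0.04/month.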
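The same arithmetic as a small helper, handy for comparing schedules. It assumes the €0.0119/GB-month price from the formula, steady-state retention (the bucket always holds a full retention window), and no request or minimum-storage charges; check your provider's current pricing.

```python
def monthly_storage_cost(
    snapshot_size_gb: float,
    backups_per_day: int,
    retention_days: int,
    price_per_gb_month: float = 0.0119,  # € per GB per month, from the formula above
) -> float:
    """Estimated monthly cost of keeping a full retention window of snapshots."""
    stored_gb = snapshot_size_gb * backups_per_day * retention_days
    return stored_gb * price_per_gb_month


print(f"{monthly_storage_cost(0.5, 1, 7):.2f}")  # prints 0.04
```

Switching the same cluster to 6-hourly backups (4 per day) quadruples the stored volume to 14 GB, or roughly €0.17/month.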
etcdctl Authentication
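etcd in kubeadm clusters uses mTLS. The CronJob mounts the PKI directory from the host (/etc/kubernetes/pki/etcd) and authenticates with the healthcheck-client certificate. Be aware that kubeadm does not enable etcd's role-based access control, so any client certificate signed by the etcd CA, this one included, has full read and write access to etcd. Treat the mounted key as a highly sensitive credential.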
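A sketch of how the snapshot command might be assembled with those certificates, assuming kubeadm's default file names under /etc/kubernetes/pki/etcd and that etcd is reachable on https://127.0.0.1:2379 from the pod (for example with `hostNetwork: true`). The function name is hypothetical; `--endpoints`, `--cacert`, `--cert`, and `--key` are standard etcdctl flags.

```python
import os

PKI_DIR = "/etc/kubernetes/pki/etcd"  # kubeadm default, mounted from the host


def etcdctl_snapshot_save(
    output_path: str, endpoint: str = "https://127.0.0.1:2379"
) -> tuple[list[str], dict[str, str]]:
    """Return (argv, env) for an mTLS-authenticated `etcdctl snapshot save`."""
    argv = [
        "etcdctl",
        f"--endpoints={endpoint}",
        f"--cacert={PKI_DIR}/ca.crt",                  # verifies etcd's server cert
        f"--cert={PKI_DIR}/healthcheck-client.crt",    # client identity
        f"--key={PKI_DIR}/healthcheck-client.key",
        "snapshot", "save", output_path,
    ]
    # Only needed for etcdctl older than 3.4, where the v2 API was the default.
    env = {**os.environ, "ETCDCTL_API": "3"}
    return argv, env
```

The pair can be passed straight to `subprocess.run(argv, env=env, check=True)`.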
Frequently Asked Questions
Why does the CronJob run on control plane nodes?
In kubeadm's default (stacked) topology, etcd runs only on control plane nodes, and the etcd PKI certificates the backup script uses (/etc/kubernetes/pki/etcd/) exist only there. The generated CronJob uses a nodeSelector and a toleration so that it schedules onto a control plane node.
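The scheduling-related part of the pod spec can be sketched as a Python dict and serialized to JSON, which `kubectl apply` also accepts. This assumes the current kubeadm label and taint key `node-role.kubernetes.io/control-plane` (older kubeadm releases used `node-role.kubernetes.io/master`); `hostNetwork` and the `etcd-certs` volume name are illustrative assumptions, not necessarily what the generator emits.

```python
import json


def control_plane_scheduling(pki_dir: str = "/etc/kubernetes/pki/etcd") -> dict:
    """Pod-spec fields that pin a pod to a kubeadm control plane node."""
    return {
        # kubeadm labels control plane nodes with an empty-valued role label.
        "nodeSelector": {"node-role.kubernetes.io/control-plane": ""},
        # ...and taints them so ordinary workloads stay off; tolerate that taint.
        "tolerations": [
            {
                "key": "node-role.kubernetes.io/control-plane",
                "operator": "Exists",
                "effect": "NoSchedule",
            }
        ],
        # Lets the pod reach etcd on the host's 127.0.0.1:2379.
        "hostNetwork": True,
        # Host PKI directory, to be mounted read-only into the backup container.
        "volumes": [
            {"name": "etcd-certs", "hostPath": {"path": pki_dir, "type": "Directory"}}
        ],
    }


print(json.dumps(control_plane_scheduling(), indent=2))
```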
How do I verify the backup is working?
Check the CronJob: `kubectl -n kube-system get cronjobs`. Check recent job runs: `kubectl -n kube-system get jobs`. Check logs: `kubectl -n kube-system logs job/etcd-backup-<timestamp>`. Also verify the S3 bucket has new files after the scheduled time.
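The "new files in S3" check can be automated. A minimal sketch, assuming you save the output of `aws s3api list-objects-v2 --bucket <bucket> --prefix <prefix>` and feed it in; the function names and the grace period are hypothetical choices.

```python
import json
from datetime import datetime, timedelta
from typing import Optional


def _parse_last_modified(value: str) -> datetime:
    # Depending on CLI version, LastModified may end in "+00:00" or "Z";
    # datetime.fromisoformat only accepts "Z" from Python 3.11 onwards.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def newest_backup_age(list_objects_json: str, now: datetime) -> Optional[timedelta]:
    """Age of the most recent object in list-objects-v2 JSON, or None if empty."""
    contents = json.loads(list_objects_json).get("Contents", [])
    if not contents:
        return None
    newest = max(_parse_last_modified(obj["LastModified"]) for obj in contents)
    return now - newest


def backup_is_fresh(list_objects_json: str, now: datetime, max_age: timedelta) -> bool:
    """True if at least one backup exists and the newest is within max_age."""
    age = newest_backup_age(list_objects_json, now)
    return age is not None and age <= max_age
```

For a daily schedule, a `max_age` of about 25 hours (the interval plus some slack for job runtime) avoids false alarms just before the next run.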
How do I restore etcd from an S3 backup?
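1. Download the snapshot: `aws s3 cp s3://bucket/snapshot.db /tmp/etcd-restore.db`.
2. Stop the API server: move /etc/kubernetes/manifests/kube-apiserver.yaml out of the manifests directory (keep the file; it goes back in step 5).
3. Restore: `etcdctl snapshot restore /tmp/etcd-restore.db --data-dir /var/lib/etcd-restored`.
4. Swap the data directories (move /var/lib/etcd aside and rename /var/lib/etcd-restored to /var/lib/etcd), then restart etcd.
5. Move the API server manifest back into /etc/kubernetes/manifests.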
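Those steps can be captured as a dry-run runbook that prints each command for review before anyone runs it. This sketch assumes a single control plane node with kubeadm's default paths; it also moves etcd.yaml out of the manifests directory so the kubelet stops etcd during the swap, which is one way to do the "restart etcd" step. The staging directory is hypothetical. Multi-member clusters need extra `snapshot restore` flags (member name, initial cluster), and on etcd 3.5+ `etcdutl snapshot restore` is the preferred command.

```python
import shlex


def restore_plan(
    snapshot_url: str,
    manifests: str = "/etc/kubernetes/manifests",
    data_dir: str = "/var/lib/etcd",
) -> list[list[str]]:
    """Ordered argv lists for a single-node restore; nothing is executed."""
    staged = "/tmp/k8s-manifests-stopped"  # hypothetical holding directory
    restored = f"{data_dir}-restored"
    return [
        ["aws", "s3", "cp", snapshot_url, "/tmp/etcd-restore.db"],
        ["mkdir", "-p", staged],
        # The kubelet stops static Pods whose manifests leave this directory.
        # Wait until the etcd and API server containers are gone before continuing.
        ["mv", f"{manifests}/kube-apiserver.yaml", f"{manifests}/etcd.yaml", staged],
        ["etcdctl", "snapshot", "restore", "/tmp/etcd-restore.db", f"--data-dir={restored}"],
        ["mv", data_dir, f"{data_dir}.bak"],  # keep the old data as a fallback
        ["mv", restored, data_dir],
        # Putting the manifests back restarts etcd and the API server.
        ["mv", f"{staged}/etcd.yaml", f"{staged}/kube-apiserver.yaml", manifests],
    ]


for step in restore_plan("s3://bucket/snapshot.db"):
    print(shlex.join(step))
```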
Can I use this with Hetzner Object Storage?
Yes. Hetzner Object Storage is S3-compatible, and the AWS CLI (used in the CronJob) works with it once it points at Hetzner instead of AWS: set the S3 endpoint to your bucket's location, for example https://fsn1.your-objectstorage.com, and the region to the matching location code (fsn1 in that example). Generate S3 credentials in the Hetzner Console under Object Storage → Access Keys.
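A sketch of the upload command against a non-AWS endpoint, using the AWS CLI's global `--endpoint-url` and `--region` options. The endpoint and region defaults are assumed example values for Hetzner's Falkenstein location; substitute the ones for your bucket. Credentials would come from the standard `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables, for example populated from the Secret.

```python
def upload_command(
    local_path: str,
    bucket: str,
    key: str,
    endpoint_url: str = "https://fsn1.your-objectstorage.com",  # assumed example
    region: str = "fsn1",                                      # assumed example
) -> list[str]:
    """argv for uploading a snapshot to an S3-compatible endpoint."""
    return [
        "aws", "s3", "cp", local_path, f"s3://{bucket}/{key}",
        "--endpoint-url", endpoint_url,  # send requests to the provider, not AWS
        "--region", region,
    ]
```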