Kubernetes Upgrade Strategy
A Kubernetes cluster is usually meant to last for years. However, each Kubernetes minor version is only supported for about one year, so the cluster needs to be upgraded regularly. This is a good opportunity to adopt a continuous upgrade strategy, which helps limit technical debt in the long run.
Two strategies are possible: in-place upgrades or greenfield (also known as blue/green) upgrades.
A greenfield approach avoids dealing with complex Kubernetes upgrades by reinstalling the cluster from scratch for every upgrade. The automation brought by the DevOps Stack makes it easy to adopt a greenfield approach, since the whole stack is managed as code.
The DevOps Stack lets you run one cluster per Git branch (using one Terraform workspace per branch), so either strategy can be used.
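The branch-to-workspace mapping can be sketched in Python. This is a hypothetical helper, not the DevOps Stack's own tooling: the naming convention in `workspace_for_branch` is an assumption, and `terraform workspace select -or-create` requires Terraform 1.4 or later. Branch names are sanitized because Terraform rejects workspace names that are not URL-path-safe, such as `feature/x`.

```python
import re
import subprocess


def workspace_for_branch(branch: str) -> str:
    """Derive a Terraform workspace name from a Git branch name.

    Hypothetical convention: lowercase the branch and replace every run of
    characters outside [a-z0-9_-] with "-", since Terraform refuses
    workspace names containing characters such as "/".
    """
    name = re.sub(r"[^a-z0-9_-]+", "-", branch.lower()).strip("-")
    if not name:
        raise ValueError(f"cannot derive a workspace name from {branch!r}")
    return name


def current_branch() -> str:
    # Ask Git for the name of the checked-out branch.
    out = subprocess.run(
        ["git", "rev-parse", "--abbrev-ref", "HEAD"],
        check=True, capture_output=True, text=True,
    )
    return out.stdout.strip()


def select_workspace(branch: str) -> None:
    # Switch to the branch's workspace, creating it if it does not exist yet
    # ("-or-create" is available from Terraform 1.4 onwards).
    subprocess.run(
        ["terraform", "workspace", "select", "-or-create",
         workspace_for_branch(branch)],
        check=True,
    )
```

Calling `select_workspace(current_branch())` before `terraform plan` or `terraform apply` gives each branch (for example "blue" and "green") its own state, and therefore its own cluster, provided resource names include `terraform.workspace` so the two clusters do not collide.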
Below is a comparison of the two approaches.
In-place Upgrade
The in-place upgrade strategy consists of upgrading the Kubernetes cluster while it is running.
Cons:
- Something could go wrong during the upgrade, forcing you to reinstall and redeploy everything from scratch.
- Mitigating this risk requires a lab infrastructure that follows exactly the same upgrade path, so that the upgrade can be tested on it beforehand.
- The disaster recovery strategy is hardly ever tested.
Greenfield Upgrade
Pros:
- No risk of breaking the cluster during an upgrade.
- Major infrastructure changes are possible without risking breaking anything (changing the Ingress Controller, switching to a service mesh, changing your network allocation plan…).
- The disaster recovery strategy is tested on every upgrade.
Cons:
- New challenges for data persistence: ReadWriteOnce (RWO) volumes cannot be attached to both clusters at the same time.
- Some auto-generated objects may need to be synchronized to the new cluster (Secrets containing ServiceAccount tokens if they are used outside the cluster, Secrets containing certificates generated by cert-manager, PersistentVolumes dynamically provisioned through StorageClasses and their PersistentVolumeClaims…).
- A fully automated deployment strategy is required (which is a good thing).
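Synchronizing auto-generated objects such as cert-manager certificate Secrets usually means exporting them from the old cluster (for example with `kubectl get secret <name> -n <namespace> -o json`) and applying them to the new one. Exported objects carry metadata that the old cluster's API server filled in and that means nothing to the new cluster. The following is a minimal Python sketch of that clean-up step; the function name `portable_manifest` and the exact field list are illustrative assumptions, not part of the DevOps Stack.

```python
import copy
from typing import Any

# Illustrative, not exhaustive: metadata that identifies the object in the
# *old* cluster only and should not be carried over.
SERVER_SET_METADATA = (
    "uid", "resourceVersion", "creationTimestamp", "generation",
    "managedFields", "selfLink",
    "ownerReferences",  # owner UIDs from the old cluster do not exist in the new one
)
LAST_APPLIED = "kubectl.kubernetes.io/last-applied-configuration"


def portable_manifest(obj: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of an exported object that can be applied to another cluster."""
    clean = copy.deepcopy(obj)   # never mutate the caller's object
    clean.pop("status", None)    # status is written by controllers, not by users
    meta = clean.setdefault("metadata", {})
    for field in SERVER_SET_METADATA:
        meta.pop(field, None)
    # Drop kubectl's bookkeeping annotation but keep all the others.
    annotations = meta.get("annotations") or {}
    annotations.pop(LAST_APPLIED, None)
    if annotations:
        meta["annotations"] = annotations
    else:
        meta.pop("annotations", None)
    return clean
```

The cleaned object can then be serialized back to JSON and applied with `kubectl apply -f` using the new cluster's kubeconfig. This only copies data: a ServiceAccount token issued by the old cluster is signed with that cluster's key, so it generally will not be accepted by the new one.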
In practice, this relies on two branches, "blue" and "green". Deploy a brand-new cluster on the inactive branch, test everything, and once satisfied, switch the DNS record or a front-end load balancer to that cluster, which then becomes the active one.
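The blue/green cycle can be modeled as a small Python function. This is a sketch, not DevOps Stack code: `deploy`, `smoke_test` and `switch_traffic` are hypothetical callables standing in for whatever runs Terraform on the inactive branch, validates the new cluster, and repoints the DNS record or load balancer.

```python
from typing import Callable

COLORS = ("blue", "green")


def inactive_color(active: str) -> str:
    """Return the branch that is not currently serving traffic."""
    if active not in COLORS:
        raise ValueError(f"unknown color: {active!r}")
    return COLORS[1 - COLORS.index(active)]


def blue_green_upgrade(
    active: str,
    deploy: Callable[[str], None],
    smoke_test: Callable[[str], bool],
    switch_traffic: Callable[[str], None],
) -> str:
    """Run one upgrade cycle and return the color serving traffic afterwards."""
    target = inactive_color(active)
    deploy(target)              # brand-new cluster on the inactive branch
    if not smoke_test(target):  # traffic never left `active`: nothing to roll back
        return active
    switch_traffic(target)      # repoint DNS or the front-end load balancer
    return target
```

The key property is that a failed test leaves the active cluster untouched. Tearing down the previous cluster is deliberately left out of the sketch; keeping it for a while after the switch offers a quick way back.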
When adopting a blue/green strategy, static resources that must outlive both clusters (such as a VPC, RDS databases or ElastiCache clusters if you are on AWS) need to be declared in a separate Terraform project.
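The per-branch cluster projects then need the identifiers of those static resources. Inside Terraform, the `terraform_remote_state` data source is the usual route. The Python sketch below shows the same hand-off from a CI script instead, reading the static project's outputs with `terraform -chdir=<dir> output -json`. The directory name and the output names (`vpc_id`, `db_endpoint`) are hypothetical.

```python
import json
import subprocess
from typing import Any


def parse_outputs(raw: str) -> dict[str, Any]:
    """Turn `terraform output -json` text into a plain {name: value} dict."""
    # Each output is reported as {"sensitive": ..., "type": ..., "value": ...}.
    return {name: out["value"] for name, out in json.loads(raw).items()}


def read_static_outputs(static_dir: str) -> dict[str, Any]:
    # Query the separate "static" Terraform project, e.g. the one holding the VPC.
    raw = subprocess.run(
        ["terraform", f"-chdir={static_dir}", "output", "-json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_outputs(raw)


def var_args(outputs: dict[str, Any], names: list[str]) -> list[str]:
    """Build "-var name=value" flags for the per-branch cluster project."""
    # Only suitable for string values; lists and maps need HCL or JSON encoding.
    args: list[str] = []
    for name in names:
        args += ["-var", f"{name}={outputs[name]}"]
    return args
```

For example, `["terraform", "plan", *var_args(read_static_outputs("static"), ["vpc_id"])]` would feed the VPC ID to whichever cluster branch is being deployed, so blue and green share the same network without either of them owning it.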