devops-stack-module-velero
A DevOps Stack module to deploy a backup solution based on Velero.
This module adds the ability to perform backups of a cluster. Backups can include all Kubernetes objects, although with the GitOps approach of the DevOps Stack it is not necessary to back up every object, as they are generally disposable and recreated by Argo CD.
The Velero chart used by this module is shipped in this repository as well, in order to avoid any unwanted behaviors caused by unsupported versions.
Current Chart Version | Original Repository | Default Values |
---|---|---|
5.0.2 |
Velero needs a storage location for backups, such as an S3-compatible storage (AWS S3, MinIO, etc.) or Azure Blob Storage.
Since this module is meant to be instantiated using its variants, the usage documentation is available in each variant (EKS).
Usage
Backup
For an example of deployment, please see the variants documentation.
Once the Velero controller is deployed on the server, it is also possible to perform a manual backup from the velero client:
velero backup create \
  --include-namespaces wordpress,loki \
  --include-resources pv,pvc,pods \
  --storage-location my-s3-bucket \
  my-new-backup
Restore
The backup can be restored manually with the following commands:
# Restore from a backup
velero restore create --from-backup my-new-backup
# Restore from the last backup of a schedule
velero restore create --from-schedule restic-schedule
More options are described in the Velero documentation.
For a restore to succeed, Velero requires that the object does not already exist in the target cluster. As a disaster recovery tool, it is not able to merge an existing object, such as a PVC, with its backup.
Monitoring
If enabled, this module exposes Velero metrics and sets up a scrape target for Prometheus.
Grafana dashboard
This module contains a Grafana dashboard to monitor the frequency, state and duration of backups.
Alerts
This module also sets up alerts in Alertmanager for the following cases:
- Partially failed backups ratio higher than a specified amount (alert_partially_failed_ratio);
- Failed backups ratio higher than a specified amount (alert_failed_ratio);
- No successful backup for a specified amount of time (alert_backup_timeout).
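As an illustration, the three thresholds could be overridden when instantiating a variant of this module. This is only a sketch: the module name, the `source` path and the chosen values are hypothetical, the variant-specific inputs are left out, and the ratios are assumed to be fractions between 0 and 1, as the default of 0.25 suggests.

```hcl
module "velero" {
  # Hypothetical path: point this at the variant you actually use.
  source = "./path/to/velero-variant"

  cluster_name = "my-cluster"  # example value
  base_domain  = "example.com" # example value

  # Assuming a fraction between 0 and 1: alert above 10% of partially
  # failed or failed backups (the default for both is 0.25).
  alert_partially_failed_ratio = 0.1
  alert_failed_ratio           = 0.1

  # Alert when no backup has succeeded for 48 hours (default: 86400 seconds).
  alert_backup_timeout = 172800
}
```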
Technical Reference
Requirements
The following requirements are needed by this module:
- argocd (>= 4)
- kubernetes (~> 2)
- null (>= 3)
- utils (>= 1)
Providers
The following providers are used by this module:
- null (>= 3)
- kubernetes (~> 2)
- utils (>= 1)
- argocd (>= 4)
Resources
The following resources are used by this module:
- argocd_application.this (resource)
- argocd_project.this (resource)
- kubernetes_namespace.velero_namespace (resource)
- kubernetes_secret.velero_repo_credentials (resource)
- null_resource.dependencies (resource)
- null_resource.this (resource)
- random_password.restic_repo_password (resource)
- utils_deep_merge_yaml.values (data source)
Required Inputs
The following input variables are required:
cluster_name
Description: Name given to the cluster. Value used for naming some of the resources created by the module.
Type: string
base_domain
Description: Base domain of the cluster. Value used for the ingress' URL of the application.
Type: string
Optional Inputs
The following input variables are optional (have default values):
argocd_namespace
Description: Namespace used by Argo CD where the Application and AppProject resources should be created.
Type: string
Default: "argocd"
target_revision
Description: Override of target revision of the application chart.
Type: string
Default: "v1.0.0"
cluster_issuer
Description: SSL certificate issuer to use. Usually you would configure this value as letsencrypt-staging or letsencrypt-prod on your root *.tf files.
Type: string
Default: "ca-issuer"
namespace
Description: Namespace where the application's Kubernetes resources should be created. The namespace will be created if it doesn't exist.
Type: string
Default: "velero"
helm_values
Description: Helm chart value overrides. They should be passed as a list of HCL structures.
Type: any
Default: []
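To make "a list of HCL structures" more concrete, the sketch below passes a single override map. The keys shown (`velero`, `resources`, `requests`) are assumptions for illustration and may not match the real layout of the chart values, so check the chart's default values before reusing them. The module name and `source` path are hypothetical as well.

```hcl
module "velero" {
  source = "./path/to/velero-variant" # hypothetical path
  # Required and variant-specific inputs omitted for brevity.

  helm_values = [{
    # Hypothetical keys: check the chart's default values for the real structure.
    velero = {
      resources = {
        requests = {
          cpu    = "500m"
          memory = "128Mi"
        }
      }
    }
  }]
}
```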
app_autosync
Description: Automated sync options for the Argo CD Application resource.
Type:
object({
  allow_empty = optional(bool)
  prune       = optional(bool)
  self_heal   = optional(bool)
})
Default:
{
  "allow_empty": false,
  "prune": true,
  "self_heal": true
}
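For instance, pruning could be turned off while keeping self-healing enabled. This is a sketch only: the module name and `source` path are hypothetical, and the other inputs are omitted.

```hcl
module "velero" {
  source = "./path/to/velero-variant" # hypothetical path
  # Required and variant-specific inputs omitted for brevity.

  app_autosync = {
    allow_empty = false
    prune       = false # do not delete resources that are no longer defined
    self_heal   = true
  }
}
```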
dependency_ids
Description: IDs of the other modules on which this module depends.
Type: map(string)
Default: {}
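A possible way to fill this map is to reference the `id` output of another module, so that Terraform deploys that module first. In this sketch, `module.argocd` is a hypothetical module assumed to expose an `id` output, and the `source` path is hypothetical too.

```hcl
module "velero" {
  source = "./path/to/velero-variant" # hypothetical path
  # Required and variant-specific inputs omitted for brevity.

  dependency_ids = {
    # module.argocd is a hypothetical module exposing an `id` output.
    argocd = module.argocd.id
  }
}
```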
backup_schedules
Description: TBD
Type:
map(object({
  disabled    = optional(bool, false)
  labels      = optional(map(string), {})
  annotations = optional(map(string), {})
  schedule    = string
  template = object({
    # labels = optional(map(string), {}) # TODO: test
    # annotations = optional(map(string), {}) # TODO: test
    storageLocation    = optional(string)
    ttl                = optional(string)
    includedNamespaces = list(string)
    includedResources  = list(string)
    # enableSnapshot = optional(bool, true)
  })
}))
Default: null
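Since the description is still to be written, here is a hedged sketch of a value matching the type above. It assumes that the map key is the schedule name, that `schedule` takes a cron expression and that `ttl` uses Velero's duration format; none of this is confirmed by this page. The namespaces and storage location reuse the names from the manual backup example, and the `source` path is hypothetical.

```hcl
module "velero" {
  source = "./path/to/velero-variant" # hypothetical path
  # Required and variant-specific inputs omitted for brevity.

  backup_schedules = {
    # The map key ("nightly") is a hypothetical schedule name.
    nightly = {
      schedule = "0 3 * * *" # assumed cron expression: every day at 03:00
      template = {
        storageLocation    = "my-s3-bucket"
        ttl                = "168h0m0s" # assumed duration format: keep backups for 7 days
        includedNamespaces = ["wordpress", "loki"]
        includedResources  = ["persistentvolumes", "persistentvolumeclaims", "pods"]
      }
    }
  }
}
```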
enable_monitoring_dashboard
Description: Boolean to enable the provisioning of a Velero dashboard for Grafana.
Type: bool
Default: true
alert_partially_failed_ratio
Description: Ratio of partially failed backups above which a Prometheus alert is triggered.
Type: number
Default: 0.25
alert_failed_ratio
Description: Ratio of failed backups above which a Prometheus alert is triggered.
Type: number
Default: 0.25
alert_backup_timeout
Description: Timeout in seconds before triggering the last successful backup alert
Type: number
Default: 86400
Outputs
The following outputs are exported:
id
Description: ID to pass to other modules in order to refer to this module as a dependency.
restic_repo_password
Description: The password to access the restic repositories.
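The two outputs might be consumed from a root module as sketched below. The output name, `module.my_app` and its `source` path are hypothetical, and the downstream module is assumed to accept a `dependency_ids` input like this one does.

```hcl
# Hypothetical root-module output re-exporting the restic repository password.
output "velero_restic_repo_password" {
  value     = module.velero.restic_repo_password
  sensitive = true # avoids printing the password in plan/apply output
}

# Hypothetical downstream module that should wait for Velero to be deployed.
module "my_app" {
  source = "./path/to/my-app-module" # hypothetical path

  dependency_ids = {
    velero = module.velero.id
  }
}
```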
Reference in table format
= Requirements
Name | Version |
---|---|
argocd | >= 4 |
kubernetes | ~> 2 |
null | >= 3 |
utils | >= 1 |
= Providers
Name | Version |
---|---|
random | n/a |
kubernetes | ~> 2 |
utils | >= 1 |
argocd | >= 4 |
null | >= 3 |
= Resources
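Name | Type |
---|---|
argocd_application.this | resource |
argocd_project.this | resource |
kubernetes_namespace.velero_namespace | resource |
kubernetes_secret.velero_repo_credentials | resource |
null_resource.dependencies | resource |
null_resource.this | resource |
random_password.restic_repo_password | resource |
utils_deep_merge_yaml.values | data source |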
= Inputs
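Name | Description | Type | Default | Required |
---|---|---|---|---|
cluster_name | Name given to the cluster. Value used for naming some of the resources created by the module. | string | n/a | yes |
base_domain | Base domain of the cluster. Value used for the ingress' URL of the application. | string | n/a | yes |
argocd_namespace | Namespace used by Argo CD where the Application and AppProject resources should be created. | string | "argocd" | no |
target_revision | Override of target revision of the application chart. | string | "v1.0.0" | no |
cluster_issuer | SSL certificate issuer to use. Usually you would configure this value as letsencrypt-staging or letsencrypt-prod on your root *.tf files. | string | "ca-issuer" | no |
namespace | Namespace where the application's Kubernetes resources should be created. The namespace will be created if it doesn't exist. | string | "velero" | no |
helm_values | Helm chart value overrides. They should be passed as a list of HCL structures. | any | [] | no |
app_autosync | Automated sync options for the Argo CD Application resource. | object({ allow_empty = optional(bool), prune = optional(bool), self_heal = optional(bool) }) | { "allow_empty": false, "prune": true, "self_heal": true } | no |
dependency_ids | IDs of the other modules on which this module depends. | map(string) | {} | no |
backup_schedules | TBD | map(object({ disabled = optional(bool, false), labels = optional(map(string), {}), annotations = optional(map(string), {}), schedule = string, template = object({ storageLocation = optional(string), ttl = optional(string), includedNamespaces = list(string), includedResources = list(string) }) })) | null | no |
enable_monitoring_dashboard | Boolean to enable the provisioning of a Velero dashboard for Grafana. | bool | true | no |
alert_partially_failed_ratio | Percentage of partially failed backups before triggering a Prometheus alert | number | 0.25 | no |
alert_failed_ratio | Percentage of failed backups before triggering a Prometheus alert | number | 0.25 | no |
alert_backup_timeout | Timeout in seconds before triggering the last successful backup alert | number | 86400 | no |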
= Outputs
Name | Description |
---|---|
id | ID to pass to other modules in order to refer to this module as a dependency. |
restic_repo_password | The password to access the restic repositories. |