Deployment on KinD

An example of a local deployment on KinD is provided here. Clone this repository and adapt the files to your needs. In the repository, as in a standard Terraform module, you will find the following files:

  • terraform.tf - declaration of the Terraform providers used in this project as well as their configuration;

  • locals.tf - local variables used by the DevOps Stack modules;

  • main.tf - definition of all the deployed modules;

  • s3_bucket.tf - configuration of the MinIO bucket, used as backend for Loki and Thanos;

  • outputs.tf - the output variables of the DevOps Stack, e.g. credentials and the .kubeconfig file to use with kubectl.

Requirements

On your local machine, you need to have the following tools installed:

  • Terraform, used to deploy the cluster and the DevOps Stack modules;

  • Docker, since the KinD nodes run as Docker containers;

  • kubectl (and optionally k9s), to interact with the cluster.

Specificities and explanations

Local Load Balancer

MetalLB is used as a load balancer for the cluster. This allows us to have a multi-node KinD cluster without needing to run Traefik as a single replica exposed through a NodePort.
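
MetalLB hands out IP addresses from a pool, which on KinD is typically taken from the Docker network the cluster nodes are attached to. If you want to check which subnet is available, you can inspect that network with Docker (the network name kind is the KinD default and an assumption here):

# Show the subnet(s) of the Docker network used by the KinD nodes
docker network inspect kind --format '{{range .IPAM.Config}}{{.Subnet}} {{end}}'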

OIDC authentication

The DevOps Stack modules are developed with OIDC in mind. In production, you should have an identity provider that supports OIDC and use it to authenticate to the DevOps Stack applications.
You can have a local variable containing the OIDC configuration, properly structured for the DevOps Stack applications, and simply use an external OIDC provider instead of Keycloak. Check the locals.tf of the Keycloak module for an example.

To quickly deploy a testing environment on KinD you can use the Keycloak module, as shown in the example.

After deploying Keycloak, you can use the OIDC bootstrap module to create the Keycloak realm, groups, users, etc.

The user_map variable of that module allows you to create the OIDC users that will authenticate to the DevOps Stack applications. The module generates a password for each user, which you can retrieve from the Terraform outputs after the deployment.

If you do not provide a value for the user_map variable, the module will create a user named devopsadmin with a random password.
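
For illustration, a user_map value could look like the following sketch. The attribute names shown here are assumptions; check the variables.tf of the OIDC bootstrap module for the authoritative structure.

user_map = {
  # Hypothetical user entry, adapt the attributes to the module's variables.tf.
  jdoe = {
    username   = "jdoe"
    email      = "jdoe@example.com"
    first_name = "John"
    last_name  = "Doe"
  }
}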

Self-signed SSL certificates

Since KinD is deployed on your machine, there is no easy way of creating valid SSL certificates for the ingresses using Let’s Encrypt. As such, cert-manager is configured to use a self-signed Certificate Authority and the remaining modules are configured to ignore the SSL warnings/errors that are a consequence of that.

When accessing the ingresses in your browser, you will see warnings saying that the certificate is not valid. You can safely ignore them.
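
The same applies on the command line: TLS verification has to be skipped because the CA is self-signed. For example, with curl (the URL is only an illustration, using the base domain format shown further below):

# -k / --insecure skips the verification of the self-signed certificate
curl -k https://argocd.apps.172-19-0-1.nip.io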

Deployment

  1. Clone the repository and cd into the examples/kind folder.

  2. In the main.tf file, keep the modules you want to deploy and comment out the others;

    You can also add your own Terraform modules in this file or in any other file in the root folder. A good starting point for writing your own module is to clone the devops-stack-module-template repository and adapt it to your needs;
  3. In the oidc module, adapt the user_map variable as you wish (please check the OIDC section above for more information).

  4. From the root folder of the example deployment, initialize Terraform, which downloads all required providers and modules locally (they will be stored in the hidden folder .terraform);

    terraform init
  5. Configure the variables in locals.tf to your preference:

    locals {
      kubernetes_version     = "v1.29.0"
      cluster_name           = "YOUR_CLUSTER_NAME"
      base_domain            = format("%s.nip.io", replace(module.traefik.external_ip, ".", "-"))
      cluster_issuer         = module.cert-manager.cluster_issuers.ca
      enable_service_monitor = false # Can be enabled after the first bootstrap.
      app_autosync           = true ? { allow_empty = false, prune = true, self_heal = true } : {}
    }
  6. Finally, run terraform apply and accept the proposed changes to create the Kubernetes nodes as Docker containers, and populate them with our services;

    terraform apply
  7. After the first deployment (please note the troubleshooting step related to Argo CD), you can go back to locals.tf and set enable_service_monitor to true to activate the Prometheus exporters that will send metrics to Prometheus;

    This flag needs to be set to false for the first bootstrap of the cluster, otherwise the applications will fail to deploy because the Custom Resource Definitions of the kube-prometheus-stack are not yet created.
    You can either set the flag to true in the locals.tf file or simply delete the line from the module declarations, since this variable defaults to true in each module.
    Take note of the local called app_autosync. If you set the condition of the ternary operator to false, you disable auto-sync for all the DevOps Stack modules, as sketched below. This lets you choose when to manually sync each module in the Argo CD interface, which is useful for troubleshooting.
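
    As an illustration, after the first successful bootstrap the locals shown above could be updated as follows (auto-sync is disabled here only to show the troubleshooting scenario):

    locals {
      kubernetes_version     = "v1.29.0"
      cluster_name           = "YOUR_CLUSTER_NAME"
      base_domain            = format("%s.nip.io", replace(module.traefik.external_ip, ".", "-"))
      cluster_issuer         = module.cert-manager.cluster_issuers.ca
      enable_service_monitor = true # Enabled after the first bootstrap.
      app_autosync           = false ? { allow_empty = false, prune = true, self_heal = true } : {} # Auto-sync disabled for troubleshooting.
    }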

Access the cluster and the DevOps Stack applications

The KinD Terraform provider used in our code typically appends the cluster credentials to your default kubeconfig, so you should be able to access the cluster right away.

Otherwise, you can use the content of the kubernetes_kubeconfig output to manually generate a kubeconfig file, or use the one automatically created in the root folder of the project.
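
For example, assuming that output is a single string containing the kubeconfig, you can dump it into a file and point kubectl at it (terraform output -raw prints the raw value of a sensitive output; the file name here is arbitrary):

terraform output -raw kubernetes_kubeconfig > kind-config
export KUBECONFIG="$(pwd)/kind-config"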

Then you can use the kubectl or k9s command to interact with the cluster:

k9s --kubeconfig <PATH_TO_TERRAFORM_PROJECT>/<CLUSTER_NAME>-config
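
The equivalent check with kubectl, using the kubeconfig automatically created in the root folder of the project:

kubectl get nodes --kubeconfig <PATH_TO_TERRAFORM_PROJECT>/<CLUSTER_NAME>-config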

As for the DevOps Stack applications, you can access them through the ingress domain that you can find in the ingress_domain output. If you used the code from the example without modifying the outputs, you will see something like this on your terminal after the terraform apply has done its job:

Outputs:

ingress_domain = "your.domain.here"
keycloak_admin_credentials = <sensitive>
keycloak_users = <sensitive>
kubernetes_kubeconfig = <sensitive>

Or you can use kubectl to get all the ingresses and their respective URLs:

kubectl get ingress --all-namespaces

For example, if the base domain name is 172-19-0-1.nip.io, the applications are accessible at the following addresses:

https://grafana.apps.172-19-0-1.nip.io
https://alertmanager.apps.172-19-0-1.nip.io
https://prometheus.apps.172-19-0-1.nip.io
https://keycloak.apps.172-19-0-1.nip.io
https://minio.apps.172-19-0-1.nip.io
https://argocd.apps.172-19-0-1.nip.io
https://thanos-bucketweb.apps.172-19-0-1.nip.io
https://thanos-query.apps.172-19-0-1.nip.io

You can access the applications using the credentials created by the Keycloak module. They are written to the Terraform output:

# List all outputs
$ terraform output
keycloak_admin_credentials = <sensitive>
keycloak_users = <sensitive>
kubernetes_kubeconfig = <sensitive>
minio_root_user_credentials = <sensitive>

# To get the credentials for Grafana, Prometheus, etc.
$ terraform output keycloak_users
{
  "devopsadmin" = "PASSWORD"
}
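
The Keycloak administrator credentials and the MinIO root credentials can be retrieved in the same way, using the output names listed above:

# Keycloak admin and MinIO root credentials
$ terraform output keycloak_admin_credentials
$ terraform output minio_root_user_credentials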

Pause the cluster

The docker pause command can be used to halt the cluster for a while in order to save energy (replace kind-cluster with the cluster name you defined in locals.tf):

# Pause the cluster:
docker pause kind-cluster-control-plane kind-cluster-worker{,2,3}

# Resume the cluster:
docker unpause kind-cluster-control-plane kind-cluster-worker{,2,3}

When the host computer is restarted, the Docker containers will start again, but the cluster will not resume correctly. It has to be destroyed and recreated.

Stop the cluster

To definitively stop the cluster with a single command, you can remove the resources deployed inside the cluster from the Terraform state and then destroy it (this is why some resources are deleted from the state file before the destroy):

terraform state rm $(terraform state list | grep "argocd_application\|argocd_project\|kubernetes_\|helm_\|keycloak_") && terraform destroy

A dirtier alternative is to directly destroy the Docker containers and volumes (replace kind-cluster with the cluster name you defined in locals.tf):

# Stop and remove Docker containers
docker container stop kind-cluster-control-plane kind-cluster-worker{,2,3} && docker container rm -v kind-cluster-control-plane kind-cluster-worker{,2,3}
# Remove the Terraform state file
rm terraform.tfstate
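
If you used the kubeconfig file automatically created in the root folder of the project, you may also want to remove it (the file name follows the pattern shown earlier):

rm <CLUSTER_NAME>-config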

Conclusion

That’s it, you have deployed the DevOps Stack locally! For more information, keep on reading the documentation. You can explore the possibilities of each module and get the link to the source code on their respective documentation pages.

Troubleshooting

connection_error during the first deployment

In some cases, you could encounter an error like this during the first deployment:

╷
│ Error: Error while waiting for application argocd to be created
│
│   with module.argocd.argocd_application.this,
│   on .terraform/modules/argocd/main.tf line 55, in resource "argocd_application" "this":
│   55: resource "argocd_application" "this" {
│
│ error while waiting for application argocd to be synced and healthy: rpc error: code = Unavailable desc = connection error: desc = "transport: error while dialing: dial tcp 127.0.0.1:45729: connect: connection refused"
╵

This error is due to the way we provision Argo CD in the final steps of the deployment. We use the bootstrap Argo CD to deploy the final Argo CD module, which causes a redeployment of Argo CD and consequently a momentary loss of connection between the Argo CD Terraform provider and the Argo CD server.

You can simply re-run the command terraform apply to finalize the bootstrap of the cluster.

For more information about the Argo CD module, please refer to the respective documentation page.

Argo CD interface reload loop when clicking on login

If you encounter a loop when clicking on the login button on the Argo CD interface, you can try to delete the Argo CD server pod and let it be recreated.
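
A minimal way to do that, assuming Argo CD is deployed in the argocd namespace with the standard Argo CD labels:

kubectl delete pod -n argocd -l app.kubernetes.io/name=argocd-server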

For more information about the Argo CD module, please refer to the respective documentation page.

loki-stack-promtail pods stuck with status CrashLoopBackOff

You could stumble upon loki-stack-promtail pods stuck in a crash loop with logs like the following:

level=error ts=2023-05-09T06:32:38.495673778Z caller=main.go:117 msg="error creating promtail" error="failed to make file target manager: too many open files"
Stream closed EOF for loki-stack/loki-stack-promtail-bxcmw (promtail)

If that's the case, you will have to increase the upper limit on the number of inotify instances that can be created per real user ID:

# Increase the limit until next reboot
sudo sysctl fs.inotify.max_user_instances=512
# Increase the limit permanently (run this command as root)
echo 'fs.inotify.max_user_instances=512' >> /etc/sysctl.conf
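
If you add the setting to /etc/sysctl.conf as shown above, you can reload the configuration without rebooting:

sudo sysctl -p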