EKS variant

This folder contains the variant to use when deploying in AWS using an EKS cluster.

Usage

This module can be declared by adding the following block to your Terraform configuration:

module "thanos" {
  source = "git::https://github.com/camptocamp/devops-stack-module-thanos.git//eks?ref=<RELEASE>"

  cluster_name     = var.cluster_name
  argocd_namespace = module.cluster.argocd_namespace
  base_domain      = module.cluster.base_domain
  cluster_issuer   = var.cluster_issuer

  metrics_storage = {
    bucket_id    = aws_s3_bucket.thanos_metrics_storage.id
    region       = aws_s3_bucket.thanos_metrics_storage.region
    iam_role_arn = module.iam_assumable_role_thanos.iam_role_arn
  }

  thanos = {
    oidc = module.oidc.oidc
  }

  depends_on = [module.argocd_bootstrap]
}

At a minimum, this module requires an S3 bucket with an IAM policy attached and an OIDC provider (see the S3 Bucket and OIDC sections below).

Although the declaration above is enough to deploy a barebones Thanos, we highly recommend customizing a few settings for a production-ready deployment. At the very least, configure the resource requirements for a few of the Thanos components and the size of the persistent volume used by the compactor. You can also configure the compactor retention times, as in the example below.

module "thanos" {
  source = "git::https://github.com/camptocamp/devops-stack-module-thanos.git//eks?ref=<RELEASE>"

  cluster_name     = var.cluster_name
  argocd_namespace = module.cluster.argocd_namespace
  base_domain      = module.cluster.base_domain
  cluster_issuer   = var.cluster_issuer

  metrics_storage = {
    bucket_id    = aws_s3_bucket.thanos_metrics_storage.id
    region       = aws_s3_bucket.thanos_metrics_storage.region
    iam_role_arn = module.iam_assumable_role_thanos.iam_role_arn
  }

  thanos = {
    # OIDC configuration
    oidc = module.oidc.oidc

    # Configuration of the persistent volume for the compactor
    compactor_persistent_size = "100Gi"

    # Resources configuration for the pods
    compactor_resources = {
      limits = {
        memory = "1Gi"
      }
      requests = {
        cpu    = "0.5"
        memory = "512Mi"
      }
    }
    storegateway_resources = {
      limits = {
        memory = "1Gi"
      }
      requests = {
        cpu    = "0.5"
        memory = "1Gi"
      }
    }
    query_resources = {
      limits = {
        memory = "1Gi"
      }
      requests = {
        cpu    = "0.5"
        memory = "512Mi"
      }
    }

    # Retention settings for the compactor
    compactor_retention = {
      raw      = "60d"
      five_min = "120d"
      one_hour = "240d"
    }
  }

  depends_on = [module.argocd_bootstrap]
}

As the examples above show, the variable thanos provides an interface to customize the most frequently used settings. This variable is merged with the local value thanos_defaults, which contains sensible defaults for a barebones working deployment. You can check the default values in the local.tf file.
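
To illustrate how such a merge behaves, here is a minimal sketch. It assumes the two maps are combined with Terraform's built-in merge() function, which may not be exactly what the module does, and the default values shown are invented for the example (the real ones are in local.tf). The names thanos_user and effective_thanos_settings are hypothetical.

```hcl
locals {
  # Hypothetical defaults, for illustration only.
  thanos_defaults = {
    compactor_persistent_size = "8Gi"
    compactor_retention = {
      raw      = "60d"
      five_min = "120d"
      one_hour = "240d"
    }
  }

  # Values a user might pass through the `thanos` variable.
  thanos_user = {
    compactor_persistent_size = "100Gi"
  }

  # merge() is shallow: later arguments win, key by key, at the top level only.
  thanos = merge(local.thanos_defaults, local.thanos_user)
}

# Inspect the merged result with `terraform output` after an apply.
output "effective_thanos_settings" {
  value = local.thanos
}
```

Because merge() only compares top-level keys, overriding a nested map such as compactor_retention would replace the whole map under this assumption, so it is safer to provide all of its keys, as the example above does.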

If you need to configure something beyond the common settings provided above, you can customize the chart’s values.yaml by adding a Helm configuration as an HCL structure:

module "thanos" {
  source = "git::https://github.com/camptocamp/devops-stack-module-thanos.git//eks?ref=<RELEASE>"

  cluster_name     = var.cluster_name
  argocd_namespace = module.cluster.argocd_namespace
  base_domain      = module.cluster.base_domain
  cluster_issuer   = var.cluster_issuer

  metrics_storage = {
    bucket_id    = aws_s3_bucket.thanos_metrics_storage.id
    region       = aws_s3_bucket.thanos_metrics_storage.region
    iam_role_arn = module.iam_assumable_role_thanos.iam_role_arn
  }

  thanos = {
    oidc = module.oidc.oidc
  }

  helm_values = [{ # Note the curly brackets here
    thanos = {
      map = {
        string = "string"
        bool   = true
      }
      sequence = [
        {
          key1 = "value1"
          key2 = "value2"
        },
        {
          key1 = "value1"
          key2 = "value2"
        },
      ]
      sequence2 = [
        "string1",
        "string2"
      ]
    }
  }]

  depends_on = [module.argocd_bootstrap]
}
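
If you want to check how an HCL structure translates to YAML before passing it to the module, a small sketch like the one below may help. The local and output names are hypothetical, and query.replicaCount is only an assumed chart value used for illustration; verify it against the values.yaml of the chart version you deploy.

```hcl
locals {
  # Illustrative override; the keys under `thanos` are assumptions.
  thanos_helm_values = [{
    thanos = {
      query = {
        replicaCount = 2
      }
    }
  }]
}

# Render the first list element as YAML to compare it with the chart's values.yaml.
output "thanos_helm_values_yaml" {
  value = yamlencode(local.thanos_helm_values[0])
}
```

The same local value could then be passed to the module as helm_values = local.thanos_helm_values.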

S3 Bucket

You are in charge of creating an S3 bucket for Thanos to store the archived metrics.

We’ve decided to keep the creation of this bucket outside of this module, mainly because the persistence of the data should not be related to the instantiation of the module itself.

You can create an S3 bucket and an IAM policy using the code below:

resource "aws_s3_bucket" "thanos_metrics_storage" {
  bucket = format("thanos-metrics-storage-%s", module.eks.cluster_name)

  force_destroy = true

  tags = {
    Name    = "Thanos metrics storage"
    Cluster = module.eks.cluster_name
  }
}

module "iam_assumable_role_thanos" {
  source                     = "terraform-aws-modules/iam/aws//modules/iam-assumable-role-with-oidc"
  version                    = "4.0.0"
  create_role                = true
  number_of_role_policy_arns = 1
  role_name                  = format("thanos-s3-role-%s", module.eks.cluster_name)
  provider_url               = replace(module.eks.cluster_oidc_issuer_url, "https://", "")
  role_policy_arns           = [aws_iam_policy.thanos_s3_policy.arn]

  # List of ServiceAccounts that have permission to attach to this IAM role
  oidc_fully_qualified_subjects = [
    "system:serviceaccount:thanos:thanos-bucketweb",
    "system:serviceaccount:thanos:thanos-storegateway",
    "system:serviceaccount:thanos:thanos-compactor",
    "system:serviceaccount:thanos:thanos-sidecar",
    "system:serviceaccount:kube-prometheus-stack:kube-prometheus-stack-prometheus"
  ]
}

resource "aws_iam_policy" "thanos_s3_policy" {
  name_prefix = "thanos-s3-"
  description = "Thanos IAM policy for cluster ${module.eks.cluster_name}"
  policy      = data.aws_iam_policy_document.thanos_s3_policy.json
}

data "aws_iam_policy_document" "thanos_s3_policy" {
  statement {
    actions = [
      "s3:ListBucket",
      "s3:PutObject",
      "s3:GetObject",
      "s3:DeleteObject",
    ]

    resources = [
      aws_s3_bucket.thanos_metrics_storage.arn,
      format("%s/*", aws_s3_bucket.thanos_metrics_storage.arn),
    ]

    effect = "Allow"
  }
}

Do not forget to also pass the bucket configuration to the kube-prometheus-stack module.
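
As a rough sketch, passing the same bucket configuration to that module could look like the block below. The source path and the existence of a metrics_storage input with this shape are assumptions here; check the documentation of the kube-prometheus-stack module for its actual interface.

```hcl
# Sketch only: source and input names are assumptions, not confirmed by this document.
module "kube-prometheus-stack" {
  source = "git::https://github.com/camptocamp/devops-stack-module-kube-prometheus-stack.git//eks?ref=<RELEASE>"

  # Other inputs required by that module are omitted for brevity.

  # Reuse the bucket and IAM role created above for Thanos.
  metrics_storage = {
    bucket_id    = aws_s3_bucket.thanos_metrics_storage.id
    region       = aws_s3_bucket.thanos_metrics_storage.region
    iam_role_arn = module.iam_assumable_role_thanos.iam_role_arn
  }
}
```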

OIDC

This module was developed with OIDC in mind.

An OIDC proxy container is deployed as a sidecar in each pod that has a web interface. Consequently, the thanos variable is expected to contain a map oidc with at least the issuer URL, the client ID, and the client secret.

You can pass these values by pointing to an output of another module (as in the examples above), or by defining them explicitly:

module "thanos" {
  ...

  thanos = {
    oidc = {
      issuer_url    = "<URL>"
      client_id     = "<ID>"
      client_secret = "<SECRET>"
    }
  }

  ...
}
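
When defining the values explicitly, you may prefer not to hardcode the client secret. The sketch below shows one possible approach using input variables; all variable and local names are hypothetical, and it has not been verified that the module accepts a sensitive value everywhere it uses the secret.

```hcl
# Hypothetical variable names; adapt them to your own configuration.
variable "thanos_oidc_issuer_url" {
  type = string
}

variable "thanos_oidc_client_id" {
  type = string
}

variable "thanos_oidc_client_secret" {
  type      = string
  sensitive = true # Redacted in plan and apply output, but still stored in the state.
}

locals {
  # Map with the three keys described above.
  thanos_oidc = {
    issuer_url    = var.thanos_oidc_issuer_url
    client_id     = var.thanos_oidc_client_id
    client_secret = var.thanos_oidc_client_secret
  }
}
```

The module declaration would then use oidc = local.thanos_oidc inside the thanos variable.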

Resource Configuration

Since resource requirements differ between deployments and the consumed resources also affect the associated cost, we refrained from configuring default resource requirements for the components of Thanos. We did, however, set memory limits for some of the pods (query, storegateway and compactor all have a 1 GB memory limit). We recommend that you customize these values as you see fit.

At the very least, you should configure the size of the PersistentVolume used by the compactor.

This value MUST be configured, otherwise the compactor will NOT work on a production deployment. The Thanos documentation recommends a size of 100-300 GB.

Technical Reference

Dependencies

module.argocd_bootstrap

This module must be one of the first to be deployed; consequently, it needs to be deployed right after the module argocd_bootstrap.

Requirements

The following requirements are needed by this module:

Modules

The following Modules are called:

thanos

Source: ../

Version:

Required Inputs

The following input variables are required:

metrics_storage

Description: AWS S3 bucket configuration values for the bucket where the archived metrics will be stored.

Type:

object({
    bucket_id    = string
    region       = string
    iam_role_arn = string
  })

cluster_name

Description: Name given to the cluster. Value used for the ingress' URL of the application.

Type: string

base_domain

Description: Base domain of the cluster. Value used for the ingress' URL of the application.

Type: string

Optional Inputs

The following input variables are optional (have default values):

argocd_namespace

Description: Namespace used by Argo CD where the Application and AppProject resources should be created.

Type: string

Default: "argocd"

target_revision

Description: Override of target revision of the application chart.

Type: string

Default: "v1.0.0"

cluster_issuer

Description: SSL certificate issuer to use. Usually you would configure this value as letsencrypt-staging or letsencrypt-prod on your root *.tf files.

Type: string

Default: "ca-issuer"

namespace

Description: Namespace where the application’s Kubernetes resources should be created. The namespace will be created if it doesn’t exist.

Type: string

Default: "thanos"

helm_values

Description: Helm chart value overrides. They should be passed as a list of HCL structures.

Type: any

Default: []

app_autosync

Description: Automated sync options for the Argo CD Application resource.

Type:

object({
    allow_empty = optional(bool)
    prune       = optional(bool)
    self_heal   = optional(bool)
  })

Default:

{
  "allow_empty": false,
  "prune": true,
  "self_heal": true
}

dependency_ids

Description: IDs of the other modules on which this module depends.

Type: map(string)

Default: {}

thanos

Description: Most frequently used Thanos settings. This variable is merged with the local value thanos_defaults, which contains some sensible defaults. You can check the default values in the local.tf file. If anything else needs to be customized, you can always pass configuration values using the variable helm_values.

Type: any

Default: {}

Outputs

The following outputs are exported:

id

Description: ID to pass to other modules in order to refer to this module as a dependency. It takes the ID that comes from the main module and passes it along to the code that called this variant in the first place.
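
As a sketch of how this output and the dependency_ids input might be chained, consider the block below. The module name my_module and its source are hypothetical, and it is assumed that the downstream module accepts a dependency_ids map like this one does.

```hcl
# Sketch only: "my_module" is a hypothetical module deployed after Thanos.
module "my_module" {
  source = "./modules/my_module"

  # Referencing the `id` output makes this module wait for Thanos.
  dependency_ids = {
    thanos = module.thanos.id
  }
}
```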

Reference in table format


= Requirements

Name Version

>= 4

>= 3

>= 3

>= 1

= Modules

|===
|Name |Source |Version

|thanos
|../
|
|===

= Inputs

|===
|Name |Description |Type |Default |Required

|metrics_storage
|AWS S3 bucket configuration values for the bucket where the archived metrics will be stored.
|object({ bucket_id = string, region = string, iam_role_arn = string })
|n/a
|yes

|cluster_name
|Name given to the cluster. Value used for the ingress' URL of the application.
|string
|n/a
|yes

|base_domain
|Base domain of the cluster. Value used for the ingress' URL of the application.
|string
|n/a
|yes

|argocd_namespace
|Namespace used by Argo CD where the Application and AppProject resources should be created.
|string
|"argocd"
|no

|target_revision
|Override of target revision of the application chart.
|string
|"v1.0.0"
|no

|cluster_issuer
|SSL certificate issuer to use. Usually you would configure this value as letsencrypt-staging or letsencrypt-prod on your root *.tf files.
|string
|"ca-issuer"
|no

|namespace
|Namespace where the application’s Kubernetes resources should be created. The namespace will be created if it doesn’t exist.
|string
|"thanos"
|no

|helm_values
|Helm chart value overrides. They should be passed as a list of HCL structures.
|any
|[]
|no

|app_autosync
|Automated sync options for the Argo CD Application resource.
|object({ allow_empty = optional(bool), prune = optional(bool), self_heal = optional(bool) })
|{ "allow_empty": false, "prune": true, "self_heal": true }
|no

|dependency_ids
|IDs of the other modules on which this module depends.
|map(string)
|{}
|no

|thanos
|Most frequently used Thanos settings. This variable is merged with the local value thanos_defaults, which contains some sensible defaults. You can check the default values in the local.tf file. If anything else needs to be customized, you can always pass configuration values using the variable helm_values.
|any
|{}
|no
|===

= Outputs

|===
|Name |Description

|id
|ID to pass to other modules in order to refer to this module as a dependency. It takes the ID that comes from the main module and passes it along to the code that called this variant in the first place.
|===