Test permutations with Terraform and GitHub Actions

I have been exploring the new test framework for Terraform 1.6 extensively since HashiConf in October this year. I have already written two very long posts on the topic of testing and validation with Terraform that you can read here:

  • A Comprehensive Guide to Testing in Terraform: Keep your tests, validations, checks, and policies in order
  • Testing Framework in Terraform 1.6: A deep-dive

In this post I want to illustrate a pattern for scaling up your testing using GitHub Actions.

Scenario

You are part of a platform team developing a Terraform module that sets up an Azure storage account according to a specification appropriate for your organization.

Your module has a dependency on a module created by a different team in your organization. That module sets up the Azure resource group in which the storage account produced by your module is placed.

The source code for your module is stored in a GitHub repository and you want to use GitHub Actions to perform testing before you publish new versions of your module.

The scenario is illustrated in the figure below:

[Figure: scenario]

You expect that new versions of your module should be compatible with the three most recent minor versions of the other team’s resource group module that you depend on. You also expect that your module works as intended for each of the Azure regions that your organization is operating in. These regions are swedencentral, northeurope, and westeurope.

Your team has already figured out that covering every combination of the above criteria means running many tests. To be precise, each individual test you write will need to be repeated 3 x 3 = 9 times (three locations, three versions).

The Terraform module your team is developing consists of a single main.tf file (to keep this scenario simple):

// main.tf
terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "3.80.0"
    }
    random = {
      source  = "hashicorp/random"
      version = "3.5.1"
    }
  }
}

provider "azurerm" {
  features {}
}

variable "resource_group_name" {
  type = string
}

resource "random_id" "this" {
  keepers = {
    resource_group_name = var.resource_group_name
  }
  byte_length = 6
}

data "azurerm_resource_group" "this" {
  name = var.resource_group_name
}

resource "azurerm_storage_account" "this" {
  name                      = "st${random_id.this.dec}"
  access_tier               = "Hot"
  account_kind              = "StorageV2"
  account_replication_type  = "LRS"
  account_tier              = "Standard"
  resource_group_name       = data.azurerm_resource_group.this.name
  location                  = data.azurerm_resource_group.this.location
  enable_https_traffic_only = true
  tags                      = data.azurerm_resource_group.this.tags
}

The important part to notice in this module is that it takes an input variable named resource_group_name:

variable "resource_group_name" {
  type = string
}

This variable is used in a data source for a resource group:

data "azurerm_resource_group" "this" {
  name = var.resource_group_name
}

And this data source is later referenced in arguments of the storage account resource:

resource "azurerm_storage_account" "this" {
  // ... 

  resource_group_name = data.azurerm_resource_group.this.name
  location            = data.azurerm_resource_group.this.location
  tags                = data.azurerm_resource_group.this.tags
}

For illustrative purposes your team is interested in running the following test as defined in tests/main.tftest.hcl:

// tests/main.tftest.hcl
run "setup" {
  variables {
    location    = "swedencentral"
    name_suffix = "tftest-swedencentral-1.1.0"
  }

  module {
    source  = "app.terraform.io/your-tf-org/resource-group-module/azurerm"
    version = "1.1.0"
  }
}

run "proper_tags_should_be_propagated" {
  variables {
    resource_group_name = run.setup.resource_group.name
  }

  command = apply

  assert {
    condition     = alltrue([
      contains(keys(azurerm_storage_account.this.tags), "source"),
      contains(keys(azurerm_storage_account.this.tags), "module")
    ])
    error_message = "Proper tags are not propagated to the storage account"
  }
}

There are two run blocks in this test file. The first run block named setup uses the module you depend on. In this case it specifically uses version 1.1.0 of the module. This run block defines a location variable that is set to swedencentral. This covers one of the nine cases we want to test.

The second run block is our actual test. It makes sure that appropriate tags are propagated to the storage account from the resource group. Specifically we require that two tags are set, source and module.

Now we turn to the solution to the issue of how to test all permutations of versions and locations.

Solution

There are several ways to handle the permutations of tests you need to run in this scenario. In this post I present a simple solution using a matrix strategy in GitHub Actions.

Before we look at the GitHub Actions workflow, let’s look at a modified version of our test file, this one stored in templates/main.tftest.hcl.tpl:

// templates/main.tftest.hcl.tpl
run "setup" {
  variables {
    location    = "{{LOCATION}}"
    name_suffix = "tftest-{{LOCATION}}-{{VERSION}}"
  }

  module {
    source  = "app.terraform.io/your-tf-org/resource-group-module/azurerm"
    version = "{{VERSION}}"
  }
}

run "proper_tags_should_be_propagated" {
  variables {
    resource_group_name = run.setup.resource_group.name
  }

  command = apply

  assert {
    condition     = alltrue([
      contains(keys(azurerm_storage_account.this.tags), "source"),
      contains(keys(azurerm_storage_account.this.tags), "module")
    ])
    error_message = "Proper tags are not propagated to the storage account"
  }
}

The only difference from the test file shown before is that the explicit version is replaced by the placeholder {{VERSION}}, and each explicit location is replaced by the placeholder {{LOCATION}}.

The GitHub Actions workflow is located in the file .github/workflows/tftest.yaml in our repo. We start building our workflow like this:

on: workflow_dispatch

permissions:
  id-token: write
  contents: read

To start with, we only want to trigger the workflow manually, which is why we use on: workflow_dispatch as the trigger. We add the permissions required by the Azure login action, where we use a federated identity in Azure (see the documentation for details on how to set this up).

Next we start defining our job:

jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        version: ["1.1.0", "1.2.0", "1.3.0"]
        location: ["swedencentral", "westeurope", "northeurope"]

We have a single job named test. We will run the job on Ubuntu, using the latest version available. Next we define the strategy with the following configurations:

  • fail-fast: false keeps the remaining jobs running when one job fails. The default value is true, which cancels all in-progress and queued jobs in the matrix as soon as any job fails.
  • matrix is used to configure a few settings that we want to vary between tests. In this case we vary version and location. There will be one run for each combination of version and location, for a total of nine runs.
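To make the expansion concrete, here is a minimal shell sketch that lists the combinations the matrix above produces. It only prints the pairs; GitHub Actions does the real expansion for you. The function name list_combinations is my own and not part of any tool.

```shell
# Values copied from the matrix in the workflow above.
versions="1.1.0 1.2.0 1.3.0"
locations="swedencentral westeurope northeurope"

# list_combinations is a hypothetical helper, not a GitHub Actions feature.
# It prints one "version location" pair per line, one line per matrix job.
list_combinations() {
  for version in $versions; do
    for location in $locations; do
      echo "$version $location"
    done
  done
}

list_combinations
```

Every variable you add to the matrix multiplies the length of this list: a third variable with two values would turn the nine jobs into 18.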

The last part of the workflow consists of the steps we want to run for each test:

steps:
  - uses: azure/login@v1
    with:
      client-id: ${{ secrets.AZURE_CLIENT_ID }}
      tenant-id: ${{ secrets.AZURE_TENANT_ID }}
      subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }}
  - uses: actions/checkout@v4
  - run: |
      sed 's/{{VERSION}}/${{ matrix.version }}/g; s/{{LOCATION}}/${{ matrix.location }}/g' \
        templates/main.tftest.hcl.tpl > tests/main.tftest.hcl      
  - uses: hashicorp/setup-terraform@v2
    with:
      terraform_wrapper: false
  - run: terraform init
    env:
      TF_TOKEN_app_terraform_io: ${{ secrets.TF_TOKEN }}
  - run: terraform test

The first step uses the azure/login@v1 action to sign in to Azure. This is required because we will be creating resources in Azure using Terraform (remember: the Terraform test framework runs actual plan and apply operations, creating actual resources!)

The second step of the workflow uses the actions/checkout@v4 action to check out the source code. If you come from an Azure DevOps background you might be surprised that you need to add this step explicitly. I prefer the GitHub Actions approach, where you add the step only if you intend to do something with the source code in the repository. I find the behind-the-scenes checkout in Azure DevOps confusing.

The third step requires some explanation:

- run: |
    sed 's/{{VERSION}}/${{ matrix.version }}/g; s/{{LOCATION}}/${{ matrix.location }}/g' \
      templates/main.tftest.hcl.tpl > tests/main.tftest.hcl    

Here we use sed to:

  • Find each occurrence of {{VERSION}} in the file templates/main.tftest.hcl.tpl and replace it with ${{ matrix.version }}, which GitHub Actions in turn replaces with a value from version in the matrix we defined above before the command runs.
  • Find each occurrence of {{LOCATION}} in the file templates/main.tftest.hcl.tpl and replace it with ${{ matrix.location }}, which GitHub Actions in turn replaces with a value from location in the matrix we defined above before the command runs.

The result of these search-and-replace operations is stored in tests/main.tftest.hcl.

Here we opted for a simple sed search-and-replace, which is fine when the number of variables we need to replace is small. If we needed to replace many more than two variables we would probably look into using a templating tool instead. Use common sense here: if sed works for your purposes, there is no need to involve any other tool.
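If you want to check the rendered file before pushing, the same substitution can be run locally. The sketch below wraps the sed expression in a function and adds a guard that detects any {{...}} placeholder left behind. The names render_test and has_placeholders are hypothetical, and the version and location are passed as arguments instead of coming from the matrix.

```shell
# render_test VERSION LOCATION TEMPLATE
# Hypothetical helper: prints the template with both placeholders replaced.
# Assumes the values contain no "/" or "&", which sed would treat specially.
render_test() {
  sed "s/{{VERSION}}/$1/g; s/{{LOCATION}}/$2/g" "$3"
}

# has_placeholders FILE
# Succeeds (exit code 0) if the file still contains something like {{NAME}}.
has_placeholders() {
  grep -q '{{[A-Z_]*}}' "$1"
}

# Example usage from the repository root (paths as used in this post):
#   mkdir -p tests
#   render_test 1.2.0 westeurope templates/main.tftest.hcl.tpl > tests/main.tftest.hcl
#   if has_placeholders tests/main.tftest.hcl; then
#     echo "unreplaced placeholder found" >&2
#   fi
```

The guard is cheap insurance: if someone later adds a third placeholder to the template but forgets to extend the sed expression, the rendered test file would otherwise contain a literal {{...}} string.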

The last few steps set up the Terraform CLI tool, initialize Terraform, and finally run the tests:

- uses: hashicorp/setup-terraform@v2
  with:
    terraform_wrapper: false
- run: terraform init
  env:
    TF_TOKEN_app_terraform_io: ${{ secrets.TF_TOKEN }}
- run: terraform test

The terraform init step requires an environment variable TF_TOKEN_app_terraform_io with the value of a Terraform Cloud token. This is because the module we are using is located in a private Terraform registry (see the documentation for details on this).
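The variable name follows Terraform's convention for supplying credentials through environment variables: the prefix TF_TOKEN_ followed by the registry hostname, with periods written as underscores (and, as far as I know, hyphens written as double underscores). Here is a small sketch of that mapping, where tf_token_var_name is a hypothetical helper of my own:

```shell
# tf_token_var_name HOSTNAME
# Hypothetical helper: prints the environment variable name Terraform is
# expected to read for the given hostname. Hyphens become "__" and
# periods become "_".
tf_token_var_name() {
  printf 'TF_TOKEN_%s\n' "$(printf '%s' "$1" | sed 's/-/__/g; s/\./_/g')"
}

tf_token_var_name app.terraform.io   # prints TF_TOKEN_app_terraform_io
```

If your modules live in a private registry on another hostname, the variable name in the workflow has to change accordingly.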

The full workflow for reference:

on: workflow_dispatch

permissions:
  id-token: write
  contents: read

jobs:
  test:
    strategy:
      fail-fast: false
      matrix:
        version: ["1.1.0", "1.2.0", "1.3.0"]
        location: ["swedencentral", "westeurope", "northeurope"]
    runs-on: ubuntu-latest
    steps:
      - name: Azure login
        uses: azure/login@v1
        with:
          client-id: ${{ secrets.AZURE_CLIENT_ID }}
          tenant-id: ${{ secrets.AZURE_TENANT_ID }}
          subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }}
      - uses: actions/checkout@v4
      - run: |
          sed 's/{{VERSION}}/${{ matrix.version }}/g; s/{{LOCATION}}/${{ matrix.location }}/g' \
            templates/main.tftest.hcl.tpl > tests/main.tftest.hcl          
      - uses: hashicorp/setup-terraform@v2
        with:
          terraform_wrapper: false
      - run: terraform init
        env:
          TF_TOKEN_app_terraform_io: ${{ secrets.TF_TOKEN }}
      - run: terraform test

When we trigger this single workflow we can see that nine individual jobs are started:

[Figure: GitHub Actions jobs have started]

Jumping into Azure after a short while we can confirm that nine different resource groups have been created:

[Figure: resource groups in Azure]

Finally, after a few minutes all jobs are finished [1]:

[Figure: finished GitHub Actions jobs]

The results clearly indicate that a regression has been introduced in version 1.3.0 of the resource group module that we depend on. We can conclude that our module is not ready to be released as a new version, and it is time to talk to the team responsible for the resource group module to see what changes they have made.


  1. When you use a matrix with several variables (I use two), the number of billable minutes in GitHub Actions can escalate quickly. This simple example used 43 billable minutes.

Author: Mattias Fjellström
Cloud architect consultant and a HashiCorp Ambassador