
Terraform Search: Deep-Dive


Over the past few years there have been many features related to configuration-driven state manipulation. These types of operations involve using moved blocks to move a resource from one address in your state to a new address (e.g. migrating a resource from the root module to a new child-module), removed blocks to remove a resource from the state file (e.g. the resource should no longer be managed by the current Terraform configuration), and import blocks to import resources into your state file (e.g. bring resources provisioned using some other means under management by the current Terraform configuration).
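As a quick refresher, the three block types mentioned above look roughly like this. This is a minimal sketch: the resource addresses, the module name, and the bucket name are made up for illustration and do not come from a real configuration.

```hcl
# Move a resource from the root module into a child module
# (hypothetical addresses).
moved {
  from = aws_instance.web
  to   = module.compute.aws_instance.web
}

# Stop managing a resource without destroying the real infrastructure
# (hypothetical address).
removed {
  from = aws_instance.legacy

  lifecycle {
    destroy = false
  }
}

# Bring an existing resource under management
# (hypothetical address and bucket name).
import {
  to = aws_s3_bucket.assets
  id = "my-assets-bucket"
}
```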

One of the latest additions to Terraform is Terraform search. Although not a feature for directly manipulating your state file, it will likely be involved in the process of bringing existing infrastructure under management by Terraform.

In this blog post we will learn what Terraform search is, how it works, and see a few examples of it in use.

What is Terraform search?

Terraform search is declarative resource discovery. This feature allows you to discover resources through special types of queries against your Terraform providers. The end goal is to bring the discovered resources under management by Terraform.

The declarative part of Terraform search is that you define your search queries in HCL, similar to how you configure the desired state of your infrastructure in your Terraform configuration.

Provider Support

A common bottleneck for many new features in Terraform is that they must be implemented by the providers (think of actions, ephemeral resource arguments, and identity-based imports to name a few recent cases).

These new features are often co-developed with one or two providers to ensure there is support for the feature from day one. This time the AWS provider already has some support for Terraform search in the following list resources: aws_instance, aws_iam_role, and aws_cloudwatch_log_group.

It will likely take some time before other providers are onboarded, and even more time before there is widespread support for list resources.

This post will focus on what is available today.

How Terraform search works

The workflow for Terraform search consists of the following high-level steps:

  1. Configure one or more queries using the new list block in .tfquery.hcl files.
  2. Run terraform query to discover resources fulfilling the queries.
  3. (Optional) Bring these resources under management by Terraform.

The new list block has the following general structure:

list "<list type>" "<symbolic name>" {
  provider = ... # required argument
  # query arguments depending on list type
}

The list block has two labels. The first label is the list block type, e.g. aws_instance. The second label represents the symbolic name of the list block. The combination of list block type and symbolic name must be unique within all your .tfquery.hcl files.
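To illustrate the uniqueness rule, here is a small sketch of a query file with three list blocks. They all reuse the symbolic name all, which is fine because the list types differ. The file name is hypothetical, and the sketch assumes the AWS provider is configured elsewhere in the same query files.

```hcl
# queries.tfquery.hcl (hypothetical file name)

list "aws_instance" "all" {
  provider = aws
}

# Same symbolic name, different list type: no conflict.
list "aws_iam_role" "all" {
  provider = aws
}

list "aws_cloudwatch_log_group" "all" {
  provider = aws
}

# Adding a second list "aws_instance" "all" block, in this file or in any
# other .tfquery.hcl file in the directory, would violate the uniqueness rule.
```

The resulting query addresses are list.aws_instance.all, list.aws_iam_role.all, and list.aws_cloudwatch_log_group.all.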

Search queries (list blocks) are defined in a new file type ending with .tfquery.hcl. This means that Terraform search is not part of the normal Terraform plan and apply workflow.

A full example of a .tfquery.hcl file that configures the AWS provider and includes a query for all the EC2 instances in the configured region is shown below:

provider "aws" {
  region = "us-west-1"
}

list "aws_instance" "all" {
  provider = aws
}

The .tfquery.hcl files also support locals blocks and variable blocks.

Executing a basic Terraform search query

To execute the queries you have configured in your .tfquery.hcl files you run the new terraform query CLI command. All .tfquery.hcl files in the same directory where you run the command are included.

You have to run queries from a directory where you have initialized a Terraform configuration.

So before you do anything else, create (at minimum) a main.tf file with the following contents:

terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 6.0"
    }
  }
}

Now run terraform init to download the AWS provider and initialize the configuration.

Running terraform query for the configuration shown in the previous section gives us the following results [1]:

$ terraform query
list.aws_instance.all   account_id=123456789012,id=i-0835b41ff06f2b6cf,region=us-west-1   frontend
list.aws_instance.all   account_id=123456789012,id=i-0e7ad4412b60c75f5,region=us-west-1   frontend
list.aws_instance.all   account_id=123456789012,id=i-066e446260eb7f82b,region=us-west-1   backend
list.aws_instance.all   account_id=123456789012,id=i-064fd00d079825559,region=us-west-1   backend
list.aws_instance.all   account_id=123456789012,id=i-01ea716dd96e54d01,region=us-west-1   backend

The columns in the output show the following data:

  • The address of the query (e.g. list.aws_instance.all).
  • An object containing the identity attributes of the discovered resource (e.g. for EC2 instances this includes the AWS account ID, the instance ID, and the AWS region).
  • The Name tag of the discovered resource (e.g. frontend).

The results are printed to the terminal where you ran the command.

You can refine and update the query and re-run terraform query as many times as you like.

Generating configuration for the discovered resources

You can ask Terraform to generate Terraform configuration and import blocks for the resources that are discovered by a query using the -generate-config-out=path flag to the terraform query command. The path value must be the name of a file that does not yet exist (i.e. Terraform can’t append to a file or replace an existing file).

Since the generated configuration also includes import blocks you can easily import the discovered resources into your Terraform state and start managing them using Terraform going forward.

To generate the configuration for the previous example, add the -generate-config-out flag to the command:

$ terraform query -generate-config-out=instances.tf
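list.aws_instance.all   account_id=123456789012,id=i-0835b41ff06f2b6cf,region=us-west-1   frontend
list.aws_instance.all   account_id=123456789012,id=i-0e7ad4412b60c75f5,region=us-west-1   frontend
list.aws_instance.all   account_id=123456789012,id=i-066e446260eb7f82b,region=us-west-1   backend
list.aws_instance.all   account_id=123456789012,id=i-064fd00d079825559,region=us-west-1   backend
list.aws_instance.all   account_id=123456789012,id=i-01ea716dd96e54d01,region=us-west-1   backend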

The generated file instances.tf contains the following code (truncated for brevity):

# __generated__ by Terraform
# Please review these resources and move them into your main configuration files.

# __generated__ by Terraform
resource "aws_instance" "all_0" {
  # details omitted ...
}

import {
  to       = aws_instance.all_0
  provider = aws
  identity = {
    account_id = "123456789012"
    id         = "i-0835b41ff06f2b6cf"
    region     = "us-west-1"
  }
}

# ... three resources omitted for brevity ...

resource "aws_instance" "all_4" {
  # details omitted ...
}

import {
  to       = aws_instance.all_4
  provider = aws
  identity = {
    account_id = "123456789012"
    id         = "i-01ea716dd96e54d01"
    region     = "us-west-1"
  }
}

For each EC2 instance there is a resource block and an import block.

The resource blocks contain all available attributes of the resource type, and for EC2 instances there are a lot of them. Before you move this into your Terraform configuration you might want to remove all unnecessary default values from the configuration.

The generated import blocks use the new import by identity feature.
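For comparison, the older ID-based form of the same import would look roughly like the sketch below. It relies on the EC2 instance ID being the import ID for aws_instance. You would use either this form or the generated identity form for a given resource, not both.

```hcl
# ID-based import: a single string identifies the resource, and the
# account and region are taken from the provider configuration.
import {
  to = aws_instance.all_0
  id = "i-0835b41ff06f2b6cf"
}
```

The identity form in the generated code spells out account_id, id, and region as separate attributes instead of relying on one opaque string.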


Note that the generated configuration is somewhat experimental, and at the time of writing the generated EC2 configuration is invalid and requires some hands-on modifications. In fact, for this specific configuration I got 95 individual errors when I tried to run terraform plan.

I believe all of these errors disappear if you remove all unnecessary default values from each aws_instance resource.
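As a rough idea of what a cleaned-up resource block could look like, here is a sketch that keeps only a handful of commonly set arguments. The AMI ID, instance type, and subnet ID are placeholders I made up; in practice you would copy the real values from the generated file and keep whatever other arguments matter for your instance.

```hcl
resource "aws_instance" "all_0" {
  # Placeholder values: copy the real ones from the generated instances.tf.
  ami           = "ami-0123456789abcdef0"
  instance_type = "t3.micro"
  subnet_id     = "subnet-0123456789abcdef0"

  tags = {
    Name = "frontend"
  }
}
```

After trimming, run terraform plan and check that the imported instance shows no unexpected changes before you apply.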

Using meta-arguments in list blocks

A list block has support for the count and for_each meta-arguments.

In the following example we use for_each with a list of region names to query for EC2 instances in three different regions:

provider "aws" {
  region = "us-west-1" # default region
}

locals {
  regions = ["us-west-1", "us-east-1", "eu-west-1"]
}

list "aws_instance" "all" {
  for_each = toset(local.regions)

  provider = aws
  
  config {
    region = each.value
  }
}

When I run terraform query for my AWS account I get the following output:

$ terraform query
list.aws_instance.all["eu-west-1"]   account_id=123456789012,id=i-045d428c88b12f39e,region=eu-west-1   backup

list.aws_instance.all["us-east-1"]   account_id=123456789012,id=i-089f3a5328681f9bb,region=us-east-1   web01
list.aws_instance.all["us-east-1"]   account_id=123456789012,id=i-08298eac244d627ec,region=us-east-1   web02

list.aws_instance.all["us-west-1"]   account_id=123456789012,id=i-0835b41ff06f2b6cf,region=us-west-1   frontend
list.aws_instance.all["us-west-1"]   account_id=123456789012,id=i-0e7ad4412b60c75f5,region=us-west-1   frontend
list.aws_instance.all["us-west-1"]   account_id=123456789012,id=i-066e446260eb7f82b,region=us-west-1   backend
list.aws_instance.all["us-west-1"]   account_id=123456789012,id=i-064fd00d079825559,region=us-west-1   backend
list.aws_instance.all["us-west-1"]   account_id=123456789012,id=i-01ea716dd96e54d01,region=us-west-1   backend

The list of results is separated based on the region that was queried. Note that the use of for_each creates three separate queries.

Using count works in a similar way.
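A sketch of the same multi-region query written with count instead of for_each could look like this. I am assuming here that count.index can be used inside the config block the same way each.value is used in the previous example.

```hcl
provider "aws" {
  region = "us-west-1" # default region
}

locals {
  regions = ["us-west-1", "us-east-1", "eu-west-1"]
}

list "aws_instance" "all" {
  # One query per entry in the list of regions.
  count = length(local.regions)

  provider = aws

  config {
    # Assumption: count.index is available here, like each.value above.
    region = local.regions[count.index]
  }
}
```

With count, the queries are identified by index rather than by region name, so for_each usually gives more readable results.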

Using variables in query files

You can define variables using variable blocks in the .tfquery.hcl files. However, these variables must also be defined in the root module.

You can pass values to these variables using the -var 'myvar=myvalue' or -var-file=filename flags. You can also set values for the variables using the terraform.tfvars or the *.auto.tfvars variables files that are picked up automatically by Terraform when you run terraform query.

An example of using a variable for the AWS region name in the query looks like this:

variable "aws_region" {
  type    = string
  default = "eu-west-1"
}

provider "aws" {
  region = var.aws_region
}

list "aws_instance" "all" {
  provider = aws
}

To execute the query for a region other than the default region run:

$ terraform query -var='aws_region=us-east-1'

list.aws_instance.all   account_id=123456789012,id=i-089f3a5328681f9bb,region=us-east-1   web01
list.aws_instance.all   account_id=123456789012,id=i-08298eac244d627ec,region=us-east-1   web02
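Instead of passing -var on the command line you can put the value in a variables file. The sketch below shows the two pieces that would live outside the query file: the matching variable declaration in the root module and a terraform.tfvars file. The file name variables.tf is just a convention I picked; any .tf file in the root module works.

```hcl
# variables.tf (root module): mirrors the variable declared in the
# .tfquery.hcl file.
variable "aws_region" {
  type    = string
  default = "eu-west-1"
}

# terraform.tfvars (a separate file): picked up automatically by
# terraform query.
aws_region = "us-east-1"
```

With these two files in place, a plain terraform query should target us-east-1 without any extra flags.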

Using filters to discover EC2 instances

So far we have only queried for EC2 instances by region. Commonly you want to narrow the query down using other attributes as well.

In the following example we query for all EC2 instances in the given region that have a tag with key Owner and value platform-team:

provider "aws" {
  region = "us-west-1"
}

list "aws_instance" "platform_team" {
  provider = aws

  config {
    filter {
      name   = "tag:Owner"
      values = ["platform-team"]
    }
  }
}

Running terraform query for this configuration gives the following output:

$ terraform query

list.aws_instance.platform_team   account_id=123456789012,id=i-066e446260eb7f82b,region=us-west-1   backend
list.aws_instance.platform_team   account_id=123456789012,id=i-064fd00d079825559,region=us-west-1   backend
list.aws_instance.platform_team   account_id=123456789012,id=i-01ea716dd96e54d01,region=us-west-1   backend

You can include any number of filters in a query to narrow down the results.
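As a sketch of a query with more than one filter, the following combines the Owner tag filter with a filter on the instance type. The query name and the instance types are made up for illustration; instance-type is one of the standard EC2 instance filter names. In the underlying EC2 API, separate filters are combined with AND, while multiple values within one filter are combined with OR.

```hcl
provider "aws" {
  region = "us-west-1"
}

# Hypothetical query name and instance types.
list "aws_instance" "platform_team_small" {
  provider = aws

  config {
    # Instances owned by the platform team ...
    filter {
      name   = "tag:Owner"
      values = ["platform-team"]
    }

    # ... that are also one of these instance types.
    filter {
      name   = "instance-type"
      values = ["t3.micro", "t3.small"]
    }
  }
}
```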

The filters that you can use include all the normal EC2 instance filter names. You can find a list of available filters in the documentation.

How to handle negative filters to discover EC2 instances

Unfortunately, negative filters are not supported (e.g. show me all instances that do not have this tag). I do not currently know of a good way to work around this.

The following contrived example will run three separate queries:

  • Find all EC2 instances in a region.
  • Find all instances that have a given tag key/value pair set.
  • Find all instances that are part of the first query, but not part of the second query. In fact, this query will be split into one query for each instance that was not part of the second query.

provider "aws" {
  region = "us-west-1"
}

list "aws_instance" "all" {
  provider = aws
}

list "aws_instance" "platform_team" {
  provider = aws

  config {
    filter {
      name   = "tag:Owner"
      values = ["platform-team"]
    }
  }
}

locals {
  # all query results for each query
  all           = list.aws_instance.all.data
  platform_team = list.aws_instance.platform_team.data

  # I have not found out what the column name is, but I can extract it like this
  column_name = keys(local.all[0])[1]

  # extract instance ids from the queries
  all_ids           = [for i in list.aws_instance.all.data : i[local.column_name].id]
  platform_team_ids = [for i in list.aws_instance.platform_team.data : i[local.column_name].id]

  # compute the missing instance ids
  missing = setsubtract(local.all_ids, local.platform_team_ids)
}

list "aws_instance" "test" {
  for_each = local.missing
  provider = aws

  config {
    filter {
      name   = "instance-id"
      values = [each.value]
    }
  }
}

The magic is in the locals block. Note that this is an example I hacked together, and it is possible that it could be simplified. Running terraform query for this example gives me the following output:

$ terraform query

list.aws_instance.platform_team   account_id=123456789012,id=i-066e446260eb7f82b,region=us-west-1   backend
list.aws_instance.platform_team   account_id=123456789012,id=i-064fd00d079825559,region=us-west-1   backend
list.aws_instance.platform_team   account_id=123456789012,id=i-01ea716dd96e54d01,region=us-west-1   backend

list.aws_instance.all   account_id=123456789012,id=i-0835b41ff06f2b6cf,region=us-west-1   frontend
list.aws_instance.all   account_id=123456789012,id=i-0e7ad4412b60c75f5,region=us-west-1   frontend
list.aws_instance.all   account_id=123456789012,id=i-066e446260eb7f82b,region=us-west-1   backend
list.aws_instance.all   account_id=123456789012,id=i-064fd00d079825559,region=us-west-1   backend
list.aws_instance.all   account_id=123456789012,id=i-01ea716dd96e54d01,region=us-west-1   backend

list.aws_instance.test["i-0835b41ff06f2b6cf"]   account_id=123456789012,id=i-0835b41ff06f2b6cf,region=us-west-1   frontend

list.aws_instance.test["i-0e7ad4412b60c75f5"]   account_id=123456789012,id=i-0e7ad4412b60c75f5,region=us-west-1   frontend

The first set of results shows the instances that have the Owner tag with a value of platform-team. The second set of results contains all my EC2 instances. The third and fourth sets each contain one of the two instances that do not have the required tag.

A caveat with this approach is that if you generate Terraform configuration with this command, it will cover the results of every query, so the same instance will be included multiple times.

Limit the number of search results

If you want to limit the number of search results that are returned from a query you can add the limit argument:

list "aws_instance" "most" {
  provider = aws

  limit = 10 # return at most 10 results
}

Terraform search for other AWS resources

In the previous sections we have seen many examples of queries for EC2 instances. At the time of writing there are two other supported list resource types.

You can discover IAM roles using the aws_iam_role list resource type. The following example discovers all IAM roles in the AWS account:

provider "aws" {
  region = "us-west-1"
}

list "aws_iam_role" "all" {
  provider = aws
}

The results from terraform query for this configuration are shown below:

$ terraform query

list.aws_iam_role.all   account_id=123456789012,name=AmazonEKSAutoClusterRole                               AmazonEKSAutoClusterRole
list.aws_iam_role.all   account_id=123456789012,name=AmazonEKSAutoNodeRole                                  AmazonEKSAutoNodeRole
list.aws_iam_role.all   account_id=123456789012,name=Amazon_EventBridge_Invoke_Api_Destination_1271079307   Amazon_EventBridge_Invoke_Api_Destination_1271079307
list.aws_iam_role.all   account_id=123456789012,name=slack-role-9jzjy7qt                                    slack-role-9jzjy7qt

The query does not include service-linked roles.

Finally, you can also query for AWS CloudWatch log groups in a given region:

provider "aws" {
  region = "eu-west-1"
}

list "aws_cloudwatch_log_group" "all" {
  provider = aws
}

The results from terraform query for this configuration are shown below:

$ terraform query
list.aws_cloudwatch_log_group.all   account_id=123456789012,name=/aws/lambda/my-http-function,region=eu-west-1   /aws/lambda/my-http-function
list.aws_cloudwatch_log_group.all   account_id=123456789012,name=/aws/lambda/my-sqs-function,region=eu-west-1    /aws/lambda/my-sqs-function
list.aws_cloudwatch_log_group.all   account_id=123456789012,name=/aws/lambda/secret-rotation-lambda,region=eu-west-1   /aws/lambda/secret-rotation-lambda
list.aws_cloudwatch_log_group.all   account_id=123456789012,name=/aws/lambda/slack,region=eu-west-1                    /aws/lambda/slack

The aws_iam_role and aws_cloudwatch_log_group list resources currently do not support any further configuration.

Key takeaways

Terraform search will simplify bringing unmanaged resources into your Terraform state so that you can properly manage them going forward.

With Terraform search you define queries using the new list block. You add one or more queries in .tfquery.hcl files and you run terraform query to execute the queries. To generate resource and import blocks for each discovered resource you can add the -generate-config-out=<filename> flag to the command.

At the time of writing there are only three supported list resource types: aws_instance, aws_iam_role, and aws_cloudwatch_log_group. This list will be extended in the coming days, weeks, and months. New providers will also be onboarded in due time.


  1. Your results will differ, of course, depending on how many EC2 instances you have in the configured region of your AWS account.

Mattias Fjellström
Cloud architect · Author · HashiCorp Ambassador · Microsoft MVP