Testing Infrastructure Code


Clone and follow along in the project repository…

If you’re part of the masses using Terraform (or OpenTofu), you’ve likely heard of Terratest. In the project’s own words, Terratest is a Go library that provides patterns and helper functions for testing infrastructure. Terratest can seem daunting since it requires writing Go. The good news is you only need a very small part of Go’s surface area, but have the option of using it to its full potential if you need to. If you’ve wanted to get started with Terratest but haven’t had time, read on for a quick and painless intro!

Getting Started

Terratest leverages Go’s native testing library. The first thing that means is you need Go installed and properly configured. Luckily, installing Go has gotten easier over the years. Instead of downloading releases and following manual instructions (you can still do that if you prefer), it’s likely just a brew install go away. Aside from the installation, you’ll want to create a workspace (traditionally ${HOME}/go) and add some new environment variables to your shell profile. Here’s mine:

❯ grep GO .zshrc
export GOPATH="${HOME}/go"
export GOBIN="${GOPATH}/bin"
export GOROOT="/usr/lib/go"
export PATH="${GOBIN}:${GOROOT}/bin:${PATH}"
test -d "${GOPATH}" || mkdir "${GOPATH}"
test -d "${GOPATH}/src/github.com" || mkdir -p "${GOPATH}/src/github.com"

For this project we’ll use the organization scheme below – feel free to experiment and pick what works best for you. There are a few common patterns you’ll see when browsing community modules… tests are placed in ${PROJECT_ROOT}/test/src, follow a test_name_test.go convention, and run against specific configurations in ${PROJECT_ROOT}/examples:

.
├── CONTRIBUTING.md
├── LICENSE
├── README.md
├── build
│   ├── Dockerfile
│   └── src
├── environments
│   └── dev
│       └── Makefile
├── examples
├── modules
│   ├── network
│   └── web
└── test
    ├── Makefile
    └── src
        ├── examples_complete_test.go
        └── etc.

We want to focus on Terratest rather than Terraform, so our contrived example takes a number of shortcuts, such as using the default VPC in your AWS account and Amazon’s DNS. To keep it slightly more realistic, we’ll break out a couple of modules. You might use a community module to handle the network bits, an internal module from your network team that exposes custom network details, or add additional services to meet your requirements… This all plugs nicely into our example hierarchy. You can represent high-level functionality as modules, which in turn may consume other modules for the heavy lifting, then customize how it all gets stitched together for each target environment.

One last aside before we jump in… The sample project is configured to use aws-vault, which is particularly useful when working across a lot of different accounts. It manages AWS-related environment details for you so you can focus on getting work done. It keeps credentials locked away in your OS' keychain and uses temporary credentials to access your infrastructure – a security win even if you only use a single account. You don’t have to use aws-vault to use Terratest, but the scripts we’ll be using assume it’s in place. Feel free to refactor to meet your personal tastes, or take a few minutes to get aws-vault installed before jumping into the examples.

Boilerplate

Before we can test, we need a few lines of boilerplate to load any required modules and configure Terratest. Luckily once you work it out for one project it’s easy to turn into a template:

package test

import (
	"testing"

	"github.com/gruntwork-io/terratest/modules/terraform"
	"github.com/stretchr/testify/assert"
)

func TestExamplesComplete(t *testing.T) {
	t.Parallel()

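	// TerraformDir is the code under test; VarFiles are resolved relative to it.
	// Upgrade passes -upgrade to terraform init.
	terraformOptions := &terraform.Options{
		TerraformDir: "../../examples/complete",
		Upgrade:      true,
		VarFiles:     []string{"fixtures.us-east-2.tfvars"},
	}

	// Tear everything down when the test function returns, pass or fail.
	defer terraform.Destroy(t, terraformOptions)

	// Run terraform init followed by terraform apply.
	terraform.InitAndApply(t, terraformOptions)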

Note how the terraform import is under a terratest/modules path. We’ll see examples of using modules below, but it’s a hint at Terratest’s modular approach (check out the full list in their repo, or the module documentation). Aside from the expected Terraform coverage, there are modules for your IaaS of choice, ways to validate common DevOps tooling (Docker, k8s), as well as http and shell modules which provide a lot of flexibility.

Technically you don’t need an assertion framework, but we follow the docs by pulling in testify. This makes tests a lot easier to read and write. You can easily swap this out if you have a preferred framework, or drop it entirely by using native comparison operations.

Next, we configure terraformOptions, providing the path to the code to test and passing any var files to be used (relative to TerraformDir). Lastly, in typical test fashion, we defer a cleanup operation to ensure we don’t leave artifacts around (more on this below), and use InitAndApply to run terraform init and terraform apply as part of each test (there are variations such as InitAndPlan, Init, Plan and Apply).

In our simple example we’ll just build one “complete” test covering all functionality. Typically you would have multiple tests, with fixtures customized accordingly to cover common use cases.

A Simple Test

In our contrived example, we have a network module that discovers the default VPC and associated subnets. In the real world you might have complicated infrastructure you manage or shared infrastructure from another team that you simply consume. You need to utilize network components like VPCs and subnets to get your service deployed. Wouldn’t it be nice to confirm your sensitive production service deploys to the desired network?

Let’s write the simplest test we can… Since the example code selects the default VPC in an attempt to run anywhere, we’ll start with a test that confirms the returned VPC ID starts with vpc-. This is easy to extend. For example, you could ensure target VPCs have specific tags.

In classic red/green/refactor style, let’s write a test we know will fail:

// ...

	vpcID := terraform.Output(t, terraformOptions, "vpc_id")
	assert.Equal(t, "vpc-foobah", vpcID)

As expected, running our test returns:

    TestExamplesComplete: examples_complete_test.go:37: 
                Error Trace:    examples_complete_test.go:37
                Error:          Not equal: 
                                expected: "vpc-foobah"
                                actual  : "vpc-7a5ce123"
                            
                                Diff:
                                --- Expected
                                +++ Actual
                                @@ -1 +1 @@
                                -vpc-foobah
                                +vpc-7a5ce123
                Test:           TestExamplesComplete
--- FAIL: TestExamplesComplete (293.52s)
FAIL
exit status 1
FAIL    test    293.834s 

It’s always nice to confirm things fail when expected… Let’s fix that:

import (
	"strings"
	// ...
)

// ...

	vpcID := terraform.Output(t, terraformOptions, "vpc_id")
	assert.True(t, strings.HasPrefix(vpcID, "vpc-"))

Note how we used the standard strings library to extend our test. This is generic enough it should match any returned VPC. Does it?

--- PASS: TestExamplesComplete (291.19s)
PASS
ok      test    291.875s
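If the prefix check feels too loose, a regular expression can pin the format down further. Here’s a minimal sketch using only the standard library. The isVPCID helper is hypothetical (it isn’t part of Terratest), and the pattern assumes VPC IDs are vpc- followed by 8 or 17 lowercase hex characters, so loosen it if your account shows something different:

```go
package main

import (
	"fmt"
	"regexp"
)

// vpcIDPattern assumes VPC IDs are "vpc-" plus 8 or 17 lowercase hex
// characters. Treat the lengths as an assumption, not a guarantee.
var vpcIDPattern = regexp.MustCompile(`^vpc-(?:[0-9a-f]{8}|[0-9a-f]{17})$`)

// isVPCID reports whether id looks like a well-formed VPC ID.
func isVPCID(id string) bool {
	return vpcIDPattern.MatchString(id)
}

func main() {
	// Sample IDs: one from the run above, the deliberately wrong one,
	// and an ID belonging to a different resource type.
	for _, id := range []string{"vpc-7a5ce123", "vpc-foobah", "subnet-abcdef"} {
		fmt.Printf("%-14s %v\n", id, isVPCID(id))
	}
}
```

In the test itself this would slot in as assert.True(t, isVPCID(vpcID)).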

Awesome, now we have confidence things work as expected. Just to make this a bit more interesting, let’s ensure the list of availability zones output by the network module matches the region specified in our fixtures:

// ...

	availabilityZones := terraform.OutputList(t, terraformOptions, "availability_zones")
	for _, az := range availabilityZones {
		assert.True(t, strings.HasPrefix(az, "us-east-2"))
	}

Ideally we’ve added coverage, and everything still passes:

--- PASS: TestExamplesComplete (308.07s)
PASS
ok      test    308.782s
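One limitation of asserting inside the loop is that a failure only tells you something was false, not which zone was wrong. A small helper that returns the offending zones makes the failure message more useful. This is a rough sketch with a hypothetical helper name, and it assumes zone names are the region plus a single letter (so longer names, such as Local Zones, would be flagged):

```go
package main

import (
	"fmt"
	"strings"
)

// zonesOutsideRegion returns every zone that is not "<region><letter>",
// for example "us-east-2a" for region "us-east-2".
func zonesOutsideRegion(zones []string, region string) []string {
	var outside []string
	for _, az := range zones {
		suffix := strings.TrimPrefix(az, region)
		isZoneLetter := len(suffix) == 1 && suffix[0] >= 'a' && suffix[0] <= 'z'
		if !strings.HasPrefix(az, region) || !isZoneLetter {
			outside = append(outside, az)
		}
	}
	return outside
}

func main() {
	zones := []string{"us-east-2a", "us-east-2b", "us-east-1a"}
	fmt.Println(zonesOutsideRegion(zones, "us-east-2"))
}
```

In the test, assert.Empty(t, zonesOutsideRegion(availabilityZones, "us-east-2")) would then print the unexpected zones on failure.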

One gotcha to be aware of: there is a lot of output when tests are running. I’ve purposefully zeroed in on the more informative parts. One section shows the outputs from the terraform run. Here’s an example:

TestExamplesComplete 2020-07-26T17:38:06-04:00 logger.go:66: Apply complete! Resources: 11 added, 0 changed, 0 destroyed.
TestExamplesComplete 2020-07-26T17:38:06-04:00 logger.go:66: Outputs:
TestExamplesComplete 2020-07-26T17:38:06-04:00 logger.go:66: availability_zones = [
TestExamplesComplete 2020-07-26T17:38:06-04:00 logger.go:66:   "us-east-2a",
TestExamplesComplete 2020-07-26T17:38:06-04:00 logger.go:66:   "us-east-2b",
TestExamplesComplete 2020-07-26T17:38:06-04:00 logger.go:66:   "us-east-2c",
TestExamplesComplete 2020-07-26T17:38:06-04:00 logger.go:66: ]
TestExamplesComplete 2020-07-26T17:38:06-04:00 logger.go:66: cloudwatch_log_group = /ecs/terratest-experiment-dev
TestExamplesComplete 2020-07-26T17:38:06-04:00 logger.go:66: dns_name = terratest-experiment-dev-2000011916.us-east-2.elb.amazonaws.com
TestExamplesComplete 2020-07-26T17:38:06-04:00 logger.go:66: ecr_repository_url = 012345678901.dkr.ecr.us-east-2.amazonaws.com/terratest-experiment-dev
TestExamplesComplete 2020-07-26T17:38:06-04:00 logger.go:66: subnet_cidrs = [
TestExamplesComplete 2020-07-26T17:38:06-04:00 logger.go:66:   "172.31.32.0/20",
TestExamplesComplete 2020-07-26T17:38:06-04:00 logger.go:66:   "172.31.0.0/20",
TestExamplesComplete 2020-07-26T17:38:06-04:00 logger.go:66:   "172.31.16.0/20",
TestExamplesComplete 2020-07-26T17:38:06-04:00 logger.go:66: ]
TestExamplesComplete 2020-07-26T17:38:06-04:00 logger.go:66: subnet_ids = [
TestExamplesComplete 2020-07-26T17:38:06-04:00 logger.go:66:   "subnet-abcdef",
TestExamplesComplete 2020-07-26T17:38:06-04:00 logger.go:66:   "subnet-ghjkil",
TestExamplesComplete 2020-07-26T17:38:06-04:00 logger.go:66:   "subnet-mnopqr",
TestExamplesComplete 2020-07-26T17:38:06-04:00 logger.go:66: ]
TestExamplesComplete 2020-07-26T17:38:06-04:00 logger.go:66: vpc_cidr = 172.31.0.0/16
TestExamplesComplete 2020-07-26T17:38:06-04:00 logger.go:66: vpc_id = vpc-7a5ce123

When I first started writing tests that compare outputs, I naively reached for terraform.Output and tried treating things like subnet_cidrs as lists. Output returns a single string. Be aware of Terratest’s other output methods such as OutputList (seen above) and OutputMap. These let you use native slice and map operations without cryptic errors.
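Once an output arrives as a real slice, the standard library can do the heavy lifting. As an illustration, here’s a sketch that checks each subnet CIDR sits inside the VPC CIDR, using the values from the sample run above. The subnetWithinVPC helper is hypothetical, and it assumes you would feed it the vpc_cidr and subnet_cidrs outputs:

```go
package main

import (
	"fmt"
	"net"
)

// subnetWithinVPC reports whether subnetCIDR lies entirely inside vpcCIDR.
func subnetWithinVPC(vpcCIDR, subnetCIDR string) (bool, error) {
	_, vpcNet, err := net.ParseCIDR(vpcCIDR)
	if err != nil {
		return false, fmt.Errorf("parsing VPC CIDR: %w", err)
	}
	_, subNet, err := net.ParseCIDR(subnetCIDR)
	if err != nil {
		return false, fmt.Errorf("parsing subnet CIDR: %w", err)
	}
	vpcBits, _ := vpcNet.Mask.Size()
	subBits, _ := subNet.Mask.Size()
	// The subnet's network address must fall in the VPC range, and its
	// prefix must be at least as long as the VPC's.
	return vpcNet.Contains(subNet.IP) && subBits >= vpcBits, nil
}

func main() {
	// Values copied from the sample outputs shown earlier.
	vpc := "172.31.0.0/16"
	subnets := []string{"172.31.32.0/20", "172.31.0.0/20", "172.31.16.0/20"}
	for _, s := range subnets {
		ok, err := subnetWithinVPC(vpc, s)
		fmt.Println(s, ok, err)
	}
}
```

In a test, the slice would come from terraform.OutputList(t, terraformOptions, "subnet_cidrs") rather than being hardcoded.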

One last thing… It takes a while, huh? That’s why I consider Terratest an “end-to-end” or “integration” test tool (vs unit tests that typically only take seconds to run). You can shorten runs a bit by testing with intent (write efficient tests, do not strive for 100% coverage), but in the end, running automation and confirming the actions it takes requires time.

Watch this video for a great overview of how to test infrastructure code.

Using Modules

Technically, you’ve already used modules… terraform is a module like all the others. Let’s extend this a bit by roping in the aws module to do some IaaS-specific probing:

import (
	"github.com/gruntwork-io/terratest/modules/aws"
	// ...
)

// ...

	deploymentSubnets := terraform.OutputList(t, terraformOptions, "deployment_subnets")
	for _, s := range deploymentSubnets {
		assert.True(t, aws.IsPublicSubnet(t, s, "us-east-2"))
	}

Our network module just exposes the default subnets provided by AWS. In your case you likely have public and private subnets. Something like a database cluster is usually on a set of private subnets. Just aws.IsPrivateSubnet, right? Good guess, but this is where browsing the module documentation pays off. As it turns out, there is no IsPrivateSubnet method, but that’s easy to work around by inverting our assertion:

	privateSubnets := terraform.OutputList(t, terraformOptions, "private_subnet_ids")
	for _, s := range privateSubnets {
		assert.False(t, aws.IsPublicSubnet(t, s, "us-east-2"))
	}

Advanced Topics

I’m sure you’ve noticed I kept showing tests, but not how I ran them… Since there’s a bit of setup and you’ll likely be running tests a lot (including in pipelines), I prefer a simple “test harness” to add consistency and reduce typing. One way is using a Makefile:

AWS_PROFILE := personal
REGION := us-east-2
VAULT_CMD := aws-vault exec $(AWS_PROFILE) --
# TF_CMD := $$GOPATH/bin/terraform
TF_CMD := terraform

export TF_DATA_DIR ?= $(CURDIR)/.terraform
export TF_CLI_ARGS_init ?= -get-plugins=true

init:
	cd src && go mod init test

tidy:
	cd src && go mod tidy

test:
	$(TF_CMD) fmt --write=false -check -diff -recursive ..
	cd src && $(VAULT_CMD) go test -v -timeout 30m

clean:
	cd src && rm -rf $(TF_DATA_DIR) go.mod go.sum

Another thing to think about if running a lot of parallel tests or using Terratest in a large organization is how to safely orchestrate tests at scale…

Since tests are running Terraform and creating real resources, it’s possible to have name collisions if teams are testing in the same account. Using a module to define a consistent naming convention can help.
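A complementary trick is to add a random suffix to resource names for each test run. Here’s a minimal standard-library sketch. The uniqueName helper is hypothetical, and it assumes your module exposes a variable that accepts the generated name:

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// uniqueName appends a random 6-character hex suffix to prefix so that
// concurrent test runs in the same account are unlikely to collide.
func uniqueName(prefix string) (string, error) {
	buf := make([]byte, 3) // 3 random bytes encode to 6 hex characters
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	return prefix + "-" + hex.EncodeToString(buf), nil
}

func main() {
	name, err := uniqueName("terratest-experiment")
	if err != nil {
		panic(err)
	}
	fmt.Println(name)
}
```

The result could then be handed to Terraform through the Vars field of terraform.Options. Terratest also ships a random module with a UniqueId helper that serves the same purpose, so check there before rolling your own.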

In larger teams, or anywhere there are “hotspots” with multiple engineers working on similar areas of your infrastructure, pipelines firing at the same time can end up corrupting state. Use a module to bolt state locking on yourself, or let a tool like Terragrunt handle it for you.

Often when testing you leave artifacts behind or get something in a bad state. How will you deal with these resources? One way is dedicated test accounts. Disposable environments provide full isolation. While you may ultimately run your tests in production to validate deployments, by then you will have more confidence everything works as expected. Another option is tagging resources according to a schema that allows automated cleanup. Combining these is even better, using something like aws-nuke to auto-wipe all resources in ephemeral accounts on a schedule.
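To make the tagging idea concrete, here’s a sketch of the decision a cleanup job might make. Everything about the schema is an assumption for illustration: the expires-at tag key is hypothetical, as is storing an RFC 3339 timestamp in it. Actually listing and deleting resources is left to whatever tool you use:

```go
package main

import (
	"fmt"
	"time"
)

// expiryTag is a hypothetical tag key holding an RFC 3339 timestamp.
const expiryTag = "expires-at"

// mustParse parses an RFC 3339 timestamp and panics on bad input.
func mustParse(s string) time.Time {
	t, err := time.Parse(time.RFC3339, s)
	if err != nil {
		panic(err)
	}
	return t
}

// shouldCleanUp reports whether the tags mark a resource as expired at now.
// Resources without the tag, or with an unparsable value, are left alone.
func shouldCleanUp(tags map[string]string, now time.Time) bool {
	raw, ok := tags[expiryTag]
	if !ok {
		return false
	}
	expires, err := time.Parse(time.RFC3339, raw)
	if err != nil {
		return false
	}
	return now.After(expires)
}

func main() {
	now := mustParse("2020-07-27T00:00:00Z")
	resources := map[string]map[string]string{
		"expired":  {expiryTag: "2020-07-26T18:00:00Z"},
		"current":  {expiryTag: "2020-07-28T18:00:00Z"},
		"untagged": {"team": "platform"},
	}
	for name, tags := range resources {
		fmt.Println(name, shouldCleanUp(tags, now))
	}
}
```

Leaving untagged resources alone is the conservative choice here. In a dedicated, disposable test account you might flip that default and delete anything that lacks the tag.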

Conclusion

If your project is more than a hobby, it’s worth testing… While wasteful tests are to be avoided, testing with intent is essential to ensure quality. Luckily, it’s easy to get up and running with Terratest. You don’t need to reinvent the wheel: you get a veritable Swiss Army knife to cover your infrastructure code, and the flexibility of modules lets you creatively extend validation beyond Terraform itself.

Best of all, Terratest is open source. Whether you want to help the community by submitting PRs or read the code to understand how it works, the code is there for you to browse and extend.