Clone the project repo to follow along…
So far in this series we’ve learned the fundamentals of Amazon’s Elastic Container Service, containerized a simple Node.js application, and deployed it to the cloud. In the final article of this series, we’ll eliminate the toil of building and maintaining ECS infrastructure by automating everything we’ve learned using Terraform.
Before diving into Terraform, the first thing we’ll need is a “container definition” to feed the aws_ecs_task_definition resource. The good news is we’ve already built this while working through the manual configuration of our task. By simply trimming off everything in the task definition except the containerDefinitions list, you’ll have all you need. The initial slog of figuring out the task definition JSON is paying dividends!
This is just reusing a portion of an existing file, and the point of this article is the actual Terraform, so I’ll simply link to the example from the project repo.
Since the first article I’ve done a bit of cleanup: removing unnecessary values (nulls, empty lists) and templating a few more things to make reuse easier.
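To give you a feel for that templating, here is a trimmed, hypothetical excerpt of the container definition (see the repo for the real file). The ${...} placeholders are filled in by Terraform’s templatefile function later on:

[
  {
    "name": "${name}",
    "image": "${image}",
    "cpu": ${cpu},
    "memory": ${memory_limit},
    "memoryReservation": ${memory_reserve},
    "portMappings": [
      { "containerPort": ${port} }
    ],
    "logConfiguration": {
      "logDriver": "awslogs",
      "options": {
        "awslogs-group": "/ecs/${name}",
        "awslogs-region": "${region}",
        "awslogs-stream-prefix": "ecs"
      }
    }
  }
]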
NOTE: To work through the entire demo deployment, you will need to modify one line in the container definition: the path to the Parameter Store secret. You will also need to create the secret itself to test retrieval. In your AWS account, go to Services > Systems Manager > Parameter Store > Create parameter. For anything sensitive, always use SecureString. If you use the same path and name, you will only need to insert your AWS Account ID in the container definition. Otherwise, adjust as needed.
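If you’d rather skip the console clicking, Terraform can manage the test parameter as well. A minimal sketch, using a hypothetical path (match it to whatever your container definition references):

resource "aws_ssm_parameter" "message" {
  name  = "/hello-world/production/message" # hypothetical path; must match the container definition
  type  = "SecureString"
  value = "HELLO FROM PARAMETER STORE!"
}

Keep in mind the value ends up in your Terraform state, so treat the state file as sensitive if you go this route.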
To make this simpler for anyone to test drive, we’ll use the default VPC and subnets that come with new AWS accounts. If you’ve deleted those, you can use Terraform’s AWS VPC and subnet resources (or a module of your choice) to create space for our example.
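If you do need to recreate them, a minimal sketch is only a couple of resources (CIDR ranges here are arbitrary):

resource "aws_vpc" "example" {
  cidr_block = "10.0.0.0/16"
}

resource "aws_subnet" "example" {
  vpc_id     = aws_vpc.example.id
  cidr_block = "10.0.1.0/24"
}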
A past series went into a lot of detail around creating network resources from scratch. Rather than rehash that, I want to cover another common scenario: using data sources to discover existing network infrastructure. Beyond the account defaults we use here, you will often have shared VPCs, subnets, NAT gateways, etc. that you can consume rather than re-creating them for each service.
data "aws_vpc" "selected" {
default = true
}
data "aws_subnet_ids" "private" {
vpc_id = data.aws_vpc.selected.id
}
The aws_subnet_ids data source gives us a list of subnets matching specified criteria that we can use elsewhere in our configuration. We’ll use the private subnets to house our ECS tasks. Here simply using the vpc_id gets the job done, but a common practice is using tags to make selection of the appropriate resources intuitive.
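For example, if your organization tags subnets by tier (an assumed convention; adjust to your own), the lookup stays just as simple:

data "aws_subnet_ids" "private" {
  vpc_id = data.aws_vpc.selected.id

  tags = {
    Tier = "private"
  }
}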
Before we tackle ECS itself, we need to address IAM. When deploying manually, we leveraged the default ecsTaskExecutionRole and fixed it up to allow access to Parameter Store and Secrets Manager. At the time it was easy to (ab)use, but we called out the best practice of using service-specific roles. As part of our automation, let’s have Terraform manage any roles and policies for us:
resource "aws_iam_role" "app" {
name = var.role_name
description = "ECS Task Execution Role for ${var.app_name}"
force_detach_policies = true
assume_role_policy = <<EOF
{
"Version": "2008-10-17",
"Statement": [
{
"Sid": "",
"Effect": "Allow",
"Principal": {
"Service": "ecs-tasks.amazonaws.com"
},
"Action": "sts:AssumeRole"
}
]
}
EOF
tags = local.tags
}
resource "aws_iam_policy" "parameter_store_ro" {
name_prefix = "ParameterStoreRO"
description = "Grants RO access to Parameter Store"
policy = <<EOF
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"ssm:GetParameters",
"kms:Decrypt"
],
"Resource": [
"arn:aws:ssm:*:*:parameter/*",
"arn:aws:kms:*:*:key/*"
]
}
]
}
EOF
}
resource "aws_iam_role_policy_attachment" "attach_parameter_store_policy" {
role = aws_iam_role.app.name
policy_arn = aws_iam_policy.parameter_store_ro.arn
}
resource "aws_iam_role_policy_attachment" "attach_aws_managed_policy" {
role = aws_iam_role.app.name
policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy"
}
Service-specific roles allow more granular access control.
While this is still wide open (we could further limit Parameter Store access to specific paths), it gives you a starter recipe for automating a fully functional service. Refer to the policy examples we ran through previously if you need to grant Secrets Manager access instead.
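As a sketch of that tightening, here is what a path-scoped variant might look like, assuming parameters live under a /app-name/environment/ convention (an assumption; adjust to your own naming). kms:Decrypt is kept in case your SecureString values use a customer managed key:

resource "aws_iam_policy" "parameter_store_scoped" {
  name_prefix = "ParameterStoreScoped"
  description = "Grants RO access to this service's Parameter Store path only"

  policy = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ssm:GetParameters",
        "kms:Decrypt"
      ],
      "Resource": [
        "arn:aws:ssm:${var.region}:*:parameter/${var.app_name}/${var.environment}/*",
        "arn:aws:kms:${var.region}:*:key/*"
      ]
    }
  ]
}
EOF
}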
With network details gathered and IAM squared away, we’re ready to take care of ECS. As you’ll recall from previous articles, we need to create an ECR repository our ECS tasks can access. We’ll also attach a lifecycle policy to our repository to avoid old images building up and wasting space.
resource "aws_ecr_repository" "app" {
name = "${var.app_name}-${var.environment}"
image_tag_mutability = "MUTABLE"
image_scanning_configuration {
scan_on_push = true
}
tags = local.tags
}
resource "aws_ecr_lifecycle_policy" "app" {
repository = aws_ecr_repository.app.name
policy = <<EOF
{
"rules": [
{
"rulePriority": 1,
"description": "Expire untagged images older than a week",
"selection": {
"tagStatus": "untagged",
"countType": "sinceImagePushed",
"countUnit": "days",
"countNumber": 7
},
"action": {
"type": "expire"
}
}
]
}
EOF
}
Since we are using Container Insights and the awslogs driver, when we created the ECS service manually we had to create the CloudWatch Log Group ourselves or the service wouldn’t start. Now we can let Terraform manage that for us.
To make the ECS-specific bits more modular, a number of variables are used. These are referenced directly by our Terraform resources and exposed within the container definition via templatefile. Aside from the service name, region and port, the ECS CPU units, memory reserve and hard limit are configurable. This is tunable enough for most services without overwhelming the operator with excess detail. Finding the right balance reduces cognitive load for others using your automation.
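As a representative sketch of those declarations (defaults here are illustrative, not necessarily the repo’s):

variable "task_cpu" {
  description = "CPU units to reserve for the task (1024 = 1 vCPU)"
  type        = number
  default     = 256
}

variable "task_memory_limit" {
  description = "Hard memory limit in MiB; the container is killed if it exceeds this"
  type        = number
  default     = 512
}

variable "task_memory_reserve" {
  description = "Soft memory reservation in MiB"
  type        = number
  default     = 256
}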
We leverage a lot of defaults in the service configuration, but do pull in the subnets discovered above and expose instance details. For our simple case we’ll run a single task instance, so use instance_percent_min = 0 and instance_percent_max = 100. In the real world we could increase instance_count and adjust the percentages as needed so we can use rolling updates to avoid downtime.
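For instance, a sketch of tfvars for a zero-downtime rollout might look like this (numbers are illustrative):

instance_count       = 3    # run three copies of the task
instance_percent_min = 66   # ECS keeps at least two tasks healthy during a deploy
instance_percent_max = 200  # replacement tasks may start before old ones stop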
resource "aws_cloudwatch_log_group" "app" {
name = "/ecs/${var.app_name}-${var.environment}"
retention_in_days = 7
tags = local.tags
}
resource "aws_ecs_cluster" "app" {
name = "${var.app_name}-${var.environment}"
tags = local.tags
setting {
name = "containerInsights"
value = "enabled"
}
}
resource "aws_ecs_task_definition" "app" {
family = "${var.app_name}-${var.environment}"
container_definitions = templatefile("container-definition.json", {
name = "${var.app_name}-${var.environment}"
environment = var.environment
image = "${aws_ecr_repository.app.repository_url}:latest"
region = var.region
port = var.container_port
cpu = var.task_cpu
memory_limit = var.task_memory_limit
memory_reserve = var.task_memory_reserve
})
task_role_arn = aws_iam_role.app.arn
execution_role_arn = aws_iam_role.app.arn
network_mode = "awsvpc"
cpu = var.task_cpu
memory = var.task_memory_limit
depends_on = [aws_cloudwatch_log_group.app]
tags = local.tags
}
resource "aws_ecs_service" "app" {
name = "${var.app_name}-${var.environment}"
cluster = aws_ecs_cluster.app.arn
task_definition = aws_ecs_task_definition.app.arn
enable_ecs_managed_tags = true
propagate_tags = "SERVICE"
launch_type = "FARGATE"
scheduling_strategy = "REPLICA"
desired_count = var.instance_count
deployment_maximum_percent = var.instance_percent_max
deployment_minimum_healthy_percent = var.instance_percent_min
network_configuration {
subnets = local.private_subnets
security_groups = [aws_security_group.app.id]
assign_public_ip = true
}
lifecycle {
ignore_changes = [desired_count]
}
depends_on = [aws_iam_role.app]
tags = local.tags
}
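One housekeeping note: the service references local.private_subnets, local.tags, and aws_security_group.app, none of which are shown above. They live in the repo; a sketch of what they might look like (my assumption, adjust to your layout):

locals {
  # subnet IDs discovered by the data source earlier
  private_subnets = data.aws_subnet_ids.private.ids

  tags = {
    Application = var.app_name
    Environment = var.environment
  }
}

resource "aws_security_group" "app" {
  name_prefix = "${var.app_name}-${var.environment}"
  vpc_id      = data.aws_vpc.selected.id

  # wide open on the container port for the demo; restrict in production
  ingress {
    from_port   = var.container_port
    to_port     = var.container_port
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}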
Refer to the project repo for the fully functional Terraform… Just adjust tfvars as needed, then you can run some tests in your account (refer to the README for specifics on configuring Terraform for use with AWS)!
❯ cat hello-world.tfvars
role_name = "helloWorldTaskExecutionRole"
region = "us-east-2"
environment = "production"
app_name = "hello-world"
container_port = 8080
task_cpu = 256
task_memory_limit = 512
task_memory_reserve = 256
instance_count = 1
instance_percent_min = 0
instance_percent_max = 100
❯ terraform init
❯ terraform plan -var-file=hello-world.tfvars -out=plan
# ...
❯ terraform apply plan
# ...
Apply complete! Resources: 13 added, 0 changed, 0 destroyed.
The state of your infrastructure has been saved to the path
below. This state is required to modify and destroy your
infrastructure, so keep it safe. To inspect the complete state
use the `terraform show` command.
State path: terraform.tfstate
Outputs:
ecr_repo = 012345678901.dkr.ecr.us-east-2.amazonaws.com/hello-world-production
iam_role_arn = arn:aws:iam::012345678901:role/helloWorldTaskExecutionRole
iam_role_name = helloWorldTaskExecutionRole
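For completeness, those outputs map to output blocks along these lines:

output "ecr_repo" {
  description = "Repository URL to push application images to"
  value       = aws_ecr_repository.app.repository_url
}

output "iam_role_arn" {
  description = "ARN of the task execution role"
  value       = aws_iam_role.app.arn
}

output "iam_role_name" {
  description = "Name of the task execution role"
  value       = aws_iam_role.app.name
}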
If we curl the public IP of the deployed task on our container port, we can see the secret retrieval working (via our IAM role) as it exposes a value from Parameter Store – this is obviously only for the sake of example. Never expose secrets, including in logs!
❯ http 3.21.52.50:8080
HTTP/1.1 200 OK
Connection: keep-alive
Content-Length: 47
Content-Type: text/html; charset=utf-8
Date: Fri, 01 May 2020 02:49:28 GMT
ETag: W/"2f-ZLQ5cKIxGhGpXgCP1cNhiPt9ZB0"
X-Powered-By: Express
Top secret message: HELLO FROM PARAMETER STORE!
Counting variable definitions and outputs, we’ve managed to automate away the toil of manually managing ECS-based services in just a few hundred lines. Rather than simply recreating the exact service we deployed by clicking through the AWS console, we iterated and improved security through a service-specific IAM role and used lifecycle management to further reduce toil. Beyond the initial build, this gives us a framework we can use to continue extending our service, ensures consistency as we go, enables reuse when building similar services, and acts as documentation for ourselves or future team members: all the advantages of Infrastructure as Code.
That continues in the spirit of our minimally viable example service… In the real world you would likely have additional network configuration (perhaps an ALB fronting several tasks), more containers to manage (additional services, sidecars for monitoring or security), backing services to prepare, etc. You can keep adding these things yourself, but as the complexity grows you’ll want to consider modules. Whether you consume modules from the Terraform Registry, GitHub authors you trust, or create your own… they’ll let you avoid copying and pasting code, further ensure consistency, make reuse even easier, and allow you to build increased confidence in shared components.
Hopefully this is enough to get you started toward the nirvana of running containerized services on AWS ECS. Terraform makes the initial infrastructure build and maintenance a breeze. Once your MVP is live, you can continue shipping updates with just a few commands… It’s just a matter of building a new image with your code, pushing to ECR, and updating the ECS service to pull in the latest change. That’s too much to cram in here, but for an example of how it could work refer to the do script.
See you next time!