Clone the companion project to follow along…
So far in Part I and Part II of this series we have provisioned a multi-AZ network with a custom VPC, subnets, internet gateway, and routing tables, then deployed highly available, auto-scaling Linux servers using EC2 and an ALB. Our multi-tier starter project has only one big piece remaining – the database!
In the final part of this series we’ll explore AWS’ Relational Database Service (RDS). More than simply moving your database to the cloud, RDS provides numerous DBaaS advantages including fault tolerance, automated backups, and easy upgrades. Before we jump in, here’s a refresher on what we’re building:
Adopting a DBaaS like RDS as part of your application architecture has many benefits. Your team can focus on their value proposition (which probably isn’t just running a database!), offload management tasks to the IaaS provider, and easily spin up new instances using common tools such as Terraform or Ansible.
If you’re still leery of putting your data in the hands of a provider, take heart… RDS has a solid track record, having been announced in 2009 and evolving rapidly ever since. From a service perspective, you have more traditional options such as MySQL or PostgreSQL, or the latest offering, known simply as Aurora. While Aurora is technically the new kid on the block, it’s an easy choice thanks to an impressive feature set.
Amazon Aurora features a distributed, fault-tolerant, self-healing storage system that auto-scales up to 64TB per database instance. It delivers high performance and availability with up to 15 low-latency read replicas, point-in-time recovery, continuous backup to Amazon S3, and replication across three Availability Zones (AZs).
Whichever option you choose, you’ll save a lot of time not thinking about DBA tasks and be able to easily manage your infrastructure with automation. For our simple project, we are going to provision a MySQL RDS instance using the previously configured subnets to build a database subnet group spanning all of the AZs in our region.
To save time we’ll dial back storage and backup options (even this simple case can take 10-20 minutes to deploy based on load in the target region), but discuss trade-offs as we go so you are prepared to adjust when using RDS in production!
In Part I we used cidrsubnet to split our CIDR range into several subnets. This allowed us to deploy a subnet in each AZ within our region, a best practice for high availability. Aside from distributing EC2 instances across these subnets, RDS also requires subnets spanning at least two availability zones to form what’s known as a DBSubnetGroup.
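As a quick refresher, here is a minimal sketch of that subnet layout; the 10.0.0.0/16 CIDR, variable names, and AZ list are illustrative stand-ins for whatever you used in Part I:

variable "az_names" {
  type    = list(string)
  default = ["us-east-1a", "us-east-1b", "us-east-1c"]
}

resource "aws_subnet" "private_subnets" {
  count             = length(var.az_names)
  vpc_id            = aws_vpc.vpc.id
  availability_zone = var.az_names[count.index]

  # cidrsubnet(prefix, newbits, netnum) carves a /24 per AZ out of the /16
  cidr_block = cidrsubnet("10.0.0.0/16", 8, count.index)
}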
Since we already have the subnets, the Terraform for the subnet group itself is easy:
resource "aws_db_subnet_group" "default" {
name = "main"
subnet_ids = aws_subnet.private_subnets[*].id
}
That’s it! Aside from optional tags, DBSubnetGroups don’t require many options. The key thing here is our familiar splat syntax to build up subnet_ids. Without an explicit subnet group, RDS will fall back to the default subnet group in your account’s default VPC (assuming you haven’t deleted it). Now we can configure our RDS instance with just a few more lines of HCL:
resource "aws_db_instance" "rds" {
vpc_security_group_ids = [aws_security_group.mysql_ingress.id]
db_subnet_group_name = aws_db_subnet_group.default.name
maintenance_window = "Sat:00:00-Sat:03:00"
multi_az = true
allocated_storage = 10
backup_retention_period = 0
skip_final_snapshot = true
engine = "mysql"
engine_version = "5.7"
instance_class = var.db_instance_type
name = "testdb"
username = "root"
password = var.db_password
parameter_group_name = "default.mysql5.7"
}
Our choice of MySQL is made obvious by engine, engine_version, and parameter_group_name. You’ll find versions of these catered to your database back-end of choice in the documentation.
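If you’d rather not dig through the docs, the AWS CLI can enumerate valid combinations for you. For example, something along these lines lists the MySQL engine versions alongside their parameter group families:

❯ aws rds describe-db-engine-versions --engine mysql \
    --query "DBEngineVersions[].[EngineVersion,DBParameterGroupFamily]" --output table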
Worth noting, engine_version can be specified in semver format to control automated upgrades. If you specify “MAJOR.MINOR.PATCH”, you effectively pin the database version. Specifying a version as “MAJOR.MINOR” will pull in PATCH upgrades automagically. This is because auto_minor_version_upgrade is true by default.
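For example, on the aws_db_instance resource (version numbers here are illustrative):

# Track patch releases within 5.7 automatically (the provider default)
engine_version             = "5.7"
auto_minor_version_upgrade = true

# Or pin an exact version and opt out of automatic patch upgrades
engine_version             = "5.7.33"
auto_minor_version_upgrade = false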
Never fear… upgrades only happen during maintenance windows. In our example, we’ve carefully tuned maintenance_window to ensure any upgrades or failovers happen outside business hours. Adjust as needed. If you’re feeling particularly brave, you can also set allow_major_version_upgrade = true. Since automated maintenance often involves provisioning and promoting new database instances, you need to ensure that all subnets in your DBSubnetGroup have at least one free IP at all times.
For our use case we’ve provisioned a tiny amount of storage (10GB), but in practice you will likely need more allocated_storage. You can also let RDS grow storage on its own: Storage Autoscaling is enabled by default when creating an instance in the console, while in Terraform you opt in by setting max_allocated_storage above allocated_storage (setting it to 0, or omitting it, leaves autoscaling disabled). If you do enable it, choose a ceiling that is reasonable for your application, and consider a billing alarm as a backstop.
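A production-leaning sketch of those two arguments on the aws_db_instance (the numbers are placeholders):

allocated_storage     = 100  # starting size in GB
max_allocated_storage = 500  # let RDS autoscale storage up to 500 GB; 0 disables autoscaling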
Relating to storage, anything in production will want to appropriately tune backup_retention_period to ensure backups are persisted; here I’ve simply stayed within free-tier space. If you do make this non-zero, you should also remove skip_final_snapshot, which as configured prevents RDS from taking one last snapshot before the database is deleted… we skip it because backups are disabled in this demo, but in production a final snapshot is a good safety net. For critical databases, you also want deletion_protection = true.
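Pulled together, a more production-minded variant of those arguments might look like this; the snapshot identifier is a made-up example, and note that Terraform requires final_snapshot_identifier whenever skip_final_snapshot is false:

backup_retention_period   = 7        # keep automated backups for a week
skip_final_snapshot       = false
final_snapshot_identifier = "${var.env_name}-rds-final"
deletion_protection       = true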
While optional, multi_az takes full advantage of AWS’ geographic diversity to provide more resilience. When enabled, RDS automatically replicates data to a standby instance in a different AZ. While we’re talking about production, it’s not shown here to save resources, but you will likely want to export slow query and error logs to CloudWatch Logs (enabled_cloudwatch_logs_exports = ["slowquery", "error"]) and enable enhanced monitoring (monitoring_interval = 60). If you enable enhanced monitoring, refer to the AWS documentation for more information on configuring the required IAM role.
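To give you a feel for it, here’s a rough sketch of that monitoring role; the role name is arbitrary, and the corresponding additions to aws_db_instance are shown as comments:

resource "aws_iam_role" "rds_monitoring" {
  name = "${var.env_name}-rds-monitoring"

  # Allow the RDS enhanced monitoring service to assume this role
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Action    = "sts:AssumeRole"
      Principal = { Service = "monitoring.rds.amazonaws.com" }
    }]
  })
}

resource "aws_iam_role_policy_attachment" "rds_monitoring" {
  role       = aws_iam_role.rds_monitoring.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonRDSEnhancedMonitoringRole"
}

# Then, on aws_db_instance.rds:
#   enabled_cloudwatch_logs_exports = ["slowquery", "error"]
#   monitoring_interval             = 60
#   monitoring_role_arn             = aws_iam_role.rds_monitoring.arn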
The last two things I’ll discuss here are the security group and secrets management… Since our Terraform code will be committed to source control, we obviously don’t want things like database passwords exposed. Here we simply use a variable input (var.db_password), which Terraform will prompt for if it isn’t provided via -var or the environment. I use direnv to help in situations like this. Once configured for your shell, it can automatically run commands and export environment variables as you enter a directory. This makes it easy to retrieve secrets from Vault and have them consumed by Terraform.
❯ cat .envrc
export TF_VAR_db_password=$(vault read secret/my-secret ...)
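For completeness, the variable itself is just a plain input; if you’re on Terraform 0.14 or newer you can also mark it sensitive so it is redacted from plan output:

variable "db_password" {
  type      = string
  sensitive = true  # requires Terraform >= 0.14; omit on older versions
}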
Last but not least, we need to configure one or more security groups to provide as vpc_security_group_ids. This is similar to the process followed when we worked through the EC2 auto-scaling group:
resource "aws_security_group" "mysql_ingress" {
name = "${var.env_name}-myql-ingress-sg"
vpc_id = aws_vpc.vpc.id
ingress {
from_port = 3306
to_port = 3306
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
egress {
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
}
RDS is highly configurable, stable and performant. In less than 40 lines and 15 minutes we’ve managed to provision a fully functional MySQL database. Gone are the days of waiting on racking hardware or opening tickets!
Outside the lab you’ll want to make many of the adjustments noted above – select a suitable maintenance window, choose an appropriate amount of upgrade automation, think carefully about your backups, and tune monitoring. Depending on your use case, you may also want to enable encryption. We’ve avoided that here since it would involve more moving parts (KMS and certificate management), but might revisit the topic in a dedicated article. For now, I hope you see how easy it is to embrace DBaaS thanks to AWS’ RDS!