Clone the companion project to follow along…
So far in Part I and Part II of this series we have provisioned a multi-AZ network with a custom VPC, subnets, internet gateway, and routing tables, then deployed highly available, auto-scaling Linux servers using EC2 and an ALB. Our multi-tier starter project has only one big piece remaining – the database!
In the final part of this series we’ll explore AWS’ Relational Database Service (RDS). More than simply moving your database to the cloud, RDS provides numerous DBaaS advantages including fault tolerance, automated backups, and easy upgrades. Before we jump in, here’s a refresher on what we’re building:
Adopting a DBaaS like RDS as part of your application architecture has many benefits. Your team can focus on their value proposition (which probably isn’t just running a database!), offload management tasks to the IaaS provider, and easily spin up new instances using common tools such as Terraform or Ansible.
If you’re still leery of putting your data in the hands of a provider, take heart… RDS has a solid track record, having been announced in 2009 and evolving rapidly ever since. From a service perspective, you have more traditional options such as MySQL or PostgreSQL, or the latest offering, known simply as Aurora. While Aurora is technically the new kid on the block, it’s an easy choice thanks to an impressive feature set.
Amazon Aurora features a distributed, fault-tolerant, self-healing storage system that auto-scales up to 64TB per database instance. It delivers high performance and availability with up to 15 low-latency read replicas, point-in-time recovery, continuous backup to Amazon S3, and replication across three Availability Zones (AZs).
Whichever option you choose, you’ll save a lot of time not thinking about DBA tasks and be able to easily manage your infrastructure with automation. For our simple project, we are going to provision a MySQL RDS instance using the previously configured subnets to build a database subnet group spanning all of the AZs in our region.
To save time we’ll dial back storage and backup options (even this simple case can take 10-20 minutes to deploy based on load in the target region), but discuss trade-offs as we go so you are prepared to adjust when using RDS in production!
In Part I we used cidrsubnet to split our CIDR range into several subnets. This allowed us to deploy a subnet in each AZ within our region, a best practice for high availability. Aside from distributing EC2 instances across these subnets, RDS also requires subnets spanning at least two availability zones to form what’s known as a DBSubnetGroup.
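As a quick refresher, here is a minimal sketch of that subnet layout; the 10.0.0.0/16 CIDR, variable names, and AZ list are illustrative stand-ins for whatever you used in Part I:

variable "az_names" {
  type    = list(string)
  default = ["us-east-1a", "us-east-1b", "us-east-1c"]
}

resource "aws_subnet" "private_subnets" {
  count             = length(var.az_names)
  vpc_id            = aws_vpc.vpc.id
  availability_zone = var.az_names[count.index]

  # cidrsubnet(prefix, newbits, netnum) carves a /24 per AZ out of the /16
  cidr_block = cidrsubnet("10.0.0.0/16", 8, count.index)
}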
Since we already have the subnets, the Terraform for the subnet group itself is easy:
resource "aws_db_subnet_group" "default" {
name = "main"
subnet_ids = aws_subnet.private_subnets[*].id
}
That’s it! Aside from optional tags, DBSubnetGroups don’t require many options. The key thing here is our familiar splat syntax to build up subnet_ids. Without an explicit subnet group, RDS will fall back to the default subnet group in your account’s default VPC (assuming you haven’t deleted it). Now we can configure our RDS instance with just a few more lines of HCL:
resource "aws_db_instance" "rds" {
vpc_security_group_ids = [aws_security_group.mysql_ingress.id]
db_subnet_group_name = aws_db_subnet_group.default.name
maintenance_window = "Sat:00:00-Sat:03:00"
multi_az = true
allocated_storage = 10
backup_retention_period = 0
skip_final_snapshot = true
engine = "mysql"
engine_version = "5.7"
instance_class = var.db_instance_type
name = "testdb"
username = "root"
password = var.db_password
parameter_group_name = "default.mysql5.7"
}
Our choice of MySQL is made obvious by engine, engine_version, and parameter_group_name. You’ll find versions of these catered to your database back-end of choice in the documentation.
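If you’d rather not dig through the docs, the AWS CLI can enumerate valid combinations for you. For example, something along these lines lists the MySQL engine versions alongside their parameter group families:

❯ aws rds describe-db-engine-versions --engine mysql \
    --query "DBEngineVersions[].[EngineVersion,DBParameterGroupFamily]" --output table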
Worth noting, engine_version can be specified in semver format to control automated upgrades. If you specify “MAJOR.MINOR.PATCH”, you effectively pin the database version. Specifying a version as “MAJOR.MINOR” will pull in PATCH upgrades automagically. This is because auto_minor_version_upgrade is true by default.
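For example, on the aws_db_instance resource (version numbers here are illustrative):

# Track patch releases within 5.7 automatically (the provider default)
engine_version             = "5.7"
auto_minor_version_upgrade = true

# Or pin an exact version and opt out of automatic patch upgrades
engine_version             = "5.7.33"
auto_minor_version_upgrade = false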
Never fear… upgrades only happen during maintenance windows. In our example, we’ve carefully tuned maintenance_window to ensure any upgrades or failovers happen outside business hours. Adjust as needed. If you’re feeling particularly brave, you can also set allow_major_version_upgrade = true. Since automated maintenance often involves provisioning and promoting new database instances, you need to ensure that all subnets in your DBSubnetGroup have at least one free IP at all times.
For our use case we’ve provisioned a tiny amount of storage (10GB), but in practice you will likely need more allocated_storage. You can also let RDS grow storage on its own: Storage Autoscaling is enabled by default when creating an instance in the console, while in Terraform you opt in by setting max_allocated_storage above allocated_storage (setting it to 0, or omitting it, leaves autoscaling disabled). If you do enable it, choose a ceiling that is reasonable for your application, and consider a billing alarm as a backstop.
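A production-leaning sketch of those two arguments on the aws_db_instance (the numbers are placeholders):

allocated_storage     = 100  # starting size in GB
max_allocated_storage = 500  # let RDS autoscale storage up to 500 GB; 0 disables autoscaling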
Relating to storage, anything in production will want to appropriately tune backup_retention_period to ensure backups are persisted; here I’ve simply stayed within free-tier space. If you do make this non-zero, you should also remove skip_final_snapshot, which as configured prevents RDS from taking one last snapshot before the database is deleted… we skip it because backups are disabled in this demo, but in production a final snapshot is a good safety net. For critical databases, you also want deletion_protection = true.
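Pulled together, a more production-minded variant of those arguments might look like this; the snapshot identifier is a made-up example, and note that Terraform requires final_snapshot_identifier whenever skip_final_snapshot is false:

backup_retention_period   = 7        # keep automated backups for a week
skip_final_snapshot       = false
final_snapshot_identifier = "${var.env_name}-rds-final"
deletion_protection       = true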
While optional, multi_az takes full advantage of AWS’ geographic diversity to provide more resilience. When enabled, RDS automatically replicates data to a standby instance in a different AZ. While we’re talking about production, it’s not shown here to save resources, but you will likely want to export slow query and error logs to CloudWatch Logs (enabled_cloudwatch_logs_exports = ["slowquery", "error"]) and enable enhanced monitoring (monitoring_interval = 60). If you enable enhanced monitoring, refer to the AWS documentation for more information on configuring the required IAM role.
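To give you a feel for it, here’s a rough sketch of that monitoring role; the role name is arbitrary, and the corresponding additions to aws_db_instance are shown as comments:

resource "aws_iam_role" "rds_monitoring" {
  name = "${var.env_name}-rds-monitoring"

  # Allow the RDS enhanced monitoring service to assume this role
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Action    = "sts:AssumeRole"
      Principal = { Service = "monitoring.rds.amazonaws.com" }
    }]
  })
}

resource "aws_iam_role_policy_attachment" "rds_monitoring" {
  role       = aws_iam_role.rds_monitoring.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonRDSEnhancedMonitoringRole"
}

# Then, on aws_db_instance.rds:
#   enabled_cloudwatch_logs_exports = ["slowquery", "error"]
#   monitoring_interval             = 60
#   monitoring_role_arn             = aws_iam_role.rds_monitoring.arn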
The last two things I’ll discuss here are the security group and secrets management… Since our Terraform code will be committed to source control, we obviously don’t want things like database passwords exposed. Here we simply use a variable input (var.db_password), which Terraform will prompt for if it isn’t provided via -var or the environment. I use direnv to help in situations like this. Once configured for your shell, it can automatically run commands and export environment variables as you enter a directory. This makes it easy to retrieve secrets from Vault and have them consumed by Terraform.
❯ cat .envrc
export TF_VAR_db_password=$(vault read secret/my-secret ...)
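For completeness, the variable itself is just a plain input; if you’re on Terraform 0.14 or newer you can also mark it sensitive so it is redacted from plan output:

variable "db_password" {
  type      = string
  sensitive = true  # requires Terraform >= 0.14; omit on older versions
}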
Last but not least, we need to configure one or more security groups to provide as vpc_security_group_ids. This is similar to the process followed when we worked through the EC2 auto-scaling group:
resource "aws_security_group" "mysql_ingress" {
name = "${var.env_name}-myql-ingress-sg"
vpc_id = aws_vpc.vpc.id
ingress {
from_port = 3306
to_port = 3306
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
egress {
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
}
RDS is highly configurable, stable and performant. In less than 40 lines and 15 minutes we’ve managed to provision a fully functional MySQL database. Gone are the days of waiting on racking hardware or opening tickets!
Outside the lab you’ll want to make many of the adjustments noted above – select a suitable maintenance window, choose an appropriate amount of upgrade automation, think carefully about your backups, and tune monitoring. Depending on your use case, you may also want to enable encryption. We’ve avoided that here since it would involve more moving parts (KMS and certificate management), but might revisit the topic in a dedicated article. For now, I hope you see how easy it is to embrace DBaaS thanks to AWS’ RDS!