Lean IAM Management

aws go iam oss

Part of DevOps culture is avoiding tickets when automation can provide lower-friction alternatives. Similarly, the SRE mindset seeks to eliminate toil. We also know from experience that the most effective Agile teams are granted a high level of autonomy.

IAM is a routine stumbling block to autonomy. Teams need an “appropriate” level of access to do their jobs. In practice, that often means waiting on tickets while another team twiddles bits to unblock development. We say we hire people we can trust, then increase toil and wait time by taking decision making out of the hands of those closest to the problem.

In our defense, IAM is scary because it involves a lot of moving parts. We’re not trying to block work; we just want adequate controls. How can we do better? Lean thinking tells us to observe, explore the problem space, and run experiments, starting with the simplest solution that adds value.

There is no such thing as one size fits all. For context, we manage a lot of AWS accounts (following the best practice of sandboxing services in dedicated accounts to limit blast radius). A central “IAM account” houses user-related resources, and cross-account role assumption provides access to account resources.
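For readers unfamiliar with the pattern, cross-account role assumption hinges on a trust policy attached to a role in each service account. The sketch below builds such a policy with Go's standard library. The account ID is a placeholder, and trusting the whole IAM account root is an assumption made for illustration, not a description of our actual configuration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Statement and PolicyDocument mirror the JSON shape of an IAM policy.
type Statement struct {
	Effect    string            `json:"Effect"`
	Principal map[string]string `json:"Principal"`
	Action    string            `json:"Action"`
}

type PolicyDocument struct {
	Version   string      `json:"Version"`
	Statement []Statement `json:"Statement"`
}

// TrustPolicy returns a trust policy that lets principals in the central
// IAM account call sts:AssumeRole on the role the policy is attached to.
func TrustPolicy(iamAccountID string) PolicyDocument {
	return PolicyDocument{
		Version: "2012-10-17",
		Statement: []Statement{{
			Effect:    "Allow",
			Principal: map[string]string{"AWS": "arn:aws:iam::" + iamAccountID + ":root"},
			Action:    "sts:AssumeRole",
		}},
	}
}

func main() {
	// 111111111111 is a placeholder account ID, not a real account.
	out, err := json.MarshalIndent(TrustPolicy("111111111111"), "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

A trust policy only controls who may assume the role. What the role can do once assumed is governed by the permission policies attached to it in the target account, which is where each team's autonomy comes in.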

The IAM account has conveniently-named groups (team-foo, team-bar) to make it easier for us to know what groups do at a glance, enable discovery, and ensure more sensible access requests. Within an account, teams are autonomous. They have the most context about their services, and we trust them to self-manage custom policies or other resources specific to their accounts.

As we scaled and started fielding more access requests, we investigated alternatives to traditional role-based access control (RBAC). Attribute-based access control (ABAC) has a lot of promise, and honestly it is where we want to be some day, but adopting it means changing the status quo, which is its own form of complexity, and it still has gaps in service support. ABAC is an interesting approach, but it effectively changes the shape of the problem and still requires tooling to manage access.

Next, we looked at ConsoleMe to provide self-service management of IAM resources. Its sheer scope of features brings complexity, which led to cost/benefit discussions. It is an amazingly capable tool that anyone tackling IAM management should consider, but with our lean product hat on, we had to ask, “Is this the simplest solution to our problem?”

We didn’t need cross-account discovery, control of per-account resources, or many of the other advanced features ConsoleMe provides. We just needed to eliminate tickets for group membership changes! Having an IT helpdesk add new team members to groups provided no value. Team members know best who should be in team-specific groups, and CloudTrail provides an audit log. We wanted a simple way for existing group members to add new members, much like the GitHub team maintainer model.
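The maintainer model boils down to one rule: only someone who already belongs to a group may add a new member to it. Here is a minimal sketch of that check, using an in-memory map in place of real IAM calls. The `Directory` type and `AddMember` method are hypothetical names chosen for illustration and are not taken from grouper's code.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrNotMember is returned when the requester does not belong to the group
// (or the group does not exist).
var ErrNotMember = errors.New("requester is not a member of the group")

// Directory is a hypothetical in-memory stand-in for IAM group membership:
// group name -> set of user names.
type Directory map[string]map[string]bool

// AddMember adds newUser to group only if requester already belongs to it.
func (d Directory) AddMember(requester, group, newUser string) error {
	members, ok := d[group]
	if !ok || !members[requester] {
		return ErrNotMember
	}
	members[newUser] = true
	return nil
}

func main() {
	dir := Directory{"team-foo": {"alice": true}}

	// alice is already in team-foo, so she may add bob.
	fmt.Println(dir.AddMember("alice", "team-foo", "bob"))
	// mallory is not in team-foo, so her request is rejected.
	fmt.Println(dir.AddMember("mallory", "team-foo", "eve"))
}
```

In a real service, the map lookups would be replaced by calls to IAM, and the requester's identity would come from an authenticated request rather than a function argument.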

Enter grouper, a lightweight API written in Go that handles this one aspect of AWS IAM management. It’s stateless, easy to extend (it is an intentionally lean microservice, but it uses Gin, so you can quickly bolt on new endpoints), provides an audit trail (CloudWatch and a configurable Slack webhook), and is a twelve-factor app that can be hosted atop your favorite container orchestrator.
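To give a feel for what a configurable Slack audit hook can look like in a twelve-factor service, here is a rough sketch using only the standard library. It is not grouper's implementation: the function names and the `SLACK_WEBHOOK_URL` environment variable are assumptions made for the example. The only external fact it relies on is that Slack incoming webhooks accept a JSON body with a `text` field.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"log"
	"net/http"
	"os"
)

// AuditPayload builds the JSON body for a Slack incoming webhook.
func AuditPayload(actor, group, user string) ([]byte, error) {
	msg := struct {
		Text string `json:"text"`
	}{Text: fmt.Sprintf("%s added %s to %s", actor, user, group)}
	return json.Marshal(msg)
}

// Notify posts the audit message to webhookURL.
// An empty URL means Slack notifications are disabled.
func Notify(webhookURL, actor, group, user string) error {
	if webhookURL == "" {
		return nil
	}
	body, err := AuditPayload(actor, group, user)
	if err != nil {
		return err
	}
	resp, err := http.Post(webhookURL, "application/json", bytes.NewReader(body))
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("slack webhook returned %s", resp.Status)
	}
	return nil
}

func main() {
	// Hypothetical variable name; configuration comes from the environment.
	webhookURL := os.Getenv("SLACK_WEBHOOK_URL")

	// Log lines go to stderr, where the hosting platform can collect them.
	log.Println("audit: alice added bob to team-foo")
	if err := Notify(webhookURL, "alice", "team-foo", "bob"); err != nil {
		log.Printf("slack notification failed: %v", err)
	}
}
```

Keeping the webhook optional and reading it from the environment means the same container image can run with or without Slack notifications, depending only on how it is deployed.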

If this sounds interesting, browse the README for more detail on current functionality and how to deploy. If you have questions or feature requests, open issues or submit PRs.