Deployments at Scale with AWS ECR and ECS

With devops at Rocket, one of our primary goals is to make the life of the developer easier. A specialty of ours is assisting with our customers' deployment processes. In short, a deployment process is how a code change can safely make it from a developer's laptop all the way to a production environment where it can be seen by end users.

In a typical software project, your environments will be something along the lines of dev, staging, and prod. Maintaining a release pipeline to these is simple enough in most cases, but sometimes things get hairy.

Let's say that a customer comes along who decides they need a separate set of environments. The new environments will need completely separate infrastructure, i.e., clusters, load balancers, and whatever other resources you happen to be using. Also, they have a whole list of features they want you to add to your application. Considering they have money to pay you for it, we'll call that customer Money Bags Inc, or MB for short.

This is a big request, but sometimes it does happen for various reasons. Now the environments you need to maintain look something like dev, mb-dev, staging, mb-staging, prod, and mb-prod. If you fork your code for the new customer's features, you will be effectively developing two separate apps. Instead, you opt to add MB's features to your existing codebase. This means you will need a way to deploy your code changes to all environments. The features need to work for the new customer without impacting functionality for your existing ones. Knowing what's deployed where, and testing new features in this scenario is a whole lot to think about!

If you are facing this, you're in luck, because we're going to talk about a pipeline solution for exactly that. Our solution will allow you to maintain a high level of automation as well as visibility into the state of your environments. A developer should have little concern about their code ending up in the proper place. If, however, they need to find some information about the state of an environment, it should be readily available, i.e., not obscured by our automation. As we are an AWS partner, the tooling focused on here will be AWS services, specifically AWS ECS and ECR. Keep in mind, however, that we are showing a paradigm, and the method can be implemented outside of these technologies.

The Tools

Let's just briefly discuss the tools necessary to achieve this. I won't go into too much detail about them here, as the focus of this blog is the paradigm. I will talk about the minimum you need to understand, but links are included to find out more info should you be interested.

Docker

Before we discuss the actual AWS services, let's just briefly touch on Docker, as the AWS services we'll be using are built on top of it. There are two main concepts related to Docker that we are interested in - containers and images. From Docker's documentation:

Fundamentally, a container is nothing but a running process, with some added encapsulation features applied to it in order to keep it isolated from the host and from other containers. One of the most important aspects of container isolation is that each container interacts with its own private filesystem; this filesystem is provided by a Docker image. An image includes everything needed to run an application - the code or binary, runtimes, dependencies, and any other filesystem objects required.

Also relevant to us is that images can be tagged. A tag is just a way of identifying a particular image. An image is not limited to one tag; it can have many.
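To make the "many tags" point concrete, here is a minimal sketch built on the `docker tag` command. The function name `add_tags` and the example tag values are made up for illustration, and the sketch assumes the image already exists locally under its first tag.

```shell
#!/usr/bin/env bash

# Hypothetical helper: attach each extra tag to an image that already exists
# locally as "<repository>:<existing_tag>".
add_tags() {
  local repository="$1" existing_tag="$2"
  shift 2
  local new_tag
  for new_tag in "$@"; do
    # docker tag only adds a second name; no image data is copied.
    docker tag "${repository}:${existing_tag}" "${repository}:${new_tag}" || return 1
  done
}
```

Calling `add_tags api 4f2a9c1 dev latest` would leave one image reachable under three tags, which `docker image ls api` shows as three rows sharing a single image ID.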

A lot of people introduce the concept of containers as lightweight virtual machines. This isn't true, but the application of containers is similar. For more information on how to use Docker, I suggest you look at their Quickstart guide. If you want to know more about the actual architecture of containers, I highly recommend this video.

AWS ECR

ECR is a Docker registry service fully managed by AWS. This is what we'll use to hold our container images after they've been built. The service itself is rather simple. You can check out the getting started guide by AWS for a quick setup. Let's say we wanted to create an ECR repository for our project's api. We could do that with the AWS CLI tool using the following command:

aws ecr create-repository --repository-name api

In our setup, ECR will be the only thing that is shared across MB and our multi-tenant environment. Each environment will make references to that single source of truth.

AWS ECS

ECS is a container orchestration service provided by AWS. This will serve as the runtime for our containers. To make it simple, this is what will execute the docker run command for us. You can check out the AWS getting started guide and documentation to learn more. What's most relevant here is that ECS uses a task definition to define how we want to run our containers. Here's a rather simple example of what one might look like.

{
  "family": "api",
  "memory": "512",
  "cpu": "256",
  "networkMode": "awsvpc",
  "taskRoleArn": "arn:aws:iam::012345678922:role/api",
  "containerDefinitions": [
    {
      "name": "api",
      "image": "012345678922.dkr.ecr.us-east-1.amazonaws.com/api:dev",
      "portMappings": [
        {
          "hostPort": 80,
          "protocol": "tcp",
          "containerPort": 80
        }
      ]
    }
  ],
  "compatibilities": [
    "FARGATE"
  ],
  "requiresCompatibilities": [
    "FARGATE"
  ]
}

You can probably guess what most of these parameters do. The one that's really relevant to us here is "image". This is set to track an image on the repository that we created earlier.

The Method

Mutable Tags

Now that we have a basic understanding of our technologies, let's discuss the core of the method for keeping our environments in sync. The key to this is the mutable tag. In our ECS task definition example above, we actually had a mutable tag set in the "image" value. Let's deconstruct what we have set in that field:

012345678922.dkr.ecr.us-east-1.amazonaws.com/api:dev

The first part just references our AWS account number and region. The next bit you may recognize as the name we gave to our repository earlier, api. Finally we have the name dev. That's our mutable tag. It's simple, but actually quite powerful.
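The same breakdown can be expressed in a few lines of Bash parameter expansion. This is only a sketch: the function name `parse_image_ref` is made up for illustration, and it assumes the reference always has the shape registry/repository:tag with no port in the registry host.

```shell
#!/usr/bin/env bash

# Hypothetical helper: split "<registry>/<repository>:<tag>" into its parts.
# Assumes the reference always carries a tag and the registry host has no port.
parse_image_ref() {
  local ref="$1"
  local registry="${ref%%/*}"          # everything before the first "/"
  local remainder="${ref#*/}"          # everything after the first "/"
  local repository="${remainder%:*}"   # drop the trailing ":tag"
  local tag="${remainder##*:}"         # keep only what follows the last ":"
  printf 'registry=%s\nrepository=%s\ntag=%s\n' "$registry" "$repository" "$tag"
}

parse_image_ref "012345678922.dkr.ecr.us-east-1.amazonaws.com/api:dev"
```

The registry part carries the account number and region, the repository part is the name passed to `aws ecr create-repository`, and the tag is the only piece that moves between images over time.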

Going back to the example environments that we introduced, ideally the api service in dev and mb-dev stays in sync. When we are testing a new feature, we want to know that all environments of a given tier have the feature deployed. With the mutable tag, updating both dev environments takes three steps: push a new image to ECR, move the dev tag to that new image, and run a deployment command on both of our dev ECS clusters. Here are some scripts that accomplish our process:

  • First, we build our image and push it to ECR. One thing to note is that we are first pushing with COMMIT_HASH. This creates an easy link between our images and the git commit that triggered their build.
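#!/bin/bash
set -euo pipefail

# The full commit hash links each image to the commit that triggered its build.
COMMIT_HASH=$(git rev-parse HEAD)

ECR_REPOSITORY="012345678922.dkr.ecr.us-east-1.amazonaws.com/api"

IMAGE_NAME="${ECR_REPOSITORY}:${COMMIT_HASH}"

docker build -t "$IMAGE_NAME" .

# `aws ecr get-login` was removed in AWS CLI v2; get-login-password replaces it.
# ${ECR_REPOSITORY%%/*} strips the repository name and leaves the registry host.
aws ecr get-login-password \
  | docker login --username AWS --password-stdin "${ECR_REPOSITORY%%/*}"

docker push "$IMAGE_NAME"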
  • Next we query for the image we just pushed using the AWS CLI and attach the tag $ENVIRONMENT to it. We will want that value to vary depending on which environment we wish to push. This can be accomplished with any CI/CD tool. When we wish to push to the dev environments, we want ENVIRONMENT=dev.
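#!/bin/bash

# Expects COMMIT_HASH (from the build step) and ENVIRONMENT (e.g. dev) to be set by the CI job.
# --repository-name takes the bare repository name, not the full registry URI.
REPOSITORY_NAME="api"

MANIFEST=$(aws ecr batch-get-image \
    --repository-name "$REPOSITORY_NAME" \
    --image-ids imageTag="$COMMIT_HASH" \
    --query 'images[].imageManifest' \
    --output text)

# Register the same manifest again under the environment tag.
aws ecr put-image \
  --region "$AWS_DEFAULT_REGION" \
  --repository-name "$REPOSITORY_NAME" \
  --image-tag "$ENVIRONMENT" \
  --image-manifest "$MANIFEST"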
  • Finally we trigger an ECS deployment in all of our dev environments. This will make our clusters pull the new image. We need to set our access keys appropriately in order to point to the desired environment.
#!/bin/bash

export AWS_ACCESS_KEY_ID=$DEV_AWS_ACCESS_KEY_ID
export AWS_SECRET_ACCESS_KEY=$DEV_AWS_SECRET_ACCESS_KEY

aws ecs update-service --force-new-deployment --cluster api --service api

export AWS_ACCESS_KEY_ID=$MB_DEV_AWS_ACCESS_KEY_ID
export AWS_SECRET_ACCESS_KEY=$MB_DEV_AWS_SECRET_ACCESS_KEY

aws ecs update-service --force-new-deployment --cluster api --service api
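
As more customers are added, the deployment script grows by one copied stanza per environment. One way to keep it compact, sketched here under assumptions, is to loop over the environments of a tier and look up each key pair by name. The function name `deploy_tier` and the convention that credentials live in variables named `<PREFIX>_AWS_ACCESS_KEY_ID` and `<PREFIX>_AWS_SECRET_ACCESS_KEY` are illustrative; only the `aws ecs update-service` call is taken from the script above.

```shell
#!/usr/bin/env bash

# Hypothetical helper: force a new deployment of the api service in every
# environment passed as an argument, e.g. `deploy_tier dev mb-dev`.
# Assumes credentials are available as <PREFIX>_AWS_ACCESS_KEY_ID and
# <PREFIX>_AWS_SECRET_ACCESS_KEY, where PREFIX is the environment name in
# upper case with "-" replaced by "_" (dev -> DEV, mb-dev -> MB_DEV).
deploy_tier() {
  local env prefix key_var secret_var
  for env in "$@"; do
    prefix="$(printf '%s' "$env" | tr 'a-z-' 'A-Z_')"
    key_var="${prefix}_AWS_ACCESS_KEY_ID"
    secret_var="${prefix}_AWS_SECRET_ACCESS_KEY"

    # ${!name} is Bash indirect expansion: the value of the variable whose
    # name is stored in $name.
    if [ -z "${!key_var:-}" ] || [ -z "${!secret_var:-}" ]; then
      echo "missing credentials for ${env}" >&2
      return 1
    fi

    echo "deploying to ${env}"
    # A subshell keeps the exported keys from leaking into the next iteration.
    (
      export AWS_ACCESS_KEY_ID="${!key_var}"
      export AWS_SECRET_ACCESS_KEY="${!secret_var}"
      aws ecs update-service --force-new-deployment --cluster api --service api
    ) || return 1
  done
}
```

With this in place, the dev tier would be deployed with `deploy_tier dev mb-dev`, and onboarding another customer would mean adding one more argument plus its pair of credential variables.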

Now let's say our new feature is passing tests in both dev and mb-dev and we're ready to push to staging and mb-staging. Our configuration in ECS should be almost identical. Let's just change the dev tag to staging in our task definition. Then we just need to re-run our second and third scripts, the only difference being that we need ENVIRONMENT=staging instead of ENVIRONMENT=dev.

Immutable Tags

We can use this same method for our production environments as well, but you may want to maintain a higher level of control. A better approach here may be to use an immutable tag. For example, we can include a timestamp suffix on the tag, something like prod-2020-11-12-103000 (Docker tags cannot contain slashes or colons, so the timestamp uses dashes only). The timestamp ensures that only one image is ever given this tag, hence "immutable". The timestamp also tells us exactly when an image was certified ready for production. If for some reason we need to roll back production to a specific date, we can easily identify the correct image. The tagging script for this is mostly the same; we just need to add in a new variable $MASTER_TAG.

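#!/bin/bash

# Expects COMMIT_HASH to be set by the CI job.
REPOSITORY_NAME="api"
# Docker tags cannot contain "/" or ":", so the timestamp uses dashes only.
MASTER_TAG="prod-$(date +"%Y-%m-%d-%H%M%S")"

MANIFEST=$(aws ecr batch-get-image \
    --repository-name "$REPOSITORY_NAME" \
    --image-ids imageTag="$COMMIT_HASH" \
    --query 'images[].imageManifest' \
    --output text)

aws ecr put-image \
  --repository-name "$REPOSITORY_NAME" \
  --image-tag "$MASTER_TAG" \
  --image-manifest "$MANIFEST"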
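
One detail worth checking when composing timestamped tags: a Docker tag may only contain letters, digits, underscores, periods and hyphens, may not start with a period or hyphen, and is limited to 128 characters. Separators such as "/" and ":" therefore have to be avoided. The sketch below builds a tag from a UTC timestamp and checks it against that rule. The function names, the use of UTC and the exact timestamp layout are assumptions chosen for illustration.

```shell
#!/usr/bin/env bash

# Hypothetical helper: succeed only if the argument is a syntactically valid
# Docker tag (letters, digits, "_", "." and "-"; at most 128 characters; the
# first character must not be "." or "-").
is_valid_tag() {
  local tag_pattern='^[A-Za-z0-9_][A-Za-z0-9_.-]{0,127}$'
  [[ "$1" =~ $tag_pattern ]]
}

# Hypothetical helper: build "<prefix>-YYYY-MM-DD-HHMMSS" from the current UTC time.
timestamped_tag() {
  local prefix="$1"
  printf '%s-%s\n' "$prefix" "$(date -u +"%Y-%m-%d-%H%M%S")"
}

tag="$(timestamped_tag prod)"
if is_valid_tag "$tag"; then
  echo "ok: $tag"
else
  echo "invalid tag: $tag" >&2
fi
```

Because the year comes first, tags of this shape also sort chronologically, which helps when looking for the image that was live on a given date.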

Conclusion

Using a few simple scripts, we have a fully automated build and deployment process. Our method is highly scalable as well. If we have another customer that comes along similar to MB, the tagging mechanism remains the same. All we need to do is introduce a new deploy command for their ECS cluster. We also have easy visibility into what's deployed where. If we need to know the state of our API container within a given environment, all we have to do is check ECR and see which mutable tags are set to a particular image. You can easily see this from the AWS console. This paradigm has been quite successful for our customers.
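
The same visibility check can be done from the command line instead of the console. Here is a minimal sketch, assuming the AWS CLI is configured for the account that owns the repository; the function name `tags_sharing_image` is made up for this example.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print every tag attached to the image that currently
# carries the given tag in the given repository.
tags_sharing_image() {
  local repository="$1" tag="$2"
  aws ecr describe-images \
    --repository-name "$repository" \
    --image-ids "imageTag=${tag}" \
    --query 'imageDetails[0].imageTags' \
    --output text
}
```

Running `tags_sharing_image api dev` would list the commit hash tag alongside dev, which tells you which commit the dev environments are currently running.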

If you need help implementing it, or with many other awesome devops solutions, please reach out to us.