Docker is a helpful tool for both developers and ops. It can simplify the development of an application as well as its deployment and management. In this post we are going to explore a common pitfall of developing an application in Docker on the Mac and see what we can do to mitigate the issue and work as productively as possible.
Developing in Docker
Developing in Docker has a number of advantages over developing directly on your Mac. Before we begin, let’s remind ourselves of a few reasons why we might be developing in Docker:
- Easy Onboarding
- Setting up a new project can involve a number of steps. With Docker and the help of Docker Compose, this can be reduced to a single step.
- Infrastructure as Code
- The act of declaring a Dockerfile or Docker Compose file makes it explicit exactly what dependencies are required for the app to run and how they are wired up.
- Isolation
- When working on a number of projects with conflicting dependencies, it can be difficult and error-prone to isolate one from another. Due to the nature of containers, Docker solves this problem for free.
- Consistency
- Docker helps ensure that each developer is running the application with the same version of dependencies and tools. This includes the OS, database, language version, and libraries. If you are deploying an image to production, you are also getting a development environment that more closely resembles that of production, reducing the issue of “it works on my machine!”.
While there are many more reasons to use Docker, especially in production, these are a few of the benefits you can gain from using it for development, even if its use stops there!
So What’s the Problem?
When developing in Docker, there are a couple of steps that need to take place in order to get an application running:
- Build a Docker image
- Run a container based on that image
So what happens when you make a change to your code? In order to see that change, you need to rebuild your image and start a new container. Often this is satisfactory, especially when working with compiled languages, as Docker will cache unchanged parts of an image and rebuild only those that have changed. This may mean a simple recompile of the application binary and we are off to the races.
With interpreted languages and frameworks that support “hot reload”, however, rebuilding the image for every change quickly becomes tedious. To work around this, developers will often create a bind mount. This means that we specify a folder on the host machine (commonly the application working directory) and instruct Docker to keep that directory in sync with a directory in the container. This way, when we make a change to a source file on the host, that change is propagated to the container without rebuilding our image, thus keeping the “hot reload” intact. Problem solved, right?
Docker works its magic by leveraging features of the Linux kernel, notably namespaces (for isolation) and control groups (or cgroups, for resource management). On your Mac, these features do not exist. Therefore, in order for Docker Desktop for Mac to function, it runs a Linux virtual machine. Along with the VM comes a filesystem sharing utility called “osxfs”, which is in charge of keeping the filesystem native to your Mac in sync with the Linux-based filesystem of your Docker containers.
This sync process comes at a cost. While Docker has made great strides in improving sync performance, the process is still much slower than running natively without syncing. The issue is compounded when you have applications that make changes to large numbers of small files, as each change made on the host needs to be detected and propagated to the container, and vice versa.
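To get a feel for the "many small files" workload described above, here is a minimal sketch of a synthetic write test. It is not part of the benchmark used in this post; the helper name `write_small_files` and the scratch directory are made up for illustration.

```shell
# Hypothetical helper: write COUNT small files into DIR. Many small writes
# are the access pattern that stresses filesystem syncing the most.
write_small_files() {
  wsf_dir="$1"
  wsf_count="$2"
  mkdir -p "$wsf_dir"
  wsf_i=1
  while [ "$wsf_i" -le "$wsf_count" ]; do
    printf 'file %d\n' "$wsf_i" > "$wsf_dir/file_$wsf_i.txt"
    wsf_i=$((wsf_i + 1))
  done
}

# Example: 1000 small writes into a scratch directory, with coarse timing.
scratch="$(mktemp -d)"
start="$(date +%s)"
write_small_files "$scratch/many" 1000
end="$(date +%s)"
echo "wrote $(ls "$scratch/many" | wc -l | tr -d ' ') files in $((end - start))s"
```

Running the same function against a bind-mounted directory inside a container and against a directory that is not mounted should give a rough idea of the sync overhead on your own machine.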
What Can We Do?
There are a number of options to increase the speed of filesystem syncing with Docker. We will explore the following:
- Bind mounts with the cached and delegated options
- Docker Sync
- Mutagen
We will implement each strategy in a simple project to evaluate their performance, then take a look at the pros and cons of each approach.
In order to evaluate the performance of different sync strategies, we need to execute a repeatable task that results in heavy IO load. One such task is the installation of Rails, because of the large number of dependencies it pulls in. If we tell Docker to bind mount the gem installation directory, it will ensure any files that are created in the container during gem installation are copied to the host filesystem. Note that this is a somewhat contrived example, but it is an easy way to demonstrate how the sync process can affect the speed of IO in Docker on the Mac and, consequently, the speed at which your application executes.
First, let’s create a new directory and enter it:
mkdir speedtest && cd speedtest
Next, create a simple Dockerfile within that directory:
cat << 'EOF' >> Dockerfile
FROM ruby:2
WORKDIR /speedtest
CMD ["/bin/bash"]
EOF
This produces a Dockerfile that looks like this:
FROM ruby:2
WORKDIR /speedtest
CMD ["/bin/bash"]
Now, build the image:
docker build -t speedtest:latest .
Great. We now have an image that can be run to perform our speed tests.
Establishing a Baseline
First, let's see how long it takes to install Rails without any filesystem syncing. This will establish a baseline that we can use for comparison.
docker run -it --rm speedtest:latest
This will drop us into a bash shell, as specified in the CMD portion of the Dockerfile. As for the flags:
-it is actually two separate flags:
-i: interactive. This keeps STDIN open. Without it, the container would immediately exit.
-t: pseudo-TTY. Allocates a terminal so that the shell behaves like a normal interactive session.
--rm: tells Docker to remove the container upon exit instead of keeping it in a stopped state. We won’t need it to persist, so it’s good to keep things clean and tidy.
Now, let’s install Rails and establish that baseline!
time gem install rails
On my machine, looking at the “real” time elapsed, it took about 54 seconds.
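If you would rather record the elapsed time as a single number than read it off the output of `time`, a small wrapper like the following can help. This is only a sketch: `elapsed_seconds` is a hypothetical helper, not something used for the numbers in this post, and its resolution is one second.

```shell
# Hypothetical helper: run a command and print the wall-clock seconds it took.
# The command's own output is sent to stderr so that stdout carries only the number.
elapsed_seconds() {
  es_start="$(date +%s)"
  "$@" >&2
  es_end="$(date +%s)"
  echo "$((es_end - es_start))"
}

# Inside the container this could be used as:
#   elapsed_seconds gem install rails
# Here we time a short sleep just to show the helper working.
elapsed_seconds sleep 1
```

One-second resolution is coarse, but it is enough when the runs being compared differ by tens of seconds or more.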
We can now exit the container.
Bind Mount: Consistent
Bind mounts have three different types in Docker: consistent, delegated, and cached. By default when a bind mount is created it is of type consistent. This means that whenever a write occurs, it is immediately reflected to the other end of the mount. Since this is the default, it is what most developers will be using when they mount their working directory. So let’s see what kind of effect this has on performance. Using the docker image from before, let’s again log into the container, only this time we will bind mount the gem home to a local directory.
First, create a local gem directory for mounting:
mkdir container_gems
Now log into the container:
docker run \
  --rm \
  -it \
  --mount type=bind,source="$(pwd)/container_gems",target="/usr/local/bundle" \
  speedtest:latest
Now, we time the Rails install again:
time gem install rails
In this case, the installation took about 2 minutes and 55 seconds! This is an increase of about 2 minutes, which makes the install roughly 3x slower. Ouch!
Again, although this example is contrived, you can see how this could significantly slow down execution of a dockerized application. When working with a Rails project, there are lots of small file writes taking place all the time, and when you are syncing your working directory this will slow down your application significantly. The same can be said for any other application which performs a lot of IO.
Bind Mount: Cached
As mentioned earlier, one of the options that Docker Desktop for Mac allows is setting a bind mount as type cached. What this means is that Docker will view the macOS host as the authoritative source of truth, and there could be delays before updates are visible within the container. Typically, these delays are within a second or two — not enough to matter in most cases, but as we will see it can gain us some speed increases.
Clear the local gem directory:
rm -rf container_gems && mkdir container_gems
Log into the container with a cached bind mount:
docker run \
  --rm \
  -it \
  --mount type=bind,source="$(pwd)/container_gems",target="/usr/local/bundle",consistency=cached \
  speedtest:latest
Now, running the same speed tests as before, I get 2 minutes and 4 seconds. This is certainly faster than a consistent mount, but it’s still significantly slower than our baseline.
Bind Mount: Delegated
This is similar to the cached type, but in this case the container’s filesystem has the authoritative view and updates on the host may be delayed. Running the same test as we did for the cached type, I get a result of 2 minutes and 13 seconds.
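The command for the delegated test is not shown above, but it should differ from the cached one only in the consistency value. As a sketch, the helper below (the name `bind_mount_spec` is made up for illustration) builds the `--mount` argument for any of the three modes and prints the resulting command for review rather than running it. It assumes the working directory path contains no spaces.

```shell
# Build the --mount spec for a bind mount of ./container_gems onto the gem home.
# The consistency mode is passed as the first argument:
# consistent, cached, or delegated.
bind_mount_spec() {
  printf 'type=bind,source=%s/container_gems,target=/usr/local/bundle,consistency=%s\n' \
    "$(pwd)" "$1"
}

# Print the command instead of running it, so it can be inspected first.
echo "docker run --rm -it --mount $(bind_mount_spec delegated) speedtest:latest"
```

As with the cached run, clear and recreate the local container_gems directory before timing the install so that each test starts from an empty gem directory.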
Docker Sync
Docker Sync is a Ruby gem which enables you to keep your code base in sync with the container while allowing the application to perform at nearly full speed. In short, it achieves this by creating a Docker volume that your app can write to at full speed. This volume is then connected to a special container which syncs that volume with the host asynchronously. For more details, see the docker-sync documentation.
So how does this strategy perform? Let’s take a look.
First install the gem:
gem install docker-sync
Now, create a YAML file which defines a simple docker-sync configuration:
cat << EOF >> docker-sync.yml
version: 2
syncs:
  speedtest-sync:
    src: "$(pwd)/container_gems"
EOF
You should end up with a file that looks something like this:
version: 2
syncs:
  speedtest-sync:
    src: "/Users/chris/code/speedtest/container_gems"
Now start docker-sync, which creates the speedtest-sync volume:
docker-sync start
Then start up your Docker container:
docker run \
  --rm \
  -it \
  --mount type=volume,source="speedtest-sync",target="/usr/local/bundle",volume-nocopy=true \
  speedtest:latest
Notice that the volume source we specify is the one we declared in the docker-sync.yml file. For more information on why we set the nocopy option, see the docker-sync documentation.
Timing the Rails install, I get about 1 minute and 1 second. This is very close to our baseline of 54 seconds!
In this case, the slow sync via "osxfs" is hidden from the application, which sees only a fast docker volume.
Mutagen
Mutagen is self-described as a “fast, continuous, multidirectional file synchronization tool”. Of the supported synchronization types, the one that we are interested in is its support for Docker containers. Once we start the Mutagen daemon, we'll simply tell it to create a synchronization session between our local code and a remote path in the Docker container. Mutagen will then copy an agent binary into the container, which communicates with the host to keep things in sync. You can learn more about how Mutagen works, including Docker-specific details, in the Mutagen documentation.
Without further ado, let’s get things running and see how it performs.
First we will need to install Mutagen:
brew install havoc-io/mutagen/mutagen
(Note: this step assumes you have the Homebrew package manager installed. If not, see the Homebrew website.)
Next, start the mutagen daemon:
mutagen daemon start
Now we’ll need to start our container. This step is simple: as in our baseline step, there is no need to mount any volumes. The only difference is that we will give the container a name so that we can reference it later.
docker run -it --rm --name speedtest speedtest:latest
Now, we tell mutagen to keep things in sync:
mutagen create ./container_gems/ docker://speedtest/usr/local/bundle
That’s it! We can now time our Rails install as we did in previous steps. I get about 55 seconds. Taking into account margin for error, this is about the same as our baseline!
For reference, there are a few other mutagen commands worth knowing:
- mutagen list will list all sync sessions and their statuses,
- mutagen monitor shows a dynamic status display for a single session, allowing you to see if things are working, and
- mutagen terminate will permanently stop synchronization. Instead of terminating, you can pause and resume a session as well.
Lastly, it’s worth mentioning the -i flag of mutagen create. With it, you can tell mutagen to ignore certain directories when syncing. If you are running a Rails app, for example, it might be a good idea to specify something like the following:
mutagen create ./my-app docker://my-app/app-dir -i tmp -i log
as these are directories that are often written to but have little value in syncing.
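We’ve talked a bit about the results of the different sync methods we have tried, but let’s take a closer look. Here we can see the execution time of each strategy:
- No sync (baseline): about 54 seconds
- Bind mount, consistent: about 2 minutes 55 seconds
- Bind mount, cached: 2 minutes 4 seconds
- Bind mount, delegated: 2 minutes 13 seconds
- Docker Sync: about 1 minute 1 second
- Mutagen: about 55 seconds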
It’s clear that a normal bind mount makes a significant dent in IO performance. By instructing Docker to favor host or container consistency we can easily gain some speed. Introducing a third-party tool to the mix allows us to significantly improve performance on top of that.
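To put the timings on a common scale, the sketch below expresses each result as a multiple of the baseline. The numbers are the ones reported in this post, converted to seconds; the `slowdown` helper is a made-up name used only for this illustration.

```shell
# Baseline from this post (no syncing), in seconds.
baseline=54

# Print a strategy's time and its slowdown relative to the baseline.
slowdown() {
  awk -v name="$1" -v secs="$2" -v base="$baseline" \
    'BEGIN { printf "%-12s %4ds  %.1fx\n", name, secs, secs / base }'
}

slowdown consistent  175   # 2 minutes 55 seconds
slowdown cached      124   # 2 minutes 4 seconds
slowdown delegated   133   # 2 minutes 13 seconds
slowdown docker-sync 61    # 1 minute 1 second
slowdown mutagen     55
```

With these numbers, the consistent bind mount comes out at roughly 3.2x the baseline, the cached and delegated mounts at about 2.3x and 2.5x, and the two third-party tools at close to 1x.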
Let’s take a quick look at the pros and cons of each approach:
Bind Mount (Cached & Delegated)
Pros:
- Comes integrated with Docker out of the box.
- Simple to adapt to current code; as easy as adding a flag!
Cons:
- Performance still isn’t great.
Notes:
- If thinking about using this option, you’ll most likely want to use the cached option over delegated. This is because the use-case of mounting the application working directory is usually to sync changes from the host to the container. Therefore, it makes sense to use the cached option as it favors consistency on the host side.
Docker Sync
Pros:
- Established project.
- Good performance.
- Easy to set up simple cases while also supporting complex setups.
- Works across multiple platforms. Install the gem and add the configuration files and it should just work.
Cons:
- Complex sync strategy leaves greater risk of errors.
- Pollutes Docker with extra volumes and containers used only to sync code.
- Can cause heavy resource usage in certain situations.
Mutagen
Pros:
- Fastest solution available.
- Easy to set up.
- Supports local filesystem sync and sync over remote SSH sessions in addition to Docker-based syncing.
Cons:
- Requires manual setup.
- Still considered early beta with a single contributor at time of writing.
Docker is a great tool for developing applications. While using Docker, it often makes sense to create a bind mount to ensure that changes to your local codebase are immediately reflected into your application container. By default, however, doing so can create significant application performance issues.
Adding a simple flag to your volume mounts is an easy way to help mitigate the issue. For the fastest possible speeds, look towards a third-party tool such as docker-sync or mutagen. In many — but not all! — cases the small effort required to implement one of these solutions will pay off greatly with faster application performance and, as a result, increased developer productivity.
Hopefully this guide helps you to choose the best option for your application. Personally, I prefer Mutagen for its speed and flexibility of use cases. The great thing is that all of the available options are easy to implement and switch between, so if one solution doesn’t work out it’s easy to try another!