Optimising Docker builds with multi-stage targers

4 July 2024

I've found that one of the most powerful features in Docker is the ability to use multi-stage builds. This can significantly improve your development workflow and optimize your production containers, by allowing extra dependencies locally, without bloating the final production image.

Understanding Multi-Stage Builds

Multi-stage builds allow you to use multiple FROM statements in your Dockerfile. Each FROM instruction can use a different base, and each begins a new stage of the build. You can selectively copy artifacts from one stage to another, leaving behind everything you don't want in the final image. You can also build specific parts of the Docker file, by specifying which target you wish to build. With this you can have a single Dockerfile for both development and production, without needing to duplicate their contents.

Separating Dev and Prod Dependencies

One of the key benefits of multi-stage builds is the ability to separate your development and production dependencies. This is particularly useful in Python projects where you often have different requirements for development (like testing and linting tools) and production.

Let's look at an example Dockerfile:

# Stage 1: Base image for both dev and prod
FROM python:3.12 AS base

RUN pip install requests

# Stage 2: Production build
FROM base AS prod
ENV IS_PROD=true

# Stage 3: Development dependencies
FROM base AS dev
RUN pip install pytest

ENV IS_PROD=false

In this Dockerfile, we have three stages:

base: A common base stage with Python 3.12.
prod: Includes only production dependencies.
dev: Includes all dependencies (including development ones).

Building Specific Targets

To build a specific target, you can use the --target flag:

# Build the development image
docker build --target dev -t myapp:dev .

# Build the production image
docker build --target prod -t myapp:prod .

Default Build Behavior

It's important to note that if you don't specify a target, Docker will build the last stage by default. In our example, if you run docker build . without any flags, it will build the dev stage.

Benefits of Multi-Stage Builds

Let's dive deeper into why using multi-stage builds is beneficial:

Smaller Production Images:
- By separating dev and prod dependencies, your production image only contains what's necessary to run the application, resulting in a smaller image size.
- A typical Python web application might require development tools like pytest, flake8, and mypy, which are not needed in production. By excluding these, you can reduce your image size.
- This can be beneficial for services like AWS Fargate and AWS Lambda, as it reduces the load time for the services to download the container image.
Improved Security:
- Fewer dependencies in your production image means a reduced attack surface.
- Example: Development tools often have their own dependencies and potential vulnerabilities. By excluding them from your production image, you minimize the risk of these vulnerabilities being exploited.
- You can also use multi-stage builds to run security scans on your code before creating the final production image, ensuring that only vetted code makes it to production.
Faster Builds and Deployments:
- Smaller images are quicker to build, push, and pull, speeding up your CI/CD pipelines.
- In a microservices architecture with dozens of services, reducing each container size by 100MB can save gigabytes of data transfer and storage, significantly speeding up deployments.
- Faster builds mean quicker feedback loops for developers.
Consistency:
- Using the same base image for both development and production ensures consistency across environments.
- This reduces "it works on my machine" problems by ensuring that the development environment closely mirrors production.
Flexibility:
- You can easily switch between development and production builds without maintaining separate Dockerfiles.
- This simplifies your build process and reduces the chance of discrepancies between environments.
Optimized Build Cache:
- Multi-stage builds allow you to optimize your build cache more effectively.
- You can structure your Dockerfile so that layers that change less frequently (like installing dependencies) are earlier in the file, while layers that change more often (like copying your application code) are later.
- This means that subsequent builds can reuse cached layers more effectively, speeding up your build process.
Easy Integration of Build Tools:
- You can use specialized build tools or compilers in early stages without bloating your final image.

Drawbacks

One of the biggest drawbacks using multi-stage builds, is the inability to re-use sections in the middle of a stage.

For instance, typically I would want to have dev and prod images separated only by the dependencies they install. However to produce an optimise build, you would want to add you application code after building the dependencies. This would therefore mean you would have to duplicate the COPY commands in both the dev and prod stages.

Multi-stage builds in Docker offer a powerful way to optimize your container images and streamline your development workflow. By separating your dev and prod dependencies, you can ensure that your production containers are lean, secure, and efficient, while still maintaining a robust development environment.