
6 recommendations for your Docker containers


 

This article lists various simple steps you can take to improve your Docker containers, including reducing build time and size, and enhancing security.

 

Using Layers for faster builds

 

Docker layers can be thought of as different filesystem states generated during a container's build. Layers represent different stages, or snapshots (as they are immutable), of different steps in the build process.

 

Layers are normally created automatically by various instructions in the Dockerfile that change the filesystem. The main instructions that create layers are RUN, ADD and COPY.

 

Our main use for layers is to improve build time and cost. When Docker can tell that a layer in the build process will remain the same, it skips the steps needed to generate it and reuses the cached layer. One situation in which Docker knows a layer doesn't need to be generated again is when an instruction is run and its inputs haven't changed since the last time that same instruction was run. For example, if a dependency installation command is run with the same lockfile, we can let Docker know that there is no need to run it again.

 

We can leverage this by splitting steps in the Dockerfile so that more layers are generated. For example, consider the following Dockerfile:

 

FROM node:latest
COPY . .
RUN npm install && npm run build
CMD ["npm", "start"]

 

In this file, we have a single RUN instruction that modifies the filesystem in two different ways: first, an npm install command that adds JavaScript modules to the filesystem, and then an npm run build step that generates dist files and creates further changes. Since it's a single RUN instruction, only one layer is generated here. This means that even if the files output by npm install remain the same, Docker will re-run both npm install and npm run build.

 

We can improve this by splitting the RUN instruction as follows:

 

FROM node:latest
COPY package.json package-lock.json ./
RUN npm install
COPY . .
RUN npm run build
CMD ["npm", "start"]

 

Now, a separate layer is generated for each npm command. If the npm install command's output is known not to change, Docker will not regenerate its layer, saving time. Note that we also split the COPY instruction, which creates more layers as well: we copy only the dependency files (package.json, package-lock.json) first, so that when it's time to run npm install, Docker sees they are the same as in the previous build, and it's clear that the npm install layer doesn't need to be generated again.

 

To see details about your image's layers, you can use the docker history <image> command, which displays the different steps of the build process. Note that steps with a size of 0 bytes are not relevant layer-wise, since they contain no changes to the filesystem.
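For instance, to inspect the layers of a locally built image (using a hypothetical tag gosampleapp), you can run:

```shell
# List the layers of a local image (hypothetical tag "gosampleapp")
docker history gosampleapp

# Show the full instruction behind each layer instead of a truncated one
docker history --no-trunc gosampleapp
```

Each row corresponds to a build step; the SIZE column shows how much each layer adds to the image.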

 

The .dockerignore file

 

By default, Docker copies every file in the build context when running an instruction such as COPY . .. This can lead to unnecessary files ending up in the final image, when normally we only want to copy files that are relevant to the build process. One way to avoid this problem is a .dockerignore file: similar to .gitignore, Docker will skip files and directories listed in .dockerignore. An example .dockerignore file is as follows:

 

**/.git
dist
node_modules
README.md

 

When it comes to certain files and directories such as node_modules, the .dockerignore file is especially useful, because copying them from the host can break the build process and introduce unexpected behavior. In other cases, however, an even better solution exists for keeping unnecessary files out of the final image: multi-stage builds.

 

Multi-stage builds

 

Multi-stage builds allow you to create intermediate images that are only used for producing files. You can then copy only the required files into the final image. These intermediate images are discarded and not included in the final build. Multi-stage builds are a best practice when it comes to image size optimization.

 

Let's say you have certain files which are required only for the build process, but are not needed at runtime, such as the source files of a compiled project. You might find yourself removing the now-unnecessary sources after they've been used. See the following example Dockerfile for a trivial Go executable:

 

FROM golang:latest
WORKDIR /app
# This is also an example of layer use.
COPY go.mod go.sum ./
RUN go mod download
COPY main.go .
RUN go build -o ./myapp
# The above command creates a binary executable "myapp"
# Here, we remove the source files, which are not needed in the final image.
RUN rm main.go go.mod go.sum
CMD ["/app/myapp"]

 

Note: for brevity, this might not be an optimal Go build command for your project.

 

Notice how we manually run rm to remove unwanted files from the final build. In this case, .dockerignore is of no help because files such as main.go are required during the build process. This is where multi-stage builds come in.

 

Multi-stage builds occur when we specify multiple FROM instructions. The last FROM instruction in the Dockerfile produces the final image, and all other stages are discarded after the build. We can use the COPY instruction to bring files built in the intermediate images into the final image. We can rewrite the Dockerfile as follows:

 

FROM golang:latest AS builder
WORKDIR /build
COPY go.mod go.sum ./
RUN go mod download
COPY main.go .
RUN go build -o ./myapp
FROM alpine:latest
WORKDIR /app
COPY --from=builder /build/myapp .
CMD ["/app/myapp"]

 

Now, with two different FROM instructions, we have a multi-stage build. The first stage's FROM instruction uses AS to give it a name, which is useful for referring to different images during the build. In the first image, we run build commands as usual, producing a single binary executable, which is the only file we need in the final image. We then create another stage, this one being the last, which only copies the myapp file into the final image. The COPY instruction here accepts a --from flag specifying which intermediate image we are copying from. A multi-stage build can include any number of stages, or in other words, any number of intermediate images.
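A related convenience is docker build's --target flag, which stops the build at a named stage. For example, assuming the stage name builder from the Dockerfile above:

```shell
# Build only the "builder" stage, e.g. to inspect or debug the compile step
# ("myapp-build" is a hypothetical tag)
docker build --target builder -t myapp-build .
```

This is handy for debugging intermediate stages without building the final image.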

 

Furthermore, note that the final stage does not use the same base image as the builder stage. It uses a minimal Alpine image without the Go compiler, which is no longer needed. We can do even better, though, which brings us to the next section on optimal base images.

 

Optimal base images

 

You may have noticed the popularity of base images based on Linux distributions such as Alpine, which, in comparison to tried-and-true options such as Debian, are much smaller, offer alternative default packages, and may be more secure by having fewer dependencies. While going in depth into the pros and cons of using one image over another is beyond the scope of this article, it is generally agreed that an optimal image should be smaller and contain fewer programs when possible, for both performance and security reasons. The following sections cover individual alternative base images, but keep in mind this is by no means an exhaustive list.

 

Note: For cases such as Node.js, where dependencies such as a runtime are still required in the final image, consider exploring alternative tags. For instance, the node:25 image has various alternatives, including Alpine-based images (e.g. node:25-alpine).

 

To create a mostly fair comparison, the following are the metrics of our previous Go executable Dockerfile when built using the debian:trixie-slim base image for the final stage. Each section below lists the same metrics:

 

Sample app:
Base image: debian:trixie-slim
Size: 83.3 MB
Package count: 80

 

Alpine

 

Alpine is a much smaller image based on the Linux distribution of the same name. It is a popular alternative to Debian focused on security and minimal build size. Alpine includes alternative foundational libraries, in particular, musl libc instead of GNU libc, and is missing various packages available in Debian by default. This can require additional build steps for certain use cases, but in general, the security and build size advantages are worth considering.

 

Sample app:
Base image: alpine:3.23.2
Size: 10.7 MB
Package count: 18

 

Scratch

 

The scratch image is extremely minimal: it is essentially an empty filesystem. It has no available tags and isn't based on any Linux distribution. For our sample app, the resulting image contains only two packages, the Go stdlib and our own executable, and is therefore extremely lightweight. scratch is ideal when your application has no runtime dependencies. We can use it for our sample app because Go compiles to a statically linked binary in this case, and we also happen to need no additional runtime dependencies. A notable feature of scratch is that it doesn't include libc, a package manager, an /etc/passwd file, or a shell, which reduces the image's attack surface and size.

 

However, for many applications, scratch might be too minimal. For example, applications that need CA certificates or libc would require extra steps, and an option such as Distroless may be preferable instead.
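As a sketch of what this looks like, here is a hypothetical variant of our earlier multi-stage Dockerfile whose final stage is scratch. CGO_ENABLED=0 is set to ensure the binary is statically linked and doesn't depend on libc:

```dockerfile
FROM golang:latest AS builder
WORKDIR /build
COPY go.mod go.sum ./
RUN go mod download
COPY main.go .
# Disable cgo so the resulting binary is statically linked
RUN CGO_ENABLED=0 go build -o ./myapp

# The final image contains nothing but our executable
FROM scratch
COPY --from=builder /build/myapp /myapp
CMD ["/myapp"]
```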

 

Sample app:
Base image: scratch
Size: 2.25 MB
Package count: 2

 

Distroless

 

Google Distroless images are available from the gcr.io container registry and are similar to scratch, but include libc and CA certificates, among other very small utilities. The difference from scratch is small, so Distroless images can be used when scratch is insufficient. Distroless images come in variants that accommodate various runtimes, such as Java and Node.js, not only statically linked binaries.
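For example, assuming the builder stage from our earlier multi-stage Dockerfile, switching the final image to Distroless is a small change to the last stage:

```dockerfile
# Same builder stage as before; only the final stage changes
FROM gcr.io/distroless/static
COPY --from=builder /build/myapp /myapp
CMD ["/myapp"]
```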

 

Sample app:
Base image: gcr.io/distroless/static
Size: 4.33 MB
Package count: 6

 

Inspecting included packages

 

In the previous section, you might have noticed we provided each image's package count. This is possible thanks to the Software Bill of Materials, or SBOM. We can use tools such as anchore/syft (free and open source) to analyze a container's final build and generate an SBOM. You can think of an SBOM as a detailed report on a piece of software: its license and version, among other details, but most importantly, the list of software packages it depends on, including their versions.

 

For example, we can generate an SBOM for our sample image's alpine:3.23.2 variant using syft as follows:

 

docker build -t gosampleapp .
# this outputs to terminal, but other outputs are available too.
syft gosampleapp

 

This is the output:

 

✔ Loaded image
   (...)
 ✔ Parsed image
   (...)
 ✔ Cataloged contents
   ├── ✔ Packages                        [18 packages]
   ├── ✔ Executables                     [18 executables]
   ├── ✔ File metadata                   [80 locations]
   └── ✔ File digests                    [80 files]
NAME                    VERSION      TYPE
alpine-baselayout       3.7.1-r8     apk
alpine-baselayout-data  3.7.1-r8     apk
alpine-keys             2.6-r0       apk
alpine-release          3.23.2-r0    apk
apk-tools               3.0.3-r1     apk
busybox                 1.37.0-r30   apk
busybox-binsh           1.37.0-r30   apk
ca-certificates-bundle  20251003-r0  apk
libapk                  3.0.3-r1     apk
libcrypto3              3.5.4-r0     apk
libssl3                 3.5.4-r0     apk
musl                    1.2.5-r21    apk
musl-utils              1.2.5-r21    apk
test.com/test           UNKNOWN      go-module  # <-- This is our application.
scanelf                 1.3.8-r2     apk
ssl_client              1.37.0-r30   apk
stdlib                  go1.25.5     go-module
zlib                    1.3.1-r2     apk

 

The SBOM report can be used for multiple purposes. In this case, we can use it to get an overview of packages included in our final build. We can also use this to identify unwanted or outdated dependencies, and especially, common security vulnerabilities, as we'll see in the next section.

 

Note that while this article covers only syft, it isn't the only option when it comes to SBOM generation and vulnerability analysis. Docker also provides the Docker Scout tool, and there are multiple enterprise alternatives outside the realm of open source.

 

Checking for vulnerabilities

 

Adjacent to syft is anchore/grype. We can use grype mainly to obtain a report of CVEs (Common Vulnerabilities and Exposures). It can scan an image directly, or an SBOM generated by syft. We can use the following basic commands to get started with grype:

 

docker build -t gosampleapp .
# output the SBOM to a json file
syft gosampleapp -o table -o spdx-json=sbom.spdx.json
grype sbom:./sbom.spdx.json

 

First, we generate a new SBOM and make sure it's in the SPDX format (CycloneDX is also supported). With the -o table option, we also print the SBOM to the terminal for convenience.
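If your tooling expects CycloneDX instead, syft can emit it directly (the filename here is just an example):

```shell
# Generate a CycloneDX SBOM instead of SPDX
syft gosampleapp -o cyclonedx-json=sbom.cdx.json
```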

 

Then, we run grype with the generated SBOM file as input. An output similar to the following should be printed to terminal:

 

✔ Vulnerability DB                [updated]
 ✔ Scanned for vulnerabilities     [0 vulnerability matches]
   ├── by severity: 0 critical, 0 high, 0 medium, 0 low, 0 negligible
No vulnerabilities found

 

Scanners such as grype can also be integrated into CI pipelines, but note that they can produce noisy output and false positives depending on your project, especially when scanning images with large numbers of dependencies. In grype's case, there are ways to filter the output and decide which issues require your attention.
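For CI use, one way to keep the signal high is grype's --fail-on flag, which makes the command exit with a non-zero status only when findings reach a given severity:

```shell
# Fail the pipeline only on high-severity (or worse) findings
grype sbom:./sbom.spdx.json --fail-on high
```

Individual findings can also be suppressed via ignore rules in a .grype.yaml configuration file.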

 
