Contents

Squashing Docker Images

I was wondering the effect of merging layers (squashing) on the size of an image. Now Docker provides an experimental --squash option for the build.

Squash newly built layers into a single new layer

In order to illustrate the impact I have chosen a simple example that has room to be improved by squashing layers. The example installs pandas in a miniconda image for Python 3.

FROM continuumio/miniconda3:latest

RUN conda install --quiet --yes 'pandas'
RUN  conda clean --all -f -y

CMD ["python", "-c", "import pandas; print(pandas.__version__);"]

Let’s build and run it.

$ docker build --force-rm -t pandas .

$ docker run --rm pandas

# 1.0.1

So far so good. Checking the size of the image and the corresponding layers.

$  docker images --filter=reference='pandas'                    

# REPOSITORY  TAG     IMAGE ID      CREATED        SIZE
# pandas      latest  6a097f0c0eae  3 minutes ago  1.48GB

# 5 Layers
$ docker inspect --format '{{range .RootFS.Layers}}{{printf "%s\n" .}}{{end}}' pandas
 
# sha256:2db44bce66cde56fca25aeeb7d09dc924b748e3adfe58c9cc3eb2bd2f68a1b68
# sha256:3ee7190fd43a351744a978485681a07109d66997c522ad3583860965439e1828
# sha256:5215cb249b792178bbfd0562910c157435697f159508635da513e6a0709869b6
# sha256:4af92f6fd12d224d7df98a054c53168f2d3594d8cd70c8ba0c7e28df31ac7858
# sha256:f22c6f2f28a966a5b2d62448f7dda9364595fa6daf9769486654ee108366ce80

# The layers at the bottom are the layers of Miniconda base image
$ docker history pandas

# IMAGE         CREATED         CREATED BY                                      SIZE                COMMENT
# 6a097f0c0eae  14 minutes ago  /bin/sh -c #(nop)  CMD ["python" "-c" "impor…   0B                  
# e62ed6b5d6d8  16 minutes ago  /bin/sh -c conda clean --all -f -y              0B                  
# 2fe14c3fd8e7  16 minutes ago  /bin/sh -c conda install --quiet --yes 'pand…   1.05GB              
# 406f2b43ea59  4 months ago    /bin/sh -c #(nop)  CMD ["/bin/bash"]            0B                  
# <missing>     4 months ago    /bin/sh -c wget --quiet https://repo.anacond…   151MB               
# <missing>     4 months ago    /bin/sh -c apt-get update --fix-missing &&  …   210MB               
# <missing>     4 months ago    /bin/sh -c #(nop)  ENV PATH=/opt/conda/bin:/…   0B                  
# <missing>     4 months ago    /bin/sh -c #(nop)  ENV LANG=C.UTF-8 LC_ALL=C…   0B                  
# <missing>     5 months ago    /bin/sh -c #(nop)  CMD ["bash"]                 0B                  
# <missing>     5 months ago    /bin/sh -c #(nop) ADD file:1901172d265456090…   69.2MB     

Summary

  • Size: 1.48GB – ouch
  • Nb layers: 5

Squash

Let’s try the new --squash option. To be able to use this option you have to turn on experimental feature on the Docker daemon.

# The build with the squash option
$ docker build --force-rm --squash -t pandas .

$  docker images --filter=reference='pandas'  

# REPOSITORY  TAG     IMAGE ID      CREATED        SIZE
# pandas      latest  82f65007af87  3 minutes ago  1.29GB

# 4 layers
$ docker inspect --format '{{range .RootFS.Layers}}{{printf "%s\n" .}}{{end}}' pandas 

# sha256:2db44bce66cde56fca25aeeb7d09dc924b748e3adfe58c9cc3eb2bd2f68a1b68
# sha256:3ee7190fd43a351744a978485681a07109d66997c522ad3583860965439e1828
# sha256:5215cb249b792178bbfd0562910c157435697f159508635da513e6a0709869b6
# sha256:322504279862275d0a2e00aaec5f41eb7009765baff817e6347bd3992082f17e

$ docker history pandas 

# IMAGE               CREATED             CREATED BY                                      SIZE                COMMENT
# 82f65007af87        7 minutes ago                                                       858MB               merge # sha256:5a32e51cdbb504aa518d92847a98b00f6cd11fb5dcd33a3903daae6197c5283a to sha256:406f2b43ea59a121345b188cc94595c539014c5b644bf95c61458a9b5b2905ba
# <missing>           11 minutes ago      /bin/sh -c #(nop)  CMD ["python" "-c" "impor…   0B                  
# <missing>           11 minutes ago      /bin/sh -c conda clean --all -f -y              0B                  
# <missing>           11 minutes ago      /bin/sh -c conda install --quiet --yes 'pand…   0B                  
# <missing>           4 months ago        /bin/sh -c #(nop)  CMD ["/bin/bash"]            0B                  
# <missing>           4 months ago        /bin/sh -c wget --quiet https://repo.anacond…   151MB               
# <missing>           4 months ago        /bin/sh -c apt-get update --fix-missing &&  …   210MB               
# <missing>           4 months ago        /bin/sh -c #(nop)  ENV PATH=/opt/conda/bin:/…   0B                  
# <missing>           4 months ago        /bin/sh -c #(nop)  ENV LANG=C.UTF-8 LC_ALL=C…   0B                  
# <missing>           5 months ago        /bin/sh -c #(nop)  CMD ["bash"]                 0B                  
# <missing>           5 months ago        /bin/sh -c #(nop) ADD file:1901172d265456090…   69.2MB   

Obviously layers at the bottom coming from the base image are not squashed. In the history we can see an additional step (at the top) with a comment merge sha256xx to sha256:yyy. This is where the squash has occurred.

Summary

  • Size: 1.29GB -> ~ -190MB (-13%) not so bad
  • Nb layers: 4

Explanation

The size reduction is not magical. Squashing layers avoid to store one layer before the clean step and one layer after. In consequence only the cleaned layer is kept and so the saving is directly related to the cleaning. We can check if this hypothesis is valid by performing the clean manually in the image.

$ docker run --rm -it pandas bash

# Checking the size before cleaning
$ du -s -B MB /opt/conda

# 1221MB  /opt/conda

# Cleaning
$ conda clean --all -q -f -y

# Checking the size before cleaning
$ du -s -B MB /opt/conda

# 1026MB  /opt/conda

It’s consistent since we can see that the ~190 MB are saved by the clean step.

Alternatives

There is some alternatives to the --squash options however it’s always a tradeoff with something else.

Crafting Dockerfiles

Best practices on building dockerfiles have led to avoid this inconvenient by using big one-liner commands. The drawback is that dockerfiles loose in readability and some of them end with endless one-liner spanning on multiple lines–it’s a paradox.

FROM continuumio/miniconda3:latest

# Only one layer
RUN conda install --quiet --yes 'pandas' && \
    conda clean --all -f -y

CMD ["python", "-c", "import pandas; print(pandas.__version__);"]

Let’s see the result with this well crafted Dockerfile.

# No size overhead
$ docker images --filter=reference='pandas'          

# REPOSITORY  TAG     IMAGE ID      CREATED        SIZE
# pandas      latest  6db3ad2d6438  5 seconds ago  1.29GB

# 4 layers
$ squash docker inspect --format '{{range .RootFS.Layers}}{{printf "%s\n" .}}{{end}}' pandas 

# sha256:2db44bce66cde56fca25aeeb7d09dc924b748e3adfe58c9cc3eb2bd2f68a1b68
# sha256:3ee7190fd43a351744a978485681a07109d66997c522ad3583860965439e1828
# sha256:5215cb249b792178bbfd0562910c157435697f159508635da513e6a0709869b6
# sha256:6b9cf98f6831e55f95fe7f5876ccc9befe648504921e2032ec45f80ae4995ec9

$ squash docker history pandas 

# IMAGE         CREATED         CREATED BY                                      SIZE   COMMENT
# 6db3ad2d6438  32 seconds ago  /bin/sh -c #(nop)  CMD ["python" "-c" "impor…   0B     
# 2f4c8e61f521  33 seconds ago  /bin/sh -c conda install --quiet --yes 'pand…   858MB  
# 406f2b43ea59  4 months ago    /bin/sh -c #(nop)  CMD ["/bin/bash"]            0B     
# <missing>     4 months ago    /bin/sh -c wget --quiet https://repo.anacond…   151MB  
# <missing>     4 months ago    /bin/sh -c apt-get update --fix-missing &&  …   210MB  
# <missing>     4 months ago    /bin/sh -c #(nop)  ENV PATH=/opt/conda/bin:/…   0B     
# <missing>     4 months ago    /bin/sh -c #(nop)  ENV LANG=C.UTF-8 LC_ALL=C…   0B     
# <missing>     5 months ago    /bin/sh -c #(nop)  CMD ["bash"]                 0B     
# <missing>     5 months ago    /bin/sh -c #(nop) ADD file:1901172d265456090…   69.2MB 

As expected the result is the same with a crafted Dockerfile.

Multi-stage builds

The idea is to use a multi-stage build to copy all the layers into a single layer. I think this is the worst thing to do however I mention it since this solution is discussed see here for example. For this reason I will not develop this solution. If you are interested in finding alternatives, see also this question on SO.

Buildah

Using the Buildah tool to build images also offer a --squash option.

Wrap up

Squashing image at the build has several benefits

  • Lighter images: The squashing will optimize the image size. However if Dockerfiles are written with layer optimization in mind the size reduction will be negligible.
  • Cleaner Dockerfiles: This is obviously the main advantage since you can will write Dockerfile with readability in mind rather than focusing on layer optimization. Computers are better than human on these low level tasks.
  • Avoid accidental leaking: Sometimes it is not desirable to give access to intermediate layers. They may contain information you don’t want to share.

Other potential unconfirmed–for me–benefits

  • Transfer / Storage optimization: The reduction of the number of layers may have a positive impact on registries to host transfers (push / pull).
  • Faster builds: Could lead to faster builds since cache has not to be cleaned at each step?