My Precious Little Container (emphasis on ‘little’)

Sat, Jan 21, 2017 7-minute read
The Knot Worldwide Tech Team

UPDATE: The code for the app described herein, including all of the Docker bits, is now available on GitHub.

We have a fully functioning container running in our infrastructure which is 2.0 MB. That is not a typo.

Do I have your attention?

Granted, the app it contains is pretty small in its scope to begin with; it doesn’t do much, but it does do something, something which we were in need of. (Basically a go-between from Amazon SNS to RabbitMQ, if you’re curious.)

I owe quite a lot of the ideas for this container and this blog post to a similar post by Tim Dysinger of FP Complete.

Here’s how we got it so small:

1. Use Haskell.

… OK, you don’t have to use Haskell. Don’t knock it ’til you’ve tried it, though. I’ll restrain myself from rhapsodizing about the joy of coding in a purely functional language with inferred static types; there are other places to read about that. I’ll just say the app worked the first time I ran it. Regardless, there are a couple of characteristics of Haskell that come in handy for this purpose, which we’ll discuss in due course.

2. Use Alpine Linux.

That may seem obvious to those of you familiar with Alpine, but the reason is not exactly what you might expect. Alpine containers tend to be smaller than, say, Debian or CentOS containers, yes, but as we’ll see, this isn’t going to matter that much in the end. What Alpine does give us is generally smaller binaries and fewer and smaller dynamic library dependencies.

This is the Dockerfile for the initial Haskell/GHC build box, which includes several of the tools we’ll need.

```
FROM alpine

# Install GHC and immediate dependencies -- these are HUGE, so we do
# this in the first layers in hopes that we don't have to redo it
# very often.
ENV ALPINE_GHC_VERSION 8.0
RUN echo \
  "https://s3-us-west-2.amazonaws.com/alpine-ghc/$ALPINE_GHC_VERSION" \
  >> /etc/apk/repositories
ADD \
  https://raw.githubusercontent.com/mitchty/alpine-ghc/master/mitch.tishmack%40gmail.com-55881c97.rsa.pub \
  /etc/apk/keys/mitch.tishmack@gmail.com-55881c97.rsa.pub
RUN apk update && apk add ghc

# ^-- if you can help it, DO NOT CHANGE ABOVE THIS LINE --^

# Install basic dev tools, gmp, zlib, and su-exec
RUN apk update && apk add \
  alpine-sdk \
  git \
  ca-certificates \
  gmp-dev \
  zlib-dev \
  su-exec

# GRAB A RECENT BINARY OF STACK
ENV STACK_VERSION 1.1.2
RUN curl -Lo /usr/local/bin/stack \
  https://s3.amazonaws.com/static-stack/stack-$STACK_VERSION-x86_64 \
  && chmod 755 /usr/local/bin/stack

ENV INIT /sbin/dumb-init
ENTRYPOINT ["/sbin/dumb-init"]
ENV DUMB_INIT_VERSION 1.2.0
RUN curl -Lo $INIT \
  https://github.com/Yelp/dumb-init/releases/download/v$DUMB_INIT_VERSION/dumb-init_${DUMB_INIT_VERSION}_amd64 \
  && chmod 755 $INIT

ENV UPX_VERSION=3.91
RUN curl -Lo /usr/local/bin/upx \
  https://github.com/lalyos/docker-upx/releases/download/v$UPX_VERSION/upx \
  && chmod 755 /usr/local/bin/upx

RUN adduser -h /h -s /bin/false -S user
USER user
RUN mkdir -p /h/app
WORKDIR /h/app
```

Note that:

  1. Again, this owes a great debt to the first Dockerfile on the FP Complete post.
  2. This container isn’t small. At all. In fact, it’s 3 orders of magnitude bigger than the container with which we’ll eventually end up.
  3. ghc and stack are Haskell-specific, but dumb-init and upx, which we’ll discuss, are not. One could imagine breaking the latter two out into a sort of template Dockerfile that could build on top of multiple base build images. (Or you could put them into a base image and build other build images on top of them. But compilers and such tend to be huge, and I’m impatient when it comes to Docker builds, so I prefer to put huge things in the very lowest layer of my images.)

Every Docker container, independent of size, should include Yelp’s [dumb-init](https://engineeringblog.yelp.com/2016/01/dumb-init-an-init-for-docker.html) (or a similar tool, but dumb-init is the smallest such of which I am aware). If you don’t know why, read their blog post, or Phusion’s, or FP Complete’s.

3. Build as small of a static binary as possible.

This is where Haskell shines for our purposes. Of course any compiled language (e.g. Go, C) is capable of creating a static binary; by the same token, many popular languages which are interpreted do not have this ability. That’s not necessarily a deal-breaker, but it does make it harder to track down and minimize your dependencies.

GHC also has a neat feature called -split-objs. Haskell packages are often composed of multiple modules; -split-objs splits each module’s object code into many small pieces (roughly one per top-level definition), so the linker pulls in only the pieces you actually use. This helps a lot in this particular case, because this app uses networking and SSL, which end up technically bringing in a boatload of dependencies. By the time -split-objs is done with it, the executable is considerably smaller than it otherwise would be.

(When you use -split-objs, you get a warning telling you the feature is “experimental”; however, the feature has been mentioned in documentation for over 10 years, so it seems to be a pretty stable experiment. I also just discovered a similar feature called -split-sections, which may be more efficient.)
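The post does not show the actual build command, so what follows is only a sketch of what such a build might look like. It assumes stack's `--split-objs` build switch and GHC's `-optl-static` option (which hands `-static` to the linker); `run` and `build_small_static` are hypothetical helper names, and `DRY_RUN` exists only so the command can be inspected before it is executed.

```shell
#!/bin/sh
# Hypothetical helper: echo a command, and execute it unless DRY_RUN is set.
run() {
  echo "+ $*"
  if [ -z "${DRY_RUN:-}" ]; then
    "$@"
  fi
}

# Assumed flags (not taken from the original build):
#   --split-objs   stack's switch for building with GHC's -split-objs
#   -optl-static   asks GHC to pass -static to the linker
build_small_static() {
  run stack build --split-objs --ghc-options '-optl-static'
}
```

Calling `DRY_RUN=1 build_small_static` prints the command without running it; dropping `DRY_RUN` runs the build for real inside the build container.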

4. No, I mean smaller than that even.

Enter UPX, the Ultimate Packer for eXecutables. This is totally a black box to me, I must confess, but apparently it’s a black box full of win. Running `upx --best --ultra-brute` on our executable reduces its size by 79% (!!!)
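To check a figure like that 79% on your own binary, the arithmetic is simple enough to script. This is a minimal sketch; `percent_saved` and `size_of` are hypothetical helper names, and the sizes in the example call are made up for illustration.

```shell
#!/bin/sh
# Percentage by which a file shrank, given byte counts before and after.
# Integer arithmetic, so the result is truncated.
percent_saved() {
  before=$1
  after=$2
  echo $(( (before - after) * 100 / before ))
}

# Size of a file in bytes (wc reads stdin, so no filename is printed).
size_of() {
  wc -c < "$1" | tr -d ' '
}

# Example with made-up sizes: a 1000-byte file packed down to 210 bytes.
percent_saved 1000 210   # prints 79
```

In practice you would record `size_of` on the executable before and after running upx on it, then feed both numbers to `percent_saved`.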

Once again, I owe these last couple of ideas to the aforementioned FP Complete post. There are other ideas there as well; they just didn’t happen to be very effective in our particular case.

5. Throw most of the image away.

I have a confession to make: an awful lot of this post is overkill. (I wanted to see just how small I could go.) Not this part, though. This is where we bring in the proverbial BFG: [strip-docker-image](http://blog.xebia.com/how-to-create-the-smallest-possible-docker-container-of-any-image/).

“strip” here is a laughable understatement. Picture if you asked me to “strip” a stolen car and when you came back all that was left was a spark plug.

Essentially, this script takes a Docker image and a list of files (and packages, but currently only Debian and RPM packages are supported, not Alpine packages) and recursively tracks down the dependencies of those files. Then it creates a new Docker image literally FROM scratch, with only those files.
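The general idea of tracking down a binary's dependencies can be sketched with ldd. This is not the script's actual implementation, just an illustration of the technique; `lib_paths` is a hypothetical helper, and the sample input only imitates the shape of ldd output on a musl-based system.

```shell
#!/bin/sh
# Read ldd-style output on stdin and print each resolved library path once.
# Lines without an absolute path (e.g. linux-vdso) are skipped.
lib_paths() {
  awk '{
    for (i = 1; i <= NF; i++) {
      if ($i ~ /^\//) { print $i; break }
    }
  }' | LC_ALL=C sort -u
}

# Illustrative input: both lines resolve to the same file, so one path is printed.
lib_paths <<'EOF'
    /lib/ld-musl-x86_64.so.1 (0x7f0000000000)
    libc.musl-x86_64.so.1 => /lib/ld-musl-x86_64.so.1 (0x7f0000000000)
EOF
```

Against a real file you would pipe `ldd` on that file into `lib_paths`, then repeat for each library it reports until no new paths turn up.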

It does take some trial and error to figure out what files you really need, but not as much as you might think. Here’s our Makefile rule:

```
docker-strip: docker-compile
	docker/strip-docker-image/strip-docker-image \
	  -i $(NAME)-compiled:latest -t $(NAME)-stripped:latest -v \
	  -f /bin/http-to-rabbitmq \
	  -f /sbin/dumb-init \
	  -f /sbin/su-exec \
	  -f /etc/passwd \
	  -f /etc/group \
	  -f /etc/protocols \
	  -f /etc/ssl/certs/b204d74a.0 # <- cert SNS uses
```

As the comment on the last line implies, we need one specific SSL cert, because the app needs to make exactly one HTTPS request to SNS to confirm the SNS subscription. /etc/passwd and /etc/group are needed if you want to run your app as a non-root user (which you do, of course). And finally, did you know that for a Linux machine to access a TCP/IP network at all, it requires /etc/protocols? Neither did I, but I figured it out.
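One plausible explanation for the /etc/protocols requirement (an assumption, not something verified against this app) is that a networking library resolves "tcp" to its protocol number by name, and that lookup reads /etc/protocols. The sketch below imitates such a lookup; `proto_number` is a hypothetical helper, not part of any tool mentioned here.

```shell
#!/bin/sh
# Look up a protocol number by name in a protocols(5)-format file.
# Prints nothing if the name is absent or the file cannot be read,
# which is roughly what a by-name lookup faces in a container without it.
proto_number() {
  name=$1
  file=${2:-/etc/protocols}
  awk -v want="$name" '
    /^[[:space:]]*#/ { next }        # skip comment lines
    $1 == want { print $2; exit }    # column 1 is the name, column 2 the number
  ' "$file"
}
```

On a system that has the file, `proto_number tcp` would normally print 6.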

And that’s all. su-exec brings in one binary dependency, so eight total files. That is the entire contents of the container.

You could make something like this work with an interpreted language (strip-docker-image takes directories as well as files, so you could tell it to include, say, your entire node_modules directory) but it might be a bit more challenging.

(Note: there is an issue I found with strip-docker-image cleaning up after itself. I solved it locally by altering the script to set its work directory world-writable; that may or may not be a good solution for you depending on the security of your build environment.)

For completeness, here is the final Dockerfile after all this:

```
FROM xolocalvendors/http-to-rabbitmq-stripped:latest
MAINTAINER XO Group Team
EXPOSE 3000
USER root
ENTRYPOINT ["/sbin/dumb-init"]
CMD ["/sbin/su-exec", "user", "/bin/http-to-rabbitmq"]
```

Not only is this container tiny, it’s also secure: say someone gains access to the container. And then they… do what exactly? There’s not even a shell in there. Pwned.
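The "no shell" claim is easy to check from a file listing of the image. This is a small sketch under stated assumptions: `has_shell` is a hypothetical helper, and the list of shell names it looks for is illustrative rather than exhaustive.

```shell
#!/bin/sh
# Succeeds if the file listing on stdin (one path per line, as printed by
# tar -t) contains a common shell or busybox binary.
has_shell() {
  grep -Eq '^(\./)?(bin|sbin|usr/bin)/(sh|bash|ash|dash|busybox)$'
}

# Illustrative listing modelled on the files named in this post.
if printf 'bin/http-to-rabbitmq\nsbin/dumb-init\nsbin/su-exec\n' | has_shell; then
  echo "shell found"
else
  echo "no shell"
fi
```

For a real image, one way to produce the listing is to create a container from it with `docker create`, pipe `docker export` of that container through `tar -t`, and feed the result to `has_shell`.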

About the Author

Steven Collins is a former mathematician and a current DevOps Engineer for the Local Squads at XO Group. He is eagerly awaiting the inevitable Haskell revolution and plays a lot of Rock Band while he waits.

Originally published at blog.eng.xogrp.com.