Docker has been around for a while now and is currently extremely popular. In this post, I’ll describe my subjective experience with Docker. Disclaimer: I’m not a Docker guru.

First, the Sales Pitch

For the uninitiated, here’s a good intro explaining how Docker works and what its merits are:

To summarize, Docker:

  1. Offers a portable, lightweight method to ship your application
  2. Keeps production and development environments virtually identical (pun intended)
  3. … and does much more, but let’s focus on the above two things for now.

Before we move on, here is some of the relevant vernacular:

            build          run
Dockerfile -------> Image -----> Container 1 \
                                 Container 2 | compose
                                 Container 3 +---------> Deployment
                                 ...         |
                                 Container N /

So you start with a Dockerfile, and build that into an Image. When you run an Image, Docker creates a Container. If you like, you can compose multiple Containers into a Deployment.
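
To make the vernacular concrete, here’s roughly what that pipeline looks like on the command line. This is just a sketch; the image and container names (myapp, myapp-1) are made up for illustration:

    # Build an Image from the Dockerfile in the current directory
    docker build -t myapp:latest .

    # Run the Image; Docker creates a Container from it
    docker run -d --name myapp-1 myapp:latest

    # Tie multiple Containers together into a Deployment
    # (reads docker-compose.yml from the current directory)
    docker-compose up -d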

Docker achieves the “portable and lightweight” part using containerization. When containers run on a host, they share much of the host’s stack, so each container needs to include only the bare minimum required to run its application. At the same time, from the application’s point of view, the world looks like an ordinary machine. The portable yet heavyweight alternative is clearly less attractive: shipping an entire virtual machine image, including the OS and everything inside it.

Docker achieves the “identical environments” part by allowing images to be run anywhere. Because they’re lightweight and portable, sharing images is easy, and they’re fast to start. The benefit of having identical development and production environments is obvious: you no longer trip over issues like “why does it work on production, but not on development!?” Being confident that your latest changes won’t crash on production is a huge win.

… and now, welcome to my world

Warning: read the below as more of an admission of guilt than a statement of my stance on best practices :)

Most of the libraries and applications I write tend to be relatively small, in the range of 10-20 thousand lines of Python code. They tend to be used internally rather than publicly. The teams working on these applications are also small: usually just one or two people. For applications, we use Django as the Web framework, and serve it through Apache via mod_wsgi. We store most of our data in MongoDB, and a tiny part of it goes into SQLite3 because that’s the easiest to set up with Django. Architecturally, most of the applications are monoliths. We make an exception for components that really need to scale well: these are service-oriented. The apps tend to get deployed on a single AWS EC2 instance, with the exception of the service-like components, which scale into the hundreds or thousands of instances when needed.

The apps get unit-tested fairly heavily as part of our continuous integration suite, which includes Buildbot and CircleCI. We also practice the “ghetto” version of continuous delivery: each build, regardless of whether it passed CI, gets pushed into a development deployment. The development deployment is a toy replica of the production deployment (the two are completely independent). We have scheduled tests running on RunScope that exercise critical functionality on both the production and development deployments. Whenever someone breaks the build, they also break the development deployment, triggering alarms from the CI suite as well as RunScope, and they scramble to fix things (or roll them back). This happens relatively rarely. Every week or so we do a “release”: deploy the newest version of the app to the production deployment.

Currently, we keep all of our configuration as machine images (AMIs). These get built from bash scripts — you can call them a poor man’s version of a Dockerfile. They are the “portable but heavyweight” alternative I described above. The heavyweight part doesn’t really bother us because we never ship these things: we just spin up another EC2 instance whenever necessary. Whenever we need to update our apps, developers run python setup.py sdist to package them into a tarball and do a glorified pip install --upgrade on the deployment side (a running EC2 instance). In the unlikely event that things don’t work out, we can just downgrade back to the previous version. I can’t remember the last time that happened.
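
Concretely, the update flow is roughly the following; the package name and version numbers are hypothetical:

    # On the developer's machine: package the app into a source tarball
    python setup.py sdist            # produces dist/myapp-1.2.3.tar.gz

    # On the EC2 instance: the glorified upgrade
    pip install --upgrade myapp-1.2.3.tar.gz

    # If things don't work out: reinstall the previous version's tarball
    pip install myapp-1.2.2.tar.gz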

For our small team, the above set-up has worked fairly well. DevOps issues occupy a fairly small fraction of our time — we spend most of the time working on the actual code: fixing bugs and adding new features.

Last but not least, I do my own DevOps. This is on top of architecture, design, implementation, testing and other fun things. I try to have a life on top of this as well, so I have to pick my battles.

Can We Use Docker Here?

The answer, of course, is “yes we can”. But should we?

The two benefits of Docker described above, lightweight portability and identical dev/production environments, are awesome: there’s no doubt about that. They don’t come for free, though: we’d have to do a fair bit of work to get there.

For example, one of the distinguishing features of a container, as opposed to a VM, is that it’s really supposed to run only one process. This means that you need to create a new image for each process running in your application (see the Dockerfile sketch after the list below). For us, that means:

  1. MongoDB (just the database engine, not the data)
  2. Apache, mod_wsgi and Django
  3. Celery (our apps do a lot of background processing)
  4. … and anything else that requires its own process
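
As an idea of the work involved, here’s a rough sketch of what image 2 might look like as a Dockerfile. This is hypothetical, not something we run; the base image, paths and file names are made up:

    FROM ubuntu:16.04

    # Apache, mod_wsgi, and pip to install the Django app
    RUN apt-get update && apt-get install -y \
        apache2 libapache2-mod-wsgi python-pip

    # Install the app and its dependencies
    COPY . /opt/myapp
    RUN pip install /opt/myapp

    # Wire the app into Apache
    COPY myapp-vhost.conf /etc/apache2/sites-enabled/000-default.conf

    EXPOSE 80

    # One process per container: run Apache in the foreground
    CMD ["apache2ctl", "-D", "FOREGROUND"]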

On top of that, we’d need to think about where to store the data, because it shouldn’t go inside the image/container, for several reasons. First, if you have hundreds of GB of data, keeping it inside the image defeats the purpose of having lightweight containers in the first place. Second, if your container crashes, you risk leaving your database in an unhappy state. There may be other reasons, but those two are the main ones I can think of. Docker does offer a solution to this problem (Volumes), so the issue is more of a hurdle than a showstopper.
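
For example, a named Volume can hold MongoDB’s data outside the container, so the database files survive container crashes and restarts. A sketch, with made-up names and an arbitrarily chosen image tag:

    # Create a named volume for the database files
    docker volume create mongo-data

    # Mount it at MongoDB's data directory inside the container
    docker run -d --name mongo -v mongo-data:/data/db mongo:3.2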

So, once we’ve worked out our images and our volumes, we need to tie them together into a single deployment using docker-compose. Once we’ve achieved that, we can enjoy the benefits of containerization. In theory.
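
A minimal docker-compose.yml for the pieces above might look something like this; the service and image names are again hypothetical:

    version: '2'
    services:
      web:
        image: myapp-web         # Apache + mod_wsgi + Django
        ports:
          - "80:80"
        depends_on:
          - mongo
      worker:
        image: myapp-celery      # Celery background workers
        depends_on:
          - mongo
      mongo:
        image: mongo:3.2
        volumes:
          - mongo-data:/data/db  # database files live in a Volume
    volumes:
      mongo-data: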

Going back to the original question: should we use Docker here? For the time being, I think the answer is no. While the benefits are great, we’re not really experiencing the problems that Docker can solve right now. On top of that, the amount of effort required to reach those benefits is significant.

Docker-free workplace