Containers for DAaaS

Containers to be used for general-purpose Data Science.

Docker Basics and Good Practices

The information below covers some recommended good practices with Docker and provides some links for further reading.

Useful References and Background Reading:

Start from a minimal official docker image

Recommendation: start from a small official base image and build layers on top of it.
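
As a minimal sketch of this recommendation, the Dockerfile below starts from a slim official image and adds a single tool on top. The python:3.11-slim tag and the curl package are illustrative assumptions, not choices made by this project.

```dockerfile
# Assumption: python:3.11-slim stands in for whichever small official image suits your project.
FROM python:3.11-slim

# Add only what the image needs on top of the base, and clean up in the same layer.
RUN apt-get update && \
    apt-get install -y --no-install-recommends curl && \
    rm -rf /var/lib/apt/lists/*
```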

Example steps to add a new image to aaw-contrib-containers

  1. Scan the image with a tool such as Trivy (*best practice*): $ ./trivy image hub.docker.com/yourdockerimage:latest
  2. Create a new branch of StatCan/aaw-contrib-containers
  3. Commit the Dockerfile and publish the new branch
  4. Wait for the repo CI, ACR, and Artifactory scans to complete
  5. Create a pull request (ping us in the #general channel of our Slack space, https://statcan-aaw.slack.com)

Scan your containers for vulnerabilities

Recommendation: scan images for vulnerabilities, and consider incorporating this process as a job in your CI/CD pipeline.

Order stanzas from least likely to change to most likely to change

Recommendation: to take advantage of layer caching and reduce image build times, put expensive stanzas that rarely change early in your Dockerfile and lighter stanzas that change often at the end. Docker rebuilds a layer only when that stanza or one before it has changed, so this ordering avoids rebuilding expensive, unchanged layers every time you make a small change (e.g. installing a new Python package).
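
The ordering described above can be sketched as follows. This is an illustration under stated assumptions: the base image, the git package, and the file names requirements.txt and main.py are hypothetical placeholders.

```dockerfile
# Assumption: base image, package list, and file names are illustrative only.
FROM python:3.11-slim

# 1. System packages: expensive and rarely changed, so they come first.
RUN apt-get update && \
    apt-get install -y --no-install-recommends git && \
    rm -rf /var/lib/apt/lists/*

WORKDIR /app

# 2. Python dependencies: this layer is rebuilt only when requirements.txt changes.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# 3. Application code: changes most often, so it goes last.
COPY . .

CMD ["python", "main.py"]
```

With this layout, editing application code invalidates only the final COPY layer, while the system and Python package layers are served from the cache.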

Handling Python packages in Dockerfiles

Do setup, execute, and cleanup in a single stanza

Good Example:

RUN apt-get -y update && \
    apt-get install -y git && \
    rm -rf /var/lib/apt/lists/*

Bad Example:

RUN apt-get -y update
RUN apt-get install -y git
RUN rm -rf /var/lib/apt/lists/*

Recommendation: Keep your setup, execute, and cleanup work in a single stanza so that an image layer of minimal size gets created instead of multiple larger layers.

Execute containers as non-root user

Recommendation: always end your Dockerfile with a USER directive that switches to a non-root user, so that containers run as that user by default.
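
A minimal sketch of this pattern on a Debian-based image is shown below. The user name appuser and UID 1000 are hypothetical; use whatever your platform expects.

```dockerfile
# Assumption: Debian-based image, which provides useradd.
FROM python:3.11-slim

# Root is still needed for system-level setup.
RUN apt-get update && \
    apt-get install -y --no-install-recommends git && \
    rm -rf /var/lib/apt/lists/*

# Assumption: "appuser" and UID 1000 are hypothetical values.
RUN useradd --create-home --uid 1000 appuser

WORKDIR /home/appuser

# Everything from here on, including the running container, uses the non-root user.
USER appuser

CMD ["python", "--version"]
```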

Do a multi-stage build

There are a couple of key reasons for building an image this way.

Recommendation: consider using a multi-stage build to reduce disk footprint, and, if applicable, consider copying your application or build artifacts to a distroless image to improve security.
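
One possible shape for a multi-stage build is sketched below. It covers only the footprint part of the recommendation: compilers live in the first stage, and the final image receives just the installed packages. It does not target a distroless image. The file names requirements.txt and main.py and the path /opt/venv are assumptions for illustration.

```dockerfile
# Stage 1: build stage, with compilers available for packages that need them.
FROM python:3.11-slim AS builder

RUN apt-get update && \
    apt-get install -y --no-install-recommends build-essential && \
    rm -rf /var/lib/apt/lists/*

# Install dependencies into a virtual environment that can be copied as one directory.
RUN python -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"

# Assumption: requirements.txt is a hypothetical dependency list in the build context.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Stage 2: runtime stage, same base image but without the build tools.
FROM python:3.11-slim

COPY --from=builder /opt/venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"

WORKDIR /app
COPY . .

# Assumption: main.py is a hypothetical entry point.
CMD ["python", "main.py"]
```

Both stages use the same base image so that the interpreter the virtual environment points to also exists in the final stage.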

Avoid cache-busting your Dockerfiles

Set build-time variables

Recommendation: where applicable, declare build-time arguments with ARG and override their defaults at build time with --build-arg.
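
As a brief sketch, the Dockerfile below declares two build-time arguments with defaults. The names PYTHON_VERSION and APP_ENV, their values, and the image tag in the comment are hypothetical.

```dockerfile
# An ARG declared before FROM can be used in the FROM line itself.
ARG PYTHON_VERSION=3.11
FROM python:${PYTHON_VERSION}-slim

# An ARG declared after FROM is scoped to this build stage.
ARG APP_ENV=production

# ARG values are not available in the running container; copy into ENV if needed at runtime.
ENV APP_ENV=${APP_ENV}

# Override the defaults at build time, for example:
#   docker build --build-arg PYTHON_VERSION=3.12 --build-arg APP_ENV=staging -t myimage .
```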

Lint your Dockerfile

Recommendation: lint your Dockerfile to improve code quality.
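
The text above does not name a linter. As one illustration, assuming hadolint is the tool in use, a specific rule can be silenced for a single instruction with an inline comment:

```dockerfile
FROM python:3.11-slim

# Assumption: hadolint is the linter. DL3008 is its rule about pinning apt package versions.
# hadolint ignore=DL3008
RUN apt-get update && \
    apt-get install -y --no-install-recommends git && \
    rm -rf /var/lib/apt/lists/*
```

Suppressing a rule is best kept to cases where the warning has been reviewed and judged not to apply.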

Understand when not to use Alpine-based images

Recommendation: if your project has many Python dependencies, consider using a Docker image based on Debian or another Linux distribution other than Alpine. Alpine uses musl rather than glibc, so pip often cannot use the prebuilt manylinux wheels and may have to compile packages from source, which can lengthen builds and enlarge images. This is not an absolute recommendation, as other considerations (e.g. security) can affect the choice of base image, but you should be aware of this implication before using Alpine for projects with significant Python dependencies.

Background reading for those interested: