A Docker workflow to work reproducibly with papaja

<!-- Add logo of the papaja riding the Docker whale -->

This repository provides tools to interactively create dynamic, submission-ready, APA-style manuscripts in R with the R package papaja inside a Docker container.

Why use a Docker workflow?

Working inside a Docker container, rather than in your local environment, safeguards the mid- to long-term computational reproducibility of the manuscript. Docker containers clearly define, and thereby conserve, the software environment used to write a manuscript, and they provide the means to conveniently share and recreate that environment. In doing so, containers prevent code rot (that is, code breaking because of changes to the software environment, such as R package updates). This is particularly important for workflows that involve nontrivial software dependencies, as in the case of rmarkdown and papaja (e.g., R and all R packages, LaTeX and LaTeX packages, pandoc and pandoc filters; see Figure 1.1 of the papaja manual). For a brief primer on containers, see the supplementary material by Klein et al. (2018). For a concise hands-on introduction, see the rOpenSci Docker tutorial; a more comprehensive introduction is available from the Docker project.

What are my options?

The tools provided here are geared towards working interactively with papaja inside a Docker container. There will be two options:

<!-- Install a specific version of pandoc instead? -->

Prerequisites

Install Docker and confirm that it is set up correctly by running the following in the shell:

docker run hello-world

You should see the following output:

Hello from Docker!
This message shows that your installation appears to be working correctly.
...
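
If the test run fails, it helps to know whether the Docker client is missing or the Docker daemon is simply not running. The following is a minimal sketch of such a check; the function names have_cmd and check_docker are made up for illustration and are not part of this repository.

```shell
# Return success if the given command is available on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Distinguish "client not installed" (status 1) from
# "daemon not reachable" (status 2).
check_docker() {
    if ! have_cmd docker; then
        echo "docker not found on PATH" >&2
        return 1
    fi
    # `docker info` contacts the daemon and fails if it is not running.
    if ! docker info >/dev/null 2>&1; then
        echo "docker is installed, but the daemon is not reachable" >&2
        return 2
    fi
    echo "docker is ready"
}
```

For example, `check_docker && docker run hello-world` only attempts the test run once the daemon answers.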

Although Git is only strictly necessary for Windows users who intend to use the RStudio option, I recommend installing it. Windows users should add Git Bash to the Windows context menu (this should be the default). Without Git installed, replace git web--browse in the last line of run_docker.sh with open (macOS or Linux; on Linux systems without open, xdg-open is the usual equivalent).
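
As a rough sketch of this fallback, a script could pick the first available command for opening a URL. The helper name pick_opener and the URL in the usage note are assumptions for illustration; run_docker.sh itself may handle this differently.

```shell
# Print the first available command for opening a URL in a browser.
# Returns a non-zero status if none of the candidates is installed.
pick_opener() {
    if command -v git >/dev/null 2>&1; then
        echo "git web--browse"
    elif command -v xdg-open >/dev/null 2>&1; then
        echo "xdg-open"      # common on Linux desktops
    elif command -v open >/dev/null 2>&1; then
        echo "open"          # macOS
    else
        return 1
    fi
}
```

Usage could look like `opener=$(pick_opener) && $opener "http://localhost:8787"`, assuming RStudio is served on port 8787 (the RStudio Server default; the port actually mapped by run_docker.sh may differ).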

The RStudio container

The container comes, among other things, with the latest version of Debian and user-specified versions of R, RStudio (including pandoc), TeX Live, and papaja (by default the latest versions are used).

How to use

  1. Place run_docker.sh and Dockerfile in your project directory
  2. Edit run_docker.sh and set the variables at the top of the script (see below)
  3. Execute run_docker.sh in the shell:

sh run_docker.sh

Note: Windows users may need to explicitly grant Docker access to the project directory (see Settings > Resources > File sharing).

  4. Work interactively with RStudio in the browser
  5. Stop the container:

docker stop <PROJECT_NAME>

    • Results not saved to the project directory will be lost
    • Interactively installed packages (e.g., via install.packages()) will be lost; add them to DESCRIPTION so that they are installed when the project image is built

What run_docker.sh does

The script performs the following series of actions:

  1. Build the papaja base image named <PAPAJA_BASE_NAME>
    • This image will be reused across projects as appropriate to save disk space and get started more quickly
  2. Build a project-specific image named <PROJECT_NAME>
    • Installs R package dependencies (as specified in DESCRIPTION) from an MRAN snapshot (see below)
  3. Create the following files unless they are already present
    • DESCRIPTION
    • CITATION
    • CITATION.cff
  4. Run container
    • Share current shell working directory with container (this should be the project directory)
  5. Open a browser window with RStudio
  6. When the container is stopped, it is automatically removed
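
The sequence above can be sketched as a dry run that only prints the commands it would execute. This is not the content of run_docker.sh: the image names, the build stage names base and project, the port 8787, and the mount destination are all assumptions for illustration, and settings such as RStudio authentication are omitted.

```shell
#!/bin/sh
# Dry-run sketch of the workflow; all values below are hypothetical.
PAPAJA_BASE_NAME="papaja_base"
PROJECT_NAME="my_project"
DRY_RUN="${DRY_RUN:-1}"    # 1 = print commands instead of running them

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

workflow() {
    # 1. Base image, reused across projects (stage name is an assumption)
    run docker build -t "$PAPAJA_BASE_NAME" --target base .
    # 2. Project-specific image (stage name is an assumption)
    run docker build -t "$PROJECT_NAME" --target project .
    # 3. Create metadata files only if they are missing (empty placeholders here)
    for f in DESCRIPTION CITATION CITATION.cff; do
        [ -e "$f" ] || run touch "$f"
    done
    # 4. Run the container and share the working directory;
    #    --rm removes the container once it is stopped (step 6)
    run docker run -d --rm --name "$PROJECT_NAME" -p 8787:8787 \
        --mount "type=bind,src=$PWD,dst=/home/rstudio/$PROJECT_NAME" \
        "$PROJECT_NAME"
    # 5. Open RStudio in the browser
    run git web--browse "http://localhost:8787"
}
```

Calling `workflow` prints one line per step, which makes it easy to see how the `--rm` flag accounts for the automatic removal in step 6.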

run_docker.sh options

The top section of the script sets several project-specific variables that define the software environment.

| Variable | Description |
|---|---|
| PAPAJA_BASE_NAME | Base image name (must be lowercase) |
| PROJECT_NAME | Project image name (must be lowercase) |
| R_RELEASE | R version to use. |
| RSTUDIO_VERSION | RStudio version to use. For available versions see here. |
| TEXLIVE_VERSION | Year of the TeX Live distribution to use (2000 or later) |
| PAPAJA_VERSION | papaja version to use (Git commit, branch, or tag; see ?remotes::install_github). |
| NCPUS | Number of cores used to install R packages. |
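
Two of these constraints are easy to check before building: Docker rejects image names that contain uppercase letters, and the TeX Live year must be 2000 or later. A minimal sketch of such checks follows; the helper names are hypothetical and not part of run_docker.sh.

```shell
# Succeed only for non-empty names without uppercase letters.
is_lowercase() {
    case "$1" in
        ""|*[[:upper:]]*) return 1 ;;
        *) return 0 ;;
    esac
}

# Succeed only for an all-digit year of 2000 or later.
is_valid_texlive_year() {
    case "$1" in
        ""|*[!0-9]*) return 1 ;;
    esac
    [ "$1" -ge 2000 ]
}

# Report every setting that violates a constraint.
check_settings() {
    status=0
    for name in "$PAPAJA_BASE_NAME" "$PROJECT_NAME"; do
        is_lowercase "$name" || { echo "image name must be lowercase: $name" >&2; status=1; }
    done
    is_valid_texlive_year "$TEXLIVE_VERSION" || { echo "invalid TeX Live year: $TEXLIVE_VERSION" >&2; status=1; }
    return "$status"
}
```

With the variables set as in the top section of the script, `check_settings || exit 1` would stop before any image is built.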

Specify an MRAN snapshot

By default, R packages are installed from the MRAN snapshot corresponding to the last day that the specified R version was the most recent release. A different MRAN snapshot can be specified in the calls to docker build:

docker build \
    --build-arg BUILD_DATE=<DATE> \
    ...
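
MRAN snapshots are addressed by date in the form YYYY-MM-DD. Assuming the Dockerfile expects BUILD_DATE in that same form (an assumption, check the Dockerfile), a small guard can catch malformed dates before a long build starts. The helper names below are hypothetical.

```shell
# Succeed if the argument looks like YYYY-MM-DD (format check only,
# it does not verify that the date exists in the calendar).
is_snapshot_date() {
    case "$1" in
        [0-9][0-9][0-9][0-9]-[0-1][0-9]-[0-3][0-9]) return 0 ;;
        *) return 1 ;;
    esac
}

# Print the --build-arg option for a valid date, or fail with a message.
build_date_arg() {
    if is_snapshot_date "$1"; then
        printf '%s\n' "--build-arg BUILD_DATE=$1"
    else
        echo "expected a date like 2020-06-21, got: $1" >&2
        return 1
    fi
}
```

The printed option can then be added to the docker build calls in run_docker.sh.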

Install a specific version of an R package

To install a specific version of an R package, adapt the following as necessary and append it to the Dockerfile:

RUN Rscript -e "remotes::install_version('rlang', '0.4.7', repos = 'http://cran.us.r-project.org', upgrade = FALSE, Ncpus = $NCPUS)"

Use Git inside the container

The containers provided here come with Git (and SSH) installed. To seamlessly use Git inside the container, grant the container access to .gitconfig by adding the following to the docker run call:

docker run -d \
    --mount type=bind,src=<PATH TO .gitconfig>,dst=/home/rstudio/.gitconfig,readonly \
    ...

For example, share the current user's .gitconfig with:

docker run -d \
    --mount type=bind,src="/$HOME/.gitconfig",dst=/home/rstudio/.gitconfig,readonly \
    ...

To use Git with SSH, grant the container access to the SSH credentials by adding the following to the docker run call:

docker run -d \
    --mount type=bind,src=<PATH TO credentials>,dst=/home/rstudio/.ssh,readonly \
    ...

For example:

docker run -d \
    --mount type=bind,src="/$HOME/.ssh",dst=/home/rstudio/.ssh,readonly \
    ...