A client asked me to automate the production of a monthly sales report for a web shop. The client wanted a printable pdf that extracts the sales information for a given month and presents it with figures and tables. After specifying all the details, I came to the conclusion that it would make sense to implement the report in R, which makes it easy to create good-looking reports containing graphs and tables using the R Markdown and ggplot2 packages.

Generating pdfs with R Markdown requires R itself plus a number of other dependencies, such as LaTeX for typesetting. Installing and maintaining these on a web server is tedious, especially if you need to migrate to another environment at some point. An easy way to avoid these difficulties is to use a Docker container that contains the needed software and is easy to maintain and move if need be.

Using R Markdown in a Docker container

As mentioned above, using a Docker container saves me from having to install R, LaTeX and other dependencies on the server itself. Also, it is easy to deploy the same container to another web server if the need arises.

I'm going to use the rocker/verse image from the Rocker project's R images for creating the report. This image has R Markdown and LaTeX pre-installed for compiling pdf reports. If you are new to Docker, the official Docker documentation explains how to install it on your system.

Assuming you have Docker installed, let's pull the rocker/verse image with R version 3.5.1:

docker pull rocker/verse:3.5.1

First I'm going to test the image to see which R packages, if any, I need to add to it. The image also has RStudio pre-installed and is configured to run the RStudio server by default. So for testing, running the following command will start RStudio at localhost:8787:

docker run --rm -p 8787:8787 rocker/verse:3.5.1

Once you type localhost:8787 into your browser, you will be asked for a username and password, which are both rstudio by default. To my pleasant surprise, all the packages that I needed were already installed, so I could go straight ahead and run the pdf compilation.
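
If you would rather not open RStudio just to look for packages, the same check can be done from the command line. The following is a minimal sketch and not part of the example repository: `check_r_packages` is a hypothetical helper name, and the package names in the usage line are only examples. It asks R inside the container which of the given packages are missing.

```shell
# Hypothetical helper: list the R packages that are NOT installed in an image.
# Usage: check_r_packages <image> <package> [<package> ...]
check_r_packages() {
  image="$1"; shift
  # Build the inside of an R character vector, e.g. "ggplot2","rmarkdown"
  pkgs=""
  for p in "$@"; do
    pkgs="${pkgs:+$pkgs,}\"$p\""
  done
  # Replace the image's default command (the RStudio server) with a one-off Rscript call
  docker run --rm "$image" Rscript -e "pkgs <- c($pkgs); print(setdiff(pkgs, rownames(installed.packages())))"
}
```

For example, `check_r_packages rocker/verse:3.5.1 ggplot2 rmarkdown` prints the names of any missing packages, and an empty character vector if nothing is missing.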

Generating the pdf with Docker

Let's begin by compiling pdf reports with the rocker/verse:3.5.1 image. I have provided a git repository that contains example files for creating pdf reports with R Markdown and Docker. You can clone it to your computer by running:

git clone https://github.com/jlintusaari/R-docker-report.git

The repository contains an example_report.R script that takes a csv file as input and generates a pdf report using the example_report.Rmd template below:


---
title: "Texas housing sales report"
author: "Matti Meikäläinen"
date: "`r paste(Sys.Date())`"
output: pdf_document
---

## Number of sales

The following chart shows the number of housing sales in three cities in Texas. Data is from the `txhousing` data set provided by the TAMU real estate center.

```{r echo=FALSE, message=FALSE, warning=FALSE}
ggplot(dt, aes(date, sales, color=city)) + geom_point() + geom_smooth()
```

The generated example pdf looks like this:

[Figure: the generated example pdf (image alt text: "sales")]

The example csv data is stored in the data.csv file and was created with the R/make_csv.R script. Assuming we are in the folder where the example_report.R file is located, we run the following command to compile the report with Docker:

docker run --rm -v "$PWD":/report -w /report rocker/verse:3.5.1 \
 Rscript --vanilla example_report.R data.csv

The compilation works, but there are multiple things that could be improved:

  1. The default user in the container is root, so the generated pdf ends up owned by root
  2. The above command is rather long and requires setting e.g. the working directory with -w
  3. With my actual report, the LaTeX system was missing some packages. They were installed
    automatically during compilation, but because the container is discarded after each run,
    this happened every time and made the pdf compilation slow

So let's create a new Docker image based on rocker/verse:3.5.1 that is better configured for our purposes. The following Dockerfile starts the image creation from the rocker/verse:3.5.1 image and adds configurations to address the above issues:

FROM rocker/verse:3.5.1

# My sales report required an additional LaTeX package called `eurosym`.
# Uncomment the next line to install it when the image is built.
# RUN tlmgr install eurosym

# Set a user and the working directory
USER rstudio
WORKDIR /report

# Set the container to run `Rscript --vanilla ` by default
ENTRYPOINT ["/usr/local/bin/Rscript", "--vanilla"]

# Set the `example_report.R data.csv` as the default script to run with ENTRYPOINT's Rscript
CMD ["example_report.R", "data.csv"]

You can find this file in the docker folder of the git repository. Assuming you are in the docker folder where the Dockerfile is located, you can create the new image with:

docker build -t report-maker .
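
To confirm that the build picked up the settings from the Dockerfile, the image configuration can be inspected. This is a small sketch using `docker image inspect` with a Go template; `show_image_config` is a hypothetical helper name, and the default image name assumes the `report-maker` tag used in the build command.

```shell
# Hypothetical helper: print the user, working directory, entrypoint and
# default command recorded in an image (defaults to report-maker).
show_image_config() {
  image="${1:-report-maker}"
  docker image inspect --format \
    'user={{.Config.User}} workdir={{.Config.WorkingDir}} entrypoint={{json .Config.Entrypoint}} cmd={{json .Config.Cmd}}' \
    "$image"
}
```

Running `show_image_config` should report the rstudio user, the /report working directory and the Rscript entrypoint set in the Dockerfile.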

After this we can use the report-maker image to make the pdf compilation both faster and more convenient. The following command will create the report (remember to first remove the root-owned example_report.pdf if you haven't already):

docker run --rm -v "$PWD":/report report-maker
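
Because the Dockerfile splits the command into ENTRYPOINT and CMD, any arguments given after the image name replace only the CMD part, while `Rscript --vanilla` stays in place. The sketch below wraps this in a small function; `make_report` is a hypothetical name, and any file names other than example_report.R and data.csv are placeholders.

```shell
# Hypothetical wrapper around the report-maker image.
# Usage: make_report [script.R] [data.csv]   (paths relative to the current directory)
make_report() {
  script="${1:-example_report.R}"
  csv="${2:-data.csv}"
  # Arguments after the image name replace the Dockerfile's CMD, so the
  # container effectively runs: Rscript --vanilla "$script" "$csv"
  docker run --rm -v "$PWD":/report report-maker "$script" "$csv"
}
```

Calling `make_report` without arguments behaves like the plain command above, while something like `make_report example_report.R other.csv` would compile the same report from a different csv file in the current folder.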

Of course, our report-maker image can be used to run any kind of R script, but the name is descriptive for our purpose.

Production use

The web shop that this report will be used in is implemented with Ruby on Rails. Rails provides a built-in task system that can easily access the database of the web application to retrieve the relevant data for the report.

To automate the whole report generation process, the easiest way is to create a Rails task that:

  1. Queries the database and saves the relevant information to a csv file
  2. Passes the generated csv file as an argument to report-maker

After that, one can set up a cron job that runs the task e.g. on the first day of every month. Once generated, the report can be made downloadable for the client from the web application.
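
The two steps and the cron job could be tied together roughly as follows. This is only a sketch under assumptions, not the setup used in the actual web shop: the rake task name `reports:export_sales_csv`, the function name `monthly_report` and all paths are hypothetical. Here the Docker call is made from a shell function rather than from inside the Rails task, which keeps the Rails side limited to writing the csv file.

```shell
# Hypothetical monthly job. The rake task name and directory layout are assumptions.
# Usage: monthly_report <rails_app_dir> <report_dir>
monthly_report() {
  app_dir="$1"      # Rails application root
  report_dir="$2"   # folder containing example_report.R; the csv is written here
  # 1. Let a Rails (rake) task export the sales data to a csv file
  (cd "$app_dir" && bundle exec rake "reports:export_sales_csv[$report_dir/data.csv]") || return 1
  # 2. Compile the pdf from that csv with the report-maker image
  docker run --rm -v "$report_dir":/report report-maker example_report.R data.csv
}

# Example crontab entry (03:00 on the first day of every month), assuming the
# function above is saved in /opt/shop/bin/monthly_report.sh:
# 0 3 1 * * . /opt/shop/bin/monthly_report.sh && monthly_report /opt/shop /opt/shop/reports
```

Note that the mounted report folder has to be writable by the user the container runs as, which is rstudio in the Dockerfile above.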