Getting Started with rmake

Introduction

R is a mature scripting language for statistical computing and data processing. An important advantage of R is that every step of data processing can be programmed in scripts, so the whole analysis is repeatable and can be re-executed after any change in the data or the processing steps.

Several R packages, such as knitr and rmarkdown, support repeatable statistical computations. These tools allow writing R scripts that generate reports combining text with tables and figures generated from data.

However, as analyses grow in complexity, manual re-execution of the whole process may become tedious, error-prone, and computationally demanding. Complex analyses typically involve:

Many pre-processing steps on large datasets
Repetitive execution of commands differing only in parameters
Production of multiple output files in various formats

Re-running all pre-processing steps just to refresh the final report after every change is inefficient. The caching mechanism provided by knitr helps but is limited to a single report. Splitting a complex analysis into several parts and saving intermediate results into files is a sensible approach, but it brings another challenge: managing the dependencies between inputs, outputs, and the underlying scripts.

This is where Make comes in. Make is a tool that controls the generation of files from source data and script files by reading dependencies from a Makefile and comparing timestamps to determine which files need to be refreshed.
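The timestamp comparison described above can be sketched in base R. This is only an illustration of the idea, not how Make or rmake is implemented; the helper name is_stale() and the temporary file names are assumptions made for the example.

```r
# Hypothetical helper (not part of rmake): TRUE when `target` must be rebuilt,
# i.e. it is missing or older than at least one of its prerequisites.
# Assumes all prerequisites exist.
is_stale <- function(target, prerequisites) {
  if (!file.exists(target)) return(TRUE)
  any(file.mtime(prerequisites) > file.mtime(target))
}

# Demonstration in a temporary directory
demo_dir <- tempfile("stale-demo-")
dir.create(demo_dir)
src <- file.path(demo_dir, "data.csv")
out <- file.path(demo_dir, "sums.csv")
writeLines("ID,V1,V2", src)

missing_target <- is_stale(out, src)   # target does not exist yet

writeLines("ID,V1,V2", out)
t0 <- Sys.time()
Sys.setFileTime(out, t0)
Sys.setFileTime(src, t0 - 60)          # source older than target
up_to_date <- is_stale(out, src)

Sys.setFileTime(src, t0 + 60)          # source "edited" after the target was built
after_edit <- is_stale(out, src)

print(c(missing_target = missing_target,
        up_to_date = up_to_date,
        after_edit = after_edit))
```

Only the first and the last check report that a rebuild is needed; the middle one shows the case in which Make would skip the target.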

The rmake package provides tools for easy generation of Makefiles for statistical and data manipulation tasks in R.

Key Features

The main features of rmake are:

Use of the well-known Make tool
Easy definitions of file dependencies in the R language
High flexibility through parameterized execution and programmatic rule generation
Simple, short code thanks to the %>>% pipeline operator and templating
Support for R scripts and R markdown files
Extensibility for user-defined rule types
Isolated and parallel execution via Make’s parallel processing
Support for all major platforms: Unix (Linux), macOS, Windows, and Solaris
Compatibility with RStudio

Why Use rmake?

R allows the development of repeatable statistical analyses. However, when analyses grow in complexity, manual re-execution on any change may become tedious and error-prone. Make is a widely accepted tool for managing the generation of resulting files from source data and script files. rmake makes it easy to generate Makefiles for R analytical projects.

Installation

To install rmake from CRAN:

install.packages("rmake")

Alternatively, install the development version from GitHub:

install.packages("devtools")
devtools::install_github("beerda/rmake")

Load the package:

library(rmake)

Prerequisites

System Requirements

R: Version 3.5.0 or higher
Make: GNU Make or compatible make tool
- On Linux/macOS: Usually pre-installed
- On Windows: Install Rtools (which includes make)
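A quick way to check both requirements from an R session is sketched below. It uses only base R; the variable names are illustrative, and finding a make executable on the PATH does not by itself prove that it is GNU Make.

```r
# Check the R version against the minimum stated above
r_ok <- getRversion() >= "3.5.0"

# Look for a `make` executable on the PATH (empty string when none is found)
make_path <- unname(Sys.which("make"))
make_ok <- nzchar(make_path)

cat("R version:", as.character(getRversion()),
    if (r_ok) "(ok)" else "(too old)", "\n")
cat("make:", if (make_ok) make_path else "not found on PATH", "\n")
```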

Environment Variables

The package requires the R_HOME environment variable to be properly set. This variable indicates the directory where R is installed and is automatically set when running from within R or RStudio.

When is R_HOME needed?

When running make from the command line (outside of R), you may need to set R_HOME manually.

Finding R_HOME

To find the correct value for your system, run this in R:

R.home()

You can also check the current value of the R_HOME environment variable:

Sys.getenv("R_HOME")

Setting R_HOME

On Linux/macOS:

export R_HOME=/usr/lib/R  # Use the path from R.home()

On Windows (Command Prompt):

set R_HOME=C:\Program Files\R\R-4.3.0

On Windows (PowerShell):

$env:R_HOME = "C:\Program Files\R\R-4.3.0"

For a permanent setup, add the export command to your shell configuration file (.bashrc, .zshrc, etc.) on Unix-like systems, or define R_HOME among the system environment variables on Windows.

For more information on R environment variables, see the official R documentation.
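To avoid typing the path by hand, the export line for a Unix-like shell can be generated from within R. This is a small convenience sketch using base R only; the variable names are illustrative.

```r
# Directory reported by the running R installation
r_home <- R.home()

# Value currently visible in the environment of this R session
env_home <- Sys.getenv("R_HOME")

cat("R.home():", r_home, "\n")
cat("R_HOME:  ", env_home, "\n")

# Line to paste into .bashrc, .zshrc, or a similar shell configuration file
export_line <- sprintf('export R_HOME="%s"', r_home)
cat(export_line, "\n")
```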

Project Initialization

Creating Skeleton Files

To start a new project with rmake:

library(rmake)
rmakeSkeleton(".")

This creates two files:

- Makefile.R: R script that generates the Makefile
- Makefile: the generated Makefile (initially minimal)

The initial Makefile.R contains:

library(rmake)
job <- list()
makefile(job, "Makefile")

Basic Example

Let’s walk through a simple example. Suppose we have:

- data.csv: input data file
- script.R: R script that processes the data
- sums.csv: output file with the computed results

Step 1: Create the Data File

Create data.csv:

ID,V1,V2
a,2,8
b,9,1
c,3,3

Step 2: Create the Processing Script

Create script.R:

d <- read.csv("data.csv")
sums <- data.frame(ID = "sum",
                   V1 = sum(d$V1),
                   V2 = sum(d$V2))
write.csv(sums, "sums.csv", row.names = FALSE)
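To see what the script should produce before wiring it into Make, the same computation can be run on the sample data inline, without touching any files. The data below is copied from data.csv in Step 1.

```r
# Same data as data.csv, read from a string instead of a file
d <- read.csv(text = "ID,V1,V2
a,2,8
b,9,1
c,3,3")

# Same computation as in script.R
sums <- data.frame(ID = "sum",
                   V1 = sum(d$V1),   # 2 + 9 + 3 = 14
                   V2 = sum(d$V2))   # 8 + 1 + 3 = 12
print(sums)
```

The resulting sums.csv should therefore contain a single row with V1 = 14 and V2 = 12.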

Step 3: Define the Build Rule

Edit Makefile.R:

library(rmake)
job <- list(rRule(target = "sums.csv", 
                  script = "script.R", 
                  depends = "data.csv"))
makefile(job, "Makefile")
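Conceptually, such a rule corresponds to a Make rule with a target, its prerequisites, and a command. The sketch below builds that kind of rule text by hand in base R. It is a simplified illustration only: the helper make_rule_text() is hypothetical, and the text that rmake actually writes into the Makefile is different and more elaborate.

```r
# Plain-list stand-in for the rule defined above (not an rmake object)
rule <- list(target = "sums.csv", script = "script.R", depends = "data.csv")

# Hypothetical helper: format a rule as "target: prerequisites<newline><TAB>command"
make_rule_text <- function(rule) {
  prerequisites <- paste(c(rule$script, rule$depends), collapse = " ")
  paste0(rule$target, ": ", prerequisites, "\n",
         "\tRscript ", rule$script, "\n")
}

rule_text <- make_rule_text(rule)
cat(rule_text)
```

Listing the script among the prerequisites is what makes a change in script.R trigger a rebuild of sums.csv, just like a change in data.csv.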

Step 4: Run the Build

Execute make:

make()

Make will:

1. Regenerate Makefile (if Makefile.R changed)
2. Execute script.R to create sums.csv

Subsequent calls to make() will do nothing until one of the input files (data.csv, script.R, or Makefile.R) changes.

Using the Pipe Operator

The %>>% pipe operator makes rule definitions more readable:

library(rmake)
job <- "data.csv" %>>% 
  rRule("script.R") %>>% 
  "sums.csv"
makefile(job, "Makefile")

This is equivalent to the previous example but more concise.

Adding a Markdown Report

Let’s extend our example to create a PDF report. Create analysis.Rmd:

---
title: "Analysis"
output: pdf_document
---

# Sums of data rows

```{r, echo=FALSE, results='asis'}
sums <- read.csv('sums.csv')
knitr::kable(sums)
```

Update Makefile.R:

library(rmake)
job <- list(
  rRule(target = "sums.csv", script = "script.R", depends = "data.csv"),
  markdownRule(target = "analysis.pdf", script = "analysis.Rmd", 
               depends = "sums.csv")
)
makefile(job, "Makefile")

Or using pipes:

library(rmake)
job <- "data.csv" %>>% 
  rRule("script.R") %>>% 
  "sums.csv" %>>% 
  markdownRule("analysis.Rmd") %>>% 
  "analysis.pdf"
makefile(job, "Makefile")

Run make again:

make()

Running Make

From R

# Run all tasks
make()

# Run specific task
make("all")

# Clean generated files
make("clean")

# Parallel execution (8 jobs)
make("-j8")
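Instead of hard-coding the number of jobs, the -j flag can be derived from the number of available cores. The sketch below uses parallel::detectCores() from the parallel package shipped with R and leaves one core free; that policy is an assumption of this example, not a recommendation from rmake.

```r
# Number of cores reported by the system (may be NA on some platforms)
cores <- parallel::detectCores()
if (is.na(cores)) cores <- 1L

# Leave one core free, but always run at least one job
jobs <- max(1L, cores - 1L)
flag <- paste0("-j", jobs)
print(flag)

# With rmake loaded and a Makefile present, the flag is passed on like this:
# make(flag)
```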

From Command Line

make          # Run all tasks
make clean    # Clean generated files
make -j8      # Parallel execution

From RStudio

Go to Build > Configure Build Tools
Set Project build tools to Makefile
Use Build All button

Visualizing Dependencies

Visualize the dependency graph:

visualize(job, legend = FALSE)

This creates an interactive graph showing:

- Squares: Data files
- Diamonds: Script files
- Ovals: Rules
- Arrows: Dependencies

Multiple Dependencies

Handle complex dependencies:

chain1 <- "data1.csv" %>>% rRule("preprocess1.R") %>>% "intermed1.rds"
chain2 <- "data2.csv" %>>% rRule("preprocess2.R") %>>% "intermed2.rds"
chain3 <- c("intermed1.rds", "intermed2.rds") %>>% 
  rRule("merge.R") %>>% "merged.rds" %>>% 
  markdownRule("report.Rmd") %>>% "report.pdf"

job <- c(chain1, chain2, chain3)

Alternatively, you can define all chains directly without intermediate variables:

job <- c(
  "data1.csv" %>>% rRule("preprocess1.R") %>>% "intermed1.rds",
  "data2.csv" %>>% rRule("preprocess2.R") %>>% "intermed2.rds",
  c("intermed1.rds", "intermed2.rds") %>>% 
    rRule("merge.R") %>>% "merged.rds" %>>% 
    markdownRule("report.Rmd") %>>% "report.pdf"
)
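Because the two pre-processing chains differ only in a number, the corresponding rules can also be generated programmatically. The sketch below assumes the rRule(target, script, depends) form used in the Basic Example and builds the rules only when rmake is installed; the names specs and preprocess are illustrative.

```r
# File names for each pre-processing branch, derived from its index
ids <- 1:2
specs <- data.frame(
  depends = sprintf("data%d.csv", ids),
  script  = sprintf("preprocess%d.R", ids),
  target  = sprintf("intermed%d.rds", ids),
  stringsAsFactors = FALSE
)
print(specs)

# One rRule per row; skipped when rmake is not available
if (requireNamespace("rmake", quietly = TRUE)) {
  preprocess <- lapply(seq_len(nrow(specs)), function(i) {
    rmake::rRule(target  = specs$target[i],
                 script  = specs$script[i],
                 depends = specs$depends[i])
  })
}
```

The resulting list of rules would then presumably be combined with the merge and report rules in the same way as the chains above; the Tasks and Templates vignette covers the package's own facilities for parameterized rules.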

Rule Types

rmake provides several pre-defined rule types:

rRule(): Execute R scripts
markdownRule(): Render R Markdown documents
knitrRule(): Process knitr documents
copyRule(): Copy files
offlineRule(): Manual tasks with reminders

For detailed documentation on all rule types including depRule(), subdirRule(), and custom rules, see the Build Rules vignette.

Next Steps

For more information on specific topics, see these vignettes:

rmake Project Management: Learn about project initialization, running builds, cleaning, and parallel execution
Build Rules: Comprehensive reference for all rule types (rRule, markdownRule, knitrRule, copyRule, depRule, subdirRule, offlineRule)
Tasks and Templates: Advanced features including tasks, parameterized execution, and rule templates

Summary

Key takeaways:

1. Use rmakeSkeleton() to initialize projects
2. Define rules in Makefile.R
3. Use %>>% for readable rule chains
4. Run make() to execute the build process
5. Use visualize() to understand dependencies