RMarkdown + Knitr

Thank you for visiting my notes. I've noticed a large uptick in views lately which is terrific! However, these notes are quite old (2017) and some details may be outdated. I have a more recent class (2023) which covers much of the same information and is hopefully more up-to-date.

If you are using these notes, I'd love to hear from you and how you are using them! Please reach out to jerrick (at) umich (dot) edu.

Introduction

Reports and papers will usually mix text and code/output. In the case of papers, only output will typically be given - tables and graphs. For reports detailing data analysis steps done (or in applied papers), code may be given, often followed directly by its output. The classic approach to mixing text with output is to produce and save the output, then manually enter it into the document. This classic approach has the benefit of allowing post-editing; for example a plot can be manipulated in Photoshop.

However, except in rare cases, a document where the text and code/output are generated separately has a number of shortcomings.

  • By not including all code used to generate output, reproducibility is reduced.
  • If code is included, the code and output can easily become desynced (e.g. a new plot is generated, but the code in the report is not updated).
  • Updating output can be very time-consuming (e.g. a single line change in the code could require the updating of all values in tables).
  • For malicious actors, results can be more easily doctored if the code is not explicitly included.

For different languages, there are different ways to combine these two aspects of analysis. Here we focus on knitr and RMarkdown in R.

The Technologies

First, we will quickly discuss Sweave. Then we’ll introduce knitr and RMarkdown.

Sweave

Before knitr was created, the original software for embedding R code inside LaTeX documents was Sweave. Sweave files generally have the extension .Rnw and are complete LaTeX files with R “chunks” which should produce the desired output. For example, in the middle of a LaTeX file an author may wish to include a plot. They could use the following chunk:

<<>>=
data("airquality")
plot(airquality$Ozone ~ airquality$Wind)
@

Options can be passed into the <<>>= header, for example we can name a chunk (to allow \ref calls to it later) and define the size of the figure:

<<plot1, height=4, width=5>>=

Note that not all chunks have to produce output. If you are writing a publication, you may only want to include some key lines of code, but you need preprocessing code before they will work. You can suppress output with the argument results=hide. Alternatively, you can include properly formatted and syntax-highlighted R code that isn’t actually run by setting eval=FALSE.

Once you have created your .Rnw file, the function Sweave will process the file, executing the R chunks and replacing them with output as appropriate. The result is a .tex file, which is then compiled into the PDF document.
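As a rough sketch of that workflow, the example below writes a small .Rnw file to a temporary directory and runs Sweave on it. The file name example.Rnw, the chunk labels and the ozone_mean variable are made up for illustration; only Sweave itself (from the utils package that ships with R) is taken from the text above.

```r
# Work in a temporary directory so the generated files are easy to discard.
old_wd <- setwd(tempdir())

# A minimal Sweave document: LaTeX text plus two R chunks (hypothetical content).
rnw <- c(
  "\\documentclass{article}",
  "\\begin{document}",
  "Some text before the analysis.",
  "",
  "<<prep, results=hide>>=",
  "data(\"airquality\")",
  "ozone_mean <- mean(airquality$Ozone, na.rm = TRUE)",
  "@",
  "",
  "<<showonly, eval=FALSE>>=",
  "plot(airquality$Ozone ~ airquality$Wind)",
  "@",
  "",
  "The mean ozone level is \\Sexpr{round(ozone_mean, 1)}.",
  "\\end{document}"
)
writeLines(rnw, "example.Rnw")

# Sweave() executes the chunks and writes example.tex next to the input.
utils::Sweave("example.Rnw", quiet = TRUE)
tex <- readLines("example.tex")

# Return to the original working directory.
setwd(old_wd)
```

The resulting example.tex still has to be compiled with a LaTeX typesetter to obtain the PDF.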

However, Sweave has fallen out of fashion lately with the advent of knitr. Knitr offers everything Sweave does, plus it extends it further. With Sweave, additional tools are required for advanced operations, whereas knitr supports more internally. In addition, Sweave is old and has some legacy issues connected to that, such as fragile handling of graphics.

knitr

knitr, created by Yihui Xie, was designed as a replacement for Sweave. Early versions of knitr were compatible with Sweave's .Rnw files, though more recent versions drop that compatibility.[1]

The workhorse function in the knitr package is knit. By passing it a LaTeX file with knitr-compatible R chunks (a .Rnw file[2]), it will execute the R code and generate a LaTeX file that can be passed to any LaTeX typesetter.

However, knitr is a more general engine than .Rnw to .pdf. Specifically, knitr works with RMarkdown and can output markdown files which are easy to convert to other formats.
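To make the markdown route concrete, here is a minimal sketch, assuming the knitr package is installed. The document content is hypothetical, and the text argument of knit is used so that nothing needs to be written to disk.

```r
library(knitr)

# Three backticks, built programmatically so this example stays easy to read.
fence <- strrep("`", 3)

# A tiny RMarkdown document held in a character vector (hypothetical content).
rmd <- c(
  "# Demo",
  "",
  paste0(fence, "{r}"),
  "x <- 1 + 1",
  "x",
  fence
)

# With `text =`, knit() returns the resulting markdown as a string
# instead of writing an output file.
md <- knit(text = rmd, quiet = TRUE)
cat(md)
```

The returned string is plain markdown containing both the code and its printed result, which other tools can then convert to HTML, PDF and so on.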

RMarkdown and markdown

Markdown is a markup language (try not to get confused there) developed by John Gruber whose original goal was enabling authors “to write using an easy-to-read, easy-to-write plain text format, then convert it to structurally valid XHTML (or HTML)”.[3] RMarkdown is an extension to markdown which includes the ability to embed code chunks and several other extensions useful for writing technical reports.

The rmarkdown package extends the knitr package to convert, in one step, an RMarkdown file (.Rmd) into PDF, HTML or Word documents, among others. The main function is render, which can be used as follows:

library(rmarkdown)
library(knitr)
render("file.Rmd", "html_document")
render("file.Rmd", "pdf_document")
render("file.Rmd", "word_document")

See this RStudio page for a list of all the output formats supported. Behind the scenes, the conversions are typically done using pandoc, which we will not cover here.
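Because render depends on pandoc, it can be handy to check for it before converting. The helper below is only a sketch: render_formats and its defaults are hypothetical, while render and pandoc_available are functions from the rmarkdown package.

```r
library(rmarkdown)

# Hypothetical helper: render one .Rmd file to several formats,
# collecting the results in a single output folder.
render_formats <- function(input,
                           formats = c("html_document", "word_document"),
                           output_dir = "output") {
  if (!pandoc_available()) {
    stop("pandoc was not found; install it or run this from RStudio.")
  }
  # render() returns the path of the file it created.
  vapply(formats, function(fmt) {
    render(input, output_format = fmt, output_dir = output_dir, quiet = TRUE)
  }, character(1))
}

# Usage (assumes a file named file.Rmd exists in the working directory):
# render_formats("file.Rmd")
```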

RStudio

If you are using RStudio, it has extensive built-in support for RMarkdown. Specifically, if you open a .Rmd file, the toolbar has a “Knit” button which will process it for you directly.

A quick introduction to RMarkdown

Before diving into RMarkdown, a quick intro to Markdown.

Markdown

Markdown is a set of ways to mark text to enable formatting. For example, **bold** creates bold text and *italic* creates italics. (Note that _ can be used in place of *.) Since Markdown was originally created for publishing blogs, there are various commands that won’t see much use in our work. Here’s a summary of commonly used commands.

Sections

A section header is preceded by a number of #’s, where the number of hashes represents the level. For example,

# Introduction

## Literature Review

### Classic Texts

## Goals

would create:

1 Introduction
  1.1 Literature Review
    1.1.1 Classic Texts
  1.2 Goals

(The default is un-numbered sections. See “Header” below to enable numbering.)

This is equivalent to \section{}, \subsection{} etc in LaTeX.

Lists

Enumerated lists are simply lines preceded by a number and a period, such as 1. For example,

1. First item
2. Second item.
1. Numbers don't matter.
  1. Sub-list
  5. Again, numbers don't matter.
1. Back to original ordering.

1. Start of a new list.
  1. First item
  2. Second item.
  3. Numbers don’t matter.
    1. Sub-list
    2. Again, numbers don’t matter.
  4. Back to original ordering.

For unordered lists, simply preface with * or - or +:

- first
+ second
* third
    - Sub list
    + another

would create:

  • first
  • second
  • third
    • Sub list
    • another

Tables

Tables are written in plain text, using pipes (|) to separate columns and hyphens to separate the header from the body.

| Name | Height | Weight |
|:-----|:------:|:------:|
| Bob  | 6'1"   | 195    |
| Sue  | 5'4"   | 134    |

would create:

Name    Height    Weight
Bob      6'1"      195
Sue      5'4"      134

Note:

  • The second line (with the :’s and -’s) separates the header from the body. The :’s refer to alignment (:-- or --- is left-aligned, --: is right-aligned and :-: is centered). There must be at least three characters between the |’s, though more -’s is fine.
  • White space is ignored and only included to make editing easy. The first data row could have been |Bob|6'1"|195|
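Tables like this rarely need to be typed by hand. As a sketch, assuming the knitr package is installed and recent enough to accept the "pipe" format name, kable can generate the same pipe syntax from a data frame. The data frame name people is made up for illustration.

```r
library(knitr)

# The example table from above, stored as a data frame.
people <- data.frame(
  Name   = c("Bob", "Sue"),
  Height = c("6'1\"", "5'4\""),
  Weight = c(195, 134)
)

# align = "lcc": left-align Name, centre Height and Weight.
tbl <- kable(people, format = "pipe", align = "lcc")
print(tbl)
```

The alignment string plays the same role as the :’s in the separator line of a hand-written table.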

RMarkdown

RMarkdown extends Markdown to easily embed R code by creating a chunk wrapped in triple backticks.

```{r}
n <- 10
x <- rnorm(n)
print(x)
```

will evaluate to

##  [1]  0.1380621 -0.2821229  0.5275906  1.4143962 -0.5825694  1.1195383
##  [7]  1.4676456 -1.9394153 -0.8218730  0.6258871

You can pass options to chunks. For example, {r, eval = FALSE} will display R code but not execute it. Here are some useful options:

  • echo = FALSE: Display the output but suppress the code.
  • results = 'asis': Prevents further processing of the output. Useful if a table is created in R (e.g. using kable or xtable). Not generally needed otherwise.
  • engine = 'enginename': You can actually run other code. For example, engine = 'stata' runs the code through Stata, engine = 'bash' runs Linux command-line code.
  • message = FALSE and warning = FALSE: Suppresses R messages and warnings respectively. Useful when loading packages.
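The difference between echo = FALSE and eval = FALSE is easiest to see side by side. The following is a minimal sketch, assuming knitr is installed; knit_chunk is a hypothetical helper that knits a one-chunk document with whatever options it is given.

```r
library(knitr)

# Three backticks, built programmatically so this example stays easy to read.
fence <- strrep("`", 3)

# Hypothetical helper: knit a one-chunk document with the given chunk options
# and return the resulting markdown as a string.
knit_chunk <- function(options) {
  rmd <- c(paste0(fence, "{r, ", options, "}"), "sum(1:10)", fence)
  knit(text = rmd, quiet = TRUE)
}

hidden_code <- knit_chunk("echo = FALSE")  # result is shown, code is not
not_run     <- knit_chunk("eval = FALSE")  # code is shown, nothing is run

cat(hidden_code)
cat(not_run)
```

To apply the same options to every chunk, knitr::opts_chunk$set(message = FALSE, warning = FALSE) can be called in an early chunk instead of repeating them in each header.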

There’s one additional chunk option that deserves special mention, cache = TRUE. If you have slow code, you may not want to run it every time you make an update to the file. Telling RMarkdown to cache a chunk runs it once, and saves the output to display again in the future. Making any change to the code within the chunk causes it to be re-run and re-cached. Note that the caching is a bit temperamental.

R code chunks “remember” previous chunks, so you can refer to objects again:

min(x)
## [1] -1.939415

In-line R code can be included by wrapping a call, prefixed with r, in single backticks: `r max(x)` renders as 1.4676456. Excluding the r part renders the code as code without processing it.
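A small self-contained sketch of in-line code, assuming knitr is installed. The values stored in x here are made up so the result is predictable; they are not the random draws used above.

```r
library(knitr)

# Three backticks, built programmatically so this example stays easy to read.
fence <- strrep("`", 3)

# A hidden chunk defines x; the sentence after it uses x in-line.
rmd <- c(
  paste0(fence, "{r, echo = FALSE}"),
  "x <- c(2, 7, 4)",
  fence,
  "",
  "The largest value is `r max(x)`."
)

md <- knit(text = rmd, quiet = TRUE)
cat(md)
```

Because the number is computed at knit time, the sentence stays correct whenever the data behind x changes.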

Equations

Markdown and RMarkdown support LaTeX math equations. Inline equations are wrapped in single dollar signs: $y = \beta x + \epsilon$ renders as \(y = \beta x + \epsilon\). Stand-alone equations are similarly wrapped in double dollar signs:

$$
  y = \beta x + \epsilon
$$

would create:

\[ y = \beta x + \epsilon \]

The Header

When you create a new RMarkdown file in RStudio (or you see an example online), you’ll notice a header. For example, the header for this document might be:

---
title: "RMarkdown + Knitr"
author: "Josh Errickson"
output:
  html_document:
    toc_depth: 3
---

Most of this is pretty self-explanatory and, beyond filling in the title and author and choosing between html_document or pdf_document, requires no tweaking. However, if you do want to tweak settings (for example, in this header I display one additional level in the table of contents) or try any of the dozens of other non-standard output formats (e.g. these notes are in the readthedown style), see RMarkdown formats from RStudio.
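The section numbering mentioned under “Sections” is one such setting: for html_document and pdf_document it is controlled by the number_sections option, which can sit next to toc_depth in the header. The same options can also be set from R, as in this sketch (it assumes the rmarkdown package is installed, and file.Rmd is a placeholder name):

```r
library(rmarkdown)

# Build the output format in R instead of in the header.
# Each argument corresponds to a key of the same name under html_document.
fmt <- html_document(toc = TRUE, toc_depth = 3, number_sections = TRUE)

# Usage (assumes file.Rmd exists and pandoc is available):
# render("file.Rmd", output_format = fmt)
```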

LaTeX code and HTML code

If you are creating a PDF output, you can use LaTeX code directly in the document instead of Markdown. Similarly, if you are creating a .html page, you can use HTML (or CSS) directly. This is useful when you reach the limits of the formatting of markdown.

Two caveats:

  • Not everything is supported.
  • Some LaTeX will render in web pages and some HTML will render in PDFs.
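One way to work around these caveats is to emit different raw markup depending on the output format. The following is a minimal sketch that relies on the knitr helpers is_latex_output() and is_html_output(); the colorize function itself is hypothetical, and the LaTeX branch assumes a colour package such as xcolor is loaded in the PDF template.

```r
library(knitr)

# Hypothetical helper: colour a piece of text with raw LaTeX when knitting
# to PDF, with raw HTML when knitting to a web page, and leave the text
# unchanged for any other output.
colorize <- function(text, color) {
  if (is_latex_output()) {
    sprintf("\\textcolor{%s}{%s}", color, text)
  } else if (is_html_output()) {
    sprintf("<span style=\"color: %s;\">%s</span>", color, text)
  } else {
    text
  }
}

# Outside of a knitting session neither format is detected,
# so the text comes back unchanged.
plain <- colorize("important", "red")
```

Inside a document it would typically be called from in-line code, e.g. `r colorize("important", "red")`.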

  1. There exists a function Sweave2knitr in the knitr package for converting Sweave files into knitr compliant files.

  2. See here for an elaborate write-up of the differences between Sweave-compatible .Rnw vs knitr-compatible .Rnw

  3. https://daringfireball.net/projects/markdown/

Josh Errickson