RMarkdown + Knitr
Introduction
Reports and papers will usually mix text and code/output. In the case of papers, only output will typically be given - tables and graphs. For reports detailing data analysis steps done (or in applied papers), code may be given, often followed directly by its output. The classic approach to mixing text with output is to produce and save the output, then manually enter it into the document. This classic approach has the benefit of allowing post-editing; for example a plot can be manipulated in Photoshop.
However, except in rare cases, a document where the text and code/output are generated separately has a number of shortcomings.
- By not including all code used to generate output, reproducibility is reduced.
- If code is included, the code and output can easily become desynced (e.g. a new plot is generated, but the code in the report is not updated).
- Updating output can be very time-consuming (e.g. a single line change in the code could require the updating of all values in tables).
- For malicious actors, results can be more easily doctored if the code is not explicitly included.
For different languages, there are different ways to combine these two aspects of analysis. Here we focus on knitr and RMarkdown in R.
The Technologies
First, we will quickly discuss Sweave. Then we’ll introduce knitr and RMarkdown.
Sweave
Before knitr was created, the original software for embedding R code inside LaTeX documents was Sweave. Sweave files generally have the extension .Rnw and are complete LaTeX files with R “chunks” which should produce the desired output. For example, in the middle of a LaTeX file an author may wish to include a plot. They could use the following chunk:
<<fig=TRUE>>=
data("airquality")
plot(airquality$Ozone ~ airquality$Wind)
@
Options can be passed into the <<>>= header. For example, we can name a chunk (to allow \ref calls to it later) and define the size of the figure:
<<plot1, fig=TRUE, height=4, width=5>>=
Note that not all chunks have to produce output. If you are writing a publication, you may only want to include some key lines of code, but you need preprocessing code before they will work. You can suppress output with the argument results=hide. Alternatively, you can include properly formatted and syntax highlighted R code that isn’t actually run by setting eval=FALSE.
Once you have created your .Rnw file, the function Sweave will process the file, executing the R chunks and replacing them with output as appropriate. The result is a .tex file, which is then compiled with LaTeX to create the PDF document.
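As a rough sketch of that workflow (the file name example.Rnw and its contents are made up for illustration), the following writes a tiny Sweave document to a temporary directory and processes it. Only the step that produces the .tex file is run here; compiling to PDF needs a LaTeX installation, so that call is left commented out.

```r
# Write a minimal Sweave document to a temporary directory.
# (example.Rnw is a hypothetical file name used only for this sketch.)
rnw_file <- file.path(tempdir(), "example.Rnw")
writeLines(c(
  "\\documentclass{article}",
  "\\begin{document}",
  "<<>>=",
  "summary(airquality$Ozone)",
  "@",
  "\\end{document}"
), rnw_file)

# Sweave() executes the chunks and writes example.tex into the
# current working directory, so switch there temporarily.
old_wd <- setwd(tempdir())
Sweave(rnw_file, quiet = TRUE)
setwd(old_wd)

tex_file <- file.path(tempdir(), "example.tex")
file.exists(tex_file)

# Compiling the .tex file to PDF is a separate step (requires LaTeX):
# tools::texi2pdf(tex_file)
```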
However, Sweave has fallen out of fashion with the advent of knitr. knitr offers everything Sweave does and extends it further: with Sweave, additional tools are required for advanced operations, whereas knitr supports more internally. In addition, Sweave is old and carries some legacy issues, such as fragile handling of graphics.
knitr
knitr, created by Yihui Xie, was designed as a replacement for Sweave. Early versions of knitr were compatible with Rnw files, though more recent versions drop that compatibility.[1]
The workhorse function in the knitr package is knit. When passed a LaTeX file with knitr-compatible R chunks (a .Rnw file[2]), it executes the R code and generates a LaTeX file that can be passed to any LaTeX typesetter.
However, knitr is more general than a converter from .Rnw to .pdf. Specifically, knitr works with RMarkdown and can output Markdown files, which are easy to convert to other formats.
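As a minimal sketch of this (assuming the knitr package is installed; the document text is a made-up example), knit can be given RMarkdown source directly through its text argument, in which case it returns the resulting Markdown as a string instead of writing a file:

```r
library(knitr)

# A tiny RMarkdown document held in memory (made-up example content).
rmd_lines <- c("The mean of 1, 2 and 3:", "", "```{r}", "mean(c(1, 2, 3))", "```")

# knit() runs the chunk and returns plain Markdown containing
# both the code and its output.
md <- knit(text = rmd_lines, quiet = TRUE)
cat(md)
```

For a file on disk, the equivalent call would take the file name as its first argument and write a .md file alongside it.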
RMarkdown and markdown
Markdown is a markup language (try not to get confused there) developed by John Gruber whose original goal was enabling authors “to write using an easy-to-read, easy-to-write plain text format, then convert it to structurally valid XHTML (or HTML)”.[3] RMarkdown is an extension to Markdown which includes the ability to embed code chunks and several other extensions useful for writing technical reports.
The rmarkdown package extends the knitr package to allow, in one step, conversion of an RMarkdown file (.Rmd) into PDF, HTML, or Word documents, among others. The main function is render, which can be used as follows:
library(rmarkdown)
library(knitr)
render("file.Rmd", "html_document")
render("file.Rmd", "pdf_document")
render("file.Rmd", "word_document")
See this RStudio page for a list of all the output formats supported. Behind the scenes, the conversions are typically done using pandoc, which we will not cover here.
RStudio
If you are using RStudio, it has extensive built-in support for RMarkdown. Specifically, if you open a .Rmd file, the toolbar has a “Knit” button which will process it for you directly.
A quick introduction to RMarkdown
Before diving into RMarkdown, a quick intro to Markdown.
Markdown
Markdown is a set of ways to mark text to enable formatting. For example, **bold** creates bold text and *italic* creates italics. (Note that _ can be used in place of *.) Since Markdown was originally created for publishing blogs, there are various commands that won’t see much use in our work. Here’s a summary of commonly used commands.
Sections
A section header is preceded by a number of #’s, where the number of hashes represents the level. For example,
# Introduction
## Literature Review
### Classic Texts
## Goals
would create:
1 Introduction
1.1 Literature Review
1.1.1 Classic Texts
1.2 Goals
(The default is un-numbered sections. See “Header” below to enable numbering.)
This is equivalent to \section{}, \subsection{}, etc. in LaTeX.
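For the numbered sections mentioned above, a header along the following lines should be enough. This is a sketch that assumes html_document output and uses the number_sections option from the rmarkdown package; the title is just this document's own.

```yaml
---
title: "RMarkdown + Knitr"
output:
  html_document:
    number_sections: true
---
```

The same option is also accepted under pdf_document.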
Lists
Enumerated lists are simply lines preceded by a number and a period (e.g. 1.). For example,
1. First item
2. Second item.
1. Numbers don't matter.
    1. Sub-list
5. Again, numbers don't matter.
1. Back to original ordering.
1. Start of a new list.
This renders as:
1. First item
2. Second item.
3. Numbers don’t matter.
    1. Sub-list
4. Again, numbers don’t matter.
5. Back to original ordering.
For unordered lists, simply preface with * or - or +:
- first
+ second
* third
    - Sub list
    + another
This renders as:
- first
- second
- third
    - Sub list
    - another
Tables
Tables are written in plain text, using | to separate columns and a row of -’s to separate the header from the body.
| Name | Height | Weight |
|:-----|:------:|:------:|
| Bob | 6'1" | 195 |
| Sue | 5'4" | 134 |
This renders as:
| Name | Height | Weight |
|:-----|:------:|:------:|
| Bob  | 6'1"   | 195    |
| Sue  | 5'4"   | 134    |
Note:
- The second line (with the :’s and -’s) separates the header from the body. The :’s refer to alignment (--- is left-aligned, --: is right-aligned, and :-: is centered). There must be at least three characters between the |’s, though more -’s is fine.
- White space is ignored and only included to make editing easy. The first data row could have been |Bob|6'1"|195|
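Tables in this format do not have to be typed by hand. As a sketch (assuming the knitr package is installed, and reusing the values from the table above), the kable function can generate the same pipe-table syntax from a data frame:

```r
library(knitr)

# The same data as the hand-written table above.
people <- data.frame(
  Name   = c("Bob", "Sue"),
  Height = c("6'1\"", "5'4\""),
  Weight = c(195, 134)
)

# format = "pipe" asks for a Markdown pipe table; align gives one
# letter per column (l = left, c = centered, r = right).
tbl <- kable(people, format = "pipe", align = c("l", "c", "c"))
print(tbl)
```

The alignment letters end up as the :’s in the separator row, exactly as described in the note above.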
RMarkdown
RMarkdown extends Markdown to easily embed R code by creating a chunk wrapped in triple backticks.
```{r}
n <- 10
x <- rnorm(n)
print(x)
```
will evaluate to
## [1] 0.1380621 -0.2821229 0.5275906 1.4143962 -0.5825694 1.1195383
## [7] 1.4676456 -1.9394153 -0.8218730 0.6258871
You can pass options to chunks. For example, {r, eval = FALSE} will display R code but not execute it. Here are some useful options:
- echo = FALSE: Display the output but suppress the code.
- results = 'asis': Prevents further processing of the output. Useful if a table is created in R (e.g. using kable or xtable). Not generally needed otherwise.
- engine = 'enginename': You can actually run other code. For example, engine = 'stata' runs the code through Stata, and engine = 'bash' runs Linux command-line code.
- message = FALSE and warning = FALSE: Suppress R messages and warnings respectively. Useful when loading packages.
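Options that should apply to every chunk need not be repeated in each header. A common pattern, sketched here with an arbitrary choice of defaults, is to set them once with knitr's opts_chunk object, usually in the first chunk of the .Rmd file:

```r
library(knitr)

# Set document-wide defaults; any individual chunk can still
# override these in its own {r, ...} header.
opts_chunk$set(echo = FALSE, message = FALSE, warning = FALSE)

# Inspect the current default for a single option.
opts_chunk$get("echo")
```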
There’s one additional chunk option that deserves special mention, cache = TRUE. If you have slow code, you may not want to run it every time you make an update to the file. Telling RMarkdown to cache a chunk runs it once, and saves the output to display again in the future. Making any change to the code within the chunk causes it to be re-run and re-cached. Note that caching is a bit temperamental: for example, a cached chunk is not automatically re-run when an earlier chunk it depends on changes.
R code chunks “remember” previous chunks, so you can refer to objects again:
min(x)
## [1] -1.939415
In-line R code can be included by wrapping an expression prefixed with r, such as r max(x), in single backticks; this one renders as 1.4676456. Excluding the r part renders the code as code without processing it.
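Both behaviours, chunks sharing objects and in-line code being replaced by its value, can be checked outside RStudio. The sketch below assumes the knitr package is installed and uses a made-up three-value vector rather than the random x from above, so the result is predictable:

```r
library(knitr)

# A chunk that defines x, followed by a sentence that uses x in-line.
rmd_lines <- c("```{r}", "x <- c(3, 1, 4)", "```", "", "The largest value is `r max(x)`.")

# After knitting, the in-line expression is replaced by its value.
md <- knit(text = rmd_lines, quiet = TRUE)
cat(md)
```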
Equations
Markdown and RMarkdown support LaTeX math equations. Inline equations are wrapped in single dollar signs, e.g. $y = \beta x + \epsilon$. Stand-alone equations are similarly wrapped in double dollar signs:
$$
y = \beta x + \epsilon
$$
The Header
When you create a new RMarkdown file in RStudio (or you see an example online), you’ll notice a header. For example, the header for this document might be:
---
title: "RMarkdown + Knitr"
author: "Josh Errickson"
output:
  html_document:
    toc_depth: 3
---
Most of this is pretty self-explanatory and, beyond filling in the title and author and choosing between html_document or pdf_document, requires no tweaking. However, if you do want to tweak settings (for example, in this header I display one additional layer in the table of contents) or try any of the dozens of other non-standard output formats (e.g. these notes are in the readthedown style), see RMarkdown formats from RStudio.
LaTeX code and HTML code
If you are creating a PDF output, you can use LaTeX code directly in the document instead of Markdown. Similarly, if you are creating a .html page, you can use HTML (or CSS) directly. This is useful when you reach the limits of the formatting of markdown.
Two caveats:
- Not everything is supported.
- Some LaTeX will render in web pages and some HTML will render in PDFs.