Writing R Functions

Functions

In R, every command you run is a function. Sometimes this is obvious, such as

lm(y ~ x)

where we use the function lm to fit a linear model. Other times the function is more obscured:

3 + 2
## [1] 5
`+`(3, 2)
## [1] 5

Even though the first line isn’t written like the lm call, it is still implicitly calling the `+` function.
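As a small sketch of the same idea, other operators can also be called by their backtick-quoted names, and because they are ordinary functions they can be handed to other functions. The vector x below is made up purely for illustration.

```r
x <- c(10, 20, 30)

# Subsetting is a function call in disguise: this is the same as x[2]
second <- `[`(x, 2)
second
## [1] 20

# An operator can be passed as an argument like any other function;
# here `+` is applied to each of 1, 2, 3 with 10 as its second argument
shifted <- sapply(1:3, `+`, 10)
shifted
## [1] 11 12 13
```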

We can examine the source code of any function by calling the function without parentheses:

library(stringr)
str_which
## function (string, pattern)
## {
##     which(str_detect(string, pattern))
## }
## <environment: namespace:stringr>

We see that str_which takes two arguments, string and pattern, and passes them to str_detect, whose result is in turn passed to which.

Some base R functions refer to the internal workings of R and we can’t see their actual code (at least not easily).

`+`
## function (e1, e2)  .Primitive("+")

Here, the function takes two arguments, e1 and e2, but its body is only a reference to R's internal implementation via .Primitive, so we can’t see anything beyond that. On the other hand, consider the code for lm:

lm
## function (formula, data, subset, weights, na.action, method = "qr",
##     model = TRUE, x = FALSE, y = FALSE, qr = TRUE, singular.ok = TRUE,
##     contrasts = NULL, offset, ...)
## {
##     ret.x <- x
##     ret.y <- y
##     cl <- match.call()
##     mf <- match.call(expand.dots = FALSE)
##     m <- match(c("formula", "data", "subset", "weights", "na.action",
##         "offset"), names(mf), 0L)
##     mf <- mf[c(1L, m)]
##     mf$drop.unused.levels <- TRUE
##     mf[[1L]] <- quote(stats::model.frame)
##     mf <- eval(mf, parent.frame())
##     if (method == "model.frame")
##         return(mf)
##     else if (method != "qr")
##         warning(gettextf("method = '%s' is not supported. Using 'qr'",
##             method), domain = NA)
##     mt <- attr(mf, "terms")
##     y <- model.response(mf, "numeric")
##     w <- as.vector(model.weights(mf))
##     if (!is.null(w) && !is.numeric(w))
##         stop("'weights' must be a numeric vector")
##     offset <- as.vector(model.offset(mf))
##     if (!is.null(offset)) {
##         if (length(offset) != NROW(y))
##             stop(gettextf("number of offsets is %d, should equal %d (number of observations)",
##                 length(offset), NROW(y)), domain = NA)
##     }
##     if (is.empty.model(mt)) {
##         x <- NULL
##         z <- list(coefficients = if (is.matrix(y)) matrix(, 0,
##             3) else numeric(), residuals = y, fitted.values = 0 *
##             y, weights = w, rank = 0L, df.residual = if (!is.null(w)) sum(w !=
##             0) else if (is.matrix(y)) nrow(y) else length(y))
##         if (!is.null(offset)) {
##             z$fitted.values <- offset
##             z$residuals <- y - offset
##         }
##     }
##     else {
##         x <- model.matrix(mt, mf, contrasts)
##         z <- if (is.null(w))
##             lm.fit(x, y, offset = offset, singular.ok = singular.ok,
##                 ...)
##         else lm.wfit(x, y, w, offset = offset, singular.ok = singular.ok,
##             ...)
##     }
##     class(z) <- c(if (is.matrix(y)) "mlm", "lm")
##     z$na.action <- attr(mf, "na.action")
##     z$offset <- offset
##     z$contrasts <- attr(x, "contrasts")
##     z$xlevels <- .getXlevels(mt, mf)
##     z$call <- cl
##     z$terms <- mt
##     if (model)
##         z$model <- mf
##     if (ret.x)
##         z$x <- x
##     if (ret.y)
##         z$y <- y
##     if (!qr)
##         z$qr <- NULL
##     z
## }
## <bytecode: 0x7f9676ddf260>
## <environment: namespace:stats>

That does a whole lot!
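When the full source is too much to read, a few base R helpers give a quicker overview. The following is only a sketch of one way to poke at a function; the variable names are arbitrary.

```r
# Primitives are implemented inside R itself; ordinary functions are not
plusIsPrimitive <- is.primitive(`+`)
lmIsPrimitive <- is.primitive(lm)
plusIsPrimitive
## [1] TRUE
lmIsPrimitive
## [1] FALSE

# formals() returns the argument list, so names() gives the argument names
firstArgs <- head(names(formals(lm)), 3)
firstArgs
## [1] "formula" "data"    "subset"

# body() returns the code between the braces without the argument list
bodyClass <- class(body(lm))
```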

Basic syntax

The most basic form of a function is:

myFunc <- function(<arguments>) {
  <internal code>
  <return statement>
}

Functions are objects like everything else in R, so they are assigned to names in the same way as vectors or matrices. A function can take arguments or it can take none at all. The internal code is simply whatever the function does, and the last line dictates what the function gives back.
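To make the "functions are objects" point concrete, here is a minimal sketch. The names square, alsoSquare, and sayHello are hypothetical and chosen only for this example.

```r
# A function is stored under a name, just like a vector would be
square <- function(x) {
  x^2
}
class(square)
## [1] "function"

# Because it is an object, it can be copied to another name...
alsoSquare <- square

# ...or passed to another function as an argument
squares <- sapply(1:3, alsoSquare)
squares
## [1] 1 4 9

# A function with no arguments at all is also fine
sayHello <- function() {
  "hello"
}
greeting <- sayHello()
```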

Arguments

The arguments are a comma-separated list of names of objects which can be passed to a function. For example, when we looked at str_which, we saw two arguments, string and pattern. Note that there is no restriction that string has to be a string or that pattern has to be a regular expression; it's up to the author of the function to detect when an incorrect object is passed and give a useful error message. (If you’ve ever gotten a completely obscure error message, it is often because the author of the function you called did not do a good job of checking the input!)
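One way an author might check inputs is sketched below. The function subtractChecked and its error message are hypothetical; the point is only that the check happens up front and the message names the problem.

```r
subtractChecked <- function(x, y) {
  # Fail early, with a message that says what was wrong
  if (!is.numeric(x) || !is.numeric(y)) {
    stop("'x' and 'y' must both be numeric")
  }
  x - y
}

ok <- subtractChecked(1, 3)
ok
## [1] -2

# tryCatch lets us capture the error message instead of halting the script
msg <- tryCatch(
  subtractChecked("a", 3),
  error = function(e) conditionMessage(e)
)
msg
## [1] "'x' and 'y' must both be numeric"
```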

If a function takes arguments, then calls to that function must supply exactly that many objects. For example,

myFunc <- function(x, y) {
  x - y
}
myFunc(1, 3)
## [1] -2
myFunc(1)
## Error in myFunc(1): argument "y" is missing, with no default
myFunc(1, 3, 5)
## Error in myFunc(1, 3, 5): unused argument (5)

Notice that I did not specify which was x and which was y. R uses the arguments in the order they are received unless they are named.

myFunc(3, 1)
## [1] 2
myFunc(y = 3, x = 1)
## [1] -2
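Named and positional arguments can also be mixed in one call: R matches the named ones first and then fills the remaining arguments in order. A short sketch, using a hypothetical function subtract with the same body as myFunc:

```r
subtract <- function(x, y) {
  x - y
}

# y is matched by name, so the unnamed 1 fills the remaining argument x
a <- subtract(y = 3, 1)
a
## [1] -2

# Same call with the pieces in the other order
b <- subtract(1, y = 3)
b
## [1] -2
```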

However, there are two ways around the restriction that a call must supply exactly as many objects as the function has arguments:

Default arguments

A function can have “default” arguments, that is, values of the arguments that are used if the call to the function does not override them. For example,

myFunc <- function(x, y = 6) {
  x - y
}
myFunc(1, 3)
## [1] -2
myFunc(1)
## [1] -5
myFunc(y = 1)
## Error in myFunc(y = 1): argument "x" is missing, with no default

In the first call, the second argument overrides the default y = 6. In the second call, I pass nothing for y, so the default 6 is used. In the last, I override the default y, but don’t pass any x!
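A default does not have to be a constant. Because R evaluates a default only when it is needed, it can be written in terms of the other arguments. The function offsetBy below is a hypothetical example of this, not something from the text above.

```r
# The default for y is computed from x at the time y is first used
offsetBy <- function(x, y = 2 * x) {
  x - y
}

# y is not supplied, so it becomes 2 * 3 = 6
d1 <- offsetBy(3)
d1
## [1] -3

# Supplying y overrides the default as usual
d2 <- offsetBy(3, 1)
d2
## [1] 2
```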

The triple dots

A special argument, ..., tells R to accept any number of additional arguments, named or unnamed, beyond those specified.

myFunc(x = 1, y = 2, z = 3)
## Error in myFunc(x = 1, y = 2, z = 3): unused argument (z = 3)
myFunc <- function(x, y = 6, ...) {
  x - y
}
myFunc(x = 1, y = 2, z = 3)
## [1] -1
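In the example above the extra argument is simply ignored. If a function needs to look at what arrived through the dots, one common approach is to collect them with list(...). The helper describeExtras is hypothetical and only reports what it received.

```r
describeExtras <- function(x, ...) {
  # Everything after x, named or not, ends up in this list
  extras <- list(...)
  list(n = length(extras), names = names(extras))
}

out <- describeExtras(1, 2, z = 3)
out$n
## [1] 2
out$names
## [1] ""  "z"
```

The unnamed extra argument shows up with an empty name, which is how both kinds can pass through the dots together.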

These are most useful if your function calls another function which takes more arguments than yours does, and you don’t want to clutter your own argument list with all of them. For example, table excludes NA entries by default. Let’s write a wrapper around table that passes useNA = 'ifany' by default. However, table has a bunch of other arguments I don’t want to have to deal with.

table1 <- function(...) {
  table(useNA = 'ifany', ...)
}
table(c(1,1,NA))
##
## 1
## 2
table(c(1, 1, NA), dnn = "test")
## test
## 1
## 2

table1(c(1,1,NA))
##
##    1 <NA>
##    2    1
table1(c(1, 1, NA), dnn = "test")
## test
##    1 <NA>
##    2    1

So basically table1 calls table with exactly the same arguments, but overrides table’s default argument for useNA.
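One caveat with hard-coding useNA inside the wrapper: a caller who also passes useNA ends up supplying it twice, which R rejects. A possible alternative, sketched here under the hypothetical name table2, is to expose useNA as a default argument of the wrapper and forward it.

```r
table1 <- function(...) {
  table(useNA = 'ifany', ...)
}

# useNA now reaches table twice, so this call fails;
# tryCatch turns the error into a marker value so the script keeps going
clash <- tryCatch(
  table1(c(1, 1, NA), useNA = "no"),
  error = function(e) "error"
)

# Alternative sketch: make useNA a default argument and pass it along
table2 <- function(..., useNA = "ifany") {
  table(..., useNA = useNA)
}

withNA <- table2(c(1, 1, NA))                   # NA counted by default
withoutNA <- table2(c(1, 1, NA), useNA = "no")  # caller can still opt out
```

Because useNA comes after ... in table2, it has to be given by its full name, which matches how it is normally passed to table anyway.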

Return

The output of the last line of code in the function is what the function returns. I’ve been implicitly using this earlier; myFunc returns x - y and table1 returns the results from table.

You can sometimes mess this up:

myFunc <- function(x, y) {
  z <- x + y
}
myFunc(2, 3)

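Nothing is printed here because an assignment returns its value invisibly. The function does still return 5, which you can see by storing the result or wrapping the call in print, but it is easy to be fooled. Either add a second line that’s just z, or just don’t assign it.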
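The following sketch shows what is going on when a function ends in an assignment. The names addQuiet and addLoud are hypothetical; withVisible is a base R function that reports whether a value would have been printed.

```r
# Ends in an assignment: the value comes back, but invisibly
addQuiet <- function(x, y) {
  z <- x + y
}

# Ends in a bare expression: the value comes back visibly
addLoud <- function(x, y) {
  z <- x + y
  z
}

# Storing the result shows the value was there all along
w <- addQuiet(2, 3)
w
## [1] 5

quietVisible <- withVisible(addQuiet(2, 3))$visible
loudVisible <- withVisible(addLoud(2, 3))$visible
quietVisible
## [1] FALSE
loudVisible
## [1] TRUE
```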

The return function can be used to return early [1]. This is useful in conditional statements:

myFunc <- function(x) {
  if (x > 0) {
    return("Positive")
  } else {
    return("Not positive!")
  }
  return("Can never get here!")
}

That last return will never be hit - for a single number x, the condition x > 0 is either TRUE or FALSE, and both branches return something. (Other inputs may give odd results or an error; consider myFunc(data.frame(x = 1, y = 2)).)
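Early returns are also a convenient way to deal with awkward inputs before the main logic runs. This is a sketch only: describeSign and its "Not a single number" message are hypothetical, and a real function might prefer to raise an error instead.

```r
describeSign <- function(x) {
  # Guard clause: leave early unless x is one non-missing number
  if (!is.numeric(x) || length(x) != 1 || is.na(x)) {
    return("Not a single number")
  }
  if (x > 0) {
    return("Positive")
  }
  # Reached only when x is a single number that is not above zero
  "Not positive!"
}

r1 <- describeSign(2)
r2 <- describeSign(-1)
r3 <- describeSign(data.frame(x = 1, y = 2))
r1
## [1] "Positive"
r2
## [1] "Not positive!"
r3
## [1] "Not a single number"
```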

Scoping and side effects

Scope is an important and really complicated topic for functions - the short version is this:

  • Any object created inside a function (including arguments passed to it) is destroyed when the function returns.
  • If an object doesn’t exist inside the function, R will look for an object outside the function.
    • R functions should have no “side effects” - never modify an external object.

For example,

x <- 4
myFunc <- function() {
  print(x)
  x <- 6
  print(x)
}
myFunc()
## [1] 4
## [1] 6
x
## [1] 4

Even though we reassign x inside myFunc, it does not change the external version. The first print call inside the function refers to the x in the global environment, and the second (after assigning a local copy of x) refers to the x in the function. [2]

Important summary: Anything that needs to be used inside a function should be passed in as an argument. Anything that needs to be saved from a function should be returned. Avoid side effects!
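A minimal sketch of that pattern: the object goes in as an argument, a changed copy comes back as the return value, and the original is untouched. The names v, setFirst, and updated are made up for this example.

```r
v <- c(1, 2, 3)

setFirst <- function(vec) {
  # This changes only the function's local copy of the vector
  vec[1] <- 100
  vec
}

updated <- setFirst(v)

# The original is unchanged
v
## [1] 1 2 3

# The modified copy exists only because we saved the return value
updated
## [1] 100   2   3
```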

Returning lists

It can be useful to return a list if you have multiple objects to return. For example:

myFunc <- function(x, y) {
  a <- x + y
  b <- x - y
  c <- x * y
  d <- x / y
  return(list(sum = a,
              diff = b,
              prod = c,
              quot = d))
}
f <- myFunc(4, 7)
f
## $sum
## [1] 11
##
## $diff
## [1] -3
##
## $prod
## [1] 28
##
## $quot
## [1] 0.5714286
f[1]
## $sum
## [1] 11
f[[1]]
## [1] 11
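Since the list elements are named, they can also be pulled out by name, which tends to be easier to read than a position. A short sketch, using a hypothetical two-element version of the function above:

```r
arith <- function(x, y) {
  list(sum = x + y, diff = x - y)
}
res <- arith(4, 7)

# $ and [[ ]] with a name both return the element itself
res$sum
## [1] 11
res[["diff"]]
## [1] -3

# Single brackets keep the list wrapper; double brackets drop it
singleClass <- class(res[1])
doubleClass <- class(res[[1]])
singleClass
## [1] "list"
doubleClass
## [1] "numeric"
```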

  1. Making the last line an explicit return call is optional.

  2. I may regret teaching this, but there is a way to assign things outside of a function, using the double-headed arrow, x <<- 6. This should not be commonly used, only if absolutely needed. I can only recall using it (appropriately) once in my 10+ years of using R.

Josh Errickson