# A base object with NULL class attribute
attr(1:5, 'class')
NULL
# An object of class factor.
attr(factor(1:5), 'class')
[1] "factor"
When we attach values to names in an R environment we generally refer to the name and value collectively as an ‘object’. We should, however, distinguish between base objects and object-oriented objects where the latter are those with a non-null class attribute. Compare the following:
# A base object with NULL class attribute
attr(1:5, 'class')
NULL
# An object of class factor.
attr(factor(1:5), 'class')
[1] "factor"
Object oriented programming is a programming paradigm built around the notions of classes, methods, and, of course, objects. There are a wide variety of object oriented languages and R has (at least) three object oriented (OO) systems you should be aware of:
We will focus on the S3 and S4 systems which predominate in R.
Before digging into R’s OO systems it will be helpful to define a few terms.
An object’s class defines its structure and behavior using attributes as well as its relationship with other classes.
Methods are functions that have definitions which depend on the class of an object.
Classes are often organized into hierarchies with “child” classes defined more strictly than its “parents”. Child classes often inherit from their parents meaning a parent’s structure or methods serve as a default when not explicilty defined for the child. More formally, these are known as superclasses and subclasses.
The S3 system in R is based on the idea of generic functions. The core idea is that a generic function is used to dispatch a class-specific method taken from an object passed to it. Some common S3 generic functions in R inlcude, print
, summary
, plot
, mean
, head
, tail
, and str
. Consider summary
. The functions operates quite differently depending on its input:
<- 1:3
vec <- matrix(1:4, nrow = 2)
mat <- lm(mpg ~ wt, data = mtcars)
mod summary(vec)
Min. 1st Qu. Median Mean 3rd Qu. Max.
1.0 1.5 2.0 2.0 2.5 3.0
summary(mat)
V1 V2
Min. :1.00 Min. :3.00
1st Qu.:1.25 1st Qu.:3.25
Median :1.50 Median :3.50
Mean :1.50 Mean :3.50
3rd Qu.:1.75 3rd Qu.:3.75
Max. :2.00 Max. :4.00
summary(mod)
Call:
lm(formula = mpg ~ wt, data = mtcars)
Residuals:
Min 1Q Median 3Q Max
-4.5432 -2.3647 -0.1252 1.4096 6.8727
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 37.2851 1.8776 19.858 < 2e-16 ***
wt -5.3445 0.5591 -9.559 1.29e-10 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 3.046 on 30 degrees of freedom
Multiple R-squared: 0.7528, Adjusted R-squared: 0.7446
F-statistic: 91.38 on 1 and 30 DF, p-value: 1.294e-10
How did summary
know what the right thing to do was? One way it could work would be a series of conditionals.
<- function(object) {
summary if (is(object, "vector")) {
...else if (is(object, "matrix")) {
}
...else if (is(object, "lm")) {
}
...
} }
This isn’t tenable for two reasons:
Instead, R (and most object-oriented languages) use generics and dispatching. Let’s look at the code for summary
:
summary
function (object, ...)
UseMethod("summary")
<bytecode: 0x13066d608>
<environment: namespace:base>
When UseMethod()
is called R searches for an S3 method based on the name of the generic function and the class of its first argument. The specific function it looks for follows the naming pattern function.class()
– this is why it is advisable not to use dots when naming functions, classes, or other objects. There may exist functions summary.vector()
, summary.matrix()
, and summary.lm()
- and if a new class type needs a summary
function, you need only define summary.newclass()
, rather than modify the existing summary
functions.
As an example, let’s examine a call to head(mat)
in mode detail:
class(mat)
[1] "matrix" "array"
head(mat)
[,1] [,2]
[1,] 1 3
[2,] 2 4
The object mat
has class matrix
so UseMethod("head")
searches for a function (method) called head.matrix()
to apply to mat
:
identical(head(mat), head.matrix(mat))
[1] TRUE
head.matrix
function (x, n = 6L, ...)
{
checkHT(n, d <- dim(x))
args <- rep(alist(x, , drop = FALSE), c(1L, length(d), 1L))
ii <- which(!is.na(n[seq_along(d)]))
args[1L + ii] <- lapply(ii, function(i) seq_len(if ((ni <- n[i]) <
0L) max(d[i] + ni, 0L) else min(ni, d[i])))
do.call(`[`, args)
}
<bytecode: 0x1268d5c58>
<environment: namespace:utils>
You can see all the methods associated with a generic function using methods()
.
methods(head)
[1] head.array* head.data.frame* head.default* head.ftable*
[5] head.function* head.matrix
see '?methods' for accessing help and source code
The *
following some methods is used to denote methods that are not exported as part of the namesapce of the packages in which they are defined. For instance, the head.data.frame
method is defined in the (base) package utils
, but is not exported.
## Try utils::head.data.frame. Why does it fail?
getS3method("head", "data.frame")
function (x, n = 6L, ...)
{
checkHT(n, d <- dim(x))
args <- rep(alist(x, , drop = FALSE), c(1L, length(d), 1L))
ii <- which(!is.na(n[seq_along(d)]))
args[1L + ii] <- lapply(ii, function(i) seq_len(if ((ni <- n[i]) <
0L) max(d[i] + ni, 0L) else min(ni, d[i])))
do.call(`[`, args)
}
<bytecode: 0x1270db708>
<environment: namespace:utils>
When an object has more than one class, R searches successively until a suitable method is found. If a suitable method is not found, S3 generics revert to a default method when defined and throw an error if not.
getS3method('head', 'default')
function (x, n = 6L, ...)
{
checkHT(n, dx <- dim(x))
if (!is.null(dx))
head.array(x, n, ...)
else if (length(n) == 1L) {
n <- if (n < 0L)
max(length(x) + n, 0L)
else min(n, length(x))
x[seq_len(n)]
}
else stop(gettextf("no method found for %s(., n=%s) and class %s",
"head", deparse(n), sQuote(class(x))), domain = NA)
}
<bytecode: 0x137a5f1c8>
<environment: namespace:utils>
Defining a new S3 method is as simple as defining a function and naming it accordingly. Here we define a method head.green
.
##' @title Head of a `green` object
##' @param obj A `green` object
<- function(obj) {
head.green
# Check if its green
if ("green" %in% class(obj)) {
if (length(class(obj)) > 1) {
<- class(obj)[-grep("green", class(obj))][1]
next_class cat("This is a green ", next_class, ".\n", sep = "")
# This calls the next available method, allowing us to offload work in
# a method for the subclass to an existing method for the superclass.
NextMethod("head")
else {
} cat("This a generic green object.\n")
}else {
} cat("The object is not green!\n")
}
}
Now we can test it under various conditions.
## We previously assigned
class(mat)
[1] "matrix" "array"
head(mat)
[,1] [,2]
[1,] 1 3
[2,] 2 4
## Test head.green for generic class
class(mat) <- c("green", class(mat))
head(mat)
This is a green matrix.
[,1] [,2]
[1,] 1 3
[2,] 2 4
## Test on a non-green object
<- 1:100
red_obj class(red_obj) <- "red"
head.green(red_obj)
The object is not green!
head(red_obj)
[1] 1 2 3 4 5 6
In our definition of head.green
, notice the use of NextMethod()
to dispatch a method previously defined on one of the parent classes.
We can similarly define our own S3 generic functions via UseMethod()
.
Note that the first argument to both UseMethod()
and NextMethod()
should be the a character vector with the name of the generic.
##' @title Return color of an object
##' @param obj Any colored object
##' @return The color of the object
##' @rdname getColor
<- function(obj) {
getColor UseMethod("getColor")
}
##' @rdname getColor
<- function(obj) {
getColor.default # Are any classes colors?
<- class(obj) %in% colors()
ind if (any(ind)) {
# Yes. Return color with highest class predence.
class(obj)[which(ind)[1]]
else {
} # default to black
"black"
}
}
##' @rdname getColor
<- function(obj) {
getColor.green "darkgreen"
}
<- 1
obj getColor(obj)
[1] "black"
class(obj) <- "blue"
getColor(obj)
[1] "blue"
class(obj) <- "green"
getColor(obj)
[1] "darkgreen"
Note the use of the Roxygen @rdname
. This specifies that these functions should have their help on the same shared page.
As a quick, somewhat contrived, example of how we might use this, we could define a col_boxplot
function to pick colors according to the class of the object passed.
##' A box plot function that uses the class attribute to define colors.
##' @param dat A list of vectors
##' @param ... Additional arguments to `boxplot`
<- function(dat, ...) {
col_boxplot stopifnot(is.list(dat))
<- sapply(dat, getColor)
col boxplot(dat, col = col, ...)
}
# Define some iid data
<- rnorm(100, 1, 1)
x class(x) <- 'green'
<- rnorm(100, 0, 2)
y class(y) <- 'red'
<- rnorm(100, 0, 1)
z col_boxplot(list(x = x, y = y, z = z), las = 1)
You should be aware that the class of the object returned by some generic functions (especially primitives) can depend on the input class.
class(x + y)
[1] "green"
class(y + x)
[1] "red"
class(mean(x))
[1] "numeric"
Let’s define an S3 class to define numbers which carry their own rounding information. As we’ve seen, defining an S3 class is a very casual affair - we simply assign a class
to an object. Most S3 classes are built off one of the base R types - vector
, matrix
, list
or data.frame
, and then we define some functions along with them. We’ll create a helper function first to generate our new rounded_numeric
class.
##' @title Create `rounded_numeric`
##' @param n Numeric
##' @param digits Digits to round to
##' @return New `rounded_numeric` object
<- function(n, digits) {
new_rounded_numeric <- list(n = n,
out digits = digits)
class(out) <- c("rounded_numeric", class(out))
return(out)
}
Note the extension of the class
for fall back.
<- new_rounded_numeric(pi, 3)
rn class(rn)
[1] "rounded_numeric" "list"
print(rn)
$n
[1] 3.141593
$digits
[1] 3
attr(,"class")
[1] "rounded_numeric" "list"
Now we simply define a print.rounded_numeric
class which operates as we’d like.
<- function(rn) {
print.rounded_numeric return(round(rn$n, rn$digits))
}print(rn)
[1] 3.142
The S3 system described above is very informal and very flexible, making it easy to work with, but at the expense of the safety and uniformity of a more formal OO system.
The S4 system is a more formal OO system in R. One key difference is that S4 classes have formal definitions and classes, methods, and generics must all be explicitly defined as such. The functionality of the S4 object system comes from the (base) “methods” package.
S4 classes are defined using the setClass
function:
setClass("color_vector",
slots = c(data = "numeric",
color = "character"))
Create a new instance of an S4 class using new
:
<- new("color_vector", data = 3, color = "darkgreen")
x x
An object of class "color_vector"
Slot "data":
[1] 3
Slot "color":
[1] "darkgreen"
The function new
is used above as a constructor for creating an object with the desired class. Most S4 classes defined in packages you download have their own constructors which you should use when defined. We can create a default constructor by assigning the output of setClass
a name:
<- setClass("color_vector",
color_vector slots = c(data = "numeric",
color = "character"))
<- color_vector(data = 1:4,
y color = "red")
y
An object of class "color_vector"
Slot "data":
[1] 1 2 3 4
Slot "color":
[1] "red"
You can also define a constructor as a whole new function, thus giving you more flexibility in argument names.
<- function(x, color) {
color_vector2 return(new("color_vector", data = x, color = color))
}<- color_vector2(5:10, "green")
z z
An object of class "color_vector"
Slot "data":
[1] 5 6 7 8 9 10
Slot "color":
[1] "green"
You can access and set attributes for an S4 object using an @
symbol, the slot
function, or an attr(obj, 'name')
construction:
@color x
[1] "darkgreen"
@data <- rnorm(10, 1, 1)
xslot(x, "color")
[1] "darkgreen"
attr(x, "color") <- "purple"
names(attributes(x))
[1] "data" "color" "class"
A validator is a function that ensures an object is a valid member of a given class. Here is an example validator for our “color_vector” class.
setValidity("color_vector", function(object){
if (!(object@color %in% colors())) {
stop(paste("@color = ", object@color, "is not a valid color"))
}return(TRUE)
})
Class "color_vector" [in ".GlobalEnv"]
Slots:
Name: data color
Class: numeric character
color_vector2(1:3, color = "A")
Error in validityMethod(object): @color = A is not a valid color
The validator gets run anytime a new object of the class is created, or you can run it at will with the validObject()
function.
@color <- "a"
y y
An object of class "color_vector"
Slot "data":
[1] 1 2 3 4
Slot "color":
[1] "a"
validObject(y)
Error in validityMethod(object): @color = a is not a valid color
We can control how an object of class color_vector
gets displayed by defining a show
method (the S4 equivalent of print
).
## This is an S4 generic
show(x)
An object of class "color_vector"
Slot "data":
[1] 0.3657137 2.2081636 2.3797787 1.1081977 -0.4538433 1.3306691
[7] 2.1953265 1.8631645 0.9847613 0.9197028
Slot "color":
[1] "purple"
##' @title Display a `color_vector` object
##' @param object A `color_vector` object
setMethod("show", "color_vector",
function(object) {
cat(object@color, ":")
str(object@data)
cat("\n")
return(invisible(object))
} )
The use of return(invisible(...))
ensures that something sensible gets returned. If you truly want nothing returned, try return(invisible(NULL))
.
Now, when we call show
on an object of class color_vector
R will use the custom method.
show(x)
purple : num [1:10] 0.366 2.208 2.38 1.108 -0.454 ...
# Note: show, like print, is the default method for an unassigned object.
x
purple : num [1:10] 0.366 2.208 2.38 1.108 -0.454 ...
In addition, there are often so-called “getter” and “setter” functions for obtaining and modifying slots. These are often more formally called “accessors” and “mutators”.
First, we define a new S4 generic.
setGeneric("color",
function(object) {
standardGeneric("color")
})
[1] "color"
Next, we can define a version for color_vector
.
##' @title Return the color of a `color_vector`
##' @param object A `color_vector` object
##' @return It's color
setMethod("color", "color_vector",
function(object) {
return(slot(object, "color"))
})
color(x)
[1] "purple"
color(LETTERS)
Error: unable to find an inherited method for function 'color' for signature 'object = "character"'
Note that there isn’t the idea of a default dispatch like there is in S3 - There’s nothing like color.default
. There are some hacky ways around this (such as setMethod(..., signature(object = "any"), ...
), but generally you need to write a specific method for each class.
We can similarly define a method that allows the user to change the value in the color slot, the “setter” or “mutator”
setGeneric("color<-",
function(object, value) {
standardGeneric("color<-")
})
[1] "color<-"
##' @title Set the color of a `color_vector`
##' @param object A `color_vector` object
##' @param value New color.
##' @return The updated object
setMethod("color<-", "color_vector",
function(object, value) {
@color <- value
objectvalidObject(object) # Re-run validity check
return(object)
}
)
color(x) <- "purple"
color(x)
[1] "purple"
show(x)
purple : num [1:10] 0.366 2.208 2.38 1.108 -0.454 ...
color(x) <- "a"
Error in validityMethod(object): @color = a is not a valid color
# unchanged since "setter" failed x
purple : num [1:10] 0.366 2.208 2.38 1.108 -0.454 ...
We can also define math operations in the same fashion.
getMethod("+")
function (e1, e2) .Primitive("+")
##' @title `color_vector` arithmetic.
##'
##' New object always have the color of the left-side object.
##' @param e1 A `color_vector`
##' @param e2 A `color_vector`
##' @return A `color_vector`
setMethod("+", signature(e1 = "color_vector",
e2 = "color_vector"),
function(e1, e2) {
return(color_vector2(e1@data + e2@data,
@color))
e1
}) x
purple : num [1:10] 0.366 2.208 2.38 1.108 -0.454 ...
<- color_vector2(5, "blue")
y y
blue : num 5
+ y x
purple : num [1:10] 5.37 7.21 7.38 6.11 4.55 ...
In addition to slots, S4 classes can inherit from another class. The benefit of doing this is that, similar to how S3 functions dispatch on the list of classes, functions on an S4 class that inherit another class will dispatch to those class’s functions.
<- setClass("newclass1",
newclass1 slots = c(a = "numeric",
b = "character"))
<- setClass("newclass2",
newclass2 contains = "numeric",
slots = c(b = "character"))
<- newclass1(a = 5, b = "c")
nc1 nc1
An object of class "newclass1"
Slot "a":
[1] 5
Slot "b":
[1] "c"
<- newclass2(5, b = "c")
nc2 nc2
An object of class "newclass2"
[1] 5
Slot "b":
[1] "c"
The “contained” class can be accessed via the special slot @.Data
:
rbind(slotNames(nc1), slotNames(nc2))
[,1] [,2]
[1,] "a" "b"
[2,] ".Data" "b"
@.Data nc2
[1] 5
Neither new class has any methods defined for it, but since newclass2
“contains” a numeric, method dispatch will work for it:
sum(nc1)
Error in sum(nc1): invalid 'type' (S4) of argument
sum(nc2)
[1] 5
This can be very powerful when inheriting more “advanced” classes such asdata.frame
, or even more powerfully, something like lm
- this allows you to write your own custom function for working with a model output, without having to re-implement all the existing generic functions such as coefficients
, predict
, or residuals
.
You can of course have a custom S4 class inherit another custom S4 class.
<- setClass("newclass1",
newclass1 slots = c(a = "logical"))
<- setClass("newclass2",
newclass2 contains = "newclass1",
slots = c(b = "numeric"))
<- newclass1(a = TRUE)
c1 <- newclass2(c1, b = 5)
c2 c1
An object of class "newclass1"
Slot "a":
[1] TRUE
c2
An object of class "newclass2"
Slot "b":
[1] 5
Slot "a":
[1] TRUE
Note that the slots for the two classes “mix” together:
slotNames(c2)
[1] "b" "a"
@a c2
[1] TRUE
@b c2
[1] 5
Slot names cannot be overwritten in the superclass:
<- setClass("newclass3",
newclass3 contains = "newclass2",
slots = c(a = "numeric"))
Error in reconcilePropertiesAndPrototype(name, slots, prototype, superClasses, : Definition of slot 'a', in class "newclass3", as "numeric" conflicts with definition, inherited from class "newclass2", as "logical"