Overview of regression methods

2019/09/03

Introduction

Regression analysis is a very broad branch of statistics. In this course we use a variety of methods for regression modeling. Below, we first define some concepts that help clarify the major distinctions among approaches to regression, and then review some specific regression methods along with their key properties.

Before proceeding, note that regression itself is somewhat difficult to define in a way that differentiates it from the rest of statistics. In most cases, regression focuses on a conditional distribution, e.g. the conditional distribution of a variable $y$ given another variable $x$. Any analysis focusing on a conditional distribution can be seen as a form of regression analysis.
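As a minimal illustration of this view (using synthetic data, not data from the course), estimating the conditional mean of $y$ within each level of a discrete predictor $x$ is already a simple form of regression analysis:

```python
import numpy as np

# Synthetic data: y depends on a discrete predictor x through its mean.
rng = np.random.default_rng(0)
x = rng.integers(0, 3, size=300)        # predictor taking values 0, 1, 2
y = 2.0 * x + rng.normal(size=300)      # true conditional mean is E[y|x] = 2x

# Estimating E[y|x] separately for each observed value of x is a
# (very simple) regression: it characterizes a conditional distribution.
cond_means = {level: y[x == level].mean() for level in np.unique(x)}
```

With roughly 100 observations per group, the estimated conditional means will be close to the true values 0, 2, and 4.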

Major concepts

Models, fitting procedures, and algorithms

An important distinction to make is between the various regression model structures (e.g. different model parameterizations), and the different procedures for fitting a regression model structure to data. For example, the linear mean model is one prominent structural model for regression, in which the conditional mean function $E[y|x]$ is expressed as a linear function of the predictors in $x$. There are many “fitting procedures” that enable one to fit this linear model to data, including least squares, penalized least squares, and many variations of robust regression, maximum likelihood regression, and Bayesian regression. However, all of these fitting procedures fit the same class of models to the data.
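To make this concrete, here is a sketch (with simulated data) of two different fitting procedures, ordinary least squares and penalized (ridge) least squares, applied to the same linear mean model $E[y|x] = x'\beta$:

```python
import numpy as np

# Simulated data from a linear mean model E[y|x] = x'beta.
rng = np.random.default_rng(1)
n, p = 200, 3
X = rng.normal(size=(n, p))
beta = np.array([1.0, -2.0, 0.5])
y = X @ beta + rng.normal(size=n)

# Fitting procedure 1 -- ordinary least squares:
# minimize ||y - Xb||^2 over b.
b_ols = np.linalg.lstsq(X, y, rcond=None)[0]

# Fitting procedure 2 -- penalized (ridge) least squares:
# minimize ||y - Xb||^2 + lam * ||b||^2 over b.
lam = 10.0
b_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
```

Both procedures produce estimates of the same model structure; the ridge estimate is shrunk toward zero relative to the least squares estimate, but both are fits of the same linear mean model.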

In other words, least squares is a fitting procedure that can be used to fit a model to data. The least squares fitting procedure has its own statistical properties (e.g. it is known to be efficient and consistent in some settings). A different procedure (e.g. Bayesian or penalized) for fitting the same class of models will have its own, potentially different properties (e.g. it may be consistent in some settings where least squares is not, and vice versa).
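A small simulation (my own illustrative setup, not an example from the course) shows how two procedures for the same model class can have different statistical properties. With a small sample, ridge regression accepts some bias in exchange for lower variance, and can have smaller average estimation error than least squares:

```python
import numpy as np

rng = np.random.default_rng(2)
p, n, lam = 5, 20, 5.0
beta = np.zeros(p)
beta[0] = 1.0  # one nonzero coefficient

err_ols, err_ridge = [], []
for _ in range(500):
    # Draw a fresh data set from the same linear mean model.
    X = rng.normal(size=(n, p))
    y = X @ beta + rng.normal(size=n)

    # Fit the same model class with two different procedures.
    b_ols = np.linalg.lstsq(X, y, rcond=None)[0]
    b_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

    # Squared estimation error of each procedure on this data set.
    err_ols.append(np.sum((b_ols - beta) ** 2))
    err_ridge.append(np.sum((b_ridge - beta) ** 2))
```

In this setting the average squared error of the ridge estimates is smaller than that of least squares, even though least squares is unbiased; in other settings (e.g. large $n$, strong signals) the comparison can reverse.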

Algorithms are specific numerical procedures used to implement fitting procedures so that they can be used to fit models to data sets. In some cases, e.g. least squares, the algorithm is essentially exact, and therefore does not impact the statistical properties of the analysis. In a few settings, e.g. regression trees or deep neural networks, “the algorithm is the model”, and it is difficult to distinguish the model structure itself from the estimation approach used to fit the model to data.
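For example, least squares coefficients can be computed by (at least) two different algorithms: solving the normal equations directly, or using an orthogonalization-based solver. Because the least squares problem has an exact solution, both algorithms return numerically the same estimate, so the choice of algorithm does not affect the statistical properties of the fit:

```python
import numpy as np

# Arbitrary well-conditioned data, just to compare the two algorithms.
rng = np.random.default_rng(3)
X = rng.normal(size=(100, 4))
y = rng.normal(size=100)

# Algorithm 1: solve the normal equations (X'X) b = X'y.
b_normal = np.linalg.solve(X.T @ X, X.T @ y)

# Algorithm 2: an SVD-based least squares solver.
b_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

# Both algorithms implement the same fitting procedure (least squares)
# and agree up to numerical precision.
```

(For badly conditioned design matrices the normal equations can become numerically unstable, which is a purely algorithmic issue, not a change in the fitting procedure being implemented.)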

Some specific regression analysis methods

Other forms of regression: