Linear Regression and Orthogonal Projection

This tutorial explains why and how linear regression can be viewed as an orthogonal projection on 2 and 3-dimensional spaces.

Projection with 2 Dimensions

Suppose that both X₀ and Y have 2 dimensions (e.g., 2 observations from 2 participants). It is worth pointing out that, when talking about dimensions here, we refer to the number of observations (i.e., rows in a matrix format), rather than the number of independent variables (i.e., columns in a matrix, i.e., Xs).¹

Relatedly, I purposely use X₀ (rather than X or X₁) because I want to emphasize that we are talking about one X variable here and this X variable (i.e., X₀) is an all 1’s vector.

\( X_0 =\left[\begin{array}{ccc}
1 \\
1 \end{array}
\right]\)

That is, here, we are actually just estimating the regression coefficient of b₀, without even estimating b₁, i.e., the slope. The main reason for such simplicity is that, if there is another column of data like X, and we need to estimate b₁, namely the slope, we then do not have enough degree of freedom. (Note that, 2 observations give us only 2 degrees of freedom to use.)

\( Y = b_0 X_0 \)

Thus, by replacing X₀ with the actual all 1’s vector, we can get the following.

\( Y = b_0 \left[\begin{array}{ccc}
1 \\
1 \end{array}
\right] \)

Further, the discussion here limits X₀ to 2 dimensions space (e.g., 2 observations), which is easier to plot and understand on a computer screen, i.e., the screen you see now. We can then draw a figure as follows.

Intercept as projection in linear regression (2 dimensions)

Suppose that we have 2 observations for Y as well.

\( Y =\left[\begin{array}{ccc}
1\\
3 \end{array}
\right]\)

We can then add Y into the same plane, as well as expand X₀ as a span subspace.

Here, the finding of b₀ needs to meet two requirements:

(1) we are finding the value b₀ on the subspace defined by the span of vector X₀;
(2) based on the basic idea of linear regression, we want to make sure the residuals are the minimal.

As discussed in my other tutorial on orthogonal projection, these two requirements will lead to finding an orthogonal projection on the subspace.

Further, based on my other tutorial on mean as a projection, we know that b₀ actually is the mean of all the dimensions in Y, namely,

\( b_0 = \frac{1+3}{2} = 2 \)

We can show the value of b₀ visually as follows.

Projection with 3 Dimensions

This section explains what it means to have 3 dimensions in the context of linear regression and orthogonal projection.

For 3 dimensions, we can still just calculate the mean for Y, if you only have an all 1’s vector X₀.

\( Y =\left[\begin{array}{ccc}
1\\
3 \\ 1 \end{array}
\right]; X =\left[\begin{array}{ccc}
1 \\
1 \\ 1 \end{array}
\right]\)

\( Y = b_0 X_0 \)

As we know, basically, b₀ is the mean of all the dimensions in Y.

\( b_0 = \frac{1+3+1}{3} = 1.7 \)

Intercept as projection in linear regression (3 dimensions)

Since we have 3 dimensions (i.e., 3 observations), it allows us to estimate another parameter such as the slope, i.e., b₁. The following is the data.

\( Y =\left[\begin{array}{ccc}
1\\
3 \\ 1 \end{array}
\right]; X =\left[\begin{array}{ccc}
1 & 2\\
1 & 3 \\ 1 & 3 \end{array}
\right]\)

For this, we are going to need to estimate both b₀ and b₁.

\( Y = b_0 X_0 +b_1 X_1 \)

For this data set, we can add X₁into the 3 dimension space.

linear regression as projection in a 3-dimensional space

Note that, X₁ and X₀ define a column space, col (X₀, X₁).

Note that, such a column space is not limited to two lines, but actually should be expanded beyond that. That is, the following is a more accurate visual representation.

Based on my tutorial on orthogonal projection, we know that that, to find a point on the column space is to find an orthogonal projected point. Mathematically, we can calculate it as follows.

\( b= (X^{T} \cdot X)^{-1} \cdot X^{T} \cdot Y \)

Thus,

\( b=\left[\begin{array}{ccc}
b_0 \\
b_1 \end{array}
\right] = (X^{T} \cdot X)^{-1} \cdot X^{T} \cdot Y =(\left[\begin{array}{ccc} 1 & 1 &1 \\ 2 & 3 & 3 \end{array}\right] \left[\begin{array}{ccc} 1 & 2\\ 1 & 3 \\ 1 & 3 \end{array}\right])^{-1} \left[\begin{array}{ccc} 1 & 1 &1 \\ 2 & 3 & 3 \end{array}\right] \left[\begin{array}{ccc} 1\\ 3 \\ 1 \end{array}
\right] =\left[\begin{array}{ccc}
-1 \\
1 \end{array}
\right] \)

Thus, we can get \( \hat{Y} \) as follows.

\( \hat{Y}= -1*\left[\begin{array}{ccc} 1 \\ 1 \\ 1 \end{array} \right] +1* \left[\begin{array}{ccc} 2 \\ 3 \\ 3 \end{array} \right] = \left[\begin{array}{ccc} 1 \\ 2 \\ 2 \end{array} \right]\)

Thus, we can also plot it as follows.

You might wonder why the point of (1, 2, 2) is not between the arrow lines of X₁ and X₀. Actually, you can use the parallelogram law to draw the arrow lines for vectors. See below for the illustration.

parallelogram law for linear regression in a 3-dimensional space

I also used Andy Eggers’s R code to generate the following 3D visual representation. Unfortunately, I am not sure how to make it interactable on this website.

3D demonstration for linear regression as a projection

The following is the R code used to generate the figure above. The credit should go to Andy Eggers.

# R code to plot the projection for linear regression in 3D figure format

library(rgl)
constant <- c(1,1,1)
y <- c(1,3,1)
x <- c(2,3,3)
the_lm <- lm(y ~ x)

base_plot <- function(maxval = 3){
  plot3d(x = 0, y = 0, z = 0, type = "n",  
         xlim = c(0,maxval), 
         ylim = c(0,maxval), 
         zlim = c(0,maxval),
         xlab = "Obs 1", ylab = "Obs 2", zlab = "Obs 3")
  text3d(x, texts = "X1")
  lines3d(x = rbind(c(0,0,0), x))
  text3d(constant, texts = "X0")
  lines3d(x = rbind(c(0,0,0), constant))
  text3d(y, texts = "Y")
  lines3d(x = rbind(c(0,0,0), y), col = "red")
}

base_plot()
df <- data.frame(rbind(x, constant, c(0,0,0)))
colnames(df) <- c("Obs 1", "Obs 2", "Obs 3")
plane_lm <- lm(`Obs 3` ~ `Obs 1` + `Obs 2`, data = rbind(df))
points_df <- expand.grid(x = seq(0, 3, .1), y = seq(0, 3, .1))
points_df$z <- coef(plane_lm)["`Obs 1`"]*points_df$x + coef(plane_lm)["`Obs 2`"]*points_df$y
points3d(points_df, alpha = .3) 

points3d(points_df, alpha = .3) 
the_pred <- predict(the_lm)
lines3d(rbind(c(0,0,0), the_pred), col = "red", lwd = .5)
lines3d(rbind(y, the_pred), col = "black", lwd = .5)

Reference

You can refer to this online tutorial by Andy Eggers on the geometric interpretation of linear regression on the difference between rows and columns on dimensions.
This tutorial is also inspired by Vladimir Mikulik‘s Why Linear Regression is a Projection (published on Medium), Thomas S. Robinson’s Chapter 3 Linear Projection, and Rafael Irizarry and Michael Love’s tutorial on projections.

Projection with 2 Dimensions

Projection with 3 Dimensions

Reference

Leave a Comment Cancel reply