Orthogonal Projection

This tutorial explains what an orthogonal projection is in linear algebra. Further, it provides proof that the difference between a vector and a subspace is orthogonal to that subspace.

Let’s define two vectors, \(\vec{X} \) and \(\vec{Y} \), and we want to find the shortest distance between \(\vec{Y} \) and the subspace defined by the span of \(\vec{X} \).

To simplify the discussion, we use the 2-dimensional space for visualization.

\( \vec{X}=\left[\begin{array}{ccc}
2 \end{array}

\( \vec{Y}=\left[\begin{array}{ccc}
3 \end{array}

A vector and a subspace
A vector and a subspace

Further, we name \( L \) the subspace defined by the span of the vector \(\vec{X} \). The projection of \(\vec{Y} \) → onto \( L \) is the closest point on \( L \) to \(\vec{Y} \). 

orthogonal projection
orthogonal projection

Intuitively, the shortest distance between these two points is orthogonal to the space \( L \). The proof is provided in the next section.

\( ( \hat{Y}-\vec{Y} ) \cdot \vec{X} = 0 \)

\( ( c \vec{X}-\vec{Y} ) \cdot \vec{X} = 0 \)

We can further get the following:

\( c \vec{X} \cdot \vec{X} – \vec{Y} \cdot \vec{X}= 0 \)


\( c = \frac{\vec{Y} \cdot \vec{X}}{\vec{X} \cdot \vec{X}} \)

Note that, the doct product \( \cdot \) means that \( \vec{X} \cdot \vec{Y} =x_1*y_1+…+ x_n*y_n \).


This section provides proof that the shortest distance between a point and a space is that the line connecting the point and the space is orthogonal to that space.

In particular, since \( \hat{Y} =c X \), we know that \( \hat{Y} \) is on space of \( L \). However, we do not know the exact location of \( \hat{Y} \). To determine the location of \( \hat{Y} \), it is to equivalently determine the value of c.

different projection possibilities
different projection possibilities

And, we target the make the difference between \( Y \) and \( \hat{Y} \) is minimal. Thus,

min | \( \hat{Y} – Y \)|

We can then just use the Euclidean distance to measure it. Thus, we can get the following.

min \( \sqrt{ \sum_{i} (\hat{Y_i} -Y_{i})^2 } =\sqrt{ \sum_{i} ( cX_{i} – Y_{i} )^2 } \)

Given the monotonicity, we can remove the square root notation and get the following.

min \( \sum_i (cX_{i} – Y_{i} )^2 \)

The function above has a quadratic with respect to c, and thus it has the minimum value when calculating the first-order derivative.

\( \frac{d}{dc} \sum_i (cX_{i}-Y_{i})^2 =0 \)

We can thus get the following.

\( \begin{align} \frac{d}{dc} \sum_i (cX_{i}-Y_{i})^2 &= \sum_i 2 X_{i} (cX_{i}-Y_{i}) \\ &=2 \sum_i X_{i} (cX_{i}-Y_{i}) \\ &= 2 \sum_i ( cX_{i} X_{i}-X_{i}Y_{i}) \end{align} \)

Thus, we can further get the following.

\( 2 \sum_i ( cX_{i} X_{i}-X_{i}Y_{i}) =0 \)

Thus, we remove the \( \sum \) notation by using vector dot product format.

\( cX^{T} \cdot X -X^T \cdot Y =0 \) [eq. 1]

Based on eq. 1, we can get eq. 2, which means that the vector of \( \hat{Y} – Y \) is orthogonal to the space of \( L\), which is defined by the vector of X. That is, the vector \( Y – \hat{Y} \) and \( L\) are perpendicular to each other (i.e., 90 degrees).

\( X^{T} \cdot (c X-Y) =0 \)

\( X^{T} \cdot (\hat{Y}-Y) =0 \) [eq. 2]

orthogonal projection
orthogonal projection

Further, from eq. 1, we can also get the value of c.

\( c= (X^{T} \cdot X)^{-1} \cdot X^{T} \cdot Y \)


\( \hat{Y}=c X= X c = X \cdot (X^{T} \cdot X)^{-1} \cdot X^{T} \cdot Y = P \cdot Y \)


\( P = X \cdot (X^{T} \cdot X)^{-1} \cdot X^{T} \)

Note that, P is a square matrix, and you can easily prove the following two properties of P.

\( P^2 = P \)

\( P^T = P \)

Thus, P is an orthogonal projection matrix (Wikipedia) since P2 = p = pT. Thus, this tutorial provides proof that the shortest distance between a vector arrow point and a subspace is through a line that is orthogonal to that subspace.


  1. Projections in the course of Biomedical Data Science (GitHub)
  2. Projection (linear algebra) (Wikipedia)
  3. Projections onto subspaces (MIT)

Leave a Comment