This tutorial explains what an orthogonal projection is in linear algebra. It also proves that the difference between a vector and its projection onto a subspace is orthogonal to that subspace.

Consider two vectors, \(\vec{X} \) and \(\vec{Y} \). We want to find the shortest distance between \(\vec{Y} \) and the subspace spanned by \(\vec{X} \).

To simplify the discussion, we use 2-dimensional space for visualization.

\( \vec{X}=\left[\begin{array}{c} 3\\ 2 \end{array}\right] \)

\( \vec{Y}=\left[\begin{array}{c} 1\\ 3 \end{array}\right] \)

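Further, we name \( L \) the subspace defined by the span of the vector \(\vec{X} \). The projection of \(\vec{Y} \) onto \( L \), written \( \hat{Y} \), is the closest point on \( L \) to \(\vec{Y} \). Because \( \hat{Y} \) lies on \( L \), it is a scalar multiple of \(\vec{X} \), that is, \( \hat{Y} = c \vec{X} \) for some scalar \( c \).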

Intuitively, the shortest line segment from \(\vec{Y} \) to \( L \) is orthogonal to \( L \), so the difference between \(\vec{Y} \) and its projection is orthogonal to \(\vec{X} \). The proof is provided in the next section.

\( ( \hat{Y}-\vec{Y} ) \cdot \vec{X} = 0 \)

\( ( c \vec{X}-\vec{Y} ) \cdot \vec{X} = 0 \)

We can further get the following:

\( c \vec{X} \cdot \vec{X} - \vec{Y} \cdot \vec{X}= 0 \)

Solving for \( c \) gives

\( c = \frac{\vec{Y} \cdot \vec{X}}{\vec{X} \cdot \vec{X}} \)

Note that the dot product \( \cdot \) is defined as \( \vec{X} \cdot \vec{Y} = x_1 y_1 + \cdots + x_n y_n \).
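As a quick check of the formula for \( c \), here is a minimal Python sketch that applies it to the example vectors above. The helper name `dot` and the use of `fractions.Fraction` (to keep the arithmetic exact) are choices made for this illustration only.

```python
from fractions import Fraction


def dot(u, v):
    """Dot product: u_1*v_1 + ... + u_n*v_n."""
    return sum(a * b for a, b in zip(u, v))


# Example vectors from the tutorial, stored as exact fractions.
X = [Fraction(3), Fraction(2)]
Y = [Fraction(1), Fraction(3)]

# c = (Y . X) / (X . X)
c = dot(Y, X) / dot(X, X)

# The projection of Y onto the span of X is c * X.
Y_hat = [c * x for x in X]

# The difference (Y_hat - Y) should be orthogonal to X.
residual = [yh - y for yh, y in zip(Y_hat, Y)]

print(c)                 # 9/13
print(dot(residual, X))  # 0
```

Here \( c = 9/13 \), so the projection is \( (27/13, 18/13) \), and the dot product of the difference with \( \vec{X} \) is exactly zero.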

**Proof**

This section proves that the shortest distance between a point and a subspace is attained along a line that is orthogonal to that subspace.

In particular, since \( \hat{Y} =c X \), we know that \( \hat{Y} \) lies on \( L \). However, we do not yet know the exact location of \( \hat{Y} \). Determining the location of \( \hat{Y} \) is equivalent to determining the value of \( c \).

We want to make the difference between \( Y \) and \( \hat{Y} \) as small as possible. Thus,

min \( \| \hat{Y} - Y \| \)

We can measure this difference with the Euclidean distance, which gives the following.

min \( \sqrt{ \sum_{i} (\hat{Y}_{i} - Y_{i})^2 } =\sqrt{ \sum_{i} ( cX_{i} - Y_{i} )^2 } \)

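Because the square root is monotonically increasing, minimizing the square root is equivalent to minimizing the expression inside it. We can therefore drop the square root and get the following.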

min \( \sum_i (cX_{i} - Y_{i} )^2 \)

The function above is quadratic in \( c \) with a non-negative leading coefficient \( \sum_i X_{i}^2 \) (positive whenever \( X \neq 0 \)), and thus it attains its minimum where the first-order derivative equals zero.
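A small numeric sanity check can make this concrete. The sketch below is illustrative only: the function name `squared_distance` and the step sizes are assumptions, and the candidate minimizer uses the formula \( c = \frac{Y \cdot X}{X \cdot X} \) from the first section.

```python
def squared_distance(c, X, Y):
    """Sum over i of (c * X_i - Y_i)^2, the objective being minimized."""
    return sum((c * x - y) ** 2 for x, y in zip(X, Y))


# Example vectors from the tutorial.
X = [3.0, 2.0]
Y = [1.0, 3.0]

# Candidate minimizer: c = (Y . X) / (X . X).
c_star = sum(y * x for x, y in zip(X, Y)) / sum(x * x for x in X)
best = squared_distance(c_star, X, Y)

# Moving c away from c_star in either direction increases the objective.
for step in (0.5, 0.1, 0.01):
    assert squared_distance(c_star - step, X, Y) > best
    assert squared_distance(c_star + step, X, Y) > best

print(c_star, best)
```

This does not replace the proof; it only shows that, for these two vectors, nearby values of \( c \) give a larger squared distance.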

\( \frac{d}{dc} \sum_i (cX_{i}-Y_{i})^2 =0 \)

We can thus get the following.

\( \begin{align} \frac{d}{dc} \sum_i (cX_{i}-Y_{i})^2 &= \sum_i 2 X_{i} (cX_{i}-Y_{i}) \\ &=2 \sum_i X_{i} (cX_{i}-Y_{i}) \\ &= 2 \sum_i ( cX_{i} X_{i}-X_{i}Y_{i}) \end{align} \)

Setting this derivative to zero gives the following.

\( 2 \sum_i ( cX_{i} X_{i}-X_{i}Y_{i}) =0 \)
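As a worked check with the example vectors \( X = (3, 2) \) and \( Y = (1, 3) \) from the first section, the objective expands to

\( \sum_i (cX_{i}-Y_{i})^2 = (3c-1)^2 + (2c-3)^2 = 13c^2 - 18c + 10 \)

and setting its derivative to zero gives

\( 26c - 18 = 0 \quad \Rightarrow \quad c = \frac{9}{13} \)

which agrees with \( c = \frac{\vec{Y} \cdot \vec{X}}{\vec{X} \cdot \vec{X}} = \frac{1 \cdot 3 + 3 \cdot 2}{3 \cdot 3 + 2 \cdot 2} = \frac{9}{13} \).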

Dividing by 2 and rewriting the sums as vector dot products removes the \( \sum \) notation.

\( cX^{T} \cdot X -X^T \cdot Y =0 \) *[eq. 1]*

Based on **eq. 1**, we can factor out \( X^{T} \) and get

\( X^{T} \cdot (c X-Y) =0 \)

\( X^{T} \cdot (\hat{Y}-Y) =0 \) *[eq. 2]*

**Eq. 2 means that the vector \( \hat{Y} - Y \) is orthogonal to \( L \), the subspace spanned by \( X \). That is, the vector \( Y - \hat{Y} \) and \( L \) are perpendicular to each other (i.e., they meet at 90 degrees).**

Further, from **eq. 1**, we can also get the value of \( c \).

\( c= (X^{T} \cdot X)^{-1} \cdot X^{T} \cdot Y \)

Thus,

\( \hat{Y}=c X= X c = X \cdot (X^{T} \cdot X)^{-1} \cdot X^{T} \cdot Y = P \cdot Y \)

where,

\( P = X \cdot (X^{T} \cdot X)^{-1} \cdot X^{T} \)

Note that \( P \) is a square matrix, and you can easily verify the following two properties of \( P \).

\( P^2 = P \)

\( P^T = P \)

Thus, \( P \) is an orthogonal projection matrix (Wikipedia), since \( P^{2} = P = P^{T} \). This completes the proof that the shortest distance between a point (the tip of a vector) and a subspace is along a line that is orthogonal to that subspace.
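The two properties of \( P \) can also be checked directly for the example vectors. The following is a minimal sketch using only the Python standard library. The helpers `matmul` and `transpose` are written for this illustration, and the code assumes \( X \) is a single column, so \( X^{T} \cdot X \) is a scalar and its inverse is ordinary division.

```python
from fractions import Fraction


def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [
        [sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
        for i in range(len(A))
    ]


def transpose(A):
    """Transpose of a matrix stored as a list of rows."""
    return [list(row) for row in zip(*A)]


# Example vectors from the tutorial, as 2x1 column matrices.
X = [[Fraction(3)], [Fraction(2)]]
Y = [[Fraction(1)], [Fraction(3)]]

# X^T X is a 1x1 matrix here, so (X^T X)^{-1} is division by a scalar.
XtX = matmul(transpose(X), X)[0][0]

# P = X (X^T X)^{-1} X^T
P = [[entry / XtX for entry in row] for row in matmul(X, transpose(X))]

# Projection of Y onto the span of X.
Y_hat = matmul(P, Y)

print(matmul(P, P) == P)  # True: P^2 = P
print(transpose(P) == P)  # True: P^T = P
```

For these vectors, \( P \) has rows \( (9/13, 6/13) \) and \( (6/13, 4/13) \), and \( P \cdot Y = (27/13, 18/13) \), the same \( \hat{Y} \) obtained from \( cX \) with \( c = 9/13 \).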