Start with your data matrix:
\(X =
\begin{bmatrix}
2 & 0 \\
0 & 2 \\
3 & 3 \\
\end{bmatrix} \)
- Rows are the n samples (here n = 3).
- Columns are the p features (here p = 2).
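The snippets below sketch each step in NumPy for this small example; variable names like X, X_centered, C, W, and Z simply mirror the formulas and are illustrative, not a fixed API.

```python
import numpy as np

# The 3x2 example matrix: n = 3 samples (rows), p = 2 features (columns)
X = np.array([[2.0, 0.0],
              [0.0, 2.0],
              [3.0, 3.0]])
n, p = X.shape
```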
Step 1: Center the Data (standard for PCA)
- Subtract the mean of each feature (column).
- This shifts the data so that each feature has a mean of 0.
\( X_{centered} = X - \bar{X} \)
Sometimes you also scale (normalize) each feature to have unit variance (standard deviation = 1). This is called standardization.
\( X_{scaled} = \frac{X - \bar{X}}{\sigma} \)
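A minimal sketch of both operations, continuing from the X defined above:

```python
# Center: subtract each column's mean so every feature has mean 0
X_centered = X - X.mean(axis=0)   # column means here are [5/3, 5/3]

# Optional standardization: also divide by each feature's standard deviation.
# ddof=1 gives the sample standard deviation, matching the n - 1 in Step 2.
X_scaled = X_centered / X.std(axis=0, ddof=1)
```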
Step 2: Compute the Covariance Matrix
\( C=\frac{1}{n-1} X^T_{centered} X_{centered} \)
This gives you a \( p \times p \) matrix.
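In code this is a single matrix product (np.cov(X, rowvar=False) would give the same matrix, since np.cov centers internally and divides by n - 1 by default):

```python
# Covariance matrix: (1/(n-1)) * Xc^T Xc, a p x p matrix
C = (X_centered.T @ X_centered) / (n - 1)
# For this example, C = [[7/3, 1/3], [1/3, 7/3]]
```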
Step 3: Perform Eigendecomposition of the Covariance Matrix
Solve:
\( C v = \lambda v \)
- λ is an eigenvalue: the variance captured along its component.
- v is the corresponding eigenvector: the direction of that component.
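Since C is symmetric, np.linalg.eigh is the appropriate solver; a sketch:

```python
# eigh handles symmetric matrices and returns eigenvalues in ascending
# order, with the matching eigenvectors as the columns of `eigenvectors`
eigenvalues, eigenvectors = np.linalg.eigh(C)
# For this example: eigenvalues = [2, 8/3]
```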
Step 4: Select Principal Components
- Choose the top k eigenvectors (based on the largest eigenvalues).
- To see how much of the total variance each principal component explains, compute:
\( \text{Variance explained by } PC_i = \frac{\lambda_i}{\sum_j \lambda_j} \)
- This ratio helps you decide how many components to keep (the sketch after this step computes it).
- These eigenvectors form the projection matrix W.
\( W= \begin{bmatrix} | & & | \\ v_1 & \dots & v_k \\ | & & | \end{bmatrix} \)
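A sketch of the selection step, assuming we keep k = 1 component for this two-feature example:

```python
k = 1  # how many components to keep (a choice, not a rule)

# Reorder eigenpairs by descending eigenvalue
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

# Fraction of total variance each component explains
explained = eigenvalues / eigenvalues.sum()
# For this example: [4/7, 3/7], i.e. roughly [0.571, 0.429]

# Projection matrix W: top-k eigenvectors as columns (p x k)
W = eigenvectors[:, :k]
```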
Step 5: Project the Data
\( Z=X_{centered} W \)
- Z is the transformed data in the principal component space.
- Z has dimensions \( n \times k \).
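Putting it together (note that eigenvector signs are arbitrary, so Z may come out flipped relative to other tools):

```python
# Project the centered data onto the top-k directions: Z is n x k
Z = X_centered @ W
print(Z.shape)  # (3, 1) for this example
print(Z)        # PC1 scores; magnitudes ~ [0.94, 0.94, 1.89], signs may flip
```

Up to those per-component sign flips, this should match what sklearn.decomposition.PCA(n_components=1).fit_transform(X) returns.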