Principal Component Analysis

Start with your data matrix

\(X =
\begin{bmatrix}
2 & 0 \\
0 & 2 \\
3 & 3 \\
\end{bmatrix} \)

  • n samples (rows).
  • p features (columns).

Step 1: Center the Data

  • Subtract the mean of each feature (column).
  • This shifts the data so that each feature has a mean of 0.

\( X_{centered} = X - \bar{X} \)

Sometimes you also scale (normalize) each feature to have unit variance (standard deviation = 1). This is called standardization.

\( X_{scaled} = \frac{X - \bar{X}}{\sigma} \)
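The centering and scaling steps for the example matrix can be sketched with NumPy (a minimal sketch; the variable names are my own, and I use the sample standard deviation, `ddof=1`, to match the \( n-1 \) normalization used for the covariance matrix below):

```python
import numpy as np

# Example data matrix: n = 3 samples (rows), p = 2 features (columns)
X = np.array([[2.0, 0.0],
              [0.0, 2.0],
              [3.0, 3.0]])

# Center: subtract each column's mean so every feature has mean 0
X_centered = X - X.mean(axis=0)

# Optional standardization: also divide by each column's standard
# deviation so every feature has unit variance
X_scaled = X_centered / X.std(axis=0, ddof=1)

print(X_centered)
```

After centering, each column of `X_centered` sums to zero; after scaling, each column of `X_scaled` has a sample standard deviation of 1.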

Step 2: Compute the Covariance Matrix

\( C = \frac{1}{n-1} X_{centered}^T X_{centered} \)

This gives you a \( p \times p \) matrix.
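For the example matrix, the covariance computation can be sketched as follows (NumPy's `np.cov` with `rowvar=False` uses the same \( n-1 \) convention, so it serves as a cross-check):

```python
import numpy as np

X = np.array([[2.0, 0.0],
              [0.0, 2.0],
              [3.0, 3.0]])
X_centered = X - X.mean(axis=0)
n = X.shape[0]

# p x p covariance matrix with the sample (n-1) normalization
C = X_centered.T @ X_centered / (n - 1)

# np.cov computes the same thing when rowvar=False
assert np.allclose(C, np.cov(X, rowvar=False))
print(C)  # ≈ [[2.333, 0.333], [0.333, 2.333]]
```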

Step 3: Perform Eigen Decomposition on Covariance Matrix

Solve:

\( C v = \lambda v \)

  • λ = eigenvalue: the variance captured along the corresponding direction.
  • v = eigenvector: the direction of a principal component.
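Continuing the example, the eigendecomposition can be sketched with `np.linalg.eigh`, which is the right routine here because the covariance matrix is symmetric (note it returns eigenvalues in ascending order, so we flip to descending):

```python
import numpy as np

X = np.array([[2.0, 0.0],
              [0.0, 2.0],
              [3.0, 3.0]])
X_centered = X - X.mean(axis=0)
C = X_centered.T @ X_centered / (X.shape[0] - 1)

# eigh handles symmetric matrices; sort eigenvalues descending
eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

print(eigvals)  # largest eigenvalue first
```

For this data the eigenvalues come out to \( 8/3 \) and \( 2 \), with eigenvectors along the diagonals \( (1, 1) \) and \( (1, -1) \) (up to sign and normalization).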

Step 4: Select Principal Components

  • Choose the top k eigenvectors (based on the largest eigenvalues).
  • To know how much variance each principal component explains:

\( \text{Variance explained by } PC_i = \frac{\lambda_i}{\sum_j \lambda_j} \)

  • This helps you decide how many components to keep.
  • These eigenvectors form the projection matrix W.

\( W= \begin{bmatrix} | & & | \\ v_1 & \dots & v_k \\ | & & | \end{bmatrix} \)
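The explained-variance ratios and the projection matrix \( W \) for the example, with \( k = 1 \), can be sketched as:

```python
import numpy as np

X = np.array([[2.0, 0.0],
              [0.0, 2.0],
              [3.0, 3.0]])
X_centered = X - X.mean(axis=0)
C = X_centered.T @ X_centered / (X.shape[0] - 1)

eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Fraction of total variance captured by each component
explained = eigvals / eigvals.sum()
print(explained)  # the first PC explains about 57% here

# Keep the top k eigenvectors as the columns of W
k = 1
W = eigvecs[:, :k]
```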

Step 5: Project the Data

\( Z = X_{centered} W \)

  • Z is the transformed data in the principal component space.
  • Z has dimensions \( n \times k \).
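All five steps can be put together in one end-to-end sketch (the signs of the scores may flip depending on the eigenvector signs returned by the solver, which is normal for PCA):

```python
import numpy as np

X = np.array([[2.0, 0.0],
              [0.0, 2.0],
              [3.0, 3.0]])

# Steps 1-2: center the data, compute the covariance matrix
X_centered = X - X.mean(axis=0)
C = X_centered.T @ X_centered / (X.shape[0] - 1)

# Step 3: eigendecomposition, eigenvalues sorted descending
eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Step 4: projection matrix from the top k eigenvectors
k = 1
W = eigvecs[:, :k]

# Step 5: project -> Z is n x k
Z = X_centered @ W
print(Z.shape)  # (3, 1)
```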
