# SVD & PCA

Alex Egg,

## Singular Value Decomposition & Principal Component Analysis

PCA is used for finding the directions where most of the energy of the vectors lies.

SVD is a decomposition of a matrix into various components of which some are eigenvectors

PCA is typically computed using the SVD of the data. However, it can also be computed using the covariance matrix of the data.

The Principal Components of a PCA are the same vectors that are found through SVD.

## SVD

SVD will produce the decomposition of a rectangular matrix $M$ which is composed of $U$, $S$, and $V$. Being a decomposition, the parts can be combined to recreate $\hat{M}$ *.

## PCA

PCA works by finding the eigenvectors of the covariance matrix and ranking them by their respective eigenvalues. The eigenvectors with the greatest eigenvalues are the Principal Components of the data matrix.

Additionally, PCA can be computed form the results of SVD, however, it would be more expensive than generating it from the covariance matrix**. Computing the Covariance matrix will introduce rounding errors. Tradeoff is time vs accuracy. In my experience the loss in accuracy is negligible and most implementations of PCA use the covariance matrix and not SVD to enjoy the speed boost.

I just read through the source of the PCA module in SciKit and it appears that it uses the SVD method.

### Computing PCA w/ SVD

Here is a method to do this using SVD: As a first step, perform mean normalization and feature scaling on X. Thereafter, compute a covariance matrix Sigma for X. In the followup step, compute SVD on Sigma to obtain it’s eigenvectors.

Some interesting notes on the result of factorization:

• The columns of matrix U form the eigenvectors of Sigma.
• The S matrix is a diagonal matrix. The values of the diagonal are called eigen values and are in descending order.

The matrix U is an n x n matrix. If you reduce the number of column vectors to k (first k), then you have obtained the k-dimensional hyper-plane in this example. The values of S give you the amount of variance retained by this reduction. To be precise, the ratio of sum of top k values along the diagonal S matrix to sum of all values along the diagonal S matrix gives the amound of variance.

Let’s call the matrix U with reduced column vectors as $U_{reduced}$. The data set in the reduced dimension can be obtained as: $U_{reduced}^T * X$

* This is the motivation behind Matrix Factorization in Recommender Systems: Reconstruct the dense user/items ratings matrix $\hat{R}$ by combining the latent factors $U$ and $V$ which are learnt from the sparse ratings matrix data $R$.