Saturday, July 29, 2006

K-L Method


One of the most widely used statistical methods for reducing the dimensionality of a large data set is the Karhunen-Loève (K-L) method, also known as principal component analysis (PCA).

Principal component analysis transforms the initial data set, represented as a set of vector samples, into a new set of vector samples whose dimensions are derived from the original ones. The goal of this transformation is to concentrate the information about the differences between samples, that is, their variance, into a small number of dimensions.
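In the standard formulation of PCA, "information about the differences between samples" is measured as variance. The post does not give formulas, so the notation below is an assumption: for m samples x_1, …, x_m in n dimensions, the derived dimensions are the eigenvectors e_j of the sample covariance matrix C, ordered by eigenvalue, and each eigenvalue λ_j is the variance of the data along its dimension.

```latex
\begin{align*}
\bar{x} &= \frac{1}{m}\sum_{i=1}^{m} x_i,
\qquad
C = \frac{1}{m-1}\sum_{i=1}^{m} (x_i - \bar{x})(x_i - \bar{x})^{\mathsf{T}} \\
C\, e_j &= \lambda_j e_j,
\qquad
\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_n \ge 0 \\
y_i &= A\,(x_i - \bar{x}),
\qquad
A = \begin{bmatrix} e_1^{\mathsf{T}} \\ \vdots \\ e_n^{\mathsf{T}} \end{bmatrix} \\
\text{variance kept by the first } k \text{ dimensions}
&= \frac{\sum_{j=1}^{k} \lambda_j}{\sum_{j=1}^{n} \lambda_j}
\end{align*}
```

Keeping only the first k coordinates of each y_i discards the dimensions with the smallest variance, and the ratio in the last line measures how much of the total variance survives the reduction.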

More formally, the basic idea is as follows. A set of n-dimensional vector samples X = {x1, x2, x3, …, xm} is transformed into another set Y = {y1, y2, …, ym} of the same dimensionality, but Y has the property that most of its information content is stored in the first few dimensions. This allows us to reduce the data set to a smaller number of dimensions with little loss of information.
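A minimal NumPy sketch of the transformation described above, assuming the usual covariance-eigenvector formulation of PCA. The function name `kl_transform` and the example data are hypothetical, chosen only for illustration. The function returns Y with the same dimensionality as X, with columns ordered from highest to lowest variance, so reducing the data set amounts to keeping the first k columns.

```python
import numpy as np


def kl_transform(X):
    """Hypothetical K-L / PCA transform of m samples (rows) with n dimensions (columns).

    Returns (Y, eigvals, eigvecs, mean): Y has the same shape as X, and its
    columns are ordered from largest to smallest variance (eigvals).
    """
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    Xc = X - mean  # center each dimension on zero

    # Sample covariance matrix (n x n); rowvar=False because samples are rows.
    C = np.atleast_2d(np.cov(Xc, rowvar=False))

    # C is symmetric, so eigh returns real eigenvalues in ascending order
    # and orthonormal eigenvectors as columns.
    eigvals, eigvecs = np.linalg.eigh(C)

    # Reorder so the first dimension carries the most variance.
    order = np.argsort(eigvals)[::-1]
    eigvals = eigvals[order]
    eigvecs = eigvecs[:, order]

    # y_i = A (x_i - mean) with A = eigvecs.T, computed for all rows at once.
    Y = Xc @ eigvecs
    return Y, eigvals, eigvecs, mean


if __name__ == "__main__":
    # Nearly collinear 2-D samples: almost all variance lies along one direction.
    X = np.array([[1.0, 2.0], [2.0, 4.1], [3.0, 5.9], [4.0, 8.2], [5.0, 9.8]])
    Y, eigvals, eigvecs, mean = kl_transform(X)
    k = 1
    Y_reduced = Y[:, :k]  # keep only the first k dimensions
    print("fraction of variance kept:", eigvals[:k].sum() / eigvals.sum())
    print("reduced data shape:", Y_reduced.shape)
```

Using `np.linalg.eigh` rather than `np.linalg.eig` takes advantage of the covariance matrix being symmetric, which guarantees real eigenvalues and orthonormal eigenvectors; because the eigenvectors are orthonormal, `Y @ eigvecs.T + mean` recovers the original samples exactly when no dimensions are dropped.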
