Math for Data Science Beginners: Principal Component Analysis

Mathematics remains a major hindrance for beginners trying to get into data science

Benjamin Obi Tayo Ph.D.


Mathematics remains a major hindrance for beginners trying to get into data science. Most beginners interested in the field are concerned about the math requirements. Data science is a highly quantitative field that draws on advanced mathematics, but to get started you only need to master a few math topics. In this series of articles, we discuss the essential math topics to review before embarking on a data science journey. The topics to be covered in the series are:

This article will focus on Principal Component Analysis. Please see links above for other articles in the series.

Principal Component Analysis (PCA) is a statistical method used for feature extraction. It is particularly suited to high-dimensional data whose features are highly correlated. The basic idea of PCA is to transform the original space of features into the space of principal components, as shown in Figure 1 below:

Figure 1: The PCA algorithm transforms the old feature space into a new feature space so as to remove feature correlation. Image by Benjamin O. Tayo

A PCA transformation achieves the following:

a) Reduces the number of features used in the final model by keeping only the components that account for the majority of the variance in the dataset.

b) Removes the correlation between features.
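
The two effects listed above can be checked on a small synthetic dataset. The following is a minimal sketch, not necessarily the implementation the author has in mind: it assumes PCA is computed from the eigen-decomposition of the sample covariance matrix, and the function name `pca`, the toy features `x1`, `x2`, `x3`, and the random seed are all made up for illustration.

```python
import numpy as np


def pca(X, n_components):
    """Project X (n_samples, n_features) onto its leading principal components.

    Illustrative helper: returns the transformed data and the fraction of
    variance explained by every component (not only the ones kept).
    """
    # Center each feature so the covariance matrix describes spread around the mean.
    X_centered = X - X.mean(axis=0)
    # Sample covariance matrix of the features (columns are features).
    cov = np.cov(X_centered, rowvar=False)
    # eigh is meant for symmetric matrices and returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Reorder so the component with the largest variance comes first.
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained_ratio = eigvals / eigvals.sum()
    # Keep only the first n_components directions and project the data onto them.
    Z = X_centered @ eigvecs[:, :n_components]
    return Z, explained_ratio


# Toy data (assumed for illustration): x2 is almost a copy of x1, x3 is independent.
rng = np.random.default_rng(0)
x1 = rng.normal(size=500)
x2 = 0.9 * x1 + 0.1 * rng.normal(size=500)
x3 = rng.normal(size=500)
X = np.column_stack([x1, x2, x3])

Z, ratio = pca(X, n_components=2)

# Correlation between the first two original features, and between the two components.
corr_before = np.corrcoef(X, rowvar=False)[0, 1]
corr_after = np.corrcoef(Z, rowvar=False)[0, 1]

print("explained variance ratio:", ratio)
print("correlation before PCA:", corr_before)
print("correlation after PCA:", corr_after)
```

In this sketch, three original features are reduced to two components, and the correlation between the retained components is zero up to floating-point error. In practice a library routine such as scikit-learn's `PCA` class would typically be used instead of a hand-written function.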

Mathematical Basis of PCA
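
One common way to state the mathematical basis is sketched below. The notation is an assumption introduced here for illustration: X is the n × p data matrix with mean-centered columns, Σ is its sample covariance matrix, w_k and λ_k are the eigenvectors and eigenvalues of Σ, and Z is the data expressed in the space of principal components.

```latex
\Sigma = \frac{1}{n-1} X^{\top} X, \qquad
\Sigma w_k = \lambda_k w_k, \qquad
\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_p \ge 0

Z = X W, \qquad
W = [\, w_1 \; w_2 \; \cdots \; w_p \,], \qquad
\frac{1}{n-1} Z^{\top} Z = W^{\top} \Sigma W = \operatorname{diag}(\lambda_1, \dots, \lambda_p)

\text{fraction of variance explained by component } k
  = \frac{\lambda_k}{\sum_{j=1}^{p} \lambda_j}
```

The diagonal covariance of Z expresses that the principal components are uncorrelated, and keeping only the first few columns of W is what reduces the number of features.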

