Math for Data Science Beginners: Principal Component Analysis
Mathematics remains a major hindrance for beginners trying to get into data science
Mathematics remains a major hindrance for beginners trying to get into data science. Most beginners interested in the field worry about the math requirements. Data science is a highly quantitative field, and advanced work in it does require advanced mathematics. But to get started, you only need to master a few math topics. In this series of articles, we discuss the essential math topics to review before embarking on a data science journey. The topics covered in the series are:
- Functions
- Plotting and Data Visualization
- Linear Regression
- Statistics and Probability
- Linear Algebra
- Principal Component Analysis
This article will focus on Principal Component Analysis. Please see links above for other articles in the series.
Principal Component Analysis (PCA) is a statistical method used for feature extraction. It is most useful when the data is high-dimensional and its features are highly correlated. The basic idea of PCA is to transform the original space of features into the space of principal components, which are linear combinations of the original features, as shown in Figure 1 below:
A PCA transformation achieves the following:
a) Reduces the number of features to be used in the final model by keeping only the components that account for the majority of the variance in the dataset.
b) Removes the correlation between features, since the principal components are uncorrelated with one another.
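Both effects can be checked on a small example. The sketch below is only an illustration, not a prescribed implementation: it assumes NumPy, a synthetic dataset with three features (two of them strongly correlated), and a 95% variance cutoff, none of which come from the article itself. It computes the principal components from the eigen-decomposition of the covariance matrix of the standardized features.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (assumed for illustration): x2 is almost a copy of x1,
# while x3 is independent of both.
n = 500
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + 0.1 * rng.normal(size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])

# Standardize each feature to zero mean and unit variance.
X_std = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# Covariance matrix of the standardized features.
cov = np.cov(X_std, rowvar=False)

# Eigen-decomposition of the symmetric covariance matrix.
# eigh returns eigenvalues in ascending order, so re-sort to descending.
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Fraction of the total variance carried by each principal component.
explained_ratio = eigvals / eigvals.sum()

# Transform the data into the space of principal components.
Z = X_std @ eigvecs

# (a) Keep the fewest components whose cumulative variance reaches 95%
#     (the 95% cutoff is an arbitrary choice for this sketch).
k = int(np.searchsorted(np.cumsum(explained_ratio), 0.95) + 1)
Z_reduced = Z[:, :k]

# (b) The covariance matrix of the components is diagonal,
#     meaning the components are uncorrelated.
cov_Z = np.cov(Z, rowvar=False)

print("explained variance ratio:", np.round(explained_ratio, 3))
print("components kept:", k)
print("covariance of components:\n", np.round(cov_Z, 3))
```

Because two of the three synthetic features carry nearly the same information, the last component explains very little variance and can be dropped, and the off-diagonal entries of the component covariance matrix are zero up to floating-point error. In practice a library routine such as scikit-learn's `PCA` class is the more common choice; the manual version here is meant only to expose the steps.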