Math for Data Science Beginners: Probability and Statistics
Mathematics remains a major hurdle for beginners trying to get into data science. Most beginners interested in the field are concerned about the math requirements. Data science is a highly quantitative field that requires advanced mathematics. But to get started, you only need to master a few math topics. In this series of articles, we discuss the essential math topics to review before embarking on a data science journey. The topics to be covered in the series are:
This article will focus on Statistics and Probability. Please see links above for other articles in the series.
Statistics and Probability
Statistics and Probability are used for visualization of features, data preprocessing, feature transformation, data imputation, dimensionality reduction, feature engineering, model evaluation, etc. This article focuses on the fundamental Statistics and Probability concepts for beginners in the field, namely: Mean or Expectation Value, Variance and Standard Deviation, Confidence Interval, Central Limit Theorem, Correlation and Covariance, Probability Distribution, and Bayes’ Theorem.
1) Mean or Expectation Value
Let X be a random variable with N observations $X_1, \dots, X_N$. Then the mean value of X is given by

$$\mu = \frac{1}{N} \sum_{i=1}^{N} X_i$$
The mean or expectation value is a measure of central tendency.
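As a quick illustration of the mean, here is a minimal Python sketch. The observation values and the helper name `sample_mean` are assumptions made up for this example; they do not come from the article. The sketch computes the mean by hand and compares it with `statistics.mean` from the standard library.

```python
import statistics

# Hypothetical observations of X (illustrative values only)
observations = [2.0, 4.0, 4.0, 5.0, 10.0]


def sample_mean(values):
    """Arithmetic mean: the sum of the observations divided by their count N."""
    return sum(values) / len(values)


manual_mean = sample_mean(observations)
library_mean = statistics.mean(observations)

# Both approaches should agree on the same data.
print(manual_mean, library_mean)
```

In practice you would normally call a library routine rather than write your own; the hand-written version is here only to mirror the definition.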
2) Variance and Standard Deviation
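Let X be a random variable with N observations $X_1, \dots, X_N$ and mean $\mu$. Then the variance of X is given by:

$$\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (X_i - \mu)^2$$

When the variance is estimated from a sample rather than computed for a whole population, the divisor $N - 1$ is commonly used in place of $N$.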
The standard deviation is the square root of the variance. It is expressed in the same units as X and is a measure of uncertainty or volatility.
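The variance and standard deviation can be sketched in Python as follows. The observation values and the helper name `population_variance` are assumptions for illustration only. The hand-written helper divides by N (the population form), while `statistics.variance` from the standard library divides by N - 1 (the sample form), so the two give different numbers on the same data.

```python
import math
import statistics

# Hypothetical observations of X (illustrative values only)
observations = [2.0, 4.0, 4.0, 5.0, 10.0]


def population_variance(values):
    """Average squared deviation from the mean, dividing by N."""
    n = len(values)
    mu = sum(values) / n
    return sum((v - mu) ** 2 for v in values) / n


pop_var = population_variance(observations)
pop_std = math.sqrt(pop_var)  # standard deviation = square root of the variance

# Standard-library equivalents: pvariance divides by N, variance by N - 1.
lib_pop_var = statistics.pvariance(observations)
lib_sample_var = statistics.variance(observations)
lib_sample_std = statistics.stdev(observations)

print(pop_var, pop_std, lib_pop_var, lib_sample_var, lib_sample_std)
```

Which divisor is appropriate depends on whether the data are the whole population or a sample drawn from it; for large N the difference becomes small.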