Essential Statistics for Data Science
Learn basic statistical concepts used in data science and machine learning
About the Author
Dr. Tayo has written close to 300 articles and tutorials in data science for educating the general public.
Statistical concepts are used widely to extract useful information from data. This article will review essential statistical concepts applicable in data science and machine learning.
A probability distribution describes how the values of a feature are spread over their range, that is, how frequently each range of values occurs. Using the iris dataset, the probability distributions for the sepal length, sepal width, petal length, and petal width can be generated using the code below.
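import matplotlib.pyplot as plt
import seaborn as sns
# Load the iris dataset as a pandas DataFrame
iris = sns.load_dataset("iris")
# Plot a histogram with a kernel density estimate for each feature
for feature in ["sepal_length", "sepal_width", "petal_length", "petal_width"]:
    sns.histplot(iris[feature], kde=True)
    plt.show()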
Let's now focus on the sepal length variable. The probability distribution of the sepal length variable is shown below.
We observe that the probability distribution of the sepal length variable has a single maximum, hence it is unimodal. The value of the sepal length where the maximum occurs is the mode, which is about 5.8.
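The mode can also be estimated numerically instead of being read off a plot. The following is a minimal sketch, not necessarily how the figure in this article was produced. It assumes scikit-learn's bundled copy of the iris data (where sepal length is the first column) and SciPy's `gaussian_kde` with its default bandwidth. The names `grid`, `mode_estimate`, and `n_peaks` are illustrative. The estimate depends on the bandwidth, so it may differ somewhat from the value read off a plot.

```python
import numpy as np
from scipy.stats import gaussian_kde
from sklearn import datasets

# Sepal length is the first column of scikit-learn's bundled iris data
sepal_length = datasets.load_iris().data[:, 0]

# Fit a Gaussian kernel density estimate (default bandwidth: Scott's rule)
kde = gaussian_kde(sepal_length)

# Evaluate the estimated density on a fine grid over the observed range
grid = np.linspace(sepal_length.min(), sepal_length.max(), 500)
density = kde(grid)

# The mode estimate is the grid point where the density is highest
mode_estimate = grid[np.argmax(density)]

# Count interior local maxima; a single peak suggests a unimodal distribution
is_peak = (density[1:-1] > density[:-2]) & (density[1:-1] > density[2:])
n_peaks = int(is_peak.sum())

print(f"Estimated mode: {mode_estimate:.2f}, number of peaks: {n_peaks}")
```

A smaller bandwidth (for example `gaussian_kde(sepal_length, bw_method=0.2)`) produces a bumpier curve with more local maxima, so a conclusion about unimodality should be checked against more than one bandwidth.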
A plot of the probability distribution of the petal width variable is shown below.