How to Balance Simplicity and Complexity in Machine Learning

A tutorial on bias-variance tradeoff

Photo by Vicky Sim on Unsplash

In statistics and machine learning, the bias-variance tradeoff is the property of a set of predictive models whereby models with a lower bias in parameter estimation have a higher variance of the parameter estimates across samples and vice versa. The bias-variance dilemma or problem is the conflict in trying to simultaneously minimize these two sources of error that prevent supervised learning algorithms from generalizing beyond their training set:

  • The bias is an error from erroneous assumptions in the learning algorithm. High bias (high simplicity) can cause an algorithm to miss the relevant relations between features and target outputs (underfitting).
  • The variance is an error from sensitivity to small fluctuations in the training set. High variance (high complexity) can cause an algorithm to model the random noise in the training data, rather than the intended outputs (overfitting).
Figure 1. Illustrating the bias-variance problem. Image from author.
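
The two error sources described above can be estimated numerically. The following is a minimal sketch, not the author's method: it assumes a synthetic target function (a sine curve), Gaussian noise, and a k-nearest-neighbour regressor as a stand-in for "simple" and "complex" models. All names (true_f, sample_training_set, knn_predict, bias_variance) are hypothetical and chosen for illustration. Averaging over all 30 training points (k=30) plays the role of a very simple model, while using only the single nearest point (k=1) plays the role of a very flexible one.

```python
import math
import random
import statistics


def true_f(x):
    # Assumed ground-truth relation between feature and target (illustrative only).
    return math.sin(2 * math.pi * x)


def sample_training_set(rng, n=30, noise=0.3):
    # Draw a fresh noisy training set of n points from the assumed relation.
    xs = [rng.random() for _ in range(n)]
    ys = [true_f(x) + rng.gauss(0, noise) for x in xs]
    return xs, ys


def knn_predict(xs, ys, x0, k):
    # Average the targets of the k training points closest to x0.
    nearest = sorted(range(len(xs)), key=lambda i: abs(xs[i] - x0))[:k]
    return sum(ys[i] for i in nearest) / k


def bias_variance(k, n_repeats=200, seed=0):
    # Refit the model on many resampled training sets and record its
    # predictions at fixed test points.
    rng = random.Random(seed)
    test_points = [i / 20 for i in range(1, 20)]
    preds = [[] for _ in test_points]
    for _ in range(n_repeats):
        xs, ys = sample_training_set(rng)
        for j, x0 in enumerate(test_points):
            preds[j].append(knn_predict(xs, ys, x0, k))
    # Squared bias: how far the average prediction is from the true value.
    bias_sq = statistics.fmean(
        (statistics.fmean(p) - true_f(x0)) ** 2
        for p, x0 in zip(preds, test_points)
    )
    # Variance: how much predictions move from one training set to another.
    variance = statistics.fmean(statistics.pvariance(p) for p in preds)
    return bias_sq, variance


for k in (30, 1):
    b, v = bias_variance(k)
    print(f"k={k:2d}  squared bias={b:.3f}  variance={v:.3f}")
```

Under these assumptions, the k=30 model shows the larger squared bias (underfitting) and the k=1 model shows the larger variance (overfitting), which is the tradeoff in miniature.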

3 Reasons why a simple model is preferred over a complex model

  1. Prevents Overfitting: A high-dimensional dataset with too many features can lead to overfitting (the model captures both real and random effects).
  2. Interpretability: An overly complex model with too many features can be hard to interpret, especially when features are correlated with each other.
  3. Computational Efficiency: A model trained on a lower-dimensional dataset is more computationally efficient (the algorithm requires less time to run).

When building a machine learning model with a high-dimensional dataset, it is advisable to start with a simple model and then add complexity only as needed. During model evaluation, it is important to perform several tests to make sure your model is not capturing random effects in your dataset. To be able to detect random effects, sound knowledge of the problem that you are trying to solve is important.
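
One common test for random effects is to compare the error on the training data with the error on data the model has not seen. The sketch below is an illustration under assumptions, not a prescribed procedure: it uses synthetic data, a k-nearest-neighbour regressor, and hypothetical helper names (make_data, knn_predict, mse). It starts with the simplest setting (k equal to the training set size) and moves toward the most complex (k=1).

```python
import math
import random


def make_data(rng, n, noise=0.3):
    # Synthetic data: an assumed sine relation plus Gaussian noise.
    xs = [rng.random() for _ in range(n)]
    ys = [math.sin(2 * math.pi * x) + rng.gauss(0, noise) for x in xs]
    return xs, ys


def knn_predict(train_x, train_y, x0, k):
    # Average the targets of the k training points closest to x0.
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x0))[:k]
    return sum(train_y[i] for i in nearest) / k


def mse(train_x, train_y, eval_x, eval_y, k):
    # Mean squared error of the k-NN model on the evaluation points.
    errors = [
        (knn_predict(train_x, train_y, x, k) - y) ** 2
        for x, y in zip(eval_x, eval_y)
    ]
    return sum(errors) / len(errors)


rng = random.Random(42)
train_x, train_y = make_data(rng, 60)
valid_x, valid_y = make_data(rng, 200)  # held-out data, never used for fitting

results = {}
for k in (60, 15, 5, 1):  # from simplest (k=60) to most complex (k=1)
    train_err = mse(train_x, train_y, train_x, train_y, k)
    valid_err = mse(train_x, train_y, valid_x, valid_y, k)
    results[k] = (train_err, valid_err)
    print(f"k={k:2d}  train MSE={train_err:.3f}  validation MSE={valid_err:.3f}")
```

With k=1 the training error is zero because each training point is its own nearest neighbour, yet the validation error is not. A large gap of this kind between training and held-out error is a sign that the model is fitting noise rather than the underlying relation.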

Benjamin Obi Tayo Ph.D.

Written by Benjamin Obi Tayo Ph.D.

Dr. Tayo is a data science educator, tutor, coach, mentor, and consultant. Contact me for more information about our services and pricing: benjaminobi@gmail.com