# Data Scaling for Beginners

*How to scale your data to render it suitable for model building*

# About the Author

**Benjamin O. Tayo** is a data science educator, tutor, coach, mentor, and consultant. Contact him at benjaminobi@gmail.com for more information about services and pricing.

Dr. Tayo has written nearly 300 data science articles and tutorials for the general public. Support Dr. Tayo’s educational mission using the links below:

**PayPal**: https://www.paypal.me/BenjaminTayo

**CashApp**: https://cash.app/$BenjaminTayo

# INTRODUCTION

In the machine learning process, data scaling falls under data preprocessing, or feature engineering. Scaling your data before using it for model building can accomplish the following:

- Scaling ensures that features have values in the same range
- Scaling ensures that the features used in model building are dimensionless
- Scaling can be used for detecting outliers

There are several methods for scaling data. The two most important scaling techniques are Normalization and Standardization.

# Data Scaling Using Normalization

When data is scaled using normalization, the transformed data is calculated using this equation:

*Xscaled* = (*X* − *Xmin*) / (*Xmax* − *Xmin*)

where *Xmin* and *Xmax* are the minimum and maximum values of the data, respectively. The scaled data lies in the range [0, 1].
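As a quick hand-worked check of the equation above (with illustrative values chosen so the arithmetic comes out cleanly):

```python
# Min-max normalization of a single value, computed by hand.
# The numbers here are illustrative, not from a real dataset.
X, X_min, X_max = 44.55, 17.7, 71.4

X_scaled = (X - X_min) / (X_max - X_min)
print(X_scaled)  # ≈ 0.5, since 44.55 is the midpoint of [17.7, 71.4]
```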

# Python Implementation of Normalization

Scaling using normalization can be implemented in Python with scikit-learn's `MinMaxScaler`, which applies the equation above to each feature. (Note that scikit-learn's `Normalizer` is a different transformer: it rescales each *sample* to unit norm, not each feature to [0, 1].)

```python
from sklearn.preprocessing import MinMaxScaler

# MinMaxScaler applies (X - Xmin) / (Xmax - Xmin) to each feature
norm = MinMaxScaler()

# data must be a 2-D array of shape (n_samples, n_features)
X_norm = norm.fit_transform(data)
```

Let X be a given dataset with *Xmin* = 17.7 and *Xmax* = 71.4. After normalization, every value of X lies in [0, 1]: the minimum, 17.7, maps to 0 and the maximum, 71.4, maps to 1.
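The example can be sketched end to end. The individual values below are made up; only their minimum (17.7) and maximum (71.4) match the numbers quoted above:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical one-feature dataset whose min is 17.7 and max is 71.4
X = np.array([[17.7], [25.0], [44.3], [60.1], [71.4]])

norm = MinMaxScaler()
X_norm = norm.fit_transform(X)

print(X_norm.ravel())
# The minimum (17.7) maps to 0.0 and the maximum (71.4) maps to 1.0;
# every other value falls strictly between them.
```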