Data Visualization — Scatter Plot
Learn the basics of data visualization using scatter plots
About the Author
Benjamin O. Tayo is a data science educator, tutor, coach, mentor, and consultant. Contact me for more information about our services and pricing: benjaminobi@gmail.com
Dr. Tayo has written close to 300 articles and tutorials in data science for educating the general public. Support Dr. Tayo’s educational mission using the links below:
PayPal: https://www.paypal.me/BenjaminTayo
CashApp: https://cash.app/$BenjaminTayo
INTRODUCTION
After reading this article, the reader will be able to:
- Define a scatter plot
- Generate a scatter plot using Python
- Interpret a scatter plot
A scatter plot is one of the most useful types of data visualization in data science and machine learning. It is a simple two-dimensional plot in which each observation is drawn as a point, with the x-axis representing the independent variable and the y-axis representing the dependent variable.
From a scatter plot, one can determine whether there is a functional relationship between the independent variable x and the dependent variable y. For example, if y tends to increase as x increases, then x and y are said to be positively correlated; if y tends to decrease as x increases, they are negatively correlated.
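To make the idea of positive correlation concrete, here is a minimal sketch using synthetic data. The data, the slope and noise level, and the variable names are assumptions chosen for illustration; they are not taken from the stock price dataset used later in this article.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic data (an assumption for illustration, not the article's dataset):
# y rises roughly linearly with x, plus some random noise.
rng = np.random.default_rng(seed=0)
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=2.0, size=x.size)

# Pearson correlation coefficient: a value close to +1 indicates
# a strong positive linear relationship between x and y.
r = np.corrcoef(x, y)[0, 1]

# Draw each (x, y) pair as a point.
fig, ax = plt.subplots()
ax.scatter(x, y)
ax.set_xlabel("x (independent variable)")
ax.set_ylabel("y (dependent variable)")
ax.set_title(f"Synthetic data, Pearson r = {r:.2f}")
plt.show()
```

In the resulting plot the points should trend upward from left to right, which is the visual signature of positive correlation. Flipping the sign of the slope in the sketch would give a downward trend and a negative value of r.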
Python Implementation of a Scatter Plot
As an illustration, we will generate a scatter plot for the stock prices of some technology companies.
# import necessary libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# read dataset
url = 'https://raw.githubusercontent.com/bot13956/datasets/master/tech-stocks-04-2021.csv'
data = pd.read_csv(url)
# example 1: scatter plot for Apple and Tesla stock prices
x =…