Learn the basics of data visualization using scatter plots
About the Author
Dr. Tayo has written close to 300 articles and tutorials in data science to educate the general public.
After reading this article, the reader will be able to:
- Define a scatter plot
- Generate a scatter plot using Python
- Interpret a scatter plot
To learn about data visualization using line plots, please see the companion article "Data Visualization — Line Plot", which covers the basics of data visualization using line plots.
A scatter plot is one of the most useful types of data visualization in data science and machine learning. It is a simple two-dimensional plot in which each observation is drawn as a point, with the x-axis representing the independent variable and the y-axis representing the dependent variable.
From a scatter plot, one can judge whether there appears to be a relationship between the independent variable x and the dependent variable y. For example, if y tends to increase as x increases, then x and y are said to be positively correlated; if y tends to decrease as x increases, they are negatively correlated.
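To make the idea of correlation concrete before turning to real data, here is a minimal sketch on synthetic data. The variable names (`x`, `y_pos`, `y_neg`), the slopes, and the noise level are assumptions chosen purely for illustration and are not part of the stock dataset used later. The sketch draws two scatter plots side by side and reports the Pearson correlation coefficient for each.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic data for illustration only (not the article's stock dataset)
rng = np.random.default_rng(seed=0)
x = np.linspace(0, 10, 50)
y_pos = 2.0 * x + rng.normal(scale=2.0, size=x.size)   # y tends to rise with x
y_neg = -2.0 * x + rng.normal(scale=2.0, size=x.size)  # y tends to fall with x

# Pearson correlation coefficient: off-diagonal entry of the 2x2 matrix
r_pos = np.corrcoef(x, y_pos)[0, 1]
r_neg = np.corrcoef(x, y_neg)[0, 1]

# Draw the two scatter plots side by side
fig, axes = plt.subplots(1, 2, figsize=(10, 4))
axes[0].scatter(x, y_pos)
axes[0].set_title(f"Positive correlation (r = {r_pos:.2f})")
axes[1].scatter(x, y_neg)
axes[1].set_title(f"Negative correlation (r = {r_neg:.2f})")
for ax in axes:
    ax.set_xlabel("x (independent variable)")
    ax.set_ylabel("y (dependent variable)")
fig.tight_layout()
plt.show()
```

A coefficient close to +1 goes with points that rise from left to right, and a coefficient close to -1 goes with points that fall. A value near 0 suggests no linear relationship, although a nonlinear pattern may still be visible in the plot.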
Python Implementation of a Scatter Plot
As an illustration, we will generate a scatter plot for the stock prices of some technology companies.
# import necessary libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# read dataset
url = 'https://raw.githubusercontent.com/bot13956/datasets/master/tech-stocks-04-2021.csv'
data = pd.read_csv(url)
# example 1: scatter plot for Apple and Tesla stock prices