Download a file from the internet using the R functions download.file() and read.csv()
This tutorial covers two ways of downloading a dataset from the internet in R. Data for analysis often comes from online sources, so fetching a file from a URL is a common first step. The step-by-step instructions below apply to any file type, but we focus on comma-separated values (csv) files, since many datasets are saved in this format. As an example, we download the file “introduction_to_physics_grades.csv” from the following repository: https://github.com/bot13956/datasets.
Method 1: Using the download.file() function in R
Use the function setwd() to choose the directory where the file should be saved:
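A minimal sketch of this step, assuming the example directory that appears later in this tutorial. The variable name project_dir and the dir.create() safeguard are additions for illustration, so substitute a folder of your own.

```r
# Example directory from this tutorial; replace it with a folder on your machine.
project_dir <- "C:/Users/btayo/Desktop/grade_classifier"

# Added safeguard (not part of the original steps): create the folder if it is missing.
if (!dir.exists(project_dir)) {
  dir.create(project_dir, recursive = TRUE)
}

# Make it the working directory; files saved with a bare filename land here.
setwd(project_dir)

# Confirm the current working directory.
getwd()
```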
Then use the function download.file(url, filename) to download the file. Here, url is a string containing the URL of the file, and filename is the name of the destination file (R's documentation calls this argument destfile).
Notes on providing the correct URL
If you navigate to the github repository https://github.com/bot13956/datasets and click on the file: “introduction_to_physics_grades.csv”, it takes you to the following URL: https://github.com/bot13956/datasets/blob/master/introduction_to_physics_grades.csv
If you input this URL into your download.file() function, for example using the command:
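The command in question would look something like the sketch below. The destination name grades.csv matches the file name used in this tutorial; the variable name blob_url is only illustrative.

```r
# URL of the GitHub page that displays the file (not the file itself).
blob_url <- "https://github.com/bot13956/datasets/blob/master/introduction_to_physics_grades.csv"

# Saves whatever the server returns at that URL as grades.csv in the working directory.
download.file(blob_url, "grades.csv")
```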
you get the following messages:
Content type ‘text/html; charset=utf-8’ length unknown
downloaded 195 KB
The download itself succeeded, but the result is not the file we wanted: the content type is text/html, because this URL points to the GitHub web page that displays the file rather than to the file itself. If you navigate to your working directory (“C:/Users/btayo/Desktop/grade_classifier” in this example) and open the downloaded “grades.csv” file, you will see that it contains HTML rather than comma-separated data. Because we are downloading a csv file, we want the content type to be text/plain, not text/html.
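One way to check this without opening the file by hand is to look at its first few lines. The helper below is a rough sketch: the name looks_like_html and the choice of 20 lines are assumptions made for illustration, and the test is a simple heuristic rather than a full content check.

```r
# Hypothetical helper: returns TRUE if the first lines of a file contain HTML markers.
looks_like_html <- function(path, n = 20) {
  # Read at most n lines; warn = FALSE silences the missing-final-newline warning.
  first_lines <- readLines(path, n = n, warn = FALSE)
  any(grepl("<!DOCTYPE html|<html", first_lines, ignore.case = TRUE))
}

# Example usage (assumes grades.csv exists in the working directory):
# looks_like_html("grades.csv")
```

A result of TRUE suggests that a web page was saved instead of the csv data.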
To download the csv file in the text/plain format, navigate to the GitHub repository: https://github.com/bot13956/datasets
Then click on the csv file: “introduction_to_physics_grades.csv”
Then click on the Raw button on the top right. This opens the raw contents of the file as plain text in the browser.
Now copy the URL on this page: https://raw.githubusercontent.com/bot13956/datasets/master/introduction_to_physics_grades.csv
This is the URL that you should use as the argument in the download.file() function. The correct code is thus:
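A sketch of that command, again saving the result as grades.csv in the working directory (the variable name raw_url is only illustrative):

```r
# Raw URL copied from the page opened by the Raw button.
raw_url <- "https://raw.githubusercontent.com/bot13956/datasets/master/introduction_to_physics_grades.csv"

# Download the csv contents and save them as grades.csv in the working directory.
download.file(raw_url, "grades.csv")
```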
Note that once this command is issued, the following messages are produced:
Content type ‘text/plain; charset=utf-8’ length 9562 bytes
downloaded 9562 bytes
This shows that the file has been downloaded in the correct format with content type set to text/plain.
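In this example, the two URLs differ only in the host name and the /blob/ path segment, so the raw URL can also be derived from the page URL with string substitution. This is a sketch that assumes the same pattern holds for the file you are downloading; the variable names are illustrative.

```r
# URL of the GitHub page that displays the file.
blob_url <- "https://github.com/bot13956/datasets/blob/master/introduction_to_physics_grades.csv"

# Swap the host name, then drop the "/blob/" path segment (first occurrence only).
raw_url <- sub("https://github.com/", "https://raw.githubusercontent.com/", blob_url, fixed = TRUE)
raw_url <- sub("/blob/", "/", raw_url, fixed = TRUE)

raw_url
```

The result matches the URL copied from the Raw page above.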
To view and analyze the data contained in the downloaded “grades.csv” file, you may use the following commands:
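For instance, the commands might look like the following sketch. The data frame name grades is an assumption, and the first lines fetch the file again if it is not already present so that the block runs on its own.

```r
# Fetch the file only if it is not already in the working directory.
raw_url <- "https://raw.githubusercontent.com/bot13956/datasets/master/introduction_to_physics_grades.csv"
if (!file.exists("grades.csv")) {
  download.file(raw_url, "grades.csv")
}

# Read the local csv file into a data frame.
grades <- read.csv("grades.csv")

head(grades)     # first six rows
str(grades)      # column names and types
summary(grades)  # per-column summary statistics
```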
Method 2: Using the read.csv() function in R
We can use the read.csv() function to read the data directly from the URL into our workspace, without first saving it as a file, and assign it to a new data frame using the following command:
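A minimal sketch, assuming the same raw URL as in Method 1; the names raw_url and grades are illustrative.

```r
# Raw URL of the csv file (the same one used with download.file() in Method 1).
raw_url <- "https://raw.githubusercontent.com/bot13956/datasets/master/introduction_to_physics_grades.csv"

# read.csv() accepts a URL in place of a local file path.
grades <- read.csv(raw_url)

# Inspect the first few rows.
head(grades)
```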
There are many different ways of downloading datasets from the internet. I find that the functions download.file() and read.csv() provide a simple and straightforward way to do so: download.file() saves a copy of the file locally, while read.csv() loads the data directly into a data frame.