Use functions such as download.file(), read.csv() and pd.read_csv() to read a CSV file from the internet directly into your R or Python code

Image by Benjamin O. Tayo

Introduction

Before analyzing data, we need a reliable source for it. The internet is one such source: many websites offer datasets for analysis or model building. Data comes in different formats, such as numerical, text, voice, image, or video data. In this article, we focus on numerical data stored in comma-separated values (CSV) files.

Examples of free datasets stored as CSV files that can be downloaded for analysis include the following:

a) University of California Irvine (UCI) Machine Learning Repository

As a service to the machine learning community, UCI currently maintains 487 datasets that can be used for data analysis practice, homework, and projects in data science courses and workshops. …
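As a rough sketch of the download step this article is about, the Python snippet below reads a CSV file from a URL using only the standard library. The helper name read_csv_from_url and the URL in the usage comment are assumptions made for illustration and do not come from the article. With pandas installed, pd.read_csv() accepts a URL directly and returns a DataFrame instead of plain lists.

```python
import csv
import io
import urllib.request


def read_csv_from_url(url, encoding="utf-8"):
    """Fetch a CSV file from a URL and return (header, rows).

    Hypothetical helper for illustration: every value is returned as a
    string, and the first line is assumed to be the header row.
    """
    # Download the raw bytes and decode them to text.
    with urllib.request.urlopen(url) as response:
        text = response.read().decode(encoding)

    # Parse the text as CSV, one list of strings per line.
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return [], []
    return rows[0], rows[1:]


# Usage (placeholder URL, replace with a real CSV location):
# header, rows = read_csv_from_url("https://example.com/data.csv")
```

Reading everything as strings keeps the sketch short; numeric columns would still need converting, which pandas handles automatically.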


Photo by Barn Images on Unsplash

As a data scientist, you can only perform tasks for which you have the right tools

I. Introduction

Data science is a very broad multi-disciplinary field that includes several subdivisions such as data visualization, machine learning, and artificial intelligence. Because the field is so broad and changes constantly with technological innovation and the development of new algorithms, a successful data scientist has to maintain a large, up-to-date toolbox. Keep in mind that as a data scientist, you can only perform tasks for which you have the right tools. This article discusses several tools that belong in a data science toolbox.

II. Knowledge-based Tools

Knowledge-based tools can be grouped into three main categories based on the level of data science tasks involved: level 1 (basic level); level 2 (intermediate level); and level 3 (advanced level). …


Photo by Arno Smit on Unsplash

Data Science

Intensive variables tell us much more about a system than extensive variables.

I. Introduction

In physics, an extensive variable is one that depends on system size (like mass or volume). On the other hand, an intensive variable is one which does not depend on system size (like temperature, pressure, or density). While it may not be immediately obvious, intensive variables tell us much more about the system than extensive variables.

Comparing features based on an extensive scale is called absolute comparison. Likewise, comparing features based on an intensive scale is called relative comparison.
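The distinction can be sketched with the physical quantities mentioned above. In this minimal Python example, two samples of the same material differ in size: comparing their masses is an absolute comparison, while comparing their densities is a relative one. The sample values and the names density, small, and large are made up for illustration.

```python
def density(mass_kg, volume_m3):
    """Intensive quantity built from two extensive ones: mass / volume."""
    return mass_kg / volume_m3


# Two hypothetical samples of the same material; the second is twice as large.
small = {"mass_kg": 2.0, "volume_m3": 0.5}
large = {"mass_kg": 4.0, "volume_m3": 1.0}

# Absolute comparison (extensive scale): the masses differ with system size.
mass_ratio = large["mass_kg"] / small["mass_kg"]

# Relative comparison (intensive scale): the densities are identical.
density_small = density(**small)
density_large = density(**large)

print(mass_ratio)                      # 2.0
print(density_small, density_large)    # 4.0 4.0
```

The mass alone only says that one sample is bigger, whereas the density shows that both samples are the same kind of material.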

To illustrate the difference between extensive and intensive variables, let us consider two hypothetical players in the National Basketball Association (NBA). …

About

Benjamin Obi Tayo Ph.D.

Physicist, Data Science Educator, Writer. Interests: Data Science, Machine Learning, AI, Python & R, Predictive Analytics, Materials Sciences, Biophysics
