*“Everything should be made as simple as possible, but not simpler.”* **Albert Einstein**

A machine learning algorithm (such as classification, clustering, or regression) uses a training dataset to determine weight factors that can be applied to unseen data for predictive purposes. Before implementing a machine learning algorithm, it is necessary to select only the relevant features in the training dataset. The process of transforming a dataset in order to select only the relevant features necessary for training is called dimensionality reduction. Feature selection and dimensionality reduction are important for three main reasons:

**Prevents Overfitting**: A high-dimensional dataset having too many features…
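As a rough illustration of keeping only relevant features, the sketch below drops columns that barely vary across the training rows. This variance-threshold filter is just one simple form of feature selection, chosen here for brevity; the function name `select_features`, the `min_variance` cutoff, and the toy data are all assumptions made for the example.

```python
from statistics import pvariance


def select_features(rows, names, min_variance=1e-8):
    """Keep only the columns whose population variance exceeds min_variance.

    rows  : list of equal-length numeric lists (one list per sample)
    names : feature names, one per column
    """
    columns = list(zip(*rows))  # transpose: one tuple per feature
    keep = [i for i, col in enumerate(columns) if pvariance(col) > min_variance]
    reduced = [[row[i] for i in keep] for row in rows]
    return reduced, [names[i] for i in keep]


# Hypothetical toy dataset: the second column is constant and carries no information.
rows = [
    [1.0, 5.0, 0.2],
    [2.0, 5.0, 0.4],
    [3.0, 5.0, 0.1],
    [4.0, 5.0, 0.9],
]
names = ["size", "constant", "noise"]

reduced, kept = select_features(rows, names)
print(kept)  # the constant column is dropped
```

A constant column cannot help a model distinguish one sample from another, so removing it shrinks the dataset without losing predictive information.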

This article will discuss the difference between model parameters and hyperparameters. In a machine learning model, there are two types of parameters:

**Model Parameters:** These are the parameters in the model that must be determined using the training data set. These are the fitted parameters.

**Hyperparameters:** These are adjustable parameters that must be tuned in order to obtain a model with optimal performance.
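To make the distinction concrete, here is a minimal sketch that fits a one-feature line by gradient descent. The slope `w` and intercept `b` are model parameters learned from the data, while `learning_rate` and `n_epochs` are hyperparameters chosen before training. The function name, the default values, and the toy data are assumptions made for illustration only.

```python
def fit_line(xs, ys, learning_rate=0.05, n_epochs=2000):
    """Fit y ~ w * x + b by minimising mean squared error with gradient descent.

    learning_rate and n_epochs are hyperparameters (set by the user);
    w and b are model parameters (determined from the training data).
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(n_epochs):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Toy training set generated from y = 2x + 1 (hypothetical data).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

w, b = fit_line(xs, ys)
print(round(w, 3), round(b, 3))  # fitted values should be close to 2 and 1
```

Changing `learning_rate` or `n_epochs` changes how training proceeds, but those values are never learned from the data. Only `w` and `b` are.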

For example, suppose you want to build a simple linear regression model using an m-dimensional training data set. Then your model can be written as:

This article will discuss 10 Essential Skills You Need to Know to Start Doing Data Science. The list is not all-encompassing, but it gives the basic skills necessary for launching your data science career.

Statistics and probability are used for visualization of features, data preprocessing, feature transformation, data imputation, dimensionality reduction, feature engineering, model evaluation, etc. Here are the topics you need to be familiar with:

a) Mean

b) Median

c) Mode

d) Standard deviation/variance

e) Correlation coefficient and the covariance matrix

f) Probability distributions (Binomial, Poisson, Normal)

g) p-value

h) MSE (mean square error)

i) R² score

j) Bayes’…

One way to apply your knowledge in data science is through projects. In this article, we present 4 problems for data science practice. These problems are good for beginners who are looking for challenge problems to hone their skills in data science.

Solve these problems using either R or Python. For questions, inquiries, or sample solutions, please email me at **benjaminobi@gmail.com**.

**Problem 1 (Data Visualization — Basic)**: Use the gdp.csv dataset to generate a barplot to display the 2016 GDP (Gross Domestic Product) for the selected countries. Rank the countries from lowest to highest GDP. …

*(I) Statistics and Probability*

Statistics and probability are used for visualization of features, data preprocessing, feature transformation, data imputation, dimensionality reduction, feature engineering, model evaluation, etc. Here are the topics you need to be familiar with:

a) Mean

b) Median

c) Mode

d) Standard deviation/variance

e) Correlation coefficient and the covariance matrix

f) Probability distributions (Binomial, Poisson, Normal)

g) p-value

h) MSE (mean square error)

i) R² score

j) Bayes’ Theorem (Precision, Recall, Positive Predictive Value, Negative Predictive Value, Confusion Matrix, ROC Curve)

k) A/B Testing

l) Monte Carlo Simulation
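Several of the quantities listed above can be computed with Python's standard library alone. The sketch below is only an illustration on made-up numbers: the helper names `pearson_r`, `mse`, and `r2_score` are defined here for the example and are not taken from any particular library.

```python
from statistics import mean, median, mode, pstdev


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def mse(y_true, y_pred):
    """Mean squared error between observed and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical sample used only to exercise the functions.
values = [2, 4, 4, 5, 7, 8]
print(mean(values), median(values), mode(values), pstdev(values))

x = [1.0, 2.0, 3.0, 4.0]
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.5, 7.0, 9.0]
print(pearson_r(x, y_true), mse(y_true, y_pred), r2_score(y_true, y_pred))
```

Here `pstdev` is the population standard deviation; `statistics.stdev` gives the sample version, which divides by n - 1 instead of n.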

*(II) Multivariable Calculus*

Most machine learning models are built with…

In a series of articles, we will examine several soft skills essential for success in data science. The topics to be covered in the series are:

**Essential Data Science Soft Skills — Growth Mindset**

**Essential Data Science Soft Skills — Team Player Skills**

**Essential Data Science Soft Skills — Communication Skills**

**Essential Data Science Soft Skills — Patience**

This article will focus on the growth mindset.

Whether you are a student in training or a practitioner, to be a successful data science professional, you need to cultivate a growth mindset. This is essential because the field of data science is…

The field of data science is considered to be among the hottest fields, and a growing number of people are attracted to it because of the increasing demand for data science professionals. Even though the term “data science” has recently gained a lot of popularity, the field itself is not entirely new. Data science techniques have been used in several branches of the social, applied, and pure sciences, such as mathematics, physics, economics, statistics, finance, engineering, and business. For example, statistical and data science techniques have been used in quantitative finance for analyzing the performance of the U…

Mathematics remains a major hindrance for beginners trying to get into data science, and most of them are concerned about the math requirements. Data science is a very quantitative field that requires advanced mathematics, but to get started you only need to master a few math topics. In this series of articles, we will dive deep into the essential math topics that should be reviewed before embarking on a data science journey. The topics to be covered in the series are:

**Functions**

**Plotting and Data Visualization**

**Linear Regression**

**Statistics and Probability**

**…**

*Disclaimer*: *This article is meant to share some basic knowledge about personal finance and wealth building, and in no way should be considered as investment advice.*

Everyone is being told that hand-picking stocks is very risky, and that investing in the entire market, for example through index funds that mimic the total market, is the best way to mitigate risk while ensuring a pretty decent return (historically, the average return of the S&P 500 is roughly 7 to 12% annually). …

*The challenge of implementing machine learning models without a good understanding of the underlying math and programming skills may lead to a black-box approach in data science training.*

Data Science, Machine Learning, and Analytics are considered to be among the hottest career paths. The demand for skilled data science practitioners in industry, academia, and the government is rapidly growing. This has given rise to a proliferation of massive open online courses (MOOCs) covering different areas of data science and machine learning. The most popular providers of MOOCs include the following:

a) **edX**: https://www.edx.org/

b) **Coursera**: https://www.coursera.org/

c) **DataCamp**: https://www.datacamp.com/