Jitesh Khurkhuriya

Before you decide to become a data scientist, a lot of questions might be on your mind. Is it a great career option? Can I learn data science on my own? Is it worth all the investment and time? I will try and answer all of your questions.

What is data science?

Wikipedia defines data science as,

“A concept to unify statistics, data analysis, machine learning, domain knowledge, and their related methods to understand and analyze actual phenomena with data. It uses techniques and theories drawn from many fields within the context of Mathematics, Statistics, Computer Science, domain knowledge, and information science.”

Is Data Science a good career?

Data science is a great career option: data scientists make an average of $113,000 to $140,000 per year. Learning data science can help you enter this rewarding field. Picking up some of these skills on your own may sound hard, and it can sometimes be expensive.

Is it worth all the investment?

To avoid the expense and time of top university courses, you can learn data science on your own. The biggest challenge in self-learning is not knowing what to learn, how much to learn, or in what order to learn it. So what should you do?

5 skills you need to become a data scientist

Subject matter expertise differs for every individual. Apart from that, let us see why you need to learn the following skills.

Most algorithms in machine learning will be expressed in mathematical terms. Hence, it’s an absolute must to familiarize yourself with and brush up on some of the basic concepts of exponents, logarithms, polynomial equations, factoring, quadratic equations, and functions.

Basic knowledge of probability is good enough: the common terms used in probability, conditional probability, random processes, and random variables.
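To make conditional probability concrete, here is a minimal sketch that counts outcomes for two fair dice. The events chosen (a total of 8, and an even first die) and the helper name `probability` are my own illustration, not part of any library.

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 equally likely outcomes of rolling two fair dice.
outcomes = list(product(range(1, 7), repeat=2))


def probability(event, space):
    """Share of outcomes in `space` for which `event` is true."""
    favorable = [o for o in space if event(o)]
    return Fraction(len(favorable), len(space))


def total_is_8(outcome):
    return outcome[0] + outcome[1] == 8


def first_is_even(outcome):
    return outcome[0] % 2 == 0


# Unconditional probability P(A): the total is 8.
p_a = probability(total_is_8, outcomes)

# Conditional probability P(A | B): restrict the sample space to
# the outcomes where B (first die is even) has happened.
given_b = [o for o in outcomes if first_is_even(o)]
p_a_given_b = probability(total_is_8, given_b)

print("P(total is 8) =", p_a)
print("P(total is 8 | first die is even) =", p_a_given_b)
```

Knowing that the first die is even changes the probability of the total being 8, which is exactly what conditioning means.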

Descriptive statistics help us capture some simple but crucial aspects of the data. They summarize the data in two ways: measures of central tendency identify one central value around which the majority of the data is located, and measures of dispersion describe how spread out the data is.
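As a small illustration of central tendency and dispersion, the sketch below uses only Python's built-in `statistics` module. The `sales` values are made up for the example.

```python
import statistics

# Hypothetical sample: daily sales counts (made-up values for illustration).
sales = [12, 15, 11, 15, 18, 20, 15, 13]

# Central tendency: one value around which the data is located.
mean = statistics.mean(sales)
median = statistics.median(sales)
mode = statistics.mode(sales)

# Dispersion: how spread out the data is.
data_range = max(sales) - min(sales)
std_dev = statistics.stdev(sales)  # sample standard deviation

print("mean:", mean, "median:", median, "mode:", mode)
print("range:", data_range, "standard deviation:", round(std_dev, 2))
```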

As the name suggests, inferential statistics help us draw inferences or conclusions about the entire population of data based on samples. Some of the key concepts are probability distributions, the bell curve or normal distribution, the central limit theorem, confidence intervals, and hypothesis testing.

Inferential statistics create probability distributions to draw inferences about the data. So it is absolutely essential to have some basic understanding of the probability terms.
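Here is a rough sketch of one of these ideas, a 95% confidence interval for a mean, using only the standard library. The sample is simulated, and the interval uses the normal approximation, which is an assumption on my part; for small samples a t-distribution is usually preferred.

```python
import math
import random
import statistics

# Simulated sample (an assumption for illustration): 50 heights in cm.
random.seed(42)
sample = [random.gauss(170, 10) for _ in range(50)]

n = len(sample)
mean = statistics.mean(sample)

# Standard error of the mean: sample standard deviation / sqrt(n).
sem = statistics.stdev(sample) / math.sqrt(n)

# z value that leaves 2.5% in each tail of the standard normal curve.
z = statistics.NormalDist().inv_cdf(0.975)

lower = mean - z * sem
upper = mean + z * sem

print(f"sample mean: {mean:.2f}")
print(f"95% confidence interval: ({lower:.2f}, {upper:.2f})")
```

The interval is an inference about the population mean drawn from one sample, which is the core idea of inferential statistics.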

Sometimes, a simple visualization of the data can help us draw inferences or identify patterns in the data. Some of the basic plots that help to visualize data are histograms, bar charts, line plots, scatter plots, and box plots. Focus on drawing the charts with default parameters first, and then move on to customization.
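A minimal sketch of three of these plots with Matplotlib, mostly with default parameters. The data is randomly generated, and the output file name `basic_plots.png` is just a placeholder I picked.

```python
import random

import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so this runs without a display
import matplotlib.pyplot as plt

# Made-up data for illustration.
random.seed(0)
x = [random.gauss(50, 10) for _ in range(200)]
y = [0.5 * xi + random.gauss(0, 5) for xi in x]

# Three charts side by side, each drawn with default settings.
fig, axes = plt.subplots(1, 3)

axes[0].hist(x)
axes[0].set_title("Histogram")

axes[1].scatter(x, y)
axes[1].set_title("Scatter plot")

axes[2].boxplot(x)
axes[2].set_title("Box plot")

fig.savefig("basic_plots.png")
```

Once the defaults feel familiar, try changing one thing at a time, such as the number of histogram bins or the figure size.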

Qualitative data is alphanumeric or text data, like the color of a car, gender, marital status, and so on. As with numerical features, visualizing qualitative data graphically can help us identify similarities or relationships among different data elements.

Get a basic understanding of all the data types, with particular emphasis on string, numeric, and list variables. Get some good hands-on practice with if-else statements, for loops, and while loops. Remember, you don’t need to be an expert in Python programming; you should know the basics needed for data science and machine learning. One of the biggest strengths that Python has over other languages is the availability of various modules and functions needed for data science. Focus on learning the math functions, the date module, string functions and methods (particularly length, slicing and indexing, split, and strip), and list functions and methods for search and length, as well as how to handle multidimensional lists.
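The basics listed above fit in a few lines. This is only a sketch with made-up values; every variable name here is my own.

```python
import math
from datetime import date

# String methods: strip removes surrounding spaces, split breaks on a separator.
raw = "  Toyota, Honda ,Ford  "
brands = [name.strip() for name in raw.strip().split(",")]

first_three = brands[0][:3]       # slicing: first three characters
last_brand = brands[-1]           # negative indexing: last item
count = len(brands)               # length of the list
position = brands.index("Honda")  # search: position of an item

# Multidimensional list: a list of rows.
grid = [[1, 2, 3], [4, 5, 6]]
center = grid[1][1]               # second row, second column

# Control flow: for loops with an if condition.
even_total = 0
for row in grid:
    for value in row:
        if value % 2 == 0:
            even_total += value

# Math and date modules.
root = math.sqrt(16)
days_between = (date(2020, 11, 30) - date(2020, 9, 1)).days

print(brands, first_three, last_brand, count, position)
print(center, even_total, root, days_between)
```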

Pandas is a must-have tool in every data scientist’s tool kit: it is efficient, easy to learn, and versatile in processing all kinds of data. It is important to learn how to read data from different types of sources. You may also have to convert text values to numerical data and split the data into training and test sets, so learn the modules that do this as well.
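A minimal sketch of these data preparation steps with pandas and scikit-learn. The small table is made up; with a real file you would typically start from `pd.read_csv` instead. Filling missing values with the column mean and one-hot encoding with `pd.get_dummies` are just one reasonable choice among several.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical data built in memory (made-up values for illustration).
df = pd.DataFrame({
    "color": ["red", "blue", "red", "green", "blue", "green", "red", "blue"],
    "mileage": [30.0, 45.5, None, 12.0, 60.2, 25.1, 33.3, 48.9],
    "price": [20, 15, 18, 30, 10, 22, 19, 14],
})

# Check for and replace missing values (here: with the column mean).
df["mileage"] = df["mileage"].fillna(df["mileage"].mean())

# Convert the text column into numeric indicator columns.
features = pd.get_dummies(df[["color", "mileage"]], columns=["color"])
target = df["price"]

# Hold out 25% of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(
    features, target, test_size=0.25, random_state=42
)

print(features.columns.tolist())
print(len(X_train), "training rows,", len(X_test), "test rows")
```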

One of the first models everyone learns during their machine learning journey is regression. Regression helps us understand the relationship between different variables. It is used for predicting numerical values, such as the future price of a stock or the sales in the next quarter.
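To see regression in action, the sketch below fits scikit-learn's `LinearRegression` to synthetic data. The underlying relationship (y = 3x + 2 plus noise) is something I made up so the fitted values can be checked against it.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data (an assumption for illustration): y = 3x + 2 plus noise.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = 3.0 * X[:, 0] + 2.0 + rng.normal(0, 1.0, size=100)

# Fit a straight line to the data.
model = LinearRegression()
model.fit(X, y)

slope = model.coef_[0]
intercept = model.intercept_

# Predict a numerical value for an unseen input.
prediction = model.predict([[12.0]])[0]

print(f"slope: {slope:.2f}, intercept: {intercept:.2f}")
print(f"prediction for x = 12: {prediction:.2f}")
```

The fitted slope and intercept should land close to the values used to generate the data, which is a useful sanity check when you are learning.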

Like regression analysis, classification methods are popular, but they predict categorical outcomes. We can predict outcomes like “Will this customer buy my product?” or “Will this customer default on the loan repayments?” Learning logistic regression, decision trees, and support vector machines is important for solving classification problems.
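Here is a small sketch comparing two of these classifiers on the Iris dataset that ships with scikit-learn. The split size, `random_state`, and `max_iter` values are arbitrary choices of mine, not tuned settings.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Iris: 150 flowers, 4 numeric features, 3 species to predict.
X, y = load_iris(return_X_y=True)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "decision tree": DecisionTreeClassifier(random_state=0),
}

# Train each model and record its accuracy on the held-out test data.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)
    print(f"{name}: accuracy = {scores[name]:.2f}")
```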

Deep learning is one of the most important topics to learn in data science and machine learning. With deep learning, one can process a large number of features, and the ability to build large neural networks can increase accuracy. Its most significant advantage is the ability to learn features incrementally from the data, which reduces the need for domain expertise.

So how do you do it?

Use the chart below for guidance on what to learn and how much time to spend on it. 

| Section | Topic | Subtopic/Library/Module | Min Time (hrs) | Max Time (hrs) |
| --- | --- | --- | --- | --- |
| Mathematics | Basic Algebra | Basic Concepts of Exponents, Log, Polynomials, Quadratic equations, and Functions | 1 | 2 |
| | Calculus | Rate of Change; Limits of a function; Derivative; Partial Derivative | 1 | 2 |
| | Linear Algebra | Vectors; Matrix; Vector Transformation; Eigen Vectors and Eigen Values | 4 | 6 |
| | Probability | Basic terms of Probability; Conditional Probability; Random Processes; Random Variables | 1 | 2 |
| Statistics | Descriptive Statistics | Central Tendency of Data; Measure of Dispersion; Correlation among variables | 2 | 4 |
| | Inferential Statistics | Probability Distributions; Normal Distributions; Central Limit Theorem; Confidence Interval; Hypothesis Testing | 12 | 16 |
| Data Visualization | Charts for Numerical Data | Matplotlib Library; Scatter Plot; Line Plot; Histogram; Bar Chart; Box Plot | 2 | 4 |
| | Charts for Categorical Data | Matplotlib Library; Histogram; Pie Charts | 1 | 2 |
| | Chart Customization | Matplotlib Library; Figures; Subplots; Editing chart elements | 2 | 4 |
| Python Programming | Data Types | String; Integer and Float; List; Tuples; Dictionary | 1 | 2 |
| | Control Flow | If-Else; For Loops; While Loops | 2 | 4 |
| | File Processing | Processing of various file types like csv, tsv and text files | 2 | 4 |
| | Modules and Functions | Math; Date; String functions of Split, Strip; List sort, len | 4 | 6 |
| Machine Learning | Data Processing | Read Dataset using Pandas; Access Data; Check for and replace missing values; Convert categorical to numeric; scikit learn preprocessing; scikit learn model_selection.train_test_split | 16 | 20 |
| | Regression | scikit learn linear_model.LinearRegression; scikit learn preprocessing.PolynomialFeatures | 12 | 16 |
| | Classification | scikit learn linear_model.LogisticRegression; scikit learn svm.SVC; scikit learn tree.DecisionTreeClassifier; scikit learn ensemble.RandomForestClassifier | 16 | 20 |
| | Feature Selection | scikit learn feature_selection.RFE; scikit learn feature_selection.GenericUnivariateSelect | 12 | 16 |
| | Model Tuning and Model Selection | scikit learn model_selection.cross_val_score; scikit learn model_selection.GridSearchCV; scikit learn model_selection.RandomizedSearchCV | 16 | 20 |
| | Deep Learning | Keras Model Building; Layers; Activation Functions; Loss Functions; Optimization; Initializers; Compile the Keras Neural Network | 24 | 32 |
| Practice Projects | Project 1 | Boston House Price Predictions | 8 | 16 |
| | Project 2 | Bike Demand Predictions | 16 | 24 |
| | Project 3 | Automobile Price Predictions | 8 | 16 |
| | Project 4 | Iris Species Classification | 4 | 8 |
| | Project 5 | Pima Indians Diabetes Classification | 4 | 8 |
| | Project 6 | Wine Quality Predictions | 4 | 8 |
| | Project 7 | Bank Telemarketing | 8 | 16 |
| Approximate Total Hours | | | 183 | 278 |
| Total weeks with 20 hours per week | | | 9 Weeks | 12 Weeks |
| Total weeks with 40-50 hours per week | | | 4 Weeks | 6 Weeks |

How deep should you go with each topic?

Within 4 to 12 weeks, you will have acquired enough skills to start your journey in the field of data science. Enter your target dates in the plan shared earlier and get started. To build some positive pressure, don’t forget to print the plan and pin it up. My course on Udemy covers each of these topics in detail and will help you get started on your data science journey with great confidence. Learn how to make your own data science portfolio in this blog article.

Page Last Updated: November 2020
