So you like the sound of working with data and doing science, but what is data science? My favorite approach to answering this question is to focus on the responsibilities you’d have when holding a particular title at an organization.

## What is data science?

Data science is best defined as a process, not a thing. As the business community became more data-driven, demand for data professionals grew. Collecting more data meant more frequent data updates and a wider variety of data to process. That called for a transparent, iterative process involving several kinds of professionals, which in turn led to the creation of the data scientist and data engineer positions. Let’s briefly consider four roles within data science to help us answer what data science is.

### What is a data scientist?

At the top of the heap is the data scientist. Data scientists work on specific and complex problems, with innovation being the goal. Like traditional scientists, data scientists use the scientific method. For example, they may develop a new data modeling process. This kind of innovation often entails research and interdisciplinary knowledge. These qualities are what make “scientist” a fair and accurate title.

Additionally, data scientists will be in charge of building and prototyping models resulting from the new processes. So, if you like the idea of offering the highest possible value through actionable insights, this may be the role for you. On the whole, a computer science background is advantageous, and math skills are necessary.

Example Project: Build a model to predict which bank transactions are fraudulent in real-time.

Skills Required: Programming languages (Python, SAS, R, SQL), math (linear algebra, probability, and statistics), and machine learning (e.g., deep learning).

### What is a business analyst?

Example Project: Figure out the extent to which debt will rise with the current downturn in the economy.

Skills Required: Programming languages (SQL), data visualization (Tableau, Power BI), and domain expertise.

### What is a data analyst?

The bridge between the data scientist and business analyst is the data analyst. The bridge is sometimes needed because the business analyst might lack the technical skills to communicate effectively with technical roles or to fetch the data needed for analysis. Accordingly, the data analyst focuses on the data itself (data architecture, data cleaning, and troubleshooting). In contrast, the business analyst focuses on business metrics and on using that data to make business decisions. Some companies separate data analysis from data science. Others consider them the same field. For details on the differences, visit Data Science Vs. Data Analytics.

Example Project: Exploratory data analysis (EDA)

Skills Required: Programming languages (SQL, Python, or R), data visualization (Tableau, Power BI), and data integrity.
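
To make the example project a little more concrete, here is a minimal EDA sketch in Python using only the standard library. The records and field names (`yearly_income`, `job_status`) are made-up assumptions for illustration; in practice the data would usually come from a SQL query, and analysts often reach for dedicated libraries instead.

```python
import statistics
from collections import Counter

# Hypothetical customer records; one income value is missing on purpose.
records = [
    {"yearly_income": 52000, "job_status": "employed"},
    {"yearly_income": 48000, "job_status": "employed"},
    {"yearly_income": None, "job_status": "student"},
    {"yearly_income": 250000, "job_status": "self-employed"},
]

# Separate the usable income values from the missing ones.
incomes = [r["yearly_income"] for r in records if r["yearly_income"] is not None]

summary = {
    "rows": len(records),
    "missing_income": len(records) - len(incomes),
    "mean_income": statistics.mean(incomes),
    "median_income": statistics.median(incomes),
    "job_status_counts": Counter(r["job_status"] for r in records),
}

for key, value in summary.items():
    print(f"{key}: {value}")
```

Even on four rows, the summary surfaces two typical EDA findings: a missing value to handle, and a mean pulled well above the median by a single large income.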

### What is a data engineer?

The data engineer is the implementer. As in other engineering disciplines, when the data scientist develops a process, the data engineer builds and implements it. To illustrate, you might design a machine learning pipeline, and the data engineer will have the necessary skills to build the pipeline according to your specifications.

Example Project: Define the structure of a machine learning pipeline by indicating preprocessing steps for each selected feature.

Skills Required: Programming languages (Python, Scala, SQL), NoSQL (MongoDB), and frameworks (Apache Spark, Hadoop).
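
One way to picture "indicating preprocessing steps for each selected feature" is a small specification that maps each feature to an ordered list of steps. The sketch below is an illustration under assumptions: the helper names (`fill_missing`, `log_scale`, `lowercase`), the feature names, and `PIPELINE_SPEC` are all hypothetical, and production pipelines are normally built with frameworks such as those listed above.

```python
import math

# Hypothetical preprocessing helpers; the names are illustrative only.
def fill_missing(default):
    """Return a step that replaces None with a default value."""
    return lambda value: default if value is None else value

def log_scale(value):
    """Compress large numeric ranges with log(1 + x)."""
    return math.log1p(value)

def lowercase(value):
    """Normalize text categories to lowercase."""
    return value.lower()

# The "specification": an ordered list of steps for each selected feature.
PIPELINE_SPEC = {
    "yearly_income": [fill_missing(0), log_scale],
    "job_status": [fill_missing("unknown"), lowercase],
}

def preprocess(record, spec):
    """Apply each feature's steps in order; features not in the spec are dropped."""
    cleaned = {}
    for feature, steps in spec.items():
        value = record.get(feature)
        for step in steps:
            value = step(value)
        cleaned[feature] = value
    return cleaned

result = preprocess(
    {"yearly_income": None, "job_status": "Employed", "name": "Ana"},
    PIPELINE_SPEC,
)
print(result)
```

Keeping the specification as plain data separates the decision about which steps apply to which feature from the code that executes them, which is roughly the hand-off between the two roles described here.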

## What is data?

You have seen the word “data” a lot thus far, but it’s not a word many of us consciously try to define in daily life. Nevertheless, it can help us answer the critical question: what is data science? Additionally, data can mean slightly different things based on the context, but we will focus on its relevance to data science.

Data can be regarded as pure facts that, when analyzed, can produce valuable information or insights. It comes in different forms, such as a numeric value (e.g., a salary) or text (e.g., social media posts).

Data is first collected. All companies keep basic information about their customers, but many organizations store many additional variables. For example, a bank will ask for your name, address, yearly income, and job status, to name just some of the data it may hold on you.

But where is the data stored? Commonly, in relational databases (e.g., Microsoft SQL Server) and non-relational databases (e.g., MongoDB).

Relational databases look like an Excel spreadsheet, where data is in a row-column structure. Non-relational document databases such as MongoDB store records that look similar to JSON files.
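
The difference between the two shapes is easier to see side by side. The following is a minimal sketch, not a recommendation of tools: an in-memory SQLite database stands in for a relational server, plain Python dictionaries stand in for JSON-like documents, and the table, field names, and customer values are invented for illustration.

```python
import json
import sqlite3

# Relational: fixed columns, one row per customer.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (name TEXT, yearly_income INTEGER, job_status TEXT)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [("Ana", 52000, "employed"), ("Ben", 0, "student")],
)
rows = conn.execute(
    "SELECT name, yearly_income FROM customers WHERE yearly_income > 10000"
).fetchall()
conn.close()

# Document-style: nested, JSON-like records whose fields may differ per record.
documents = [
    {"name": "Ana", "job": {"status": "employed", "yearly_income": 52000}},
    {"name": "Ben", "job": {"status": "student"}, "tags": ["new"]},
]
high_earners = [
    d["name"] for d in documents if d["job"].get("yearly_income", 0) > 10000
]

print(rows)
print(high_earners)
print(json.dumps(documents[1]))
```

Note that the second document has no `yearly_income` field and carries an extra `tags` field, something the fixed-column table would not allow without a schema change.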

Finally, unstructured data (data sourced from PDF files or text files) requires particular expertise to prepare and analyze. These additional skills are another aspect that makes data scientists and data analysts indispensable to an organization.

### Can I expect this clear delimitation between positions in the real world?

Not quite. Roles vary by company, because an organization’s size and data maturity largely determine what it needs. The industry is also a relevant factor. Likewise, you can often solve the same problem through different approaches. Let’s consider an example where the problem is the same, but you can choose between two different paths.

### Example scenario: A company named Data Solutions Corp. needs a marketing strategy

Setting aside size and data maturity for now, Data Solutions Corp. could address this scenario in at least two ways.

#### 1st option

Hire a data scientist who will create new processes based on research and then develop a model that tailors the marketing strategy to each individual customer.

Disadvantage: The solution is vastly more expensive and will take longer to build.

#### 2nd option

Hire a business analyst to use a simple, already-existing machine-learning algorithm to segment the customer base into various clusters. You can then develop unique marketing strategies to target each group/cluster.

Disadvantage: The solution is not as personalized as the first option.

Depending on the company’s specific situation, either decision can make great sense and produce positive outcomes.
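
The segmentation described in the second option can be sketched with k-means, one common off-the-shelf clustering algorithm. The article does not name a specific algorithm, so treat k-means as an assumption, along with the made-up customer data (yearly spend, visits per month) and the naive choice of starting centroids. In real use, features would normally be scaled first so that one of them does not dominate the distance.

```python
import math

def kmeans(points, k, iterations=20):
    """Plain k-means: assign each point to its nearest centroid, then recompute centroids."""
    centroids = list(points[:k])  # naive, deterministic initialization (sketch only)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: index of the closest centroid for every point.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(
                    sum(axis) / len(members) for axis in zip(*members)
                )
    return labels, centroids

# Hypothetical customers: (yearly spend, store visits per month).
customers = [(200, 1), (5000, 12), (220, 2), (5200, 10), (180, 1), (4800, 11)]

labels, centroids = kmeans(customers, k=2)
for customer, label in zip(customers, labels):
    print(customer, "-> segment", label)
```

Each resulting segment would then get its own marketing strategy, which is cheaper than a per-customer model but, as noted above, less personalized.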

## More examples of what data scientists do

Okay, so what else does a data scientist do? Previously, I mentioned that a data scientist is an innovator who develops new modeling processes and builds machine learning models, but what else?

In the deep learning realm, one fascinating application is the Tree of Knowledge. I studied psychology for six years, and this is an entirely novel way to understand complex human behavior. The approach allows us to bypass previous restrictions like only learning from one kind of study at a time.

Data scientists also develop recommender systems. When you buy a book on Amazon or a course on Udemy, you get recommendations of other books or other courses you might like. This is the work of a data scientist. All the top companies like YouTube, Amazon, and Udemy have one, and it is a critical aspect of their business success. If they only recommended things that were of no interest to you, they would lose out on a lot of business.
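
To give a feel for the idea, here is a toy recommender based on co-occurrence ("people who bought this also bought that"). It is a deliberately simple sketch: the purchase data and the `recommend` function are invented for illustration, and the systems used by the companies mentioned above are far more sophisticated than this.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase history: user -> set of items bought.
purchases = {
    "ana": {"Python Basics", "Statistics 101", "Machine Learning"},
    "ben": {"Python Basics", "Machine Learning"},
    "cho": {"Python Basics", "Statistics 101"},
    "dev": {"Watercolor Painting"},
}

# Count how often each ordered pair of items appears in the same basket.
co_counts = Counter()
for basket in purchases.values():
    for a, b in combinations(sorted(basket), 2):
        co_counts[(a, b)] += 1
        co_counts[(b, a)] += 1

def recommend(user, top_n=2):
    """Rank items the user lacks by how often they co-occur with items the user owns."""
    owned = purchases[user]
    scores = Counter()
    for (a, b), count in co_counts.items():
        if a in owned and b not in owned:
            scores[b] += count
    return [item for item, _ in scores.most_common(top_n)]

print(recommend("ben"))
print(recommend("dev"))
```

A user whose purchases overlap with nobody else's gets no recommendations here, which hints at the "cold start" problem that real recommender systems have to address.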

In computer vision, you might develop an object recognition/detection system. Object recognition is one vital ingredient that makes autonomous cars possible.

In natural language processing, chatbots gauge customers’ intent and can address simple customer service problems, leaving human employees to work on more complex customer service issues. If you have a Gmail account, you will also have noticed that Gmail sorts emails into three categories (Primary, Social, and Promotions). Both of these are examples of projects that data scientists work on and that have immense value.

These and many other applications are available to work on should you have the passion and motivation to get the requisite skills. For a more in-depth view of what data scientists work on every day, check out this list of Data Science Projects for beginners and advanced learners.

## How to become a data scientist?

So now you know what data science is. Here are some tips on becoming a data scientist:

• Enroll in at least one comprehensive online course and complete 100% of it.
• Learn by doing. Enrolling in three courses but doing more practice might be better than enrolling in four classes but doing less practice.
• Focus on your weaknesses at the very beginning. If math is easy for you or you have a math-heavy degree, focus first on programming languages and soft skills. This approach is helpful because it quickly tells you whether you have the motivation or interest to acquire all the skills a data scientist needs (for example, you may love math but have no interest in programming).
• Find peers who are already working on projects or problems you have an interest in solving. Meeting people with similar passions is possible through LinkedIn or Kaggle, but be respectful when reaching out.
• Memorizing is not essential. It is common for data professionals to Google the name of a function, for instance.
• Be realistic. When treated as a catch-all term, as in this article, data science covers much more than machine learning. Machine learning is essential, but if you end up in an analytics role, much of your work will be fantastic investigative analysis rather than machine learning.

For more details on becoming a data scientist, visit Why and How to Learn Data Science and How to Become a Data Scientist.

Page Last Updated: November 2021
