Learn Data Analysis with Python: A Case Study

Article Summary

Data analysis with Python involves loading, exploring, and visualizing data using libraries like Pandas and Seaborn. This article walks through a real-world case study using these tools to generate summary statistics, histograms, and correlation heatmaps. You'll gain a practical foundation in data analysis with Python you can build on.

In this post, I’ll give a brief tutorial on Python for data analysis. Analyzing data can be an enjoyable, well-paying job: Payscale estimates put the average salary for entry-level positions above US$60,000. However, there are hard-skill requirements you must meet, such as knowledge of SQL or of a programming language for data analysis, like Python or R.

To illustrate using Python for data analysis, we’ll be studying infidelity in marriages, which should be an interesting project. Let’s get started!

How to begin a data analysis project using Python

First, we need an environment in which to write and run Python code. My choice is Google Colaboratory (Colab), a browser-based notebook environment that I feel provides the most frictionless experience. You can use it directly from your Google Drive. In the video below, I walk you through setting up Google Colab and through the tutorial itself.

Now that we’re set up, we need tools. In Python, these are called libraries. Here, we will add Pandas and Seaborn, which Python will use to manipulate, analyze, and visualize the data.

Pandas is an open-source library that assists Python in manipulating and analyzing data.
Seaborn is a library that Python uses to create meaningful data visualizations.


First, we let Python know we’re going to use these data analysis libraries by importing them:

#import libraries
import pandas as pd
import seaborn as sns

In Python, it’s conventional to import a library under a short alias with “as” (pd for Pandas, sns for Seaborn). This keeps the code shorter every time you call the library.

Next, we need data. For this tutorial, I made the data set available online; you can load it into a Pandas DataFrame by running the following code:

#load data
data = pd.read_csv("https://bit.ly/udemy_dataset")

It’s always a good idea to look at the data before analyzing it. The head() method shows the first five rows:

#Looking at the data
data.head()

There’s a lot of information here, and some of it isn’t obvious at first. For instance, why is occupation ranked from 1 to 7? The original study that produced this data set describes each variable in detail, and mastering the data is part of a data analyst’s job. Here’s an overview of the variables:

Affairs: How often respondents engaged in infidelity during the past year. 0 = none, 1 = once, 2 = twice, 3 = three times, 7 = 4–10 times, 12 = monthly or more.
Gender: The person’s gender.
Age: How old the person is.
Yearsmarried: How many years the person has been married.
Children: Whether or not the person has children.
Religiousness: How religious the person says they are, on a scale of 1 to 5, with 5 being very religious.
Education: Level of education. 9 = grade school; 12 = high school; 14 = some college; 16 = college; 17 = some graduate work; 18 = master’s degree; 20 = PhD or MD.
Occupation: The person’s job. 7 = physician or CEO of a large company; 6 = professional with an advanced degree; 5 = managerial, administrative, or business; 4 = teacher, counselor, social worker, or nurse; 3 = white-collar, such as sales or clerical; 2 = farming, semi-skilled, or unskilled worker; 1 = student.
Rating: How the person rates their satisfaction with their marriage, on a scale of 1 to 5, with 5 being very satisfied.
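Because several variables are stored as numeric codes, it can help to translate them into readable labels before reporting results. The sketch below maps the affairs codes from the list above to their meanings with Pandas’ Series.map. The toy values are invented, and the label wording is my own; with the real data, you would apply the same mapping to data["affairs"].

```python
import pandas as pd

# Codes for the affairs variable, as described in the variable overview
AFFAIRS_LABELS = {
    0: "none",
    1: "once",
    2: "twice",
    3: "three times",
    7: "4-10 times",
    12: "monthly or more",
}

# Hypothetical toy values standing in for data["affairs"]
affairs = pd.Series([0, 12, 7, 0, 1, 3], name="affairs")

# map() looks each code up in the dictionary; a code missing from it becomes NaN
affairs_labels = affairs.map(AFFAIRS_LABELS)
print(affairs_labels.tolist())
```

If any code in the real data were missing from the dictionary, it would show up as NaN, which makes map() a quick check that the coding scheme is complete.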


Now we analyze the data using Python’s libraries

One of the most common first steps in data analysis is to look at the summary statistics of the numeric variables. Luckily, Pandas makes this very simple:

#Summary Statistics
data.describe()

That’s a lot of statistics and information. For each numeric variable, describe() reports the number of observations (count), the mean (mean), the standard deviation (std), the minimum (min), the maximum (max), and the 25th, 50th, and 75th percentiles (25%, 50%, and 75%), which are the quartiles. The 50% value is the median. Here are some examples of what we see:

We have 601 observations.
On average, respondents report about 1.5 affairs in the past year (a rough figure, since codes such as 7 and 12 stand for ranges), are 32.5 years old, have been married for 8 years, rate their religiousness a 3, have 16 years of education, and rate their marriage a 3.9 out of 5. I omitted occupation because its codes are categories, so an average isn’t meaningful.
For affairs specifically, even the 75th percentile is 0. That means at least 75% of respondents reported no affairs in the past year, so at most a quarter of the sample admits to cheating on their spouse. This gives me a new idea to analyze!
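To see why a 75th percentile of 0 implies that at least three quarters of respondents reported no affairs, you can compute both numbers directly. This is a minimal sketch on a made-up Series of ten respondents; with the real data, you would use data["affairs"] instead.

```python
import pandas as pd

# Hypothetical toy sample: 10 respondents, 8 of whom report no affairs
affairs = pd.Series([0, 0, 0, 0, 0, 0, 0, 0, 3, 12], name="affairs")

# The same value describe() reports in its "75%" row
p75 = affairs.quantile(0.75)

# Share of respondents with zero affairs (the mean of a boolean Series is a proportion)
share_zero = (affairs == 0).mean()

# describe()'s "50%" row is the median
summary = affairs.describe()

print(f"75th percentile: {p75}")
print(f"Share reporting no affairs: {share_zero:.0%}")
print(f"Median from describe(): {summary['50%']}, median(): {affairs.median()}")
```

Whenever the 75th percentile is 0 and no value is negative, the share of zeros must be at least 75%; the direct share is the more precise number to report.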

Visualizing the data with histograms

If you want to see the distribution of affairs in the sample, the simplest way is to plot a histogram with the Seaborn library. A histogram groups values into bins (ranges) and shows how many observations fall into each one; you can set the bins yourself if the defaults don’t suit the data.

#Histogram
sns.histplot(data = data.affairs)

Because the affairs variable groups some answers into ranges (7 = 4–10 times, 12 = monthly or more), the histogram doesn’t give us extensive insights. It confirms that most people do not have affairs, but we can’t really see a pattern among those who are unfaithful. So, it is time to move on.
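Because affairs takes only six coded values, a frequency table can be easier to read than default histogram bins. This sketch uses invented values in place of data["affairs"] and reports counts and percentages per code. In Seaborn, passing discrete=True to sns.histplot similarly draws one bar per distinct value.

```python
import pandas as pd

# Hypothetical toy values standing in for data["affairs"]
affairs = pd.Series([0, 0, 0, 0, 0, 1, 2, 7, 12, 0, 3, 7], name="affairs")

# Count respondents per coded value, ordered by code rather than by frequency
counts = affairs.value_counts().sort_index()

# Express the counts as percentages of the sample
percent = (counts / counts.sum() * 100).round(1)

print(pd.DataFrame({"count": counts, "percent": percent}))
```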


What correlates with affairs?

Correlation measures the strength and direction of the linear relationship between two variables. Its value ranges from -1 to 1. A value of 1 indicates a perfect positive relationship: as the value of one variable increases, the value of the other also increases. A value of -1 indicates a perfect negative relationship: as the value of one variable increases, the value of the other decreases. Values closer to 0 indicate a weaker linear relationship.

An example of a strong positive relationship would be rainfall and the number of people using umbrellas. An example of a strong negative relationship would be hours of sunshine and the number of people using umbrellas. A correlation of zero means there is no linear relationship. An example would be eating chocolate and swimming in the pool: there is no connection between the two, though the combination doesn’t sound bad at all!
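By default, Pandas’ corr() computes Pearson’s correlation coefficient: the sum of the products of each variable’s deviations from its mean, divided by the square root of the product of their summed squared deviations. The sketch below computes it by hand for invented rainfall and umbrella numbers (purely illustrative, not from the data set) and checks the result against Series.corr.

```python
import math

import pandas as pd

# Invented example: daily rainfall (mm) and umbrellas counted on a street
rain = pd.Series([0.0, 2.0, 5.0, 10.0, 20.0])
umbrellas = pd.Series([1, 4, 9, 15, 30])


def pearson_r(x: pd.Series, y: pd.Series) -> float:
    """Pearson correlation: co-movement around the means, scaled to [-1, 1]."""
    dx = x - x.mean()
    dy = y - y.mean()
    return float((dx * dy).sum() / math.sqrt((dx ** 2).sum() * (dy ** 2).sum()))


r_manual = pearson_r(rain, umbrellas)
r_pandas = rain.corr(umbrellas)  # Series.corr defaults to method="pearson"

print(round(r_manual, 3), round(r_pandas, 3))

# Flipping the sign of one variable flips the sign of r (a negative relationship)
print(round(pearson_r(rain, -umbrellas), 3))
```

Pearson’s r captures only linear relationships, which is why a value near zero means “no linear relationship” rather than “no relationship of any kind.”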

To keep a correlation analysis focused, it’s wise to choose the variables first and then run the analysis. Here I’ll choose affairs, age, religiousness, and rating, since I am a fan of simple statistics.

Because I like to visualize correlations with a heatmap, we are going to learn how to make one. A heatmap colors each cell according to its value, so strong positive and negative correlations stand out at a glance.

#Picking variables
data_correlation = data[["affairs", "age", "religiousness", "rating"]]

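#Correlation heatmap - the command for a correlation is corr()
sns.heatmap(data = data_correlation.corr(),
            annot = True,          # write each correlation inside its cell
            fmt = '.2f',           # show two decimal places ('.2g' would mix formats)
            center = 0,            # put the colormap's neutral midpoint at 0
            cmap = 'coolwarm',     # blue for negative, red for positive
            linewidths = 1,
            linecolor = 'black')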

There is quite a bit of information here. Affairs correlates negatively with rating, meaning that people in happier marriages tend to report less infidelity, which makes sense to me (though correlation alone can’t tell us which causes which). It’s also interesting that religiousness has a negative correlation with affairs. Finally, age and rating are negatively correlated, which hints that marriage satisfaction doesn’t increase with age.
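Rather than reading values off the heatmap, you can pull one column out of the correlation matrix and sort it. This sketch uses a hypothetical toy DataFrame with the same column names as data_correlation; the values are invented, so the numbers will not match the article’s heatmap. With the real data, you would start from data_correlation.corr().

```python
import pandas as pd

# Hypothetical toy data with the same column names as data_correlation
toy = pd.DataFrame({
    "affairs":       [0, 0, 12, 0, 7, 0, 1, 0],
    "age":           [22, 27, 42, 32, 37, 27, 47, 52],
    "religiousness": [4, 5, 1, 3, 2, 4, 2, 5],
    "rating":        [5, 4, 2, 5, 3, 4, 3, 4],
})

# corr() returns a square matrix; each column holds one variable's correlations
corr_matrix = toy.corr()

# Correlations with affairs, dropping affairs' correlation with itself (always 1)
affairs_corr = corr_matrix["affairs"].drop("affairs").sort_values()
print(affairs_corr.round(2))
```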

Python makes data analysis fun

The possibilities are endless. We could test whether there are statistically significant differences in having affairs between genders. We could also see whether having children makes people more or less likely to cheat on their partners. However, I’ll stop here. Using Python for data analysis is fun and easy, but you do need to practice. I hope you start with this one and build from there. If you’re serious about becoming a data analyst, I highly recommend this post on how to become a data analyst from scratch.
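A simple descriptive starting point for those comparisons is the share of respondents reporting any affair within each group. This sketch is not a significance test (a test such as chi-square would be the next step). The toy data are invented, and the “male”/“female” and “yes”/“no” labels are assumptions, so check how gender and children are actually coded in the data set before reusing it.

```python
import pandas as pd

# Hypothetical toy data; the gender and children labels are assumed
toy = pd.DataFrame({
    "affairs":  [0, 3, 0, 12, 0, 0, 7, 0],
    "gender":   ["female", "male", "female", "male", "female", "male", "female", "male"],
    "children": ["no", "yes", "no", "yes", "yes", "no", "yes", "no"],
})

# Flag anyone who reported at least one affair in the past year
toy["any_affair"] = toy["affairs"] > 0

# Share of respondents with any affair, by gender and by children
by_gender = toy.groupby("gender")["any_affair"].mean()
by_children = toy.groupby("children")["any_affair"].mean()

print(by_gender)
print(by_children)
```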

Diogo Alves de Resende

Udemy Instructor


4.5 instructor rating

11,789 reviews

77,480 students

16 courses

Diogo works for one of the biggest e-commerce companies, focusing on business analytics, and he wants to teach the methodologies he has used in his job.

In the past, he worked as a radio anchor in his hometown, which is probably his favorite college experience. He has also worked with the United Nations on the topic of Mobile Money in Lesotho.
