Python CSV: How to Read and Write CSV Files in Python
To retrieve and store the information your application uses, it takes more than just your programming language, keyboard, and console. If you are using Python, you have a few options, including the Python CSV module that will allow you to store the data your app uses and processes as a CSV (Comma Separated Values) file. While many applications use a database, using a CSV file might be better at times, like when you are generating data for another application that is expecting this format. In this article, you’ll learn how to read and write the data in a CSV file with Python in two different ways: using the Python CSV module and using the third-party Pandas library.
What is a CSV file?
A CSV file is a plain text file used for storing tabular data. Because it is plain text, you can open a CSV file in a text editor to edit it. You can also open CSV files in Microsoft Excel and other spreadsheet applications to see the data displayed in a tabular format.
The name of a CSV file hints at what it looks like (i.e., Comma Separated Values). Here is the contents of an example CSV file:
name,city,state,birthday month Bob,Kansas City,MO,March Jane,St. Louis,MO,January Rick,New York,NY,April |
Last Updated July 2023
Learn Python like a Professional Start from the basics and go all the way to creating your own applications and games | By Jose Portilla
Explore CourseAnd here is how it looks when you open up this file in Excel:
Notice that a line break separates each data set, and the first line contains the headings for each column. Commas then separate the data on each line.
In standard CSV format, the character to separate each value, also known as the delimiter, is a comma, but not all CSV files use a comma as a delimiter. Sometimes, a tab, semi-colon, or colon is used, especially if commas are in the data.
The data values can also be surrounded by single quotes or double quotes, which allows you to use commas in the data value itself. Here is an example CSV line where this would be useful:
“Bob Smith”,”Kansas City”, MO” |
Sometimes CSV files use escape characters, which basically tell the code processing the file that the character after the escape character should be processed as normal text and not as a delimiter or a quote surrounding a value. Here is an example of using escape characters in the line of a CSV file:
“Bob”, “I say, \”Let’s go now!\”” |
Here, a backslash (\) is used as an escape character, which means the parser will understand to ignore the double-quote right after it and consider it as text. Here is how the second value in this line will look when processed:
I say, “Let’s go now!” |
Note that the comma does not need an escape character in this example because double quotes surround the value.
All types of applications use CSV files, including spreadsheets and databases, for ingesting data and transferring data from one application to another. Since they are in plain text, they are easy to manipulate programmatically, and any programming language with text processing capabilities can process them.
Handling CSV files with Python CSV
The Python programming language ships with a CSV module that allows for reading and writing CSV files without installing a third-party library. By default, it is designed to handle the CSV file format generated by Microsoft Excel, but it can be configured to read and write any type of CSV file.
Reading CSV files using the CSV module
To parse CSV data, you can use the CSV module’s reader object. Let’s say we have the first CSV example with the name, city, state, and birthday month values in a file called names.csv. Here is a Python code example we can use to parse the file:
# Import the csv module import csv # Use Python’s built-in open function which opens file objects with open(‘names.csv’) as csv_file: # Create a CSV Reader object reader = csv.reader(csv_file) # Set a line count so we know which line has the headings line_count = 0 for row in reader: # Print out the headings from the first line of the file if line_count == 0: print(f’The column headings are {“, “.join(row)}’) line_count += 1 # Print out the data from the remaining lines else: print(f'{row[0]} lives in {row[1]},{row[2]}, and was born in {row[3]}.’) line_count += 1 print(f’CSV File contains {line_count} lines.’) |
When this Python script runs, this is what it prints out:
The column headings are name, city, state, birthday month Bob lives in Kansas City,MO, and was born in March. Jane lives in St. Louis,MO, and was born in January. Rick lives in New York,NY, and was born in April. CSV File contains 4 lines. |
The reader object returns rows, which are a list of string items, and since the first row contains the heading for each column of data, it’s processed differently. The comments in the example code explain more of what is happening in detail.
Reading CSV files into a dictionary with the CSV module
The last method we used for parsing the CSV file worked, but it required dealing with lists of strings. A cleaner way of parsing the CSV is reading the data into Python dictionaries. We will use the same CSV data that we processed in the last script, but this time will use the CSV module’s DictReader object. Here is the code:
# Import the csv module import csv # Open the csv file with Python’s built in open() method with open(‘names.csv’) as csv_file: # Create a CSV DictReader object reader = csv.DictReader(csv_file) # Get the heading names with reader.fieldnames print(f’The column headings are {“, “.join(reader.fieldnames)}’) # Set a line counter line_count = 0 # Print out the lines of the file for row in reader: print(f'{row[“name”]} lives in {row[“city”]},{row[“state”]}, and was born in {row[“birthday month”]}.’) line_count += 1 print(f’CSV File contains {line_count} lines.’) |
This prints out the same results as the last code:
The column headings are name, city, state, birthday month Bob lives in Kansas City,MO, and was born in March. Jane lives in St. Louis,MO, and was born in January. Rick lives in New York,NY, and was born in April. CSV File contains 3 lines. |
But when you use the DictReader method, it automatically parses the headings to create the dictionaries. So instead of printing out the first line to get the headings, we must use the fieldnames attribute of the DictReader object. When we access the values in each row, we can access them by the heading name now instead of numerical indexes. Also, note that the DictReader only iterates the rows of data, not the heading line.
Reading custom CSV files with Python CSV
The CSV file we are currently using has the standard CSV format that the Python CSV module expects, so we can just pass the file to the reader object, and it works. But if you have a CSV file that uses a different delimiter, wraps the data in quotes, or uses an escape character, you must add extra parameters to either the reader or DictReader method calls in order to process the file correctly.
If we have a CSV file that uses tab delimiters (often called a TSV file), wraps the values in double-quotes, and uses a backslash as an escape character, we will have to set these extra parameters like this:
# Using reader with open(‘names.csv’) as csv_file: reader = csv.reader(csv_file, delimiter=’\t’, quotechar='”‘, escapechar=”\”) # Using DictReader with open(‘names.csv’) as csv_file: reader = csv.DictReader(csv_file, delimiter=’\t’, quotechar='”‘, escapechar=”\”) |
Writing CSV files using Python CSV
To write data to a CSV file using the Python CSV module, you can use one of the writer objects: writer or DictWriter. We will start by using writer, which uses Python lists to create the rows of the CSV file. The writer method also accepts the same parameters as the reader method: delimiter, quotechar, and escapechar. This allows you to generate any CSV format you choose. It also accepts a quoting parameter, which we can see in action in the example below:
# Open the file for writing with the Python open method with open(‘example.csv’, mode=’w’) as example_file: writer = csv.writer(example_file, delimiter=’,’, quotechar='”‘, quoting=csv.QUOTE_MINIMAL) # Write the header row with the writerow method writer.writerow([‘name’, ‘city’, ‘state’, ‘birthday month’]) # Write the data rows with the writerow method writer.writerow([‘Bob’, ‘Kansas City’, ‘MO’, ‘March’]) writer.writerow([‘Jane’, ‘St. Louis’, ‘MO’, ‘January’]) writer.writerow([‘Rick’, ‘New York’, ‘NY’, ‘April’]) |
The code above will generate a file that looks the same as the one we have been using. There are no quotes in it, even though we set the quotechar to double quotes. This is because the quoting parameter is set to csv.QUOTE_MINIMAL. This parameter tells the writer how to use quotes and has four possible values:
- csv.QUOTE_MINIMAL. This is the default setting, and the writer will only apply quotes if a field contains the delimiter or quotechar.
- csv.QUOTE_ALL. The writer will apply quotes to every field.
- csv.QUOTE_NONNUMERIC. The writer will quote all text fields and convert any numeric data to the float data type.
- csv.QUOTE_NONE. The writer will escape delimiters instead of using quotes. An escapechar parameter is required if you use this value.
Writing CSV files into a dictionary with CSV
Just like we can use the DictReader method instead of the reader method to use dictionaries for our data, we can use the DictWriter method instead of the writer method. Here is an example writing the same CSV file:
# Import the csv module import csv # Open the file for writing with the Python open method with open(‘example.csv’, mode=’w’) as example_file: # List of the headings fieldnames = [‘name’, ‘city’, ‘state’, ‘birthday month’] # Create the DictWriter object writer = csv.DictWriter(example_file, fieldnames=fieldnames, delimiter=’,’, quotechar='”‘, quoting=csv.QUOTE_MINIMAL) # Write the header row with the writeheader method writer.writeheader() # Write the data rows with the writerow method writer.writerow({‘name’: ‘Bob’, ‘city’: ‘Kansas City’, ‘state’: ‘MO’, ‘birthday month’: ‘March’}) writer.writerow({‘name’: ‘Jane’, ‘city’: ‘St. Louis’, ‘state’: ‘MO’, ‘birthday month’: ‘January’}) writer.writerow({‘name’: ‘Rick’, ‘city’: ‘New York’, ‘state’: ‘NY’, ‘birthday month’: ‘April’}) |
The DictWriter method accepts all the parameters that the writer method does, but it has another required parameter: fieldnames. This parameter is a list of the names of the headings in your CSV file that also are the keys in the dictionaries you will be using to create the rows of data. To print the header row, we call the writeheader method instead of manually creating a row like we did with the writer method.
Handling CSV files with pandas
Another option for more advanced CSV file handling is using the Python pandas module. This module provides advanced analysis and manipulation functionality for tabular data and is often used in combination with machine learning, data analysis, and data visualization libraries. It is also open source and free to use, but you will have to install it separately with your Python package manager.
If you are using pip, you can install it with this command:
pip install pandas |
And if you are using Anaconda, you can install it by running this command:
conda install pandas |
Reading CSV files with pandas
Reading a CSV file with pandas is really easy. Just make sure you have the pandas module installed, import the module, call the pandas read_csv method on the file, and print the results. You don’t have to use the Python open method or loop through the lines in the file. It’s all done for you. Here is an example:
# Import the pandas module import pandas # Read the csv file into a DataFrame dataframe = pandas.read_csv(‘names.csv’) print(dataframe) |
And here is the result:
name city state birthday month 0 Bob Kansas City MO March 1 Jane St. Louis MO January 2 Rick New York NY April |
When pandas reads a CSV file, it puts the data in a dataframe. A dataframe is a two-dimensional labeled data structure, kind of like a spreadsheet or database that is in memory. To learn more about dataframes, read our article on creating dataframes with pandas.
Writing CSV files with pandas
You can also create CSV files in pandas from a dataframe. Here is an example:
# Import the pandas module import pandas # Create a dataframe dataframe = pandas.DataFrame([ [‘Bob’, ‘Kansas City’, ‘MO’, ‘March’], [‘Jane’, ‘St. Louis’, ‘MO’, ‘January’], [‘Rick’, ‘New York’, ‘NY’, ‘April’]], columns=[‘name’, ‘city’, ‘state’, ‘birthday month’]) # Write dataframe to a CSV file dataframe.to_csv(‘example.csv’, index=False) |
This code will recreate the same CSV we have used for this whole article. The first parameter in pandas.DataFrame contains the data we want to write to the CSV as a list of lists. Each of the nested lists will become a data row. By setting the named columns parameter, we define the columns in the dataframe that will become the headers of the CSV file.
Then we call the to_csv method on the dataframe we created to generate the CSV file. The first parameter is the path of the file we want to create, and the named index parameter that is set to false tells pandas to not include the indexes it created for the dataframe. Without this, our CSV file’s first column will have an empty header and an index number for each row.
To learn more about using pandas with CSV files, see our guide to exporting pandas to CSV.
Conclusion
The CSV file format is a handy universal format for transferring data from one application to another. You can simply open the file in Excel or a text editor to examine the data you’re working with instead of needing a proprietary application. If you want to simply read and write CSV files with Python, that functionality comes built-in and is simple to use. If you plan on doing data analysis, however, the Python pandas module may be the best choice because of all the data manipulation features it has. To learn more about using the Python programming language, see our article on command line arguments in Python, and if you are ready to build real-world skills using Python, our Python courses will help you build them.
Recommended Articles
Top courses in Python
Python students also learn
Empower your team. Lead the industry.
Get a subscription to a library of online courses and digital learning tools for your organization with Udemy Business.