CSVs

Reading CSV Files in Python

Programs often need to take in and hand out information through channels other than the keyboard and terminal. Exchanging text files is a common way to share information between programs, so reading, writing, and manipulating data is a core programming skill. Knowing how to handle files in Python lets you read and write data from other sources, and this is where Python's csv module comes in handy. CSV files are often used to store large amounts of tabular data, and the csv module lets you parse them. With the help of examples, we will learn how to read CSV files in Python in a variety of formats.

Python CSV Files

CSV (Comma Separated Values) is a simple file format for storing tabular data, such as the contents of a spreadsheet or a database table. A CSV file is a plain text file that stores tabular data (numbers and text). Each line in the file represents a data record. Each record is made up of one or more fields that are separated by commas. The name of this file format comes from the use of the comma as a field separator. Because the data is organized in rows and columns, a CSV file can be opened directly in a spreadsheet program such as Excel.
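To make the record-and-field structure concrete, here is a minimal sketch using made-up data (the csv_text string is hypothetical). It also suggests why a CSV parser is usually preferable to splitting lines on commas by hand: a quoted field may itself contain a comma. The csv.reader() function used here is covered in the next section.

```python
import csv
import io

# Hypothetical CSV text: one header record and one data record.
# The second field of the data record is quoted because it contains a comma.
csv_text = 'Name,Skills\nDavid,"Python, SQL"\n'

# Naive approach: split the data line on every comma.
naive = csv_text.splitlines()[1].split(',')

# csv.reader understands quoting; io.StringIO lets it read from a string.
parsed = list(csv.reader(io.StringIO(csv_text)))[1]

print(naive)   # ['David', '"Python', ' SQL"']
print(parsed)  # ['David', 'Python, SQL']
```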

For this task, we will only use the csv module included with Python. But first, we must import it with the statement import csv.

Basic Usage of csv.reader()

A reader object is used to read from a CSV file. The csv module includes a reader() function for reading a CSV file into our program; it converts each line of the file into a list of column values. First, Python's built-in open() function, which returns a file object, opens the CSV file as a text file. The file object is then passed to reader(), which does all of the heavy lifting. Let's look at a simple example of using csv.reader().

Example: Read CSV files with csv.reader()

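Assume we have a CSV file called StudentData.csv that contains the following entries:

Sr. No.,Name,Age,Specialization,Skills
1,David,23,IT,Python
2,Kelly,24,Marketing,Branding
3,Terry,23,IT,Java
4,Lita,23,Finance,Data Analytics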

We can read the contents of the above CSV file with the help of the following program:

                    

import csv

with open('StudentData.csv', 'r') as file:
    reader = csv.reader(file)
    for row in reader:
        print(row)

Output

                    

['Sr. No.', 'Name', 'Age', 'Specialization', 'Skills']
['1', 'David', '23', 'IT', 'Python']
['2', 'Kelly', '24', 'Marketing', 'Branding']
['3', 'Terry', '23', 'IT', 'Java']
['4', 'Lita', '23', 'Finance', 'Data Analytics']

Using the open() function, we opened the StudentData.csv file in read mode. The file object is then passed to csv.reader(), which returns an iterable reader object. A for loop is then used to iterate through the reader object, printing the contents of each row.
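A common variation is to separate the header row from the data rows. The sketch below is one way to do it, under a few assumptions: it writes a small sample file to a temporary directory so that it runs on its own (the data is illustrative), and it opens the file with newline='', which the csv module documentation recommends so that the module can handle line endings itself.

```python
import csv
import os
import tempfile

# Create a small sample file so the sketch is self-contained (illustrative data).
path = os.path.join(tempfile.mkdtemp(), 'StudentData.csv')
with open(path, 'w', newline='') as f:
    f.write('Sr. No.,Name,Age\n1,David,23\n2,Kelly,24\n')

# newline='' leaves line-ending handling to the csv module.
with open(path, 'r', newline='') as file:
    reader = csv.reader(file)
    header = next(reader)    # first row: column names
    records = list(reader)   # remaining rows: data

print(header)   # ['Sr. No.', 'Name', 'Age']
print(records)  # [['1', 'David', '23'], ['2', 'Kelly', '24']]
```

Note that every value comes back as a string; converting '23' to a number is left to the caller.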

Now we’ll look at CSV files in various formats. Then we’ll see how to modify the csv.reader() function to read them.

CSV files with Custom Delimiters

By default, a comma is used as the delimiter in a CSV file. Some CSV files, however, use other delimiters; the pipe (|) and the tab (\t) are two popular ones. Assume the tab character was used as the delimiter in the StudentData.csv file from the previous example. To read the file, we can pass an additional delimiter argument to csv.reader().

Example: Read CSV file Having Tab Delimiter

                    

import csv

with open('StudentData.csv', 'r') as file:
    reader = csv.reader(file, delimiter = '\t')
    for row in reader:
        print(row)

Output

                    

['Sr. No.', 'Name', 'Age', 'Specialization', 'Skills']
['1', 'David', '23', 'IT', 'Python']
['2', 'Kelly', '24', 'Marketing', 'Branding']
['3', 'Terry', '23', 'IT', 'Java']
['4', 'Lita', '23', 'Finance', 'Data Analytics']

As we can see, the optional argument delimiter='\t' tells the reader object that the fields in the CSV file we are reading are separated by tabs.
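The same delimiter argument works for the pipe character mentioned earlier. The following is a minimal sketch that reads from an in-memory string instead of a file; the pipe_text data is made up for illustration.

```python
import csv
import io

# Illustrative pipe-delimited text (not one of the files used in this article).
pipe_text = "Sr. No.|Name|Age\n1|David|23\n"

# io.StringIO wraps the string so csv.reader can treat it like an open file.
rows = list(csv.reader(io.StringIO(pipe_text), delimiter='|'))

for row in rows:
    print(row)
# ['Sr. No.', 'Name', 'Age']
# ['1', 'David', '23']
```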

CSV files with initial spaces

A space character may appear after a delimiter in some CSV files. If we read these files with csv.reader() and its default settings, the extra spaces appear in the output. To remove these leading spaces, we pass an additional argument called skipinitialspace.

Example: Read CSV files with initial spaces

                    

import csv

with open('StudentData.csv', 'r') as csvfile:
    reader = csv.reader(csvfile, skipinitialspace = True)
    for row in reader:
        print(row)

Output

                    

['Sr. No.', 'Name', 'Age', 'Specialization', 'Skills']
['1', 'David', '23', 'IT', 'Python']
['2', 'Kelly', '24', 'Marketing', 'Branding']
['3', 'Terry', '23', 'IT', 'Java']
['4', 'Lita', '23', 'Finance', 'Data Analytics']

The program is similar to the others, but it passes the extra skipinitialspace argument set to True. This tells the reader object to ignore whitespace that immediately follows a delimiter, so the leading spaces are removed from each entry.
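To see the difference side by side, the sketch below parses the same line twice, once with the default settings and once with skipinitialspace=True. The sample line is illustrative and is read from a string rather than a file.

```python
import csv
import io

# Illustrative line with a space after each comma.
line = "1, David, 23\n"

# Default settings keep the space as part of the field.
default_row = next(csv.reader(io.StringIO(line)))

# skipinitialspace=True drops whitespace that directly follows a delimiter.
stripped_row = next(csv.reader(io.StringIO(line), skipinitialspace=True))

print(default_row)   # ['1', ' David', ' 23']
print(stripped_row)  # ['1', 'David', '23']
```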

CSV files with quotes

Some CSV files may contain quotes around some or all of the entries. As an example, consider the file FamousQuotes.csv, whose entries are enclosed in quotation marks.

Depending on how the file is laid out, reading it with csv.reader() and its default settings can leave quote marks in the output, for example when a space separates the delimiter from the opening quote. To control how quotes are handled, we can use another optional parameter called quoting, here combined with skipinitialspace. Let's look at an example of how to read the preceding file.

Example: Read CSV files with quotes

                    

import csv

with open('FamousQuotes.csv', 'r') as file:
    reader = csv.reader(file, quoting=csv.QUOTE_ALL, skipinitialspace=True)
    for row in reader:
        print(row)

Output

                    

['Sr. No.', 'Name', 'Quote']
['1', 'Winston Churchill', 'If you are going through hell, keep going']
['2', 'Norman Vaughan', 'Dream big and dare to fail']
['3', 'Stephen Hawking', 'Life would be tragic if it was not funny']
['4', 'Aristotle', 'Happiness depends upon ourselves']

As you can see, we passed csv.QUOTE_ALL to the quoting parameter. It is a constant defined by the csv module. csv.QUOTE_ALL tells the reader object that all values in the CSV file are enclosed in quotation marks.

Other predefined constants that can be passed to the quoting parameter include:

  • QUOTE_MINIMAL – Tells writer objects to quote only fields that contain special characters such as the delimiter, the quotechar, or any of the characters in lineterminator. This is the default.
  • QUOTE_NONNUMERIC – Tells writer objects to quote all non-numeric fields, and tells the reader object to convert all unquoted fields to float.
  • QUOTE_NONE – Tells writer objects never to quote fields, and tells the reader object to perform no special processing of quote characters.
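The effect of these constants on a reader can be checked with a short sketch. The sample line below is illustrative; with QUOTE_NONNUMERIC the reader converts the unquoted field to a float, while with QUOTE_NONE the quote characters are left in the fields untouched.

```python
import csv
import io

# Illustrative line: two quoted text fields and one unquoted number.
text = '"David",23,"IT"\n'

# QUOTE_NONNUMERIC: unquoted fields are converted to float by the reader.
nonnumeric_row = next(csv.reader(io.StringIO(text), quoting=csv.QUOTE_NONNUMERIC))

# QUOTE_NONE: quote characters get no special treatment and stay in the data.
none_row = next(csv.reader(io.StringIO(text), quoting=csv.QUOTE_NONE))

print(nonnumeric_row)  # ['David', 23.0, 'IT']
print(none_row)        # ['"David"', '23', '"IT"']
```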

Dialects in CSV module

In the above example, we passed several options (quoting and skipinitialspace) to csv.reader(). When dealing with only one or two files, this approach is fine. However, once we work with many CSV files that share a format, the code becomes repetitive and harder to read.

To address this, the csv module provides dialect as an optional parameter. Formatting parameters such as delimiter, skipinitialspace, quoting, and escapechar can be grouped together under a single dialect name, which can then be passed to any number of reader or writer instances.

Example: Read CSV files using dialect

Assume we have a CSV file (Employee.csv) with the following information:

"ID"| "Name"| "EmailID"
"A20"| "David Johnson"| "davidjohnson20@marketing.com"
"A25"| "Kim Dsouza"| "dkim1968@finance.com"
"E35"| "Arnold Silva"| "arnoldsilva@ops.com"

The CSV file contains starting spaces, quotation marks around each entry, and a | delimiter. Instead of passing three separate formatting patterns, let’s look at how to read this file using dialects.

                    

import csv

csv.register_dialect('myDialect',
                     delimiter='|',
                     skipinitialspace=True,
                     quoting=csv.QUOTE_ALL)
 
with open('Employee.csv', 'r') as csvfile:
    reader = csv.reader(csvfile, dialect='myDialect')
    for row in reader:
        print(row)

Output

                    

['ID', 'Name', 'EmailID']
['A20', 'David Johnson', 'davidjohnson20@marketing.com']
['A25', 'Kim Dsouza', 'dkim1968@finance.com']
['E35', 'Arnold Silva', 'arnoldsilva@ops.com']

The csv.register_dialect() function is used to define a custom dialect in this example. The syntax is as follows:

                    

csv.register_dialect(name[, dialect[, **fmtparams]])

A string name is required for the custom dialect. Other specifications can be made by passing a subclass of the Dialect class, or by passing individual formatting patterns, as shown in the example.

We pass dialect='myDialect' when constructing the reader object to specify that the reader instance must use that dialect. The advantage of using a dialect is that it makes the program more modular: we can reuse 'myDialect' to open other files without having to specify the CSV format again.
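The other option mentioned above, passing a subclass of the Dialect class, could look like the following sketch. The class name PipeDialect and the registered name 'pipeDialect' are made up for this illustration, and the class derives from csv.excel (the module's default dialect) so that only the differing attributes need to be overridden. The sample text is read from a string rather than from Employee.csv.

```python
import csv
import io

# Hypothetical dialect class: start from the built-in 'excel' dialect
# and override only the attributes that differ.
class PipeDialect(csv.excel):
    delimiter = '|'
    skipinitialspace = True
    quoting = csv.QUOTE_ALL

# Register the class under a name so it can be referred to by string.
csv.register_dialect('pipeDialect', PipeDialect)

# Illustrative text in the same layout as Employee.csv.
text = '"ID"| "Name"\n"A20"| "David Johnson"\n'
rows = list(csv.reader(io.StringIO(text), dialect='pipeDialect'))

print(rows)  # [['ID', 'Name'], ['A20', 'David Johnson']]
```

The names of all registered dialects can be listed with csv.list_dialects().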

Read CSV files with csv.DictReader()

Instead of dealing with a list of individual string items, we can use csv.DictReader to read the CSV file directly into dictionaries. The DictReader class creates an object that works like a regular CSV reader, except that it maps the information in each line to a dictionary (dict) whose keys are given by the values in the first line.

The full signature of the csv.DictReader class is as follows:

                    

csv.DictReader(file, fieldnames=None, restkey=None, restval=None, dialect='excel', *args, **kwds)

Example: Python csv.DictReader()

Assume we use the same StudentData.csv file as in the earlier examples. Let's look at how to read it with csv.DictReader():

                    

import csv

with open('StudentData.csv', 'r') as file:
    csv_file = csv.DictReader(file)
    for row in csv_file:
        print(dict(row))

Output

                    

{'Sr. No.': '1', 'Name': 'David', 'Age': '23', 'Specialization': 'IT', 'Skills': 'Python'}
{'Sr. No.': '2', 'Name': 'Kelly', 'Age': '24', 'Specialization': 'Marketing', 'Skills': 'Branding'}
{'Sr. No.': '3', 'Name': 'Terry', 'Age': '23', 'Specialization': 'IT', 'Skills': 'Java'}
{'Sr. No.': '4', 'Name': 'Lita', 'Age': '23', 'Specialization': 'Finance', 'Skills': 'Data Analytics'}

The entries in the first row, as we can see, are the dictionary keys, and the entries in the other rows are the dictionary values. Here csv_file is a csv.DictReader object, which we iterate over with a for loop. In Python 3.6 and 3.7, csv.DictReader returns an OrderedDict for each row, which is why we call dict() inside the for loop to convert each row to a regular dictionary.

Note – From Python version 3.8, csv.DictReader() returns a dictionary for each row, and we do not need to use dict() explicitly.
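The fieldnames, restkey, and restval parameters from the signature are not used in the example above. The sketch below shows roughly how they behave on a file without a header row; the column names and data are made up for illustration. Extra values in a row are collected in a list under restkey, and missing values are filled with restval.

```python
import csv
import io

# Illustrative text without a header row: the first row has one value
# too many, the second row has one value too few.
text = "1,David,23,extra\n2,Kelly\n"

reader = csv.DictReader(
    io.StringIO(text),
    fieldnames=['id', 'name', 'age'],  # keys to use instead of a header row
    restkey='rest',                    # key for surplus values
    restval='n/a',                     # filler for missing values
)
rows = [dict(row) for row in reader]

for row in rows:
    print(row)
# {'id': '1', 'name': 'David', 'age': '23', 'rest': ['extra']}
# {'id': '2', 'name': 'Kelly', 'age': 'n/a'}
```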

Using csv.Sniffer class

A program may not always know the format of a CSV file in advance; the delimiter, for example, may differ from one file to the next. We can handle this situation with the Sniffer class, whose sniff() method deduces a dialect from a sample of the file.

To determine the format of a CSV file, the Sniffer class is utilized. There are two methods available in the Sniffer class:

  • sniff(sample, delimiters=None) – This method analyses a given sample of CSV text and returns a Dialect subclass reflecting the deduced parameters. A string of possibly valid delimiter characters can be passed as the optional delimiters parameter.
  • has_header(sample) – This method returns True or False depending on whether the first row of the sample CSV text appears to contain column headers.

Example: Using csv.Sniffer() to deduce the dialect of CSV files

Assume we have a CSV file (Employee.csv) with the following information:

"ID"| "Name"| "EmailID"
"A20"| "David Johnson"| "davidjohnson20@marketing.com"
"A25"| "Kim Dsouza"| "dkim1968@finance.com"
"E35"| "Arnold Silva"| "arnoldsilva@ops.com"

Let’s take a look at how we can use the csv.Sniffer() class to determine the format of this file:

                    

import csv

with open('Employee.csv', 'r') as csvfile:
    sample = csvfile.read(64)
    has_header = csv.Sniffer().has_header(sample)
    print(has_header)

    deduced_dialect = csv.Sniffer().sniff(sample)

with open('Employee.csv', 'r') as csvfile:
    reader = csv.reader(csvfile, deduced_dialect)

    for row in reader:
        print(row)

Output

                    

True
['ID', 'Name', 'EmailID']
['A20', 'David Johnson', 'davidjohnson20@marketing.com']
['A25', 'Kim Dsouza', 'dkim1968@finance.com']
['E35', 'Arnold Silva', 'arnoldsilva@ops.com']

As you can see, we read only the first 64 characters of Employee.csv and stored them in the sample variable. The sample was then passed to the Sniffer().has_header() method, which deduced that the first row contains column headers and therefore returned True, which we printed. Similarly, the sample was passed to the Sniffer().sniff() method, which returned all of the deduced parameters as a Dialect subclass that we stored in the deduced_dialect variable.

Later, we reopened the CSV file and passed deduced_dialect to csv.reader(). The Sniffer correctly deduced the delimiter, quoting, and skipinitialspace parameters of the Employee.csv file even though we did not specify them explicitly.
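Two small variations on this approach are sketched below, under stated assumptions: the optional delimiters argument restricts the candidates the Sniffer considers, and seek(0) rewinds the stream so that it does not have to be opened a second time. The data is illustrative and is read from an in-memory string rather than from Employee.csv.

```python
import csv
import io

# Illustrative pipe-delimited text standing in for a file on disk.
text = (
    "ID|Name|EmailID\n"
    "A20|David Johnson|davidjohnson20@marketing.com\n"
    "A25|Kim Dsouza|dkim1968@finance.com\n"
)
stream = io.StringIO(text)

# Read a sample and limit the candidate delimiters to pipe, comma, and tab.
sample = stream.read(1024)
dialect = csv.Sniffer().sniff(sample, delimiters='|,\t')

# Rewind to the beginning instead of reopening the source.
stream.seek(0)
rows = list(csv.reader(stream, dialect))

print(dialect.delimiter)
for row in rows:
    print(row)
```

Since the Sniffer works heuristically, a larger sample generally gives it more to go on; sniff() raises csv.Error when it cannot determine a delimiter.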

Frequently Asked Questions

Q1. How do I read a csv file in Python?

A reader object is used to read from a CSV file. The csv module includes a reader() function for reading a CSV file into our program; it converts each line of the file into a list of column values. First, Python's built-in open() function, which returns a file object, opens the CSV file as a text file. The file object is then passed to reader(), which does all of the heavy lifting.

We can read the contents of any CSV file with the help of the following program:

                    

import csv

with open('FileName.csv', 'r') as file:
    reader = csv.reader(file)
    for row in reader:
        print(row)

Q2. How do I import a csv file in Python?

There are several ways to import CSV files in Python. One is to use the pandas library, a widely used data analysis package; another is the built-in csv module. A few methods to import a CSV file are shown below:

  1. Using read_csv() method

                    

import pandas as pd  

# making data frame  
df = pd.read_csv('File_path') 
df.head()   # displays the first five rows

  2. Using csv module

                    

import csv
import pandas as pd

# open the csv file
with open('File_name.csv', newline='') as csv_file:

    # read the csv file
    csv_reader = csv.reader(csv_file, delimiter=',')

    # load the rows into pandas, assuming the first row holds the column names
    header = next(csv_reader)
    df = pd.DataFrame(list(csv_reader), columns=header)

print(df.head())   # displays the first five rows

# iterating over the values of every column
for column in df.columns:
    print(column, list(df[column]))
