Pandas: How to Select Rows Based on Column Values

This tutorial includes methods that you can select rows based on a specific column value or a few column values by using loc() or query() in Python Pandas.

Select base on a single value in a single column:

Method 1: df.loc[df[‘column_name’]==value]

Method 2: df.loc[df[‘column_name’].isin([value])]

Select multiple values in a single column:

Method 1: df.loc[(df[‘column_name’]==value1)|(df[‘column_name’]==value2) ]

Method 2: df.loc[df[‘column_name’].isin([value1,value2])]


Select rows based on a single value in a single column

Method 1: use loc()

The following is an example showing how to do it. First, we structure a dataframe from scratch, from which we are going to select rows when the location is CA.

import pandas as pd

# Create a dataframe
car_data = {'Brand': ['Tesla', 'Tesla','Tesla','Ford','Ford'], 
     'Location': ['CA', 'CA','NY','MA','CA'],
    'Year':['2019','2018','2020','2019','2016']}
car_data=pd.DataFrame(data=car_data)

#print out the original dataframe
print('Original Dataframe: \n', car_data)

# select rows when column of 'Location' is equal to 'CA'
rows_selected=car_data.loc[car_data['Location']=='CA']

#print out the dataframe with selected rows
print('Dataframe with selected rows: \n', rows_selected)

The following is the output showing the original dataframe and the dataframe with selected rows when the location is equal to CA.

Original Dataframe: 
    Brand Location  Year
0  Tesla       CA  2019
1  Tesla       CA  2018
2  Tesla       NY  2020
3   Ford       MA  2019
4   Ford       CA  2016

Dataframe with selected rows: 
    Brand Location  Year
0  Tesla       CA  2019
1  Tesla       CA  2018
4   Ford       CA  2016

Method 2: use isin()

You can also use isin() to select rows based on a single value in a column value in Pandas.

import pandas as pd

# Create a dataframe
car_data = {'Brand': ['Tesla', 'Tesla','Tesla','Ford','Ford'], 
     'Location': ['CA', 'CA','NY','MA','CA'],
    'Year':['2019','2018','2020','2019','2016']}
car_data=pd.DataFrame(data=car_data)

#print out the original dataframe
print('Original Dataframe: \n', car_data)

# select rows when column of 'Location' is equal to 'CA'
rows_selected=car_data.loc[car_data['Location'].isin(['CA'])]

#print out the dataframe with selected rows based on multiple value
print('Dataframe with selected rows: \n', rows_selected)

The following is the output using isin().

Original Dataframe: 
    Brand Location  Year
0  Tesla       CA  2019
1  Tesla       CA  2018
2  Tesla       NY  2020
3   Ford       MA  2019
4   Ford       CA  2016

Dataframe with selected rows: 
    Brand Location  Year
0  Tesla       CA  2019
1  Tesla       CA  2018
4   Ford       CA  2016

Select rows based on multiple values in a single column

Method 1: use or | operator

The following is a code example using loc() and or | operator to select rows based on multiple values (i.e., CA and WA) in a column (i.e., Location).

import pandas as pd

# Create a dataframe
car_data = {'Brand': ['Tesla', 'Tesla','Tesla','Ford','Ford'], 
     'Location': ['CA', 'CA','NY','MA','CA'],
    'Year':['2019','2018','2020','2019','2016']}
car_data=pd.DataFrame(data=car_data)

#print out the original dataframe
print('Original Dataframe: \n', car_data)

# select rows when column of 'Location' is equal to 'CA' or 'MA'
rows_selected=car_data.loc[(car_data['Location']=='CA') | (car_data['Location']=='MA') ]

#print out the dataframe with selected rows based on multiple values
print('Dataframe with selected rows: \n', rows_selected)

The following is the output.

Original Dataframe: 
    Brand Location  Year
0  Tesla       CA  2019
1  Tesla       CA  2018
2  Tesla       NY  2020
3   Ford       MA  2019
4   Ford       CA  2016

Dataframe with selected rows: 
    Brand Location  Year
0  Tesla       CA  2019
1  Tesla       CA  2018
3   Ford       MA  2019
4   Ford       CA  2016

Method 2: use isin()

We can use isin() as well. The following shows how we can do that.

import pandas as pd

# Create a dataframe
car_data = {'Brand': ['Tesla', 'Tesla','Tesla','Ford','Ford'], 
     'Location': ['CA', 'CA','NY','MA','CA'],
    'Year':['2019','2018','2020','2019','2016']}
car_data=pd.DataFrame(data=car_data)

#print out the original dataframe
print('Original Dataframe: \n', car_data)

# select rows when column of 'Location' is equal to 'CA' or 'MA'
rows_selected=car_data.loc[car_data['Location'].isin(['CA','MA'])  ]

#print out the dataframe with selected rows based on multiple values
print('Dataframe with selected rows: \n', rows_selected)

ThThe following is the actual code example and output showing how to select rows based on multiple values in a column.

Original Dataframe: 
    Brand Location  Year
0  Tesla       CA  2019
1  Tesla       CA  2018
2  Tesla       NY  2020
3   Ford       MA  2019
4   Ford       CA  2016

Dataframe with selected rows: 
    Brand Location  Year
0  Tesla       CA  2019
1  Tesla       CA  2018
3   Ford       MA  2019
4   Ford       CA  2016

Further Reading