Select Rows based on values in multiple columns in Pandas Dataframes

To select rows based on values in multiple columns in Pandas dataframes, you can use loc() and query(). The following shows the basic syntax of two methods to do so.

Method 1:

df.loc[(df[‘column1’]==value1) & (df[‘column2’]==value2) ]

Method 2:

df.query(‘column1==”value1″ & column2==”value2″‘)


Example for Method 1

The following is an example showing how to select rows based on values in multiple columns in Pandas dataframes using loc(). In particular, we are going to select rows based on values in columns of Location and Year.

import pandas as pd

# Create a dataframe
car_data = {'Brand': ['Tesla', 'Tesla','Tesla','Ford','Ford'], 
     'Location': ['CA', 'CA','NY','MA','CA'],
    'Year':['2019','2018','2020','2019','2019']}
car_data=pd.DataFrame(data=car_data)

#print out the original dataframe
print('Original Dataframe: \n', car_data)

# select rows when columns of 'Location' is equal to 'CA' and 'Year' is equal to 2019
rows_selected=car_data.loc[(car_data['Location']=='CA') & (car_data['Year']=='2019') ]

#print out the dataframe with selected rows based on values in multiple columns
print('Dataframe with selected rows: \n', rows_selected)

The following is the output showing the original dataframe and the selected one using loc().

Original Dataframe: 
    Brand Location  Year
0  Tesla       CA  2019
1  Tesla       CA  2018
2  Tesla       NY  2020
3   Ford       MA  2019
4   Ford       CA  2019

Dataframe with selected rows: 
    Brand Location  Year
0  Tesla       CA  2019
4   Ford       CA  2019

Example for Method 2

The following is another example showing how to select rows based on values in multiple columns in a Pandas dataframe using query().

import pandas as pd

# Create a dataframe
car_data = {'Brand': ['Tesla', 'Tesla','Tesla','Ford','Ford'], 
     'Location': ['CA', 'CA','NY','MA','CA'],
    'Year':['2019','2018','2020','2019','2019']}
car_data=pd.DataFrame(data=car_data)

#print out the original dataframe
print('Original Dataframe: \n', car_data)

# select rows when column of 'Location' is equal to 'CA' or 'MA'
rows_selected=car_data.loc[car_data['Location'].isin(['CA','MA'])  ]

# select rows when columns of 'Location' is equal to 'CA' and 'Year' is equal to 2019
rows_selected=car_data.query('Location=="CA" & Year=="2019"')

#print out the dataframe with selected rows based on values in multiple columns
print('Dataframe with selected rows: \n', rows_selected)

The following is the output showing the original dataframe and the selected one using query().

Original Dataframe: 
    Brand Location  Year
0  Tesla       CA  2019
1  Tesla       CA  2018
2  Tesla       NY  2020
3   Ford       MA  2019
4   Ford       CA  2019

Dataframe with selected rows: 
    Brand Location  Year
0  Tesla       CA  2019
4   Ford       CA  2019