To select rows based on values in multiple columns in Pandas dataframes, you can use loc() and query(). The following shows the basic syntax of two methods to do so.
Method 1:
df.loc[(df[‘column1’]==value1) & (df[‘column2’]==value2) ]
Method 2:
df.query(‘column1==”value1″ & column2==”value2″‘)
Example for Method 1
The following is an example showing how to select rows based on values in multiple columns in Pandas dataframes using loc()
. In particular, we are going to select rows based on values in columns of Location and Year.
import pandas as pd
# Create a dataframe
car_data = {'Brand': ['Tesla', 'Tesla','Tesla','Ford','Ford'],
'Location': ['CA', 'CA','NY','MA','CA'],
'Year':['2019','2018','2020','2019','2019']}
car_data=pd.DataFrame(data=car_data)
#print out the original dataframe
print('Original Dataframe: \n', car_data)
# select rows when columns of 'Location' is equal to 'CA' and 'Year' is equal to 2019
rows_selected=car_data.loc[(car_data['Location']=='CA') & (car_data['Year']=='2019') ]
#print out the dataframe with selected rows based on values in multiple columns
print('Dataframe with selected rows: \n', rows_selected)
The following is the output showing the original dataframe and the selected one using loc()
.
Original Dataframe: Brand Location Year 0 Tesla CA 2019 1 Tesla CA 2018 2 Tesla NY 2020 3 Ford MA 2019 4 Ford CA 2019 Dataframe with selected rows: Brand Location Year 0 Tesla CA 2019 4 Ford CA 2019
Example for Method 2
The following is another example showing how to select rows based on values in multiple columns in a Pandas dataframe using query()
.
import pandas as pd
# Create a dataframe
car_data = {'Brand': ['Tesla', 'Tesla','Tesla','Ford','Ford'],
'Location': ['CA', 'CA','NY','MA','CA'],
'Year':['2019','2018','2020','2019','2019']}
car_data=pd.DataFrame(data=car_data)
#print out the original dataframe
print('Original Dataframe: \n', car_data)
# select rows when column of 'Location' is equal to 'CA' or 'MA'
rows_selected=car_data.loc[car_data['Location'].isin(['CA','MA']) ]
# select rows when columns of 'Location' is equal to 'CA' and 'Year' is equal to 2019
rows_selected=car_data.query('Location=="CA" & Year=="2019"')
#print out the dataframe with selected rows based on values in multiple columns
print('Dataframe with selected rows: \n', rows_selected)
The following is the output showing the original dataframe and the selected one using query()
.
Original Dataframe: Brand Location Year 0 Tesla CA 2019 1 Tesla CA 2018 2 Tesla NY 2020 3 Ford MA 2019 4 Ford CA 2019 Dataframe with selected rows: Brand Location Year 0 Tesla CA 2019 4 Ford CA 2019