This tutorial includes methods that you can select rows based on a specific column value or a few column values by using loc() or query() in Python Pandas.
Select base on a single value in a single column:
Method 1: df.loc[df[‘column_name’]==value]
Method 2: df.loc[df[‘column_name’].isin([value])]
Select multiple values in a single column:
Method 1: df.loc[(df[‘column_name’]==value1)|(df[‘column_name’]==value2) ]
Method 2: df.loc[df[‘column_name’].isin([value1,value2])]
Select rows based on a single value in a single column
Method 1: use loc()
The following is an example showing how to do it. First, we structure a dataframe from scratch, from which we are going to select rows when the location is CA.
import pandas as pd
# Create a dataframe
car_data = {'Brand': ['Tesla', 'Tesla','Tesla','Ford','Ford'],
'Location': ['CA', 'CA','NY','MA','CA'],
'Year':['2019','2018','2020','2019','2016']}
car_data=pd.DataFrame(data=car_data)
#print out the original dataframe
print('Original Dataframe: \n', car_data)
# select rows when column of 'Location' is equal to 'CA'
rows_selected=car_data.loc[car_data['Location']=='CA']
#print out the dataframe with selected rows
print('Dataframe with selected rows: \n', rows_selected)
The following is the output showing the original dataframe and the dataframe with selected rows when the location is equal to CA.
Original Dataframe: Brand Location Year 0 Tesla CA 2019 1 Tesla CA 2018 2 Tesla NY 2020 3 Ford MA 2019 4 Ford CA 2016 Dataframe with selected rows: Brand Location Year 0 Tesla CA 2019 1 Tesla CA 2018 4 Ford CA 2016
Method 2: use isin()
You can also use isin()
to select rows based on a single value in a column value in Pandas.
import pandas as pd
# Create a dataframe
car_data = {'Brand': ['Tesla', 'Tesla','Tesla','Ford','Ford'],
'Location': ['CA', 'CA','NY','MA','CA'],
'Year':['2019','2018','2020','2019','2016']}
car_data=pd.DataFrame(data=car_data)
#print out the original dataframe
print('Original Dataframe: \n', car_data)
# select rows when column of 'Location' is equal to 'CA'
rows_selected=car_data.loc[car_data['Location'].isin(['CA'])]
#print out the dataframe with selected rows based on multiple value
print('Dataframe with selected rows: \n', rows_selected)
The following is the output using isin()
.
Original Dataframe: Brand Location Year 0 Tesla CA 2019 1 Tesla CA 2018 2 Tesla NY 2020 3 Ford MA 2019 4 Ford CA 2016 Dataframe with selected rows: Brand Location Year 0 Tesla CA 2019 1 Tesla CA 2018 4 Ford CA 2016
Select rows based on multiple values in a single column
Method 1: use or | operator
The following is a code example using loc()
and or |
operator to select rows based on multiple values (i.e., CA and WA) in a column (i.e., Location).
import pandas as pd
# Create a dataframe
car_data = {'Brand': ['Tesla', 'Tesla','Tesla','Ford','Ford'],
'Location': ['CA', 'CA','NY','MA','CA'],
'Year':['2019','2018','2020','2019','2016']}
car_data=pd.DataFrame(data=car_data)
#print out the original dataframe
print('Original Dataframe: \n', car_data)
# select rows when column of 'Location' is equal to 'CA' or 'MA'
rows_selected=car_data.loc[(car_data['Location']=='CA') | (car_data['Location']=='MA') ]
#print out the dataframe with selected rows based on multiple values
print('Dataframe with selected rows: \n', rows_selected)
The following is the output.
Original Dataframe: Brand Location Year 0 Tesla CA 2019 1 Tesla CA 2018 2 Tesla NY 2020 3 Ford MA 2019 4 Ford CA 2016 Dataframe with selected rows: Brand Location Year 0 Tesla CA 2019 1 Tesla CA 2018 3 Ford MA 2019 4 Ford CA 2016
Method 2: use isin()
We can use isin()
as well. The following shows how we can do that.
import pandas as pd
# Create a dataframe
car_data = {'Brand': ['Tesla', 'Tesla','Tesla','Ford','Ford'],
'Location': ['CA', 'CA','NY','MA','CA'],
'Year':['2019','2018','2020','2019','2016']}
car_data=pd.DataFrame(data=car_data)
#print out the original dataframe
print('Original Dataframe: \n', car_data)
# select rows when column of 'Location' is equal to 'CA' or 'MA'
rows_selected=car_data.loc[car_data['Location'].isin(['CA','MA']) ]
#print out the dataframe with selected rows based on multiple values
print('Dataframe with selected rows: \n', rows_selected)
ThThe following is the actual code example and output showing how to select rows based on multiple values in a column.
Original Dataframe: Brand Location Year 0 Tesla CA 2019 1 Tesla CA 2018 2 Tesla NY 2020 3 Ford MA 2019 4 Ford CA 2016 Dataframe with selected rows: Brand Location Year 0 Tesla CA 2019 1 Tesla CA 2018 3 Ford MA 2019 4 Ford CA 2016