This tutorial shows four ways to select a column from a pandas dataframe in Python.
Method 1:
df['column_name']
Method 2:
df.column_name
Method 3:
df.loc[:, 'column_name']
Method 4:
df.iloc[:, column_number]
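As a quick check before the detailed examples, the sketch below applies all four methods to one small dataframe and confirms that they return the same Series. The dataframe and its values are made up for illustration only.

```python
import pandas as pd

# Small illustrative dataframe (hypothetical values)
df = pd.DataFrame({'Brand': ['Tesla', 'Ford'], 'Location': ['CA', 'MA']})

by_bracket = df['Location']       # method 1: bracket notation
by_dot = df.Location              # method 2: dot notation
by_loc = df.loc[:, 'Location']    # method 3: label-based selection
by_iloc = df.iloc[:, 1]           # method 4: position-based (second column)

# Compare each result with the bracket-notation result
same = (by_bracket.equals(by_dot)
        and by_bracket.equals(by_loc)
        and by_bracket.equals(by_iloc))
print(same)
```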
Example for method 1
The following uses df['column_name'] to select a single column, in this case 'Location'.
import pandas as pd
# Create a dataframe
car_data = {'Brand': ['Tesla', 'Tesla', 'Tesla', 'Ford', 'Ford'],
            'Location': ['CA', 'CA', 'NY', 'MA', 'CA'],
            'Year': ['2019', '2018', '2020', '2019', '2019']}
car_data = pd.DataFrame(data=car_data)
# print out the original dataframe
print('Original Dataframe: \n', car_data)
# select the column of 'Location'
column_location = car_data['Location']
# print out the selected column
print('Selected column: \n', column_location)
The following shows the original dataframe and the selected column of Location.
Original Dataframe:
Brand Location Year
0 Tesla CA 2019
1 Tesla CA 2018
2 Tesla NY 2020
3 Ford MA 2019
4 Ford CA 2019
Selected column:
0 CA
1 CA
2 NY
3 MA
4 CA
Name: Location, dtype: object
Example for method 2
If the column name is a string that is a valid Python identifier and does not clash with an existing DataFrame attribute or method, we can also use dot notation, df.column_name, to select the column.
import pandas as pd
# Create a dataframe
car_data = {'Brand': ['Tesla', 'Tesla', 'Tesla', 'Ford', 'Ford'],
            'Location': ['CA', 'CA', 'NY', 'MA', 'CA'],
            'Year': ['2019', '2018', '2020', '2019', '2019']}
car_data = pd.DataFrame(data=car_data)
# print out the original dataframe
print('Original Dataframe: \n', car_data)
# select the column of 'Location'
column_location = car_data.Location
# print out the selected column
print('Selected column: \n', column_location)
The following output shows the original dataframe and the selected column.
Original Dataframe:
Brand Location Year
0 Tesla CA 2019
1 Tesla CA 2018
2 Tesla NY 2020
3 Ford MA 2019
4 Ford CA 2019
Selected column:
0 CA
1 CA
2 NY
3 MA
4 CA
Name: Location, dtype: object
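To illustrate where dot notation stops working, here is a minimal sketch. The column names 'Model Year' and 'count' are hypothetical and chosen only to trigger the two common problems: a name that is not a valid identifier, and a name that matches a DataFrame method.

```python
import pandas as pd

# Hypothetical column names picked to show the limits of dot notation
df = pd.DataFrame({'Model Year': ['2019', '2018'], 'count': [3, 5]})

# 'Model Year' contains a space, so df.Model Year would be a syntax error;
# bracket notation is required
model_year = df['Model Year']

# 'count' is also the name of a DataFrame method, so df.count refers to
# that method rather than to the column
is_method = callable(df.count)
print(is_method)

# Bracket notation still returns the column
count_column = df['count']
print(count_column)
```

In both cases df['column_name'] works, which is why bracket notation is the safer default.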
Example for method 3
We can also select a column from a pandas dataframe using df.loc[:, 'column_name'], where the colon selects all rows.
import pandas as pd
# Create a dataframe
car_data = {'Brand': ['Tesla', 'Tesla', 'Tesla', 'Ford', 'Ford'],
            'Location': ['CA', 'CA', 'NY', 'MA', 'CA'],
            'Year': ['2019', '2018', '2020', '2019', '2019']}
car_data = pd.DataFrame(data=car_data)
# print out the original dataframe
print('Original Dataframe: \n', car_data)
# select a column using loc[]
selected_column = car_data.loc[:, 'Location']
# print out the selected column
print('Selected column: \n', selected_column)
The following is the output.
Original Dataframe:
Brand Location Year
0 Tesla CA 2019
1 Tesla CA 2018
2 Tesla NY 2020
3 Ford MA 2019
4 Ford CA 2019
Selected column:
0 CA
1 CA
2 NY
3 MA
4 CA
Name: Location, dtype: object
Example for method 4
The following uses df.iloc[:, column_number] to select a column.
Note that column positions in iloc[] start at zero, so 0 is the first column and 1 refers to the second column.
import pandas as pd
# Create a dataframe
car_data = {'Brand': ['Tesla', 'Tesla', 'Tesla', 'Ford', 'Ford'],
            'Location': ['CA', 'CA', 'NY', 'MA', 'CA'],
            'Year': ['2019', '2018', '2020', '2019', '2019']}
car_data = pd.DataFrame(data=car_data)
# print out the original dataframe
print('Original Dataframe: \n', car_data)
# select the second column (position 1) using iloc[]
selected_column = car_data.iloc[:, 1]
# print out the selected column
print('Selected column: \n', selected_column)
The following shows the original dataframe and the column selected with iloc[].
Original Dataframe:
Brand Location Year
0 Tesla CA 2019
1 Tesla CA 2018
2 Tesla NY 2020
3 Ford MA 2019
4 Ford CA 2019
Selected column:
0 CA
1 CA
2 NY
3 MA
4 CA
Name: Location, dtype: object
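When only the column name is known, its zero-based position can be looked up instead of counted by hand. The sketch below assumes unique column names and uses the columns.get_loc() method of the column index for the lookup. The variable names are illustrative.

```python
import pandas as pd

car_data = pd.DataFrame({'Brand': ['Tesla', 'Tesla', 'Tesla', 'Ford', 'Ford'],
                         'Location': ['CA', 'CA', 'NY', 'MA', 'CA'],
                         'Year': ['2019', '2018', '2020', '2019', '2019']})

# Look up the zero-based position of 'Location' (assumes unique column names)
position = car_data.columns.get_loc('Location')
print(position)

# Use that position with iloc[]
selected_column = car_data.iloc[:, position]
print(selected_column)
```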
Side note:
- .loc[] selects by the labels of rows and columns and returns a Series or a DataFrame.
- .iloc[] selects by the zero-based integer positions of rows and columns and returns a Series or a DataFrame.
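Whether the result is a Series or a DataFrame depends on how the column is specified. As a small sketch on a made-up dataframe: a single label or position gives a Series, while a list (even with one element) gives a DataFrame.

```python
import pandas as pd

# Small illustrative dataframe (hypothetical values)
df = pd.DataFrame({'Brand': ['Tesla', 'Ford'], 'Location': ['CA', 'MA']})

# A single label or position returns a Series
as_series = df.loc[:, 'Location']
also_series = df.iloc[:, 1]

# A list of labels or positions returns a DataFrame, even for one column
as_frame = df.loc[:, ['Location']]
also_frame = df.iloc[:, [1]]

print(type(as_series).__name__)
print(type(as_frame).__name__)
```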