This tutorial shows four ways to select a column from a pandas dataframe in Python.
method 1:
df['column_name']
method 2:
df.column_name
method 3:
df.loc[:, 'column_name']
method 4:
df.iloc[:, column_number]
Example for method 1
The following uses df['column_name'] to select a single column of data. In particular, it selects the 'Location' column.
import pandas as pd
# Create a dataframe
car_data = {'Brand': ['Tesla', 'Tesla', 'Tesla', 'Ford', 'Ford'],
            'Location': ['CA', 'CA', 'NY', 'MA', 'CA'],
            'Year': ['2019', '2018', '2020', '2019', '2019']}
car_data = pd.DataFrame(data=car_data)
# print out the original dataframe
print('Original Dataframe: \n', car_data)
# select the column of 'Location'
column_location=car_data['Location']
# print out the selected column
print('Selected column: \n', column_location)
The following shows the original dataframe and the selected column of Location.
Original Dataframe: 
    Brand Location  Year
0  Tesla       CA  2019
1  Tesla       CA  2018
2  Tesla       NY  2020
3   Ford       MA  2019
4   Ford       CA  2019
Selected column: 
 0    CA
1    CA
2    NY
3    MA
4    CA
Name: Location, dtype: object
Example for method 2
If the column name is a valid Python identifier and does not clash with an existing dataframe attribute or method, we can also use dot notation, df.column_name, to select the column.
import pandas as pd
# Create a dataframe
car_data = {'Brand': ['Tesla', 'Tesla', 'Tesla', 'Ford', 'Ford'],
            'Location': ['CA', 'CA', 'NY', 'MA', 'CA'],
            'Year': ['2019', '2018', '2020', '2019', '2019']}
car_data = pd.DataFrame(data=car_data)
# print out the original dataframe
print('Original Dataframe: \n', car_data)
# subset the column of 'Location'
column_location=car_data.Location
# print out the selected column
print('Selected column: \n', column_location)
The following output shows the original dataframe and the selected column.
Original Dataframe: 
    Brand Location  Year
0  Tesla       CA  2019
1  Tesla       CA  2018
2  Tesla       NY  2020
3   Ford       MA  2019
4   Ford       CA  2019
Selected column: 
 0    CA
1    CA
2    NY
3    MA
4    CA
Name: Location, dtype: object
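To make the "valid Python identifier" condition concrete, here is a minimal sketch of two cases where dot notation does not reach the column. The dataframe `cars` and its 'Model Year' and 'count' columns are hypothetical and are not part of the tutorial's data.

```python
import pandas as pd

# Hypothetical dataframe for illustration only
cars = pd.DataFrame({
    'Brand': ['Tesla', 'Ford'],
    'Model Year': ['2019', '2018'],  # contains a space, so it is not a valid identifier
    'count': [3, 5],                 # same name as the DataFrame.count method
})

# Dot notation works for a simple name such as 'Brand'
brand = cars.Brand

# Bracket notation works for any column name
model_year = cars['Model Year']
count_column = cars['count']

# cars.count resolves to the DataFrame.count method, not to the 'count' column
dot_count_is_method = callable(cars.count)

print(brand.tolist())
print(model_year.tolist())
print(count_column.tolist())
print(dot_count_is_method)
```

In cases like these, bracket notation (method 1) is the safer choice because it does not depend on how the column is named.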
Example for method 3
We can also select a column from a pandas dataframe using loc[:, 'column_name'].
import pandas as pd
# Create a dataframe
car_data = {'Brand': ['Tesla', 'Tesla', 'Tesla', 'Ford', 'Ford'],
            'Location': ['CA', 'CA', 'NY', 'MA', 'CA'],
            'Year': ['2019', '2018', '2020', '2019', '2019']}
car_data = pd.DataFrame(data=car_data)
# print out the original dataframe
print('Original Dataframe: \n', car_data)
# subset a column using loc[]
selected_column=car_data.loc[:,'Location']
# print out the selected column
print('Selected column: \n', selected_column)
The following is the output.
Original Dataframe: 
    Brand Location  Year
0  Tesla       CA  2019
1  Tesla       CA  2018
2  Tesla       NY  2020
3   Ford       MA  2019
4   Ford       CA  2019
Selected column: 
 0    CA
1    CA
2    NY
3    MA
4    CA
Name: Location, dtype: object
Example for method 4
The following uses iloc[:, column_number] to select a column.
Note that column numbers in iloc[] start at zero for the first column. Thus, 1 refers to the second column, which is 'Location' here.
import pandas as pd
# Create a dataframe
car_data = {'Brand': ['Tesla', 'Tesla', 'Tesla', 'Ford', 'Ford'],
            'Location': ['CA', 'CA', 'NY', 'MA', 'CA'],
            'Year': ['2019', '2018', '2020', '2019', '2019']}
car_data = pd.DataFrame(data=car_data)
# print out the original dataframe
print('Original Dataframe: \n', car_data)
# subset a column using iloc[]
selected_column=car_data.iloc[:,1]
# print out the selected column
print('Selected column: \n', selected_column)
The following shows the original dataframe and the column selected with iloc[].
Original Dataframe: 
    Brand Location  Year
0  Tesla       CA  2019
1  Tesla       CA  2018
2  Tesla       NY  2020
3   Ford       MA  2019
4   Ford       CA  2019
Selected column: 
 0    CA
1    CA
2    NY
3    MA
4    CA
Name: Location, dtype: object
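If we know a column's name but not its number, one option is to look the number up rather than count by hand. The sketch below uses the same data as the examples above and relies on columns.get_loc(), which is not used elsewhere in this tutorial, to find the zero-based position of 'Location'. It then checks that iloc[] and loc[] select the same column.

```python
import pandas as pd

car_data = pd.DataFrame({
    'Brand': ['Tesla', 'Tesla', 'Tesla', 'Ford', 'Ford'],
    'Location': ['CA', 'CA', 'NY', 'MA', 'CA'],
    'Year': ['2019', '2018', '2020', '2019', '2019'],
})

# Look up the zero-based position of the 'Location' column
location_position = car_data.columns.get_loc('Location')

# Select the same column by position and by label
by_position = car_data.iloc[:, location_position]
by_label = car_data.loc[:, 'Location']

print(location_position)
print(by_position.equals(by_label))
```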
Side Note:
- .loc[] selects rows and columns by their labels and returns a Series or a DataFrame, depending on the selection.
- .iloc[] selects rows and columns by their zero-based integer positions and returns a Series or a DataFrame, depending on the selection.
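As a small illustration of the Series-or-DataFrame point, the sketch below uses the same data as the examples above. Passing a single label or position returns a Series, while passing a one-element list returns a one-column DataFrame. The variable names are chosen for this illustration only.

```python
import pandas as pd

car_data = pd.DataFrame({
    'Brand': ['Tesla', 'Tesla', 'Tesla', 'Ford', 'Ford'],
    'Location': ['CA', 'CA', 'NY', 'MA', 'CA'],
    'Year': ['2019', '2018', '2020', '2019', '2019'],
})

# A single label or position returns a Series
series_by_label = car_data.loc[:, 'Location']
series_by_position = car_data.iloc[:, 1]

# A list with one label or position returns a DataFrame with one column
frame_by_label = car_data.loc[:, ['Location']]
frame_by_position = car_data.iloc[:, [1]]

print(type(series_by_label).__name__, type(series_by_position).__name__)
print(type(frame_by_label).__name__, type(frame_by_position).__name__)
print(frame_by_label.shape)
```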