How to Select or Subset Dataframe Columns in Python

This tutorial shows four ways to select a column from a pandas dataframe in Python.

Method 1:

df['column_name']

Method 2:

df.column_name

Method 3:

df.loc[:, 'column_name']

Method 4:

df.iloc[:, column_number]


Example for method 1

The following example uses df['column_name'] to select a single column, here the 'Location' column.

import pandas as pd

# Create a dataframe
car_data = {'Brand': ['Tesla', 'Tesla', 'Tesla', 'Ford', 'Ford'],
            'Location': ['CA', 'CA', 'NY', 'MA', 'CA'],
            'Year': ['2019', '2018', '2020', '2019', '2019']}
car_data = pd.DataFrame(data=car_data)

# Print out the original dataframe
print('Original Dataframe: \n', car_data)

# Select the 'Location' column
column_location = car_data['Location']

# Print out the selected column
print('Selected column: \n', column_location)

The following shows the original dataframe and the selected column of Location.

Original Dataframe: 
    Brand Location  Year
0  Tesla       CA  2019
1  Tesla       CA  2018
2  Tesla       NY  2020
3   Ford       MA  2019
4   Ford       CA  2019

Selected column: 
 0    CA
1    CA
2    NY
3    MA
4    CA
Name: Location, dtype: object
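
The same bracket syntax can also subset several columns at once when it is given a list of names. The following is a minimal sketch using the same car data; the variable names single, subset, and one_col_frame are only illustrative.

```python
import pandas as pd

car_data = pd.DataFrame({
    'Brand': ['Tesla', 'Tesla', 'Tesla', 'Ford', 'Ford'],
    'Location': ['CA', 'CA', 'NY', 'MA', 'CA'],
    'Year': ['2019', '2018', '2020', '2019', '2019'],
})

# A single label returns a Series
single = car_data['Location']

# A list of labels returns a DataFrame with those columns, in the order given
subset = car_data[['Brand', 'Year']]

# A one-item list still returns a DataFrame (one column wide)
one_col_frame = car_data[['Location']]

print(type(single).__name__)
print(subset.columns.tolist())
print(one_col_frame.shape)
```

Note the double brackets: the outer pair is the selection, and the inner pair is the Python list of column names.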

Example for method 2

If the column name is a string that is a valid Python identifier and does not clash with an existing dataframe attribute or method, we can also use dot notation df.column_name to select the column.

import pandas as pd

# Create a dataframe
car_data = {'Brand': ['Tesla', 'Tesla', 'Tesla', 'Ford', 'Ford'],
            'Location': ['CA', 'CA', 'NY', 'MA', 'CA'],
            'Year': ['2019', '2018', '2020', '2019', '2019']}
car_data = pd.DataFrame(data=car_data)

# Print out the original dataframe
print('Original Dataframe: \n', car_data)

# Select the 'Location' column with dot notation
column_location = car_data.Location

# Print out the selected column
print('Selected column: \n', column_location)

The following is the output showing the original dataframe and the selected column.

Original Dataframe: 
    Brand Location  Year
0  Tesla       CA  2019
1  Tesla       CA  2018
2  Tesla       NY  2020
3   Ford       MA  2019
4   Ford       CA  2019

Selected column: 
 0    CA
1    CA
2    NY
3    MA
4    CA
Name: Location, dtype: object
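
To see where dot notation stops working, the sketch below uses a small hypothetical dataframe (not the car data above) with one column name that contains a space and one that matches the name of a dataframe method. In both cases, bracket notation is the safe choice.

```python
import pandas as pd

# Hypothetical frame: 'Model Year' contains a space,
# and 'count' is also the name of a DataFrame method
df = pd.DataFrame({
    'Brand': ['Tesla', 'Ford'],
    'Model Year': ['2019', '2018'],
    'count': [3, 2],
})

# Works: 'Brand' is a valid identifier and does not clash with anything
brand = df.Brand

# df.Model Year would be a syntax error, so brackets are required
model_year = df['Model Year']

# Brackets return the column; the dot returns the DataFrame.count method
count_col = df['count']
count_method = df.count

print(count_col.tolist())
print(callable(count_method))
```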

Example for method 3

We can also select a column from a pandas dataframe using loc[:, 'column_name'], where the colon before the comma selects all rows.

import pandas as pd

# Create a dataframe
car_data = {'Brand': ['Tesla', 'Tesla', 'Tesla', 'Ford', 'Ford'],
            'Location': ['CA', 'CA', 'NY', 'MA', 'CA'],
            'Year': ['2019', '2018', '2020', '2019', '2019']}
car_data = pd.DataFrame(data=car_data)

# Print out the original dataframe
print('Original Dataframe: \n', car_data)

# Select a column using loc[]
selected_column = car_data.loc[:, 'Location']

# Print out the selected column
print('Selected column: \n', selected_column)

The following is the output showing the original dataframe and the column selected with loc[].

Original Dataframe: 
    Brand Location  Year
0  Tesla       CA  2019
1  Tesla       CA  2018
2  Tesla       NY  2020
3   Ford       MA  2019
4   Ford       CA  2019

Selected column: 
 0    CA
1    CA
2    NY
3    MA
4    CA
Name: Location, dtype: object
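
Beyond a single label, loc[] also accepts a list of labels or a label slice. The following is a small sketch with the same car data; the variable names are illustrative. One detail worth knowing is that label slices in loc[] include both endpoints.

```python
import pandas as pd

car_data = pd.DataFrame({
    'Brand': ['Tesla', 'Tesla', 'Tesla', 'Ford', 'Ford'],
    'Location': ['CA', 'CA', 'NY', 'MA', 'CA'],
    'Year': ['2019', '2018', '2020', '2019', '2019'],
})

# A list of labels picks exactly those columns
by_list = car_data.loc[:, ['Brand', 'Year']]

# A label slice includes BOTH endpoints: 'Brand' and 'Location'
by_slice = car_data.loc[:, 'Brand':'Location']

print(by_list.columns.tolist())
print(by_slice.columns.tolist())
```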

Example for method 4

The following uses iloc[:, column_number] to select a column by its position.

Note that column numbers in iloc[] start at zero for the first column. Thus, 1 refers to the second column.

import pandas as pd

# Create a dataframe
car_data = {'Brand': ['Tesla', 'Tesla', 'Tesla', 'Ford', 'Ford'],
            'Location': ['CA', 'CA', 'NY', 'MA', 'CA'],
            'Year': ['2019', '2018', '2020', '2019', '2019']}
car_data = pd.DataFrame(data=car_data)

# Print out the original dataframe
print('Original Dataframe: \n', car_data)

# Select the second column (position 1) using iloc[]
selected_column = car_data.iloc[:, 1]

# Print out the selected column
print('Selected column: \n', selected_column)

The following shows the original dataframe and the column selected with iloc[].

Original Dataframe: 
    Brand Location  Year
0  Tesla       CA  2019
1  Tesla       CA  2018
2  Tesla       NY  2020
3   Ford       MA  2019
4   Ford       CA  2019

Selected column: 
 0    CA
1    CA
2    NY
3    MA
4    CA
Name: Location, dtype: object
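
Positions in iloc[] follow the usual Python rules, so slices exclude the end position and negative numbers count from the right. The sketch below, again with the car data and illustrative variable names, also uses columns.get_loc() to look up a column's position from its name, which helps when the position is not known in advance.

```python
import pandas as pd

car_data = pd.DataFrame({
    'Brand': ['Tesla', 'Tesla', 'Tesla', 'Ford', 'Ford'],
    'Location': ['CA', 'CA', 'NY', 'MA', 'CA'],
    'Year': ['2019', '2018', '2020', '2019', '2019'],
})

# Positions 0 and 1; the end position 2 is excluded
first_two = car_data.iloc[:, 0:2]

# -1 refers to the last column
last_column = car_data.iloc[:, -1]

# Look up the position of a column by its name, then select by position
position = car_data.columns.get_loc('Location')
by_position = car_data.iloc[:, position]

print(first_two.columns.tolist())
print(last_column.name)
print(position)
```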

Side Note:

  • .loc[ ] selects by the labels of rows and columns and returns a Series or a DataFrame, depending on the selection.
  • .iloc[ ] selects by the zero-based integer positions of rows and columns and returns a Series or a DataFrame, depending on the selection.
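
As a quick check of the side note, the following sketch compares the two indexers on the car data. It assumes the default column order shown above, so that 'Location' sits at position 1; the variable names are illustrative.

```python
import pandas as pd

car_data = pd.DataFrame({
    'Brand': ['Tesla', 'Tesla', 'Tesla', 'Ford', 'Ford'],
    'Location': ['CA', 'CA', 'NY', 'MA', 'CA'],
    'Year': ['2019', '2018', '2020', '2019', '2019'],
})

# A single label or position returns a Series
loc_series = car_data.loc[:, 'Location']
iloc_series = car_data.iloc[:, 1]

# A list of labels or positions returns a DataFrame
loc_frame = car_data.loc[:, ['Location']]
iloc_frame = car_data.iloc[:, [1]]

# Both indexers reach the same data, by label or by position
print(loc_series.equals(iloc_series))
print(loc_frame.equals(iloc_frame))
```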