How to Select Columns to Form a New Dataframe in Python Pandas

This tutorial shows how to select columns to form a new dataframe in Python Pandas.

The following figure illustrates the idea: the original dataframe has 4 columns, but you only want to select 2 of them to form a new dataframe. The figure is from the official Pandas documentation.

[Figure: selecting a subset of columns from a dataframe]
Credit: https://pandas.pydata.org/

The following shows the steps of building a dataframe and creating a new dataframe based on the original one.

Step 1: Create a dataframe

# importing Pandas
import pandas as pd

# Create a dataframe
car_data = {'Brand': ['Tesla', 'Tesla', 'Tesla', 'Ford', 'Ford'],
            'Location': ['CA', 'CA', 'NY', 'MA', 'CA'],
            'Year': ['2019', '2018', '2020', '2019', '2019']}
car_data = pd.DataFrame(data=car_data)

# Print out the original dataframe
print('Original Dataframe: \n', car_data)
Original Dataframe: 
    Brand Location  Year
0  Tesla       CA  2019
1  Tesla       CA  2018
2  Tesla       NY  2020
3   Ford       MA  2019
4   Ford       CA  2019
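
Before selecting columns, it can help to confirm the exact column names and the size of the dataframe, since a misspelled name raises a KeyError. The following is a minimal sketch that rebuilds the same car_data so it runs on its own; the variable names column_names, n_rows, and n_cols are only illustrative.

```python
import pandas as pd

# Rebuild the same dataframe as in Step 1
car_data = pd.DataFrame({
    'Brand': ['Tesla', 'Tesla', 'Tesla', 'Ford', 'Ford'],
    'Location': ['CA', 'CA', 'NY', 'MA', 'CA'],
    'Year': ['2019', '2018', '2020', '2019', '2019'],
})

# List the column names and get the (rows, columns) shape
column_names = car_data.columns.tolist()
n_rows, n_cols = car_data.shape

print(column_names)    # ['Brand', 'Location', 'Year']
print(n_rows, n_cols)  # 5 3
```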

Step 2: Select columns and save

The following code subsets two columns, Brand and Year, and saves them as a new dataframe called selected_df.

# subset two columns, Brand and Year
selected_df = car_data[["Brand", "Year"]]

# print out the new dataframe
print(selected_df)

The following is the new dataframe.

   Brand  Year
0  Tesla  2019
1  Tesla  2018
2  Tesla  2020
3   Ford  2019
4   Ford  2019
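
If you plan to modify the new dataframe afterwards, one option (not part of the steps above) is to call .copy() explicitly, so that it is clear selected_df is independent of car_data. The sketch below assumes you want Year as integers in the new dataframe only; that conversion is just an example of a later change.

```python
import pandas as pd

# Rebuild the same dataframe as in Step 1
car_data = pd.DataFrame({
    'Brand': ['Tesla', 'Tesla', 'Tesla', 'Ford', 'Ford'],
    'Location': ['CA', 'CA', 'NY', 'MA', 'CA'],
    'Year': ['2019', '2018', '2020', '2019', '2019'],
})

# Select two columns and make an explicit, independent copy
selected_df = car_data[["Brand", "Year"]].copy()

# Change the copy only: convert Year from strings to integers
selected_df["Year"] = selected_df["Year"].astype(int)

print(selected_df["Year"].tolist())  # [2019, 2018, 2020, 2019, 2019]
print(car_data["Year"].tolist())     # ['2019', '2018', '2020', '2019', '2019']
```

The original car_data still holds Year as strings, because the change was made on the copy.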

The inner square brackets define a Python list of column names. The outer brackets are used to select the data from the dataframe.
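
The difference between single and double brackets can be seen in a short sketch. Passing one column name as a string returns a Series, while passing a list (even a list with one name) returns a dataframe. The names brand_series, brand_frame, and wanted_columns are only illustrative.

```python
import pandas as pd

# Rebuild the same dataframe as in Step 1
car_data = pd.DataFrame({
    'Brand': ['Tesla', 'Tesla', 'Tesla', 'Ford', 'Ford'],
    'Location': ['CA', 'CA', 'NY', 'MA', 'CA'],
    'Year': ['2019', '2018', '2020', '2019', '2019'],
})

# A single string inside the brackets returns a Series
brand_series = car_data["Brand"]

# A list with one name returns a dataframe with one column
brand_frame = car_data[["Brand"]]

# The list can also be built first and passed in as a variable
wanted_columns = ["Brand", "Year"]
selected_df = car_data[wanted_columns]

print(type(brand_series).__name__)  # Series
print(type(brand_frame).__name__)   # DataFrame
print(selected_df.shape)            # (5, 2)
```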

 
