Join Two Dataframes in Pandas based on a Column

There are two methods to join two dataframes in Pandas based on a column. (Note that, by default, join() will use index to combine two dataframes.)

Method 1:

We can use a combination of set_index('col_name') and on='col_name' for df_2. col_name is a common column in both df_1 and df_2.

df_1.join(df_2.set_index(‘col_name’),on=’col_name’)

Method 2:

We can use a combination of set_index('col_name') for both df_1 and df_2. col_name is a common column in both df_1 and df_2.

df_1.set_index(‘col_name’).join(df_2.set_index(‘col_name’))

The following is two dataframes that we are going to use in the following examples.

import pandas as pd
# create dataframe 1
df_1 = pd.DataFrame({'Brand': ['Tesla', 'Toyota','Tesla','Ford'], 
     'Location': ['CA', 'CA','NY','MA'],
    'Name': ['Jake', 'Jacob','John','Jess']},index=list('abcd'))
print("df_1: \n",df_1)

# create dataframe 1
df_2= pd.DataFrame({ 
    'Name': ['Jake', 'Jacob','John','Jess','James'],
    'Age': [28, 45,50,39,40]},index=list('aijek'))
print("df_2: \n",df_2)
df_1: 
     Brand Location   Name
a   Tesla       CA   Jake
b  Toyota       CA  Jacob
c   Tesla       NY   John
d    Ford       MA   Jess
df_2: 
     Name  Age
a   Jake   28
i  Jacob   45
j   John   50
e   Jess   39
k  James   40

Method 1

# 'Name' the common column name for df_1 and df_2
# use a combination of set_index('Name') and on='Name' for df_2
joined_df=df_1.join(df_2.set_index('Name'),on='Name')
print(joined_df)
    Brand Location   Name  Age
a   Tesla       CA   Jake   28
b  Toyota       CA  Jacob   45
c   Tesla       NY   John   50
d    Ford       MA   Jess   39

Method 2

# 'Name' the common column name for df_1 and df_2
# use set_index('Name') for both df_1 and df_2
joined_df=df_1.set_index('Name').join(df_2.set_index('Name'))
print(joined_df)
        Brand Location  Age
Name                       
Jake    Tesla       CA   28
Jacob  Toyota       CA   45
John    Tesla       NY   50
Jess     Ford       MA   39

Further Reading