Merge Two Dataframes in Pandas based on a Column

Method 1

Use merge() to merge two dataframes in Pandas based on a column. When the column name is the same in both dataframes, we can use the following statement to merge two dataframes based on that column. col_name is the name of the common column in df_1 and df_2.

df_1.merge(df_2, on=”col_name”)

The following is two dataframes that we are going to merge.

import pandas as pd
# create dataframe 1
df_1 = pd.DataFrame({'Brand': ['Tesla', 'Toyota','Tesla','Ford'], 
     'Location': ['CA', 'CA','NY','MA'],
    'Name': ['Jake', 'Jacob','John','Jess']},index=list('abcd'))
print("df_1: \n",df_1)

# create dataframe 1
df_2= pd.DataFrame({ 
    'Name': ['Jake', 'Jacob','John','Jess','James'],
    'Age': [28, 45,50,39,40]},index=list('aijek'))
print("df_2: \n",df_2)
df_1: 
     Brand Location   Name
a   Tesla       CA   Jake
b  Toyota       CA  Jacob
c   Tesla       NY   John
d    Ford       MA   Jess
df_2: 
     Name  Age
a   Jake   28
i  Jacob   45
j   John   50
e   Jess   39
k  James   40
# merge two dataframes in pandas based on a column
merged_df=df_1.merge(df_2,on="Name")
print(merged_df)
    Brand Location   Name  Age
0   Tesla       CA   Jake   28
1  Toyota       CA  Jacob   45
2   Tesla       NY   John   50
3    Ford       MA   Jess   39

Method 2

When the column names are not the same but we want to use these two columns as the index to merge two dataframes, the following is the statement. col_name_df_1 is the column name in dataframe 1, whereas col_name_df_2 is the column name in dataframe 2.

df_1.merge(df_2, left_on=”col_name_df_1′, right_on=’col_name_df_2′)

The following is the two dataframes that we are going to merge. The column name in dataframe 1 is “Name”, whereas the column name in dataframe 2 is “Who”.

import pandas as pd
# create dataframe 1
df_1 = pd.DataFrame({'Brand': ['Tesla', 'Toyota','Tesla','Ford'], 
     'Location': ['CA', 'CA','NY','MA'],
    'Name': ['Jake', 'Jacob','John','Jess']},index=list('abcd'))
print("df_1: \n",df_1)

# create dataframe 1
df_2= pd.DataFrame({ 
    'Who': ['Jake', 'Jacob','John','Jess','James'],
    'Age': [28, 45,50,39,40]},index=list('aijek'))
print("df_2: \n",df_2)
df_1: 
     Brand Location   Name
a   Tesla       CA   Jake
b  Toyota       CA  Jacob
c   Tesla       NY   John
d    Ford       MA   Jess
df_2: 
      Who  Age
a   Jake   28
i  Jacob   45
j   John   50
e   Jess   39
k  James   40
# merge two dataframes in pandas based on two different columns
merged_df=df_1.merge(df_2,left_on='Name',right_on='Who')
print(merged_df)
    Brand Location   Name    Who  Age
0   Tesla       CA   Jake   Jake   28
1  Toyota       CA  Jacob  Jacob   45
2   Tesla       NY   John   John   50
3    Ford       MA   Jess   Jess   39

Further Reading