Method 1
Use merge() to merge two dataframes in Pandas based on a column. When the column name is the same in both dataframes, we can use the following statement to merge two dataframes based on that column. col_name is the name of the common column in df_1 and df_2.
df_1.merge(df_2, on=”col_name”)
The following is two dataframes that we are going to merge.
import pandas as pd
# create dataframe 1
df_1 = pd.DataFrame({'Brand': ['Tesla', 'Toyota','Tesla','Ford'],
'Location': ['CA', 'CA','NY','MA'],
'Name': ['Jake', 'Jacob','John','Jess']},index=list('abcd'))
print("df_1: \n",df_1)
# create dataframe 1
df_2= pd.DataFrame({
'Name': ['Jake', 'Jacob','John','Jess','James'],
'Age': [28, 45,50,39,40]},index=list('aijek'))
print("df_2: \n",df_2)
df_1: Brand Location Name a Tesla CA Jake b Toyota CA Jacob c Tesla NY John d Ford MA Jess df_2: Name Age a Jake 28 i Jacob 45 j John 50 e Jess 39 k James 40
# merge two dataframes in pandas based on a column
merged_df=df_1.merge(df_2,on="Name")
print(merged_df)
Brand Location Name Age 0 Tesla CA Jake 28 1 Toyota CA Jacob 45 2 Tesla NY John 50 3 Ford MA Jess 39
Method 2
When the column names are not the same but we want to use these two columns as the index to merge two dataframes, the following is the statement. col_name_df_1 is the column name in dataframe 1, whereas col_name_df_2 is the column name in dataframe 2.
df_1.merge(df_2, left_on=”col_name_df_1′, right_on=’col_name_df_2′)
The following is the two dataframes that we are going to merge. The column name in dataframe 1 is “Name”, whereas the column name in dataframe 2 is “Who”.
import pandas as pd
# create dataframe 1
df_1 = pd.DataFrame({'Brand': ['Tesla', 'Toyota','Tesla','Ford'],
'Location': ['CA', 'CA','NY','MA'],
'Name': ['Jake', 'Jacob','John','Jess']},index=list('abcd'))
print("df_1: \n",df_1)
# create dataframe 1
df_2= pd.DataFrame({
'Who': ['Jake', 'Jacob','John','Jess','James'],
'Age': [28, 45,50,39,40]},index=list('aijek'))
print("df_2: \n",df_2)
df_1: Brand Location Name a Tesla CA Jake b Toyota CA Jacob c Tesla NY John d Ford MA Jess df_2: Who Age a Jake 28 i Jacob 45 j John 50 e Jess 39 k James 40
# merge two dataframes in pandas based on two different columns
merged_df=df_1.merge(df_2,left_on='Name',right_on='Who')
print(merged_df)
Brand Location Name Who Age 0 Tesla CA Jake Jake 28 1 Toyota CA Jacob Jacob 45 2 Tesla NY John John 50 3 Ford MA Jess Jess 39