There are two methods to join two dataframes in Pandas based on a column. (Note that, by default, join()
will use index to combine two dataframes.)
Method 1:
We can use a combination of set_index('col_name')
and on='col_name'
for df_2. col_name is a common column in both df_1 and df_2.
df_1.join(df_2.set_index(‘col_name’),on=’col_name’)
Method 2:
We can use a combination of set_index('col_name')
for both df_1 and df_2. col_name is a common column in both df_1 and df_2.
df_1.set_index(‘col_name’).join(df_2.set_index(‘col_name’))
The following is two dataframes that we are going to use in the following examples.
import pandas as pd
# create dataframe 1
df_1 = pd.DataFrame({'Brand': ['Tesla', 'Toyota','Tesla','Ford'],
'Location': ['CA', 'CA','NY','MA'],
'Name': ['Jake', 'Jacob','John','Jess']},index=list('abcd'))
print("df_1: \n",df_1)
# create dataframe 1
df_2= pd.DataFrame({
'Name': ['Jake', 'Jacob','John','Jess','James'],
'Age': [28, 45,50,39,40]},index=list('aijek'))
print("df_2: \n",df_2)
df_1: Brand Location Name a Tesla CA Jake b Toyota CA Jacob c Tesla NY John d Ford MA Jess df_2: Name Age a Jake 28 i Jacob 45 j John 50 e Jess 39 k James 40
Method 1
# 'Name' the common column name for df_1 and df_2
# use a combination of set_index('Name') and on='Name' for df_2
joined_df=df_1.join(df_2.set_index('Name'),on='Name')
print(joined_df)
Brand Location Name Age a Tesla CA Jake 28 b Toyota CA Jacob 45 c Tesla NY John 50 d Ford MA Jess 39
Method 2
# 'Name' the common column name for df_1 and df_2
# use set_index('Name') for both df_1 and df_2
joined_df=df_1.set_index('Name').join(df_2.set_index('Name'))
print(joined_df)
Brand Location Age Name Jake Tesla CA 28 Jacob Toyota CA 45 John Tesla NY 50 Jess Ford MA 39