Introduction
Outer Merge returns all records from both the left or right dataframes. When rows in one dataframe do not match another dataframe, the merged dataframe will have NaN for the cells.
We can use how=’outer’ in merge() to outer merge two dataframes in Pandas. The basic syntax is as follows, in which df_1 and df_2 represent two dataframes to be merged.
Method 1:
df_1.merge(df_2, how=’outer’, left_index=True,right_index=True)
Method 2:
pd.merge(df_1, df_2, how=’outer’, left_index=True,right_index=True)
The following is an example showing how to outer merge two dataframes using merge().
Sample Data
import pandas as pd
# create a new dataframe called df_1
df_1 = pd.DataFrame({'Brand': ['Tesla', 'Toyota','Tesla','Ford'],
'Location': ['CA', 'CA','NY','MA']},index=list('abcd'))
print("df_1: \n",df_1)
# create a new dataframe called df_2
df_2 = pd.DataFrame({
'Name': ['Jake', 'Jacob','John','Jess','James']},index=list('abdek'))
print("df_2: \n",df_2)
df_1: Brand Location a Tesla CA b Toyota CA c Tesla NY d Ford MA df_2: Name a Jake b Jacob d John e Jess k James
Method 1
# outer merge using merge()
merged_df=df_1.merge(df_2, how='outer',left_index=True,right_index=True)
print(joined_df)
Brand Location Name a Tesla CA Jake b Toyota CA Jacob c Tesla NY NaN d Ford MA John e NaN NaN Jess k NaN NaN James
As we can see, for the index of e (i.e., Jess) and k (i.e., James) in df_2, df_1 does not have the index of e and k. The merged df has NaN in the columns of Brand and Location.
Similarly, for the index c (i.e., Tesla and NY) in df_1, there is no such index in df_2. Thus, the merged df has NaN in the columns of Name for the row of “c Tesla NY NaN“.
Method 2
# outer merge using merge()
merged_df=pd.merge(df_1,df_2, how='outer',left_index=True,right_index=True)
print(joined_df)
Brand Location Name a Tesla CA Jake b Toyota CA Jacob c Tesla NY NaN d Ford MA John e NaN NaN Jess k NaN NaN James
As we can see, Method 2 and Method 1 generate the same result.