Outer Merge in Pandas

Introduction

Outer Merge returns all records from both the left or right dataframes. When rows in one dataframe do not match another dataframe, the merged dataframe will have NaN for the cells.

Outer Merge in Pandas
Outer Merge in Pandas

We can use how=’outer’ in merge() to outer merge two dataframes in Pandas. The basic syntax is as follows, in which df_1 and df_2 represent two dataframes to be merged.

Method 1:

df_1.merge(df_2, how=’outer’, left_index=True,right_index=True)

Method 2:

pd.merge(df_1, df_2, how=’outer’, left_index=True,right_index=True)

The following is an example showing how to outer merge two dataframes using merge().


Sample Data

import pandas as pd
# create a new dataframe called df_1
df_1 = pd.DataFrame({'Brand': ['Tesla', 'Toyota','Tesla','Ford'], 
     'Location': ['CA', 'CA','NY','MA']},index=list('abcd'))
print("df_1: \n",df_1)

# create a new dataframe called df_2
df_2 = pd.DataFrame({ 
    'Name': ['Jake', 'Jacob','John','Jess','James']},index=list('abdek'))
print("df_2: \n",df_2)
df_1: 
     Brand Location
a   Tesla       CA
b  Toyota       CA
c   Tesla       NY
d    Ford       MA
df_2: 
     Name
a   Jake
b  Jacob
d   John
e   Jess
k  James

Method 1

# outer merge using merge()
merged_df=df_1.merge(df_2, how='outer',left_index=True,right_index=True)
print(joined_df)
    Brand Location   Name
a   Tesla       CA   Jake
b  Toyota       CA  Jacob
c   Tesla       NY    NaN
d    Ford       MA   John
e     NaN      NaN   Jess
k     NaN      NaN  James

As we can see, for the index of e (i.e., Jess) and k (i.e., James) in df_2, df_1 does not have the index of e and k. The merged df has NaN in the columns of Brand and Location.

Similarly, for the index c (i.e., Tesla and NY) in df_1, there is no such index in df_2. Thus, the merged df has NaN in the columns of Name for the row of “c Tesla NY NaN“.


Method 2

# outer merge using merge()
merged_df=pd.merge(df_1,df_2, how='outer',left_index=True,right_index=True)
print(joined_df)
    Brand Location   Name
a   Tesla       CA   Jake
b  Toyota       CA  Jacob
c   Tesla       NY    NaN
d    Ford       MA   John
e     NaN      NaN   Jess
k     NaN      NaN  James

As we can see, Method 2 and Method 1 generate the same result.


Further Reading