Outer Join in Pandas

Outer Join returns all records from both the left or right dataframes. When rows in one dataframe do not match another dataframe, the joined dataframe will have NaN for cells of the unmatched rows.

Outer Join in Pandas
Outer Join in Pandas

We can use how=’outer’ in join() to outer join two dataframes in Pandas. The basic syntax is as follows, in which df_1 and df_2 represent two dataframes to be joined.

df_1.join(df_2, how=’outer’)

The following is an example showing how to outer join two dataframes using join().

Step 1: Prepare the data

import pandas as pd
# create a new dataframe called df_1
df_1 = pd.DataFrame({'Brand': ['Tesla', 'Toyota','Tesla','Ford'], 
     'Location': ['CA', 'CA','NY','MA']},index=list('abcd'))
print("df_1: \n",df_1)

# create a new dataframe called df_2
df_2 = pd.DataFrame({ 
    'Name': ['Jake', 'Jacob','John','Jess','James']},index=list('abdek'))
print("df_2: \n",df_2)
df_1: 
     Brand Location
a   Tesla       CA
b  Toyota       CA
c   Tesla       NY
d    Ford       MA
df_2: 
     Name
a   Jake
b  Jacob
d   John
e   Jess
k  James

Step 2: Use how=’outer’ in join()

# include how='outer' in join()
joined_df=df_1.join(df_2, how='outer')
print(joined_df)
    Brand Location   Name
a   Tesla       CA   Jake
b  Toyota       CA  Jacob
c   Tesla       NY    NaN
d    Ford       MA   John
e     NaN      NaN   Jess
k     NaN      NaN  James

Note that, the matching process (or, joining process) is based on the index. Thus, as we can see, for the index of e (i.e., Jess) and k (i.e., James) in df_2, df_1 does not have the index of e and k. The joined df has NaN in the columns of Brand and Location.

Similarly, for the index c (i.e., Tesla and NY) in df_1, there is no such index in df_2. Thus, the joined df has NaN in the columns of Name for the row of “c Tesla NY NaN“.


Further Reading