Outer Join returns all records from both the left or right dataframes. When rows in one dataframe do not match another dataframe, the joined dataframe will have NaN for cells of the unmatched rows.
We can use how=’outer’ in join()
to outer join two dataframes in Pandas. The basic syntax is as follows, in which df_1 and df_2 represent two dataframes to be joined.
df_1.join(df_2, how=’outer’)
The following is an example showing how to outer join two dataframes using join()
.
Step 1: Prepare the data
import pandas as pd
# create a new dataframe called df_1
df_1 = pd.DataFrame({'Brand': ['Tesla', 'Toyota','Tesla','Ford'],
'Location': ['CA', 'CA','NY','MA']},index=list('abcd'))
print("df_1: \n",df_1)
# create a new dataframe called df_2
df_2 = pd.DataFrame({
'Name': ['Jake', 'Jacob','John','Jess','James']},index=list('abdek'))
print("df_2: \n",df_2)
df_1: Brand Location a Tesla CA b Toyota CA c Tesla NY d Ford MA df_2: Name a Jake b Jacob d John e Jess k James
Step 2: Use how=’outer’ in join()
# include how='outer' in join()
joined_df=df_1.join(df_2, how='outer')
print(joined_df)
Brand Location Name a Tesla CA Jake b Toyota CA Jacob c Tesla NY NaN d Ford MA John e NaN NaN Jess k NaN NaN James
Note that, the matching process (or, joining process) is based on the index. Thus, as we can see, for the index of e (i.e., Jess) and k (i.e., James) in df_2, df_1 does not have the index of e and k. The joined df has NaN in the columns of Brand and Location.
Similarly, for the index c (i.e., Tesla and NY) in df_1, there is no such index in df_2. Thus, the joined df has NaN in the columns of Name for the row of “c Tesla NY NaN“.