Calculate Means Group by Two Columns in Pandas (3 Examples)

The following provides 3 different methods of calculating means group by two Columns in Python.

Method 1:

df.groupby([“column_1″,”column_2”]).mean()

Method 2:

df.groupby([“column_1″,”column_2”]).agg(‘mean’)

Method 3:

pd.crosstab(index=df[‘column_1’], columns=df[‘column_2’],values=df[‘dv’],aggfunc=’mean’)

Prepare the data

import numpy as np
import pandas as pd
# generate two arrays of city and store
city = np.repeat(['City1','City2'],5)
store = np.tile(['store1','store2'], 5)

# generate a list of sales
sales=[10,20,20,50,30,10,5,4,12,4]

# combine them into a dataframe
df=pd.DataFrame({'city':city, 'store':store,'sales':sales})
print(df)

Output:

    city   store  sales
0  City1  store1     10
1  City1  store2     20
2  City1  store1     20
3  City1  store2     50
4  City1  store1     30
5  City2  store2     10
6  City2  store1      5
7  City2  store2      4
8  City2  store1     12
9  City2  store2      4

Example 1 for Calculating Means Group by Two Columns in Pandas (Method 1)

Method 1 uses groupby() within Pandas to calcualte the mean. It groups the means by city and store.

# use groupby() and mean() to calculate means group by two columns 
output=df.groupby(["city","store"]).mean()

# print out the result
print(output)
              sales
city  store        
City1 store1   20.0
      store2   35.0
City2 store1    8.5
      store2    6.0

Example 2 for Calculating Means Group by Two Columns in Pandas (Method 2)

Method 2 uses groupby() as well, but it uses agg() and 'mean' to group means.

# use groupby() and agg('mean') to calculate means group by two columns 
output=df.groupby(["city","store"]).agg('mean')

# print out the result
print(output)
              sales
city  store        
City1 store1   20.0
      store2   35.0
City2 store1    8.5
      store2    6.0

Example 3 for Calculating Means Group by Two Columns in Pandas (Method 3)

Method 3 uses crosstab() to group means in Python. The following is the example code.

# use crosstab() and aggfunc to calculate means group by two columns 
result=pd.crosstab(index=df['city'], columns=df['store'],values=df['sales'],aggfunc='mean')

# print out the result
print(result)
store  store1  store2
city                 
City1    20.0    35.0
City2     8.5     6.0

Further Reading