The following provides 3 different methods of calculating means group by two Columns in Python.
Method 1:
df.groupby([“column_1″,”column_2”]).mean()
Method 2:
df.groupby([“column_1″,”column_2”]).agg(‘mean’)
Method 3:
pd.crosstab(index=df[‘column_1’], columns=df[‘column_2’],values=df[‘dv’],aggfunc=’mean’)
Prepare the data
import numpy as np
import pandas as pd
# generate two arrays of city and store
city = np.repeat(['City1','City2'],5)
store = np.tile(['store1','store2'], 5)
# generate a list of sales
sales=[10,20,20,50,30,10,5,4,12,4]
# combine them into a dataframe
df=pd.DataFrame({'city':city, 'store':store,'sales':sales})
print(df)
Output:
city store sales 0 City1 store1 10 1 City1 store2 20 2 City1 store1 20 3 City1 store2 50 4 City1 store1 30 5 City2 store2 10 6 City2 store1 5 7 City2 store2 4 8 City2 store1 12 9 City2 store2 4
Example 1 for Calculating Means Group by Two Columns in Pandas (Method 1)
Method 1 uses groupby() within Pandas to calcualte the mean. It groups the means by city and store.
# use groupby() and mean() to calculate means group by two columns
output=df.groupby(["city","store"]).mean()
# print out the result
print(output)
sales city store City1 store1 20.0 store2 35.0 City2 store1 8.5 store2 6.0
Example 2 for Calculating Means Group by Two Columns in Pandas (Method 2)
Method 2 uses groupby() as well, but it uses agg() and 'mean'
to group means.
# use groupby() and agg('mean') to calculate means group by two columns
output=df.groupby(["city","store"]).agg('mean')
# print out the result
print(output)
sales city store City1 store1 20.0 store2 35.0 City2 store1 8.5 store2 6.0
Example 3 for Calculating Means Group by Two Columns in Pandas (Method 3)
Method 3 uses crosstab() to group means in Python. The following is the example code.
# use crosstab() and aggfunc to calculate means group by two columns
result=pd.crosstab(index=df['city'], columns=df['store'],values=df['sales'],aggfunc='mean')
# print out the result
print(result)
store store1 store2 city City1 20.0 35.0 City2 8.5 6.0