How to Calculate Means by Group in R (2 Examples)

This tutorial is about how to calculate means by group in R. There are two methods of calculating means by groups in R, either aggregate() or dplyr.

Method 1: Use base R

aggregate(df$col_being_aggregated, list(df$col_1_groupby,df$col_2_groupby,...), FUN=mean)

Method 2: Use package of dplyr

library(dplyr) 
df%>% group_by(col_1_groupby,col_2_groupby,...) %>% summarise_at(vars(col_being_aggregated), list(name = mean))

Example for Method 1

The following is the data being used for the examples. It has two categorical variables (i.e., City and Brand) and one continuous variable (i.e., sales).

# download data from Github
df <- read.csv("https://raw.githubusercontent.com/TidyPython/interactions/main/city_brand_sales.csv")

# print out the dataframe
print(df)

Output:

    City  Brand sales
1  City1 brand1    70
2  City1 brand2    10
3  City1 brand1   100
4  City1 brand2     2
5  City1 brand1    30
6  City1 brand2     2
7  City1 brand1    20
8  City1 brand2    10
9  City1 brand1    20
10 City1 brand2    10
11 City2 brand1     9
12 City2 brand2    10
13 City2 brand1     5
14 City2 brand2     4
15 City2 brand1     4
16 City2 brand2     4
17 City2 brand1     5
18 City2 brand2     4
19 City2 brand1    12
20 City2 brand2    11

The following uses the aggregate() from base R to calculate the means. We group by two categorical variables, City and Brand.

# use aggregate() in base R to calculate means grouped by City and Brand
aggregate(df$sales, list(df$City,df$Brand), FUN=mean)

The following output shows the four means of all 4 cells.

  Group.1 Group.2    x
1   City1  brand1 48.0
2   City2  brand1  7.0
3   City1  brand2  6.8
4   City2  brand2  6.6

Example for Method 2

We are going to use the same data frame, namely df. Thus, I am not going to print it out again here.

The following uses dplyr to calculate means grouped by two categorical variables, City and Brand.

# use dplyr to calculate means grouped by City and Brand
library(dplyr)
df%>% group_by(City,Brand) %>% summarise_at(vars(sales), list(name = mean))

The following is the output. We can see that it prints out 4 means for all cells.

# A tibble: 4 x 3
# Groups:   City [2]
  City  Brand   name
  <chr> <chr>  <dbl>
1 City1 brand1  48  
2 City1 brand2   6.8
3 City2 brand1   7  
4 City2 brand2   6.6

Further Reading