This tutorial is about how to calculate means by group in R. There are two methods of calculating means by groups in R, either aggregate()
or dplyr
.
Method 1: Use base R
aggregate(df$col_being_aggregated, list(df$col_1_groupby,df$col_2_groupby,...), FUN=mean)
Method 2: Use package of dplyr
library(dplyr) df%>% group_by(col_1_groupby,col_2_groupby,...) %>% summarise_at(vars(col_being_aggregated), list(name = mean))
Example for Method 1
The following is the data being used for the examples. It has two categorical variables (i.e., City and Brand) and one continuous variable (i.e., sales).
# download data from Github
df <- read.csv("https://raw.githubusercontent.com/TidyPython/interactions/main/city_brand_sales.csv")
# print out the dataframe
print(df)
Output:
City Brand sales 1 City1 brand1 70 2 City1 brand2 10 3 City1 brand1 100 4 City1 brand2 2 5 City1 brand1 30 6 City1 brand2 2 7 City1 brand1 20 8 City1 brand2 10 9 City1 brand1 20 10 City1 brand2 10 11 City2 brand1 9 12 City2 brand2 10 13 City2 brand1 5 14 City2 brand2 4 15 City2 brand1 4 16 City2 brand2 4 17 City2 brand1 5 18 City2 brand2 4 19 City2 brand1 12 20 City2 brand2 11
The following uses the aggregate()
from base R to calculate the means. We group by two categorical variables, City and Brand.
# use aggregate() in base R to calculate means grouped by City and Brand
aggregate(df$sales, list(df$City,df$Brand), FUN=mean)
The following output shows the four means of all 4 cells.
Group.1 Group.2 x 1 City1 brand1 48.0 2 City2 brand1 7.0 3 City1 brand2 6.8 4 City2 brand2 6.6
Example for Method 2
We are going to use the same data frame, namely df
. Thus, I am not going to print it out again here.
The following uses dplyr
to calculate means grouped by two categorical variables, City and Brand.
# use dplyr to calculate means grouped by City and Brand
library(dplyr)
df%>% group_by(City,Brand) %>% summarise_at(vars(sales), list(name = mean))
The following is the output. We can see that it prints out 4 means for all cells.
# A tibble: 4 x 3 # Groups: City [2] City Brand name <chr> <chr> <dbl> 1 City1 brand1 48 2 City1 brand2 6.8 3 City2 brand1 7 4 City2 brand2 6.6