One-way ANOVA compares the means of several groups to test whether the differences among them are statistically significant. For instance, suppose you would like to compare the average household size of three cities. You can draw one sample from each city and conduct a one-way ANOVA on the three samples to check whether the means differ.
Formulas of One-way ANOVA
The full name of ANOVA is Analysis of Variance. Thus, ANOVA is about partitioning the total variation in the data into different parts. Sum of Squares Total (SST) is the total variation of all the observations around the overall mean. SST can be separated into Sum of Squares Between (SSB) and Sum of Squares Error (SSE).
\[SST=SSB+SSE\]
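The formulas for SSB and SSE are as follows, where \(k\) is the number of groups, \(n_i\) is the number of observations in group \(i\), \(x_{ij}\) is the \(j\)-th observation in group \(i\), \(\bar{x_i}\) is the mean of group \(i\), and \(\bar{x}\) is the overall mean.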
\[SSB=\sum_{i=1}^kn_i(\bar{x_i}-\bar{x})^2\]
\[SSE=\sum_{i=1}^{k}\sum_{j=1}^{n_i}(x_{ij}-\bar{x_i})^2\]
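For reference, SST can also be written out explicitly with the same symbols. This is the standard definition rather than a formula given above, and expanding it shows where the partition comes from:
\[SST=\sum_{i=1}^{k}\sum_{j=1}^{n_i}(x_{ij}-\bar{x})^2\]
\[\sum_{i=1}^{k}\sum_{j=1}^{n_i}(x_{ij}-\bar{x})^2=\sum_{i=1}^{k}\sum_{j=1}^{n_i}(x_{ij}-\bar{x_i})^2+\sum_{i=1}^kn_i(\bar{x_i}-\bar{x})^2+2\sum_{i=1}^{k}(\bar{x_i}-\bar{x})\sum_{j=1}^{n_i}(x_{ij}-\bar{x_i})\]
The last term is zero because the deviations from a group's own mean sum to zero within each group, which leaves SSE plus SSB.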
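We also need to consider the degrees of freedom. Dividing each sum of squares by its degrees of freedom gives the mean squares, namely Mean Square Between (MSB) and Mean Square Error (MSE). Here \(n\) is the total number of observations across all groups.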
\[MSB=\frac{SSB}{k-1}\]
\[MSE=\frac{SSE}{n-k}\]
Finally, the F value is the ratio of MSB to MSE.
\[F(k-1,n-k)=\frac{MSB}{MSE}=\frac{\frac{SSB}{k-1}}{\frac{SSE}{n-k}}=\frac{\frac{\sum_{i=1}^kn_i(\bar{x_i}-\bar{x})^2}{k-1}}{\frac{\sum_{i=1}^{k}\sum_{j=1}^{n_i}(x_{ij}-\bar{x_i})^2}{n-k}}\]
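The formulas above translate almost line by line into code. The following is a minimal sketch in plain Python, not a reference implementation: the names `AnovaResult` and `one_way_anova` are made up for illustration, and the `toy` data are invented numbers chosen only so the arithmetic is easy to follow.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class AnovaResult:
    ssb: float        # Sum of Squares Between
    sse: float        # Sum of Squares Error
    df_between: int   # k - 1
    df_error: int     # n - k
    msb: float        # Mean Square Between
    mse: float        # Mean Square Error
    f_value: float    # MSB / MSE


def one_way_anova(groups: Sequence[Sequence[float]]) -> AnovaResult:
    """Compute the one-way ANOVA quantities for a list of groups."""
    k = len(groups)                        # number of groups
    n = sum(len(g) for g in groups)        # total number of observations
    overall_mean = sum(sum(g) for g in groups) / n
    group_means = [sum(g) / len(g) for g in groups]

    # SSB: squared distance of each group mean from the overall mean,
    # weighted by the group size n_i.
    ssb = sum(len(g) * (m - overall_mean) ** 2
              for g, m in zip(groups, group_means))
    # SSE: squared distance of each observation from its own group mean.
    sse = sum((x - m) ** 2
              for g, m in zip(groups, group_means)
              for x in g)

    msb = ssb / (k - 1)
    mse = sse / (n - k)
    return AnovaResult(ssb, sse, k - 1, n - k, msb, mse, msb / mse)


# Invented illustrative data: three groups of three observations each.
toy = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
result = one_way_anova(toy)
print(result)
```

The sketch assumes at least two groups and more observations than groups, so that neither degrees-of-freedom term is zero.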
Manual Calculation Example
Suppose we would like to see whether 3 cities differ in terms of household size. We sample 5 households from each city. The null hypothesis and alternative hypothesis for one-way ANOVA are as follows.
\[H_0: \mu_{city1}=\mu_{city2}=\mu_{city3}\]
\[H_1: \mu_{city1},\mu_{city2},\mu_{city3} \text{ are not all equal.}\]
Group | Household Size | Group Mean | Overall Mean |
---|---|---|---|
City 1 | 6 | 4 | 3.4 |
City 1 | 2 | 4 | 3.4 |
City 1 | 3 | 4 | 3.4 |
City 1 | 4 | 4 | 3.4 |
City 1 | 5 | 4 | 3.4 |
City 2 | 2 | 3 | 3.4 |
City 2 | 1 | 3 | 3.4 |
City 2 | 3 | 3 | 3.4 |
City 2 | 4 | 3 | 3.4 |
City 2 | 5 | 3 | 3.4 |
City 3 | 4 | 3.2 | 3.4 |
City 3 | 1 | 3.2 | 3.4 |
City 3 | 2 | 3.2 | 3.4 |
City 3 | 4 | 3.2 | 3.4 |
City 3 | 5 | 3.2 | 3.4 |
\[SSB=\sum_{i=1}^kn_i(\bar{x_i}-\bar{x})^2=5 \times(4-3.4)^2+5\times(3-3.4)^2+5 \times (3.2-3.4)^2=2.8\]
\[\begin{aligned}
SSE=\sum_{i=1}^{k}\sum_{j=1}^{n_i}(x_{ij}-\bar{x_i})^2= & (6-4)^2+(2-4)^2+(3-4)^2+(4-4)^2+(5-4)^2+\\
&(2-3)^2+(1-3)^2+(3-3)^2+(4-3)^2+(5-3)^2+\\
&(4-3.2)^2+(1-3.2)^2+(2-3.2)^2+(4-3.2)^2+(5-3.2)^2 \\
= &30.8
\end{aligned}\]
\[MSB=\frac{SSB}{k-1}=\frac{2.8}{3-1}=1.4\]
\[MSE=\frac{SSE}{n-k}=\frac{30.8}{15-3}\approx 2.57\]
Finally, we can calculate the F value as the ratio of MSB to MSE.
\[F(k-1,n-k)=F(2,12)=\frac{MSB}{MSE}=\frac{1.4}{2.57}\approx 0.55\]
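The hand calculation can be double-checked with a few lines of plain Python. This is only a sketch that repeats the arithmetic on the household sizes from the table; the variable names are illustrative.

```python
# Household sizes copied from the table above.
cities = {
    "City 1": [6, 2, 3, 4, 5],
    "City 2": [2, 1, 3, 4, 5],
    "City 3": [4, 1, 2, 4, 5],
}

k = len(cities)                                   # number of groups
n = sum(len(sizes) for sizes in cities.values())  # total observations
overall_mean = sum(sum(sizes) for sizes in cities.values()) / n
group_means = {city: sum(sizes) / len(sizes) for city, sizes in cities.items()}

# Between-group and within-group sums of squares.
ssb = sum(len(sizes) * (group_means[city] - overall_mean) ** 2
          for city, sizes in cities.items())
sse = sum((x - group_means[city]) ** 2
          for city, sizes in cities.items()
          for x in sizes)

msb = ssb / (k - 1)
mse = sse / (n - k)
f_value = msb / mse

print(f"overall mean = {overall_mean:.2f}")
print(f"SSB = {ssb:.2f}, SSE = {sse:.2f}")
print(f"MSB = {msb:.2f}, MSE = {mse:.2f}")
print(f"F({k - 1},{n - k}) = {f_value:.2f}")
```

Rounded to two decimals, the printed values agree with the manual results. The code keeps MSE unrounded (about 2.5667) when forming the ratio, which is why F rounds to 0.55.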
We can then look up the critical value of F(2,12) in an F table. At the 0.05 significance level, it is 3.89. Since the calculated F(2,12) = 0.55 is smaller than 3.89, we fail to reject the null hypothesis. Thus, we conclude that we do not have evidence that the mean household size differs among these three cities.
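The table lookup can also be reproduced numerically. The sketch below relies on a special case: when the numerator degrees of freedom equal 2, as in this example, the upper-tail probability of the F distribution has the closed form \((1+2f/d_2)^{-d_2/2}\), where \(d_2\) is the denominator degrees of freedom. The function names are hypothetical, the 0.05 significance level is an assumption, and for other numerator degrees of freedom a statistics library would be needed instead.

```python
def f_upper_tail_df1_2(f_value: float, df2: int) -> float:
    """P(F > f_value) for an F distribution with 2 and df2 degrees of freedom.

    Valid ONLY when the numerator degrees of freedom equal 2.
    """
    return (1 + 2 * f_value / df2) ** (-df2 / 2)


def f_critical_df1_2(alpha: float, df2: int) -> float:
    """Critical value c with P(F > c) = alpha, again only for 2 numerator df.

    Obtained by solving the closed form above for f_value.
    """
    return (df2 / 2) * (alpha ** (-2 / df2) - 1)


alpha = 0.05               # assumed significance level
df_error = 12              # n - k = 15 - 3
f_value = 1.4 / (30.8 / 12)  # MSB / MSE from the manual calculation

critical = f_critical_df1_2(alpha, df_error)
p_value = f_upper_tail_df1_2(f_value, df_error)
reject_null = f_value > critical

print(f"critical value = {critical:.2f}")
print(f"p-value = {p_value:.3f}")
print("reject H0" if reject_null else "fail to reject H0")
```

The p-value is an equivalent way to state the same decision: the null hypothesis is rejected only when the p-value falls below the chosen significance level.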