Advanced Business Statistics
Read: Business Statistics (A Second Course),
2nd Custom Edition for McGill University
ANOVA is a statistical test of significance for the equality of several (2 or more) population (treatment) means.
1. All populations are normally distributed.
2. All populations have the same variance:
$\sigma_1^2 = \sigma_2^2 = \sigma_3^2 = \dots = \sigma_k^2 = \sigma^2$.
3. Independent random samples from each population.
A factor is a variable that can take one of several
levels used to differentiate one group from another.
Example: Which of four advertising offers
mailed to sample households produces the
Example: Will a lower price in a plain mailing
draw more sales on average than a higher price
in a fancy brochure?
The Idea of ANOVA: Testing equality of means extended to more than two groups
Analysis of Variance (ANOVA) is the technique used to determine whether two or more population means are equal.
One-way ANOVA is used for completely randomized, one-way designs; that is, observations are taken at random at the different factor levels. Two-way ANOVA is used when there is more than one independent variable and there are multiple observations for each independent variable. Two-way ANOVA not only determines the main effect of each independent variable but also identifies whether there is a significant interaction effect between the independent variables.
The ANOVA setting: comparing means
We want to know if the observed differences in sample means are likely to have occurred by chance just because of the random sampling – or are the means really different.
This will likely depend on both the differences between the sample means and how much variation there is within the samples.
Recall: The two-sample t statistic
A two-sample t-test assuming equal variances and an ANOVA F-test comparing only two groups will give you exactly the same p-value (for a two-sided hypothesis).
t – test approach
t-test assuming equal variance
$F = t^2$ and both p-values are the same.
But the t-test is more flexible: you may choose a one-sided alternative instead, or you may want to run a t-test assuming unequal variances if you are not sure that your two populations have the same standard deviation (in this case use Satterthwaite’s formula for the degrees of freedom).
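The identity between the two tests can be checked numerically. The sketch below is only an illustration: the two samples are made-up numbers, and `pooled_t` and `anova_f` are hypothetical helper names, not functions from any library. It computes the equal-variance t statistic and the one-way ANOVA F statistic for the same two groups and compares F with the square of t.

```python
from statistics import mean, variance


def pooled_t(x, y):
    """Two-sample t statistic assuming equal population variances."""
    n1, n2 = len(x), len(y)
    # Pooled variance: weighted average of the two sample variances.
    sp2 = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    return (mean(x) - mean(y)) / (sp2 * (1 / n1 + 1 / n2)) ** 0.5


def anova_f(groups):
    """One-way ANOVA F statistic: MSTR / MSE."""
    p = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    sstr = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    sse = sum((obs - mean(g)) ** 2 for g in groups for obs in g)
    return (sstr / (p - 1)) / (sse / (n_total - p))


# Made-up data for two groups (assumption, for illustration only).
x = [12.0, 15.0, 11.0, 14.0]
y = [18.0, 17.0, 20.0, 16.0, 19.0]

t = pooled_t(x, y)
f = anova_f([x, y])
print(f"t^2 = {t ** 2:.4f}, F = {f:.4f}")
```

With two groups the numerator of F has 1 degree of freedom and the denominator has the same degrees of freedom as the pooled t-test, which is why the two-sided p-values coincide.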
The ANOVA data representation
The generic element
$Y_{ij}$ = the ith observation from the jth treatment (population).
$T_j$ = the total of the jth sample.
$n_j$ = the size of the jth sample.
$\bar{Y}_j = T_j / n_j$ = mean of the jth sample.
$n_T = \sum_j n_j$ = the total number of observations.
$p$ = the number of treatments (populations).
$\bar{Y}$ = the overall (grand) mean of all data combined.
$\mu_j$ = the mean of the jth treatment.
$\sigma$ = the common standard deviation for all treatments.
$$T_j = \sum_{i=1}^{n_j} Y_{ij}, \qquad n_T = \sum_{j=1}^{p} n_j = n_1 + n_2 + \dots + n_p$$
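As a small illustration of this notation, the sketch below computes the sample totals, sample sizes, sample means, total number of observations, and grand mean for a made-up data set with three treatments of unequal size. The data and variable names are assumptions chosen for the example.

```python
# Made-up data: three treatments (p = 3) with unequal sample sizes.
samples = [
    [4.0, 6.0, 8.0],          # treatment 1
    [7.0, 9.0, 11.0, 13.0],   # treatment 2
    [10.0, 14.0],             # treatment 3
]

p = len(samples)                                   # number of treatments
n = [len(s) for s in samples]                      # n_j: size of each sample
T = [sum(s) for s in samples]                      # T_j: total of each sample
ybar = [T_j / n_j for T_j, n_j in zip(T, n)]       # sample means T_j / n_j
n_T = sum(n)                                       # total number of observations
grand_mean = sum(T) / n_T                          # mean of all data combined

print("n_j:", n)
print("T_j:", T)
print("sample means:", ybar)
print("n_T:", n_T, "grand mean:", round(grand_mean, 4))
```

Note that with unequal sample sizes the grand mean is the total of all observations divided by the total count, which is generally not the simple average of the sample means.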
Similar to ANOVA for regression, the total variation comes from two sources, Treatment (Between) and Error (Within):
Treatment Variation = Between Group Variation
Error Variation = Within Group Variation
SSTR = SSB; SSE = SSW
MSTR = MSB; MSE = MSW
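For reference, the standard one-way ANOVA decomposition can be written out as follows. Here $Y_{ij}$ is the ith observation in the jth sample, $\bar{Y}_j$ is the jth sample mean, $\bar{Y}$ is the grand mean, $n_j$ is the jth sample size, $n_T$ is the total number of observations, and $p$ is the number of treatments; SSTO, for the total sum of squares, is a label assumed here.

```latex
\begin{align*}
\mathrm{SSTR} &= \sum_{j=1}^{p} n_j \left(\bar{Y}_j - \bar{Y}\right)^2
  && \text{(between groups)} \\
\mathrm{SSE}  &= \sum_{j=1}^{p} \sum_{i=1}^{n_j} \left(Y_{ij} - \bar{Y}_j\right)^2
  && \text{(within groups)} \\
\mathrm{SSTO} &= \sum_{j=1}^{p} \sum_{i=1}^{n_j} \left(Y_{ij} - \bar{Y}\right)^2
  = \mathrm{SSTR} + \mathrm{SSE} \\
\mathrm{MSTR} &= \frac{\mathrm{SSTR}}{p - 1}, \qquad
\mathrm{MSE} = \frac{\mathrm{SSE}}{n_T - p}
\end{align*}
```

The degrees of freedom add up in the same way as the sums of squares: $(p - 1) + (n_T - p) = n_T - 1$.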
ANOVA Test of Hypothesis
H0: $\mu_1 = \mu_2 = \mu_3 = \dots = \mu_p$ (All the population means are equal).
H1: Not all $\mu_j$ are equal (At least one of the means is different).
TS: $F^* = \mathrm{MSTR} / \mathrm{MSE}$
CV: $F_{\alpha;\, p-1;\, n_T-p}$
DR: Do not reject H0 if $F^* \le$ CV; reject H0 if $F^* >$ CV.
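The test procedure can be sketched in a few lines of code. This is only an illustration under stated assumptions: the data are made up, `one_way_anova_f` is a hypothetical helper name, the significance level is taken to be 0.05, and the critical value 5.14 is read from a standard F table for 2 and 6 degrees of freedom rather than computed.

```python
from statistics import mean


def one_way_anova_f(samples):
    """Return (F*, numerator df, denominator df) for a one-way design."""
    p = len(samples)                          # number of treatments
    n_T = sum(len(s) for s in samples)        # total number of observations
    grand_mean = sum(sum(s) for s in samples) / n_T
    # Between-group (treatment) and within-group (error) sums of squares.
    sstr = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in samples)
    sse = sum((obs - mean(s)) ** 2 for s in samples for obs in s)
    mstr = sstr / (p - 1)
    mse = sse / (n_T - p)
    return mstr / mse, p - 1, n_T - p


# Made-up data: p = 3 treatments, 3 observations each (assumption).
samples = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [6.0, 7.0, 8.0]]

f_star, df1, df2 = one_way_anova_f(samples)

# Assumed alpha = 0.05; F(0.05; 2; 6) = 5.14 from a standard F table.
critical_value = 5.14
reject_h0 = f_star > critical_value

print(f"F* = {f_star:.2f} with ({df1}, {df2}) degrees of freedom")
print("Reject H0" if reject_h0 else "Do not reject H0")
```

For other sample sizes or significance levels, the critical value has to be looked up again (or obtained from statistical software) with $p - 1$ and $n_T - p$ degrees of freedom.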
The ANOVA F-test
The ANOVA F-statistic compares variation due to specific sources (levels of the factor) to variation among individuals who should be similar (individuals in the same sample).
variation among sample means