15 Anova Essay

McGill University
Advanced Business Statistics
Read: Business Statistics (A Second Course),
2nd Custom Edition for McGill University
Chapter 11

ANOVA is a statistical test of significance for the equality of several (2 or more) population (treatment) means. It rests on three assumptions:
1. All populations are normally distributed.
2. All populations have the same variance:
$\sigma_1^2 = \sigma_2^2 = \sigma_3^2 = \dots = \sigma_k^2 = \sigma^2$.
3. Independent random samples are drawn from each population.

Some Definitions

A factor is a variable that can take one of several
levels used to differentiate one group from another.
Example: Which of four advertising offers
mailed to sample households produces the
highest sales?
Example: Will a lower price in a plain mailing
draw more sales on average than a higher price
in a fancy brochure?

The Idea of ANOVA: Testing Equality of Means Extended to More Than Two Groups
Analysis of Variance (ANOVA) is the technique used to determine whether two or more population means are equal.
One-way ANOVA is used for completely randomized, one-way designs; that is, observations are taken at random at the different levels of a single factor. Two-way ANOVA is used when there are two independent variables (factors) and multiple observations for each combination of their levels. Two-way ANOVA determines the main effect of each independent variable and also identifies whether there is a significant interaction effect between the independent variables.

The ANOVA setting: comparing means
We want to know whether the observed differences in sample means are likely to have occurred by chance, just because of random sampling, or whether the population means really differ.

This will likely depend on both the difference between the sample means and how much variation there is within each sample.

Recall: The two-sample t statistic

A two-sample t-test assuming equal variance and an ANOVA F-test comparing only two groups will give you exactly the same p-value (for a two-sided hypothesis).

ANOVA approach (one-way ANOVA):
H0: $\mu_1 = \mu_2$
Ha: $\mu_1 \neq \mu_2$

t-test approach (t-test assuming equal variance):
H0: $\mu_1 = \mu_2$
Ha: $\mu_1 \neq \mu_2$

$F = t^2$ and both p-values are the same.
But the t-test is more flexible: you may choose a one-sided alternative instead, or you may run a t-test assuming unequal variances if you are not sure that your two populations have the same standard deviation (in this case, use Satterthwaite's formula for the degrees of freedom).
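
The identity $F = t^2$ can be checked numerically. The following is a minimal sketch using only the Python standard library; the function names `pooled_t` and `anova_f` and the two small data sets are made up for illustration and are not from the course materials.

```python
from statistics import mean, variance

def pooled_t(a, b):
    """Two-sample t statistic assuming equal population variances."""
    na, nb = len(a), len(b)
    # Pooled variance: weighted average of the two sample variances.
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / (sp2 * (1 / na + 1 / nb)) ** 0.5

def anova_f(groups):
    """One-way ANOVA F statistic, computed as MSTR / MSE."""
    p = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    sstr = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    sse = sum((len(g) - 1) * variance(g) for g in groups)
    return (sstr / (p - 1)) / (sse / (n_total - p))

# Hypothetical data for two groups (illustration only).
group_a = [12.0, 15.0, 11.0, 14.0]
group_b = [18.0, 17.0, 20.0, 16.0, 19.0]

t_stat = pooled_t(group_a, group_b)
f_stat = anova_f([group_a, group_b])
print(t_stat ** 2, f_stat)  # the two values agree up to rounding error
```

Because the statistics coincide, the two-sided p-values coincide as well; the sketch stops at the test statistics because the standard library has no F or t distribution functions.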

The ANOVA data representation
The generic element

$Y_{ij}$ = the ith observation from the jth treatment (population).
$T_j$ = the total of the jth sample.
$n_j$ = the size of the jth sample.
$\bar{Y}_j = T_j / n_j$ = the mean of the jth sample.
$n_T = \sum_j n_j$ = the total number of observations.
$p$ = the number of treatments (populations).
$\bar{Y}$ = the overall (grand) mean of all data combined.
$\mu_j$ = the mean of the jth treatment.
$\sigma$ = the common standard deviation for all treatments.

Some Formulas

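$$T_j = \sum_{i=1}^{n_j} Y_{ij}$$

$$\bar{Y}_j = \frac{T_j}{n_j} = \frac{1}{n_j} \sum_{i=1}^{n_j} Y_{ij}$$

$$n_T = \sum_{j=1}^{p} n_j = n_1 + n_2 + \dots + n_p$$

$$\bar{Y} = \frac{1}{n_T} \sum_{j=1}^{p} \sum_{i=1}^{n_j} Y_{ij}$$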

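As a rough illustration of the summary quantities above, the sketch below computes the sample totals, sample sizes, sample means, total sample size, and grand mean for three small samples. The helper `summarize` and the data are hypothetical and chosen only so the arithmetic is easy to follow.

```python
def summarize(samples):
    """Return (T_j list, n_j list, sample means, n_T, grand mean)."""
    totals = [sum(s) for s in samples]               # T_j
    sizes = [len(s) for s in samples]                # n_j
    means = [t / n for t, n in zip(totals, sizes)]   # sample mean = T_j / n_j
    n_total = sum(sizes)                             # n_T
    grand_mean = sum(totals) / n_total               # mean of all data combined
    return totals, sizes, means, n_total, grand_mean

# Hypothetical data: p = 3 treatments with 3 observations each.
samples = [
    [4.0, 6.0, 5.0],
    [8.0, 9.0, 10.0],
    [6.0, 7.0, 5.0],
]

totals, sizes, means, n_total, grand_mean = summarize(samples)
print(totals, sizes, means, n_total, grand_mean)
```

Note that the grand mean is the total of all observations divided by $n_T$; it equals the simple average of the sample means only when all samples have the same size.
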
Similar to ANOVA for regression, the total variation comes from two sources, Treatment (Between) and Error (Within):
Treatment Variation = Between Group Variation
Error Variation = Within Group Variation
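
The split of total variation into a between-group part and a within-group part can be verified numerically. The sketch below assumes the usual sum-of-squares notation (SSTR for treatment, SSE for error, SSTO for total), which is not defined in the text above, and uses made-up data.

```python
from statistics import mean

def sums_of_squares(groups):
    """Return (SSTR, SSE, SSTO) for a one-way layout."""
    all_values = [y for g in groups for y in g]
    grand_mean = mean(all_values)
    # Between-group (treatment) variation: group means around the grand mean.
    sstr = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group (error) variation: observations around their own group mean.
    sse = sum((y - mean(g)) ** 2 for g in groups for y in g)
    # Total variation: observations around the grand mean.
    ssto = sum((y - grand_mean) ** 2 for y in all_values)
    return sstr, sse, ssto

# Hypothetical data: three treatments, three observations each.
groups = [
    [4.0, 6.0, 5.0],
    [8.0, 9.0, 10.0],
    [6.0, 7.0, 5.0],
]

sstr, sse, ssto = sums_of_squares(groups)
print(sstr, sse, ssto)  # SSTO equals SSTR + SSE up to rounding error
```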

ANOVA Test of Hypothesis
H0: $\mu_1 = \mu_2 = \mu_3 = \dots = \mu_p$ (all the population means are equal).
H1: Not all $\mu_j$ are equal (at least one of the means is different).
TS: $F^* = MSTR / MSE$
CV: $F_{\alpha;\, p-1,\, n_T-p}$
DR: Do not reject H0 if $F^* \le CV$; reject H0 if $F^* > CV$.
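
The test procedure above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the course's own software workflow: the function `one_way_f` and the data are hypothetical, and the critical value is typed in from a standard F table ($F_{0.05;\,2,\,6} = 5.14$) because the standard library has no F distribution.

```python
from statistics import mean

def one_way_f(groups):
    """Return (F*, numerator df, denominator df) for a one-way ANOVA."""
    p = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    sstr = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    sse = sum((y - mean(g)) ** 2 for g in groups for y in g)
    mstr = sstr / (p - 1)        # mean square for treatments
    mse = sse / (n_total - p)    # mean square for error
    return mstr / mse, p - 1, n_total - p

# Hypothetical data: p = 3 treatments, n_T = 9 observations.
groups = [
    [4.0, 6.0, 5.0],
    [8.0, 9.0, 10.0],
    [6.0, 7.0, 5.0],
]

f_star, df1, df2 = one_way_f(groups)

# Assumed table lookup: upper 5% point of F with (2, 6) degrees of freedom.
critical_value = 5.14

reject_h0 = f_star > critical_value
print(f_star, (df1, df2), "reject H0" if reject_h0 else "do not reject H0")
```

With these numbers the test statistic exceeds the critical value, so the decision rule rejects H0 at the 5% level.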

The ANOVA F-test
The ANOVA F-statistic compares variation due to specific sources (levels of the factor) to variation among individuals who should be similar (individuals in the same sample).

F = (variation among sample means) / (variation among individuals in the same sample) = MSTR...
