Foundations of ANOVA – Assumptions and Hypotheses for One-Way ANOVA (12-3)

Channel: Research By Design

Let's look into the assumptions, the hypotheses, and how to determine the critical value for a one-way ANOVA test.

Here are the requirements for a one-way ANOVA; that is, an ANOVA conducted with a single factor, typically with three or more levels. The one-way ANOVA is a parametric procedure: we are using sampled data to estimate population parameters. Any time we use sample data to estimate the parameters of a population, we are using parametric procedures. A one-way ANOVA can compare three or more groups.
Whenever your independent variable has three or more levels, use ANOVA, not a t-test. Although ANOVA is designed for three or more groups, it can be conducted with only two groups (i.e., when you would typically use an independent-samples t-test). In fact, you can convert between t and F, where t equals the square root of F. Just remember that a t-test can be either positive or negative, unlike the F test, which is always positive because it is calculated with squared values. The sign of a t-value only tells us the direction of the difference: whether the first group or the second group has the higher mean. If you were to switch the groups, the sign would also switch. So if you do an ANOVA with just two groups and you want to calculate t, just be aware that the square root of F will always be positive.

The groups of the independent variable must be independent. "Independent" means that the scores at one level do not influence the probability of scores at another level; the groups are not influencing one another. The samples should not be related. If the samples are related, then you would use a repeated-measures ANOVA.

So here are the assumptions that must be satisfied for us to use the one-way ANOVA.
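Before turning to the assumptions, the relationship between t and F described above can be checked numerically. A minimal sketch in plain Python, using two made-up groups:

```python
import math
from statistics import mean

# Two hypothetical groups (made-up example data).
a = [1.0, 2.0, 3.0]
b = [2.0, 4.0, 6.0]

# --- One-way ANOVA with k = 2 groups ---
grand = mean(a + b)
ss_between = len(a) * (mean(a) - grand) ** 2 + len(b) * (mean(b) - grand) ** 2
ss_within = sum((x - mean(a)) ** 2 for x in a) + sum((x - mean(b)) ** 2 for x in b)
df_between, df_within = 2 - 1, len(a) + len(b) - 2
F = (ss_between / df_between) / (ss_within / df_within)

# --- Independent-samples t-test with pooled variance ---
sp2 = ss_within / df_within  # pooled variance estimate
t = (mean(a) - mean(b)) / math.sqrt(sp2 * (1 / len(a) + 1 / len(b)))

print(F)           # F is always positive
print(t)           # t is negative here, because mean(a) < mean(b)
print(t ** 2)      # equals F; sqrt(F) recovers |t| but loses the sign
```

Squaring t gives F exactly, which is why the square root of F can only recover the magnitude of t, never its sign.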
The independent variable (the IV) is categorical or nominal, with three or more groups. The samples should be randomly selected. The samples can be randomly assigned, as in an experimental design, but you could also use naturally occurring groups, in what is called a "quasi-experimental design." The sample sizes for each group should be roughly equal; unequal sample sizes increase the likelihood of Type I errors.
What do I mean by "roughly equal"? As a rule, the largest n divided by the smallest n of your groups should not exceed 1.5. The dependent variable must be quantitative, at the scale level; that is, interval or ratio. Now, some people will tell you that if you use Likert survey scales (like 1 = "strongly disagree" up to 5 = "strongly agree"), those data are ordinal and you have to use non-parametric statistics. In fact, John Dawes (2008) demonstrated that when Likert survey scales have five or more response options, they function like scale data, and they can be used with parametric statistics, like this one-way ANOVA. Check for outliers during your data cleaning, using the Explore command. You should delete or Winsorize severe outliers. Scores in one group should not influence the probability of scores in another group; this is called "independence." Each participant is in exactly one group: not in none, and not in both.
Normality: the scores on the dependent variable within each group should be approximately normally distributed. The Shapiro-Wilk test, a Q-Q plot, a P-P plot, or a box plot can all be used to check for normality. See my video about the Central Limit Theorem for a fuller discussion of the implications of normality. If the dependent variable is not normally distributed, or if the sample sizes are not roughly equal, use the nonparametric Kruskal-Wallis one-way ANOVA test. However, the parametric one-way ANOVA that we are learning about now is "robust" as long as there is a minimum of 30 subjects in each sample.
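A minimal sketch of these assumption checks in Python, assuming SciPy is available. The three groups here are made-up data, and the simple p > .05 decision rule is an illustration, not a complete diagnostic workflow:

```python
from scipy import stats

# Three hypothetical groups (made-up scores).
g1 = [23, 25, 21, 27, 24, 26, 22, 25]
g2 = [30, 28, 33, 29, 31, 27, 32, 30]
g3 = [26, 24, 28, 25, 27, 23, 29, 26]

# Shapiro-Wilk: null hypothesis = the scores are normally distributed.
normal = all(stats.shapiro(g).pvalue > .05 for g in (g1, g2, g3))

# Levene's test: null hypothesis = the group variances are equal.
equal_var = stats.levene(g1, g2, g3).pvalue > .05

if normal:
    stat, p = stats.f_oneway(g1, g2, g3)   # parametric one-way ANOVA
else:
    stat, p = stats.kruskal(g1, g2, g3)    # nonparametric fallback

print(normal, equal_var, stat, p)
```

Note that SciPy's `f_oneway` is the ordinary (pooled-variance) ANOVA; Welch's ANOVA, mentioned below as the fallback when homogeneity of variance fails, would come from the statistics package you run the analysis in (e.g., SPSS).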
"Robust" means that the Type I error rate does not increase if the assumptions are violated. Also, the groups should have homogeneity of variance, which you will test for using Levene's test for equality of variances, just as we did with the t-test. If the dependent variable groups fail this assumption, rerun the ANOVA and report Welch's ANOVA instead. Here are the hypotheses for a
one-way ANOVA. Like a t-test, the null hypothesis for ANOVA is that all samples are drawn from populations whose means are equal; therefore, all of the sample means are the same. The null hypothesis might be written as H0: mu1 = mu2 = mu3. The alternative hypothesis is that the means of the three groups differ. And remember, we do not know which means differ, so the alternative hypothesis might be written as H1: mu1 ≠ mu2 ≠ mu3. Unless instructed otherwise, you should use an alpha of 0.05 as your level of significance for the omnibus ANOVA test. The critical
value will be determined based upon your degrees of freedom and the alpha level, both of which are needed to use the ANOVA F table. I'll explain this in detail next. You can safely assume alpha = .05, two-tailed. On the other hand, you might protest that ANOVA is always a one-tailed test because the F distribution has only one tail. True, but the ANOVA is an omnibus test, and the post-hoc tests are the ones that really matter. We could use the post-hoc tests to hypothesize directional group differences, but instead we're just going to assume that all groups are equal and let the post-hoc tests tell us which groups differ, and in which direction. If we wanted to do directional post-hoc tests, we would actually be better off using a different technique called planned contrasts; more about that in another video.

Finding the critical value for a
one-way ANOVA requires knowing the degrees of freedom for the numerator and the denominator of the F ratio. These are called the degrees of freedom between, the degrees of freedom within, and the degrees of freedom total. The degrees of freedom between equals k - 1, where k is the number of categories or levels; this is the degrees of freedom for the numerator. So in our example, where we have 20 people who are randomly assigned to four groups, k = 4, so the degrees of freedom between would be four categories of diet minus one, or 3. The degrees of freedom within equals N - k, where N is the total number of participants and k is the total number of categories; this is the degrees of freedom for the denominator. The degrees of freedom within would be the total number of participants, 20, minus the total number of categories, 4, or 16. The degrees of freedom total equals N - 1, as it always does, with N being the total number of participants in all groups. There are 20 participants, so the degrees of freedom total would be 19.
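These degrees-of-freedom formulas can be sketched directly in Python, using the example's numbers (20 participants, 4 diet groups):

```python
n = 20  # total number of participants across all groups
k = 4   # number of groups (levels of the diet factor)

df_between = k - 1  # numerator degrees of freedom
df_within = n - k   # denominator degrees of freedom
df_total = n - 1    # always n - 1

print(df_between, df_within, df_total)  # 3 16 19
```

A useful self-check: the between and within degrees of freedom always add up to the total.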
Finding the critical value requires turning to the ANOVA F table. The degrees of freedom between are contained in the columns at the top of the table; the values for the degrees of freedom within are in the rows. The intersection of the column and the row is the critical value. So for 3 degrees of freedom between and 16 degrees of freedom within, the critical value is 3.24, and that is written CV(3, 16) = 3.24.
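Instead of a printed F table, the same critical value can be computed, assuming SciPy is available; `f.ppf` is the inverse CDF (the quantile function) of the F distribution:

```python
from scipy.stats import f

alpha = 0.05
df_between, df_within = 3, 16  # numerator and denominator df from the example

# The critical value cuts off the upper alpha tail of F(3, 16).
cv = f.ppf(1 - alpha, df_between, df_within)
print(round(cv, 2))  # 3.24, matching the table value CV(3, 16) = 3.24
```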
Let's revisit that assumption of homogeneity of variance. The one-way ANOVA test assumes that there is equal variance of scores across groups, i.e., homogeneity of variance. Homogeneity means "of the same nature," or the same kind of variability. Levene's test for equality of variances is a test of whether the variances of the samples, or groups, are approximately equal, or homogeneous. This is the same test that we learned about with t-tests; we will ask for it when we run the ANOVA on the computer. However, as long as your sample sizes are greater than 30 and the groups have essentially the same number of people in each one (in other words, n1 = n2 = n3), then ANOVA is robust to violations of homogeneity of variance. Robust means that the Type I error rate does not increase if the assumptions are violated. Another unique
feature of the ANOVA test is something called the ANOVA summary table. Because there is so much information to know about in the ANOVA model and the sources of variance, it is much clearer to put all of this information into a table. The ANOVA summary table tells us seven things. Number one, the source of variability (between, within, total); that goes in column 1. Second is the degrees of freedom, also between, within, and total; degrees of freedom go in column 2. Column 3 is the sum of squares associated with each: between, within, and total. Column 4 is the variance, also called the mean square; so for "mean square," think "average of the squares." The mean square is the sum of squares divided by the degrees of freedom for each row; you only need the mean square for the first two rows, between and within. Fifth is the F ratio. The F ratio is the mean square between divided by the mean square within. The F ratio tells us that there is 17.619 times more variability between groups than there is within groups, p < .001. As you can see, if you have the degrees of freedom and the sums of squares, you can calculate the rest of the table from these values. Sixth is a column for probability values, and seventh is a column for eta squared, the effect size. And just so that you know, those numbers in blue, one through seven, above each of the columns were added for ease of identifying the columns; you should not include those numbers in your actual ANOVA summary table. When you use SPSS to conduct an ANOVA, the results will be presented to you in an ANOVA table like this. You would include the ANOVA table in your write-up for your research paper.
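The arithmetic behind the summary table can be sketched in plain Python. The three groups below are made-up data (not the video's diet example); the layout mirrors the table's columns (source, df, SS, MS, F), with eta squared computed alongside:

```python
from statistics import mean

# Hypothetical groups (made-up data).
groups = [
    [4.0, 5.0, 6.0],
    [7.0, 8.0, 9.0],
    [1.0, 2.0, 3.0],
]

scores = [x for g in groups for x in g]
grand = mean(scores)
n, k = len(scores), len(groups)

# Sums of squares: between-groups, within-groups, and their total.
ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
ss_total = ss_between + ss_within

# Degrees of freedom, and mean squares (needed only for between and within).
df_between, df_within, df_total = k - 1, n - k, n - 1
ms_between = ss_between / df_between
ms_within = ss_within / df_within

f_ratio = ms_between / ms_within      # F: ratio of between to within variability
eta_squared = ss_between / ss_total   # effect size: share of total variability

print(f"{'Source':<8}{'df':>4}{'SS':>8}{'MS':>8}{'F':>8}")
print(f"{'Between':<8}{df_between:>4}{ss_between:>8.2f}{ms_between:>8.2f}{f_ratio:>8.2f}")
print(f"{'Within':<8}{df_within:>4}{ss_within:>8.2f}{ms_within:>8.2f}")
print(f"{'Total':<8}{df_total:>4}{ss_total:>8.2f}")
```

Notice that, as the transcript says, the degrees of freedom and the sums of squares are enough to fill in the rest of the table: the mean squares, the F ratio, and eta squared all follow from them.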
I have also created a table titled "Elements of the One-Way ANOVA Summary Table" as a reference for learning about the ANOVA summary table. It is a useful guide for what information goes in each part of the table. For example, under degrees of freedom, on the row "between," you see k - 1; that is the formula for degrees of freedom between. This table is not meant to be explained in a video, but rather to serve as a reference as you are constructing your own ANOVA summary table.