馃攳
ANOVA Part III: F Statistic and P Value | Statistics Tutorial #27 | MarinStatsLectures - YouTube
Channel: MarinStatsLectures-R Programming & Statistics
[0]
Let's build up the test statistic for
one-way analysis of variance. Recall that
[5]
we were working with this example
comparing the weight loss on one of four
[9]
diets: A, B, C, or D. We can see the
observations here, as well as the summary
[15]
statistics: the mean weight loss and
standard deviation of weight loss for
[18]
each of the four diets. We're working with
the null hypothesis that all means are
[23]
equal, and the alternative that at least one
differs from the rest. We previously
[28]
talked about how we can take the total
variability in weight loss, or the total
[32]
sum of squares, and separate it into two
parts: that which is explained by the
[36]
diet and that which is not explained by
the diet. So let's look at how we can use
[42]
that to build up the test statistic.
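The decomposition just described can be sketched numerically. This is a minimal illustration with made-up numbers, not the diet data from the video, showing that the total sum of squares equals the between-groups part plus the within-groups part:

```python
# Illustrative made-up data: two small groups (not the video's diet data)
groups = [[2.0, 4.0, 6.0], [5.0, 7.0, 9.0]]

all_obs = [y for g in groups for y in g]
grand_mean = sum(all_obs) / len(all_obs)

# Total sum of squares: every observation vs the grand mean
ss_total = sum((y - grand_mean) ** 2 for y in all_obs)

# Between-groups SS: group means vs the grand mean, weighted by group size
ss_between = sum(len(g) * ((sum(g) / len(g)) - grand_mean) ** 2 for g in groups)

# Within-groups SS: each observation vs its own group mean
ss_within = sum(sum((y - sum(g) / len(g)) ** 2 for y in g) for g in groups)

print(ss_total, ss_between + ss_within)  # the two totals match
```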
First, just a quick note on notation, and
[45]
again we want to focus on the concepts,
not on plugging into formulas, but this
[51]
helps us understand the formulas
and what's written in the notation. i is
[57]
used to index the group: group one, two,
three, or four; k signifies the
[65]
number of groups; j represents
the observation number within a group.
[71]
Yij tells us the individual observation
in group i, observation number j. So, for
[78]
example, Y1,3 is the observed
value for group 1, person number 3.
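As a quick sketch of this indexing, here are hypothetical observations stored as a nested list (Python is 0-indexed, so the video's 1-indexed Y1,3 becomes `y[0][2]`):

```python
# Hypothetical observations y[i][j]: row = group i, column = observation j
y = [[1.2, 0.8, 2.1],   # group 1
     [2.5, 3.0, 2.0],   # group 2
     [0.5, 1.0, 1.5]]   # group 3

# Y1,3 in the video's 1-indexed notation is y[0][2] here
print(y[0][2])  # 2.1

group_means = [sum(g) / len(g) for g in y]                     # Yi-bar for each group
grand_mean = sum(sum(g) for g in y) / sum(len(g) for g in y)   # Y-bar, the grand mean
```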
[86]
Yi-bar is the mean for group i; Y-bar with
no subscript is the overall or grand
[94]
mean: the mean weight loss for everyone
in the study. Si is the standard
[99]
deviation for people in group i, and
ni is the sample size for people in
[104]
group i. So, we saw that we can take the
total variability and separate it into two
[112]
parts. That which is explained by diet
we signified as the variance
[119]
between diets, or sometimes called the
mean square between, and that was the sum
[126]
of squares between divided by its
degrees of freedom,
[133]
the degrees of freedom between groups. Looking at the formula, and again we don't want to
[138]
get stuck on this, but it helps us see
the concepts we just learned from a
[141]
slightly different angle: you can think
of this as summing over all the
[146]
groups, so group one, two, three, four, taking
the sample size in each group times
[152]
how far the group-specific mean is from
the overall mean, squared, all divided by the
[158]
degrees of freedom. If we work that out
for this example, we'd find that the sum of
[165]
squares between groups, or the explained
sum of squares, is 97.3, with
[169]
degrees of freedom 3 (right,
four groups minus one), and that's going
[174]
to come out to 32.4. We also saw we can think of the unexplained,
[179]
and again, this is the variability that's
not explained by diet, or not explained
[185]
by X; this is the variability going on
within a group, or the mean square within.
[193]
Again, this is the sum of squares
within groups divided by the degrees of
[200]
freedom within. Formulaically, we can
think of it as summing over all observations
[211]
how far each individual is from their
group-specific mean, squared, divided by
[219]
its degrees of freedom, n minus k.
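A minimal sketch of this within-group mean square, again with made-up numbers rather than the video's diet data. It computes the quantity both from the per-observation deviations and from the per-group sample variances, and the two agree:

```python
# Made-up example groups (not the video's diet data)
groups = [[3.0, 5.0, 4.0], [6.0, 8.0, 7.0, 7.0]]

n = sum(len(g) for g in groups)   # total number of observations
k = len(groups)                   # number of groups

# Sum of squares within: each observation vs its own group mean
ss_within = sum(sum((y - sum(g) / len(g)) ** 2 for y in g) for g in groups)
ms_within = ss_within / (n - k)   # mean square within, df = n - k

# Equivalent pooled-variance form: sum of (ni - 1) * si^2, over df = n - k
def sample_var(g):
    m = sum(g) / len(g)
    return sum((y - m) ** 2 for y in g) / (len(g) - 1)

pooled = sum((len(g) - 1) * sample_var(g) for g in groups) / (n - k)

print(ms_within, pooled)  # identical values
```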
Right, again we have n observations, and
[223]
we lose k degrees of freedom by
estimating the k group means. We can also
[228]
express this as summing over the groups
[235]
each group's sample size minus one, times the
[242]
sample variance of each group, divided by
the degrees of freedom. The reason
[247]
why I write it this way is that you can take
a moment yourself to note that this
[252]
is the exact formula for the pooled
variance that we talked about in the
[256]
two-sample t-test assuming equal variance in
the two groups: we're taking the sample
[263]
variance of each group, weighted by its
degrees of freedom. Okay, so you can take
[268]
a moment to work your way
through and convince yourself that this
[273]
within-group variance is the exact same
as the pooled variance in the two-sample
[278]
t-test assuming equal variance. If you
work this out for our example, you're going
[283]
to find the sum of squares within
groups is 297, its degrees of freedom 56,
[289]
and this comes out to be 5.3. So, as noted,
we want to compare these two to each
[295]
other: the mean square between groups to
the mean square within groups,
[300]
the average sum of squares that can
be explained by diet to the average sum of
[304]
squares that cannot be explained by diet.
So let's try and think our way through
[308]
this. First, suppose the
alternative hypothesis is true: at
[317]
least one mean differs; not all the
means are the same. How would we expect,
[326]
statistically, the mean
square between groups to compare to the
[333]
mean square within groups? If diets are
different, we'd expect the first to be
[338]
larger than the second. Okay, there should be
much more variability that's explained
[342]
by diet than not explained by diet. If we
take the ratio of these, this is going
[349]
to be what we call our F statistic,
or our test statistic:
[353]
it's the mean square between groups over
the mean square within groups. If we
[359]
expect the top to be larger than the
bottom, we expect this test statistic to
[363]
be larger than 1. If,
on the other hand, our null hypothesis is
[370]
true, and all the means are equal at the
population level, what would we expect to
[378]
see? We'd expect the mean square between, okay, the variability that's explained by
[385]
diet, to be roughly the same as the mean
square within, or the variability that's
[391]
not explained by diet. When looking at the
F statistic, you're taking the ratio of
[399]
these two, and we'd expect that to come out
to be roughly 1. If we do this for our
[408]
set of data, our F statistic is 32.4 over
the 5.3, and that's going to come out to be
[418]
6.1. Okay, so the larger our F statistic
gets, the more evidence we have that the
[425]
alternative is likely true, or the null
is false. Well, we don't want to get too
[430]
caught up on looking things up in tables, but
it's important to note that this F
[436]
statistic follows what's called an F
distribution. It has degrees of freedom
[445]
for the numerator and degrees of freedom
for the denominator. The degrees of
[451]
freedom for the numerator are k
minus 1, right, those are the degrees of
[457]
freedom of what's in the numerator, and
the degrees of freedom for the
[461]
denominator are n minus k. Okay, so again, a
piece of software can do all these
[469]
calculations for you; we don't want to
focus on an F table and looking up an
[473]
exact p-value from that table.
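As one way software can replace the F table, here is a sketch using SciPy (assuming it is available) with the values from this example, F = 6.1 on 3 and 56 degrees of freedom:

```python
from scipy.stats import f

F = 6.1            # observed F statistic
df1, df2 = 3, 56   # k - 1 = 3 (numerator), n - k = 56 (denominator)

p_value = f.sf(F, df1, df2)  # upper-tail probability P(F >= 6.1) under the null
print(round(p_value, 4))     # roughly 0.001
```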
So let's just jump to the interpretation.
[478]
If we were going to work this out,
looking at a table or using a piece of
[482]
software, the p-value is going to tell us,
like it always does, what's the
[487]
probability of our observed test
statistic, or one even more extreme, if the
[492]
null is true. So what's the probability of getting an
[498]
F stat greater
than or equal to 6.1? If our null is
[508]
true, the F statistic should
be roughly equal to 1; we'd expect our test statistic to be
[513]
roughly 1. So what's the chance of seeing
an estimate of 6.1 or more?
[518]
You'll find that this comes out to 0.0011,
[522]
okay, roughly 0.1 percent. So again, if our
null is true, if all these diets are the
[527]
same, the chance of seeing an F stat like
this, or differences like the ones we saw or even
[532]
larger, is only about 0.1
percent. That gives us
[537]
evidence to reject our null hypothesis;
we have evidence to believe the
[541]
alternative is likely true, we have
evidence to believe at least one diet
[545]
differs from the rest. So now we need to
decide which diets might differ from the
[550]
others, and to do that we're going to
compare all possible pairwise means.
[553]
That's a topic we're going to get
to talking about in a moment.
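Putting the whole lecture together, the F statistic can be sketched end to end in a few lines. These are made-up illustrative groups, not the diet data from the video:

```python
# End-to-end sketch of the one-way ANOVA F statistic on made-up data
groups = [[2.0, 3.0, 4.0], [4.0, 5.0, 6.0], [6.0, 7.0, 8.0]]

n = sum(len(g) for g in groups)   # total observations
k = len(groups)                   # number of groups
grand_mean = sum(sum(g) for g in groups) / n
means = [sum(g) / len(g) for g in groups]

# Mean square between: weighted squared distances of group means, df = k - 1
ms_between = sum(len(g) * (m - grand_mean) ** 2
                 for g, m in zip(groups, means)) / (k - 1)

# Mean square within: squared distances from each group's own mean, df = n - k
ms_within = sum(sum((y - m) ** 2 for y in g)
                for g, m in zip(groups, means)) / (n - k)

F = ms_between / ms_within        # large F favors the alternative
print(F)
```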
[558]
Thanks! For more videos, please subscribe to marinstatslectures.