馃攳
ANOVA Part IV: Bonferroni Correction | Statistics Tutorial #28 | MarinStatsLectures - YouTube
Channel: MarinStatsLectures-R Programming & Statistics
We are going to discuss the idea of multiple comparisons corrections, or multiple testing corrections. We'll do this in the context of one-way analysis of variance, although the idea of a multiple testing correction applies to more than just one-way ANOVA. The general idea is that when we do more than one test, or more than one comparison, our type 1 error rate starts to increase. For example, you'll recall that the alpha we use for a test is our probability of making a type 1 error, a false positive. If we do one test with an alpha of 5%, there's a 5% chance of making a type 1 error. If we do a second test with an alpha of 5%, that test also has a 5% chance of making a type 1 error. Combined over these two tests, there's a greater than 5% chance of making at least one type 1 error. That's the concept we want to correct for: the more tests we do simultaneously, the greater the chance of making a type 1 error, and we're going to learn a bit about how to control for that in a moment.
You'll recall we worked through an example comparing weight loss on four different diets: comparing the mean weight loss for diets A, B, C and D, with the alternative hypothesis that at least one of the means differs, i.e. at least one diet has a mean weight loss that's different from the rest. To do so we ran a one-way analysis of variance (ANOVA) and calculated an F statistic: the ratio of the variability in weight loss explained by diet to the variability in weight loss not explained by diet, or the variability between groups to the variability within groups. We got a test statistic of 6.1 and a resulting p-value of 0.0011, which led us to reject the null hypothesis and conclude we have evidence that at least one mean differs from the rest. So now we need to decide which one, or which ones, differ. To do this, we're going to make all pairwise comparisons. What I mean by that is we're going to compare diets A and B, A and C, A and D, B and C, B and D, and C and D: we're going to compare all possible pairs of means.
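The six pairs just listed can be enumerated directly; here's a quick sketch in Python (the A–D labels are the diet groups from the example):

```python
from itertools import combinations
from math import comb, factorial

diets = ["A", "B", "C", "D"]

# All unordered pairs of groups: exactly the pairwise comparisons we need.
pairs = list(combinations(diets, 2))
print(pairs)  # [('A', 'B'), ('A', 'C'), ('A', 'D'), ('B', 'C'), ('B', 'D'), ('C', 'D')]

# The count matches the combinations formula 4! / (2! * 2!) = 6.
assert len(pairs) == comb(4, 2) == factorial(4) // (factorial(2) * factorial(2)) == 6
```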
Mathematically, we have four groups and we're going to choose two of them to compare: how many different combinations of two groups can we pick from these four? You might recall this works out to 4! / (2! × 2!) = 6, so there are six possible pairwise comparisons.
To make these comparisons we're going to use the independent two-sample t-test approach: for each pair (A–B, A–C, A–D, and so on) we can do a t-test or build a confidence interval. We can either look at the difference between group one and group two, plus or minus a t-value (with some degrees of freedom and some confidence level) times the standard error for the difference in means, or we can do the hypothesis test and calculate a test statistic that tells us how far the observed difference in means is from the hypothesized value, in terms of standard errors.

So here we're going to do six different pairwise comparisons, and we've noted that as the number of tests increases, the chance of making a type 1 error increases as well. So let's work on how often a type 1 error will happen and how we can try to reduce that rate. To work our way through this, we're going to assume that each of these pairwise comparisons is independent of the others. That may not be a fully realistic assumption, but first, it simplifies some of the calculations so we can focus on the concepts, and second, it's a little more conservative.

Let's start by thinking about what happens with each test, and by that I mean each of the individual comparisons we're going to do. If we use, say, an alpha of 0.05, or 5%, then essentially we're saying the probability of making a type 1 error on each test is 5%; equivalently, each confidence interval uses 95% confidence. And if each test has a 5% chance of making a type 1 error, the probability of not making a type 1 error on each test is 95%.
All right: a 5% chance of a type 1 error, a 95% chance of not making a type 1 error, on each test. Now think over all of the tests. The probability of making at least one type 1 error over all the comparisons (a type 1 error on the first test, or the second, or the third, and so on) can be written as 1 minus the probability of making no type 1 errors. And the probability of making no type 1 errors can be written as the probability of no type 1 error on the test comparing A and B, times the probability of no type 1 error on the test comparing A and C, all the way up to the probability of no type 1 error on the final test comparing C and D. We can multiply the individual probabilities here because we assumed each of these comparisons is independent, which simplifies the calculation for us a little. Now, the probability of not making a type 1 error on any given test is 95 percent, so we have 95% times 95% times 95%, appearing six times. If you work this out, 1 minus 0.95 to the sixth power comes out to 0.265, or 26.5%. So remember what we worked out there: the probability of making at least one type 1 error over all six of these tests is about 26.5 percent.
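That 26.5% figure is just a couple of lines of arithmetic; a minimal sketch:

```python
alpha = 0.05  # per-test type 1 error rate
m = 6         # number of pairwise comparisons

# Assuming independent tests:
#   P(no type 1 error on any test)   = 0.95 ** 6
#   P(at least one type 1 error)     = 1 - 0.95 ** 6
fwer = 1 - (1 - alpha) ** m
print(round(fwer, 3))  # 0.265
```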
So you can see how our type 1 error rate has inflated a lot by doing six comparisons: multiple comparisons. This overall rate sometimes gets called the familywise error rate, among other names. What we'd like to do is learn how to control this type 1 error rate so it doesn't inflate to an extremely large value. There are lots of different possible corrections; the one we're going to talk about is Bonferroni's multiple testing correction, and we're choosing it for a few reasons. The first is that it's the simplest to teach and understand, and the other possible corrections are the same in concept with slight changes in the mechanics; so once we understand Bonferroni's approach, we can understand that the other ones are pretty similar with some minor changes.

Bonferroni's approach is to use an adjusted alpha, alpha-star: the overall type 1 error rate that we want, divided by the number of comparisons. Here that's 0.05 divided by our six comparisons, so we use an adjusted alpha of 0.00833 for each of the individual tests, or, if we're building confidence intervals, 99.167 percent confidence for each of the individual confidence intervals.
[534]
Let's see what that's going to do to the
overall type 1 error rate
[537]
so again if we use this adjusted alpha
that tells us the probability of a type
[544]
1 error on each of the individual tests
is 0.00833 or 0.8%
[550]
or we're going to use 99.16% confidence over all tests.
[556]
it's again a 99.167%
[559]
chance of not making a
type 1 error on each of the individual
[563]
tests
so overall the tests it's a probability
[566]
of making at least one type one error
because one minus the probability of
[570]
making no type 1 errors and here we're
using 0.99167
[581]
and if you work that out you're gonna
find that it comes out to roughly 0.049
[590]
or 4.9 percent. okay so Bonferroni's
correction suggest use an alpha of
[598]
0.00833 for each of the individual tests or use 99.167%
[606]
confidence for each of the individual conference
[608]
intervals to have an overall type one
error rate of about five percent; so the
[614]
probability of making at least one type
one error over all six of these
[617]
comparisons is roughly five percent. So
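Bonferroni's adjustment is the same arithmetic run with the corrected alpha; a minimal sketch:

```python
alpha = 0.05  # desired overall (familywise) type 1 error rate
m = 6         # number of pairwise comparisons

# Bonferroni-adjusted per-test alpha: alpha* = alpha / m
alpha_star = alpha / m
print(round(alpha_star, 5))              # 0.00833

# Equivalent per-test confidence level: about 99.167%
print(round(100 * (1 - alpha_star), 3))  # 99.167

# Resulting familywise error rate, again assuming independent tests
fwer = 1 - (1 - alpha_star) ** m
print(round(fwer, 3))                    # 0.049
```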
So if we were to run through all the possible pairwise comparisons, building confidence intervals, we'd get the intervals for all six comparisons here: group A to B, A to C, A to D, and so on.
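The video runs these six tests in software rather than by hand. As a sketch of just the decision rule, here is how Bonferroni-corrected pairwise results could be screened in Python. The p-values below are hypothetical placeholders chosen so that only A–C and B–C come out significant, matching the video's conclusion; they are not computed from the actual diet data:

```python
alpha = 0.05
m = 6
alpha_star = alpha / m  # Bonferroni-adjusted threshold, about 0.00833

# Hypothetical p-values for the six pairwise t-tests (illustration only).
p_values = {
    ("A", "B"): 0.7213,
    ("A", "C"): 0.0021,
    ("A", "D"): 0.3310,
    ("B", "C"): 0.0051,
    ("B", "D"): 0.1623,
    ("C", "D"): 0.0404,  # below 0.05 but NOT below alpha*: not significant after correction
}

# A pair is declared significant only if its p-value beats the adjusted alpha.
significant = sorted(pair for pair, p in p_values.items() if p < alpha_star)
print(significant)  # [('A', 'C'), ('B', 'C')]
```

Note that the hypothetical C–D p-value of 0.0404 would count as significant at the unadjusted 0.05 level; the Bonferroni threshold is what screens it out.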
We don't want to focus our effort on how to plug into this formula and run through these calculations; we can easily have a piece of software do them for us. What we want to focus on is the concepts and the interpretations. Taking a look at these confidence intervals, we can see there are only two that do not contain zero. The interval comparing group A to group C does not contain zero, which gives us an indication of a statistically significant difference between groups A and C; in other words, we're not willing to accept that the difference is zero. We can also see that groups B and C are significantly different. Other than that, no other significant differences show up.

I'm going to draw a little diagram here that helps us think about the conclusions we're reaching. We're saying we're confident that group C is significantly greater than groups A or B: C and A are significantly different, and we can see that the mean for C is larger than the mean for A; C and B are significantly different, and again the mean for C is significantly larger than the mean for B. And I'll draw D here in the middle, signifying that we're not convinced C and D are different, and we're not convinced D and A are different.
I want to spend a moment now on some difficulty people may have with this conclusion; I know when I first learned this material I struggled a little with this concept. Your brain may be going in the direction: if C is the same as D, and D is the same as A, isn't C the same as A? And that's true mathematically: if C equals D and D equals A, then C must equal A. But that's not what we're saying here. What we're saying is that we're not convinced C and D are different, and that's not the same as saying they're the same; we're saying we're not sure they're different. Likewise, we're not convinced A and D are different, but we are convinced C and A are different. To get at this idea, I like to use an example that I think is an intuitive way to see it.
Suppose we all go out and do a 10 kilometre run: we decide as a class to get some exercise, run 10 kilometres, time ourselves, and see who wins. Suppose the person who finished first did it in 39 minutes and our second-place runner did it in 40 minutes. Without knowing much about the variability in the race times, just thinking intuitively, I'm not convinced that the person who finished in 39 minutes really is faster than the person who finished in 40. Right? If they ran the race on a different day, under different conditions, maybe they'd switch places; only one minute separated them, so maybe that was a chance difference. Now suppose the third-place runner did it in 42 minutes. Again, I'm not really convinced that the person who did it in 40 minutes and the person who did it in 42 are significantly different: run on a different day, maybe they'd switch places; only two minutes separated them, so maybe that was a chance difference. Continuing on, suppose the next person did it in 43 minutes, and again we're not really convinced these two are different; suppose the next ones are 45 minutes, then 49 minutes, then 51 minutes, and so on. Now, I'll tell you that I am convinced that the person who did it in 51 minutes is significantly slower than the person who did it in 39 minutes: that gap is big enough that I think the difference is real; any day these two run, I believe the first person will always be faster.

So that's what we're getting at here: the distance between each adjacent pair might not be large enough for us to be convinced it's real, but the gap between the two extremes is big enough that we think it's not due to chance. And that's similar to the conclusion we're reaching with the diets: the difference we saw between C and D was not big enough for us to be convinced that C really is larger than D; the distance between the mean of D and the mean of A wasn't large enough to convince us they're really significantly different; but the difference we saw between group C and group A was large enough to believe that these are statistically significantly different from one another.
Now I want to end on some important reminders. The first is the idea of statistical significance versus clinical or scientific significance. We've talked a bit about this in other videos, but just a reminder that because something is statistically significant doesn't necessarily mean it's meaningful in the real world. The difference in mean weight loss between these diets, while statistically significant, may or may not be scientifically meaningful; deciding that is a question that requires context, looking at the actual effect size and the numeric difference between the groups, not just whether the result is statistically significant.
Another reminder is the trade-off between the type 1 error rate and the type 2 error rate: when we decrease one, we increase the other. So in controlling the familywise type 1 error rate by using a lower alpha, we're increasing the type 2 error rate; we're trying to make fewer false positives at the expense of making more false negatives. Just a reminder that that trade-off is there.

Another important reminder: generally we'll do these calculations using software, so we don't want to focus our attention on the formulas or how to plug things in. We show the formulas so that we can understand the concepts and what the software is doing for us, but we don't want to put our focus on, you know, exactly what t-value we should be using and how to find it in a table; we can do all that with software, and that's not the important skill to take from this material.

And the final reminder: there are many different methods of multiple testing correction. We talked about Bonferroni's; the other approaches are all similar in concept, though some of the mechanics may change slightly.
Thanks for watching, and make sure to subscribe to the MarinStatsLectures YouTube channel!