ANOVA Part IV: Bonferroni Correction | Statistics Tutorial #28 | MarinStatsLectures - YouTube

Channel: MarinStatsLectures-R Programming & Statistics

[0]
We are going to discuss a little bit about the idea of multiple comparisons
[4]
corrections, or multiple testing corrections. We're going to do this in
[9]
the context of one-way analysis of variance, although the idea of multiple
[14]
testing correction applies to more than just one-way analysis of variance (ANOVA). The
[20]
general idea of this correction is that when we do more than one test or more
[25]
than one comparison, our type 1 error rate starts to increase. So, for example,
[30]
you'll recall that the alpha we use for our test is our probability of making a
[35]
type 1 error, our false positive. If we do one test here with an alpha of 5%,
[40]
there's a 5% chance of making a type 1 error. If we do a second test with an
[45]
alpha of 5%, that test also has a 5% chance of making a type 1 error. Combined
[50]
over these two tests, there's going to be a greater than 5% chance of making at
[54]
least one type 1 error. So this is the concept that we want to correct for. The
[58]
more tests that we do simultaneously, the greater chance there is of making a
[63]
type 1 error, so we're going to learn a bit about how to control for that, and
[67]
we'll get to that in a moment. You'll recall we worked through this example
[70]
comparing weight loss on four different diets: comparing the mean weight loss
[74]
over diets A, B, C, and D, with an alternative hypothesis that at least one of the means
[80]
differs: at least one diet has a mean weight loss that's different from the
[84]
rest. To do so we ran through a one-way analysis of variance (ANOVA); we calculated this
[89]
F statistic, which was a ratio of variability in weight loss explained by
[94]
diet to variability in weight loss that's not explained by diet, or the
[98]
variability happening between groups to the variability within a group, and we
[102]
had a test statistic of 6.1 and a resulting p-value of 0.0011, and
[107]
then this led us to reject our null hypothesis, or conclude we have evidence
[111]
to believe at least one mean differs from the rest. So now we need to decide
[116]
which one, or which ones, may differ. In order to do this, what we're going to do
[121]
is all pairwise
[129]
comparisons. Okay, and what I mean by that is we're going to compare diets A and
[136]
B, diets A and C, diets A and D, B and C, B and D, and C and D; so we're going to
[146]
compare all possible pairs of means. Mathematically, we can think of this: we
[151]
have four different groups and we're going to choose two of them to compare:
[155]
how many different combinations of two groups can we pick from these four? You
[162]
might recall this ends up being 4 factorial over (2 factorial times 2
[166]
factorial), which equals 6: six possible pairwise comparisons. And
[172]
in order to do this, we're going to use our independent
[178]
two-sample t-test type approach, so we can do a t-test or a confidence interval
[191]
comparing each of the pairs: AB, AC, AD, and so on. So we can either look at the
[198]
difference between group one and group two, plus or minus a t-value (again,
[204]
this has some degrees of freedom, some confidence level) times the standard
[209]
error for the difference in means; or we can do the hypothesis test and
[218]
calculate a test statistic that's going to help tell us how far the
[223]
difference in means we observed is from the hypothesized value, in terms of a
[229]
standard error. And as noted, here we're going to do six different pairwise
[234]
comparisons, and we've noted that as the number of tests we do
[239]
increases, the chance of making a type 1 error increases as well! So we're going
[245]
to start to work on the idea of how often a type 1 error will happen and how
[250]
we can try and reduce that rate. To work our way through this, we're going to
[254]
assume that each of these pairwise comparisons is independent of the others;
[258]
that may not necessarily be a fully realistic assumption, but:
[262]
first, it's going to simplify some of the calculations so we can focus on
[265]
the concepts, and second, it's also a little bit more conservative. So let's start by
[270]
thinking about what happens with each test. For each test, and by that I mean
[278]
each of these individual comparisons that we're going to do, if we use say an
[285]
alpha of 0.05 or 5%, then essentially what we're saying there is that
[291]
the probability of making a type 1 error on each test is 5%; or we can think of it as:
[300]
for each of these comparisons, each confidence interval, we're going to use
[304]
confidence of 95%. And so again, if each of these tests has a 5% chance of making
[312]
a type 1 error, the probability of not making a type 1 error on each of the
[318]
tests is 95%! All right: 5% chance of a type 1 error,
[323]
95% chance of not making a type 1 error. Now, if we think over all of the tests,
[334]
over all the tests or all the comparisons, the probability of making at
[340]
least one type 1 error
[347]
(again, the probability of making a type 1 error here or here or here or here or
[352]
here or here) is the probability of making at least one type 1 error, which we can write as 1
[358]
minus the probability of making no type 1 errors.
[366]
And the probability of making no type 1 errors we can write as
[372]
the probability that we do not make a type 1 error on the test comparing A
[380]
and B, times the probability that we don't make
[386]
a type 1 error on the test comparing A and C,
[391]
all the way up to the last one, the probability of no type 1 error on that final
[399]
test comparing C and D. We can multiply the individual probabilities here because
[405]
we assumed each of these comparisons is independent, so that simplifies the
[409]
calculation for us a little bit. Now,
[414]
the probability of not making a type 1 error on any given test is 95
[419]
percent, so 95 percent times 95 percent times 95 percent, and you're going to see
[425]
we have that appearing six times. So if you work this out (one minus 0.95 raised to the sixth power), it's going to
[431]
come out to be 0.265, or 26.5%. So again, remember what we worked out there: the
[438]
probability of making at least one type 1 error over all six of these tests is
[443]
about 26.5 percent. Okay, so you can see how our type 1 error
[446]
rate has inflated a lot by doing six comparisons: multiple
[450]
comparisons! So sometimes this gets called the familywise error rate, or
[454]
other names like that. So what we'd like to do is learn how to control this type
[458]
1 error rate so it doesn't inflate to an extremely large value. There's lots of
[463]
different possible corrections that we can do; what we're going to talk about is
[466]
one called Bonferroni's multiple testing correction, and we're going to do
[471]
this for a few reasons. The first reason is that it's the
[475]
simplest to teach and understand. All the other possible corrections are the
[481]
same in concept, with slight changes in the mechanics; so once we understand
[485]
Bonferroni's approach we can understand that the other ones are pretty similar,
[488]
with some minor changes in there.
[491]
So Bonferroni's approach is to
[494]
use an adjusted alpha, alpha star: we take the overall type 1 error
[501]
rate that we want, divided by the number of comparisons;
[507]
here that's 0.05, and we're
[511]
doing six different comparisons, so we use an adjusted alpha of 0.00833
[518]
for each of the individual tests; or, if we're looking at confidence intervals, use
[527]
99.167 percent confidence for each of the individual confidence intervals.
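As a quick sketch, these numbers can all be checked in a few lines of Python (this code is not from the lecture; it just illustrates the arithmetic):

```python
from math import comb

alpha = 0.05
m = comb(4, 2)  # choose 2 of the 4 diets: 4! / (2! * 2!) = 6 pairwise comparisons

# Probability of at least one type 1 error over 6 independent tests at alpha = 0.05
fwer_uncorrected = 1 - (1 - alpha) ** m  # ~0.265, i.e. 26.5%

# Bonferroni: divide the desired overall rate by the number of comparisons
alpha_star = alpha / m  # ~0.00833 per test

# Familywise error rate using the adjusted alpha on each test
fwer_corrected = 1 - (1 - alpha_star) ** m  # ~0.049, i.e. roughly 5%

print(m, round(fwer_uncorrected, 3), round(alpha_star, 5), round(fwer_corrected, 3))
# prints: 6 0.265 0.00833 0.049
```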
[534]
Let's see what that's going to do to the overall type 1 error rate.
[537]
So again, if we use this adjusted alpha, that tells us the probability of a type
[544]
1 error on each of the individual tests is 0.00833, or about 0.83%,
[550]
or we're going to use 99.167% confidence.
[556]
That's a 99.167%
[559]
chance of not making a type 1 error on each of the individual
[563]
tests. So over all the tests, the probability
[566]
of making at least one type 1 error is one minus the probability of
[570]
making no type 1 errors, and here we're using 0.99167 raised to the sixth power,
[581]
and if you work that out you're going to find it comes out to roughly 0.049,
[590]
or 4.9 percent. Okay, so Bonferroni's correction suggests: use an alpha of
[598]
0.00833 for each of the individual tests, or use 99.167%
[606]
confidence for each of the individual confidence
[608]
intervals, to have an overall type 1 error rate of about five percent; so the
[614]
probability of making at least one type 1 error over all six of these
[617]
comparisons is roughly five percent. So if we were to run through and do all the
[624]
possible pairwise comparisons, building confidence intervals, we can see the confidence
[627]
intervals for all six comparisons here: comparing group A to B, A to C, A to D, and so on.
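In software, all six comparisons can be run in a loop. Here's a minimal Python sketch using scipy; the weight-loss numbers are made up for illustration (they are not the data from the lecture), but they're chosen so the pattern of results matches the one described here:

```python
from itertools import combinations
from scipy import stats

# Hypothetical weight-loss data (made-up values, not the lecture's data)
diets = {
    "A": [2.1, 1.8, 2.5, 1.2, 2.0, 1.6],
    "B": [1.9, 2.2, 1.5, 2.4, 1.7, 2.0],
    "C": [3.6, 4.1, 3.2, 3.9, 4.4, 3.5],
    "D": [3.5, 2.0, 3.9, 1.8, 3.2, 2.6],
}

alpha_star = 0.05 / 6  # Bonferroni-adjusted alpha for 6 pairwise comparisons

results = {}
for (g1, x1), (g2, x2) in combinations(diets.items(), 2):
    t_stat, p_value = stats.ttest_ind(x1, x2)  # pooled two-sample t-test
    results[(g1, g2)] = p_value
    verdict = "significant" if p_value < alpha_star else "not significant"
    print(f"{g1} vs {g2}: t = {t_stat:6.2f}, p = {p_value:.4f} -> {verdict}")
```

With these made-up numbers, only A vs C and B vs C come out significant at the adjusted alpha, mirroring the conclusions discussed below.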
[634]
So we don't want to focus our effort on how to plug into this formula to run through
[638]
these calculations; we can easily have a piece of software do them for us. What we
[642]
want to do is focus on the concepts and the interpretations. So taking a look at
[647]
these confidence intervals, we can see that there are only two of them that do
[650]
not contain zero: comparing group A to group C, the confidence interval
[656]
does not contain zero, which gives us an indication there's a statistically
[660]
significant difference between groups A and C, or we're not willing to accept that
[664]
the difference is zero; we can also see groups B and C are significantly
[669]
different. Other than that, no other significant differences show up! I'm
[673]
going to draw a little diagram here that helps us think about the conclusions
[677]
we're reaching here. So here we're saying we're confident that group C is
[682]
significantly greater than groups A or B: C and A are significantly different, and
[688]
we can see that the mean for C is larger than the mean for A; C and B are
[693]
significantly different, and again the mean for C is significantly larger than
[696]
it is for B. And I'm going to draw D here in the middle, signifying we're not
[703]
convinced that C and D are different, and we're not convinced that D and A are
[706]
different! So I want to spend a moment here now
[709]
just talking about some difficulty some people may have with this
[714]
conclusion. I know when I first learned this material I struggled a
[717]
little bit with this concept. Some of you, your brain may be going in the
[721]
direction of: if C is the same as D, and D is the same as A, isn't C the same as A?
[728]
And that's true mathematically: if C equals D and D equals A, C must equal A.
[733]
But we're not saying that here. What we're saying is we're not convinced C
[736]
and D are different! That's not the same as saying that they're the same;
[740]
we're saying we're not sure they're different.
[744]
Again, we're not convinced A and D are different! We are convinced C and A are
[749]
different. Now, to get at this idea, I also like to use an example that I think is
[753]
an intuitive way to get at this. Suppose we all go out and do a
[758]
10-kilometre run: we decide as a class, let's go get some exercise, run 10
[766]
kilometres, time ourselves, and see who wins. Suppose that the person who finished
[771]
first did it in 39 minutes and our second-place runner did it in 40 minutes.
[779]
Now, without knowing too much about variability in the race times, just trying to
[784]
think intuitively, you know, I'm not convinced that the person who finished
[787]
in 39 minutes really is faster than the person who finished in 40. Right? If
[792]
they ran the race on a different day, under different conditions,
[795]
maybe they'd switch places; only one minute separated them; maybe that was
[799]
chance variation, or a chance difference. Now let's suppose that the
[805]
third-place runner did it in 42 minutes. Again, I'm not really convinced the
[809]
person who did it in 40 minutes and the person who did it in 42 are significantly different;
[813]
again, if they ran on different days maybe they'd switch places; only two
[817]
minutes separated them; maybe that was a chance difference.
[820]
Let's continue on: suppose the next person did it in 43 minutes, and again we're
[825]
not really convinced these two are different. Let's suppose the next one is
[829]
45 minutes, then 49 minutes, 51 minutes, and so on. Now I'll tell you that I am
[838]
convinced that the person who did it in 51 minutes is significantly slower than
[844]
the person who did it in 39 minutes, and that gap is big enough that I think
[847]
that difference is real; any two days these two run, I believe
[852]
this person will always be faster than that one! Okay, so that's what we're
[855]
getting at here: the distance between each of these might not be large
[860]
enough to be convinced that it's real, but the gap between these two is big enough
[864]
that we think it's not due to chance. And that's similar to the conclusion we're
[868]
reaching here: the difference we saw between C and D was not big enough for
[873]
us to be convinced that statistically C really is larger than D; the distance we
[878]
saw between the mean of D and the mean of A wasn't large enough to be convinced
[882]
that they're really significantly different; but the
[886]
difference that we saw between group C and group A was large enough to believe that
[890]
statistically these are significantly different from one another.
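To make the confidence-interval approach concrete, here's a Python sketch of one Bonferroni-adjusted interval: the difference in means, plus or minus a t-value (at 99.167% confidence rather than 95%) times the standard error for the difference. The group data are made-up values for illustration, not the lecture's data:

```python
from math import sqrt
from scipy import stats

# Hypothetical weight-loss data for groups A and C (made-up values)
group_a = [2.1, 1.8, 2.5, 1.2, 2.0, 1.6]
group_c = [3.6, 4.1, 3.2, 3.9, 4.4, 3.5]

n1, n2 = len(group_a), len(group_c)
mean_a = sum(group_a) / n1
mean_c = sum(group_c) / n2
mean_diff = mean_c - mean_a

# Pooled standard error for a difference in means (two-sample t approach)
ss_a = sum((x - mean_a) ** 2 for x in group_a)
ss_c = sum((x - mean_c) ** 2 for x in group_c)
sp2 = (ss_a + ss_c) / (n1 + n2 - 2)       # pooled variance
se = sqrt(sp2 * (1 / n1 + 1 / n2))

# Bonferroni: 99.167% confidence (1 - 0.05/6) instead of 95%
alpha_star = 0.05 / 6
t_crit = stats.t.ppf(1 - alpha_star / 2, df=n1 + n2 - 2)

lower = mean_diff - t_crit * se
upper = mean_diff + t_crit * se
print(f"C - A: {mean_diff:.2f}, 99.167% CI ({lower:.2f}, {upper:.2f})")
# For this data the interval excludes zero: A and C significantly different
```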
[893]
Now I just want to end on some important reminders. The first is the idea of
[897]
statistical significance: statistical significance versus clinical or
[902]
scientific significance. We've talked a bit about this in different
[907]
videos, but just a reminder that because something is statistically significant
[911]
doesn't necessarily mean that it's meaningful in the real world; so while the
[915]
difference in mean weight loss between these diets is statistically
[919]
significant, deciding if it's scientifically meaningful is a
[924]
question that requires context, looking at the actual effect size and the
[927]
numeric difference between the two, not just whether it's statistically significant.
[932]
Another reminder is the trade-off between the type 1 error rate and the
[937]
type 2 error rate: when we decrease one, we increase the other. So just to
[942]
remind you here that in controlling this type 1 error rate and using a
[946]
lower alpha to get a lower familywise type 1 error rate, we're increasing the
[952]
type 2 error rate; we're trying to make fewer false
[955]
positives at the expense of making more false negatives! So just a
[958]
reminder that there is that trade-off there. Another important reminder: this
[963]
stuff we're generally going to do through software,
[966]
so we don't want to focus our attention on the formulas or how we
[970]
plug things in; we show the formulas so that we can understand the concepts and what
[974]
it is a piece of software is doing for us. But a reminder: we don't want to put
[977]
our focus on, you know, what exactly is the t-value we should be using and how
[981]
do we find that in a table. We can do all that with software; that's not the
[985]
important skill to get from this stuff. And the final reminder: there are
[988]
many different methods of doing multiple testing corrections. We talked about
[993]
Bonferroni's; these other approaches are all similar in concept,
[997]
though some of the mechanics may change slightly!
[1006]
Thanks for watching, and make sure to subscribe to the MarinStatsLectures YouTube channel!