ANOVA Part IV: Bonferroni Correction | Statistics Tutorial #28 | MarinStatsLectures - YouTube

Channel: MarinStatsLectures-R Programming & Statistics

[0]
We are going to discuss a little bit about the idea of multiple comparisons
[4]
corrections, or multiple testing corrections. We're going to do this in
[9]
the context of one-way analysis of variance, although the idea of multiple
[14]
testing correction applies to more than just one-way analysis of variance (ANOVA). The
[20]
general idea of this correction is that when we do more than one test or more
[25]
than one comparison, our type 1 error rate starts to increase. So, for example,
[30]
you'll recall that the alpha we use for our test is our probability of making a
[35]
type 1 error, our false positive. If we do one test here with an alpha of 5%,
[40]
there's a 5% chance of making a type 1 error. If we do a second test with an
[45]
alpha of 5%, that test also has a 5% chance of making a type 1 error. Combined
[50]
over these two tests, there's going to be a greater than 5% chance of making at
[54]
least one type 1 error. So this is the concept that we want to correct for. The
[58]
more tests that we do simultaneously, the greater chance there is of making a
[63]
type 1 error, so we're going to learn a bit about how to control for that, and
[67]
we'll get to that in a moment. You'll recall we worked through this example
[70]
comparing weight loss on four different diets: comparing the mean weight loss
[74]
over diets A, B, C, and D, with an alternative hypothesis that at least one of the means
[80]
differs: at least one diet has a mean weight loss that's different from the
[84]
rest. To do so we ran through a one-way analysis of variance (ANOVA); we calculated this
[89]
F statistic, which was a ratio of variability in weight loss explained by
[94]
diet to variability in weight loss that's not explained by diet, or the
[98]
variability happening between groups to the variability within a group, and we
[102]
had a test statistic of 6.1 and a resulting p-value of 0.0011, and
[107]
then this led us to reject our null hypothesis, or conclude we have evidence
[111]
to believe at least one mean differs from the rest. So now we need to decide
[116]
which one, or which ones, may differ. In order to do this, what we're going to do
[121]
is all pairwise
[129]
comparisons. Okay, and what I mean by that is we're going to compare diets A and
[136]
B, diets A and C, diets A and D, B and C, B and D, and C and D; so we're going to
[146]
compare all possible pairs of means. Mathematically, we can think of this: we
[151]
have four different groups and we're going to choose two of them to compare:
[155]
how many different combinations of two groups can we pick from these four? You
[162]
might recall this ends up being 4 factorial over (2 factorial times 2
[166]
factorial), which equals 6: six possible pairwise comparisons. And
[172]
in order to do this, we're going to use our independent
[178]
two-sample t-test type approach, so we can do a t-test or a confidence interval
[191]
comparing each of the pairs: AB, AC, AD, and so on. So we can either look at the
[198]
difference between group one and group two, plus or minus a t-value (again,
[204]
this has some degrees of freedom, some confidence level) times the standard
[209]
error for the difference in means; or we can do the hypothesis test and
[218]
calculate a test statistic that's going to help tell us how far the
[223]
difference in means we observed is from the hypothesized value, in terms of a
[229]
standard error. And as noted, here we're going to do six different pairwise
[234]
comparisons, and we've noted that as the number of tests we do
[239]
increases, the chance of making a type 1 error increases as well! So we're going
[245]
to start to work on the idea of how often a type 1 error will happen and how
[250]
we can try and reduce that rate. To work our way through this, we're going to
[254]
assume that each of these pairwise comparisons is independent of the others;
[258]
that may not necessarily be a fully realistic assumption, but:
[262]
first, it's going to simplify some of the calculations so we can focus on
[265]
the concepts, and second, it's also a little bit more conservative. So let's start by
[270]
thinking about what happens with each test. For each test, and by that I mean
[278]
each of these individual comparisons that we're going to do, if we use say an
[285]
alpha of 0.05 or 5%, then essentially what we're saying there is that
[291]
the probability of making a type 1 error on each test is 5%; or we can think of it as:
[300]
for each of these comparisons, each confidence interval, we're going to use
[304]
confidence of 95%. And so again, if each of these tests has a 5% chance of making
[312]
a type 1 error, the probability of not making a type 1 error on each of the
[318]
tests is 95%! All right: 5% chance of a type 1 error,
[323]
95% chance of not making a type 1 error. Now, if we think over all of the tests,
[334]
over all the tests or all the comparisons, the probability of making at
[340]
least one type 1 error
[347]
(again, the probability of making a type 1 error here or here or here or here or
[352]
here or here) is the probability of making at least one type 1 error, which we can write as 1
[358]
minus the probability of making no type 1 errors.
[366]
And the probability of making no type 1 errors we can write as
[372]
the probability that we do not make a type 1 error on the test comparing A
[380]
and B, times the probability that we don't make
[386]
a type 1 error on the test comparing A and C,
[391]
all the way up to the last one, the probability of no type 1 error on that final
[399]
test comparing C and D. We can multiply the individual probabilities here because
[405]
we assumed each of these comparisons is independent, so that simplifies the
[409]
calculation for us a little bit. Now,
[414]
the probability of not making a type 1 error on any given test is 95
[419]
percent, so 95 percent times 95 percent times 95 percent, and you're going to see
[425]
we have that appearing six times. So if you work this out (one minus 0.95 raised to the sixth power), it's going to
[431]
come out to be 0.265, or 26.5%. So again, remember what we worked out there: the
[438]
probability of making at least one type 1 error over all six of these tests is
[443]
about 26.5 percent. Okay, so you can see how our type 1 error
[446]
rate has inflated a lot by doing six comparisons: multiple
[450]
comparisons! So sometimes this gets called the familywise error rate, or
[454]
other names like that. So what we'd like to do is learn how to control this type
[458]
1 error rate so it doesn't inflate to an extremely large value. There's lots of
[463]
different possible corrections that we can do; what we're going to talk about is
[466]
one called Bonferroni's multiple testing correction, and we're going to do
[471]
this for a few reasons. The first reason is that it's the
[475]
simplest to teach and understand. All the other possible corrections are the
[481]
same in concept, with slight changes in the mechanics; so once we understand
[485]
Bonferroni's approach we can understand that the other ones are pretty similar,
[488]
with some minor changes in there.
[491]
So Bonferroni's approach is to
[494]
use an adjusted alpha, alpha star: we take the overall type 1 error
[501]
rate that we want, divided by the number of comparisons;
[507]
here that's 0.05, and we're
[511]
doing six different comparisons, so we use an adjusted alpha of 0.00833
[518]
for each of the individual tests; or, if we're looking at confidence intervals, use
[527]
99.167 percent confidence for each of the individual confidence intervals.
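As a quick sketch, these numbers can all be checked in a few lines of Python (this code is not from the lecture; it just illustrates the arithmetic):

```python
from math import comb

alpha = 0.05
m = comb(4, 2)  # choose 2 of the 4 diets: 4! / (2! * 2!) = 6 pairwise comparisons

# Probability of at least one type 1 error over 6 independent tests at alpha = 0.05
fwer_uncorrected = 1 - (1 - alpha) ** m  # ~0.265, i.e. 26.5%

# Bonferroni: divide the desired overall rate by the number of comparisons
alpha_star = alpha / m  # ~0.00833 per test

# Familywise error rate using the adjusted alpha on each test
fwer_corrected = 1 - (1 - alpha_star) ** m  # ~0.049, i.e. roughly 5%

print(m, round(fwer_uncorrected, 3), round(alpha_star, 5), round(fwer_corrected, 3))
# prints: 6 0.265 0.00833 0.049
```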
[534]
Let's see what that's going to do to the overall type 1 error rate.
[537]
So again, if we use this adjusted alpha, that tells us the probability of a type
[544]
1 error on each of the individual tests is 0.00833, or about 0.83%,
[550]
or we're going to use 99.167% confidence.
[556]
That's a 99.167%
[559]
chance of not making a type 1 error on each of the individual
[563]
tests. So over all the tests, the probability
[566]
of making at least one type 1 error is one minus the probability of
[570]
making no type 1 errors, and here we're using 0.99167 raised to the sixth power,
[581]
and if you work that out you're going to find it comes out to roughly 0.049,
[590]
or 4.9 percent. Okay, so Bonferroni's correction suggests: use an alpha of
[598]
0.00833 for each of the individual tests, or use 99.167%
[606]
confidence for each of the individual confidence
[608]
intervals, to have an overall type 1 error rate of about five percent; so the
[614]
probability of making at least one type 1 error over all six of these
[617]
comparisons is roughly five percent. So if we were to run through and do all the
[624]
possible pairwise comparisons, building confidence intervals, we can see the confidence
[627]
intervals for all six comparisons here: comparing group A to B, A to C, A to D, and so on.
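In software, all six comparisons can be run in a loop. Here's a minimal Python sketch using scipy; the weight-loss numbers are made up for illustration (they are not the data from the lecture), but they're chosen so the pattern of results matches the one described here:

```python
from itertools import combinations
from scipy import stats

# Hypothetical weight-loss data (made-up values, not the lecture's data)
diets = {
    "A": [2.1, 1.8, 2.5, 1.2, 2.0, 1.6],
    "B": [1.9, 2.2, 1.5, 2.4, 1.7, 2.0],
    "C": [3.6, 4.1, 3.2, 3.9, 4.4, 3.5],
    "D": [3.5, 2.0, 3.9, 1.8, 3.2, 2.6],
}

alpha_star = 0.05 / 6  # Bonferroni-adjusted alpha for 6 pairwise comparisons

results = {}
for (g1, x1), (g2, x2) in combinations(diets.items(), 2):
    t_stat, p_value = stats.ttest_ind(x1, x2)  # pooled two-sample t-test
    results[(g1, g2)] = p_value
    verdict = "significant" if p_value < alpha_star else "not significant"
    print(f"{g1} vs {g2}: t = {t_stat:6.2f}, p = {p_value:.4f} -> {verdict}")
```

With these made-up numbers, only A vs C and B vs C come out significant at the adjusted alpha, mirroring the conclusions discussed below.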
[634]
So we don't want to focus our effort on how to plug into this formula to run through
[638]
these calculations; we can easily have a piece of software do them for us. What we
[642]
want to do is focus on the concepts and the interpretations. So taking a look at
[647]
these confidence intervals, we can see that there are only two of them that do
[650]
not contain zero: comparing group A to group C, the confidence interval
[656]
does not contain zero, which gives us an indication there's a statistically
[660]
significant difference between groups A and C, or we're not willing to accept that
[664]
the difference is zero; we can also see groups B and C are significantly
[669]
different. Other than that, no other significant differences show up! I'm
[673]
going to draw a little diagram here that helps us think about the conclusions
[677]
we're reaching here. So here we're saying we're confident that group C is
[682]
significantly greater than groups A or B: C and A are significantly different, and
[688]
we can see that the mean for C is larger than the mean for A; C and B are
[693]
significantly different, and again the mean for C is significantly larger than
[696]
it is for B. And I'm going to draw D here in the middle, signifying we're not
[703]
convinced that C and D are different, and we're not convinced that D and A are
[706]
different! So I want to spend a moment here now
[709]
just talking about some difficulty some people may have with this
[714]
conclusion. I know when I first learned this material I struggled a
[717]
little bit with this concept. Some of you, your brain may be going in the
[721]
direction of: if C is the same as D, and D is the same as A, isn't C the same as A?
[728]
And that's true mathematically: if C equals D and D equals A, C must equal A.
[733]
But we're not saying that here. What we're saying is we're not convinced C
[736]
and D are different! That's not the same as saying that they're the same;
[740]
we're saying we're not sure they're different.
[744]
Again, we're not convinced A and D are different! We are convinced C and A are
[749]
different. Now, to get at this idea, I also like to use an example that I think is
[753]
an intuitive way to get at this. Suppose we all go out and do a
[758]
10-kilometre run: we decide as a class, let's go get some exercise, run 10
[766]
kilometres, time ourselves, and see who wins. Suppose that the person who finished
[771]
first did it in 39 minutes and our second-place runner did it in 40 minutes.
[779]
Now, without knowing too much about variability in the race times, just trying to
[784]
think intuitively, you know, I'm not convinced that the person who finished
[787]
in 39 minutes really is faster than the person who finished in 40. Right? If
[792]
they ran the race on a different day, under different conditions,
[795]
maybe they'd switch places; only one minute separated them; maybe that was
[799]
chance variation, or a chance difference. Now let's suppose that the
[805]
third-place runner did it in 42 minutes. Again, I'm not really convinced the
[809]
person who did it in 40 minutes and the person who did it in 42 are significantly different;
[813]
again, if they ran on different days maybe they'd switch places; only two
[817]
minutes separated them; maybe that was a chance difference.
[820]
Let's continue on: suppose the next person did it in 43 minutes, and again we're
[825]
not really convinced these two are different. Let's suppose the next one is
[829]
45 minutes, then 49 minutes, 51 minutes, and so on. Now I'll tell you that I am
[838]
convinced that the person who did it in 51 minutes is significantly slower than
[844]
the person who did it in 39 minutes, and that gap is big enough that I think
[847]
that difference is real; any two days these two run, I believe
[852]
this person will always be faster than that one! Okay, so that's what we're
[855]
getting at here: the distance between each of these might not be large
[860]
enough to be convinced that it's real, but the gap between these two is big enough
[864]
that we think it's not due to chance. And that's similar to the conclusion we're
[868]
reaching here: the difference we saw between C and D was not big enough for
[873]
us to be convinced that statistically C really is larger than D; the distance we
[878]
saw between the mean of D and the mean of A wasn't large enough to be convinced
[882]
that they're really significantly different; but the
[886]
difference that we saw between group C and group A was large enough to believe that
[890]
statistically these are significantly different from one another.
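To make the confidence-interval approach concrete, here's a Python sketch of one Bonferroni-adjusted interval: the difference in means, plus or minus a t-value (at 99.167% confidence rather than 95%) times the standard error for the difference. The group data are made-up values for illustration, not the lecture's data:

```python
from math import sqrt
from scipy import stats

# Hypothetical weight-loss data for groups A and C (made-up values)
group_a = [2.1, 1.8, 2.5, 1.2, 2.0, 1.6]
group_c = [3.6, 4.1, 3.2, 3.9, 4.4, 3.5]

n1, n2 = len(group_a), len(group_c)
mean_a = sum(group_a) / n1
mean_c = sum(group_c) / n2
mean_diff = mean_c - mean_a

# Pooled standard error for a difference in means (two-sample t approach)
ss_a = sum((x - mean_a) ** 2 for x in group_a)
ss_c = sum((x - mean_c) ** 2 for x in group_c)
sp2 = (ss_a + ss_c) / (n1 + n2 - 2)       # pooled variance
se = sqrt(sp2 * (1 / n1 + 1 / n2))

# Bonferroni: 99.167% confidence (1 - 0.05/6) instead of 95%
alpha_star = 0.05 / 6
t_crit = stats.t.ppf(1 - alpha_star / 2, df=n1 + n2 - 2)

lower = mean_diff - t_crit * se
upper = mean_diff + t_crit * se
print(f"C - A: {mean_diff:.2f}, 99.167% CI ({lower:.2f}, {upper:.2f})")
# For this data the interval excludes zero: A and C significantly different
```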
[893]
Now I just want to end on some important reminders. The first is the idea of
[897]
statistical significance: statistical significance versus clinical or
[902]
scientific significance. We've talked a bit about this in different
[907]
videos, but just a reminder that because something is statistically significant
[911]
doesn't necessarily mean that it's meaningful in the real world; so while the
[915]
difference in mean weight loss between these diets is statistically
[919]
significant, deciding if it's scientifically meaningful is a
[924]
question that requires context, looking at the actual effect size and the
[927]
numeric difference between the two, not just whether it's statistically significant.
[932]
Another reminder is the trade-off between the type 1 error rate and the
[937]
type 2 error rate: when we decrease one, we increase the other. So just to
[942]
remind you here that in controlling this type 1 error rate and using a
[946]
lower alpha to get a lower familywise type 1 error rate, we're increasing the
[952]
type 2 error rate; we're trying to make fewer false
[955]
positives at the expense of making more false negatives! So just a
[958]
reminder that there is that trade-off there. Another important reminder: this
[963]
stuff we're generally going to do through software,
[966]
so we don't want to focus our attention on the formulas or how we
[970]
plug things in; we show the formulas so that we can understand the concepts and what
[974]
it is a piece of software is doing for us. But a reminder: we don't want to put
[977]
our focus on, you know, what exactly is the t-value we should be using and how
[981]
do we find that in a table. We can do all that with software; that's not the
[985]
important skill to get from this stuff. And the final reminder: there are
[988]
many different methods of doing multiple testing corrections. We talked about
[993]
Bonferroni's; these other approaches are all similar in concept,
[997]
though some of the mechanics may change slightly!
[1006]
Thanks for watching, and make sure to subscribe to the MarinStatsLectures YouTube channel!