p-values: What they are and how to interpret them - YouTube

Channel: StatQuest with Josh Starmer

[Music] ♪ Gonna talk about p-values, yeah... StatQuest! ♪
Hello! I'm Josh Starmer, and welcome to StatQuest. Today we're going to talk about what p-values are and how to interpret them.
Imagine I have two drugs, Drug A and Drug B, and I want to know if Drug A is different from Drug B. So I give one person Drug A, and I give one other person Drug B. The person using Drug A is cured. Hooray! The person using Drug B is not cured. Bummer. Can we conclude that Drug A is better than Drug B? Nope.
Drug B may have failed for a lot of different reasons. Maybe this guy is taking a medication that has a bad interaction with Drug B. Or maybe this guy has a rare allergy to Drug B. Or maybe this guy didn't take Drug B properly and missed a dose. Or maybe Drug A doesn't actually work, and the placebo effect deserves all of the credit. There are a lot of weird, random things that can happen when doing a test, and this means that we need to try each drug on more than just one person each.
So we redo the experiment, but this time we give each drug to two different people. This time, both people taking Drug A are cured. Hooray! One person taking Drug B is cured, and one person is not cured. Hooray and bummer. Is Drug A better? Are both drugs the same? We can't answer either of those questions, because maybe something weird happened to this guy that caused Drug B to fail. Or maybe something weird happened to this guy: maybe the drug was mislabeled and he actually took Drug A, and that's why he was cured.
So now we test the drugs on a lot of different people, and these are the results. Drug A cured a whole lot of people, 1,043, compared to the number of people it didn't cure, 3. In other words, 99.7% of the 1,046 people using Drug A were cured. In contrast, Drug B only cured a few people, 2, compared to the number of people it didn't cure, 1,432. In other words, only 0.1% of the 1,434 people using Drug B were cured.

If these were the results, then it would be pretty obvious that Drug A was better than Drug B. In other words, it would seem unrealistic to suppose that these results were just random chance and that there is no real difference between Drug A and Drug B. It's possible that some of these people were cured by placebo, and some of these people were not cured because of some rare allergy, but there are just too many people cured by Drug A and too few cured by Drug B for us to seriously think that these results are just random and that Drug A is no better or worse than Drug B.
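To make "too extreme to be random chance" concrete, here's a small sketch that formalizes this comparison with a Fisher's exact test on the 2x2 table of cured vs. not-cured counts. The video doesn't show any code; the function `fisher_exact_two_sided` is my own minimal pure-Python implementation (scipy's `fisher_exact` does the same job), kept exact by working with integer binomial coefficients.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]:
    rows are drugs, columns are cured / not cured."""
    r1, r2 = a + b, c + d        # per-drug totals
    c1 = a + c                   # total number of people cured
    n = r1 + r2                  # everyone in the study
    den = comb(n, c1)            # shared denominator of every table's probability
    obs = comb(r1, a) * comb(r2, c1 - a)  # numerator for the observed table
    # Sum the probabilities of every table with the same margins that is
    # no more likely than the observed one ("at least as extreme").
    p_num = 0
    for x in range(max(0, c1 - r2), min(r1, c1) + 1):
        num = comb(r1, x) * comb(r2, c1 - x)
        if num <= obs:
            p_num += num
    return p_num / den

# Drug A: 1,043 cured, 3 not cured; Drug B: 2 cured, 1,432 not cured
print(f"Drug A cure rate: {1043 / 1046:.1%}")  # 99.7%
print(f"Drug B cure rate: {2 / 1434:.1%}")     # 0.1%
p = fisher_exact_two_sided(1043, 3, 2, 1432)
print(f"p-value: {p:.3g}")  # vanishingly small, far below 0.05
```

With results this lopsided, the p-value is essentially zero, matching the intuition that "just random chance" is not a serious explanation.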
In contrast, what if these were the results? Now only 37% of the people that took Drug A were cured, compared to 31% that took Drug B. So Drug A cured a larger percentage of people, but given that no study is perfect and there are always a few random things that happen, how confident can we be that Drug A is superior? That's where the p-value comes in.
[271]
p-values are numbers between 0 and 1
[274]
that in this example
[276]
quantify how confident we should be that
[279]
drug a is different from drug b
[282]
the closer a p-value is to zero
[285]
the more confidence we have that drug a
[288]
and drug b are different
[291]
so the question is how small does a
[294]
p-value have to be before we are
[296]
sufficiently confident that drug a is
[299]
different from drug b
[302]
in other words what threshold can we use
[304]
to make a good decision
In practice, a commonly used threshold is 0.05. It means that if there is no difference between Drug A and Drug B, and if we did this exact same experiment a bunch of times, then only 5% of those experiments would result in the wrong decision. Yes, this is an awkward sentence, so let's go through an example and work this out one step at a time.
[339]
imagine i gave the same drug drug a to
[342]
two different groups
[345]
now
[346]
any differences in the results are 100
[348]
percent attributable to weird random
[351]
things
[352]
like a rare allergy in one person or a
[355]
strong placebo effect in another
[358]
in this case the p-value would be 0.9
[362]
which is way larger than 0.05
[367]
thus we would say that we fail to see a
[370]
difference between the two groups
[373]
if we repeated this same experiment a
[375]
lot of times
[377]
most of the time we would get similarly
[379]
large p values
However, every once in a while, all of the people with rare allergies might end up in the group on the left, and all of the people with strong placebo reactions might end up in the group on the right. As a result, the p-value for this specific run of the experiment is 0.01, since the results are pretty different. Thus, in this case, we would say that the two groups are different, even though they both took the same drug.
Oh no, it's the dreaded Terminology Alert! Getting a small p-value when there is no difference is called a false positive. A 0.05 threshold for p-values means that 5% of the experiments, where the only differences come from weird, random things, will generate a p-value smaller than 0.05. In other words, if there's no difference between Drug A and Drug B, 5% of the time we do the experiment we will get a p-value less than 0.05, a.k.a. a false positive.
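This "5% of null experiments come out positive" claim is easy to check by simulation. The sketch below is my own illustration, not from the video: both "drugs" cure people with the same 50% probability (so the null hypothesis is true by construction), each simulated experiment is scored with a pooled two-proportion z-test (a standard textbook approximation; the helper `two_proportion_p_value` is hypothetical, written for this example), and we count how often p drops below 0.05 anyway.

```python
import math
import random

def two_proportion_p_value(cured_a, n_a, cured_b, n_b):
    """Approximate two-sided p-value for a difference in cure rates,
    using a pooled two-proportion z-test (normal approximation)."""
    pool = (cured_a + cured_b) / (n_a + n_b)
    se = math.sqrt(pool * (1 - pool) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # no variation at all: no evidence of a difference
    z = (cured_a / n_a - cured_b / n_b) / se
    # Convert |z| to a two-sided p-value via the standard normal CDF
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

random.seed(42)  # reproducible runs
n = 100          # people per group
cure_rate = 0.5  # identical for both groups: any difference is pure chance
runs = 2000
false_positives = 0
for _ in range(runs):
    cured_a = sum(random.random() < cure_rate for _ in range(n))
    cured_b = sum(random.random() < cure_rate for _ in range(n))
    if two_proportion_p_value(cured_a, n, cured_b, n) < 0.05:
        false_positives += 1

print(f"False positive rate: {false_positives / runs:.1%}")  # lands near 5%
```

Even though the two groups are drawn from exactly the same distribution, roughly 1 experiment in 20 produces p < 0.05, which is exactly what the 0.05 threshold promises.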
[457]
note if it is extremely important that
[460]
we are correct when we say the drugs are
[462]
different then we can use a smaller
[465]
threshold like 0.00001
[471]
using a threshold of 0.00001
[476]
means we would only get a false positive
[479]
once every 100 000 experiments
[483]
likewise if it's not that important for
[486]
example if we're trying to decide if the
[488]
ice cream truck will arrive on time
[491]
then we can use a larger threshold like
[493]
0.2
[496]
using a threshold of 0.2 means we are
[499]
willing to get a false positive two
[502]
times out of 10.
[504]
that said the most common threshold is
[507]
0.05
[509]
because trying to reduce the number of
[511]
false positives below 5
[514]
often costs more than it's worth
So, if we calculate a p-value for this experiment, and the p-value is less than 0.05, then we will decide that Drug A is different from Drug B. That said, the p-value is actually 0.24, so we are not confident that Drug A is different from Drug B. BAM!
Okay, before we're done, let me say two more things about p-values. Unfortunately, the first thing I want to say is just more terminology. In fancy statistical lingo, the idea of trying to determine if these drugs are the same or not is called hypothesis testing. The null hypothesis is that the drugs are the same, and the p-value helps us decide if we should reject the null hypothesis or not. Small BAM.
Okay, now that we have that fancy terminology out of the way, the second thing I want to say is way more interesting. While a small p-value helps us decide if Drug A is different from Drug B, it does not tell us how different they are. In other words, you can have a small p-value regardless of the size of the difference between Drug A and Drug B. The difference can be tiny or huge. For example, this experiment gives us a relatively large p-value, 0.24, even though there is a 6-point difference between Drug A and Drug B. In contrast, this experiment, which involves a lot more people, gives us a smaller p-value, 0.04, even though, given the new data, there is only a 1-point difference between Drug A and Drug B. In summary, a small p-value does not imply that the effect size, or difference between Drug A and Drug B, is large.
DOUBLE BAM!
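The "p-value is not effect size" point can be demonstrated numerically. The video doesn't give the group sizes behind its 0.24 and 0.04 p-values, so the sample sizes below are made up for illustration, and the helper `two_proportion_p_value` is my own pooled two-proportion z-test sketch, not the test the video used. The point is only that the p-value depends on both the size of the difference and the number of people tested.

```python
import math

def two_proportion_p_value(cured_a, n_a, cured_b, n_b):
    """Two-sided p-value from a pooled two-proportion z-test
    (normal approximation)."""
    pool = (cured_a + cured_b) / (n_a + n_b)
    se = math.sqrt(pool * (1 - pool) * (1 / n_a + 1 / n_b))
    z = (cured_a / n_a - cured_b / n_b) / se
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

# Small study, 6-point difference: 37% vs 31% cured, 100 people per group
p_small_study = two_proportion_p_value(37, 100, 31, 100)
print(f"6-point difference, n=100 per group:    p = {p_small_study:.2f}")

# Huge study, 1-point difference: 37% vs 36% cured, 50,000 people per group
p_huge_study = two_proportion_p_value(18500, 50000, 18000, 50000)
print(f"1-point difference, n=50,000 per group: p = {p_huge_study:.4f}")
```

The bigger gap fails to clear the 0.05 threshold while the tiny gap does, which is why a small p-value should never be read as "the drugs are very different", only as "the drugs are probably not the same".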
Hooray! We've made it to the end of another exciting StatQuest. If you liked this StatQuest and want to see more, please subscribe. And if you want to support StatQuest, consider contributing to my Patreon campaign, becoming a channel member, buying one or two of my original songs, or a t-shirt or a hoodie, or just donate. The links are in the description below. Alright, until next time: Quest on!