Null Hypothesis, p-Value, Statistical Significance, Type 1 Error and Type 2 Error - YouTube

Channel: unknown

[0]
Distinguished future physicians, welcome to Stomp on Step 1, the only free video series
[5]
that helps you study more efficiently by focusing on the highest yield material.
[10]
I’m Brian McDaniel and I will be your guide on this journey through Null Hypothesis, Alternative
[15]
Hypothesis, Type I and Type II Error, p-Value, alpha, beta, power & Statistical Significance.
[23]
This is the 11th video in my playlist covering all of biostatistics and Epidemiology for
[30]
the USMLE Step 1 Medical Board Exam.
[33]
There is a lot to cover, but we will try to move through things quickly and break them
[37]
down into bite sized pieces.
[40]
We will start with the Null Hypothesis which is represented by H subscript zero.
[46]
The null hypothesis states that there is no difference between the groups being studied.
[51]
In other words there is no relationship between the risk factor or treatment being studied
[56]
and occurrence of the health outcomes.
[61]
For example, if we are comparing a placebo group to a group receiving a new diabetes
[67]
medication then the null hypothesis states that the blood sugars or medical complications
[74]
would be roughly the same in each group.
[76]
We will talk about this more in a second, but by default you assume the null hypothesis
[82]
is correct until you have enough evidence to support rejecting this hypothesis.
[89]
If you are the researcher it is usually kind of a bummer when the null hypothesis is valid,
[94]
because it means you didn’t find a treatment that works or that the risk factor you are
[99]
studying isn’t as important as you were hoping.
[103]
The Alternative Hypothesis is denoted by H subscript a or H1.
[110]
As you might expect it is the opposite of the null hypothesis.
[115]
This hypothesis states that there is a difference between groups.
[120]
The research groups are different with regard to what is being studied.
[124]
In other words there is a relationship between the risk factor or treatment and occurrence
[130]
of the health outcome. Obviously, the researcher wants the alternative
[135]
hypothesis to be true.
[136]
If the Ha is true it means they discovered a treatment that improves patient outcomes
[143]
or identified a risk factor that is important in the development of a health outcome.
[149]
However, you never prove the alternative hypothesis is true.
[154]
You can only reject a hypothesis (say it is false) or fail to reject a hypothesis (could
[162]
be true but you can never be totally sure).
[166]
So a researcher really wants to reject the null hypothesis, because that is as close
[171]
as they can get to proving the alternative hypothesis is true.
[176]
In other words you can’t prove a given treatment caused a change in outcomes, but you can show
[182]
that this conclusion is valid by showing that the opposite hypothesis (the null hypothesis)
[189]
is highly improbable given your data.
[193]
Anytime you reject a hypothesis there is a chance you made a mistake.
[198]
This would mean you rejected a hypothesis that is true or failed to reject a hypothesis
[203]
that is false.
[205]
Type 1 Error is when you incorrectly reject the null hypothesis.
[211]
The researcher says there is a difference between the groups when there really isn’t.
[216]
It can be thought of as a false positive study result.
[221]
Usually we focus on the null hypothesis and type 1 error, because the researchers want
[226]
to show a difference between groups.
[228]
Any intentional or unintentional bias is more likely to exaggerate the differences
[235]
between groups based on this desire.
[239]
The probability of making a Type I Error is called alpha.
[243]
You can remember this by thinking that alpha is the first letter in the greek alphabet
[248]
so it goes with type 1 error.
[251]
I’m gonna hold off on talking about alpha and p-value for a few slides.
[256]
Type 2 Error is when you fail to reject the null when you should have rejected the null
[261]
hypothesis.
[263]
The researcher says there is no difference between the groups when there is a real difference.
[269]
It can be thought of as a false negative study result.
[273]
The probability of making a Type II Error is called beta.
[277]
You can remember this by thinking that beta (β) is the second letter in the greek alphabet.
[284]
Power is the probability of finding a difference between groups if one truly exists.
[289]
It is the percentage chance that you will be able to reject the null hypothesis if it
[297]
is really false.
[300]
Power can also be thought of as the probability of not making a type 2 error.
[306]
In equation form, Power equals 1 minus beta.
[311]
It is good for a study to have high power.
[315]
A cutoff for differentiating high from low power would be roughly 0.8 or 80%.
[325]
In other words, having a beta less than 20% for a given study is good.
[331]
Where power comes into play most often is while the study is being designed.
[336]
Before you even start the study you may do power calculations based on projections.
[341]
That way you can tweak the design of the study before you start it and potentially avoid
[347]
performing an entire study that has really low power since you are unlikely to learn
[353]
anything.
[355]
Power increases as you increase sample size, because you have more data from which to make
[360]
a conclusion.
[362]
Power also increases as the effect size or actual difference between the groups increases.
[368]
If you are trying to detect a huge difference between groups it is a lot easier than detecting
[374]
a very small difference between groups.
[377]
Increasing the precision (or decreasing standard deviation) of your results also increases
[383]
power.
[385]
If all of the results you have are very similar it is easier to come to a conclusion than
[390]
if your results are all over the place.
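These relationships (power = 1 − beta, and power rising with sample size, effect size, and precision) can be sketched with a normal-approximation power calculation. This is a minimal illustration, not the exact method any particular study would use, and the effect size, standard deviation, and sample sizes below are made-up numbers.

```python
from statistics import NormalDist

def two_sample_power(effect, sd, n_per_group, alpha=0.05):
    """Approximate power of a two-sided two-sample z-test.

    effect: true difference between the group means
    sd: common standard deviation within each group
    n_per_group: number of subjects in each group
    """
    nd = NormalDist()
    se = sd * (2.0 / n_per_group) ** 0.5   # standard error of the difference in means
    z_crit = nd.inv_cdf(1 - alpha / 2)     # about 1.96 for alpha = 0.05
    # Probability the observed z-statistic clears the critical value when H1 is true;
    # beta is then 1 minus this value.
    return nd.cdf(abs(effect) / se - z_crit)

# Bigger samples and bigger effects both raise power (made-up numbers):
# two_sample_power(effect=5, sd=15, n_per_group=100)  -> about 0.65
# two_sample_power(effect=5, sd=15, n_per_group=200)  -> about 0.91
```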
[394]
p-value is the probability of obtaining a result at least as extreme as the current
[399]
one, assuming that the null hypothesis is true.
[404]
Imagine we did a study comparing a placebo group to a group that received a new blood
[409]
pressure medication and the mean blood pressure in the treatment group was 20 mm Hg lower
[416]
than the placebo group.
[419]
Assuming the null hypothesis is correct the p-value is the probability that if we repeated
[425]
the study the observed difference between the group averages would be at least 20.
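The "at least as extreme" idea can be made concrete with a quick simulation: build a world where the null hypothesis is true (both groups drawn from the same distribution) and count how often a gap of 20 mm Hg or more appears by chance. The per-group size and standard deviation here are assumptions chosen for illustration; nothing in the study described fixes them.

```python
import random
from statistics import mean

random.seed(0)  # reproducible illustration

def simulated_p_value(observed_diff, n_per_group, sd, trials=20_000):
    """Monte Carlo two-sided p-value: the fraction of null-hypothesis
    replications whose group difference is at least as extreme as observed."""
    extreme = 0
    for _ in range(trials):
        # Under H0 both groups come from the same distribution
        a = [random.gauss(0, sd) for _ in range(n_per_group)]
        b = [random.gauss(0, sd) for _ in range(n_per_group)]
        if abs(mean(a) - mean(b)) >= observed_diff:
            extreme += 1
    return extreme / trials

# A 20 mm Hg gap with 10 patients per group and an (assumed) SD of 15
# almost never happens when the null hypothesis is true:
# simulated_p_value(20, n_per_group=10, sd=15)  -> roughly 0.003
```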
[431]
Now you have probably picked up on the fact that I keep adding the caveat that this definition
[437]
of the p-value only holds true if the null hypothesis is correct (AKA if there is no real difference
[444]
between the groups).
[445]
However, don’t let that throw you off.
[448]
You just assume this is the case in order to perform this test because we have to start
[453]
from somewhere.
[454]
It is not as if you have to prove the null hypothesis is true before you utilize the
[458]
p-value.
[460]
The p-value is a measurement to tell us how much the observed data disagrees with the
[465]
null hypothesis.
[468]
When the p-value is very small our data disagrees more with the null hypothesis
[474]
and we can begin to consider rejecting the null hypothesis (AKA saying there is a real
[480]
difference between the groups being studied).
[483]
In other words, when the p-value is very small our data suggests it is less likely that the
[489]
groups being studied are the same.
[491]
Therefore, when the p-value is very low our data is incompatible with the null hypothesis
[497]
and we will reject the null hypothesis.
[499]
When the p-value is high there is less disagreement between our data and the null hypothesis.
[507]
In other words, when the p-value is high it is more likely that the groups being studied
[512]
are the same.
[513]
In this scenario we will likely fail to reject the null hypothesis.
[519]
You may be wondering what determines whether a p-value is “low” or “high.”
[525]
That is where the selected “Level of Significance” or Alpha comes in.
[530]
As we have already discussed Alpha is the probability of making a Type I Error (or the
[536]
probability of incorrectly rejecting the null hypothesis).
[539]
It is a selected cut off point that determines whether we consider a p-value acceptably high
[546]
or low.
[547]
If our p-value is lower than alpha we conclude that there is a statistically significant
[553]
difference between groups.
[555]
When the p-value is higher than our significance level we conclude that the observed difference
[561]
between groups is not statistically significant.
[566]
Alpha is arbitrarily defined.
[569]
A 5% level of significance is most commonly used in medicine based only on the consensus
[575]
of researchers.
[577]
Using a 5% alpha implies that having a 5% probability of incorrectly rejecting the null
[584]
hypothesis is acceptable.
[587]
Therefore, other alphas such as 10% or 1% are used in certain situations.
[595]
So here is the key that you need to understand.
[597]
In most cases in medicine, if the p-value of a study is less than 5% then there is a
[606]
statistically significant difference between groups.
[609]
If the p-value is more than 5% then there is not a statistically significant difference
[616]
between groups.
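This decision rule is simple enough to write down directly. A minimal sketch (the wording of the returned strings is mine, not a standard):

```python
def significance_decision(p_value, alpha=0.05):
    """Compare a p-value to the chosen significance level alpha."""
    if p_value < alpha:
        return "reject H0: the difference is statistically significant"
    return "fail to reject H0: the difference is not statistically significant"

# significance_decision(0.03)  -> reject H0, since 0.03 < 0.05
# significance_decision(0.20)  -> fail to reject H0
```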
[618]
There are a couple caveats that complicate things a bit.
[621]
Both are related to how you can’t take statistics out of context to make conclusions.
[628]
Statistical significance is not the same thing as clinical significance.
[633]
Clinical Significance is the practical importance of the finding.
[638]
There may be statistically significant difference between 2 drugs, but the difference is so
[644]
small that using one over the other is not a big deal.
[649]
For example, you might show a new blood pressure medication is a statistically significant
[654]
improvement over an older drug, but if the new drug only lowers blood pressure on average
[660]
by 1 more mm Hg it won’t have a meaningful impact on the outcomes that are important
[667]
to patients.
[669]
It is also often incorrectly stated (by students, researchers, review books etc.) that “p-Value
[676]
can be used to determine that the observed difference between groups is due to chance
[681]
(or random sampling error).”
[684]
In other words, “if my p-Value is less than alpha then there is less than a 5% probability
[690]
that the null hypothesis is true.”
[693]
While this may be easier to understand and perhaps may even be enough of an understanding
[698]
to get test questions right, it is a misinterpretation of the p-value.
[703]
For a number of reasons p-Value is a tool that can only help us determine the observed
[709]
data’s level of agreement or disagreement with the null hypothesis and cannot necessarily
[716]
be used for a bigger picture discussion about whether our results were caused by random
[721]
error.
[722]
The p-Value alone cannot answer these larger questions.
[728]
In order to make larger conclusions about research results you need to also consider
[732]
additional factors such as the design of the study and the results of other studies on
[737]
similar topics.
[740]
It is possible for a study to have a p-value of less than 0.05, but also be poorly designed
[748]
and/or disagree with all of the available research on the topic.
[753]
Statistics cannot be viewed in a vacuum when attempting to make conclusions and the results
[759]
of a single study can only cast doubt on the null hypothesis if the assumptions made during
[765]
the design of the study are true.
[768]
A simple way to illustrate this is to remember that by definition the p-value is calculated
[774]
using the assumption that the null hypothesis is correct.
[778]
Therefore, there is no way that the p-Value can be used to prove that the alternative
[783]
hypothesis is true.
[785]
Another way to show the pitfalls of blindly applying the p-Value is to imagine a situation
[790]
where a researcher flips a coin 5 times and gets 5 heads in a row.
[796]
If you performed a one-tailed test you would get a p-value of 0.03.
[803]
Using the standard alpha of 0.05 this result would be deemed statistically significant and
[810]
we would reject the null hypothesis.
[814]
Based solely on this data, our conclusion would be that there is at least a 95% chance that on
[821]
subsequent flips of the coin heads will show up significantly more often than tails.
[827]
However, we know this conclusion is incorrect, because the study's sample size was too small
[834]
and there is plenty of external data to suggest that coins are fair (given enough flips of
[840]
the coin you will get heads about 50% of the time and tails about 50% of the time).
[846]
In actuality, the chance of the null hypothesis being true is not the 3% we calculated;
[853]
it is 100%.
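The 0.03 in this example is just the binomial probability of a result at least that extreme under a fair coin. A quick check:

```python
from math import comb

def one_tailed_heads_p(heads, flips):
    """P(at least `heads` heads in `flips` fair-coin flips) --
    the one-tailed p-value under H0: the coin is fair."""
    return sum(comb(flips, k) for k in range(heads, flips + 1)) / 2 ** flips

# Five heads out of five flips:
# one_tailed_heads_p(5, 5)  -> 0.03125, the "0.03" from the example
```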
[855]
Lastly we have Statistical hypothesis testing which is how we test the null hypothesis & determine
[861]
statistical significance.
[863]
For the USMLE Step 1 Medical Board Exam all you need to know is when to use the different
[870]
tests.
[871]
You don’t need to know how to actually perform them.
[875]
When you are comparing the mean or average of 2 groups you use the t-Test.
[880]
When you are comparing the mean of 3 or more groups you use an ANOVA test.
[886]
When you are using categorical variables instead of numerical variables you use a chi-squared
[892]
test.
[894]
With categorical variables, rather than a continuous numerical value that is
[899]
measurable, you have categories such as gender or the presence or absence of a disease.
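As one concrete look at the categorical case, a Pearson chi-squared statistic for a 2x2 table can be computed by hand; the counts below are invented for illustration. (For the exam you only need to pick the right test, not compute it.)

```python
def chi_squared_2x2(table):
    """Pearson chi-squared statistic for a 2x2 table of counts, e.g.
    [[exposed & diseased,   exposed & healthy],
     [unexposed & diseased, unexposed & healthy]]."""
    row_totals = [sum(r) for r in table]
    col_totals = [sum(c) for c in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / grand
            stat += (table[i][j] - expected) ** 2 / expected
    return stat

# With 1 degree of freedom, a statistic above 3.84 corresponds to p < 0.05:
# chi_squared_2x2([[30, 70], [10, 90]])  -> 12.5, so reject H0
```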
[907]
That brings us to the end of the video.
[909]
I’d like to give a big thanks to Brittany Hale & Dave Carlson for going to my website
[915]
StompOnStep1.com and making donations which helped to fund this video.
[919]
If you found this video useful please comment below as it really helps me out.
[925]
And if you would like to be taken directly to the next video in the series which will
[930]
cover confidence intervals you can click on this black box here if you are watching on
[936]
a computer.
[937]
That video will be very much related to this one so I definitely suggest checking it out.
[943]
Thank you so much for watching and good luck with the rest of your studying.