Null Hypothesis, p-Value, Statistical Significance, Type 1 Error and Type 2 Error - YouTube

Channel: unknown

[0]
Distinguished future physicians, welcome to Stomp on Step 1, the only free video series
[5]
that helps you study more efficiently by focusing on the highest yield material.
[10]
I’m Brian McDaniel and I will be your guide on this journey through Null Hypothesis, Alternative
[15]
Hypothesis, Type I and Type II Error, p-Value, alpha, beta, power & Statistical Significance.
[23]
This is the 11th video in my playlist covering all of biostatistics and Epidemiology for
[30]
the USMLE Step 1 Medical Board Exam.
[33]
There is a lot to cover, but we will try to move through things quickly and break them
[37]
down into bite sized pieces.
[40]
We will start with the Null Hypothesis which is represented by H subscript zero.
[46]
The null hypothesis states that there is no difference between the groups being studied.
[51]
In other words there is no relationship between the risk factor or treatment being studied
[56]
and occurrence of the health outcomes.
[61]
For example, if we are comparing a placebo group to a group receiving a new diabetes
[67]
medication then the null hypothesis states that the blood sugars or medical complications
[74]
would be roughly the same in each group.
[76]
We will talk about this more in a second, but by default you assume the null hypothesis
[82]
is correct until you have enough evidence to support rejecting this hypothesis.
[89]
If you are the researcher it is usually kind of a bummer when the null hypothesis is valid,
[94]
because it means you didn’t find a treatment that works or that the risk factor you are
[99]
studying isn’t as important as you were hoping.
[103]
The Alternative Hypothesis is denoted by H subscript a or H1.
[110]
As you might expect it is the opposite of the null hypothesis.
[115]
This hypothesis states that there is a difference between groups.
[120]
The research groups are different with regard to what is being studied.
[124]
In other words there is a relationship between the risk factor or treatment and occurrence
[130]
of the health outcome. Obviously, the researcher wants the alternative
[135]
hypothesis to be true.
[136]
If the Ha is true it means they discovered a treatment that improves patient outcomes
[143]
or identified a risk factor that is important in the development of a health outcome.
[149]
However, you never prove the alternative hypothesis is true.
[154]
You can only reject a hypothesis (say it is false) or fail to reject a hypothesis (could
[162]
be true but you can never be totally sure).
[166]
So a researcher really wants to reject the null hypothesis, because that is as close
[171]
as they can get to proving the alternative hypothesis is true.
[176]
In other words you can’t prove a given treatment caused a change in outcomes, but you can show
[182]
that this conclusion is valid by showing that the opposite hypothesis (the null hypothesis)
[189]
is highly improbable given your data.
[193]
Anytime you reject a hypothesis there is a chance you made a mistake.
[198]
This would mean you rejected a hypothesis that is true or failed to reject a hypothesis
[203]
that is false.
[205]
Type 1 Error is when you incorrectly reject the null hypothesis.
[211]
The researcher says there is a difference between the groups when there really isn’t.
[216]
It can be thought of as a false positive study result.
[221]
Usually we focus on the null hypothesis and type 1 error, because the researchers want
[226]
to show a difference between groups.
[228]
Any intentional or unintentional bias is more likely to exaggerate the differences
[235]
between groups based on this desire.
[239]
The probability of making a Type I Error is called alpha.
[243]
You can remember this by thinking that alpha is the first letter in the greek alphabet
[248]
so it goes with type 1 error.
[251]
I’m gonna hold off on talking about alpha and p-value for a few slides.
[256]
Type 2 Error is when you fail to reject the null when you should have rejected the null
[261]
hypothesis.
[263]
The researcher says there is no difference between the groups when there is a real difference.
[269]
It can be thought of as a false negative study result.
[273]
The probability of making a Type II Error is called beta.
[277]
You can remember this by thinking that beta (β) is the second letter in the greek alphabet.
[284]
Power is the probability of finding a difference between groups if one truly exists.
[289]
It is the percentage chance that you will be able to reject the null hypothesis if it
[297]
is really false.
[300]
Power can also be thought of as the probability of not making a type 2 error.
[306]
In equation form, Power equals 1 minus beta.
[311]
It is good for a study to have high power.
[315]
A cutoff for differentiating high from low power would be roughly 0.8 or 80%.
[325]
In other words, having a beta less than 20% for a given study is good.
[331]
Where power comes into play most often is while the study is being designed.
[336]
Before you even start the study you may do power calculations based on projections.
[341]
That way you can tweak the design of the study before you start it and potentially avoid
[347]
performing an entire study that has really low power since you are unlikely to learn
[353]
anything.
[355]
Power increases as you increase sample size, because you have more data from which to make
[360]
a conclusion.
[362]
Power also increases as the effect size or actual difference between the groups increases.
[368]
If you are trying to detect a huge difference between groups it is a lot easier than detecting
[374]
a very small difference between groups.
[377]
Increasing the precision (or decreasing standard deviation) of your results also increases
[383]
power.
[385]
If all of the results you have are very similar it is easier to come to a conclusion than
[390]
if your results are all over the place.
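These relationships (power = 1 − beta, and power rising with sample size, effect size, and precision) can be sketched with a normal-approximation power calculation. This is a minimal illustration, not the exact method any particular study would use, and the effect size, standard deviation, and sample sizes below are made-up numbers.

```python
from statistics import NormalDist

def two_sample_power(effect, sd, n_per_group, alpha=0.05):
    """Approximate power of a two-sided two-sample z-test.

    effect: true difference between the group means
    sd: common standard deviation within each group
    n_per_group: number of subjects in each group
    """
    nd = NormalDist()
    se = sd * (2.0 / n_per_group) ** 0.5   # standard error of the difference in means
    z_crit = nd.inv_cdf(1 - alpha / 2)     # about 1.96 for alpha = 0.05
    # Probability the observed z-statistic clears the critical value when H1 is true;
    # beta is then 1 minus this value.
    return nd.cdf(abs(effect) / se - z_crit)

# Bigger samples and bigger effects both raise power (made-up numbers):
# two_sample_power(effect=5, sd=15, n_per_group=100)  -> about 0.65
# two_sample_power(effect=5, sd=15, n_per_group=200)  -> about 0.91
```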
[394]
p-value is the probability of obtaining a result at least as extreme as the current
[399]
one, assuming that the null hypothesis is true.
[404]
Imagine we did a study comparing a placebo group to a group that received a new blood
[409]
pressure medication and the mean blood pressure in the treatment group was 20 mm Hg lower
[416]
than the placebo group.
[419]
Assuming the null hypothesis is correct the p-value is the probability that if we repeated
[425]
the study the observed difference between the group averages would be at least 20.
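The "at least as extreme" idea can be made concrete with a quick simulation: build a world where the null hypothesis is true (both groups drawn from the same distribution) and count how often a gap of 20 mm Hg or more appears by chance. The per-group size and standard deviation here are assumptions chosen for illustration; nothing in the study described fixes them.

```python
import random
from statistics import mean

random.seed(0)  # reproducible illustration

def simulated_p_value(observed_diff, n_per_group, sd, trials=20_000):
    """Monte Carlo two-sided p-value: the fraction of null-hypothesis
    replications whose group difference is at least as extreme as observed."""
    extreme = 0
    for _ in range(trials):
        # Under H0 both groups come from the same distribution
        a = [random.gauss(0, sd) for _ in range(n_per_group)]
        b = [random.gauss(0, sd) for _ in range(n_per_group)]
        if abs(mean(a) - mean(b)) >= observed_diff:
            extreme += 1
    return extreme / trials

# A 20 mm Hg gap with 10 patients per group and an (assumed) SD of 15
# almost never happens when the null hypothesis is true:
# simulated_p_value(20, n_per_group=10, sd=15)  -> roughly 0.003
```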
[431]
Now you have probably picked up on the fact that I keep adding the caveat that this definition
[437]
of the p-value only holds true if the null hypothesis is correct (AKA if there is no real difference
[444]
between the groups).
[445]
However, don’t let that throw you off.
[448]
You just assume this is the case in order to perform this test because we have to start
[453]
from somewhere.
[454]
It is not as if you have to prove the null hypothesis is true before you utilize the
[458]
p-value.
[460]
The p-value is a measurement to tell us how much the observed data disagrees with the
[465]
null hypothesis.
[468]
When the p-value is very small our data disagrees more with the null hypothesis
[474]
and we can begin to consider rejecting the null hypothesis (AKA saying there is a real
[480]
difference between the groups being studied).
[483]
In other words, when the p-value is very small our data suggests it is less likely that the
[489]
groups being studied are the same.
[491]
Therefore, when the p-value is very low our data is incompatible with the null hypothesis
[497]
and we will reject the null hypothesis.
[499]
When the p-value is high there is less disagreement between our data and the null hypothesis.
[507]
In other words, when the p-value is high it is more likely that the groups being studied
[512]
are the same.
[513]
In this scenario we will likely fail to reject the null hypothesis.
[519]
You may be wondering what determines whether a p-value is “low” or “high.”
[525]
That is where the selected “Level of Significance” or Alpha comes in.
[530]
As we have already discussed Alpha is the probability of making a Type I Error (or the
[536]
probability of incorrectly rejecting the null hypothesis).
[539]
It is a selected cut off point that determines whether we consider a p-value acceptably high
[546]
or low.
[547]
If our p-value is lower than alpha we conclude that there is a statistically significant
[553]
difference between groups.
[555]
When the p-value is higher than our significance level we conclude that the observed difference
[561]
between groups is not statistically significant.
[566]
Alpha is arbitrarily defined.
[569]
A 5% level of significance is most commonly used in medicine based only on the consensus
[575]
of researchers.
[577]
Using a 5% alpha implies that having a 5% probability of incorrectly rejecting the null
[584]
hypothesis is acceptable.
[587]
Therefore, other alphas such as 10% or 1% are used in certain situations.
[595]
So here is the key that you need to understand.
[597]
In most cases in medicine, if the p-value of a study is less than 5% then there is a
[606]
statistically significant difference between groups.
[609]
If the p-value is more than 5% then there is not a statistically significant difference
[616]
between groups.
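This decision rule is simple enough to write down directly. A minimal sketch (the wording of the returned strings is mine, not a standard):

```python
def significance_decision(p_value, alpha=0.05):
    """Compare a p-value to the chosen significance level alpha."""
    if p_value < alpha:
        return "reject H0: the difference is statistically significant"
    return "fail to reject H0: the difference is not statistically significant"

# significance_decision(0.03)  -> reject H0, since 0.03 < 0.05
# significance_decision(0.20)  -> fail to reject H0
```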
[618]
There are a couple caveats that complicate things a bit.
[621]
Both are related to how you can’t take statistics out of context to make conclusions.
[628]
Statistical significance is not the same thing as clinical significance.
[633]
Clinical Significance is the practical importance of the finding.
[638]
There may be statistically significant difference between 2 drugs, but the difference is so
[644]
small that using one over the other is not a big deal.
[649]
For example, you might show a new blood pressure medication is a statistically significant
[654]
improvement over an older drug, but if the new drug only lowers blood pressure on average
[660]
by 1 more mm Hg it won’t have a meaningful impact on the outcomes that are important
[667]
to patients.
[669]
It is also often incorrectly stated (by students, researchers, review books etc.) that “p-Value
[676]
can be used to determine that the observed difference between groups is due to chance
[681]
(or random sampling error).”
[684]
In other words, “if my p-Value is less than alpha then there is less than a 5% probability
[690]
that the null hypothesis is true.”
[693]
While this may be easier to understand and perhaps may even be enough of an understanding
[698]
to get test questions right, it is a misinterpretation of the p-value.
[703]
For a number of reasons p-Value is a tool that can only help us determine the observed
[709]
data’s level of agreement or disagreement with the null hypothesis and cannot necessarily
[716]
be used for a bigger picture discussion about whether our results were caused by random
[721]
error.
[722]
The p-Value alone cannot answer these larger questions.
[728]
In order to make larger conclusions about research results you need to also consider
[732]
additional factors such as the design of the study and the results of other studies on
[737]
similar topics.
[740]
It is possible for a study to have a p-value of less than 0.05, but also be poorly designed
[748]
and/or disagree with all of the available research on the topic.
[753]
Statistics cannot be viewed in a vacuum when attempting to make conclusions and the results
[759]
of a single study can only cast doubt on the null hypothesis if the assumptions made during
[765]
the design of the study are true.
[768]
A simple way to illustrate this is to remember that by definition the p-value is calculated
[774]
using the assumption that the null hypothesis is correct.
[778]
Therefore, there is no way that the p-Value can be used to prove that the alternative
[783]
hypothesis is true.
[785]
Another way to show the pitfalls of blindly applying the p-Value is to imagine a situation
[790]
where a researcher flips a coin 5 times and gets 5 heads in a row.
[796]
If you performed a one-tailed test you would get a p-value of 0.03.
[803]
Using the standard alpha of 0.05 this result would be deemed statistically significant and
[810]
we would reject the null hypothesis.
[814]
Based solely on this data, our conclusion would be that there is at least a 95% chance that on
[821]
subsequent flips of the coin heads will show up significantly more often than tails.
[827]
However, we know this conclusion is incorrect, because the study's sample size was too small
[834]
and there is plenty of external data to suggest that coins are fair (given enough flips of
[840]
the coin you will get heads about 50% of the time and tails about 50% of the time).
[846]
In actuality, the chance of the null hypothesis being true is not the 3% we calculated;
[853]
it is 100%.
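The 0.03 in this example is just the binomial probability of a result at least that extreme under a fair coin. A quick check:

```python
from math import comb

def one_tailed_heads_p(heads, flips):
    """P(at least `heads` heads in `flips` fair-coin flips) --
    the one-tailed p-value under H0: the coin is fair."""
    return sum(comb(flips, k) for k in range(heads, flips + 1)) / 2 ** flips

# Five heads out of five flips:
# one_tailed_heads_p(5, 5)  -> 0.03125, the "0.03" from the example
```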
[855]
Lastly we have Statistical hypothesis testing which is how we test the null hypothesis & determine
[861]
statistical significance.
[863]
For the USMLE Step 1 Medical Board Exam all you need to know is when to use the different
[870]
tests.
[871]
You don’t need to know how to actually perform them.
[875]
When you are comparing the mean or average of 2 groups you use the t-Test.
[880]
When you are comparing the mean of 3 or more groups you use an ANOVA test.
[886]
When you are using categorical variables instead of numerical variables you use a chi-squared
[892]
test.
[894]
With categorical variables, rather than a continuous numerical value that is
[899]
measurable, you have categories such as gender or the presence or absence of a disease.
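As one concrete look at the categorical case, a Pearson chi-squared statistic for a 2x2 table can be computed by hand; the counts below are invented for illustration. (For the exam you only need to pick the right test, not compute it.)

```python
def chi_squared_2x2(table):
    """Pearson chi-squared statistic for a 2x2 table of counts, e.g.
    [[exposed & diseased,   exposed & healthy],
     [unexposed & diseased, unexposed & healthy]]."""
    row_totals = [sum(r) for r in table]
    col_totals = [sum(c) for c in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / grand
            stat += (table[i][j] - expected) ** 2 / expected
    return stat

# With 1 degree of freedom, a statistic above 3.84 corresponds to p < 0.05:
# chi_squared_2x2([[30, 70], [10, 90]])  -> 12.5, so reject H0
```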
[907]
That brings us to the end of the video.
[909]
I’d like to give a big thanks to Brittany Hale & Dave Carlson for going to my website
[915]
StompOnStep1.com and making donations which helped to fund this video.
[919]
If you found this video useful please comment below as it really helps me out.
[925]
And if you would like to be taken directly to the next video in the series which will
[930]
cover confidence intervals you can click on this black box here if you are watching on
[936]
a computer.
[937]
That video will be very much related to this one so I definitely suggest checking it out.
[943]
Thank you so much for watching and good luck with the rest of your studying.