Hypothesis testing. Null vs alternative - YouTube
Channel: 365 Data Science
[0]
Hi, and welcome back.
[1]
This is the main section of this course.
[2]
It is based on the knowledge that you acquired
previously, so if you haven’t been through
[3]
the earlier lessons, you may have a hard time keeping up.
[4]
Make sure you have seen all the videos about
confidence intervals, distributions, z-tables
[5]
and t-tables, and have done all the exercises.
[6]
If you’ve completed them already, you are
good to go.
[7]
Confidence intervals provide us with an estimate
of where the population parameters are located.
[8]
However, when you are making a decision, you
need a yes/no answer.
[9]
The correct approach in this case is to use
a test.
[10]
In this section, we will learn how to perform
one of the fundamental tasks in statistics
[11]
- hypothesis testing!
[12]
Okay.
[13]
There are four steps in data-driven decision-making.
[14]
First, you must formulate a hypothesis.
[15]
Second, once you have formulated a hypothesis,
you will have to find the right test for your
[16]
hypothesis.
[17]
Third, you execute the test.
[18]
And fourth, you make a decision based on the
result.
[21]
Let’s start from the beginning.
[24]
What is a hypothesis?
[26]
Though there are many ways to define it, the
most intuitive I’ve seen is:
[30]
“A hypothesis is an idea that can be tested.”
[34]
This is not the formal definition, but it
explains the point very well.
[38]
So, if I tell you that apples in New York
are expensive, this is an idea, or a statement,
[44]
but it is not testable until I have something
to compare it with.
[48]
For instance, if I define expensive as: any
price higher than $1.75 per pound,
[55]
then it immediately becomes a hypothesis.
[57]
Alright, what’s something that cannot be
a hypothesis?
[62]
An example may be: would the USA do better
or worse under a Clinton administration, compared
[68]
to a Trump administration?
[71]
Statistically speaking, this is an idea, but
there is no data to test it, so it
[75]
cannot be a hypothesis of a statistical test.
[79]
Actually, it is more likely to be a topic
of another discipline.
[83]
Conversely, in statistics, we may compare
different US presidencies that have already
[88]
been completed, such as the Obama administration
and the Bush administration, as we have data
[93]
on both.
[94]
Alright, let’s get out of politics and get
into hypotheses.
[99]
Here’s a simple topic that can be tested.
[103]
According to Glassdoor (the popular salary
information website), the mean data scientist
[108]
salary in the US is 113,000 dollars.
[111]
So, we want to test if their estimate is correct.
[116]
There are two hypotheses that are made: the
null hypothesis, denoted H zero, and the alternative
[122]
hypothesis, denoted H one or H A.
[126]
The null hypothesis is the one to be tested
and the alternative is everything else.
[131]
In our example,
the null hypothesis would be: the mean data
[136]
scientist salary is 113,000 dollars,
while the alternative would be: the mean data scientist
[142]
salary is not 113,000 dollars.
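In symbols, the two hypotheses might be written as follows. The symbol \mu for the population mean salary is an assumption here, since the video has only named the hypotheses in words so far.

```latex
\[
H_0 : \mu = 113{,}000
\qquad \text{versus} \qquad
H_1 : \mu \neq 113{,}000
\]
```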
[145]
Now, you would want to check if 113,000 is
close enough to the true mean, as estimated from
[151]
our sample.
[153]
If it is, you would accept the null hypothesis or, more precisely, fail to reject it.
[156]
Otherwise, you would reject the null hypothesis.
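As a rough sketch of what "close enough" could mean in practice, the Python snippet below runs a two-sided one-sample t-test on a small sample. The salary figures are made up, and the 5% significance level and the choice of a t-test are assumptions for illustration; the video has not yet said which test to use. The critical value 2.262 is the standard t-table entry for 9 degrees of freedom at a two-tailed 5% level.

```python
import math
import statistics

# Hypothetical sample of 10 data scientist salaries in dollars (invented for illustration).
sample = [117_000, 90_000, 105_000, 128_000, 99_000,
          121_000, 110_000, 134_000, 102_000, 115_000]

HYPOTHESIZED_MEAN = 113_000  # the value stated in the null hypothesis
T_CRITICAL = 2.262           # t-table value: two-tailed, alpha = 0.05, df = 9 (assumes n = 10)


def t_statistic(data, mu0):
    """Return the one-sample t statistic for the null hypothesis 'mean == mu0'."""
    n = len(data)
    sample_mean = statistics.mean(data)
    # statistics.stdev uses the n - 1 denominator (sample standard deviation).
    standard_error = statistics.stdev(data) / math.sqrt(n)
    return (sample_mean - mu0) / standard_error


t_score = t_statistic(sample, HYPOTHESIZED_MEAN)

# Two-sided test: a large deviation in either direction counts against the null.
reject_null = abs(t_score) > T_CRITICAL

print(f"t = {t_score:.3f}, reject null hypothesis: {reject_null}")
```

With this made-up sample, the t statistic falls well inside the critical range, so the null hypothesis is not rejected.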
[160]
The concept of the null hypothesis is similar
to: innocent until proven guilty.
[166]
We assume that the mean salary is 113,000
dollars and we try to prove otherwise.
[172]
Alright.
[174]
This was an example of a two-sided or a two-tailed
test.
[178]
You can also form one-sided or one-tailed
tests.
[181]
Say your friend, Paul, told you that he thinks
data scientists earn more than 125,000 dollars
[186]
per year.
[188]
You doubt him, so you design a test to see
who’s right.
[192]
The null hypothesis of this test would be:
The mean data scientist salary is at least
[197]
125,000 dollars.
[200]
The alternative will cover everything else,
thus: The mean data scientist salary is less
[205]
than 125,000 dollars.
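A one-sided test like this one is evaluated at the boundary value of 125,000 dollars, and the null hypothesis is rejected only when the sample mean falls far enough below that boundary. The sketch below illustrates the idea with a left-tailed one-sample t-test. The salaries are made up, and the 5% significance level and the use of a t-test are assumptions for illustration. The critical value 1.833 is the standard t-table entry for 9 degrees of freedom at a one-tailed 5% level.

```python
import math
import statistics

# Hypothetical sample of 10 data scientist salaries in dollars (invented for illustration).
sample = [117_000, 90_000, 105_000, 128_000, 99_000,
          121_000, 110_000, 134_000, 102_000, 115_000]

BOUNDARY = 125_000             # Paul's figure, the boundary between the two hypotheses
T_CRITICAL_ONE_TAILED = 1.833  # t-table value: one-tailed, alpha = 0.05, df = 9 (assumes n = 10)

n = len(sample)
sample_mean = statistics.mean(sample)
# statistics.stdev uses the n - 1 denominator (sample standard deviation).
standard_error = statistics.stdev(sample) / math.sqrt(n)
t_score = (sample_mean - BOUNDARY) / standard_error

# Left-tailed test: only a sample mean far enough BELOW the boundary
# counts as evidence against Paul's claim.
reject_null = t_score < -T_CRITICAL_ONE_TAILED

print(f"t = {t_score:.3f}, reject null hypothesis: {reject_null}")
```

With this made-up sample, the t statistic lies below the negative critical value, so the null hypothesis representing Paul's claim is rejected. Note that a sample mean far above 125,000 dollars would never lead to rejection here, which is what distinguishes a one-tailed test from a two-tailed one.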
[207]
It is important to note that outcomes of tests
refer to the population parameter rather than
[214]
the sample statistic!
[216]
As such, the result that we get is for the
population.
[222]
Another crucial consideration is that, generally,
the researcher is trying to reject the null
[226]
hypothesis.
[228]
Think about the null hypothesis as the status
quo and the alternative as the change or innovation
[234]
that challenges that status quo.
[236]
In our example, Paul was representing the
status quo, which we were challenging.
[240]
Alright.
[241]
That’s all for now.
[242]
In the next lectures, we will see some examples
and learn how to make data-driven decisions.