
Hypothesis testing. Null vs alternative

Channel: 365 Data Science

[0]

Hi, and welcome back.

[1]

This is the main section of this course.

[2]

It is based on the knowledge that you acquired previously, so if you haven’t been through

[3]

it, you may have a hard time keeping up.

[4]

Make sure you have seen all the videos about confidence intervals, distributions, z-tables

[5]

and t-tables, and have done all the exercises.

[6]

If you’ve completed them already, you are good to go.

[7]

Confidence intervals give us an estimate of where a population parameter lies.

[8]

However, when you are making a decision, you need a yes/no answer.

[9]

The correct approach in this case is to use a test.

[10]

In this section, we will learn how to perform one of the fundamental tasks in statistics

[11]

- hypothesis testing!

[12]

Okay.

[13]

There are four steps in data-driven decision-making.

[14]

First, you must formulate a hypothesis.

[15]

Second, once you have formulated a hypothesis, you will have to find the right test for your

[16]

hypothesis.

[17]

Third, you execute the test.

[18]

And fourth, you make a decision based on the result.

[21]

Let’s start from the beginning.

[24]

What is a hypothesis?

[26]

Though there are many ways to define it, the most intuitive I’ve seen is:

[30]

“A hypothesis is an idea that can be tested.”

[34]

This is not the formal definition, but it explains the point very well.

[38]

So, if I tell you that apples in New York are expensive, this is an idea, or a statement,

[44]

but it is not testable until I have something to compare it with.

[48]

For instance, if I define expensive as any price higher than 1.75 dollars per pound,

[55]

then it immediately becomes a hypothesis.

[57]

Alright, what’s something that cannot be a hypothesis?

[62]

An example may be: would the USA do better or worse under a Clinton administration, compared

[68]

to a Trump administration?

[71]

Statistically speaking, this is an idea, but there is no data to test it, so it

[75]

cannot be a hypothesis of a statistical test.

[79]

Actually, it is more likely to be a topic of another discipline.

[83]

Conversely, in statistics, we may compare different US presidencies that have already

[88]

been completed, such as the Obama administration and the Bush administration, as we have data

[93]

on both.

[94]

Alright, let’s get out of politics and get into hypotheses.

[99]

Here’s a simple topic that can be tested.

[103]

According to Glassdoor (the popular salary information website), the mean data scientist

[108]

salary in the US is 113,000 dollars.

[111]

So, we want to test if their estimate is correct.

[116]

There are two hypotheses that are made: the null hypothesis, denoted H zero, and the alternative

[122]

hypothesis, denoted H one or H A.

[126]

The null hypothesis is the one to be tested and the alternative is everything else.

[131]

In our example, the null hypothesis would be: The mean data

[136]

scientist salary is 113,000 dollars, while the alternative would be: The mean data scientist

[142]

salary is not 113,000 dollars.
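
In symbols, the two hypotheses for this example can be written as follows. The symbol \mu for the population mean salary is our own notation here and is not spoken in the video.

```latex
H_0 : \mu = 113{,}000
\qquad \text{versus} \qquad
H_1 : \mu \neq 113{,}000
```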

[145]

Now, you would want to check if 113,000 is close enough to the true mean, as estimated from

[151]

our sample.

[153]

If it is, you would accept the null hypothesis (strictly speaking, you would fail to reject it).

[156]

Otherwise, you would reject the null hypothesis.
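
To make the decision rule concrete, here is a minimal Python sketch of a two-sided z-test for the Glassdoor example. It assumes the population standard deviation is known. The value 15,000, the 5% significance level, and the salary figures are invented for illustration and do not come from the video. The function name `two_sided_z_test` is our own.

```python
from math import sqrt
from statistics import NormalDist, mean


def two_sided_z_test(sample, mu0, sigma, alpha=0.05):
    """Test H0: mean == mu0 against H1: mean != mu0.

    Assumes the population standard deviation `sigma` is known.
    Returns the z-score, the two-sided p-value, and whether H0 is rejected.
    """
    standard_error = sigma / sqrt(len(sample))
    z = (mean(sample) - mu0) / standard_error
    # Probability of a z-score at least this extreme in either tail, if H0 holds.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value, p_value < alpha


# Hypothetical sample of salaries (invented for illustration only).
salaries = [117_000, 104_000, 113_000, 102_000, 85_000,
            113_500, 81_000, 100_500, 105_000, 87_500]

z, p_value, reject = two_sided_z_test(salaries, mu0=113_000, sigma=15_000)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
print("Reject H0" if reject else "Fail to reject H0")
```

Because the test is two-sided, a sample mean far above or far below 113,000 counts as evidence against the null hypothesis.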

[160]

The concept of the null hypothesis is similar to: innocent until proven guilty.

[166]

We assume that the mean salary is 113,000 dollars and we try to prove otherwise.

[172]

Alright.

[174]

This was an example of a two-sided or a two-tailed test.

[178]

You can also form one-sided or one-tailed tests.

[181]

Say your friend, Paul, told you that he thinks data scientists earn more than 125,000 dollars

[186]

per year.

[188]

You doubt him, so you design a test to see who’s right.

[192]

The null hypothesis of this test would be: The mean data scientist salary is at least

[197]

125,000 dollars.

[200]

The alternative will cover everything else, thus: The mean data scientist salary is less

[205]

than 125,000 dollars.
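
A one-tailed test can be sketched in the same way. The test statistic is computed at the boundary value, 125,000, and only results below that boundary count as evidence against Paul's claim. As before, this assumes a known population standard deviation. The summary statistics and the function name `lower_tailed_z_test` are hypothetical and do not come from the video.

```python
from math import sqrt
from statistics import NormalDist


def lower_tailed_z_test(sample_mean, n, mu0, sigma, alpha=0.05):
    """One-tailed z-test where only values below mu0 count against H0.

    Assumes the population standard deviation `sigma` is known.
    Returns the z-score, the one-sided p-value, and whether H0 is rejected.
    """
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    # Only the lower tail matters: probability of a z-score this low or lower.
    p_value = NormalDist().cdf(z)
    return z, p_value, p_value < alpha


# Hypothetical summary statistics (invented for illustration only).
z, p_value, reject = lower_tailed_z_test(
    sample_mean=119_000, n=40, mu0=125_000, sigma=15_000
)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
print("Reject H0" if reject else "Fail to reject H0")
```

Unlike the two-sided version, a sample mean well above 125,000 never leads to rejection here, because it is consistent with Paul's claim.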

[207]

It is important to note that outcomes of tests refer to the population parameter rather than

[214]

the sample statistic!

[216]

As such, the result that we get is for the population.
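
The distinction between the population parameter and the sample statistic can be illustrated with a short simulation. This is only a sketch: the population mean of 113,000, the standard deviation of 15,000, and the normal shape of the population are assumptions made for illustration, and `sample_means` is a hypothetical helper.

```python
import random
from statistics import mean


def sample_means(population_mean, population_sd, sample_size, n_samples, seed=0):
    """Draw several samples from one population and return each sample's mean."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    means = []
    for _ in range(n_samples):
        sample = [rng.gauss(population_mean, population_sd)
                  for _ in range(sample_size)]
        means.append(mean(sample))
    return means


# The population mean is fixed; the sample mean changes from sample to sample.
for m in sample_means(113_000, 15_000, sample_size=30, n_samples=5):
    print(f"sample mean: {m:,.0f}")
```

Each sample gives a different sample mean, while the hypothesis is a statement about the single fixed population mean behind all of them.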

[222]

Another crucial consideration is that, generally, the researcher is trying to reject the null

[226]

hypothesis.

[228]

Think about the null hypothesis as the status quo and the alternative as the change or innovation

[234]

that challenges that status quo.

[236]

In our example, Paul was representing the status quo, which we were challenging.

[240]

Alright.

[241]

That’s all for now.

[242]

In the next lectures, we will see some examples and learn how to make data-driven decisions.
