ANOVA: Crash Course Statistics #33 - YouTube

Channel: CrashCourse

Hi, I’m Adriene Hill, and welcome back to Crash Course Statistics.

In many of our episodes we’ve looked at t-tests, which, among other things, are good for testing the difference between two groups. Like people with or without cats. Families below the poverty line... and families above it. Petri dishes of cells that are treated with a chemical and those that aren’t.

But the world isn’t always so binary. We often want to compare measurements of MORE than two groups. Things like ethnicity, medical diagnosis, country of origin, or job title.

So today, we’re going to apply the General Linear Model framework we learned in the last episode to test the difference between multiple groups using a new model called the ANOVA.
INTRO
The GLM framework takes all the information that our data contain and partitions it into two piles: information that can be explained by a model that represents the way we think things work, and error, which is the amount of information that our model fails to explain.

So let’s apply that to a new model: the ANOVA. ANOVA is an acronym for ANalysis Of VAriance. It’s actually very similar to regression, except we’re using a categorical variable to predict a continuous one.
Like using a soccer player’s position to predict the number of yards he runs in a game. Or using highest completed degree to predict a person’s salary. Note that this alone isn’t evidence that getting a degree causes a higher salary, just that knowing someone’s degree might help estimate how much they get paid.
Like regression, the ANOVA builds a model of how the world works. For example, my model for how many bunnies I’ll see on my walk into work might be that if it’s raining I’ll see 1 bunny, and if it’s sunny, I’ll see 5. I walk through a bunny preserve...

1 and 5 are my predictions for how many bunnies I’ll see, based on whether or not it’s raining. Yesterday it rained. And I saw two bunnies! My model predicted 1, and my error is 1.
And we can represent this model as a sort of regression where there are ONLY two possible values that the variable Weather can have: 0 if it rains, or 1 if it doesn’t. In this case, the expected number of bunnies on a rainy day is 1, and beta is the difference between the two means: 5 - 1 = 4. Which means our ANOVA model looks like this:
In a regression we did a statistical test of the slope, and that’s what this simple ANOVA is doing too. Since we assigned rainy days to be coded as 0, and sunny days as 1, the change in the X-direction is just one (1 - 0). So the slope of this line is the mean bunny count on sunny days, five, minus the mean bunny count on rainy days, one. This difference of 4 is the change in the Y-direction. We test this difference in the same way that we tested the regression slope.

Usually we like to think of this slope as the difference between the two group means. But knowing that our model treats it like a slope helps us understand how ANOVAs relate to regression.
In a regression, the slope tells you how much an increase of one unit in X affects Y. For example, how much an increase of 1 year increases shoe size in kids. An ANOVA actually does the same thing: it looks at how much an increase from 0 (rainy days) to 1 (non-rainy days) affects the number of bunnies you’d see.
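That dummy-coding idea can be sketched in a few lines of Python. The bunny counts below are hypothetical stand-ins, chosen so the group means come out to 1 and 5 like in the example:

```python
# Dummy-coded ANOVA-as-regression sketch: Weather is 0 (rainy) or 1 (sunny).
# Counts are hypothetical, picked so the group means are 1 and 5.
rainy = [1, 2, 0, 1]   # hypothetical bunny counts on rainy days (mean 1)
sunny = [5, 4, 6, 5]   # hypothetical bunny counts on sunny days (mean 5)

mean_rainy = sum(rainy) / len(rainy)   # the intercept: prediction when x = 0
mean_sunny = sum(sunny) / len(sunny)   # the prediction when x = 1
beta = mean_sunny - mean_rainy         # the "slope": change as x goes 0 -> 1

print(mean_rainy, mean_sunny, beta)    # 1.0 5.0 4.0
```

Because the X-direction changes by exactly one, the fitted slope is just the difference of the two group means.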
Now... on to another example. Let’s look at the ratings of various chocolate bars based on the type of cocoa bean used. We’ll use a dataset you can find at Kaggle.com, courtesy of Brady Brelinski. Our three groups are chocolate bars made with Criollo beans, Forastero beans, or Trinitario beans. Chocolate making is complex, so we took a small sample of bars that contained only one of these three beans.

The chocolate taster used a scale where 5 was the highest score, ā€œtranscending beyond the ordinary limits,ā€ and 1 was ā€œmostly unpalatableā€... But is there really ā€œmostly unpalatableā€ chocolate out there? We want to know if the type of bean affects our taster’s ratings. To find out, we need the ANOVA model!
Like regression, we can calculate a Sums of Squares Total by adding up the squared differences between each chocolate rating and the overall mean chocolate rating. This gives us our Sums of Squares Total, or SST. If that sounds like how we calculated variance, that’s because it is! SST is just N times the variance. This sum represents the total amount of variation, or information, in the data.
Now, we need to partition this variation. When we previously used a simple linear regression model, we partitioned this variation into two parts: Sums of Squares for Regression and Sums of Squares for Error. And the ANOVA does the same thing. The first step is to figure out how much of the variation is explained by our model. In the ANOVA we’re using here, our best guess of a chocolate bar’s rating is its group mean.
For bars made with Criollo beans that’s 3.1, for Forastero beans 3.25, and for Trinitario beans 3.27. So, for every data point, we sum up the squared distance between its group mean and the overall mean. This is called our Model Sums of Squares (or SSM), because it’s the variation our model explains.
So now we have the amount of variation explained by the model; in other words, how much variation is accounted for if we just assumed each rating were its group’s mean rating. We’re also going to need the amount of variation that the model DOESN’T explain; in other words, how much the ratings vary within each group of cacao beans. So, we can sum up the squared differences between each data point and its group mean to get our Sums of Squares for Error: the amount of information that our model doesn’t explain.
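The partition can be sketched like this. The three groups of ratings are hypothetical, but the identity SST = SSM + SSE holds for any data:

```python
# Partitioning total variation into model (SSM) and error (SSE) piles.
# Hypothetical ratings for the three bean types, not the real data.
groups = {
    "Criollo":    [3.0, 3.25, 3.0, 2.75],
    "Forastero":  [3.25, 3.5, 3.0, 3.25],
    "Trinitario": [3.5, 3.25, 3.0, 3.5],
}

all_ratings = [r for g in groups.values() for r in g]
grand_mean = sum(all_ratings) / len(all_ratings)
sst = sum((r - grand_mean) ** 2 for r in all_ratings)

# SSM: for every data point, squared distance from its group mean to the grand mean
ssm = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values())
# SSE: squared distance from each data point to its own group mean
sse = sum((r - sum(g) / len(g)) ** 2 for g in groups.values() for r in g)

print(abs(sst - (ssm + sse)) < 1e-9)  # True: the partition is exact
```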
Now that we have that information, we can calculate our F-statistic, just like we did for regression. The F-statistic compares how much variation our model accounts for vs. how much it can’t account for. The larger F is, the more information our model is able to give us about our chocolate bar ratings.

Again, SSM is the variation our model explains and SSE is the variation it doesn’t explain. We want to compare the two. But we also need to account for the amount of independent information that each one uses. So, we divide each Sums of Squares by its degrees of freedom.
Our ANOVA model has 2 degrees of freedom. In general, the formula for the degrees of freedom of a categorical variable (like cocoa bean type) in an ANOVA is k - 1, where k is the number of groups; in our case we have 3 groups. Our Sums of Squares for Error has 787 degrees of freedom, because we originally had 790 data points, but we calculated 3 means. The general formula for the error degrees of freedom is n - k, where n is the sample size and k is the number of groups.
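In code, the F-ratio looks like this. The group count and sample size match the chocolate example, but the sums of squares here are hypothetical placeholders, since we don’t have the raw data in the transcript:

```python
# F = (SSM / df_model) / (SSE / df_error)
k, n = 3, 790           # 3 bean types, 790 chocolate bars (from the transcript)
df_model = k - 1        # degrees of freedom for the model
df_error = n - k        # degrees of freedom for error

ssm, sse = 7.0, 355.0   # hypothetical sums of squares, for illustration only
f_stat = (ssm / df_model) / (sse / df_error)

print(df_model, df_error)  # 2 787
```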
For our test, we got an F-statistic of 7.7619. This F-statistic, sometimes called an F-ratio, has a distribution that looks like this:

And we’re going to use this distribution to find our p-value. We want to know whether the effect of bean type on chocolate bar ratings is significant. In this case we have a p-value of 0.000459, small enough to reject the null. So we’ve found evidence that beans influenced the chocolate bar ratings.
A statistically significant result means that there is SOME statistically significant difference SOMEWHERE among the groups, but it doesn’t tell you where that difference is. Maybe Trinitario is significantly different from Criollo but not from Forastero beans.

An F-test is an example of an omnibus test, which means it’s a test that covers many items or groups at once. When we get a significant F-statistic, it means that there’s SOME statistically significant difference somewhere between the groups, but we still have to look for it. It’s kinda like walking into your kitchen and smelling something realllllllly stinky. You know there’s SOMETHING gross, but you have to do more work to find out exactly what is rotting...
We already have tools to do this, in statistics at least, because you can follow up a significant F-test in an ANOVA with multiple t-tests, one for every unique pair of categories your variable has. We had 3 categories, which means we only need to do 3 t-tests in order to find the statistically significant difference or differences. To conduct these t-tests, we take just the data in the two categories for that t-test and calculate the t-statistic and p-value.
For our first t-test we just look at the bars with Trinitario and Criollo beans. First, we follow our general formula for a test statistic: we take the difference between the mean rating of chocolates made with Trinitario and Criollo beans, and divide by the standard error.
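One of those follow-up t-tests can be sketched by hand. The ratings below are hypothetical, not the real Trinitario and Criollo data, and the standard error here is the Welch (unpooled) version:

```python
import math

# Two-sample t-statistic: difference in group means over its standard error.
# Hypothetical ratings, not the real Trinitario/Criollo data.
trinitario = [3.5, 3.25, 3.5, 3.0, 3.25]
criollo    = [3.0, 3.25, 2.75, 3.0, 3.25]

def mean(xs):
    return sum(xs) / len(xs)

def sample_var(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

# Welch (unpooled) standard error of the difference in means
se = math.sqrt(sample_var(trinitario) / len(trinitario)
               + sample_var(criollo) / len(criollo))
t_stat = (mean(trinitario) - mean(criollo)) / se
print(round(t_stat, 2))  # 1.89
```

From here you’d compare the t-statistic to a t-distribution to get the p-value for that pair.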
And once we do this for all three comparisons, we can see where our statistically significant differences are.
It looks, from our graph, like ratings of chocolate bars made with Criollo beans are different, in a statistically significant way, from those made with Trinitario or Forastero beans. And our graph and group means show that Criollo bars have a slightly lower mean rating. But bars made with Trinitario beans are NOT statistically significantly different from those made with Forastero beans. So our ANOVA F-test told us that there WERE some differences, and our follow-up t-tests told us WHERE they were.
And this is interesting. Criollo beans are generally considered a delicacy and of much higher quality than Forastero, and Trinitario are a hybrid of the two. But we found, in this data set, that Criollo bars had statistically significantly lower ratings. This might be because we excluded bars with combinations of our three bean types... or because the rater has a different preference... or it might even be caused by some other unknown factor that our model does not include, like who made the chocolate, or the country of origin of the beans.
We can also use ANOVAs for more than 3 groups. For example, the ANOVA was first created by the statistician R.A. Fisher while he was looking at fertilizer studies on a potato farm. In one of the first experiments he described, he looked at 12 different varieties of potato and the effect of various fertilizers.
Let’s look at a simple version of Fisher’s potato study. Here we have 12 different varieties of potato. We’ll represent each of them with a letter, A through L. There are 21 of each of the potato plants, for a total of 252 potato plants. We give our future french fries about a season to grow, then we dig them up and weigh each one. This graph shows the potato weights that we recorded, as well as the overall mean potato weight and each group’s mean potato weight. Using these numbers, we can calculate our Total Sums of Squares, Model Sums of Squares, and Sums of Squares for Error.
We’re going to let a computer do that for us this time. And our computer spit out this: the degrees of freedom, sums of squares, mean squares, F-statistic, and p-value. This is called an ANOVA table, and it organizes all the information our ANOVA model gives us. Here we can see that our model had an F-statistic, or F-value, of around 3, and a p-value of 0.000829.
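Here’s a by-hand sketch of the quantities in that table, on a toy dataset of three hypothetical varieties (not Fisher’s 12, and not his data):

```python
# Building the pieces of an ANOVA table by hand for toy data.
groups = [
    [2.1, 2.4, 2.2, 2.5],   # hypothetical weights for variety A
    [2.8, 2.6, 2.9, 2.7],   # variety B
    [2.0, 2.3, 2.1, 2.2],   # variety C
]

n = sum(len(g) for g in groups)      # total sample size
k = len(groups)                      # number of groups
grand_mean = sum(x for g in groups for x in g) / n

ssm = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
sse = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)

msm = ssm / (k - 1)                  # mean square for the model
mse = sse / (n - k)                  # mean square for error
f_stat = msm / mse

print(f"df: {k - 1}, {n - k}; SS: {ssm:.3f}, {sse:.3f}; F: {f_stat:.2f}")
```

Software just fills in these same columns (degrees of freedom, sums of squares, mean squares, F) plus the p-value from the F distribution.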
So we reject the null hypothesis. We found evidence that the potato varieties don’t all have the same mean weight. But since this was an omnibus test, our statistically significant F-test just means that there is some statistically significant difference somewhere among those 12 potato varieties. We don’t know where it is.
In that way, ANOVAs can be thought of as a first step. We do an overall test that tells us whether there’s a needle in our haystack. If we find out there is a needle, then we go looking for it. However, if our test tells us there’s no needle, we’re done. No need to look for something that probably doesn’t exist. But you can see that this significant F-statistic for potato varieties will require MANY follow-up tests: 12 choose 2, or 66.
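That count is just a combination, which the Python standard library can confirm:

```python
import math

# Number of unique pairs among 12 potato varieties: "12 choose 2"
pairs = math.comb(12, 2)
print(pairs)  # 66
```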
We showed a lot of calculations today, but there are two big ANOVA ideas to take away from this. First, a lot of these different statistical models are more similar than they are different. ANOVAs and regressions both use the General Linear Model form to create a story about how the world might work. The ANOVA says that the best guess for a data point, like the rating of a new chocolate bar, is the mean rating of whatever group it belongs to, whether that’s Criollo, Trinitario, or Forastero. If we don’t know anything else, we’d guess that the rating of a Criollo chocolate bar is the mean rating for all Criollo bars.

Second, an ANOVA is a great example of filtering. If there’s no evidence that bean type has an overall effect on chocolate-bar ratings, we don’t want to go chasing more specific effects. Our time is precious... and we want to use it as best we can. So we have more time out in the world... to look for bunnies.

Thanks for watching, I’ll see you next time.