Chi-square distribution introduction | Probability and Statistics | Khan Academy - YouTube

Channel: Khan Academy

In this video, we'll just talk a little bit about what the chi-square distribution is, sometimes called the chi-squared distribution. And then in the next few videos, we'll actually use it to test how well theoretical distributions explain observed ones, or how good a fit observed results are for theoretical distributions.
So let's just think about it a little bit. Let's say I have some random variables, and each of them is an independent, standard normally distributed random variable. Let me just remind you what that means. Say I have the random variable X. If X is normally distributed, we could write that X is a normal random variable with a mean of 0 and a variance of 1. Or you could say that the expected value of X is equal to 0 and that the variance of our random variable X is equal to 1. Or, just to visualize it: when we take an instantiation of this random variable, we're sampling from a standardized normal distribution that looks like this -- a mean of 0 and a variance of 1, which would also mean, of course, a standard deviation of 1. So that right there would be the standard deviation, equal to 1, and the variance -- its square -- is equal to 1 as well.
So, a chi-square distribution -- if you just take one of these random variables, and let me define it this way: let me define a new random variable Q that is equal to -- you're essentially sampling from this standard normal distribution and then squaring whatever number you got. So it is equal to this random variable X squared. The distribution for this random variable right here is going to be an example of the chi-square distribution. Actually, what we're going to see in this video is that the chi-square, or chi-squared, distribution is actually a set of distributions, depending on how many of these squared variables you sum. Right now we only have one random variable that we're squaring, so this is just one of the examples. And we'll talk more about them in a second.
So for this right here, we could write that Q is a chi-squared distributed random variable. Or we could use this notation right here: Q is -- we could write it like this. This isn't an X anymore; this is the Greek letter chi, although it looks a lot like a curvy X. So Q is a member of chi-squared. And since we're only squaring one independent, standard normally distributed variable over here, we say that this only has 1 degree of freedom. And we write that over here. So this right here is our degree of freedom -- we have 1 degree of freedom right over there. So let's call this Q1.
Let's say I have another random variable. Let's call this Q -- let me do it in a different color; let me do Q2 in blue. Let's say I have another random variable, Q2, that is defined as follows: I have one independent, standard normally distributed variable, which I'll call X1, and I square it. And then I have another independent, standard normally distributed variable, X2, and I square it. So you could imagine both of these guys have distributions like this, and they're independent. So to sample Q2, you essentially sample X1 from this distribution, square that value, sample X2 from the same distribution, square that value, and then add the two. And you're going to get Q2. This over here -- so this was Q1 -- Q2 we would write as a chi-squared distributed random variable with 2 degrees of freedom. Right here: 2 degrees of freedom.
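The sampling recipe just described translates directly into a short Monte Carlo sketch (an illustration under the video's setup, not code from the video): since E[X^2] = Var(X) = 1 for a standard normal X, averages of many draws of Q1 and Q2 should land near 1 and 2, respectively.

```python
import random

random.seed(0)
N = 100_000

# Q1: one standard normal draw, squared.
q1 = [random.gauss(0, 1) ** 2 for _ in range(N)]

# Q2: two independent standard normal draws, each squared, then added.
q2 = [random.gauss(0, 1) ** 2 + random.gauss(0, 1) ** 2 for _ in range(N)]

# E[X^2] = Var(X) = 1, so each mean should be near the degrees of freedom.
mean_q1 = sum(q1) / N
mean_q2 = sum(q2) / N
print(round(mean_q1, 2), round(mean_q2, 2))
```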
And just to visualize the set of chi-squared distributions, let's look at this over here. So this -- I got this off of Wikipedia -- shows us some of the probability density functions for some of the chi-square distributions. This first one over here, for k equal to 1 (that's the degrees of freedom), is essentially our Q1. This is our probability density function for Q1. And notice it really spikes close to 0.
And that makes sense, because if you are sampling just once from this standard normal distribution, there's a very high likelihood that you're going to get something pretty close to 0. And if you square something close to 0 -- remember, these are decimals less than 1, pretty close to 0 -- it's going to become even smaller. So you have a high probability of getting a very small value, and high probabilities of getting values less than some threshold -- this right here; I guess this is 1 right here, so that's less than 1/2. And you have a very low probability of getting a large number. I mean, to get a 4, you would have to sample a 2 from this distribution, and we know that 2 is 2 standard deviations from the mean, so it's less likely. And that's just to get a 4 -- even larger numbers are going to be even less likely. So that's why you see this shape over here.
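That intuition can be checked exactly, without a table (a quick sketch, not from the video; the identity P(Q1 < t) = P(-sqrt(t) < X < sqrt(t)) follows directly from Q1 = X squared, and the normal probability comes from the error function):

```python
import math

# For Q1 = X^2 with X standard normal:
# P(Q1 < t) = P(-sqrt(t) < X < sqrt(t)) = erf(sqrt(t) / sqrt(2)).
def chi2_cdf_1df(t):
    return math.erf(math.sqrt(t) / math.sqrt(2))

p_below_1 = chi2_cdf_1df(1.0)       # most of the mass is below 1
p_above_4 = 1 - chi2_cdf_1df(4.0)   # needs |X| > 2, i.e. beyond 2 std devs
print(round(p_below_1, 2), round(p_above_4, 2))
```

So about 68% of Q1's mass sits below 1, while only about 4.6% sits above 4 -- matching the spike near 0 in the plot.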
[351]
Now when you have 2 degrees of freedom,
[353]
it moderates a little bit.
[354]
This is the shape, this blue line right
[359]
here is the shape of Q2.
[361]
And notice you're a little bit less likely to get values
[364]
close to 0 and a little bit more likely to get numbers
[367]
further out.
[368]
But it still is kind of shifted or heavily weighted
[371]
towards small numbers.
And then if we had another chi-squared distributed random variable -- so then we have, let's say, Q3, and let's define it as the sum of 3 of these independent variables, each of them having a standard normal distribution: X1 squared plus X2 squared plus X3 squared. Then, all of a sudden, our Q3 -- this is Q2 right here -- has a chi-squared distribution with 3 degrees of freedom. And this guy right over here -- that will be this green line. Maybe I should have done this in green. This will be this green line over here. And then notice, now it's starting to become a little bit more likely that you'd get values in this range over here, because you're taking the sum -- each of these is going to be a pretty small value, but you're taking the sum, so it starts to shift a little over to the right. And the more degrees of freedom you have, the further this lump moves to the right and, to some degree, the more symmetric it gets.
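A small simulation makes that rightward shift visible (a hedged sketch, not from the video): the sample medians for 1, 2, and 3 degrees of freedom should come out strictly increasing.

```python
import random
import statistics

random.seed(0)
N = 100_000

def chi2_sample(k):
    # One draw from a chi-squared with k degrees of freedom:
    # the sum of k squared standard normal draws.
    return sum(random.gauss(0, 1) ** 2 for _ in range(k))

# The lump moves right as k grows, so the medians should increase.
medians = []
for k in (1, 2, 3):
    medians.append(statistics.median(chi2_sample(k) for _ in range(N)))
print([round(m, 2) for m in medians])
```

For 2 degrees of freedom the true median is 2 ln 2, about 1.39, which the simulated value should land near.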
And what's interesting about this -- I guess it's different from almost every other distribution we've looked at, although we've looked at others that have this property as well -- is that you can't have a value below 0, because we're always just squaring these values. Each of these guys can have values below 0 -- they're normally distributed; they could have negative values -- but since we're squaring and taking the sum of squares, this is always going to be positive.
And the place this is going to be useful -- as we're going to see in the next few videos -- is in measuring, essentially, error from an expected value. And if you take this total error, you can figure out the probability of getting that total error if you assume some parameters. We'll talk more about it in the next video.
Now, with that said, I just want to show you how to read a chi-squared distribution table. So if I were to ask you -- if this is our distribution; let me pick this blue one right here. So over here we have 2 degrees of freedom, because we're adding 2 of these guys right here. If I were to ask you: what is the probability of Q2 being greater than 2.41? And I'm picking that value for a reason. So I want the probability of Q2 being greater than 2.41. What I want to do is look at a chi-square table like this one. Q2 is a version of chi-squared with 2 degrees of freedom, so I look at this row right here, under 2 degrees of freedom. And I want the probability of getting a value above 2.41. I picked 2.41 because it's actually in this table.
[536]
And so most of these chi-squared-- the reason
[538]
why we have these weird numbers like this instead
[540]
of whole numbers or easy-to-read fractions
[542]
is it is actually driven by the p value.
[544]
It's driven by the probability of getting
[546]
something larger than that value.
[548]
So normally you would look at the other way.
[550]
You'd say, OK, if I want to say, what chi-squared value
[555]
for 2 degrees of freedom, there's
[557]
a 30% chance of getting something larger than that?
[560]
Then I would look up 2.41.
[561]
But I'm doing it the other way just
[563]
for the sake of this video.
So if I want the probability of this random variable right here being greater than 2.41 -- or its p-value -- we read it right here: it is 30%. And just to visualize it on this chart -- this chi-square distribution; this was Q2, the blue one, over here -- 2.41 is going to sit -- let's see, this is 3, this is 2.5, so 2.41 is going to be someplace right around here. So essentially, what that table is telling us is: what is this entire area under this blue line, to the right of that point? And that right there is going to be 0.3 -- or you could view it as 30% of the entire area under this curve, because obviously all the probabilities have to add up to 1.
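For 2 degrees of freedom specifically, the table's 30% can be checked in closed form (a side note, not from the video: a chi-squared variable with 2 degrees of freedom is exponentially distributed with mean 2, so its tail probability is e^(-x/2)).

```python
import math

# For k = 2 degrees of freedom the tail has a closed form:
# P(Q2 > x) = exp(-x / 2).
def chi2_tail_2df(x):
    return math.exp(-x / 2)

p = chi2_tail_2df(2.41)
print(round(p, 3))  # about 0.30, matching the table entry
```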
So that's our intro to the chi-square distribution. In the next video, we're actually going to use it to make, or test, some inferences.