Chi-square distribution introduction | Probability and Statistics | Khan Academy - YouTube
Channel: Khan Academy
In this video, we'll just talk a little bit about what the chi-square distribution is, sometimes called the chi-squared distribution. And then in the next few videos, we'll actually use it to test how well theoretical distributions explain observed ones, or how good a fit observed results are for theoretical distributions. So let's just think about it a little bit.
So let's say I have some random variables, and each of them is an independent, standard, normally distributed random variable. Let me just remind you what that means. Let's say I have the random variable X. If X is normally distributed, we could write that X is a normal random variable with a mean of 0 and a variance of 1. Or you could say that the expected value of X is equal to 0, and that the variance of our random variable X is equal to 1.

Or, just to visualize it: when we take an instantiation of this variable, we're sampling from a standard normal distribution that looks like this, with a mean of 0 and a variance of 1, which of course also means a standard deviation of 1.
So for a chi-square distribution, if you just take one of these random variables -- let me define it this way. Let me define a new random variable Q. You're essentially sampling from this standard normal distribution and then squaring whatever number you got. So Q is equal to this random variable X squared.
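A quick way to see what that definition produces is to simulate it. This is a minimal sketch, not from the video (the function and variable names are my own): draw X from a standard normal, square it, and repeat.

```python
import random

def sample_q(rng: random.Random) -> float:
    """One draw of Q: sample X from a standard normal, then square it."""
    x = rng.gauss(0.0, 1.0)  # mean 0, standard deviation 1
    return x * x

rng = random.Random(0)
draws = [sample_q(rng) for _ in range(100_000)]

# Q can never be negative, and its average settles near 1
# (the mean of a chi-square variable equals its degrees of freedom).
print(min(draws) >= 0.0)
print(sum(draws) / len(draws))
```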
The distribution of this random variable right here is going to be an example of the chi-square distribution. Actually, what we're going to see in this video is that the chi-square, or chi-squared, distribution is really a set of distributions, depending on how many of these squared variables you sum. Right now we only have one random variable that we're squaring, so this is just one of the examples, and we'll talk more about the others in a second. So for this right here, we could write that Q is a chi-squared distributed random variable.
Or we could use this notation right here. Q is -- we could write it like this. So this isn't an X anymore; this is the Greek letter chi, although it looks a lot like a curvy X. So Q is a member of the chi-squared family. And since we're only taking the square of one independent, standard, normally distributed variable here, we say that this chi-squared distribution has 1 degree of freedom, and we write that over here. So this right here is our degree of freedom: we have 1 degree of freedom right over there.
So let's call this Q1. Now let's say I have another random variable -- let me do Q2 in blue. Q2 is defined like this: I have one independent, standard, normally distributed variable, which I'll call X1, and I square it. And then I have another independent, standard, normally distributed variable, X2, and I square it. So you could imagine both of these guys have distributions like this, and they're independent.

So to sample Q2, you essentially sample X1 from this distribution, square that value, sample X2 from the same distribution, square that value, and then add the two. And you're going to get Q2. So this over here is Q1, and Q2 here we would write as a chi-squared distributed random variable with 2 degrees of freedom. Right here: 2 degrees of freedom.
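The same construction extends to any number of squared terms. Here's a sketch of a general sampler (the function name is mine, not from the video): sum k independent squared standard normal draws to get one draw from a chi-squared distribution with k degrees of freedom.

```python
import random

def sample_chi2(k: int, rng: random.Random) -> float:
    """One draw from a chi-squared distribution with k degrees of freedom:
    the sum of k independent squared standard normal draws."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k))

rng = random.Random(1)
# Q2 from the video is the k = 2 case: X1 squared plus X2 squared.
q2_draws = [sample_chi2(2, rng) for _ in range(100_000)]

# The mean of a chi-squared variable equals its degrees of freedom,
# so this average should settle near 2.
print(sum(q2_draws) / len(q2_draws))
```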
And just to visualize the set of chi-squared distributions, let's look at this chart over here, which I got off of Wikipedia. It shows the probability density functions for some of the chi-square distributions. This first one over here is for k equal to 1 -- that's the degrees of freedom -- so this is essentially our Q1. This is the probability density function for Q1. And notice it really spikes close to 0.

And that makes sense. Because if you are sampling just once from this standard normal distribution, there's a very high likelihood that you're going to get something pretty close to 0. And if you square something close to 0 -- remember, these are decimals less than 1, pretty close to 0 -- it's going to become even smaller. So you have a high probability of getting a very small value, and a high probability of getting values less than some threshold -- this right here, I guess this is 1, so this here is less than 1/2. And you have a very low probability of getting a large number. I mean, to get a 4, you would have to sample a 2 from this distribution, and we know that 2 is 2 standard deviations from the mean, so it's less likely. And that's just to get a 4 -- even larger numbers are going to be even less likely. So that's why you see this shape over here.
Now, when you have 2 degrees of freedom, it moderates a little bit. This blue line right here is the shape of Q2. And notice you're a little bit less likely to get values close to 0 and a little bit more likely to get numbers further out, but it's still heavily weighted towards small numbers.

And then if we had another chi-squared distributed random variable -- let's say Q3, defined as the sum of 3 of these independent variables, each with a standard normal distribution, so X1 squared plus X2 squared plus X3 squared -- then all of a sudden our Q3 (this is Q2 right here) has a chi-squared distribution with 3 degrees of freedom. And this guy right over here -- that will be this green line. Maybe I should have done this in green. And notice, now it's starting to become a little bit more likely that you'd get values in this range over here, because you're taking the sum: each of these terms is going to be a pretty small value, but you're taking the sum, so it starts to shift a little over to the right. And the more degrees of freedom you have, the further this lump moves to the right and, to some degree, the more symmetric it gets.
And what's interesting about this -- it's different from almost every other distribution we've looked at, although we've seen others with this property as well -- is that you can't have a value below 0, because we're always just squaring these values. Each of these X's can have values below 0; they're normally distributed, so they could be negative. But since we're squaring and taking the sum of squares, the result is always going to be non-negative.
And the place this is going to be useful -- as we'll see in the next few videos -- is in measuring, essentially, error from an expected value. If you take this total error, you can figure out the probability of getting that total error if you assume some parameters. And we'll talk more about that in the next video.

Now, with that said, I just want to show you how to read a chi-squared distribution table.
So if I were to ask you -- let me pick this blue distribution right here, which has 2 degrees of freedom because we're adding 2 of these squared terms -- what is the probability of Q2 being greater than 2.41? And I'm picking that value for a reason. So I want the probability of Q2 being greater than 2.41. What I do is look at a chi-square table like this. Q2 is a version of chi-squared with 2 degrees of freedom, so I look at this row right here, under 2 degrees of freedom, and I want the probability of getting a value above 2.41. I picked 2.41 because it's actually in this table.

The reason these tables have weird numbers like this, instead of whole numbers or easy-to-read fractions, is that the entries are actually driven by the p-value -- the probability of getting something larger than that value.
So normally you would use the table the other way. You'd ask: for 2 degrees of freedom, what chi-squared value leaves a 30% chance of getting something larger than it? And then you would look up 2.41. But I'm doing it the other way just for the sake of this video. So if I want the probability of this random variable right here being greater than 2.41 -- its p-value -- we read it right here: it is 30%.

And just to visualize that on this chart: on this chi-square distribution -- this was Q2, the blue one over here -- 2.41 is going to sit... let's see, this is 3, this is 2.5, so 2.41 is going to be someplace right around here. So essentially, what that table is telling us is that this entire area under this blue line right here, to the right of 2.41, is going to be 0.3. Or you could view it as 30% of the entire area under this curve, because obviously all the probabilities have to add up to 1.
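For 2 degrees of freedom specifically, there's a neat way to check that table entry without a table: a chi-squared variable with 2 degrees of freedom is exponentially distributed with mean 2, so P(Q2 > x) = e^(-x/2). (That closed form is a standard fact about the 2-degree-of-freedom case, not something stated in the video.)

```python
import math

def chi2_2df_tail(x: float) -> float:
    """P(Q2 > x) for a chi-squared variable with 2 degrees of freedom."""
    return math.exp(-x / 2.0)

# The table entry from the video: about a 30% chance of exceeding 2.41.
print(round(chi2_2df_tail(2.41), 3))  # 0.3
```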
So that's our intro to the chi-square distribution. In the next video, we're actually going to use it to make, or to test, some inferences.