Statistics: Alternate variance formulas | Probability and Statistics | Khan Academy - YouTube
Channel: Khan Academy
[0]
I think now is as good a time as
any to play around a little bit
[3]
with the formula for variance
and see where it goes.
[7]
And I think just by doing this
we'll also get a little bit
[10]
better intuition of just
manipulating sigma notation,
[12]
or even what it means.
[13]
So we learned several times
that the formula for variance--
[18]
and let's just do
variance of a population.
[20]
It's almost the same thing
as variance of a sample.
[22]
You just divide by n
instead of n minus 1.
[25]
Variance of a population
is equal to-- well,
[27]
you take each of the
data points x sub i.
[32]
You subtract from that the mean.
[35]
You square it.
[36]
And then you take the
average of all of these.
[38]
So you add the squared distance
for each of these points from i
[43]
equals 1 to i is equal to n.
[46]
And you divide it by n.
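As a rough sketch of the definition just described (illustrative code, not from the video; the names `mean`, `population_variance`, and `sample_variance` are made up here), the population variance averages the squared distances from the mean by dividing by n. The sample variance divides the same sum by n - 1:

```python
def mean(xs):
    """Arithmetic mean: sum of the data points divided by how many there are."""
    return sum(xs) / len(xs)


def population_variance(xs):
    """Average of the squared distances (x_i - mu)^2, dividing by n."""
    mu = mean(xs)
    return sum((x - mu) ** 2 for x in xs) / len(xs)


def sample_variance(xs):
    """Same sum of squared distances, but divided by n - 1."""
    mu = mean(xs)
    return sum((x - mu) ** 2 for x in xs) / (len(xs) - 1)


data = [2, 4, 4, 4, 5, 5, 7, 9]  # arbitrary example values
print(population_variance(data))
print(sample_variance(data))
```

Python's standard `statistics` module also provides `pvariance` and `variance`, which can be used to cross-check these two.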
[49]
So let's see what
happens if we can-- maybe
[51]
we want to multiply
out the squared term
[53]
and see where it takes us.
[55]
So let's see.
[56]
And I think it'll take
us someplace interesting.
[58]
So this is the same thing as the
sum from i is equal to 1 to n.
[68]
This, we just multiply it out.
[69]
This is the same thing as x
sub i squared minus-- this
[76]
is your little
algebra going on here.
[78]
So when you square it-- I
mean, we could multiply it out.
[81]
We could write it.
[82]
x sub i minus mu times
x sub i minus mu.
[90]
So we have x sub i times x
sub i, that's x sub i squared.
[93]
Then you have x sub
i times minus mu.
[98]
And then you have
minus mu times x sub i.
[100]
So when you add
those two together,
[102]
you get minus 2x sub i mu,
because you have it twice: x
[108]
sub i times minus mu, that's
one minus x sub i mu.
[111]
And then you have another
one, minus mu x sub i.
[114]
When you add them together,
you get minus 2x sub i mu.
[117]
I know it's confusing with me
saying sub i and all of that.
[119]
But it's really no
different than when
[121]
you did a minus b squared.
[123]
Just the variables look a
little bit more complicated.
[125]
And then the last term is
minus mu times minus mu,
[129]
which is plus mu squared.
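Written out, the expansion above is the familiar (a - b)^2 pattern with a = x_i and b = μ:

```latex
\begin{aligned}
(x_i - \mu)^2 &= (x_i - \mu)(x_i - \mu) \\
              &= x_i^2 - x_i\mu - \mu x_i + \mu^2 \\
              &= x_i^2 - 2\mu x_i + \mu^2
\end{aligned}
```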
[134]
Fair enough.
[135]
Let me switch colors just
to keep it interesting.
[140]
Let me cordon that off.
[143]
The sum of this
is the same thing
[145]
as the sum of-- because
if you think about it,
[147]
we're going to
take each x sub i.
[148]
For each of the numbers
in our population,
[150]
we're going to
perform this thing.
[152]
And we're going to sum it up.
[153]
But if you think
about it, this is
[154]
the same thing
as-- if you're not
[156]
familiar with sigma notation
this is a good thing
[158]
to know in general, just
a little bit of intuition.
[160]
That this is the same thing as--
I'll do it here to have space.
[164]
The sum from i is equal to
1 to n of the first term,
[172]
x sub i squared minus--
and actually, we
[179]
can bring out the
constant terms.
[182]
When you're summing, the
only thing that has to stay inside
[185]
is the part that
depends on i.
[187]
So in this case, it's x sub i.
[188]
So x sub 1, x sub 2.
[189]
So that's the
thing that you have
[191]
to leave on the right hand
side of the sigma notation.
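In symbols, the sigma-notation rules used in this step are: a sum of terms can be split into separate sums, a constant factor c (one that does not depend on i) can be pulled out in front of the sum, and adding a constant c to itself from i = 1 to n gives nc:

```latex
\sum_{i=1}^{n} (a_i + b_i) = \sum_{i=1}^{n} a_i + \sum_{i=1}^{n} b_i,
\qquad
\sum_{i=1}^{n} c\,a_i = c \sum_{i=1}^{n} a_i,
\qquad
\sum_{i=1}^{n} c = nc
```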
[194]
And if you've done the
calculus playlists already,
[197]
sigma notation is really
like a discrete integral
[201]
on some level.
[202]
Because in an integral, you're
summing up a bunch of things
[205]
and you're multiplying
them times dx,
[207]
which is a really
small interval.
[208]
But here you're
just taking a sum.
[209]
And we showed in the
calculus playlist
[213]
that an integral actually
is this infinite sum
[215]
of infinitely
small things, but I
[216]
don't want to digress too much.
[218]
But this was just a long way
of saying that the sum from i
[221]
equals 1 to n of the second term
is the same thing as minus 2
[225]
times mu times the sum from i is
equal to 1 to n of x sub i.
[245]
And then finally,
you have plus-- well,
[250]
this is just a constant term.
[252]
This is just a constant term.
[253]
So you can take it out.
[254]
Times mu squared times the
sum from i equals 1 to n.
[264]
And what's going to be here?
[266]
It's going to be a 1.
[268]
We just factored the mu squared out,
[269]
leaving mu squared times 1,
[270]
and took it out of the
sigma sign, out of the sum.
[272]
So you're just
left with a 1 there.
[276]
And actually, we could have
just left the mu squared there.
[278]
But either way, let's
just keep simplifying it.
[281]
So this we can't really do--
well, actually we could.
[284]
Well, no, we don't know
what the x sub i's are.
[287]
So we just have to
leave that the same.
[289]
So that's the sum.
[290]
Oh sorry, and this is
just the numerator.
[293]
This whole simplification, we're
just simplifying the numerator.
[295]
And later, we're just
going to divide by n.
[297]
So that is equal to
that divided by n,
[300]
which is equal to this
thing divided by n.
[302]
I'll divide by n at the end.
[304]
Because it's the numerator
that's the confusing part.
[306]
We just want to simplify
this term up here.
[310]
So let's keep doing this.
[311]
So this equals the sum
from i equals 1 to n
[315]
of x sub i squared.
[319]
And let's see, minus
2 times mu-- sorry,
[325]
that mu doesn't look good.
[327]
Edit, Undo, minus 2 times
mu times the sum from i
[335]
is equal to 1 to n of x sub i.
[342]
And then, what is this?
[343]
What is another
way to write this?
[345]
Essentially, we're going
to add 1 to itself n times.
[348]
This is saying, just look,
whatever you have here,
[351]
just iterate through it n times.
[353]
If you had an x sub
i here, you would
[355]
use the first x term,
then the second x term.
[357]
When you have a 1 here, this
is just essentially saying,
[360]
add one to itself n times,
which is the same thing as n.
[364]
So this is going to be
plus mu squared times n.
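A quick numerical sanity check of the simplified numerator, using made-up data (the values in `data` are arbitrary and not from the video): the sum of the squared deviations should equal the sum of the squares, minus 2 mu times the sum of the data, plus mu squared times n.

```python
data = [3.0, 7.0, 1.0, 8.0, 6.0]  # arbitrary example population
n = len(data)
mu = sum(data) / n  # population mean

# Left side: the original numerator, the sum of (x_i - mu)^2
lhs = sum((x - mu) ** 2 for x in data)

# Right side: sum of x_i^2, minus 2*mu*(sum of x_i), plus mu^2 added n times
rhs = sum(x ** 2 for x in data) - 2 * mu * sum(data) + n * mu ** 2

print(lhs, rhs)
```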
[376]
And then see if there's
anything else we can do here.
[381]
Remember, this was
just the numerator.
[383]
So this looks fine.
[385]
We add up each of those terms.
[387]
So we just have
minus 2 mu from i
[392]
equals 1 to-- oh well,
think about this.
[396]
What is this?
[397]
What is this thing right here?
[400]
Well actually, let's
bring back that n.
[404]
So the variance was
the original sum
[406]
divided by n, and that sum
equals this expanded numerator.
[410]
So the variance is
this whole thing divided by n,
[415]
which is the same thing as
each of its terms
[420]
divided by n: this
first term over n,
[422]
this second term over n,
[427]
and this third term over n.
[430]
And now, well, how
does this simplify?
[432]
This is the interesting part.
[433]
Well, this, nothing
much I can do here.
[435]
So that just becomes the
sum from i is equal to 1
[442]
to n of x sub i squared
divided by n.
[448]
Now this is interesting.
[451]
If I take each of the terms in
my population and I add them up
[455]
and then I divide it
by n, what is that?
[457]
This thing right here?
[461]
If I sum up all of the
terms in my population
[463]
and divide by the number
of terms there are?
[465]
That's the mean, right?
[466]
That's the mean
of my population.
[468]
So this thing right
here is also mu.
[470]
So this term
simplifies to what?
[475]
Minus 2 times mu times
[478]
this whole thing, which is also mu.
[480]
So it's minus 2 times mu squared:
[482]
mu times mu, where mu is the
mean of the population.
[486]
So that was a nice
simplification.
[488]
And then plus-- what
do you have here?
[491]
Well let's see,
you have n over n.
[494]
Those cancel out.
[495]
So we just have plus mu squared.
[496]
So that was a very
nice simplification.
[498]
And then this simplifies to--
can't do much on this side.
[501]
So the sum from i is equal to 1
to n of x sub i squared over n.
[516]
And then you see, we have minus
2 mu squared plus mu squared.
[519]
Well, that's the same
thing as minus mu squared.
[527]
Minus the mean squared.
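Putting the steps together, writing σ² for the population variance, dividing each of the three numerator terms by n, and using the fact that the sum of the x_i divided by n is μ:

```latex
\begin{aligned}
\sigma^2 &= \frac{1}{n}\sum_{i=1}^{n} (x_i - \mu)^2 \\
         &= \frac{1}{n}\sum_{i=1}^{n} x_i^2
            - 2\mu \cdot \frac{1}{n}\sum_{i=1}^{n} x_i
            + \frac{n\mu^2}{n} \\
         &= \frac{1}{n}\sum_{i=1}^{n} x_i^2 - 2\mu^2 + \mu^2 \\
         &= \frac{1}{n}\sum_{i=1}^{n} x_i^2 - \mu^2
\end{aligned}
```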
[529]
So this already we've
come up with a neat way
[533]
of writing the variance.
[538]
You can essentially take the
average of the squares of all
[541]
of the numbers in this
case, a population,
[543]
and then subtract from
that the mean squared
[548]
of your population.
[549]
So this could be, depending
on how you're calculating things,
[551]
maybe a slightly faster way
of calculating the variance.
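A minimal sketch comparing the two formulas on the same made-up data (function names are illustrative, not from the video). The definition needs the mean before it can compute any squared distance. The shortcut averages the squares and then subtracts the squared mean. Both should agree up to floating-point rounding:

```python
def variance_by_definition(xs):
    """Population variance: average squared distance from the mean."""
    n = len(xs)
    mu = sum(xs) / n
    return sum((x - mu) ** 2 for x in xs) / n


def variance_mean_of_squares(xs):
    """Population variance: average of the squares minus the square of the mean."""
    n = len(xs)
    mu = sum(xs) / n
    mean_of_squares = sum(x * x for x in xs) / n
    return mean_of_squares - mu ** 2


data = [3.0, 7.0, 1.0, 8.0, 6.0]  # arbitrary example population
print(variance_by_definition(data))
print(variance_mean_of_squares(data))
```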
[555]
So just playing with a little
algebra, we got from this thing
[557]
where you have to each time
take each of your data points,
[559]
subtract the mean from
it, and then square it.
[561]
And of course, before
you have to do anything
[563]
you have to calculate the mean.
[565]
And you take the square.
[566]
And then you sum them all up.
[567]
Then you take the
average, essentially,
[568]
when you sum and divide by n.
[570]
We've simplified it just
using a little bit of algebra
[573]
to this formula.
[574]
We're getting to something
called the raw score method.
[577]
And what we want to do is write
this right here just in terms
[580]
of the x sub i's.
[581]
And then we really are
what you call the raw score
[583]
method, which is
oftentimes a faster
[585]
way of calculating the variance.
[587]
So let's see what
is mu equal to?
[589]
What is the mean?
[591]
The mean is just equal to the
sum from i is equal to 1 to n
[598]
of each of the terms-- you
just take the sum of each
[600]
of the terms-- and you divide by
the number of terms there are.
[607]
So if we look at this
thing, this thing
[609]
can be written as-- let
me draw a line here.
[614]
This thing can be written as the
sum from i is equal to 1 to n
[622]
of x sub i squared, all of
that over n minus mu squared.
[630]
Well, mu is this.
[631]
So this thing squared is what?
[635]
It's the sum of x sub i
from i equals 1 to n,
[646]
and you're going to
[648]
square that whole sum.
[651]
And since we squared the whole
fraction, the n got squared too.
[654]
So you divide it by n squared.
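Substituting μ = (1/n) times the sum of the x_i into the mean-squared term gives the raw score form described here, written only in terms of the x_i:

```latex
\mu^2 = \left(\frac{1}{n}\sum_{i=1}^{n} x_i\right)^2
      = \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n^2},
\qquad
\sigma^2 = \frac{\sum_{i=1}^{n} x_i^2}{n}
         - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n^2}
```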
[658]
And out of all of them,
[661]
this actually seems like
the simplest formula to me.
[664]
Where you essentially
just take--
[666]
if you know the mean
of your population--
[669]
you just say, OK,
my mean is whatever
[670]
and I can just square that.
[672]
And just put that
aside for a second.
[674]
But first, I can just
take each of the numbers,
[676]
square them, and
then sum them up,
[677]
and divide by the number
of numbers I have.
[681]
I don't know if I
wrote-- no, I've
[682]
erased the last set of numbers.
[684]
But we could show
you that you'll
[685]
get to the same variance.
[686]
So to me, this is almost
the simplest formula.
[688]
But this one's even
faster in a lot of ways
[690]
because you don't really
have to even calculate
[692]
the mean ahead of time.
[694]
You can just say,
OK, for each x sub i,
[696]
I square it and add it to one sum, and I add x sub i itself to another sum.
[698]
And then I divide by n
or n squared accordingly.
[700]
And I'll also get
to the variance.
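As a sketch of that single pass (an illustration, not the video's own method; `raw_score_variance` is a hypothetical name), one can keep a running sum of the x_i and a running sum of their squares, then apply the raw score formula at the end without computing the mean first. One caveat: with floating-point data whose mean is large relative to its spread, subtracting the two nearly equal terms can lose precision, so this form is best seen as a convenient shortcut rather than the most numerically robust method.

```python
def raw_score_variance(xs):
    """Population variance in one pass over the data, via the raw score formula."""
    n = 0
    total = 0.0          # running sum of x_i
    total_squares = 0.0  # running sum of x_i squared
    for x in xs:
        n += 1
        total += x
        total_squares += x * x
    if n == 0:
        raise ValueError("variance requires at least one data point")
    # sum(x_i^2) / n  minus  (sum(x_i))^2 / n^2
    return total_squares / n - total ** 2 / n ** 2


data = [3.0, 7.0, 1.0, 8.0, 6.0]  # arbitrary example population
print(raw_score_variance(data))
```

Because each value is looked at only once, the same function also works on a generator, for example `raw_score_variance(x for x in data)`.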
[702]
So you don't have to do
this calculation before you
[704]
figure out the whole variance.
[705]
But anyway, I thought it would
be instructive and hopefully
[708]
give you a little bit more
intuition behind the algebra
[710]
dealing with sigma
if we worked out
[713]
these other ways
to write variances.
[714]
And frankly, some books
will just say, oh yeah,
[716]
you know what?
[717]
The variance could
be written like this.
[719]
We're talking about the
variance of a population.
[721]
Or it could be
written like this,
[722]
or maybe they'll even
write it like this.
[724]
And it's good to know
that you can just
[726]
do a little simple
algebraic manipulation
[730]
and get from one to the other.
[731]
Anyway, I've run out of time.
[733]
See you in the next video.