Central limit theorem | Inferential statistics | Probability and Statistics | Khan Academy - YouTube

Channel: Khan Academy

[0]
In this video, I want to talk about what is easily
[3]
one of the most fundamental and profound concepts in statistics
[6]
and maybe in all of mathematics.
[8]
And that's the central limit theorem.
[16]
And what it tells us is we can start off
[18]
with any distribution that has a well-defined mean and
[21]
variance-- and if it has a well-defined variance,
[23]
it has a well-defined standard deviation.
[25]
And it could be a continuous distribution or a discrete one.
[27]
I'll draw a discrete one, just because it's easier
[29]
to imagine, at least for the purposes of this video.
[33]
So let's say I have a discrete probability distribution
[36]
function.
[37]
And I want to be very careful not
[38]
to make it look anything close to a normal distribution.
[41]
Because I want to show you the power of the central limit
[43]
theorem.
[44]
So let's say I have a distribution.
[45]
Let's say it could take on values 1 through 6.
[47]
1, 2, 3, 4, 5, 6.
[50]
It's some kind of crazy dice.
[52]
It's very likely to get a 1.
[54]
Let's say it's impossible-- well,
[55]
let me make that a straight line.
[56]
You have a very high likelihood of getting a 1.
[58]
Let's say it's impossible to get a 2.
[60]
Let's say it's an OK likelihood of getting a 3 or a 4.
[63]
Let's say it's impossible to get a 5.
[64]
And let's say it's very likely to get a 6 like that.
[67]
So that's my probability distribution function.
[70]
If I were to draw a mean-- this is symmetric,
[72]
so maybe the mean would be something like that.
[74]
The mean would be halfway.
[76]
So that would be my mean right there.
[77]
The standard deviation maybe would
[79]
look-- it would be that far and that
[80]
far above and below the mean.
[82]
But that's my discrete probability distribution
[85]
function.
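The crazy distribution described above can be sketched in a few lines of Python. The exact probabilities here are an assumption -- the video only specifies the shape (1 and 6 very likely, 2 and 5 impossible, 3 and 4 somewhere in between) -- but any symmetric choice like this one gives a mean of 3.5:

```python
import numpy as np

# A "crazy dice" distribution like the one in the video: very likely
# to roll a 1 or a 6, impossible to roll a 2 or a 5, and a moderate
# chance of a 3 or a 4. These probabilities are assumed -- the video
# only sketches the shape.
values = np.array([1, 2, 3, 4, 5, 6])
probs = np.array([0.35, 0.0, 0.15, 0.15, 0.0, 0.35])

mean = np.sum(values * probs)               # population mean (3.5 by symmetry)
var = np.sum(probs * (values - mean) ** 2)  # population variance
std = np.sqrt(var)                          # population standard deviation

print(mean, std)
```

Because the assumed probabilities are symmetric around 3.5, the mean lands exactly halfway between 1 and 6, just as the video describes.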
[86]
Now what I'm going to do here, instead of just taking
[88]
samples of this random variable that's
[90]
described by this probability distribution function,
[93]
I'm going to take samples of it.
[95]
But I'm going to average the samples
[97]
and then look at those samples and see
[99]
the frequency of the averages that I get.
[101]
And when I say average, I mean the mean.
[104]
Let me define something.
[105]
Let's say my sample size-- and I could put any number here.
[108]
But let's say first off we try a sample size of n is equal to 4.
[117]
And what that means is I'm going to take four samples from this.
[120]
So let's say the first time I take four samples--
[123]
so my sample size is four-- let's say I get a 1.
[125]
Let's say I get another 1.
[127]
And let's say I get a 3.
[129]
And I get a 6.
[130]
So that right there is my first sample of sample size 4.
[134]
I know the terminology can get confusing.
[136]
Because this is the sample that's made up of four samples.
[139]
But then when we talk about the sample mean and the sampling
[143]
distribution of the sample mean, which we're
[145]
going to talk more and more about over the next few videos,
[147]
normally the sample refers to the set of samples
[152]
from your distribution.
[153]
And the sample size tells you how many you actually
[155]
took from your distribution.
[157]
But the terminology can be very confusing,
[159]
because you could easily view one of these as a sample.
[162]
But we're taking four samples from here.
[164]
We have a sample size of four.
[165]
And what I'm going to do is I'm going to average them.
[168]
So let's say the mean-- I want to be very careful when
[170]
I say average.
[171]
The mean of this first sample of size 4 is what?
[175]
1 plus 1 is 2.
[176]
2 plus 3 is 5.
[178]
5 plus 6 is 11.
[179]
11 divided by 4 is 2.75.
[186]
That is my first sample mean for my first sample of size 4.
[191]
Let me do another one.
[192]
My second sample of size 4, let's say that I get a 3, a 4.
[199]
Let's say I get another 3.
[200]
And let's say I get a 1.
[201]
I just didn't happen to get a 6 that time.
[203]
And notice I can't get a 2 or a 5.
[205]
It's impossible for this distribution.
[207]
The chance of getting a 2 or 5 is 0.
[208]
So I can't have any 2s or 5s over here.
[211]
So for the second sample of sample size 4,
[217]
my second sample mean is going to be 3 plus 4 is 7.
[222]
7 plus 3 is 10 plus 1 is 11.
[226]
11 divided by 4, once again, is 2.75.
[229]
Let me do one more, because I really
[231]
want to make it clear what we're doing here.
[233]
So I do one more.
[233]
Actually, we're going to do a gazillion more.
[235]
But let me just do one more in detail.
[237]
So let's say my third sample of sample size 4--
[241]
so I'm going to literally take 4 samples.
[243]
So my sample is made up of 4 samples
[245]
from this original crazy distribution.
[248]
Let's say I get a 1, a 1, and a 6 and a 6.
[253]
And so my third sample mean is going to be 1 plus 1 is 2.
[258]
2 plus 6 is 8.
[260]
8 plus 6 is 14.
[261]
14 divided by 4 is 3 and 1/2.
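The arithmetic for the three worked samples above can be checked directly:

```python
# The three samples of size 4 worked through in the video,
# and their sample means.
samples = [
    [1, 1, 3, 6],  # first sample:  11 / 4 = 2.75
    [3, 4, 3, 1],  # second sample: 11 / 4 = 2.75
    [1, 1, 6, 6],  # third sample:  14 / 4 = 3.5
]

sample_means = [sum(s) / len(s) for s in samples]
print(sample_means)  # [2.75, 2.75, 3.5]
```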
[269]
And as I find each of these sample
[272]
means-- so for each of my samples of sample size 4,
[275]
I figure out a mean.
[276]
And as I do each of them, I'm going
[278]
to plot it on a frequency distribution.
[280]
And this is all going to amaze you in a few seconds.
[284]
So I plot this all on a frequency distribution.
[286]
So I say, OK, on my first sample,
[289]
my first sample mean was 2.75.
[292]
So I'm plotting the actual frequency of the sample
[294]
means I get for each sample.
[295]
So 2.75, I got it one time.
[298]
So I'll put a little plot there.
[299]
So that's from that one right there.
[302]
And the next time, I also got a 2.75.
[304]
That's a 2.75 there.
[306]
So I got it twice.
[308]
So I'll plot the frequency right there.
[310]
Then I got a 3 and 1/2.
[311]
So all the possible values, I could have a 3,
[313]
I could have a 3.25, I could have a 3 and 1/2.
[316]
So then I have the 3 and 1/2, so I'll plot it right there.
[319]
And what I'm going to do is I'm going
[320]
to keep taking these samples.
[322]
Maybe I'll take 10,000 of them.
[325]
So I'm going to keep taking these samples.
[327]
So I go all the way to S 10,000.
[329]
I just do a bunch of these.
[331]
And what it's going to look like over time is each of these--
[333]
I'm going to make it a dot, because I'm
[335]
going to have to zoom out.
[337]
So if I look at it like this, over time-- it still
[341]
has all the values that it might be able to take on,
[343]
2.75 might be here.
[345]
So this first dot is going to be-- this one
[348]
right here is going to be right there.
[350]
And that second one is going to be right there.
[352]
Then that one at 3.5 is going to look right there.
[356]
But I'm going to do it 10,000 times.
[357]
Because I'm going to have 10,000 dots.
[359]
And let's say as I do it, I'm going to just keep plotting them.
[361]
I'm just going to keep plotting the frequencies.
[364]
I'm just going to keep plotting them
[365]
over and over and over again.
[368]
And what you're going to see is, as I take
[369]
many, many samples of size 4, I'm
[372]
going to have something that's going
[374]
to start kind of approximating a normal distribution.
[378]
So each of these dots represents an instance of a sample mean.
[382]
So as I keep adding on this column right here,
[384]
that means I kept getting the sample mean 2.75.
[387]
So over time,
[388]
I'm going to have something that's
[390]
starting to approximate a normal distribution.
[392]
And that is a neat thing about the central limit theorem.
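The whole 10,000-sample experiment can be simulated. Again, the exact probabilities for the crazy dice are my assumption; the point is that even with this decidedly non-normal starting distribution, the frequency plot of the sample means piles up into a bell shape:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed probabilities for the "crazy dice" -- the video only
# sketches the shape (1 and 6 very likely, 2 and 5 impossible).
values = [1, 2, 3, 4, 5, 6]
probs = [0.35, 0.0, 0.15, 0.15, 0.0, 0.35]

n = 4            # sample size
trials = 10_000  # number of samples, i.e. one dot per sample mean

# Draw 10,000 samples of size 4 and take the mean of each one.
draws = rng.choice(values, size=(trials, n), p=probs)
sample_means = draws.mean(axis=1)

# Tally how often each possible sample mean occurs -- this is the
# frequency plot the video builds dot by dot.
means, counts = np.unique(sample_means, return_counts=True)
for m, c in zip(means, counts):
    print(f"{m:5.2f}: {'*' * (c // 100)}")
```

The printed histogram bulges in the middle near 3.5 and tapers off toward 1 and 6, even though the underlying distribution is bimodal.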
[399]
So in orange, that's the case for n is equal to 4.
[402]
This was a sample size of 4.
[405]
Now, if I did the same thing with a sample size of maybe
[408]
20-- so in this case, instead of just taking 4 samples
[411]
from my original crazy distribution, every sample
[415]
I take 20 instances of my random variable,
[418]
and I average those 20.
[420]
And then I plot the sample mean on here.
[422]
So in that case, I'm going to have
[424]
a distribution that looks like this.
[426]
And we'll discuss this in more videos.
[428]
But it turns out if I were to plot 10,000 of the sample
[432]
means here, I'm going to have something
[434]
that, two things-- it's going to even more closely approximate
[437]
a normal distribution.
[438]
And we're going to see in future videos,
[440]
it's actually going to have a smaller-- well,
[442]
let me be clear.
[443]
It's going to have the same mean.
[445]
So that's the mean.
[446]
This is going to have the same mean.
[448]
So it's going to have a smaller standard deviation.
[451]
Well, I should plot these from the bottom
[453]
because you kind of stack it.
[454]
Once you get one, then another instance and another instance.
[457]
But this is going to more and more approach
[458]
a normal distribution.
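The two claims here -- same mean, smaller standard deviation -- can be checked by simulating sample means for n = 4 and n = 20 side by side (again using assumed probabilities for the crazy dice). The spread of the sample mean shrinks like sigma divided by the square root of n:

```python
import numpy as np

rng = np.random.default_rng(1)

values = [1, 2, 3, 4, 5, 6]
probs = [0.35, 0.0, 0.15, 0.15, 0.0, 0.35]  # assumed shape

def simulate_means(n, trials=10_000):
    """Means of `trials` samples, each of size n, from the crazy dice."""
    draws = rng.choice(values, size=(trials, n), p=probs)
    return draws.mean(axis=1)

means_4 = simulate_means(4)
means_20 = simulate_means(20)

# Same center, tighter spread: the standard deviation of the sample
# mean shrinks like sigma / sqrt(n), so the n=20 spread should be
# roughly sqrt(20/4), about 2.2 times, smaller than the n=4 spread.
print(means_4.mean(), means_20.mean())
print(means_4.std(), means_20.std())
```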
[460]
So this is what's super cool about the central limit
[464]
theorem.
[465]
As your sample size becomes larger--
[473]
or you could even say as it approaches infinity.
[475]
But you really don't have to get that close
[476]
to infinity to really get close to a normal distribution.
[478]
Even if you have a sample size of 10 or 20,
[481]
you're already getting very close to a normal distribution,
[484]
in fact about as good an approximation
[486]
as we see in our everyday life.
[487]
But what's cool is we can start with some crazy distribution.
[491]
This has nothing to do with a normal distribution.
[495]
This was n equals 4, but if we have a sample size of n
[497]
equals 10 or n equals 100, and we
[499]
were to take 100 of these, instead of four here,
[502]
and average them and then plot that average,
[504]
the frequency of it, then we take 100 again, average them,
[507]
take the mean, plot that again, and if we
[508]
do that a bunch of times, in fact,
[510]
if we were to do that an infinite number of times,
[512]
we would find that, especially
[513]
if we had an infinite sample size,
[515]
we would find a perfect normal distribution.
[517]
That's the crazy thing.
[519]
And it doesn't apply just to taking the sample mean.
[522]
Here we took the sample mean every time.
[524]
But you could have also taken the sample sum.
[526]
The central limit theorem would have still applied.
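The same experiment works with sums instead of means -- only the sketch below takes the sum of each sample (with the same assumed probabilities for the crazy dice), and the sums pile up into a bell shape centered at n times the population mean:

```python
import numpy as np

rng = np.random.default_rng(2)

values = [1, 2, 3, 4, 5, 6]
probs = [0.35, 0.0, 0.15, 0.15, 0.0, 0.35]  # assumed shape

n = 20
draws = rng.choice(values, size=(10_000, n), p=probs)
sample_sums = draws.sum(axis=1)  # sample sum instead of sample mean

# The sums also approximate a normal distribution, centered at
# n * mu -- with these assumed probabilities, around 20 * 3.5 = 70.
print(sample_sums.mean())
```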
[528]
But that's what's so super useful about it.
[531]
Because in life, there's all sorts of processes out there,
[534]
proteins bumping into each other, people doing
[537]
crazy things, humans interacting in weird ways.
[540]
And you don't know the probability distribution
[542]
functions for any of those things.
[544]
But what the central limit theorem
[545]
tells us is if we add a bunch of those actions
[548]
together, assuming that they all have the same distribution,
[551]
or if we were to take the mean of all of those actions
[553]
together, and if we were to plot the frequency of those means,
[556]
we do get a normal distribution.
[558]
And that's frankly why the normal distribution shows up
[562]
so much in statistics and why, frankly, it's
[565]
a very good approximation for the sum
[568]
or the means of a lot of processes.
[571]
Normal distribution.
[573]
What I'm going to show you in the next video is I'm actually
[576]
going to show you that this is a reality, that as you increase
[579]
your sample size, as you increase your n,
[581]
and as you take a lot of sample means,
[583]
you're going to have a frequency plot that looks very, very
[585]
close to a normal distribution.