4.7.1 Law Of Large Numbers: Video - YouTube

Channel: MIT OpenCourseWare

The law of large numbers gives a precise, formal statement of the basic intuitive idea that underlies probability theory, and in particular our interest in random variables and their expectations, their means. So let's begin by asking what the mean means. Why are we so interested in it? For example, if you roll a fair die with faces one through six, the mean value, its expected value, is three and a half, and you'll never roll three and a half, because there is no three-and-a-half face. So why do we care about what this mean is if we're never going to roll it? And the answer is that we believe that after many rolls, if we take the average of the numbers that show on the dice, that average is going to be near the mean; it's going to be near three and a half.

Let's look at an even more basic example. If it's a fair die, the probability of rolling a six, as with any other number, is 1/6, and the very meaning of the fact that the probability of rolling a six is 1/6 is that we expect that if you roll a lot of times, if you roll about n times, the fraction of sixes is going to be around n over 6. The fraction of sixes is going to be about one-sixth; that is, of n rolls you'll get about n over 6 sixes. That's almost the definition, or the intuitive idea, behind what we mean when we assign probability to some outcome: it's that if we did it repeatedly, the fraction of times that it came up would be equal to its probability, or at least closely equal to it in the long run
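This long-run frequency idea is easy to see numerically. Here is a small Python sketch (my own illustration, not part of the lecture) that simulates n rolls of a fair die and reports the fraction of sixes; as n grows, the fraction settles near 1/6, about 0.167.

```python
import random

def fraction_of_sixes(n, seed=0):
    """Simulate n rolls of a fair die and return the fraction that show a six."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    sixes = sum(1 for _ in range(n) if rng.randint(1, 6) == 6)
    return sixes / n

for n in (60, 600, 6000, 60000):
    print(n, round(fraction_of_sixes(n), 4))
```

Each printed fraction is a single random experiment, so individual runs wobble, but the wobble shrinks as n grows, which is exactly the effect the law of large numbers describes.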
So let's look at what Jacob Bernoulli, who is the discoverer of the law of large numbers, had to say on the subject. He was born in 1655 and died in 1705, and his famous book, The Art of Guessing, Ars Conjectandi, was actually published posthumously by his nephew. And he said, Bernoulli says: "Even the stupidest man, by some instinct of nature, per se and by no previous instruction (this is truly amazing), knows for sure that the more observations that are taken, the less the danger will be of straying from the mark."
All right, what does he mean? Well, it's what we said a moment ago: if you roll the fair die n times, and the probability of rolling a six is 1/6, then consider the average number of sixes, which is the number of sixes rolled divided by n; we believe intuitively that that number is going to approach 1/6 as n approaches infinity.
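In symbols, writing $S_n$ for the number of sixes in $n$ rolls, the intuitive claim is

```latex
\lim_{n \to \infty} \frac{S_n}{n} = \frac{1}{6},
```

where the precise sense of the limit (convergence in probability) is what the weak law of large numbers, stated at the end of this segment, makes formal.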
That's what Bernoulli is saying: that everybody understands that, they intuitively are sure of it, and who knows how they figure that out, but that's what everyone thinks. And he might be right.
Now of course, when you're doing this experiment of rolling n times and counting the number of sixes and seeing if the fraction is close to 1/6, you might be unlucky, and it's possible that you'd get an average that actually was way off 1/6. But that would be unlucky, and the question is, how unlikely is it that you'd get a fraction of sixes that wasn't really close to 1/6? The law of large numbers is getting a grip on that, and in fact subsequently we'll get an even more quantitative grip on it, which will be crucial for applications in sampling and hypothesis testing. But let's go on, so
let's look at some actual numbers which I calculated. If you roll a die n times, where n is 6, 60, 600, 1,200, 3,000, or 6,000, the probability that you're going to be within 10% of the expected number of sixes is given here. It turns out, of course, that if you're going to roll six times, the only way to be within 10% of the one expected six that you should roll is to roll exactly one six in six tries, and the probability of that is about 40%, 0.4, as you can check yourself easily. Then it turns out that if you roll 60 times, the expected number in 60 rolls is going to be 10, so the probability of being within 10 percent of 10, or 9 to 11 sixes, is 0.26. And likewise the probability of being within 10 percent of 100, which is the expected number of sixes when you roll 600 times, is 0.72. And so on, until finally the probability of being within 10 percent of a thousand, which is the expected number when you roll 6,000 times, that is, between 900 and 1,100 sixes in 6,000 rolls, is 0.999, triple nines; in fact it's a little bit bigger. So it's really only about one chance in a thousand that your number of sixes won't fall in that interval within ten percent
of the expected number. Well, suppose I asked for a tighter tolerance, and I'd like to know, what's the probability of being within five percent? Well, first of all, notice of course that as the number of rolls gets larger, the probability of being in this given interval is getting higher and higher, which is what Bernoulli said and what we intuitively believe: the more rolls, the more likely you are to be close to what you expect. If you tighten the tolerance, of course, then the probabilities that you'll do so well wind up getting smaller. So if you want to be within 5% of the average in six rolls, it means you still have to roll exactly one six, which means the probability is still 0.4. But if you're trying to be within 5% of the expected number 10 in 60 rolls, meaning essentially exactly 10 sixes, that probability is only 0.14, compared to the probability of 0.26 of being within 10%. And if we jump down here, say, to 3,000 rolls, the probability of being within 10% of 500, which is the expected number in 3,000 rolls, is 98%, nine-eight, but being within 5% of 500, it's 0.78, or a little over 3/4. So what does
that tell us? Well, it means that if you rolled three thousand times and you did not get within ten percent of the expected number of 500, that is, you did not get in the interval between 450 and 550 sixes, you could be 98% confident that your die is loaded: it's not weighted 1/6 to show a six. And similarly, if you did not get between 475 and 525 sixes in 3,000 rolls, you can be 78% sure that your die is loaded. And this is exactly why the law of large numbers is so important to us: because it allows us to do an experiment and then assess whether what we think is true is verified by the outcome that we got in this experiment. All right, let's go on to
[478]
see what else Bernoulli was concerned
[481]
with in his time it certainly remains to
[484]
be inquired whether after the number of
[486]
observations has been increased the
[488]
probability of obtaining the true ratio
[490]
finally exceeds any given degree of
[492]
certainty or whether the problem has so
[496]
to speak its own asymptotes that is
[497]
whether some degree of certainty is
[499]
given which one can never exceed now
that's 17th-century English that may be a little bit hard to parse, so let's translate it into math language. What is it that Bernoulli is asking? What Bernoulli means is that he wants to think about taking a random variable R with an expectation, or mean, of mu, and he wants to make n trial observations of R and take the average of those observations
and see how close it is to mu. All right, what does making n trial observations mean? Well, formally, the way we're going to capture it is we're going to think of having a bunch of mutually independent, identically distributed random variables R1 through Rn. This phrase, independent identically distributed, comes up so often that there's a standard abbreviation: iid random variables. So we're going to have n of them, and think of those as being the n observations that we make of a given random variable R. So R1 through Rn each have exactly the same distribution as R, and they're mutually independent, and again, since they have identical distributions, they all have the same mean mu as the random variable R that we were trying to investigate. So we model n independent trials, repeated trials, by saying that we have n random variables that are iid. OK, now what Bernoulli is proposing is that you take the average of those n random variables: you take the sum of R1 plus R2 up through Rn and then divide by n; that's the average value. Call that A sub n, the average of the n observations, of the n rolls,
and Bernoulli's question is: is this average probably close to the mean mu if n is big? What exactly does that mean? "Probably close to mu" means: the probability that the distance between the average and mu is less than or equal to delta is what? So delta is talking about how close you are; delta is a parameter, and we expect it's got to be positive. We're asking, think of whatever close means to you: does it mean 0.1, does it mean 0.01? What amount would persuade you that the average was close to what it ought to be? And we ask, then, whether the distance between the average and the mean is less than or equal to delta, and Bernoulli wants to know, what is the probability of that?
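Bernoulli's quantity, the probability that the distance between A sub n and mu is at most delta, can be estimated by simulation. In this sketch (an illustration under my own choices, not the lecture's), R is one roll of a fair die, so mu is 3.5, and we estimate the probability that the average of n rolls lands within delta = 0.1 of 3.5:

```python
import random

def prob_close(n, delta=0.1, mu=3.5, trials=2000, seed=1):
    """Estimate P(|A_n - mu| <= delta), where A_n averages n fair-die rolls."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        avg = sum(rng.randint(1, 6) for _ in range(n)) / n
        if abs(avg - mu) <= delta:
            hits += 1
    return hits / trials

for n in (10, 100, 1000):
    print(n, prob_close(n))
```

The estimates climb toward 1 as n grows, which is exactly the behavior Bernoulli conjectured and the weak law confirms.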
And what he goes on to say is: "Therefore, this is the problem which I now set forth and make known after I have pondered over it for twenty years. Both its novelty and its very great usefulness, coupled with its great difficulty, can exceed in weight and value all the remaining chapters of this thesis." Now, Bernoulli was right on about the usefulness of this result, at least in its quantitative form, and at the time it was really pretty difficult for him: it took him something like 200 pages to complete his proof in Ars Conjectandi. Nowadays we are going to do it in about a lecture's worth of material, and you'll be seeing that in some subsequent video segments. So that's what happens with 300 years, two or three hundred and fifty years, to tune up a result: what took 200 pages then now takes 10 or fewer pages; in fact, if it was really concise, it could be done in three pages. All right, so
again coming back to Bernoulli's question: Bernoulli's question is, what is the probability that the distance between the average and the mean is less than or equal to delta, as you take more and more tries, as n goes to infinity? And Bernoulli's answer to the question is that the probability is 1. That is, if you want to have a certain degree of certainty of being close to the mean, within delta, then if you take enough trials, you can be as certain as you want that you'll be as close as you want. And that is called the weak law
of large numbers, and it's one of the basic, transcendent rules and theorems of probability theory. It's usually stated the other way: that the limit of the probability that the average is a distance more than delta away from the mean is 0. It's extremely unlikely, and it can be as unlikely as you want to make it, that the average is more than any given tolerance from the mean, if you take a large enough number of trials. Now, in this form it's not yet really useful; this is a romantic, qualitative limiting result, and to really use it you need to know something about the rate at which it approaches the limit, which is what we're going to be seeing in a subsequent video. And in fact, the proof of this is going to follow easily from the Chebyshev inequality bound and variance properties when we go about trying to get the quantitative version that explains the rate at which the limit is approached.
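As a preview, here is the shape of that argument in modern notation, assuming R (and hence each R_i) has finite variance sigma squared. Averaging n iid variables divides the variance by n, and Chebyshev's inequality turns that shrinking variance into the weak law:

```latex
\operatorname{Var}(A_n) \;=\; \frac{\sigma^2}{n},
\qquad
\Pr\bigl[\,|A_n - \mu| > \delta\,\bigr]
\;\le\; \frac{\operatorname{Var}(A_n)}{\delta^2}
\;=\; \frac{\sigma^2}{n\delta^2}
\;\longrightarrow\; 0
\quad\text{as } n \to \infty .
```

This bound is the quantitative handle on the rate of convergence that the qualitative limit statement by itself does not provide.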