L18.4 The Weak Law of Large Numbers

Channel: MIT OpenCourseWare

In this segment, we derive and discuss the weak law of large numbers. It is a rather simple result, but it plays a central role within probability theory.

The setting is as follows. We start with some probability distribution that has a certain mean and variance, which we assume to be finite. We then draw independent random variables out of this distribution, so that these X_i's are independent and identically distributed (i.i.d. for short). What is going on here is that we are carrying out a long experiment during which all of these random variables are drawn. Once we have drawn n of these random variables, we can calculate the average of the values that have been obtained, and this gives us the so-called sample mean.
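In symbols, writing M_n for the sample mean of the first n draws (the symbol is our notational choice here; the spoken segment does not fix one), the definition is

    M_n = \frac{X_1 + X_2 + \cdots + X_n}{n} = \frac{1}{n} \sum_{i=1}^{n} X_i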
Notice that the sample mean is a random variable, because it is a function of random variables. It should be distinguished from the true mean, mu, which is the expected value of the X_i's. The true mean is a number; it is not random. Rather, mu is some kind of average over all the possible outcomes of the random variable X_i. The sample mean is the simplest and most natural way of trying to estimate the true mean, and the weak law of large numbers will provide some support for this notion. Let us now look at the properties of the sample mean.
Let us calculate its expectation. By the way, this object involves two different kinds of averaging. The sample mean averages over the values observed during one long experiment, whereas the expectation averages over all the possible outcomes of this experiment. The expectation is some kind of theoretical average, because we do not get to observe all the possible outcomes of the experiment, but the sample mean is something that we actually calculate on the basis of our observations.
In any case, the expected value of the sample mean is, by linearity, the expected value of the numerator divided by the denominator. Using linearity once more, the expected value of a sum is the sum of the expected values, and since each one of those expected values is equal to mu, we obtain n times mu, divided by n, which leaves us with mu. So the theoretical average, the expected value of the sample mean, is equal to the true mean.
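Written out as a chain of equalities, using the M_n notation introduced above, the calculation just described is

    \mathbb{E}[M_n] = \frac{\mathbb{E}[X_1 + \cdots + X_n]}{n} = \frac{\mathbb{E}[X_1] + \cdots + \mathbb{E}[X_n]}{n} = \frac{n\mu}{n} = \mu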
Let us now calculate the variance of the sample mean. The variance of a random variable divided by a number is the variance of that random variable divided by the square of that number. Now, since the X_i's are independent, the variance of the sum is the sum of the variances, and therefore we obtain n times the variance of each one of them. After we simplify, this leaves us with sigma squared over n.
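In symbols, with \sigma^2 denoting the common variance of the X_i's, the two steps are

    \operatorname{var}(M_n) = \frac{\operatorname{var}(X_1 + \cdots + X_n)}{n^2} = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}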
We are now in a position to apply the Chebyshev inequality. The Chebyshev inequality tells us that the probability that a random variable is more than a certain distance away from its mean is bounded above by the variance of the random variable of interest, divided by the square of that distance. We have already calculated the variance, and so this quantity is sigma squared divided by (n times epsilon squared). Now, if we consider epsilon as a fixed number and let n go to infinity, what we obtain is a limiting value of zero.
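Putting the pieces together, the Chebyshev inequality applied to the sample mean reads

    \mathbb{P}\big(|M_n - \mu| \ge \epsilon\big) \le \frac{\operatorname{var}(M_n)}{\epsilon^2} = \frac{\sigma^2}{n \epsilon^2} \longrightarrow 0 \quad \text{as } n \to \infty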
So the probability of falling far from the mean diminishes to zero as we draw more and more samples. That is exactly what the weak law of large numbers tells us: if we fix any particular epsilon, which is a positive constant, then the probability that the sample mean falls away from the true mean by more than epsilon becomes smaller and smaller, and converges to zero, as n goes to infinity.
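As an illustration (ours, not part of the lecture), here is a minimal Python sketch that estimates the probability P(|M_n - mu| >= epsilon) by simulation; the normal distribution and the particular values mu = 2, sigma = 3, epsilon = 0.1 are arbitrary assumptions made only for this demo:

    import numpy as np

    rng = np.random.default_rng(0)       # fixed seed, for reproducibility
    mu, sigma, epsilon = 2.0, 3.0, 0.1   # hypothetical mean, std dev, and tolerance
    trials = 1000                        # independent repetitions of the long experiment

    for n in [10, 100, 1000, 10000]:
        samples = rng.normal(mu, sigma, size=(trials, n))
        m_n = samples.mean(axis=1)       # one sample mean per repetition
        p_far = np.mean(np.abs(m_n - mu) >= epsilon)
        print(f"n = {n:5d}: estimated P(|M_n - mu| >= {epsilon}) = {p_far:.3f}")

The estimated probabilities shrink toward zero as n grows, which is the convergence the theorem asserts.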
Let us now interpret the weak law of large numbers. As I already hinted, we have to think in terms of one long experiment, during which we draw several independent random variables from the same distribution. One way of thinking about those random variables is that each one of them is equal to the true mean plus some measurement noise, a term that has zero expected value, and all of these noises are independent; the equations after this paragraph spell this model out. So we have a collection of noisy measurements, and we take those measurements and form their average. What the weak law of large numbers tells us is that the sample mean is unlikely to be far off from the true mean, where by "far off" we mean at least epsilon distance away. So the sample mean is, in some ways, a good way of estimating the true mean: if n is large enough, then we have high confidence that the sample mean gives us a value that is close to the true mean.
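In equation form, this measurement model is the following, where W_i is our name for the noise term (the segment does not fix a symbol):

    X_i = \mu + W_i, \qquad \mathbb{E}[W_i] = 0, \qquad W_1, W_2, \ldots \text{ independent}

so that the sample mean becomes the true mean plus an averaged-out noise term:

    M_n = \mu + \frac{1}{n} \sum_{i=1}^{n} W_i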
[368]
consider a probabilistic model in which
[371]
will repeat independently many times the
[373]
same experiment there's a certain event
[376]
a associated with that experiment that
[378]
has a certain probability and each time
[380]
that we carry out the experiment we use
[383]
an indicator variable to indicate
[386]
whether the outcome was inside the event
[390]
or outside the event so X I is 1
[395]
a occurs and it is zero otherwise the
[404]
expected value of the X is the true mean
[407]
in this case is equal to the number P in
In this particular example, the sample mean records the fraction of the n experiments we carried out in which event A occurred. So it is the frequency with which event A has occurred, and we call it the empirical frequency of event A. What the weak law of large numbers tells us is that the empirical frequency will be close to the probability of that event. In this sense, it reinforces or justifies the interpretation of probabilities as frequencies.
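To make this special case concrete, here is a small Python sketch (again ours, with p = 0.3 an arbitrary choice) showing the empirical frequency of an event settling near its probability as n grows:

    import numpy as np

    rng = np.random.default_rng(1)   # fixed seed, for reproducibility
    p = 0.3                          # hypothetical probability of event A

    for n in [10, 100, 1000, 10000, 100000]:
        x = rng.random(n) < p        # indicator variables X_1, ..., X_n
        print(f"n = {n:6d}: empirical frequency = {x.mean():.4f} (true p = {p})")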