The standard error, Clearly Explained!!! - YouTube

Channel: StatQuest with Josh Starmer

[0]
StatQuest StatQuest StatQuest
[7]
hello and welcome to StatQuest this
[10]
time we're gonna talk about standard
[12]
errors and we're also gonna have a
[14]
bootstrapping bonus we'll start by
[17]
talking about error bars which are very
[19]
closely related to standard errors for
[23]
example you might collect measurements
[25]
from three samples labeled A, B, and C and
[29]
plot them on a scatter plot just like we
[31]
see here
[33]
you could then calculate the means for
[36]
the three data sets and we've illustrated
[38]
those here with three green horizontal
[41]
bars approximately halfway up the
[44]
clusters of data points after that we
[48]
could calculate the standard deviations
[50]
and add those to the graph and we've
[51]
shown those here with red error bars in
[55]
manuscripts and presentations people
[58]
often don't display the original data
[60]
but instead just show the mean and the
[62]
standard deviation in what's called a
[64]
dynamite plot because each column in the
[67]
plot looks like it's the igniter for a
[69]
stick of dynamite
[72]
there are three common types of error
[75]
bars the first type is standard
[77]
deviations which we just saw and I'm
[80]
sure you're all familiar with these tell
[82]
you how the data are distributed around
[84]
the mean big standard deviations tell
[87]
you that some of the data points were
[88]
pretty far from the mean in most cases
[91]
you want to use standard deviations in
[93]
your graphs since they tell us about your
[95]
data the data points that you collected
[98]
yourself the second type of error bar
[101]
comes from standard errors these tell
[104]
you how the mean is distributed not just
[107]
the data but the means which sounds
[109]
crazy but it'll become clear once I draw
[112]
some pictures the third common type of
[115]
error bar is confidence intervals and
[118]
these are related to standard errors
[120]
confidence intervals will be explained
[122]
more in a future StatQuest since this
[125]
StatQuest is all about standard errors
[128]
that's what we're going to talk about
[131]
let's start by considering a normal
[134]
distribution in this case we can imagine
[137]
that we weighed a lot of mice and
[138]
plotted the distribution of differences
[141]
from the mean the y-axis is the
[145]
proportion of the mice that we weighed
[146]
and the x-axis is the difference from the
[149]
mean most of the mice had weights close
[153]
to the average a few of the mice weighed
[156]
much less than the average mouse and a
[159]
few other mice weighed much more than
[161]
the average mouse usually you can't
[165]
afford to measure the weight of all the
[167]
mice so you just take a sample in this
[170]
example we'll just assume we took five
[172]
measurements from the population rather
[175]
than measuring all the mice since most
[178]
of the mice have weights close to the
[180]
average most of our samples are going to
[183]
be close to zero now just like we always
[187]
do we can calculate the mean and
[189]
standard deviation from our sample in
[191]
this case the mean of our sample is
[195]
-0.2 and the standard
[197]
deviation is 1.923 and
[199]
we can plot the mean and standard
[202]
deviation on our graph as the mean
[205]
plus or minus the standard deviation
[207]
around the mean and for all you
[210]
StatQuesters out there here's a rule of
[212]
thumb remember that one standard
[215]
deviation on each side of the mean is
[217]
supposed to cover about 68% of the data
[219]
two standard deviations on each side of
[222]
the mean is supposed to cover about 95
[224]
percent of the data this will come in
[226]
handy later
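
The 68% and 95% figures in this rule of thumb can be checked for a normal distribution. The snippet below is a minimal sketch, not something shown in the video; the function name `fraction_within` is made up for illustration. It uses the error function from Python's standard library to get the fraction of a normal distribution that lies within k standard deviations of the mean.

```python
import math


def fraction_within(k: float) -> float:
    """Fraction of a normal distribution within k standard deviations of its mean."""
    # For a normal distribution, P(|X - mean| < k * sd) = erf(k / sqrt(2)).
    return math.erf(k / math.sqrt(2))


print(round(fraction_within(1), 4))  # 0.6827, i.e. about 68%
print(round(fraction_within(2), 4))  # 0.9545, i.e. about 95%
```
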
[228]
the mean is now a lighter color because
[231]
we're going to take additional samples
[233]
and overlay additional means and
[235]
standard deviations on this same graph
[237]
here we've taken another five
[240]
measurements and from those five
[242]
measurements we've calculated the mean
[244]
and the standard deviation and here
[247]
we've plotted that mean plus or minus
[249]
one standard deviation on each side and
[252]
now we take another five measurements
[254]
this is the first sample where one of the
[257]
measurements is relatively extreme
[259]
however that one measurement doesn't
[262]
sway the mean that far from zero that is
[267]
to say the means are relatively close to
[270]
each other compared to the raw data this
[273]
is because for a mean to be far from the
[275]
middle most if not all of the raw data
[277]
points would have to be in a single
[280]
cluster that is far away from the middle
[282]
for example the sample of purple points
[285]
all form a cluster that is far from the
[287]
middle this could happen but very rarely
[292]
what's much more likely is to have a
[295]
sample where most of the points are
[297]
close to zero and only one or two are
[300]
far away so far we've shown that you can
[304]
calculate the standard deviations for
[306]
each sample but now that we have three
[309]
means we can also calculate the standard
[312]
deviation of those means because one
[315]
standard deviation will cover 68% of the
[318]
values and two will cover 95% of the
[321]
values the standard deviation of the
[323]
means won't be as wide as the standard
[325]
deviations of the data here we've
[329]
plotted the mean of the means plus or
[331]
minus one standard deviation of the
[333]
means notice that this standard
[336]
deviation is much smaller than the
[338]
standard deviations we got from the
[340]
individual samples the standard
[344]
deviation of the mean is called the
[346]
standard error of the mean or more
[348]
simply the standard error the standard
[351]
error gives us a sense of how much
[353]
variation we can expect in our means if
[356]
we took a bunch of independent five
[357]
measurement samples so to review this is
[361]
how we calculate the standard
[362]
error of the mean first you take a
[366]
bunch of samples each with the same
[369]
number of measurements or n in this
[371]
case n equals five the second step is
[376]
to calculate the mean for each sample
[378]
here we calculated the mean and standard
[380]
deviation for each sample but for the
[383]
standard error all we need to do is
[385]
calculate the mean once we've calculated
[389]
the means for each sample we can
[391]
calculate the standard deviation of the
[393]
means in this case the standard error
[396]
equals 0.86 here we
[401]
notice that the standard error is much
[403]
less than the standard deviations
[404]
because the means aren't as widely
[406]
dispersed as the raw data we've shown
[410]
how to calculate the standard error of
[412]
the mean but there are other standard
[414]
errors for example we can also take the
[418]
standard deviation of the standard
[420]
deviations this is called the standard
[422]
error of the standard deviations which I
[424]
guess is to avoid a tongue-twister it
[427]
tells us how the standard deviations of
[429]
multiple samples are dispersed you can
[432]
calculate the standard deviation of any
[434]
statistic for example the median the
[436]
mode percentiles or anything anything
[439]
that you can calculate for multiple
[441]
samples you just calculate the standard
[444]
deviation and then you have the standard
[446]
error of that so if we calculated many
[448]
medians we could calculate the standard
[451]
deviation of those medians and we have
[453]
the standard error of those medians to
[457]
summarize everything we've talked about
[459]
so far know that the standard error is
[462]
just the standard deviation of multiple
[465]
means taken from the same population so
[468]
if there's a population and we can take
[470]
a bunch of different samples from it all
[473]
we have to do to get the standard error
[474]
is to calculate the standard deviation
[476]
of the means of each sample well at this
[481]
point you might be wondering if we can
[482]
calculate standard errors without
[484]
spending a lot of time and money on
[486]
doing the same experiment a bunch of
[488]
times the good news is the answer is yes
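
The "repeat the experiment a bunch of times" recipe reviewed above (take many samples of n = 5, calculate each sample's mean, then take the standard deviation of those means) can be sketched as a small simulation. This is only an illustration under stated assumptions: the population is taken to be normal with mean 0 and standard deviation 2, and those numbers, the number of samples, and the function name are all made up rather than taken from the video.

```python
import random
import statistics


def standard_error_from_repeated_samples(num_samples=2000, n=5, seed=0):
    """Standard deviation of the means of many independent n-measurement samples."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    means = []
    for _ in range(num_samples):
        # Step 1: take a sample of n measurements from the (assumed) population.
        sample = [rng.gauss(0.0, 2.0) for _ in range(n)]
        # Step 2: calculate the mean of that sample.
        means.append(statistics.mean(sample))
    # Step 3: the standard deviation of the means is the standard error.
    return statistics.stdev(means)


print(standard_error_from_repeated_samples())
```

With these assumed settings the result comes out well below the population standard deviation of 2, which is the point made earlier: the means are not as widely dispersed as the raw data.
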
[492]
in rare cases there's a formula you can
[495]
use to estimate
[496]
the standard error the standard error of the mean is one such case the
[499]
formula for that is very simple it's
[501]
just the standard deviation divided by
[503]
the square root of the sample size
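
As a rough sketch, that formula (standard deviation divided by the square root of the sample size) looks like this in code. The standard deviation of 1.923 and the sample size of 5 are the numbers quoted earlier for the first sample; the function names and the `example_sample` values are hypothetical and only there for illustration.

```python
import math
import statistics


def sem_from_sd(sd: float, n: int) -> float:
    """Standard error of the mean from a standard deviation and a sample size."""
    return sd / math.sqrt(n)


def sem_from_sample(sample) -> float:
    """Same formula, starting from the raw measurements of one sample."""
    # statistics.stdev is the sample standard deviation (divides by n - 1).
    return statistics.stdev(sample) / math.sqrt(len(sample))


# Numbers quoted in the transcript: standard deviation 1.923, n = 5.
print(round(sem_from_sd(1.923, 5), 2))  # 0.86

# Hypothetical measurements, not from the video.
example_sample = [1.0, 2.0, 3.0, 4.0, 5.0]
print(sem_from_sample(example_sample))
```

The 0.86 here agrees with the standard error of 0.86 quoted earlier.
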
[504]
however there aren't many other cases
[508]
the good news again is that we can use
[511]
something called bootstrapping for
[513]
everything else
[514]
every time we don't have a simple
[516]
formula we can bootstrap it the nice
[518]
thing about bootstrapping is it's very
[520]
simple conceptually and it's easy to
[522]
make a computer do this work here's a
[526]
bootstrapping example just like before
[529]
we have an experiment where we took 5
[531]
measurements as an aside usually for
[534]
bootstrapping it's good to have 10 or
[537]
more measurements in a single experiment
[539]
now we bootstrap our data with the
[544]
following steps first we pick a random
[547]
measurement from the sample that we just
[549]
took this random measurement isn't a new
[552]
measurement that we haven't taken before
[554]
it's not a new number that we haven't
[556]
seen it's part of the sample that we
[558]
already have now we just write that
[561]
value down in this case it's
[565]
1.43 in step three we just go back
[569]
to step one and pick a new random
[571]
measurement and write that value down
[573]
and we do that five times our second
[578]
measurement is -1.38
[580]
the third measurement is
[583]
-3.11 our fourth
[587]
measurement is 1.43
[590]
we've already picked that measurement
[592]
before but that's okay when you're
[595]
bootstrapping you just pick five
[597]
measurements from your sample and it
[600]
doesn't matter if you've picked the same
[601]
one before our last measurement is
[606]
-0.10 step four in
[611]
bootstrapping is to calculate the mean
[613]
median mode or whatever statistic it
[616]
is we're interested in understanding the
[618]
standard error of and we calculate that
[620]
with our sample in this case we're
[623]
interested in the standard error of the
[625]
mean so all we do is calculate the mean
[628]
from our new bootstrap sample the fifth
[631]
step is to go all the way back to the
[633]
beginning step one and repeat that until
[636]
you have a lot of means or medians or
[639]
whatever you're interested in
[640]
calculating the standard error of the
[643]
sixth and final step in the
[645]
bootstrapping procedure is to simply
[647]
calculate the standard deviation of all
[650]
the means that we generated in steps one
[652]
through five that's all there is to it
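
The six steps above can be sketched in a few lines of code. This is a minimal illustration, not the code behind the video: the function name, the number of bootstrap repeats, and the seed are all assumptions. Four of the values in `measurements` are the ones read out above; the fifth (0.85) is a made-up placeholder, since the transcript never states the full sample.

```python
import random
import statistics


def bootstrap_standard_error(sample, statistic=statistics.mean,
                             num_resamples=10_000, seed=0):
    """Bootstrap estimate of the standard error of `statistic` for one sample."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n = len(sample)
    estimates = []
    for _ in range(num_resamples):  # step 5: repeat many times
        # Steps 1-3: pick n random measurements from the sample we already
        # have, writing each one down; picking the same one twice is fine.
        resample = [rng.choice(sample) for _ in range(n)]
        # Step 4: calculate the statistic we care about on the bootstrap sample.
        estimates.append(statistic(resample))
    # Step 6: the standard deviation of all those values is the standard error.
    return statistics.stdev(estimates)


# 1.43, -1.38, -3.11 and -0.10 are read out in the transcript;
# 0.85 is a hypothetical placeholder for the unstated fifth measurement.
measurements = [1.43, -1.38, -3.11, -0.10, 0.85]

print(bootstrap_standard_error(measurements))                               # mean
print(bootstrap_standard_error(measurements, statistic=statistics.median))  # median
```

Passing a different function as `statistic` is what makes this work for medians or anything else that has no simple formula.
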
[655]
in this case we calculated the standard
[657]
error of the mean and we've plotted it
[660]
as a black line in the graph so if
[663]
there's no fancy formula to help us
[665]
calculate the standard error we can do
[667]
it ourselves from scratch we can just
[669]
use bootstrapping and get the job done
[671]
and that's it in this StatQuest we
[676]
learned that the standard error is a
[677]
measure of how we might expect the means
[680]
from many different samples to vary from
[683]
one sample to another we also learned
[686]
that if we don't have a fancy formula
[688]
for calculating the standard error we
[690]
can do it ourselves using bootstrapping
[693]
okay so tune in next time and we'll talk
[696]
about how to use bootstrapping to
[698]
calculate confidence intervals and
[700]
that's when things get really cool