Root mean square deviation (RMSD) - YouTube

Channel: Khan Academy

So we are interested in studying the relationship between the amount that folks study for a test and their score on that test, where the score is between zero and six. What we're going to do is look at the people who took the test, and we're going to plot, for each person, the amount that they studied and their score. So, for example, this data point is someone who studied an hour, and they got a one on the test. Then we're going to fit a regression line, and this blue regression line is the actual regression line for these four data points, and here is the equation for that regression line. Now, there are a couple of things to keep in mind. Normally, when you're doing this type of analysis, you do it with far more than four data points. The reason why I kept this to four is that we're actually going to calculate how good a fit this regression line is by hand, and typically you would not do it by hand; we have computers for that.
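The video doesn't show how the computer finds that blue line, but as a sketch, the least-squares slope and intercept for the four points shown, (1, 1), (2, 2), (2, 3), and (3, 6), can be computed from the standard textbook formulas (plain Python; the variable names are mine):

```python
# Least-squares fit: slope = S_xy / S_xx, intercept = y_bar - slope * x_bar.
xs = [1, 2, 2, 3]  # hours studied
ys = [1, 2, 3, 6]  # test scores

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Sums of cross-deviations and squared deviations from the means.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
s_xx = sum((x - x_bar) ** 2 for x in xs)

slope = s_xy / s_xx                 # 2.5
intercept = y_bar - slope * x_bar   # -2.0
print(slope, intercept)
```

This recovers the equation of the regression line in the video, y-hat = 2.5x minus 2.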
Now, the way that we're going to measure how good a fit this regression line is to the data has several names. One name is the standard deviation of the residuals. Another name is the root mean square deviation, sometimes abbreviated RMSD; sometimes it's called the root mean square error. What we're going to do is, for every point, calculate the residual, square it, and then add up the sum of those squared residuals. So we're going to take the sum of the residuals squared, and then we're going to divide that by the number of data points we have, minus two. We can talk in future videos, or in a more advanced statistics class, about why you divide by n minus 2, but it's related to the idea that what we're calculating here is a statistic, and we're trying to estimate a true parameter as best as possible, and n minus 2 actually does the trick for us. But to calculate the root mean square deviation, we would then take the square root of this.
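The measure just described, the square root of the summed squared residuals divided by n minus 2, can be sketched as a small Python function (the function name and the toy data below are mine, not from the video):

```python
import math

def rmsd(xs, ys, predict):
    """Root mean square deviation of a fitted line against data points."""
    # Each residual is the actual y minus the line's prediction for that x.
    squared = [(y - predict(x)) ** 2 for x, y in zip(xs, ys)]
    # Divide the sum of squared residuals by n - 2, then take the square root.
    return math.sqrt(sum(squared) / (len(xs) - 2))

# A line that passes through every point has all residuals 0, so RMSD is 0.
print(rmsd([0, 1, 2], [1, 3, 5], lambda x: 2 * x + 1))  # 0.0
```

The closer the result is to zero, the better the line fits the points, which matches the interpretation given later in the video.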
Some of you might recognize strong parallels between this and how we calculated the sample standard deviation early in our statistics career, and I encourage you to think about it. But let's actually calculate it by hand, as I mentioned earlier in this video, to see how things actually play out. To do that, I'm going to give ourselves a little table here. Let's say that is our x value in that column, let's make this our y value, and let's make this y hat, which is going to be equal to 2.5x minus 2. And then let's make this the residual squared, which is going to be our y value minus our y hat value, our actual minus our estimate for that given x, squared. And then we're going to sum them all up, divide by n minus 2, and take the square root.
So first, let's do this data point: that's the point (1, 1). Now, what is the estimate from our regression line for that x value? When x is equal to 1, it's going to be 2.5 times 1, minus 2, which is equal to 0.5. And so our residual squared is going to be (1 minus 0.5) squared, which is 0.5 squared, which is going to be 0.25.
All right, let's do the next data point. We have this one right over here: it is (2, 2). Now, our estimate from the regression line when x equals 2 is going to be 2.5 times 2, minus 2, which is going to be equal to 3. And so our residual squared is going to be (2 minus 3) squared, which is negative 1 squared, which is going to be equal to 1.
Then we can go to this point: that's the point (2, 3). Now, our estimate from our regression line is going to be 2.5 times 2, minus 2, which is going to be equal to 3. And so our residual here is going to be zero, and you can see that that point sits on the regression line. So it's going to be (3 minus 3) squared, which is equal to 0. And then, last but not least, we have this point right over here. When x is 3, this person studied 3 hours and they got a 6 on the test, so y is equal to 6. And so our estimate from the regression line, you could say what you would have expected to get based on that regression line, is 2.5 times 3, minus 2, which is equal to 5.5. And so our residual squared is (6 minus 5.5) squared, so it's 0.5 squared, which is 0.25.
So now, for the next step, let me take the sum of all of these squared residuals. The sum of the residuals squared, if I just sum all of this up, is going to be 1.5. Then I divide that by n minus 2; I have four data points, so I'm going to divide by 4 minus 2, so I'm going to divide by 2. And then I'm going to want to take the square root of that. 1.5 over 2 is the same thing as 3/4, so it's the square root of three-fourths, or the square root of 3 over 2, and you could use a calculator to figure out what that is as a decimal. But this gives us a sense of how good a fit this regression line is: the closer this is to zero, the better the fit of the regression line; the further away from zero, the worse the fit. And what would be the units for the root mean square deviation? Well, it would be in terms of whatever your units are for your y-axis; in this case, it would be the score on the test. And that's one of the other values of this calculation of taking the square root of the sum of the squares of the residuals divided by n minus 2.
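The video suggests a calculator for the decimal value; the whole hand table can also be checked in a few lines of Python (a sketch; the variable names are mine):

```python
import math

y_hat = lambda x: 2.5 * x - 2            # the video's regression line
points = [(1, 1), (2, 2), (2, 3), (3, 6)]  # (hours studied, test score)

# One squared residual per point, matching the last column of the table.
squared_residuals = [(y - y_hat(x)) ** 2 for x, y in points]
# These come out to 0.25, 1.0, 0.0, and 0.25, summing to 1.5.

# Divide by n - 2 = 2 and take the square root.
rmsd = math.sqrt(sum(squared_residuals) / (len(points) - 2))
print(rmsd)  # sqrt(3)/2, about 0.866
```

So the root mean square deviation here is about 0.866, in units of test-score points.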