Regression line example | Regression | Probability and Statistics | Khan Academy - YouTube
Channel: Khan Academy
[0]
In the last several videos,
we did some fairly hairy
[3]
mathematics.
[4]
And you might have even
skipped them.
[5]
But we got to a pretty
neat result.
[7]
We got to a formula for the
slope and y-intercept of the
[11]
best fitting regression line
when you measure the error by
[15]
the squared distance
to that line.
[17]
And our formula is, and I'll
just rewrite it here just so
[19]
we have something
neat to look at.
[21]
So the slope of that line is
going to be the mean of x's
[24]
times the mean of the y's minus
the mean of the xy's.
[28]
And don't worry, this seems
really confusing, we're going
[30]
to do an example of this
actually in a few seconds.
[34]
Divided by the mean of the x's, squared,
minus the mean of the
[40]
x squareds.
[41]
And if this looks a little
different than what you see in
[43]
your statistics class or your
textbook, you might see this
[45]
swapped around.
[46]
If you multiply both the
numerator and denominator by
[49]
negative 1, you could see this
written as the mean of the
[52]
xy's minus the mean of x times
the mean of the y's.
[56]
All of that over the mean of the
x squareds minus the mean
[61]
of the x's squared.
[63]
These are obviously
the same thing.
[65]
You're just multiplying the
numerator and denominator by
[67]
negative 1, which is the same thing
as multiplying the whole
[69]
thing by 1.
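Written out in symbols, the two forms of the slope can be compared side by side. This is only a restatement of what was just said; the bar notation for a mean is an assumption of this sketch, not something shown in the video.

```latex
m = \frac{\bar{x}\,\bar{y} - \overline{xy}}{(\bar{x})^2 - \overline{x^2}}
  = \frac{(-1)\left(\bar{x}\,\bar{y} - \overline{xy}\right)}{(-1)\left((\bar{x})^2 - \overline{x^2}\right)}
  = \frac{\overline{xy} - \bar{x}\,\bar{y}}{\overline{x^2} - (\bar{x})^2}
```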
[70]
And of course, whatever you get
for m, you can then just
[74]
substitute back in this
to get your b.
[78]
Your b is going to be equal
to the mean of the
[80]
y's minus your m.
[84]
Let me write that in yellow
so it's very clear.
[87]
You solved for the m value.
[88]
Minus m times the
mean of the x's.
[97]
And this is all you need.
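As a rough sketch, the two formulas could be turned into a small Python function. The names `mean` and `fit_line` and the sample points are hypothetical, chosen only for illustration; the points lie exactly on y = 2x + 1, so the formulas should hand back that slope and intercept.

```python
from fractions import Fraction

def mean(values):
    """Exact arithmetic mean, kept as a fraction."""
    values = [Fraction(v) for v in values]
    return sum(values) / len(values)

def fit_line(xs, ys):
    """Slope m and y-intercept b from the formulas in the transcript.

    Assumes the x values are not all equal; otherwise the
    denominator is zero and the division fails.
    """
    mean_x = mean(xs)
    mean_y = mean(ys)
    mean_xy = mean(x * y for x, y in zip(xs, ys))
    mean_x_sq = mean(x * x for x in xs)
    m = (mean_x * mean_y - mean_xy) / (mean_x ** 2 - mean_x_sq)
    b = mean_y - m * mean_x
    return m, b

# Hypothetical points that sit exactly on y = 2x + 1.
xs, ys = [0, 1, 2], [1, 3, 5]
m, b = fit_line(xs, ys)
print(m, b)  # 2 1
```

Fractions are used instead of floats so the results stay exact, in the same spirit as keeping things as fractions by hand.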
[98]
So let's actually put
that into practice.
[102]
So let's say I have three
points, and I'm going to make
[106]
sure that these points
aren't collinear.
[109]
Because, otherwise, it wouldn't
be interesting.
[111]
So let me draw three
points over here.
[115]
Let's say that one point
is the point 1 comma 2.
[121]
So this is 1, 2.
[127]
And then we also have
the point 2 comma 1.
[142]
And then, let's say we also
have the point, let's do
[150]
something a little bit
crazy, 4 comma 3.
[165]
So this is 4, 3.
[167]
So those are our three points.
[168]
And what we want to do is find
the best fitting regression
[174]
line, which we suspect
is going to look
[176]
something like that.
[181]
We'll see what it actually looks
like using our formulas,
[183]
which we have proven.
[186]
So a good place to start is just
to calculate these things
[189]
ahead of time, and then
to substitute
[190]
them back in the equation.
[191]
So what's the mean of our x's?
[193]
The mean of our x's is going
to be 1 plus 2 plus
[203]
4 divided by 3.
[211]
And what's this going to be?
[213]
1 plus 2 is 3, plus 4
is 7 divided by 3.
[217]
It is equal to 7/3.
[220]
Now, what is the mean
of our y's?
[224]
The mean of our y's is equal
to 2 plus 1 plus 3.
[241]
All of that over 3.
[243]
So this is 2 plus 1 is 3.
[244]
Plus 3 is 6.
[245]
Divided by 3 is equal to 2.
[250]
This is 6 divided by
3 is equal to 2.
[253]
Now, what is the mean
of our xy's?
[266]
So our first xy over
here is 1 times 2.
[271]
Plus 2 times 1 plus 4 times 3.
[281]
And we have three
of these xy's.
[284]
So divided by 3.
[285]
So what's this going
to be equal to?
[287]
We have 2 plus 2, which is 4.
[291]
4 plus 12, which is 16.
[294]
So it's going to be 16/3.
[301]
And then the last one we have
to calculate is the mean of
[306]
the x squareds.
[308]
So what's the mean of
the x squareds?
[309]
The first x squared is just
going to be 1 squared.
[314]
Plus this 2 squared, plus
this 4 squared.
[324]
And we have three data
points again.
[327]
So this is 1 plus
4, which is 5.
[331]
Plus 16.
[335]
Is equal to 21/3, which
is equal to 7.
[339]
So that worked out to a
pretty neat number.
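The four means just worked out by hand can be double-checked with a few lines of Python. This is a minimal sketch; the helper name `mean` is an assumption, and the only data used are the three points from the example.

```python
from fractions import Fraction

# The three points from the example: (1, 2), (2, 1), (4, 3).
points = [(1, 2), (2, 1), (4, 3)]

def mean(values):
    """Exact arithmetic mean, kept as a fraction."""
    values = [Fraction(v) for v in values]
    return sum(values) / len(values)

mean_x = mean(x for x, _ in points)         # (1 + 2 + 4) / 3
mean_y = mean(y for _, y in points)         # (2 + 1 + 3) / 3
mean_xy = mean(x * y for x, y in points)    # (2 + 2 + 12) / 3
mean_x_sq = mean(x * x for x, _ in points)  # (1 + 4 + 16) / 3

print(mean_x, mean_y, mean_xy, mean_x_sq)  # 7/3 2 16/3 7
```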
[341]
So let's actually find
our m's and our b's.
[344]
So our slope, our optimal slope
for our regression line,
[349]
the mean of the x's is
going to be 7/3.
[354]
Times the mean of the y's.
[356]
The mean of the y's is 2.
[359]
Minus the mean of the xy's.
[361]
Well, that's 16/3.
[365]
And then, all of that over
the mean of the x's squared.
[369]
The mean of the x's is 7/3,
so that's 7/3 squared.
[376]
Minus the mean of
the x squareds.
[379]
So it's going to be minus
this 7 right over here.
[382]
And we just have to do a little
bit of mathematics.
[385]
I'm tempted to get out my
calculator, but I'll resist
[387]
the temptation.
[388]
It's nice to keep things
as fractions.
[390]
Let's see if I can
calculate this.
[394]
This is 14/3 minus 16/3.
[400]
All of that over,
this is 49/9.
[410]
And then minus 7.
[412]
If I wanted to express that as
something over 9, that's the
[414]
same thing as 63/9.
[421]
So in our numerator, we
get negative 2/3.
[426]
And then in our denominator,
what's 49 minus 63?
[430]
That's negative 14/9.
[436]
And this is the same thing
as negative 2/3
[439]
times negative 9/14.
[444]
Divide numerator and
denominator by 3.
[446]
Well, the negatives are going
to cancel out first of all.
[448]
You divide by 3.
[449]
That becomes a 1.
[450]
That becomes a 3.
[451]
Divide by 2.
[453]
Becomes a 1.
[454]
That becomes a 7.
[455]
So our slope is 3/7.
[458]
Not too bad.
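The fraction arithmetic for the slope can be replayed step by step in Python. This is a sketch that plugs in the four means computed earlier in the transcript; the variable names are assumptions made for illustration.

```python
from fractions import Fraction

# Means computed earlier in the example.
mean_x = Fraction(7, 3)
mean_y = Fraction(2)
mean_xy = Fraction(16, 3)
mean_x_sq = Fraction(7)

numerator = mean_x * mean_y - mean_xy  # 14/3 - 16/3
denominator = mean_x ** 2 - mean_x_sq  # 49/9 - 63/9
m = numerator / denominator            # (-2/3) * (-9/14)

print(numerator, denominator, m)  # -2/3 -14/9 3/7
```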
[459]
Now, we can go back and figure
out our y-intercept.
[462]
So let's figure out our
y-intercept using
[464]
this right over here.
[466]
So our y-intercept, b, is going
to be equal to the mean
[469]
of the y's, the mean of the
y's is 2, minus our slope.
[473]
We just figured out our
slope to be 3/7.
[478]
Times the mean of the
x's, which is 7/3.
[485]
These just are the reciprocal
of each other,
[487]
so they cancel out.
[488]
That just becomes 1.
[489]
So our y-intercept is literally
just 2 minus 1.
[492]
So it equals 1.
[493]
So we have the equation
for our line.
[496]
Our regression line is going
to be y is equal to-- We
[499]
figured out m.
[501]
m is 3/7.
[502]
y is equal to 3/7 x plus,
our y-intercept is 1.
[511]
And we are done.
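As a quick check on the finished line, the intercept can be recomputed, and each point's vertical gap from the line (its residual) can be listed. This is a sketch under the assumption that "error" means the vertical distance; the names are hypothetical.

```python
from fractions import Fraction

m = Fraction(3, 7)       # slope found above
mean_x = Fraction(7, 3)
mean_y = Fraction(2)

# b = (mean of the y's) - m * (mean of the x's)
b = mean_y - m * mean_x
print(b)  # 1

# Vertical gap between each point and the line y = (3/7)x + 1.
points = [(1, 2), (2, 1), (4, 3)]
residuals = [y - (m * x + b) for x, y in points]
print([str(r) for r in residuals])  # ['4/7', '-6/7', '2/7']
print(sum(residuals))               # 0
```

None of the residuals is zero, so the line passes through none of the three points, and the residuals add up to zero, which is what a least-squares line with an intercept should give.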
[515]
So let's actually try
to graph this.
[517]
So our y-intercept
is going to be 1.
[519]
It's going to be right
over there.
[520]
And the slope of our
line is 3/7.
[523]
So for every 7 we
run, we rise 3.
[527]
Or another way to think of
it, for every 3.5 we
[529]
run, we rise 1.5.
[532]
So we're going to go 1.5
right over here.
[535]
So this line, if you were to
graph it, and obviously I'm
[538]
hand drawing it, so it's not
going to be that exact, is
[540]
going to look like that
right over there.
[545]
And it actually won't go
directly through that point.
[549]
So I don't want to give
you that impression.
[552]
So it might look something
like this.
[554]
And this line, we have shown,
that this formula minimizes
[559]
the sum of the squared vertical distances
from each of these
[561]
points to that line.
[562]
Anyway, that was, at least
in my mind, pretty neat.
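The minimizing claim can be spot-checked numerically, though a spot check is not a proof. The sketch below compares the total squared error of y = (3/7)x + 1 against a handful of nearby lines; the step size of 1/10 and the function name `squared_error` are arbitrary choices made for illustration.

```python
from fractions import Fraction

points = [(1, 2), (2, 1), (4, 3)]

def squared_error(m, b):
    """Sum of squared vertical distances from the points to y = m*x + b."""
    return sum((y - (m * x + b)) ** 2 for x, y in points)

best_m, best_b = Fraction(3, 7), Fraction(1)
best = squared_error(best_m, best_b)

# Nudge the slope and the intercept a little in every direction.
step = Fraction(1, 10)
nearby = [
    squared_error(best_m + dm, best_b + db)
    for dm in (-step, 0, step)
    for db in (-step, 0, step)
    if (dm, db) != (0, 0)
]

print(best)                               # 8/7
print(all(err > best for err in nearby))  # True
```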