Regression line example | Regression | Probability and Statistics | Khan Academy - YouTube

Channel: Khan Academy

[0]
In the last several videos, we did some fairly hairy
[3]
mathematics.
[4]
And you might have even skipped them.
[5]
But we got to a pretty neat result.
[7]
We got to a formula for the slope and y-intercept of the
[11]
best-fitting regression line when you measure the error by
[15]
the squared vertical distance to that line.
[17]
And our formula is, and I'll just rewrite it here just so
[19]
we have something neat to look at.
[21]
So the slope of that line is going to be the mean of the x's
[24]
times the mean of the y's minus the mean of the xy's.
[28]
And don't worry, this seems really confusing, we're going
[30]
to do an example of this actually in a few seconds.
[34]
Divided by the mean of the x's, squared, minus the mean of the
[40]
x squareds.
[41]
And if this looks a little different than what you see in
[43]
your statistics class or your textbook, you might see this
[45]
swapped around.
[46]
If you multiply both the numerator and denominator by
[49]
negative 1, you could see this written as the mean of the
[52]
xy's minus the mean of x times the mean of the y's.
[56]
All of that over the mean of the x squareds minus the mean
[61]
of the x's squared.
[63]
These are obviously the same thing.
[65]
You're just multiplying the numerator and denominator by
[67]
negative 1, which is the same thing as multiplying the whole
[69]
thing by 1.
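As a quick sanity check on the sign flip, here is a minimal Python sketch (standard library only) that evaluates both forms of the slope formula with exact fractions. The helper name `mean` is just for illustration, and the sample data are the three points used in the example later in this video.

```python
from fractions import Fraction

def mean(values):
    """Exact arithmetic mean of a list of integers (illustrative helper)."""
    return Fraction(sum(values), len(values))

xs = [1, 2, 4]
ys = [2, 1, 3]

x_bar, y_bar = mean(xs), mean(ys)
xy_bar = mean([x * y for x, y in zip(xs, ys)])
x2_bar = mean([x * x for x in xs])

# Form 1: (mean(x) * mean(y) - mean(xy)) / (mean(x)^2 - mean(x^2))
m1 = (x_bar * y_bar - xy_bar) / (x_bar ** 2 - x2_bar)
# Form 2: numerator and denominator both multiplied by -1
m2 = (xy_bar - x_bar * y_bar) / (x2_bar - x_bar ** 2)

print(m1 == m2)  # True
```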
[70]
And of course, whatever you get for m, you can then just
[74]
substitute it back into this to get your b.
[78]
Your b is going to be equal to the mean of the
[80]
y's minus your m.
[84]
Let me write that in yellow so it's very clear.
[87]
You solved for the m value.
[88]
Minus m times the mean of the x's.
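Putting the slope and intercept formulas together, a least-squares line fit could be sketched like this. The function name `fit_line` is hypothetical, and the sketch assumes integer (or `Fraction`) inputs and at least two distinct x-values, so that the denominator is not zero.

```python
from fractions import Fraction

def fit_line(xs, ys):
    """Hypothetical helper: return (m, b) for the least-squares line y = m*x + b.

    Assumes integer or Fraction inputs so the arithmetic stays exact.
    """
    n = len(xs)
    x_bar = Fraction(sum(xs), n)
    y_bar = Fraction(sum(ys), n)
    xy_bar = Fraction(sum(x * y for x, y in zip(xs, ys)), n)
    x2_bar = Fraction(sum(x * x for x in xs), n)

    denominator = x2_bar - x_bar ** 2
    if denominator == 0:
        raise ValueError("all x-values are equal; the slope is undefined")

    m = (xy_bar - x_bar * y_bar) / denominator  # slope
    b = y_bar - m * x_bar                       # intercept: mean(y) - m * mean(x)
    return m, b
```

For the three points in the example below, `fit_line([1, 2, 4], [2, 1, 3])` returns `(Fraction(3, 7), Fraction(1, 1))`.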
[97]
And this is all you need.
[98]
So let's actually put that into practice.
[102]
So let's say I have three points, and I'm going to make
[106]
sure that these points aren't collinear.
[109]
Because, otherwise, it wouldn't be interesting.
[111]
So let me draw three points over here.
[115]
Let's say that one point is the point 1 comma 2.
[121]
So this is 1, 2.
[127]
And then we also have the point 2 comma 1.
[142]
And then, let's say we also have the point, let's do
[150]
something a little bit crazy, 4 comma 3.
[165]
So this is 4, 3.
[167]
So those are our three points.
[168]
And what we want to do is find the best-fitting regression
[174]
line, which we suspect is going to look
[176]
something like that.
[181]
We'll see what it actually looks like using our formulas,
[183]
which we have proven.
[186]
So a good place to start is just to calculate these things
[189]
ahead of time, and then to substitute
[190]
them back in the equation.
[191]
So what's the mean of our x's?
[193]
The mean of our x's is going to be 1 plus 2 plus
[203]
4 divided by 3.
[211]
And what's this going to be?
[213]
1 plus 2 is 3, plus 4 is 7 divided by 3.
[217]
It is equal to 7/3.
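The same arithmetic can be checked exactly with Python's `fractions.Fraction`, which keeps 7/3 as a fraction instead of rounding it to a decimal. A minimal sketch; the variable names are only for illustration.

```python
from fractions import Fraction

xs = [1, 2, 4]  # x-coordinates of the three points

# Mean of the x's: (1 + 2 + 4) / 3, kept as an exact fraction
x_mean = Fraction(sum(xs), len(xs))
print(x_mean)  # 7/3
```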
[220]
Now, what is the mean of our y's?
[224]
The mean of our y's is equal to 2 plus 1 plus 3.
[241]
All of that over 3.
[243]
So this is 2 plus 1 is 3.
[244]
Plus 3 is 6.
[245]
Divided by 3 is equal to 2.
[250]
This is 6 divided by 3 is equal to 2.
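The mean of the y's follows the same pattern; in this small sketch the fraction 6/3 reduces to the whole number 2 on its own.

```python
from fractions import Fraction

ys = [2, 1, 3]  # y-coordinates of (1, 2), (2, 1), (4, 3)

# Mean of the y's: (2 + 1 + 3) / 3
y_mean = Fraction(sum(ys), len(ys))
print(y_mean)  # 2
```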
[253]
Now, what is the mean of our xy's?
[266]
So our first xy over here is 1 times 2.
[271]
Plus 2 times 1 plus 4 times 3.
[281]
And we have three of these xy's.
[284]
So divided by 3.
[285]
So what's this going to be equal to?
[287]
We have 2 plus 2, which is 4.
[291]
4 plus 12, which is 16.
[294]
So it's going to be 16/3.
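For the mean of the xy's, each point's coordinates are multiplied first and the products are then averaged. A short sketch of that step, with the points written as (x, y) tuples:

```python
from fractions import Fraction

points = [(1, 2), (2, 1), (4, 3)]

# Multiply each point's coordinates, then average: (2 + 2 + 12) / 3
xy_mean = Fraction(sum(x * y for x, y in points), len(points))
print(xy_mean)  # 16/3
```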
[301]
And then the last one we have to calculate is the mean of
[306]
the x squareds.
[308]
So what's the mean of the x squareds?
[309]
The first x squared is just going to be 1 squared.
[314]
Plus this 2 squared, plus this 4 squared.
[324]
And we have three data points again.
[327]
So this is 1 plus 4, which is 5.
[331]
Plus 16.
[335]
Is equal to 21/3, which is equal to 7.
[339]
So that worked out to a pretty neat number.
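The slope formula uses both the mean of the x squareds and the square of the mean of the x's, and it is easy to mix them up. This sketch computes both for the same data so the difference is visible: 7 versus 49/9.

```python
from fractions import Fraction

xs = [1, 2, 4]

# Square each x first, then average: (1 + 4 + 16) / 3
mean_of_squares = Fraction(sum(x * x for x in xs), len(xs))
# Average first, then square: (7/3)^2
square_of_mean = Fraction(sum(xs), len(xs)) ** 2

print(mean_of_squares)  # 7
print(square_of_mean)   # 49/9
```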
[341]
So let's actually find our m's and our b's.
[344]
So our slope, our optimal slope for our regression line,
[349]
the mean of the x's is going to be 7/3.
[354]
Times the mean of the y's.
[356]
The mean of the y's is 2.
[359]
Minus the mean of the xy's.
[361]
Well, that's 16/3.
[365]
And then, all of that over the mean of the x's, squared.
[369]
The mean of the x's is 7/3, so that's 7/3 squared.
[376]
Minus the mean of the x squareds.
[379]
So it's going to be minus this 7 right over here.
[382]
And we just have to do a little bit of mathematics.
[385]
I'm tempted to get out my calculator, but I'll resist
[387]
the temptation.
[388]
It's nice to keep things as fractions.
[390]
Let's see if I can calculate this.
[394]
This is 14/3 minus 16/3.
[400]
All of that over, this is 49/9.
[410]
And then minus 7.
[412]
If I wanted to express that as something over 9, that's the
[414]
same thing as 63/9.
[421]
So in our numerator, we get negative 2/3.
[426]
And then in our denominator, what's 49 minus 63?
[430]
That's negative 14/9.
[436]
And this is the same thing as negative 2/3
[439]
times negative 9/14.
[444]
Divide numerator and denominator by 3.
[446]
Well, the negatives are going to cancel out first of all.
[448]
You divide by 3.
[449]
That becomes a 1.
[450]
That becomes a 3.
[451]
Divide by 2.
[453]
Becomes a 1.
[454]
That becomes a 7.
[455]
So our slope is 3/7.
[458]
Not too bad.
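The slope arithmetic can be reproduced exactly, again as a sketch with `fractions.Fraction`. Here the four means are hard-coded from the earlier steps rather than recomputed.

```python
from fractions import Fraction

# The four means computed above for (1, 2), (2, 1), (4, 3)
x_mean = Fraction(7, 3)
y_mean = Fraction(2)
xy_mean = Fraction(16, 3)
x2_mean = Fraction(7)

numerator = x_mean * y_mean - xy_mean  # 14/3 - 16/3 = -2/3
denominator = x_mean ** 2 - x2_mean    # 49/9 - 63/9 = -14/9
m = numerator / denominator            # (-2/3) * (-9/14) = 3/7
print(numerator, denominator, m)  # -2/3 -14/9 3/7
```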
[459]
Now, we can go back and figure out our y-intercept.
[462]
So let's figure out our y-intercept using
[464]
this right over here.
[466]
So our y-intercept, b, is going to be equal to the mean
[469]
of the y's, the mean of the y's is 2, minus our slope.
[473]
We just figured out our slope to be 3/7.
[478]
Times the mean of the x's, which is 7/3.
[485]
These are just reciprocals of each other,
[487]
so they cancel out.
[488]
That just becomes 1.
[489]
So our y-intercept is literally just 2 minus 1.
[492]
So it equals 1.
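The intercept step, as a small sketch: because 3/7 and 7/3 are reciprocals, their product is exactly 1, so b comes out to 2 minus 1.

```python
from fractions import Fraction

m = Fraction(3, 7)       # slope found above
x_mean = Fraction(7, 3)  # mean of the x's
y_mean = Fraction(2)     # mean of the y's

b = y_mean - m * x_mean  # 2 - (3/7)(7/3) = 2 - 1
print(b)  # 1
```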
[493]
So we have the equation for our line.
[496]
Our regression line is going to be y is equal to-- We
[499]
figured out m.
[501]
m is 3/7.
[502]
y is equal to 3/7 x plus, our y-intercept is 1.
[511]
And we are done.
[515]
So let's actually try to graph this.
[517]
So our y-intercept is going to be 1.
[519]
It's going to be right over there.
[520]
And the slope of our line is 3/7.
[523]
So for every 7 we run, we rise 3.
[527]
Or another way to think of it, for every 3.5 we
[529]
run, we rise 1.5.
[532]
So we're going to go 1.5 right over here.
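To sketch the line by hand, it helps to have a few exact points on it. This snippet confirms that a rise of 1.5 over a run of 3.5 is the same slope as 3 over 7, then evaluates y = (3/7)x + 1 at a few x-values. The helper name `line` is only for illustration.

```python
from fractions import Fraction

m, b = Fraction(3, 7), Fraction(1)  # slope and intercept found above

def line(x):
    """Illustrative helper: height of the regression line y = (3/7)x + 1 at x."""
    return m * x + b

# Running 3.5 and rising 1.5 is the same ratio as running 7 and rising 3
print(Fraction("1.5") / Fraction("3.5") == m)  # True

# A few points on the line, useful for sketching it by hand
for x in (Fraction(0), Fraction("3.5"), Fraction(7)):
    print(x, line(x))
```

The loop prints `0 1`, `7/2 5/2`, and `7 4`: starting at the intercept (0, 1), each run of 3.5 adds 1.5 to y.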
[535]
So this line, if you were to graph it, and obviously I'm
[538]
hand drawing it, so it's not going to be that exact, is
[540]
going to look like that right over there.
[545]
And it actually won't go directly through that point.
[549]
So I don't want to give you that impression.
[552]
So it might look something like this.
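One way to see that the line misses the points is to compute the residual, the vertical gap between each point and the line, at each x. A minimal sketch:

```python
from fractions import Fraction

m, b = Fraction(3, 7), Fraction(1)
points = [(1, 2), (2, 1), (4, 3)]

# Residual = actual y minus the line's height at that x
residuals = [y - (m * x + b) for x, y in points]

for (x, y), r in zip(points, residuals):
    print((x, y), m * x + b, r)
```

The residuals come out to 4/7, -6/7, and 2/7, so none of the points lies exactly on the line. They also add up to zero, which is a general property of a least-squares line that includes an intercept.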
[554]
And we have shown that the line given by this formula minimizes
[559]
the sum of the squared vertical distances from each of these
[561]
points to that line.
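The derivation in the earlier videos is the actual proof; this is only a numerical spot-check. The sketch computes the sum of squared vertical distances for y = (3/7)x + 1 and confirms that nudging the slope or intercept slightly in any direction makes it larger. The helper name `sse` and the step size of 1/100 are arbitrary choices for illustration.

```python
from fractions import Fraction

points = [(1, 2), (2, 1), (4, 3)]

def sse(m, b):
    """Illustrative helper: sum of squared vertical distances to y = m*x + b."""
    return sum((y - (m * x + b)) ** 2 for x, y in points)

best = sse(Fraction(3, 7), Fraction(1))
print(best)  # 8/7

# Nudge the slope and intercept in every direction; none should do better
step = Fraction(1, 100)
for dm in (-step, 0, step):
    for db in (-step, 0, step):
        if dm or db:
            assert sse(Fraction(3, 7) + dm, Fraction(1) + db) > best
```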
[562]
Anyway, that was, at least in my mind, pretty neat.