R-squared or coefficient of determination | Regression | Probability and Statistics | Khan Academy - YouTube

Channel: Khan Academy

[0] In the last few videos, we saw that if we had n points, each of them has x- and y-coordinates. Let me draw n of those points. So let's call this point one. It has coordinates (x1, y1). You have the second point over here. It has coordinates (x2, y2). And we keep putting points up here, and eventually we get to the nth point. That has coordinates (xn, yn).
[31] What we saw is that there is a line that we can find that minimizes the squared distance. This line right here, I'll call it y = mx + b. There's some line that minimizes the squared distance to the points. And let me just review what those squared distances are. Sometimes it's called the squared error. So this is the error between the line and point one, so I'll call that error one. This is the error between the line and point two; we'll call this error two. And this is the error between the line and point n.
[73] So if you want the total squared error -- this is actually how we started off this whole discussion -- the total squared error between the points and the line, you literally just take the y value of each point. So for example, you would take y1, that's this value right over here, and subtract the y value at this point on the line. Well, that point on the line is essentially the y value you get when you substitute x1 into this equation: m x1 + b. So the first term is (y1 - (m x1 + b))^2. I don't want to get my graph too cluttered, so I'll just delete that there. That is error one, right over there. And we want the squared errors between each of the points and the line. So that's the first one. Then you do the same thing for the second point, (y2 - (m x2 + b))^2, all the way -- I'll do dot dot dot to show that there are a bunch of these that we have to do -- until we get to the nth point: (yn - (m xn + b))^2.
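The sum just written out is easy to compute directly. Here is a minimal sketch in Python (the data points and the line are made up for illustration), assuming the slope m and intercept b are already known:

```python
def squared_error_of_line(xs, ys, m, b):
    # Sum of squared vertical distances from each point (x, y)
    # to the line y = m*x + b.
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))

# Hypothetical points: a line passing exactly through them gives zero error.
xs = [1, 2, 3]
ys = [2, 4, 6]
print(squared_error_of_line(xs, ys, m=2, b=0))  # → 0
print(squared_error_of_line(xs, ys, m=2, b=1))  # → 3 (each point misses by 1)
```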
[155] And now that we actually know how to find these m's and b's -- I showed you the formula, and in fact we've proved the formula -- we can find this line. And if we want to ask how much error there is, we can then calculate it, because we now know the m's and the b's. So we can calculate it for a certain set of data.
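As a sketch of "we now know the m's and b's": the mean-based least-squares formulas from the earlier videos can be coded directly (the variable names here are my own):

```python
def best_fit_line(xs, ys):
    # Least-squares slope and intercept from the mean-based formulas:
    #   m = (mean(x)*mean(y) - mean(x*y)) / (mean(x)^2 - mean(x^2))
    #   b = mean(y) - m * mean(x)
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    mean_xy = sum(x * y for x, y in zip(xs, ys)) / n
    mean_x2 = sum(x * x for x in xs) / n
    m = (mean_x * mean_y - mean_xy) / (mean_x ** 2 - mean_x2)
    b = mean_y - m * mean_x
    return m, b

print(best_fit_line([1, 2, 3], [1, 2, 3]))  # → (1.0, 0.0)
```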
[176] Now, what I want to do is come up with a more meaningful estimate of how well this line fits the data points that we have. And to do that, we're going to ask ourselves the question: what percentage of the variation in y is described by the variation in x? So let's think about this. How much of the total variation in y -- there's obviously variation in y; this point's y value is over here, and this point's y value is over here, so there is clearly a bunch of variation in the y -- is essentially described by the variation in x, or described by the line? So let's think about that.
[237] First, let's figure out what the total variation in y is. When we think about variation -- and this was even true when we thought about variance, which was the mean squared variation in y -- we think about the squared distance from some central tendency, and the best central measure we can have of y is the arithmetic mean. So we could just say the total variation in y is going to be the sum of the squared distances of each of the y's from the mean: (y1 minus the mean of the y's) squared, plus (y2 minus the mean of the y's) squared, and you just keep going all the way to the nth y value, plus (yn minus the mean of the y's) squared.
[307]
This gives you the total variation in y.
[309]
You can just take out all the y values.
[312]
Find their mean.
[313]
It'll be some value, maybe it's
[314]
right over here someplace.
[318]
And so you can even visualize it the same way we visualized
[321]
the squared error from the line.
[323]
So if you visualize it, you can imagine a line that's y is
[327]
equal to the mean of y.
[329]
Which would look just like that.
[331]
And what we're measuring over here, this error right over
[334]
here, is the square of this distance right over here.
[336]
Between this point vertically and this line.
[340]
The second one is going to be this distance.
[344]
Just right up to the line.
[345]
And the nth one is going to be the distance from there all
[348]
the way to the line right over there.
[349]
And there are these other points in between.
[352]
This is the total variation in y.
[354]
Makes sense.
[355]
If you divide this by n, you're going to get what we
[363]
typically associate as the variance of y, which is kind
[367]
of the average squared distance.
[368]
Now, we have the total squared distance.
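That total variation, and the variance you get by dividing it by n, can be sketched the same way (the sample numbers are made up):

```python
def total_variation(ys):
    # Sum of squared distances of each y from the mean of the y's.
    mean_y = sum(ys) / len(ys)
    return sum((y - mean_y) ** 2 for y in ys)

ys = [1, 2, 3, 6]  # mean is 3
print(total_variation(ys))            # → 14.0
print(total_variation(ys) / len(ys))  # → 3.5, the variance of y
```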
[371]
So what we want to do is-- how much of the total variation in
[375]
y is described by the variation in x?
[379]
So maybe we can think of it this way.
[380]
So our denominator, we want what percentage of the total
[382]
variation in y?
[383]
Let me write it this way.
[385]
Let me call this the squared error from the average.
[393]
Maybe I'll call this the squared error
[395]
from the mean of y.
[399]
And this is really the total variation in y.
[401]
So let's put that as the denominator.
[405]
The total variation in y, which is the squared error
[408]
from the mean of the y's.
[413] Now we want to know what percentage of this is described by the variation in x. Well, what is not described by the variation in x? We want to know how much is described by the variation in x, but what if we ask how much of the total variation is not described by the regression line? Well, we already have a measure for that: the squared error of the line. It tells us the sum of the squared distances from each point to our line. So it is exactly this measure; it tells us how much of the total variation is not described by the regression line. So if you want to know what percentage of the total variation is not described by the regression line, it would just be the squared error of the line -- because this is the total variation not described by the regression line -- divided by the total variation.
[490] So let me make it clear. This, right over here, tells us what percentage of the total variation is not described by the variation in x, or by the regression line. So to answer our question -- what percentage is described by the variation? -- well, the rest of it has to be described by the variation in x, because our question is what percent of the total variation is described by the variation in x, and this is the percentage that is not described. So if this number is 30% -- if 30% of the variation in y is not described by the line -- then the remainder will be described by the line. So we can essentially just subtract this from 1. So if we take 1 minus the squared error between our data points and the line, over the squared error between the y's and the mean of the y's, that actually tells us what percentage of the total variation is described by the line. You can view it either as described by the line or as described by the variation in x. And this number right here is called the coefficient of determination. That's just what statisticians have decided to name it. And it's also called R-squared. You might have even heard that term when people talk about regression.
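Putting the two sums together gives R-squared exactly as defined here: 1 minus the squared error of the line over the squared error from the mean. A minimal sketch (the points are made up, and m and b are taken as given):

```python
def r_squared(xs, ys, m, b):
    # Coefficient of determination: 1 - SE_line / SE_mean.
    mean_y = sum(ys) / len(ys)
    se_line = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))
    se_mean = sum((y - mean_y) ** 2 for y in ys)
    return 1 - se_line / se_mean

# A perfect fit describes all of the variation in y.
print(r_squared([1, 2, 3], [1, 2, 3], m=1, b=0))  # → 1.0
```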
[622] Now let's think about it. If the squared error of the line is really small, what does that mean? It means that these errors, right over here, are really small, which means that the line is a really good fit. So let me write it over here: if the squared error of the line is small, it tells us that the line is a good fit. Now, what would happen over here? Well, if this number is really small, this is going to be a very small fraction over here, and 1 minus a very small fraction is going to be a number close to 1. So then our R-squared will be close to 1, which tells us that a lot of the variation in y is described by the variation in x. Which makes sense, because the line is a good fit. Now take the opposite case. If the squared error of the line is huge, then that means there's a lot of error between the data points and the line. So if this number is huge, then this fraction over here is going to be huge, or it's going to be a fraction close to 1. And 1 minus that is going to be close to 0. So if the squared error of the line is large, this fraction is going to be close to 1, and the whole coefficient of determination, the whole R-squared, is going to be close to 0. Which makes sense: that tells us that very little of the total variation in y is described by the variation in x, or described by the line.
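The two limiting cases can be checked with numbers. In this sketch the data are invented so that y = x is the least-squares line for both sets: the tightly clustered points give an R-squared near 1, and the scattered points give a much smaller one:

```python
def r_squared(xs, ys, m, b):
    # 1 - (squared error of the line) / (squared error from the mean).
    mean_y = sum(ys) / len(ys)
    se_line = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))
    se_mean = sum((y - mean_y) ** 2 for y in ys)
    return 1 - se_line / se_mean

xs = [1, 2, 3, 4]
good_ys = [0.9, 2.1, 3.1, 3.9]  # hugs the line y = x
bad_ys = [0, 4, 2, 4]           # scattered around the same line

print(round(r_squared(xs, good_ys, m=1, b=0), 3))  # → 0.992
print(round(r_squared(xs, bad_ys, m=1, b=0), 3))   # → 0.455
```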
[746] Well, anyway, everything I've been dealing with so far has been a little bit in the abstract. In the next video, we'll actually look at some data samples and calculate their regression line, and also calculate the R-squared and see how good of a fit it really is.