馃攳
R-squared or coefficient of determination | Regression | Probability and Statistics | Khan Academy
Channel: Khan Academy
In the last few videos, we saw that if we had n points, each of them has x- and y-coordinates. Let me draw n of those points. So let's call this point one. It has coordinates (x1, y1). You have the second point over here. It has coordinates (x2, y2). And we keep putting points up here and eventually we get to the nth point. That has coordinates (xn, yn).
What we saw is that there is a line that we can find that minimizes the squared distance. This line right here, I'll call it y is equal to mx plus b. There's some line that minimizes the squared distance to the points. And let me just review what those squared distances are. Sometimes, it's called the squared error.
So this is the error between the line and point one. So I'll call that error one. This is the error between the line and point two. We'll call this error two. This is the error between the line and point n.
So if you wanted the total error, if you want the total squared error-- this is actually how we started off this whole discussion-- the total squared error between the points and the line, you literally just take the y value of each point. So for example, you would take y1. That's this value right over here. You take y1 minus the y value at this point on the line. Well, that point on the line is, essentially, the y value you get when you substitute x1 into this equation. So I'll just substitute x1 into this equation: minus (m x1 + b). This right here, that is this y value right over here. That is m x1 + b. I don't want to get my graph too cluttered, so I'll just delete that there. That is error one right over there. And we want the squared errors between each of the points and the line. So that's the first one: (y1 - (m x1 + b)) squared. Then you do the same thing for the second point-- and we started our discussion this way: (y2 - (m x2 + b)) squared, all the way-- I'll do dot dot dot to show that there are a bunch of these that we have to do until we get to the nth point-- all the way to (yn - (m xn + b)) squared.
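To make that sum concrete, here is a minimal Python sketch of the total squared error. The function name and the three data points are made up for illustration; m and b stand for a candidate slope and intercept:

```python
# Total squared error between the points and the line y = m*x + b:
# (y1 - (m*x1 + b))^2 + (y2 - (m*x2 + b))^2 + ... + (yn - (m*xn + b))^2
def squared_error_of_line(points, m, b):
    return sum((y - (m * x + b)) ** 2 for x, y in points)

# Hypothetical example: three points and the candidate line y = x
points = [(1, 2), (2, 1), (4, 3)]
print(squared_error_of_line(points, 1, 0))  # (2-1)^2 + (1-2)^2 + (3-4)^2 = 3
```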
And now that we actually know how to find these m's and b's-- I showed you the formula, and in fact, we've proved the formula-- we can find this line. And if we want to say, well, how much error is there? We can then calculate it, because we now know the m's and the b's. So we can calculate it for a certain set of data.
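As a sketch of that calculation-- the least-squares slope in terms of the means of x, y, xy, and x squared from the earlier videos, with b equal to the mean of the y's minus m times the mean of the x's-- assuming a made-up data set:

```python
def best_fit_line(points):
    # Least-squares slope and intercept via the means formula:
    #   m = (mean(x)*mean(y) - mean(x*y)) / (mean(x)^2 - mean(x^2))
    #   b = mean(y) - m * mean(x)
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    mean_xy = sum(x * y for x, y in points) / n
    mean_x2 = sum(x * x for x, _ in points) / n
    m = (mean_x * mean_y - mean_xy) / (mean_x ** 2 - mean_x2)
    b = mean_y - m * mean_x
    return m, b

# Hypothetical example: points that lie exactly on y = 2x + 1
m, b = best_fit_line([(0, 1), (1, 3), (2, 5)])
print(m, b)  # slope is about 2, intercept about 1
```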
Now, what I want to do is kind of come up with a more meaningful estimate of how good this line is fitting the data points that we have. And to do that, we're going to ask ourselves the question: what percentage of the variation in y is described by the variation in x? So let's think about this. How much of the total variation in y-- there's obviously variation in y. This y value is over here. This point's y value is over here. There is clearly a bunch of variation in the y. But how much of that is essentially described by the variation in x? Or described by the line? So let's think about that.
First, let's think about what the total variation is. How much total variation is there in y? So let's just figure out what the total variation in y is. It's really just a tool for measuring. When we think about variation-- and this is even true when we thought about variance, which was the mean squared variation in y-- you think about the squared distance from some central tendency, and the best central measure we can have of y is the arithmetic mean. So we could just say, the total variation in y is just going to be the sum of the squared distances of each of the y's from the mean. So you get (y1 minus the mean of all the y's) squared, plus (y2 minus the mean of all the y's) squared, plus-- and you just keep going all the way to the nth y value-- (yn minus the mean of all the y's) squared. This gives you the total variation in y.
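That sum can be sketched the same way; a minimal Python version, with a made-up list of y values:

```python
def total_variation(ys):
    # Total variation in y: the sum of squared distances from the mean,
    # (y1 - ybar)^2 + (y2 - ybar)^2 + ... + (yn - ybar)^2
    mean_y = sum(ys) / len(ys)
    return sum((y - mean_y) ** 2 for y in ys)

print(total_variation([2, 1, 3]))  # mean is 2, so 0 + 1 + 1 = 2.0
```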
You can just take all the y values and find their mean. It'll be some value, maybe it's right over here someplace. And so you can even visualize it the same way we visualized the squared error from the line. So if you visualize it, you can imagine a line y is equal to the mean of y, which would look just like that. And what we're measuring over here, this error right over here, is the square of this distance right over here, between this point vertically and this line. The second one is going to be this distance, just right up to the line. And the nth one is going to be the distance from there all the way to the line right over there. And there are these other points in between. This is the total variation in y. Makes sense.
If you divide this by n, you're going to get what we typically associate as the variance of y, which is kind of the average squared distance. Now, we have the total squared distance.
So what we want to do is-- how much of the total variation in y is described by the variation in x? So maybe we can think of it this way. For our denominator, we want what percentage of the total variation in y. Let me write it this way. Let me call this the squared error from the average-- maybe I'll call this the squared error from the mean of y. And this is really the total variation in y. So let's put that as the denominator: the total variation in y, which is the squared error from the mean of the y's.
Now we want to know what percentage of this is described by the variation in x. Now, what is not described by the variation in x? We want to know how much is described by the variation in x. But what if we want to know how much of the total variation is not described by the regression line?
Well, we already have a measure for that. We have the squared error of the line. This tells us the square of the distances from each point to our line. So it is exactly this measure. It tells us how much of the total variation is not described by the regression line. So if you want to know what percentage of the total variation is not described by the regression line, it would just be the squared error of the line-- because this is the total variation not described by the regression line-- divided by the total variation.
So let me make it clear. This, right over here, tells us what percentage of the total variation is not described by the variation in x, or by the regression line. So to answer our question-- what percentage is described by the variation?-- well, the rest of it has to be described by the variation in x. Because our question is what percent of the total variation is described by the variation in x, and this is the percentage that is not described. So if this number is 30%-- if 30% of the variation in y is not described by the line-- then the remainder will be described by the line. So we could essentially just subtract this from 1.
So if we take 1 minus the squared error between our data points and the line, over the squared error between the y's and the mean of the y's, this actually tells us what percentage of the total variation is described by the line. You can either view it as described by the line or by the variation in x.
And this number right here is called the coefficient of determination. It's just what statisticians have decided to name it. And it's also called R-squared. You might have even heard that term when people talk about regression.
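Putting the pieces together, a minimal sketch of R-squared as just defined-- 1 minus the squared error of the line over the squared error from the mean. The data set and the two candidate lines below are hypothetical:

```python
def r_squared(points, m, b):
    # R^2 = 1 - (squared error of the line) / (squared error from the mean)
    mean_y = sum(y for _, y in points) / len(points)
    se_line = sum((y - (m * x + b)) ** 2 for x, y in points)
    se_mean = sum((y - mean_y) ** 2 for _, y in points)
    return 1 - se_line / se_mean

points = [(0, 1), (1, 3), (2, 5)]          # these lie exactly on y = 2x + 1
print(r_squared(points, 2, 1))  # perfect fit: squared error is 0, so R^2 = 1.0
print(r_squared(points, 0, 3))  # flat line at the mean: R^2 = 0.0
```

A line that hits every point leaves no unexplained variation, so R-squared is 1; a flat line at the mean of the y's explains none of it, so R-squared is 0.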
Now let's think about it. If the squared error of the line is really small, what does that mean? It means that these errors, right over here, are really small, which means that the line is a really good fit. So let me write it over here: if the squared error of the line is small, it tells us that the line is a good fit.
Now, what would happen over here? Well, if this number is really small, this is going to be a very small fraction over here. 1 minus a very small fraction is going to be a number close to 1. So then, our R-squared will be close to 1, which tells us that a lot of the variation in y is described by the variation in x. Which makes sense, because the line is a good fit.
You take the opposite case. If the squared error of the line is huge, then that means there's a lot of error between the data points and the line. So if this number is huge, then this fraction over here is going to be huge-- it's going to be a percentage close to 1. And 1 minus that is going to be close to 0. And so if the squared error of the line is large, this whole fraction is going to be close to 1. And if this whole fraction is close to 1, the whole coefficient of determination, the whole R-squared, is going to be close to 0, which makes sense.
That tells us that very little of the total variation in y is described by the variation in x, or described by the line. Well, anyway, everything I've been dealing with so far has been a little bit in the abstract. In the next video, we'll actually look at some data samples and calculate their regression line. And we'll also calculate the R-squared, and see how good of a fit it really is.