The Main Ideas of Fitting a Line to Data (The Main Ideas of Least Squares and Linear Regression)
Channel: StatQuest with Josh Starmer
When we go on a quest, and that quest is really awesome, it's a StatQuest!

Yeah, yeah, yeah.

Hello, and welcome to StatQuest. StatQuest is brought to you by the friendly folks in the Genetics Department at the University of North Carolina at Chapel Hill.

Today, we're going to talk about fitting a line to data, a.k.a. least squares, a.k.a. linear regression. Now let's get to it.
Okay, you worked really hard, you did the experiment, and now you've got some data. Here it is, plotted on an X-Y graph.

We usually like to add a line to our data so we can see what the trend is. But is this the best line we should use? Or does this new line fit the data even better? Or what about this line: is it better or worse than the other options?

A horizontal line that cuts through the average Y value of our data is probably the worst fit of all. However, it gives us a good starting point for talking about how to find the optimal line to fit our data.

So now let's focus on this horizontal line. It cuts through the average Y value, which is 3.5.
Let's just call this point B, because different data sets will have different average values on the Y axis. That is to say, the Y value for this line is B, and for this particular data set, B equals 3.5.
We can measure how well this line fits the data by seeing how close it is to the data points.

We'll start with the point in the lower left-hand corner of the graph, with coordinates (X1, Y1). We can draw a line from this point up to the line that cuts across the average Y value for this data set. The distance between the line and the first data point equals B minus Y1. The distance between the line and the second data point is B minus Y2.

So far, the total distance between the data points and the line is the sum of the two distances. We can calculate the distance between the line and the third point the same way: it equals B minus Y3. Now we've added the third distance to our total sum. The distance for the fourth point is B minus Y4.

Note: Y4 is greater than B because it's above the horizontal line, so this value will be negative. That's no good, since it will subtract from the total and make the overall fit appear better than it really is. The fifth data point is even higher relative to the horizontal line, so this distance is going to be very negative.
Back in the day, when they were first working this out, they probably tried taking the absolute value of everything and then discovered that it made the math pretty tricky. So they ended up squaring each term. Squaring ensures that each term is positive.

Here's the equation that shows the total distance the data points have from the horizontal line: (B - Y1)^2 + (B - Y2)^2 + (B - Y3)^2 + (B - Y4)^2 + (B - Y5)^2. In this specific example, 24.62 is our measure of how well this line fits the data. It's called the sum of squared residuals, because the residuals are the differences between the real data and the line, and we are summing the square of these values.
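The calculation above can be sketched in a few lines of Python. The five Y values below are hypothetical stand-ins (the video's actual data points aren't listed in the transcript), so the total won't match the 24.62 from the example; only the mean of 3.5 is preserved.

```python
# Sum of squared residuals for a horizontal line that cuts the average Y.
# The five data points are hypothetical example values, chosen so their
# mean is 3.5 like in the video; the resulting total will differ from 24.62.
data_y = [1.0, 2.0, 3.0, 5.0, 6.5]   # assumed Y values, mean = 3.5
b = sum(data_y) / len(data_y)         # the horizontal line sits at the mean
ssr = sum((b - y) ** 2 for y in data_y)  # square each residual, then sum
print(b, ssr)
```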
Now let's see how good the fit is if we rotate the line a little bit. In this case, the sum of squared residuals equals 18.72. This is better than before.

Does the fit improve if we rotate a little more? Yes, the sum of squared residuals now equals 14.05. That value keeps going down the more we rotate the line.

What if we rotate the line a whole lot? Well, as you can see, the fit gets worse: in this case, the sum of squared residuals is 31.71. So there's a sweet spot in between horizontal and vertical.
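Here's a small Python sketch of that rotation experiment: vary the slope (keeping the line through the mean point, one reasonable way to "rotate" it) and watch the sum of squared residuals dip and then rise again. The data points are hypothetical, so the numbers won't match 18.72, 14.05, or 31.71.

```python
# "Rotating the line": try several slopes a, with the intercept chosen so
# the line still passes through the mean point (x_bar, y_bar).
# Hypothetical data, not the video's actual points.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.0, 2.0, 3.0, 5.0, 6.5]
x_bar = sum(xs) / len(xs)
y_bar = sum(ys) / len(ys)

def ssr(a):
    b = y_bar - a * x_bar           # keep the line through (x_bar, y_bar)
    return sum((a * x + b - y) ** 2 for x, y in zip(xs, ys))

for a in [0.0, 0.5, 1.0, 3.0]:      # horizontal, two gentle tilts, a steep one
    print(a, round(ssr(a), 2))       # the SSR dips, then climbs back up
```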
To find that sweet spot, let's start with the generic line equation: Y = A*X + B, or Y equals A times X plus B. A is the slope of the line, and B is the Y-intercept of the line; that's the location on the Y axis that the line crosses when X equals 0. We want to find the optimal values for A and B so that we minimize the sum of squared residuals.
In more general math terms, the sum of squared residuals is this complicated-looking equation: the sum, over all of the data points, of ((A*X1 + B) - Y1)^2 and so on. But it's actually not that complicated. This first part, A*X1 + B, is the value of the line at X1, and this second part, Y1, is the observed value at X1. So really, all we're doing in this part of the equation is calculating the distance between the line and the observed value. So this is no big deal.

Since we want the line that will give us the smallest sum of squares, this method for finding the best values for A and B is called least squares.
If we plotted the sum of squared residuals versus each rotation, we'd get something like this: on the Y axis we have the sum of squared residuals, and on the X axis we've got each different rotation of the line. We see that the sum of squared residuals goes down when we start rotating the line, but that it's possible to rotate the line too far, and the sum of squared residuals starts going back up again.

How do we find the optimal rotation for the line?
Well, we take the derivative of this function. The derivative tells us the slope of the function at every point. The slope at the point on the far left side is pretty steep. As we move to the right, we see that the slope isn't as steep. The slope at the best point, where we have the least squares, is zero. After that, the slope starts getting steep again.

Let's go back to that middle point, where we have the least squares value and the slope is zero.
Remember, the different rotations are just different values for A, the slope, and B, the intercept. We can use a 3D graph to show how different values for the slope and intercept result in different sums of squares. In this graph, the intercept is on the Z axis, so it's going back, sort of deep into your computer screen. If we select one value for the intercept, for example, assume we set the intercept value to be 3, then we could change values for the slope and see how an intercept of 3 plus different values for the slope would affect the sum of squared residuals.

Anyway, we do that for bunches of different intercepts and slopes. Taking the derivatives with respect to both the slope and the intercept tells us where the optimal values are for the best fit.
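A brute-force version of that picture is easy to sketch in Python: evaluate the sum of squared residuals over a grid of slope and intercept values, which is the 3D surface, and pick the smallest one. This is only an illustration; real software finds the minimum with the derivatives instead of a grid search. The data points are hypothetical.

```python
# The 3D surface: sum of squared residuals as a function of (slope, intercept).
# A coarse grid search (step 0.1) locates the minimum of the surface.
# Hypothetical data, not the video's actual points.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.0, 2.0, 3.0, 5.0, 6.5]

def ssr(a, b):
    return sum((a * x + b - y) ** 2 for x, y in zip(xs, ys))

# Try slopes from -2.0 to 4.0 and intercepts from -3.0 to 3.0.
best = min(
    ((a / 10, b / 10) for a in range(-20, 41) for b in range(-30, 31)),
    key=lambda ab: ssr(*ab),
)
print(best, round(ssr(*best), 2))    # the (slope, intercept) at the bottom
```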
Note: no one ever solves this problem by hand; this is done on a computer. So for most people, it's not essential to know how to take these derivatives. However, it is essential to understand the concepts.

Big important concept number one: we want to minimize the square of the distance between the observed values and the line.

Big important concept number two: we do this by taking the derivative and finding where it is equal to zero.
The final line minimizes the sum of squares; it gives the least squares between it and the real data. In this case, the line is defined by the following equation: Y = 0.77 * X + 0.66.
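Setting the two derivatives to zero and solving gives the standard closed-form least-squares formulas, sketched below in Python. The data points here are hypothetical, so the fitted slope and intercept will not equal the video's 0.77 and 0.66.

```python
# Closed-form least squares, the result of setting both derivatives to zero:
#   A = sum((x - x_bar) * (y - y_bar)) / sum((x - x_bar)^2)
#   B = y_bar - A * x_bar
# Hypothetical data, not the video's actual points.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.0, 2.0, 3.0, 5.0, 6.5]
n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

a = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) \
    / sum((x - x_bar) ** 2 for x in xs)   # optimal slope A
b = y_bar - a * x_bar                      # optimal intercept B
print(a, b)                                # the fitted line Y = A*X + B
```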
Hooray, we've made it to the end of another StatQuest. Tune in next time for another exciting adventure in statistics land.