The Main Ideas of Fitting a Line to Data (The Main Ideas of Least Squares and Linear Regression)
Channel: StatQuest with Josh Starmer
When we go on a quest, and that quest is really awesome, it's a StatQuest!

Yeah, yeah, yeah.

Hello, and welcome to StatQuest. StatQuest is brought to you by the friendly folks in the Genetics Department at the University of North Carolina at Chapel Hill.

Today, we're going to talk about fitting a line to data, a.k.a. least squares, a.k.a. linear regression. Now let's get to it.
Okay, you worked really hard, you did the experiment, and now you've got some data. Here it is, plotted on an X-Y graph.

We usually like to add a line to our data so we can see what the trend is. But is this the best line we should use? Or does this new line fit the data even better? Or what about this line: is it better or worse than the other options?

A horizontal line that cuts through the average Y value of our data is probably the worst fit of all. However, it gives us a good starting point for talking about how to find the optimal line to fit our data.

So now let's focus on this horizontal line. It cuts through the average Y value, which is 3.5.
Let's just call this point B, because different data sets will have different average values on the Y axis. That is to say, the Y value for this line is B, and for this particular data set, B equals 3.5.
We can measure how well this line fits the data by seeing how close it is to the data points.

We'll start with the point in the lower left-hand corner of the graph, with coordinates (X1, Y1). We can draw a line from this point up to the line that cuts across the average Y value for this data set. The distance between the line and the first data point equals B minus Y1. The distance between the line and the second data point is B minus Y2.

So far, the total distance between the data points and the line is the sum of the two distances. We can calculate the distance between the line and the third point the same way: it equals B minus Y3. Now we've added the third distance to our total sum. The distance for the fourth point is B minus Y4.

Note: Y4 is greater than B because it's above the horizontal line, so this value will be negative. That's no good, since it will subtract from the total and make the overall fit appear better than it really is. The fifth data point is even higher relative to the horizontal line, so this distance is going to be very negative.
Back in the day, when they were first working this out, they probably tried taking the absolute value of everything and then discovered that it made the math pretty tricky. So they ended up squaring each term. Squaring ensures that each term is positive.

Here's the equation that shows the total distance the data points have from the horizontal line: (B - Y1)^2 + (B - Y2)^2 + (B - Y3)^2 + (B - Y4)^2 + (B - Y5)^2. In this specific example, 24.62 is our measure of how well this line fits the data. It's called the sum of squared residuals, because the residuals are the differences between the real data and the line, and we are summing the square of these values.
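The calculation above can be sketched in a few lines of Python. The five Y values below are hypothetical stand-ins (the video's actual data points aren't listed in the transcript), so the total won't match the 24.62 from the example; only the mean of 3.5 is preserved.

```python
# Sum of squared residuals for a horizontal line that cuts the average Y.
# The five data points are hypothetical example values, chosen so their
# mean is 3.5 like in the video; the resulting total will differ from 24.62.
data_y = [1.0, 2.0, 3.0, 5.0, 6.5]   # assumed Y values, mean = 3.5
b = sum(data_y) / len(data_y)         # the horizontal line sits at the mean
ssr = sum((b - y) ** 2 for y in data_y)  # square each residual, then sum
print(b, ssr)
```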
Now let's see how good the fit is if we rotate the line a little bit. In this case, the sum of squared residuals equals 18.72. This is better than before.

Does the fit improve if we rotate a little more? Yes, the sum of squared residuals now equals 14.05. That value keeps going down the more we rotate the line.

What if we rotate the line a whole lot? Well, as you can see, the fit gets worse: in this case, the sum of squared residuals is 31.71. So there's a sweet spot in between horizontal and vertical.
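Here's a small Python sketch of that rotation experiment: vary the slope (keeping the line through the mean point, one reasonable way to "rotate" it) and watch the sum of squared residuals dip and then rise again. The data points are hypothetical, so the numbers won't match 18.72, 14.05, or 31.71.

```python
# "Rotating the line": try several slopes a, with the intercept chosen so
# the line still passes through the mean point (x_bar, y_bar).
# Hypothetical data, not the video's actual points.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.0, 2.0, 3.0, 5.0, 6.5]
x_bar = sum(xs) / len(xs)
y_bar = sum(ys) / len(ys)

def ssr(a):
    b = y_bar - a * x_bar           # keep the line through (x_bar, y_bar)
    return sum((a * x + b - y) ** 2 for x, y in zip(xs, ys))

for a in [0.0, 0.5, 1.0, 3.0]:      # horizontal, two gentle tilts, a steep one
    print(a, round(ssr(a), 2))       # the SSR dips, then climbs back up
```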
To find that sweet spot, let's start with the generic line equation: Y = A*X + B, or Y equals A times X plus B. A is the slope of the line, and B is the Y-intercept of the line; that's the location on the Y axis that the line crosses when X equals 0. We want to find the optimal values for A and B so that we minimize the sum of squared residuals.
In more general math terms, the sum of squared residuals is this complicated-looking equation: the sum, over all of the data points, of ((A*X1 + B) - Y1)^2 and so on. But it's actually not that complicated. This first part, A*X1 + B, is the value of the line at X1, and this second part, Y1, is the observed value at X1. So really, all we're doing in this part of the equation is calculating the distance between the line and the observed value. So this is no big deal.

Since we want the line that will give us the smallest sum of squares, this method for finding the best values for A and B is called least squares.
If we plotted the sum of squared residuals versus each rotation, we'd get something like this: on the Y axis we have the sum of squared residuals, and on the X axis we've got each different rotation of the line. We see that the sum of squared residuals goes down when we start rotating the line, but that it's possible to rotate the line too far, and the sum of squared residuals starts going back up again.

How do we find the optimal rotation for the line?
Well, we take the derivative of this function. The derivative tells us the slope of the function at every point. The slope at the point on the far left side is pretty steep. As we move to the right, we see that the slope isn't as steep. The slope at the best point, where we have the least squares, is zero. After that, the slope starts getting steep again.

Let's go back to that middle point, where we have the least squares value and the slope is zero.
Remember, the different rotations are just different values for A, the slope, and B, the intercept. We can use a 3D graph to show how different values for the slope and intercept result in different sums of squares. In this graph, the intercept is on the Z axis, so it's going back, sort of deep into your computer screen. If we select one value for the intercept, for example, assume we set the intercept value to be 3, then we could change values for the slope and see how an intercept of 3 plus different values for the slope would affect the sum of squared residuals.

Anyway, we do that for bunches of different intercepts and slopes. Taking the derivatives with respect to both the slope and the intercept tells us where the optimal values are for the best fit.
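A brute-force version of that picture is easy to sketch in Python: evaluate the sum of squared residuals over a grid of slope and intercept values, which is the 3D surface, and pick the smallest one. This is only an illustration; real software finds the minimum with the derivatives instead of a grid search. The data points are hypothetical.

```python
# The 3D surface: sum of squared residuals as a function of (slope, intercept).
# A coarse grid search (step 0.1) locates the minimum of the surface.
# Hypothetical data, not the video's actual points.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.0, 2.0, 3.0, 5.0, 6.5]

def ssr(a, b):
    return sum((a * x + b - y) ** 2 for x, y in zip(xs, ys))

# Try slopes from -2.0 to 4.0 and intercepts from -3.0 to 3.0.
best = min(
    ((a / 10, b / 10) for a in range(-20, 41) for b in range(-30, 31)),
    key=lambda ab: ssr(*ab),
)
print(best, round(ssr(*best), 2))    # the (slope, intercept) at the bottom
```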
Note: no one ever solves this problem by hand; this is done on a computer. So for most people, it's not essential to know how to take these derivatives. However, it is essential to understand the concepts.

Big important concept number one: we want to minimize the square of the distance between the observed values and the line.

Big important concept number two: we do this by taking the derivative and finding where it is equal to zero.
The final line minimizes the sum of squares; it gives the least squares between it and the real data. In this case, the line is defined by the following equation: Y = 0.77 * X + 0.66.
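Setting the two derivatives to zero and solving gives the standard closed-form least-squares formulas, sketched below in Python. The data points here are hypothetical, so the fitted slope and intercept will not equal the video's 0.77 and 0.66.

```python
# Closed-form least squares, the result of setting both derivatives to zero:
#   A = sum((x - x_bar) * (y - y_bar)) / sum((x - x_bar)^2)
#   B = y_bar - A * x_bar
# Hypothetical data, not the video's actual points.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.0, 2.0, 3.0, 5.0, 6.5]
n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

a = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) \
    / sum((x - x_bar) ** 2 for x in xs)   # optimal slope A
b = y_bar - a * x_bar                      # optimal intercept B
print(a, b)                                # the fitted line Y = A*X + B
```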
Hooray, we've made it to the end of another StatQuest. Tune in next time for another exciting adventure in statistics land.