Polynomial Regression in R | R Tutorial 5.12 | MarinStatsLectures - YouTube

Channel: MarinStatsLectures-R Programming & Statistics

Hi, I'm Mike Marin, and in this video we'll discuss the idea of polynomial regression and how to fit and assess these models in R. Polynomial regression is a special case of linear regression in which the relationship between x and y is modeled using a polynomial rather than a straight line. It can be used when the relationship between x and y is nonlinear, although it is still considered a special case of multiple linear regression. We will be working with a different version of the lung capacity data than was used in other videos. A link to download this data can be found in the video description below.
You can also download the R script used in this video there. I've already imported the data into RStudio and attached it. As an example, we'll model the relationship between lung capacity and height. Let's begin by looking at a scatter plot, then fit a simple linear regression model and look at a summary of that model. We can take note that the R-squared is about 75% and the residual standard error is 1.292. We can also add the fitted line to the plot using the abline() function. Visually, the relationship between lung capacity and height looks a bit curved, or nonlinear.
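The steps just described might be sketched as follows. Note this uses simulated data as a stand-in for the video's lung capacity dataset (the variable names Height and LungCap follow the video; the simulated values are not the real data):

```r
# Simulated stand-in for the video's data: Height (inches), LungCap (liters)
set.seed(1)
Height <- runif(200, 45, 80)
LungCap <- 0.00125 * Height^2 + rnorm(200, sd = 1)

# Scatter plot of lung capacity against height
plot(Height, LungCap, main = "Lung Capacity vs Height")

# Fit the simple linear regression and inspect the fit;
# note the R-squared and residual standard error in the output
model1 <- lm(LungCap ~ Height)
summary(model1)

# Overlay the fitted regression line on the scatter plot
abline(model1, col = "red", lwd = 2)
```

The R-squared and residual standard error values will of course differ from the video's, since the data here are simulated.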
A reminder that we can also use residual plots to help assess linearity and check model assumptions; for a more thorough discussion of this topic, you can refer to one of our earlier videos on checking the assumptions of linear regression. There are many approaches to dealing with nonlinearity; the one we will discuss in this video is including polynomial terms in our model. We will start by including height squared in the model.
First, let's take a look at the wrong way to do this. While it may seem like including height squared directly in the model call will work, R will not include height squared in the model if it is entered this way. Let's take a look at that.
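A sketch of the wrong way, using the same simulated stand-in data as above. Inside an R formula, `^` is formula syntax (crossing to a given degree) rather than arithmetic, which is why the squared term silently disappears:

```r
# Simulated stand-in data (not the video's real dataset)
set.seed(1)
Height <- runif(200, 45, 80)
LungCap <- 0.00125 * Height^2 + rnorm(200, sd = 1)

# The wrong way: ^ inside a formula is NOT arithmetic squaring,
# so Height^2 reduces to just Height and the squared term is dropped
wrong <- lm(LungCap ~ Height + Height^2)
summary(wrong)       # only an intercept and Height appear
length(coef(wrong))  # 2 -- no squared term, and no warning is given
```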
We can see that if we enter height squared directly in the model call, height squared is left out of the model summary; R has simply ignored it. It is important to take note of this, because R does not give a warning or an error message here.

Now let's take a look at the right way to do this: we use a capital I and include height squared within the parentheses. If we ask for a summary of the model, we can now see that height squared has been included. We'll return to this model in a moment, but first let's look at a few other ways to get the same result.

Instead of using the approach we just showed, we could first create a new variable called height squared and then include this variable in the model. If we ask for the summary, you can see that this produces exactly the same model and results as the previous set of commands. We could also make use of the poly() function in R: here we let R know that we would like to include polynomial terms for the height variable, and we set the degree argument to the degree of polynomial we'd like. In this case, setting degree equal to 2 will include height and height squared. If, instead of setting the raw argument to TRUE, we set it to FALSE, R would fit a model using orthogonal polynomials. Let's fit that model; again, you can take the time to verify that it produces exactly the same results as the two earlier approaches.

Let's give ourselves a quick reminder of the model that we fit. With this polynomial model including height squared, the R-squared is about 77% and the residual standard error is 1.238. You may recall that for the model with only height, the R-squared was about 75% and the residual standard error was 1.292; it looks like height squared may be improving the model. Let's look at this visually: we can add the polynomial model to the plot using the lines() function, so let's add it as a thick blue line.
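The three equivalent ways of fitting the quadratic model, plus the lines() call, can be sketched like this (again on simulated stand-in data; the helper name HeightSquare is illustrative):

```r
# Simulated stand-in data (not the video's real dataset)
set.seed(1)
Height <- runif(200, 45, 80)
LungCap <- 0.00125 * Height^2 + rnorm(200, sd = 1)

# (1) Wrap the squaring in I() so ^ is treated as arithmetic
model2 <- lm(LungCap ~ Height + I(Height^2))
summary(model2)

# (2) Create the squared variable first, then include it in the formula
HeightSquare <- Height^2
model2b <- lm(LungCap ~ Height + HeightSquare)

# (3) Use poly(); raw = TRUE requests raw (not orthogonal) polynomials
model2c <- lm(LungCap ~ poly(Height, degree = 2, raw = TRUE))

# All three produce identical fitted values
all.equal(unname(fitted(model2)), unname(fitted(model2b)))  # TRUE

# Add the fitted quadratic curve to the plot as a thick blue line;
# sort by Height so lines() draws a smooth curve left to right
plot(Height, LungCap)
ord <- order(Height)
lines(Height[ord], fitted(model2)[ord], col = "blue", lwd = 3)
```

Setting `raw = FALSE` in poly() would instead use orthogonal polynomials; the individual coefficients change, but the fitted curve is the same.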
Subjectively, it looks like the model that includes height squared may provide a better fit to the data than the model that does not.
Let's compare these two models formally using the partial F test; for a more detailed discussion of the partial F test, you can see one of our earlier videos where we discuss it. This test has the null hypothesis that there is no significant difference between the two models, and the alternative hypothesis that the full model (the model that includes height squared) is significantly better. We can run the test in R using the anova() function. With such a small p-value, we reject the null hypothesis and conclude that we have evidence that the model including height squared provides a statistically significantly better fit than the model without height squared.
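The partial F test described above amounts to passing the two nested models to anova(), sketched here on the simulated stand-in data (the p-value will differ from the video's, which used the real dataset):

```r
# Simulated stand-in data (not the video's real dataset)
set.seed(1)
Height <- runif(200, 45, 80)
LungCap <- 0.00125 * Height^2 + rnorm(200, sd = 1)

model1 <- lm(LungCap ~ Height)                # reduced model
model2 <- lm(LungCap ~ Height + I(Height^2))  # full model

# anova() on two nested lm fits performs the partial F test;
# a small p-value is evidence in favor of the full model
anova(model1, model2)
```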
Most often we won't want to include polynomial terms much beyond x squared or x cubed. Let's explore a model that includes x cubed as well. It's worth noting that you must always include all lower-order terms in a model: if we include height cubed, we must also include height squared and height. Let's fit this model, including height squared and height cubed, and ask for a summary. We can add this model to the plot using the lines() function, this time as a thick green dashed line, and let's also add a legend to the plot to help remind ourselves which model is which. We can see visually that there is almost no difference between the model that includes height cubed and the model that does not. As before, we can use the partial F test to help us decide whether height cubed improves the model.
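The cubic model, overlay, legend, and follow-up partial F test might be sketched as follows, again on the simulated stand-in data:

```r
# Simulated stand-in data (not the video's real dataset)
set.seed(1)
Height <- runif(200, 45, 80)
LungCap <- 0.00125 * Height^2 + rnorm(200, sd = 1)

model2 <- lm(LungCap ~ Height + I(Height^2))
# Cubic model: all lower-order terms (Height, Height^2) are kept
model3 <- lm(LungCap ~ Height + I(Height^2) + I(Height^3))
summary(model3)

# Plot both fitted curves: quadratic in solid blue, cubic in dashed green
plot(Height, LungCap)
ord <- order(Height)
lines(Height[ord], fitted(model2)[ord], col = "blue", lwd = 3)
lines(Height[ord], fitted(model3)[ord], col = "green", lwd = 3, lty = 2)
legend("topleft", legend = c("quadratic", "cubic"),
       col = c("blue", "green"), lwd = 3, lty = c(1, 2))

# Partial F test: does adding the cubic term improve the model?
anova(model2, model3)
```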
When we conduct the partial F test, we can see that the p-value is large: there is not a statistically significant difference between the models, so we can decide that including height cubed is not necessary in our model. Before finishing off, we should note that there are other approaches to dealing with nonlinearity, some of which include transforming the x or y variable, converting x to a categorical variable (factor), or using nonlinear regression methods instead. All of these different approaches have their pros and cons!
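The first two alternatives mentioned can be sketched briefly (these are illustrative choices, not the video's code, and use the simulated stand-in data; the log transform and four-bin cut are arbitrary examples):

```r
# Simulated stand-in data (not the video's real dataset)
set.seed(1)
Height <- runif(200, 45, 80)
LungCap <- 0.00125 * Height^2 + rnorm(200, sd = 1)

# Transform the x variable: e.g. regress on log(Height)
m_log <- lm(LungCap ~ log(Height))

# Convert x to a categorical variable (factor) with cut()
HeightCat <- cut(Height, breaks = 4)
m_cat <- lm(LungCap ~ HeightCat)
summary(m_cat)
```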
Thanks for watching this video. Make sure to subscribe to MarinStatsLectures, like us on Facebook, and visit our website (statslectures.com).