Multiple Linear Regression in R | R Tutorial 5.3 | MarinStatsLectures - YouTube

Channel: MarinStatsLectures-R Programming & Statistics

[0]
hi I'm Mike Marin and in this video
[3]
we'll introduce multiple linear
[5]
regression multiple linear regression is
[8]
useful for modeling the relationship
[10]
between a numeric outcome dependent or Y
[13]
variable and multiple explanatory
[16]
independent or X variables we will be
[19]
working with the lung capacity data that
[21]
was introduced earlier in this series of
[23]
videos I've already gone ahead and
[26]
imported the data into R and attached it
[28]
our outcome variable will be lung
[31]
capacity to fit our linear model we will
[34]
be using the lm command you can access
[37]
the help menu by typing help and the
[39]
name of the command in brackets or by
[42]
typing the command name directly into
[44]
the help search window
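
As a small sketch of the two help routes just mentioned, the console versions look roughly like this (nothing here depends on the lung capacity data):

```r
# Open the documentation page for lm(), the linear model fitting function
help("lm")

# Shorthand that opens the same help page
?lm

# Show the arguments lm() accepts without opening the help page
args(lm)
```
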
[48]
we will be working with scripts in R
[50]
you can see I have one prepared here
[53]
having worked through my series of
[55]
videos you should be a bit more
[57]
comfortable using R at this point to
[59]
learn more about writing or saving
[61]
scripts in R see my video in series
[64]
one on writing scripts in R first
[67]
let's fit a linear regression model
[69]
using age and height as our explanatory
[72]
or X variables and let's save this in an
[74]
object called model1 we'll submit this
[77]
command here and now let's ask for a
[80]
summary of this model here we can see
[84]
the r-squared of 0.843
[87]
approximately 84% of variation in
[91]
lung capacity can be explained by our
[93]
model that is can be explained by age
[95]
and height here we can see the F
[98]
statistic and p-value for an overall
[101]
test of significance of our model this
[104]
tests the null hypothesis that all of
[106]
the model coefficients are 0 in our
[108]
example here it tests specifically that
[111]
the slopes for age and height are zero
[114]
here we can see the residual standard
[117]
error this gives us an idea of how far
[120]
observed lung capacities or Y values are
[123]
from the predicted or fitted lung
[125]
capacities the Y hats this gives us an
[128]
idea of the typical sized residual or
[130]
error the intercept of -11.747
[134]
is the estimated
[137]
mean Y value when all X's are zero this
[141]
would be the estimated mean lung
[143]
capacity for someone of age and height
[145]
zero you'll notice that this doesn't
[147]
have a very meaningful interpretation to
[150]
give the intercept a better
[151]
interpretation we can center age and
[153]
height this is a topic we'll discuss in
[156]
following videos we can see that the
[159]
slope for age is 0.126 this
[162]
is the effect of age on lung capacity
[165]
adjusting or controlling for height we
[168]
associate an increase of 1 year in age
[170]
with an increase of 0.126 in lung
[173]
capacity adjusting or controlling for
[176]
height we can also see the
[178]
hypothesis test that the slope equals 0
[180]
here
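
The model fit and summary described so far can be sketched as follows. This is a minimal sketch on simulated data, since the lung capacity file itself is not part of this transcript: the column names LungCap, Age and Height and the simulated values are assumptions, so the printed estimates will not match the numbers quoted in the video.

```r
# Simulated stand-in for the lung capacity data (assumed column names)
set.seed(1)
n <- 200
Age <- round(runif(n, 3, 19))
Height <- 45 + 1.6 * Age + rnorm(n, sd = 3)
LungCap <- -11 + 0.13 * Age + 0.28 * Height + rnorm(n, sd = 1)
lung <- data.frame(LungCap, Age, Height)

# Fit the multiple linear regression with Age and Height as X variables
model1 <- lm(LungCap ~ Age + Height, data = lung)

# Coefficient table, residual standard error, R-squared, overall F test
summary(model1)

# Individual pieces of the summary can be pulled out by name
summary(model1)$r.squared   # proportion of variation explained
summary(model1)$sigma       # residual standard error
coef(model1)                # intercept and the two slopes
```

In the coefficient table, each row's t value and p-value correspond to the test that the coefficient in that row equals zero, adjusting for the other variable in the model.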
[181]
the slope for height is 0.278 this is
[186]
the estimated effect of height on lung
[188]
capacity adjusting for age we can see
[191]
the test for the hypothesis that the
[193]
slope for height is zero here now let's
[197]
go ahead and calculate Pearson's
[198]
correlation between age and height we
[202]
can see that age and height are very
[204]
highly correlated the collinearity
[206]
between age and height means that we
[209]
should not directly interpret the slopes
[211]
say the slope of age as the effect of
[213]
age on lung capacity adjusting for
[215]
height this high correlation between age
[218]
and height suggests that these two
[220]
effects are somewhat bounded together
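
The correlation check just described might look like this. Again a sketch on simulated data with assumed variable names Age and Height, so the value printed is not the one from the video.

```r
# Simulated stand-in for two strongly related X variables (assumed names)
set.seed(1)
n <- 200
Age <- round(runif(n, 3, 19))
Height <- 45 + 1.6 * Age + rnorm(n, sd = 3)

# Pearson's correlation between the two explanatory variables
r <- cor(Age, Height, method = "pearson")
r
```

A value close to 1 or -1 signals collinearity between the X variables, which is the situation in which the individual slopes become hard to interpret separately.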
[222]
dealing with collinearity is a topic
[225]
we'll discuss in later videos and
[228]
finally as we've seen in earlier videos
[231]
we can create a confidence interval for
[233]
the model coefficients using the confint
[235]
command let's go ahead and take a
[237]
look at a confidence interval for our
[239]
model coefficients we have an estimated
[242]
slope for age of 0.126 we're
[246]
95% confident the true slope is between
[249]
0.09 and 0.16 let's go ahead and fit a
[254]
linear model using all of our X
[256]
variables we'll submit this command here
[259]
and now we can ask for a summary of our
[262]
model we can check the model assumptions
[266]
by examining plots of the residuals or
[269]
errors to do so we can use the plot
[272]
model command taking a look at these
[275]
plots here we can see the relationship
[278]
between age height and lung capacity is
[281]
approximately linear the variation looks
[283]
constant lung capacity given age and
[287]
height is approximately normal to learn
[291]
more about producing and examining these
[293]
residual plots you can see my earlier
[295]
video on examining model assumptions in
[298]
linear regression in the next video in
[301]
this series we'll talk more about linear
[303]
regression and specifically we will
[306]
focus our discussion on the inclusion of
[308]
categorical variables or factors in a
[310]
linear model thanks for watching this
[313]
video and make sure to check out
[315]
other instructional videos
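
The last three steps from the video (confidence intervals, a model with more X variables, and residual plots) can be sketched together as below. The data are simulated and the names LungCap, Age, Height and Smoke are assumptions, with Smoke standing in hypothetically for the additional X variables, which the transcript does not name.

```r
# Simulated stand-in for the lung capacity data (assumed column names)
set.seed(1)
n <- 200
Age <- round(runif(n, 3, 19))
Height <- 45 + 1.6 * Age + rnorm(n, sd = 3)
Smoke <- factor(sample(c("no", "yes"), n, replace = TRUE))  # hypothetical extra X
LungCap <- -11 + 0.13 * Age + 0.28 * Height + rnorm(n, sd = 1)
lung <- data.frame(LungCap, Age, Height, Smoke)

# Two-variable model and 95% confidence intervals for its coefficients
model1 <- lm(LungCap ~ Age + Height, data = lung)
ci <- confint(model1, level = 0.95)
ci

# Larger model using every X variable in the simulated data
model2 <- lm(LungCap ~ Age + Height + Smoke, data = lung)
summary(model2)

# Standard residual diagnostic plots, arranged in a 2 x 2 grid
par(mfrow = c(2, 2))
plot(model2)
```
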