Simple Linear Regression in R | R Tutorial 5.1 | MarinStatsLectures - YouTube

Channel: MarinStatsLectures-R Programming & Statistics

Hi, I'm Mike Marin, and in this video we'll introduce simple linear regression using R. Simple linear regression is useful for examining or modeling the relationship between two numeric variables. In fact, we can also fit a simple linear regression using a categorical explanatory or X variable, but we'll save that topic for a later video. We will be working with the lung capacity data that was introduced earlier in this series of videos; I've already gone ahead and imported the data into R and attached it. We will model the relationship between age and lung capacity, with lung capacity being our outcome, dependent, or Y variable.
We can begin by producing a scatter plot of the data, plotting age on the x-axis and lung capacity on the y-axis, and we'll add a title. Here we may also want to go ahead and calculate Pearson's correlation between lung capacity and age. We can see that there is a positive, fairly linear association between age and lung capacity.
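The plotting and correlation steps just described might look like the following sketch, assuming the lung capacity data has been imported and attached, with variables named LungCap and Age (the names used for this dataset elsewhere in the series):

```r
# Scatter plot: Age on the x-axis, LungCap on the y-axis, with a title
plot(Age, LungCap, main = "Lung Capacity vs Age")

# Pearson's correlation between lung capacity and age
cor(LungCap, Age)
```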
We can fit a linear regression in R using the lm command. To access the help menu, you can type help with the name of the command in the brackets, or simply place a question mark in front of the command name. Let's go ahead and fit a linear regression to this data and save it in the object mod. To do so, we'll fit a linear model predicting lung capacity using the variable age. It's important to note here that the first variable we enter should be our Y variable, and the second variable the X variable. We can then ask for a summary of this model.
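As a sketch of the fitting step, again assuming the attached variables are named LungCap and Age:

```r
# Access the help menu for the lm command
help(lm)   # or equivalently: ?lm

# Fit the linear model; the Y variable (LungCap) comes first, then the X variable (Age)
mod <- lm(LungCap ~ Age)

# Summary of the fitted model
summary(mod)
```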
Here we can see that we are returned a summary for the residuals, or errors. We can see the estimate of the intercept, its standard error, as well as the test statistic and p-value for a hypothesis test that the intercept is zero; it's worth noting that a test of whether the intercept is zero is often not of interest. We can also see the estimate of the slope for age, its standard error, and the test statistic and p-value for the hypothesis test that the slope equals zero. You'll also notice that stars are used to identify significant coefficients. Here we can see the residual standard error of 1.526, which is a measure of the variation of observations around the regression line; this is the same as the square root of the mean squared error, or root MSE. We can also see the R-squared and the adjusted R-squared, as well as the hypothesis test and p-value for a test that all the coefficients in the model are zero.
Recall that in earlier videos we saw the attributes command. Here we can ask for the attributes of our model, and this will let us know which particular attributes are stored in the object mod. We can extract certain attributes using the dollar sign; for example, we may want to pull up the coefficients from our model. It's worth noting that we only need to type coef here, and R will know that these are the coefficients we're asking for. We may also extract certain attributes in the following way; here we'll ask for the coefficients of our model.
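The two ways of extracting the coefficients described above can be sketched as follows, assuming the model was saved as mod:

```r
# List which attributes are stored in the model object
attributes(mod)

# Extract the coefficients with the dollar sign;
# the name can be abbreviated to coef
mod$coefficients
mod$coef

# Extract the coefficients using the coef function
coef(mod)
```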
Now let's go ahead and produce that plot we had earlier. If we would like to add the regression line to this plot, we can do so using the abline command; here we would like to add the line for our regression model, and as we've seen earlier, we can add a color for this line as well as change the line width using these commands. It's worth noting that we will need to do something slightly different to add regression lines for multiple linear regressions with multiple variables.
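Adding the fitted line might look like this sketch (the particular color and line width here are illustrative choices, not values from the video):

```r
# Reproduce the scatter plot from earlier
plot(Age, LungCap, main = "Lung Capacity vs Age")

# Add the fitted regression line, colored red with a thicker line width
abline(mod, col = "red", lwd = 2)
```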
We've already seen the coef command to get our model coefficients. We can produce confidence intervals for these coefficients using the confint command; here we'd like a confidence interval for our model coefficients. If we would like to change the level of confidence for these, we can do so using the level argument in the confint command. Here, let's go ahead and have 99% confidence intervals.
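A sketch of the confidence interval commands:

```r
# 95% confidence intervals (the default) for the model coefficients
confint(mod)

# 99% confidence intervals, using the level argument
confint(mod, level = 0.99)
```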
You'll recall that we can ask for a summary of the model using the summary command. We can also produce the ANOVA table for the linear regression model using the anova command; here we'd like the ANOVA table for this model. You'll note that this ANOVA table corresponds to the F-test presented in the last row of the linear regression summary. One final thing to note is that the residual standard error of 1.526 presented in the linear regression summary is the same as the square root of the mean squared error, or mean squared residual, from the ANOVA table: we can see that if we take the square root of the 2.3, we get the same value as the residual standard error. The slight difference is due to rounding error.
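The ANOVA step and the rounding check can be sketched as:

```r
# ANOVA table for the regression model
anova(mod)

# The square root of the mean squared residual (about 2.3 in the table)
# matches the residual standard error from summary(mod), up to rounding
sqrt(2.3)
```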
In the next video in this series we'll discuss how to produce some regression diagnostic plots to examine the regression assumptions; these include residual plots and QQ plots, among a few others. Thanks for watching this video, and make sure to check out my other instructional videos.