Week 6, Lecture 12, Part 5: What is Collinearity? - YouTube

Channel: unknown

[0]
Now that we've talked about issues of non-constant variance and linearity, and what transformations
[7]
we can use to try to correct the structure of our data to make sure that our assumptions
[13]
hold, we're now going to cross into a new problem that we may have, and we're going to
[20]
ponder whether or not this is an issue: what happens when our predictors are strongly correlated with
[26]
each other? Or, in other words, what are we going to do when one predictor is a linear combination
[33]
of other predictors? An example of this would be if we have an x1 and x2 that are
[40]
perfectly correlated with each other; they are going to be what we call
[45]
collinear. So for example, we could imagine that x2 is a perfect linear combination of x1,
[53]
say x2 = 5 + 0.5 x1, where 5 is where the relationship would cross the y axis
[64]
when x1 equals 0, and the slope of that line would be 0.5. So we can imagine that there
[72]
could be instances in which our predictors are perfectly collinear with each other. This is going
[80]
to be an issue that we call multicollinearity, where one or more of our predictors are nearly
[87]
linearly related to the others. And if one of the predictors is almost perfectly predicted from
[94]
the other set of variables, then we are also going to have multicollinearity in the model.
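The perfect-collinearity example just described, x2 = 5 + 0.5 x1, can be sketched in a few lines of Python. This is a minimal illustration with made-up simulated data (NumPy is assumed); it shows why a perfectly collinear design matrix breaks ordinary least squares:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
x1 = rng.normal(size=n)
x2 = 5 + 0.5 * x1          # perfect linear combination of x1

# Design matrix: an intercept column plus both predictors
X = np.column_stack([np.ones(n), x1, x2])

# Three columns, but only rank 2, because the x2 column equals
# 5*(intercept column) + 0.5*(x1 column). X'X is therefore singular,
# and the usual OLS solution (X'X)^{-1} X'y does not exist.
print(np.linalg.matrix_rank(X))   # prints 2
```

This rank deficiency is why statistical software typically either errors out or silently drops one of the perfectly collinear columns when you try to fit such a model.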
[101]
So for the rest of the lecture we're going to talk about how multicollinearity can affect our
[107]
statistical inference as well as our predictions, and we're going to talk about how we can detect
[113]
multicollinearity and, once we have detected it, what we can do to resolve any issues it may
[120]
have for our actual statistical inference. I want to begin by talking about the effects
[126]
that multicollinearity can have. The first effect that you can look for:
[131]
your fitted values, your y hats, are probably not going to be affected by multicollinearity.
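Both halves of this effect — fitted values staying put while coefficient uncertainty blows up — can be checked with a small simulation. This is a sketch with made-up data and a hand-rolled OLS helper, not any particular package's implementation:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2_near = x1 + rng.normal(scale=0.05, size=n)   # nearly collinear with x1
x2_ind = rng.normal(size=n)                     # unrelated comparison predictor
y = 1 + 2 * x1 + rng.normal(size=n)

def ols(X, y):
    """Return OLS coefficients and their standard errors."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    s2 = resid @ resid / (len(y) - X.shape[1])   # residual variance estimate
    se = np.sqrt(s2 * np.diag(np.linalg.inv(X.T @ X)))
    return beta, se

X_near = np.column_stack([np.ones(n), x1, x2_near])
X_ind = np.column_stack([np.ones(n), x1, x2_ind])
beta_near, se_near = ols(X_near, y)
beta_ind, se_ind = ols(X_ind, y)

# Fitted values from the two models are essentially the same...
yhat_near = X_near @ beta_near
yhat_ind = X_ind @ beta_ind
print(np.corrcoef(yhat_near, yhat_ind)[0, 1])   # very close to 1

# ...but the standard error on x1's coefficient is far larger in the
# nearly collinear fit (inflated by roughly the square root of the VIF).
print(se_near[1] / se_ind[1])
```

The ratio printed at the end is large because x1 and x2_near carry nearly the same information, so the model cannot tell which of them deserves the credit.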
[141]
What is going to be affected is the variability in your beta estimates. The
[150]
standard errors around your estimated coefficients are going to be artificially inflated, because
[156]
we're not as certain which partial effect is really driving the true underlying relationship
[164]
between our covariates and our response. So when we have a high standard error on
[172]
our betas, that's going to mean that fewer of those estimated coefficients are significant,
[179]
even when a true relationship may actually exist. Another artifact of multicollinearity is that
[187]
our estimated coefficients are going to be really sensitive to minor changes in the model.
[193]
So if there are really large differences in your estimated coefficients when you leave one variable
[200]
out and then include it, that is a good indication that there may be some multicollinearity
[206]
in your model. Another effect of multicollinearity could be that the sample that you have
[214]
is not really generalizable to the total overall population, and so if you had a new sample you may
[223]
end up getting a very different model, because again we're having a hard time distinguishing
[231]
exactly which partial effect is really driving the true underlying relationship. And what is
[238]
nice is that any coefficients, or covariates, that are not multicollinear with each other
[246]
should not be affected by this. So if there are coefficients that are swinging around widely,
[253]
that should be an indication that those are the covariates that are collinear with each other.
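The leave-one-out check described above — refit the model with and without a suspect predictor and watch which coefficients swing — can be sketched as follows. The data are simulated and the coefficient values are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.2, size=n)   # nearly collinear with x1
x3 = rng.normal(size=n)                   # not collinear with anything
y = 1 + 2 * x1 + 3 * x2 + x3 + rng.normal(size=n)

def coefs(X, y):
    """OLS coefficient estimates via least squares."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

full = coefs(np.column_stack([np.ones(n), x1, x2, x3]), y)
drop_x2 = coefs(np.column_stack([np.ones(n), x1, x3]), y)

# x1's coefficient swings dramatically when x2 is dropped, because
# x1 absorbs the effect of its collinear partner...
print(full[1], drop_x2[1])

# ...while x3, which is not collinear with anyone, barely moves.
print(full[3], drop_x2[2])
```

This matches the point in the lecture: the widely swinging coefficient (x1) flags the collinear pair, while the stable coefficient (x3) is unaffected.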