11a multicollinearity VIF R2 - YouTube

Channel: unknown

[0] Today we will talk about multicollinearity. Sometimes two variables measure a similar concept: they are correlated, but they're not exactly identical. The fact that they're correlated makes it hard for the regression to disentangle their individual effects on the response.
[24] And what happens as a consequence: this difficulty in figuring out whether it is due to that variable or this one results in the confidence intervals in the regression being very, very wide, but only for the correlated variables. So it reflects the uncertainty: we don't really know whether the contribution, the beta, is due to one variable or the other.
[50] More generally, it needn't be two variables; it could be several. One variable may be correlated with a linear combination of multiple other variables. And one way to measure the degree of collinearity is the concept of variance inflation factors (VIF).
[76] Let's talk first about exact collinearity; there is a formal definition. When you have a linear combination of the x variables that is always zero, where you can choose the t's as you want, then you have exact collinearity. The t's, of course, cannot trivially all be zero.
[102]
We have already seen an example of this
[104]
when we talked about indicator variables.
[108]
So the idea with indicator variables, you
[110]
have to take out
[112]
one of the indicators
[115]
to avoid exact collinearity.
[119]
And in the example of male and female,
[123]
we can make this fit this definition;
[126]
we're looking at the intercept male and
[128]
female
[130]
and we're choosing for female the
[133]
coefficient one
[134]
for male the coefficient one. Remember,
[137]
the intercept is one
[139]
and we choose the coefficient minus 1.
[142]
Right? So t0
[143]
is minus 1. With this
[146]
choice, we have minus 1 plus male
[149]
indicator of male and female. These
[152]
together
[152]
are always 1 and so you have minus 1
[155]
plus 1 equals 0.
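A quick numeric sketch of that example (hypothetical data, numpy assumed): with an intercept column plus both a male and a female indicator column, the choice t0 = -1, t_male = 1, t_female = 1 gives a combination that is zero on every row, and the design matrix loses rank.

```python
import numpy as np

# Made-up 0/1 indicators for eight observations.
male = np.array([1, 0, 0, 1, 1, 0, 1, 0])
female = 1 - male                  # complement of male
intercept = np.ones(8)             # the intercept column is always 1

# t0 = -1, t_male = 1, t_female = 1: the combination is identically zero.
combo = -1 * intercept + 1 * male + 1 * female
print(combo)                       # all zeros -> exact collinearity

# Consequence: the design matrix does not have full column rank,
# so X^T X is singular and OLS has no unique solution.
X = np.column_stack([intercept, male, female])
print(np.linalg.matrix_rank(X))    # 2, not 3
```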
[159]
So that was exactly linearity,
[163]
we talk about multicollinearity when the
[166]
linear combination is not
[167]
exactly 0 but approximately zero.
[173]
And equivalently,
[176]
we can say that we have multiple
[179]
linearity
[180]
when one x variable can be predicted
[184]
approximately by a linear combination
[188]
of the x variables. Aha
[191]
linear combination, we are back at
[193]
regression.
[194]
So now we can write this as a regression,
[198]
replacing the approximate sign with an
[201]
equal sign but then adding error
[203]
and as always error is distributed
[206]
normally
[207]
in the usual way.
[212]
So, the variance inflation factor is
[214]
formally defined as
[216]
1 divided by 1 minus R squared this is
[219]
R j squared .With this R j squared is the
[223]
coefficient of determination the
[225]
R squared,
[226]
when regressing the j's x variable x j
[229]
on all other variables
[231]
x variables. It is not the R
[234]
squared; the usual R squared from linear
[237]
regression where you have y
[238]
on x there's no y here anywhere.
[245]
Yeah and here's the same thing, the
[248]
linear regression again where we're
[251]
leaving out
[252]
the j's variable because that's over
[253]
here.
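This auxiliary regression is easy to carry out directly. A minimal sketch (numpy assumed; the `vif` helper and the data are made up for illustration): regress x_j on the other x variables, take that R_j squared, and plug it into 1 / (1 - R_j squared).

```python
import numpy as np

def vif(X, j):
    """VIF of column j of X: regress x_j on the other columns
    (plus an intercept) and return 1 / (1 - R_j^2)."""
    y = X[:, j]                               # x_j plays the role of the response
    others = np.delete(X, j, axis=1)          # all the other x variables
    A = np.column_stack([np.ones(len(y)), others])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    r2 = 1 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))
    return 1.0 / (1.0 - r2)

# Two highly correlated predictors plus one unrelated one (made-up data).
rng = np.random.default_rng(1)
x1 = rng.normal(size=200)
x2 = x1 + 0.1 * rng.normal(size=200)   # x2 is almost x1: strong collinearity
x3 = rng.normal(size=200)              # independent of the others
X = np.column_stack([x1, x2, x3])
print([round(vif(X, j), 1) for j in range(3)])   # large, large, near 1
```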
[256]
Question for you, what is the variance
[260]
inflation factor when x j
[262]
is uncorrelated with the other x
[264]
variables?
[265]
Here's the formula, here's some answers
[269]
and I will leave you to answer this
[273]
and move on to the next question.
[277]
Next question is very similar.
[281]
What is the variance inflation factor
[283]
when x j is very highly correlated with
[285]
the other x variables?
[288]
Same answers, same formula
[291]
and so I will leave you with that also.
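If you want to check your answers afterwards, the formula is simple enough to evaluate at a few values of R_j squared:

```python
# VIF = 1 / (1 - R_j^2) at a few values of R_j^2,
# from "uncorrelated" up to "very highly correlated".
for r2 in (0.0, 0.9, 0.99, 0.999):
    print(r2, 1.0 / (1.0 - r2))
```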
[296]
And going to a fun fact,
[301]
we've defined the variance inflation
[304]
factor
[305]
through the R squared.
[308]
The R squared of the j's variable
[312]
and it can be shown that
[318]
the variance inflation factor of the j's
[320]
variable is
[321]
the j's diagonal entry of this matrix.
[326]
The matrix being inverse X transpose X
[330]
and if you remember the variance of beta;
[333]
the estimate of beta was sigma squared
[335]
times that same
[337]
matrix. So what does it tell us?
[341]
Well,
[344]
the j's diagonal element
[349]
of this matrix
[352]
is the variance of beta j. Right?
[358]
Well, times sigma squared.
[361]
So we have a direct correspondence here
[364]
that well the width of the confidence
[367]
interval
[368]
is directly related to that
[373]
R squared the R j squared that we saw
[376]
earlier.
[377]
That connection is not obvious but it's
[381]
really cool I think. so I'll leave you
[384]
with that.
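A sketch of that fun fact (numpy assumed, made-up data): when the x variables are standardized so that X^T X divided by n is the correlation matrix of the predictors, the j-th diagonal entry of its inverse matches 1 / (1 - R_j^2) from the auxiliary regression.

```python
import numpy as np

# Made-up predictors, with x1 and x2 correlated.
rng = np.random.default_rng(2)
n = 500
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + 0.6 * rng.normal(size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])

# Standardize the columns; then Z^T Z / n is the correlation matrix.
Z = (X - X.mean(axis=0)) / X.std(axis=0)
R = (Z.T @ Z) / n
diag = np.diag(np.linalg.inv(R))        # VIFs read off the inverse's diagonal

# Cross-check against 1 / (1 - R_j^2) for j = 0 via the auxiliary regression.
y = Z[:, 0]
A = np.column_stack([np.ones(n), Z[:, 1:]])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
resid = y - A @ beta
r2 = 1 - (resid @ resid) / (y @ y)      # y is centered, so SS_tot = y @ y
vif0 = 1.0 / (1.0 - r2)
print(diag[0], vif0)                    # the two numbers agree
```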