Multiple Linear Regression - Variance Inflation Factor - Part 1 - YouTube

Channel: unknown

welcome back to this session on multiple regression. in the previous session we saw how a path diagram helps us understand the direct and indirect effects of an explanatory variable on the response variable, and we saw that path diagrams are most relevant when the explanatory variables are correlated. today we are going to extend that discussion and talk about one more quantification of this collinearity. when the explanatory variables are related to one another, that situation is referred to as collinearity, or multicollinearity.
we have seen the effect of collinearity through the path diagram. what we are going to do today is look at what is called the variance inflation factor, VIF, which is also a quantification of collinearity amongst the explanatory variables. so first of all, what is the variance inflation factor? the variance inflation factor is based on the amount of unique variation in each explanatory variable, and it essentially measures the effect of collinearity.
in particular, the VIF for a particular explanatory variable is calculated as 1 / (1 - R^2). this R^2 is not the regular R^2 of the multiple linear regression; it is the coefficient of determination of a special regression in which that particular explanatory variable is the response variable and all the other explanatory variables are the explanatory variables. what do i mean by that? let us say there is a multiple linear regression in which the explanatory variables are x1, x2, x3, x4, and we are trying to study the impact of these explanatory variables on the response variable y.
now what will the VIF of x1 be? VIF(x1) = 1 / (1 - R1^2). what is this R1^2? R1^2 is the coefficient of determination in a regression where x1 is the response variable and x2, x3, x4 are the explanatory variables. similarly, when we calculate the VIF of x2, it is 1 / (1 - R2^2), where R2^2 comes from the regression in which x2 is the response variable and x1, x3, x4 are the explanatory variables.
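the auxiliary-regression recipe above can be sketched in code. this is a minimal illustration with made-up data (the variable names and the helper `vif` are not from the lecture): each column in turn is regressed on the remaining columns, and 1 / (1 - R^2) of that fit is that column's VIF.

```python
import numpy as np

def vif(X, j):
    """VIF of column j of X: regress x_j on the remaining columns
    (plus an intercept) and return 1 / (1 - R^2) of that fit."""
    y = X[:, j]                                   # x_j plays the response
    others = np.delete(X, j, axis=1)              # remaining explanatory variables
    A = np.column_stack([np.ones(len(y)), others])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    r2 = 1 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))
    return 1.0 / (1.0 - r2)

# hypothetical data: x2 partly depends on x1, x3 is independent
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = 0.8 * x1 + rng.normal(scale=0.6, size=200)
x3 = rng.normal(size=200)
X = np.column_stack([x1, x2, x3])
print([round(vif(X, j), 2) for j in range(3)])   # x1, x2 inflated; x3 near 1
```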
okay, so now what happens if this R^2 is a large value? when will the R^2 be large? the R^2 will be large if that special regression is significant, which means that the explanatory variable x1 is fairly correlated with x2, x3, x4. if that happens, this R^2 will be large, and if this R^2 is large, the VIF will be large. so this is how you quantify the variance inflation factor. now, why is it called the variance inflation factor?
now, if you recall, we discussed the partial slope and the marginal slope. what is the expression from which we estimate the partial slopes? the model is y = beta0 + beta1 x1 + beta2 x2 and so on, and beta1, beta2 are the partial slopes. from the sample of data on which we run the regression, i am going to get an estimate of beta1, which we call b1, and an estimate of beta2, which is b2. these are only estimates, and therefore each is going to have a standard error of its own. the standard error in estimating beta1 can be calculated, and if you had noticed in the excel output, this standard error is also recorded. similarly there is a standard error in estimating beta2, called SE(b2), just as there is a standard error in b1, SE(b1).
now, what was the expression for this standard error? first of all, let us see where it is recorded. let us go to the excel output of the gpa example that we had discussed last time, where we were looking at cgpa in the mba program as our response variable, and the scores in the entrance examination and the scores in the interview as our explanatory variables. you recall that regression from the previous session. there we had seen that the estimate of beta1 is 0.455 and the estimate of beta2 is 0.622. excel also reported the standard errors in this estimation: 0.168 and 0.213, the standard errors in estimating b1 and b2 respectively.
so how are these calculated? that is where they are reported; now let us see the formula. in the simple linear regression, where we had considered only one of the explanatory variables, the standard error in estimating the slope, SE(b1), was the standard error of the regression (which you already know is an estimate of sigma of epsilon) divided by the square root of n, multiplied by 1 over the standard deviation of x. what is this standard deviation of x? here it would be the standard deviation of x1.
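as a quick numeric sanity check (simulated data, not from the lecture), the form se / (sqrt(n) * s_x), with s_x computed using denominator n, is the same number as the textbook form se / sqrt(sum((x - xbar)^2)):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
x = rng.normal(size=n)
y = 2.0 + 0.5 * x + rng.normal(scale=1.0, size=n)

# least-squares slope and intercept for the simple regression
b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
b0 = y.mean() - b1 * x.mean()

resid = y - (b0 + b1 * x)
s_e = np.sqrt(resid @ resid / (n - 2))        # estimate of sigma_epsilon
s_x = np.sqrt(np.mean((x - x.mean()) ** 2))   # SD of x with denominator n

se_b1 = s_e / (np.sqrt(n) * s_x)              # the form used in the lecture
se_b1_alt = s_e / np.sqrt(((x - x.mean()) ** 2).sum())  # textbook form
print(se_b1, se_b1_alt)                       # the two agree
```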
now let us go back to the variance inflation factor. generally, the standard error in b1 is estimated as the standard error of the error terms divided by the square root of n, times 1 over the standard deviation of that particular explanatory variable. so if you are estimating the standard error in b1, this is the standard deviation of x1; if you are calculating the standard error in b2, this is the standard deviation of x2. now, why is the standard deviation of x in the denominator? if the range of x1 is quite large, which means that the standard deviation of x1 is quite large, that actually helps me understand the variation in y, and therefore, if the standard deviation in x is quite large, the standard error in the corresponding beta value will be smaller.
and what do i mean by this standard error value being smaller? i get very high precision in estimating that particular beta value. once again, take the extreme example: what if all the x values were the same — 1, 1, 1, 1, 1, 1? then the standard deviation would be 0, and if the standard deviation is 0, the standard error of b1 will skyrocket, which means you get absolutely no precision in estimating that particular beta. okay, so this formula is what applies in the absence of collinearity. when will collinearity be absent? it will be absent if the explanatory variables are uncorrelated. but if there is collinearity, the standard error in the estimation of b1 gets inflated, and how does it get inflated? it gets inflated by a factor: with the VIF, the standard error in estimating b1 is actually larger by the square root of the VIF.
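that multiplicative relationship can be checked numerically. the sketch below (simulated data, with two correlated regressors; none of these numbers are from the lecture) computes the exact standard error of b1 from (X'X)^-1 and compares it with the no-collinearity standard error multiplied by sqrt(VIF):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
x1 = rng.normal(size=n)
x2 = 0.7 * x1 + rng.normal(scale=0.7, size=n)   # x2 correlated with x1
y = 1.0 + 0.5 * x1 + 0.3 * x2 + rng.normal(size=n)

A = np.column_stack([np.ones(n), x1, x2])
b, *_ = np.linalg.lstsq(A, y, rcond=None)
resid = y - A @ b
s_e = np.sqrt(resid @ resid / (n - 3))

# exact standard errors from the covariance matrix of the estimates
cov_b = s_e ** 2 * np.linalg.inv(A.T @ A)
se_b1_exact = np.sqrt(cov_b[1, 1])

# no-collinearity SE, then multiplied by sqrt(VIF) of x1
s_x1 = np.sqrt(np.mean((x1 - x1.mean()) ** 2))
r2 = np.corrcoef(x1, x2)[0, 1] ** 2              # R^2 of x1 regressed on x2
vif1 = 1 / (1 - r2)
se_b1_formula = (s_e / (np.sqrt(n) * s_x1)) * np.sqrt(vif1)
print(se_b1_exact, se_b1_formula)                # the two agree
```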
now, going further: if the explanatory variables are completely uncorrelated, then the coefficient of determination in that special regression would be 0, and if you plug in 0 here you will get a VIF of 1. a VIF of 1 means there is no change in our precision; the standard error of b1 remains pretty much the same. when will the VIF be 1? the VIF will be 1 when the correlation amongst the explanatory variables is simply absent.
when we run that special regression, where one of the explanatory variables is made the response variable, if that regression has an R^2 of 0, then the VIF will be 1. however, if the explanatory variables are somehow correlated, and the VIF turns out to be a value greater than 1, then we can say that there is collinearity in our model. and as we saw, a larger value of the VIF essentially increases the standard error in estimating the partial slope, and therefore it can make our estimates very unreliable.
now let us look at our data; let us go back to the gpa example. this was the multiple linear regression model, and if you recall from the data, the explanatory variables were correlated: the entrance examination was an explanatory variable, the interview was an explanatory variable, and 0.54 was the coefficient of correlation between the two explanatory variables.
how does that get reflected? it gets reflected by running a regression in which you make one of the explanatory variables the response variable while the other explanatory variable remains an explanatory variable, and you see that the R^2 is almost 0.3 — 0.29 is the R^2. this is the R^2 that is going to get used in calculating the VIF. let me say that again.
what is this special regression? earlier there was a simple linear regression where we had used one of the explanatory variables against the response variable: the response variable was the cgpa, and one of the explanatory variables was kept in the model. in the second simple linear regression, our response variable did not change — it was still the cgpa during the mba program — only the explanatory variable changed. but this special regression is different: here we have made one of the original explanatory variables the response variable, and the other explanatory variable remains as an explanatory variable.
so the R^2 reported was 0.29, and therefore i can calculate the variance inflation factor: this 0.29 is the R^2. similarly, the other special regression is also going to report the same R^2 of 0.29 — with two explanatory variables it does not matter whether the entrance examination or the interview plays the role of the response variable; the R^2 is still going to be 0.29. therefore the VIF is going to be 1 / (1 - 0.29), about 1.41, and the square root of this, about 1.18, is the inflation. so we can say from this value that there is going to be an 18 percent increase in the standard error of the corresponding beta value.
so, going back to the expression: the square root of the VIF turned out to be 1.18, and therefore the standard error in estimating beta1 is going to increase by about 18 percent; similarly, the standard error in estimating beta2 is going to increase by about 18 percent. now, fortunately for us, in the example we have taken, the inflation because of the VIF was not much — it was only an 18 percent increase.
however, sometimes the VIF can be very large. we were a little more fortunate: our r was only 0.54, and in particular the R^2 was only 0.29. now imagine if this R^2 were of the order of, say, 0.7. let us see what would have happened. the variance inflation factor would have been 1 / (1 - 0.7) = 3.33, and the square root of that is about 1.82: there would have been an 82 percent increase in the standard error of b1.
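these numbers are easy to reproduce. a small sketch (the helper name `inflation` is mine, not from the lecture) plugging both the example's R^2 of 0.29 and the hypothetical 0.7 into 1 / (1 - R^2):

```python
import math

def inflation(r2):
    """Return (VIF, sqrt(VIF)) for a given auxiliary-regression R^2."""
    vif = 1 / (1 - r2)            # variance inflation factor
    return vif, math.sqrt(vif)    # sqrt(VIF) multiplies the standard error

for r2 in (0.0, 0.29, 0.7):
    vif, mult = inflation(r2)
    print(f"R^2 = {r2}: VIF = {vif:.2f}, SE multiplied by {mult:.2f}")
```

for R^2 = 0.29 this gives VIF of roughly 1.41 with an SE multiplier near 1.19, and for R^2 = 0.7 a VIF of roughly 3.33 with a multiplier near 1.83, matching the lecture's 18 percent and 82 percent figures up to rounding.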
okay, so what does this do? why do i want to keep the standard error in estimating beta1 small? in general, because it is a standard error — anywhere i see a standard error in regression, i want to keep it to the minimum. now, what will happen if this standard error gets inflated, which is why we are referring to this as the VIF, the variance inflation factor? look at the multiple linear regression model; let me delete this so that it becomes clearer.
okay, now suppose the standard error terms get inflated. for us, fortunately, the inflation in the standard error was quite small — only 18 percent, not 82 percent. but if it had been very high, what would happen to the t statistic? the t statistic would come down. why would the t statistic come down? how is the t statistic calculated?
the t statistic is calculated like this: it is for the null hypothesis that that particular beta value is zero, and it is calculated as the estimated value of that beta divided by the standard error of that beta, t = b / SE(b). now, if this standard error gets inflated because of the VIF, this t value is going to reduce.
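to make the effect concrete, here is a purely illustrative sketch: it takes the slope estimate from the gpa example (0.455) together with a hypothetical baseline standard error of 0.168 and shows what multiplying that SE by sqrt(VIF) does to the t statistic. (in practice the SE reported by excel already contains the inflation; the multipliers here are only for illustration.)

```python
def t_stat(b, se, infl=1.0):
    """t statistic for H0: beta = 0, with the SE scaled by an inflation factor."""
    return b / (se * infl)

b1, se_base = 0.455, 0.168                # slope and hypothetical baseline SE
for mult in (1.00, 1.18, 1.82):           # no inflation, R^2 = 0.29, R^2 = 0.7
    print(f"SE x {mult}: t = {t_stat(b1, se_base, mult):.2f}")
```

the larger the inflation factor, the smaller the t statistic, which is exactly the mechanism described next.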
okay, so this t value is going to come down. and what if this value comes down? it may actually impact my p value. if this t statistic is very small, i may not be able to reject the null hypothesis. i may end up saying: well, i don't know — this beta could be zero; i cannot say with confidence that this beta is not zero; i am not able to reject this null hypothesis.
and what if i am not able to reject this null hypothesis? it means that that particular explanatory variable may be statistically insignificant for the regression. that is really the extreme case of collinearity.
i will discuss another example where i will demonstrate such an extreme case. but coming back to this: i don't want this explanatory variable to be insignificant in my regression; therefore i don't want this t value to be small; therefore i don't want this standard error to be a large value. and if i don't want this standard error to be a large value, i had better make sure that the VIF is under control — and the only way to keep the VIF under control is to ensure that the explanatory variables don't have too much correlation amongst them. is that point understood? okay.