Variance Inflation Factor (VIF) for Detecting Multicolinearity in Python - YouTube
Channel: Bhavesh Bhatt
Hey, hi. Collinearity is a situation in which two features are heavily correlated with each other.
You can plot a heatmap, decide which two features are correlated with each other, and keep one of them.
However, multicollinearity is a more difficult problem to solve, wherein multiple features may be correlated with one particular feature, and thus removing multicollinearity in the case of linear regression is a difficult challenge.
In this video, I talk about the Variance Inflation Factor, or VIF for short, which deals with multicollinearity. So stay tuned.
Let's start by importing the necessary modules.
The whole purpose of this exercise is to remove multicollinearity from a linear regression data set.
I have chosen the Boston house price prediction data set, and this is the description given for the data set.
I save all my feature data into a variable called X, all my target data into a variable called y, and all the column names into a list called names. Then I create a data frame out of just the feature columns.
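The setup described above can be sketched as follows. Note that recent scikit-learn releases no longer ship the Boston housing data set, so this sketch uses the bundled diabetes data set as a stand-in; the variable names X, y, names, and df follow the video, but the data set itself is an assumption.

```python
import pandas as pd
from sklearn.datasets import load_diabetes

# Stand-in for the Boston data set used in the video
data = load_diabetes()
X = data.data                      # all feature data
y = data.target                    # target data (not needed for VIF)
names = list(data.feature_names)   # column names

# Data frame built from the feature columns only
df = pd.DataFrame(X, columns=names)
print(df.head())
```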
For detecting multicollinearity, I don't need the target column, just the feature columns. So this is what my input data frame looks like: it only has columns from my features.
The idea of the variance inflation factor, or VIF, is that in each iteration I take one column as my target variable and the remaining columns as my feature variables. I fit a linear regression model on it, calculate the R-squared value, and take the inverse of one minus R-squared; that gives me the VIF, the variance inflation factor.
So in my first iteration, the CRIM column would be my target variable and the rest of the columns would be my features. I fit a linear regression model and calculate the VIF score. Your VIF is nothing else but the inverse of 1 minus the R-squared score of the model that you fit.
That is what I'm doing here. I run through all the columns: y is the current column, and X is every column other than the current one. So in my first iteration, y will only contain the CRIM column and X will contain all the remaining columns. I fit a simple linear regression model and calculate the R-squared score. Then the inverse of one minus R-squared gives me the VIF value.
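The loop described above can be sketched like this. The toy data frame is an assumption (the video uses the Boston features): column C is nearly a linear combination of A and B, while D is independent noise, so C should get a large VIF and D a VIF near 1.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Toy feature frame (an assumption; the video uses the Boston features)
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
df = pd.DataFrame({
    "A": a,
    "B": b,
    "C": a + b + rng.normal(scale=0.05, size=200),  # almost A + B
    "D": rng.normal(size=200),                      # independent of the rest
})

vif = {}
for col in df.columns:
    y = df[col]                   # current column becomes the target
    X = df.drop(columns=[col])    # remaining columns are the features
    r2 = LinearRegression().fit(X, y).score(X, y)
    vif[col] = 1.0 / (1.0 - r2)   # VIF = 1 / (1 - R^2)

for col, v in vif.items():
    print(f"{col}: VIF = {v:.2f}")
```

As expected, the nearly redundant column C (and the columns it is built from) come out with very large VIFs, while the independent column D stays close to 1.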
When I run this, this is what I get: the R-squared value for the CRIM column, keeping all the other columns as features, is 0.52, and the VIF is 2.1.
The higher the VIF, the greater the chance that the column should be removed from my linear regression model.
For example, the VIF of the RM column is 77.95, close to 78, while its R-squared term was 0.99. So, as you can clearly see, the RM column can be explained as a combination of multiple other columns, and the RM column can therefore be dropped when building a linear regression model.
Similarly, the higher the value of the variance inflation factor, the greater the chance that you can drop that column when you build a linear regression model.
That's all I had in terms of explaining what the variance inflation factor, or VIF, is and how it can be used to remove multicollinearity from your data.
If you have any questions about what we covered in this video, feel free to ask in the comment section below and I'll do my best to answer them.
If you enjoy these tutorials and would like to support them, the easiest way is to simply like the video and give it a thumbs up. It's also a huge help to share these videos with anyone who you think would find them useful.
Be sure to subscribe for future videos
& thank you all for watching.