Statistics 101: Multiple Regression, Backward Elimination

Channel: Brandon Foltz

Hello and namaste! My name is Brandon, and welcome to the next video in my series on basic statistics. If you're new to the channel, welcome; it is great to have you. If you are a returning viewer, it is great to have you back. If you like this video, please give it a thumbs up and share it with classmates, colleagues, friends, or anyone else you think might benefit from watching. Now that we are introduced, let's go ahead and get started.
In this video we're going to pick up where we left off in the last video. In the last video we looked at forward selection, which is one of the four common regression model-building techniques. Forward selection and backward elimination, which is what this video is about, are very similar; they're actually mirror opposites, so if you understand forward selection, you'll get backward elimination. If this is completely new to you, I highly suggest going back and watching that video and then coming back to this one. This video will be a lot shorter because I'm not going to go over all of that again, so let's go ahead and dive in.
There are three common techniques for building iterative regression models: forward selection, backward elimination, and stepwise regression. The fourth common technique is not iterative; it's called best subsets regression. Best subsets examines all possible combinations of feature variables. It's kind of a brute-force method that can be computationally expensive: it's going to try every combination of every number of variables that you have. There can be hundreds and hundreds of different models even for relatively small feature sets, so the analyst can specify the maximum number of features, which can cut down on the output you receive, and the analyst can request the best two or three models for each number of feature variables.
for each number of feature variables now
[110]
it's important to note
[111]
that forward selection in the previous
[113]
video backward elimination in this video
[115]
and of course the next two will cover
[117]
later
[118]
these will not always produce the same
[121]
best model sort of the way each one of
[123]
them works it is quite possible
[125]
that they will produce different models
[128]
and as analysts that's something we have
[130]
to keep in mind when we build these
[131]
models
So, forward versus backward: the process and logic of forward selection and backward elimination are mirror opposites of each other. If you understand forward selection, understanding backward elimination will be very easy. Backward elimination does have the trait of showing, at the start, the individual contribution of each feature variable to the reduction in SSE, the sum of squares due to error. And like I said before, it is important to point out that forward, backward, stepwise, and best subsets may not generate the same "best" model; we'll talk about that more as we look at all four together.
In backward elimination, just like forward selection, feature variables are examined one at a time. The analyst sets a stopping rule; there are several, but we'll use the p-value because it's easy to understand. The stopping rule acts as a hurdle the variable must overcome, in this case to avoid being rejected from the model: every variable is included by default at the start, so the hurdle is about staying in, not getting in. Now, to be removed, the variable we are examining must not reduce error significantly. That's a weird way of thinking about it, because we're used to thinking about it the other way around, which is forward selection. Saying a variable does not reduce error significantly is another way of saying that whether the variable is in the model or not has very little effect on the model's sum of squares due to error. At each step, the feature variable that reduces model error, or SSE, the least is chosen for removal, so long as it fails the stopping rule. Once a variable is out of the model, it stays out; it is never allowed back in. Then the process repeats until all variables are out of the model, or until every remaining variable clears the stopping rule and stays.
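The loop just described (evaluate each variable, drop the one contributing least if it fails the hurdle, never let it back in) can be sketched in Python. This is a minimal illustration, not JMP's implementation: it fits with numpy least squares, uses a fixed partial-F hurdle of 2.8 (roughly the F critical value for p ≈ 0.10 at these degrees of freedom) instead of a p-value so no stats library is needed, and the variable names and data are made up.

```python
import numpy as np

def sse_of(X, y):
    """Residual sum of squares for an ordinary-least-squares fit with intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    r = y - A @ beta
    return float(r @ r)

def backward_eliminate(X, y, names, f_to_remove=2.8):
    """Repeatedly drop the variable whose removal raises SSE the least,
    as long as its partial F is below the hurdle; removals are permanent."""
    keep, removed = list(range(X.shape[1])), []
    while len(keep) > 1:
        full_sse = sse_of(X[:, keep], y)
        mse = full_sse / (len(y) - len(keep) - 1)
        # partial F for each variable: (SSE without it - SSE with it) / MSE
        fs = [((sse_of(X[:, [k for k in keep if k != j]], y) - full_sse) / mse, j)
              for j in keep]
        f_min, j_min = min(fs)
        if f_min < f_to_remove:
            keep.remove(j_min)
            removed.append(names[j_min])
        else:
            break
    return [names[j] for j in keep], removed

# synthetic data: 'beds' (column 2) carries no signal at all
rng = np.random.default_rng(0)
n = 50
X = rng.standard_normal((n, 3))
e = rng.standard_normal(n)
A = np.column_stack([np.ones(n), X])
e -= A @ np.linalg.lstsq(A, e, rcond=None)[0]   # noise orthogonal to the design
y = 3.0 * X[:, 0] + 2.0 * X[:, 1] + e

kept, dropped = backward_eliminate(X, y, ["sqft", "baths", "beds"])
# kept == ["sqft", "baths"]; "beds" is dropped and never returns
```

Making the noise orthogonal to the design matrix is just a trick to keep the toy example deterministic; real data would not be so tidy.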
Just like forward selection, backward elimination does have some flaws. The ability of variables to reduce error can change as other variables exit the model: as feature variables are removed, the remaining ones can overlap and interact in how they explain variance, or, on the flip side, in how much error they do or don't reduce. Since a feature variable is permanently removed from the model once it exits, backward elimination is not flexible. It is possible for a variable that exited early to be able to overcome the stopping rule later, as other variables are removed, but it's out of the model permanently. There's also a temptation factor with your R-squared: as variables are removed, the R-squared will always decrease or stay the same, and you, the model builder, are sitting there watching your R-squared, your explained variance, go down as variables are kicked out. There's always that temptation to want to keep variables in.

So here's a quick, very generic, off-the-cuff example. We have variable one, variable two, variable three, and variable four: V1, V2, V3, and V4. They start in the model by default; that's all of our variables. So let's say V4 is removed, then V3 is removed, and then V2. But now we look at our variables, and it's possible that V4 could, if allowed, get back into the model. But since it has been eliminated, it cannot return.
Squared semi-partial correlations, which we talked about at length in the forward selection video, can tease out these relationships. Backward elimination can also miss suppressor and/or complementary relationships. I don't want to go into this in great detail, but suppressor variables are kind of weird: a first variable correlates with the dependent variable, the target variable, to some degree; I think I used 0.4 in the previous video. That leaves 0.6 of the target variable unexplained. Then a second variable comes in that is not correlated with the target variable, but it is correlated with the 0.6 left over by the first variable. I hope that makes sense: a suppressor variable can actually correlate with part of another variable without correlating with the target variable. And then we have complementary variables, which are negatively correlated with each other but together explain the target variable very well, as sort of a trade-off situation. Again, I talked about that at length in the previous video, so I won't go over it again here.

So, step one: evaluate the full model. We dump all of our variables in right at the start.
right at the start
[401]
and we're using the same house price
[403]
data that i used in the previous video
[405]
it's data i script off the web for the
[407]
area kind of around where i live
[409]
and i changed it up a little bit to
[411]
using these teaching modules
[412]
so we dump all four variables in right
[415]
from the start
[416]
so we evaluate each variable we find
[419]
the small f values which will be aligned
[423]
with
[423]
small sum of squares for that variable
[426]
so we find the small values not the
[428]
large ones
[428]
like we did in forward selection then we
[432]
ask
[432]
is the p-value the probability greater
[435]
than
[436]
in this case 0.05 before
[439]
in forward selection we're looking at
[441]
the probability of less than
[443]
0.05 here's the opposite we're looking
[446]
for
[446]
high probabilities if yes then we remove
[450]
that variable and compare what we have
[452]
left we know our ssc
[454]
our degrees of freedom and our p-value
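That removal question (is the p-value greater than the threshold?) is the whole stopping rule, and it can be sketched in one line of Python. The 0.05 default matches the threshold used in this step; the video later switches to 0.1.

```python
def leaves_model(p_value, prob_to_leave=0.05):
    """Backward elimination asks: is p GREATER than the threshold?
    (Forward selection asks the opposite.) True means the variable exits."""
    return p_value > prob_to_leave

print(leaves_model(0.30))    # True: weak variable, removed
print(leaves_model(0.001))   # False: strong variable, stays
```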
Now, to reiterate: this p-value threshold is completely up to us; it's up to the analyst. We can set it very low or very high, depending on how strict or liberal we want to be in rejecting variables. And there are other criteria besides the p-value that can be used, but to keep it simple for these modules we're just going to keep it at the p-value.

All right, so here is the output from JMP, which is created by SAS. As I said before, I love using it for teaching these regression models, because you can actually click on things and have it do each step one at a time. A couple of things to point out: you'll notice that all feature variables are entered; I went ahead and pressed the Enter All button in the upper right, and you can see down below that Entered is checked next to all four of our variables. Then we set a probability to leave. In this case I actually set it to 0.1; that's a different value than the 0.05 from before, but the principle is the same, it's just a little bit higher. And then for direction we're going backward, so make sure that is set.
Now we look at our SSE and our R-squared. At this stage, with all the variables entered, our SSE is at its minimum; that's as low as the SSE is going to go. As we remove variables, our error will probably increase; it will creep up. At this stage our R-squared is also at its maximum: 0.7358. That is the maximum explained variance we are going to get out of this set of data and these variables, so it can only stay the same or go down from here.
Down below we have the unique ability of each feature variable to reduce SSE. If we look in the SS column down below, that is the sum of squares allocated uniquely to that individual variable; it's not accounting for any shared variance, only the unique contribution of that variable. So based on this list, it appears that square footage, whose sum of squares starts with 971, has by far the highest unique sum of squares. Then we look at the other ones, and what do we see? Number of bedrooms, at the very bottom, has by far the lowest sum of squares allocated to it, and its F ratio is by far the smallest.
So the squared semi-partial correlation is a way of measuring the contribution of each individual variable to the model, and it's very easy to calculate: it's just the F ratio we see in the F Ratio column, divided by the residual degrees of freedom (the degrees of freedom due to error, DFE, which in this output is 95), multiplied by 1 minus the R-squared. And that's it. With that little formula on the right we can find the contribution of each variable, the amount of explained variance it uniquely accounts for, and in the previous video I went into that in great depth.
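As a quick sketch of that formula in Python, using numbers that appear in this output (the exemplary high school variable's F ratio of 11.875, DFE of 95, and the full-model R-squared of 0.7358):

```python
def squared_semipartial(f_ratio, df_error, r_squared):
    """sr^2 = (F / DFE) * (1 - R^2): the share of variance a variable
    uniquely explains, per the formula quoted in the video."""
    return (f_ratio / df_error) * (1.0 - r_squared)

# exemplary high school, from the JMP output shown in this video
sr2 = squared_semipartial(11.875, 95, 0.7358)
print(round(sr2, 4))   # 0.033: about 3.3% of variance uniquely explained
```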
So we look at our variables, find the one with the smallest sum-of-squares contribution and the lowest F ratio, and look at its probability. We can see that the probability for beds is 0.28629; that's well above the threshold we set, 0.1 in this case. So we remove beds: it has the smallest sum of squares, and it fails our stopping rule. Now, what's the next one? We've removed beds (you can see that its Entered checkbox is now empty), and the next smallest is exemplary high school, with a sum of squares of 18,283 and an F ratio of 11.875. That clears the hurdle, so it stays in the model, and now we stop; that's as far as we can go. All the variables still in the model meet our stopping rule, and therefore that's our final model.

Let's quickly take a look at how things changed by removing number of bedrooms from the model.
We can see here that our SSE is 147,815. If you look at the top, the SSE was 146,046 or 146,047, approximately; they're almost the same. So yes, our SSE crept back up, but it only went from about 146,047 to 147,815. It went up by around 1,800, and that's all it did. And if you look down here at the sum of squares for beds, what is it? It's that same amount. We should also point out, and it's important to note, that our root mean square error, which is sort of a measure of how well the model fits the data, went from approximately 39.21 at the top to 39.24. It hardly changed at all. Did it go up a little bit? Absolutely. Did it go up by a lot? Absolutely not. Now look at the R-squared.
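Those RMSE figures can be checked directly from the SSE values quoted above, since RMSE = sqrt(SSE / DFE); DFE is 95 with all four variables in and rises to 96 once beds is out. A minimal check in Python, assuming those on-screen values are exact:

```python
from math import sqrt

def rmse(sse, df_error):
    """Root mean square error of a regression: sqrt(SSE / DFE)."""
    return sqrt(sse / df_error)

print(round(rmse(146047, 95), 2))   # 39.21, full four-variable model
print(round(rmse(147815, 96), 2))   # 39.24, after removing beds
```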
With all the variables, our R-squared was 0.7358. We took out number of bedrooms, and yes, it went down, but it went to 0.7326; it hardly budged at all. That's because number of bedrooms made very little, if any, contribution to the overall model. Even though we took out that variable, we didn't really lose anything. And there is evidence, over on the right in some of the other measures (Mallows' Cp, the AICc, and so on), that this is in fact a better model, but we'll get to those in later videos. So we took out number of bedrooms and didn't really lose anything by doing it.
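The R-squared numbers above are consistent with the SSE numbers, and we can verify that: R-squared = 1 - SSE/SST, and the total sum of squares SST does not change when a variable is removed, so we can back SST out of the full model and predict the reduced model's R-squared. A small check in Python, assuming the on-screen values are exact:

```python
sse_full, r2_full = 146047.0, 0.7358   # all four variables entered
sse_reduced = 147815.0                 # after removing number of bedrooms

sst = sse_full / (1.0 - r2_full)       # total sum of squares, fixed for this data
r2_reduced = 1.0 - sse_reduced / sst

print(round(sse_reduced - sse_full))   # SSE crept up by 1768
print(round(r2_reduced, 4))            # 0.7326, matching the JMP output
```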
All right, so remember how this model did for my own house? Here's our regression equation with our three variables in it. Each square foot is worth $73.20 (we multiply everything by a thousand to get the actual dollar values). Being in an exemplary school district adds $28,930 to the value of the home; that's because exemplary high school is an indicator variable, either one or zero, so if it's a one, it adds $28,930 to the value of the home. And then each bathroom adds $35,470 to the value of the home; that's just a regular number. So if I plug in the numbers for my house, it comes out to a price, or a value, of $174,915. And as I mentioned, I bought my house for $172,300, and that was a few years ago, so for my house this is a very, very good model.
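As a sketch, the final equation can be written as a small Python function. The three coefficients are the dollar values quoted above; the intercept is not stated in the transcript, so it is left as a placeholder argument here rather than guessed.

```python
def predicted_price(sqft, exemplary_hs, baths, intercept=0.0):
    """Plug values into the final three-variable model. Coefficients are the
    per-unit dollar effects quoted in the video; the intercept is NOT given
    in the transcript, so it defaults to a placeholder of zero."""
    return intercept + 73.20 * sqft + 28930 * exemplary_hs + 35470 * baths

# with a zero intercept and zero square feet, one bathroom in an exemplary
# district contributes 28,930 + 35,470 dollars
print(predicted_price(0, 1, 1))   # 64400.0
```

This mostly shows how the indicator variable works: exemplary_hs is either 0 or 1, so its $28,930 effect is all-or-nothing, while square footage and bathrooms scale with their values.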
All right, that wraps up this video on backward elimination. Again, it is very similar to forward selection; it's just sort of the mirror opposite. In forward selection we add one variable at a time, assuming it meets our stopping rule; in backward elimination we remove one variable at a time if it fails to get over the hurdle of our stopping rule. And then we can see how the numbers change as variables are removed. We noticed that a lot of them didn't change very much when that one variable was removed, which is good: we want the simplest possible model that explains the most variance. That's always what we're looking for when we're building multiple regression models like this.

Thank you very much for watching. I appreciate you spending some of your valuable time learning with me, and I look forward to seeing you again in the next video. Take care, bye-bye.