馃攳
Statistics 101: Multiple Regression, Stepwise Regression - YouTube
Channel: Brandon Foltz
Hello and namaste, my name is Brandon, and welcome to the next video in my series on basic statistics. If you are new to the channel, welcome; it is great to have you. If you're a returning viewer, it is great to have you back. If you like this video, please give it a thumbs up and share it with classmates, colleagues, friends, or anyone else you think might benefit from watching. So now that we are introduced, let's go ahead and get to it.

This video is the next in our series on regression model-building techniques. Up to this point we've learned about forward selection and backward elimination. Luckily, stepwise is just the combination of the two, so if you understand forward and backward, you are ninety percent of the way to understanding stepwise. Because of that, we're going to keep this video short, high level, and conceptual, since you probably already have most of the info you need. Let's go ahead and dive right in.
To put things in context, just remember that there are three common techniques for building iterative regression models: forward selection, backward elimination, and then this video's topic, stepwise regression. Stepwise is actually very popular; people gravitate to it for some reason. The fourth common technique is not iterative; it's called best subsets, and that will be the next topic in this series. Best subsets examines all possible combinations of feature variables, a sort of brute-force method that can be computationally expensive, and the output can be very, very long if you don't control it. The analyst can specify the maximum number of features in the final model if they so choose, and can request the best, say, two or three models for each number of feature variables: the top three two-variable models, the top three three-variable models, and so on and so forth. We'll talk about that more in the next video.

Now, these methods will not always produce the same best model, and in best subsets you will actually have competing metrics as to which one is the best model; again, we'll talk about that in the next video. But keep in mind that these may not all arrive at the same model.

Remember that the entire goal of regression model building is to reduce the error sum of squares: we want to take the SSE and reduce it to the smallest possible value without overfitting the model. We want the simplest model that explains the maximum variance in our dependent variable. We decide using a threshold value, and in these videos we've been using the p-value of the partial F statistic, though there are others. The partial F statistic represents the unique contribution of that feature variable to the reduction in SSE.
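To make the partial F statistic concrete, here is a minimal Python sketch. This is my own illustration, not code from the video; the function names and the simulated data are made up. It compares the SSE of the full model against the SSE with one feature dropped:

```python
import numpy as np
from scipy import stats

def sse(X, y):
    """Error sum of squares from an ordinary least-squares fit (with intercept)."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return float(resid @ resid)

def partial_f(X_full, y, drop_col):
    """Partial F for one feature: how much SSE rises when that column is dropped."""
    n, p = X_full.shape                    # p features in the full model
    X_reduced = np.delete(X_full, drop_col, axis=1)
    sse_full, sse_red = sse(X_full, y), sse(X_reduced, y)
    f = (sse_red - sse_full) / (sse_full / (n - p - 1))  # 1 numerator df
    p_value = stats.f.sf(f, 1, n - p - 1)  # upper-tail p-value
    return f, p_value

# Simulated data: y depends on the first feature but not the second.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = 3 * X[:, 0] + rng.normal(size=50)
f1, p1 = partial_f(X, y, drop_col=0)  # strong unique contribution
f2, p2 = partial_f(X, y, drop_col=1)  # weak unique contribution
```

A feature whose partial F p-value beats the entry threshold is one whose unique reduction in SSE is large enough to matter.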
Now, forward selection adds features one by one to an empty model (we start empty) until no feature overcomes our threshold value, and once the features are in, they never leave. Backward elimination is the opposite: there we start with a full model and pull variables out one by one if they don't overcome the threshold value, and once a feature leaves, it never returns. So you can see the difference there.
So far, to sum up this video in one slide, it's this: stepwise regression is just forward selection and backward elimination combined into one process, and we'll see how that works here in a second. There are two threshold values this time, one for entering and one for exiting the model. Usually the threshold to exit is set a bit more liberally; for example, we might have 0.05 to enter and 0.10 to exit. What this does is make the model a bit more stable, so we don't have feature variables just flying in and out of the model everywhere.
The process is very stepped; that's why it's called stepwise. We go forward and evaluate our feature variables, then we do a backward step to see if any can be eliminated, then we evaluate and go forward again: forward, evaluate, backward, evaluate, forward, evaluate, backward, evaluate. We just keep repeating that process in stepwise.
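The forward-evaluate-backward loop described above can be sketched in Python. This is my own minimal illustration, not code from the video: entry and exit are judged by the p-value of each feature's partial F, using the example thresholds of 0.05 to enter and 0.10 to exit.

```python
import numpy as np
from scipy import stats

def sse(cols, X, y):
    """Error sum of squares for an OLS fit on the chosen columns (plus intercept)."""
    A = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    r = y - A @ beta
    return float(r @ r)

def partial_f_p(cols, j, X, y):
    """p-value of the partial F for feature j, given the other columns in cols."""
    full = sorted(set(cols) | {j})
    reduced = [c for c in full if c != j]
    n, k = len(y), len(full)
    sse_f, sse_r = sse(full, X, y), sse(reduced, X, y)
    f = (sse_r - sse_f) / (sse_f / (n - k - 1))
    return stats.f.sf(f, 1, n - k - 1)

def stepwise(X, y, p_enter=0.05, p_exit=0.10, max_steps=100):
    selected = []
    for _ in range(max_steps):  # guard against pathological cycling
        changed = False
        # Forward step: add the candidate with the smallest qualifying p-value.
        candidates = [j for j in range(X.shape[1]) if j not in selected]
        pvals = {j: partial_f_p(selected, j, X, y) for j in candidates}
        if pvals:
            best = min(pvals, key=pvals.get)
            if pvals[best] < p_enter:
                selected.append(best)
                changed = True
        # Backward step: drop any selected feature that no longer earns its place.
        for j in list(selected):
            if partial_f_p(selected, j, X, y) > p_exit:
                selected.remove(j)
                changed = True
        if not changed:
            break
    return sorted(selected)

# Simulated demo: y really depends on features 0 and 2 only.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))
y = 2 * X[:, 0] - 3 * X[:, 2] + rng.normal(size=100)
print(stepwise(X, y))
```

The two true signal features are selected; a noise feature that sneaks in under 0.05 at one step can exit later under 0.10 once the sum of squares is reallocated, which is exactly the back-and-forth behavior described above.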
At each step we evaluate the model: if a feature is no longer contributing to the reduction in error (the SSE), it is removed. Then we move forward again, so features can re-enter at a later step, and that's what distinguishes stepwise from the first two methods. Once a variable is removed, the SSE changes; like I said, these models are living, breathing things, and the sum of squares will be reallocated at each step. So if at a later step that variable can again contribute to the reduction in SSE, it can re-enter. That's the difference: stepwise allows re-entry at a later step.
Like most things, there are some pros and cons to stepwise. One big pro is that it's much more flexible than forward selection and backward elimination, because in those two methods, once a variable is in or out, that's it; it can't come back in, or it can't leave again. Stepwise allows us to reevaluate and put variables that were eliminated at one step back in. Stepwise is also very transparent; there's no black-box thing going on here, so we can see in our output exactly what is happening at each step. Our output will almost always tell us: we entered this variable at this step, at this step that one left, then we entered this one, and maybe the one that left earlier comes back in later. It's very transparent, so you can tell exactly what's going on and how your model is changing.
A con of stepwise is that it may produce combinations of features that are a bit strange and don't make sense. Stepwise is much better suited to practical prediction, because if you're trying to predict a value, the manager of the business or someone else probably doesn't care whether the variables make a whole lot of practical sense; they just want good predictions. But that leaves open the weakness of stepwise: it's not very good at building theoretical models. When we build theoretical models, trying to explain a dependent variable with particular sets of variables, or entering variables in sets or something like that, stepwise isn't good, because it can produce some very strange combinations. I would imagine most people watching this video fall in the former category, just trying to make good practical predictions in business or whatever else, so you probably don't need to worry about this that much for your purposes.

Let's take a rough look at how this might work visually. Now, this is a very rough process diagram, so forgive my graphical skills; the thing I want you to get from this is the process, not my artistic abilities.
Let's say we have five feature variables; I'll reference them as v1, v2, v3, v4, and v5 for the rest of this slide. We start with forward selection: we look at our five variables and ask, hmm, which one reduces the SSE the most, and does it meet our threshold value? Because reducing error the most does not mean it meets the threshold value; it has to do both. Let's say v2 is that variable: it reduces error the most and it meets the threshold value for entry into the model. So v2 is in, and we have v1, v3, v4, and v5 still outside the model. There's no real backward step here, because we'd end up right back where we started; why would we remove a variable we just entered when there are no other variables in the model to change it? That's the important thing.

So we look at our other four variables, and let's say at this stage v4 enters the model: it's the next one that reduces error the most and meets our threshold value for entry. Now we have v2 and v4 in the model. Then we pause, because the entry of v4 into the model could have changed v2. So we have to look and ask ourselves: do they both still contribute to a significant reduction in error? If they do, they stay; if one of them falls below the threshold value to exit, then we remove it. Let's say, for the sake of argument, they're both fine.

Now we look at the next three variables: v1, v3, and v5. After doing that, we see that v5 enters the model, so now v2, v4, and v5 are in and v1 and v3 are out. Now we stop, look at v2, v4, and v5, and ask ourselves: do any of those fall below the threshold to exit the model? Let's say at this stage, because we now have three variables and the sum of squares has shifted around between them, v4 exits; it no longer meets our criteria to remain in the model. So we'd remove v4 and look at v1 and v3 to see if they could enter, and it might look like this: v4 exits, v1 enters. Now we have v2, v1, and v5 in our model; v3 is still outside, and v4 was just removed.

Now remember, v4 could return at a later step; in stepwise there's nothing prohibiting that. In our next step, let's say we still have v2, v1, and v5: v3 never makes it into the model, and v4 never overcomes the threshold to enter again, so it remains outside. It could have entered, depending again on how the sum of squares is allocated; just because it exited at one point doesn't mean it can't enter at a later point. So you can see this sort of back-and-forth stepwise process: how variables enter the model, how we decide whether they leave the model, and how they might enter the model again at a later step. That's what we call stepwise regression.
So here's some output; by the way, this is JMP ("jump") by SAS. You can see in the settings that we have a probability threshold to enter and a probability threshold to leave; that's what we discussed before. We can see that we have a "mixed" direction, which means forward and backward. Our SSE and the other measures shown are all ones we have talked about before; we can see our F ratios and our probabilities, and so on and so forth. We can also see our step history down here, which is what I mean by transparency: the software tells us exactly what happened. The action column says entered, entered, entered, and if a variable leaves it will say it exited, and it gives us all of the probabilities as well. It's very transparent.
Then there are some things from previous videos I've done: if you want to know the squared semi-partial correlation, which is the unique ability of each feature to reduce the SSE, you can look at it that way; that's another way of looking at it.
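The squared semi-partial correlation mentioned above can be sketched in a few lines. This is my own illustration with made-up data, not output from the video: it is the drop in R-squared when one feature is removed from the full model, i.e. that feature's unique share of the explained variance.

```python
import numpy as np

def r_squared(X, y):
    """R-squared of an OLS fit with intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    sse = resid @ resid
    sst = ((y - y.mean()) ** 2).sum()
    return float(1 - sse / sst)

def squared_semipartial(X, y, j):
    """Unique contribution of feature j: full-model R^2 minus R^2 without j."""
    return r_squared(X, y) - r_squared(np.delete(X, j, axis=1), y)

# Simulated data: feature 0 matters most, feature 1 a little, feature 2 not at all.
rng = np.random.default_rng(2)
X = rng.normal(size=(80, 3))
y = X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=80)
unique = [squared_semipartial(X, y, j) for j in range(3)]
```

Because the reduced model is nested inside the full one, each value is nonnegative, and a near-zero value flags a feature with no unique contribution.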
All right, that wraps up this video on stepwise regression. Again, it's just forward selection and backward elimination combined into one process, so it's pretty easy to understand. In the next video we'll talk about best subsets, which can get a bit more complicated. I hope you found this video helpful, I hope you learned some things, and I look forward to seeing you again in the next video. Take care. Bye bye.