馃攳
Statistics 101: Multiple Regression, Stepwise Regression - YouTube
Channel: Brandon Foltz
Hello and namaste, my name is Brandon, and welcome to the next video in my series on basic statistics. If you are new to the channel, welcome; it is great to have you. If you're a returning viewer, it is great to have you back. If you like this video, please give it a thumbs up and share it with classmates, colleagues, friends, or anyone else you think might benefit from watching. So now that we are introduced, let's go ahead and get to it.

This video is the next in our series on regression model-building techniques. Up to this point we've learned about forward selection and backward elimination. Luckily, stepwise is just the combination of the two, so if you understand forward and backward, you are ninety percent of the way to understanding stepwise. Because of that, we're going to keep this video short, high level, and conceptual, since you probably already have most of the info you need. Let's go ahead and dive right in.
To put things in context, just remember that there are three common techniques for building iterative regression models: forward selection, backward elimination, and then this video's topic, stepwise regression. Stepwise is actually very popular; people gravitate to it for some reason. The fourth common technique is not iterative; it's called best subsets, and that will be the next topic in this series. Best subsets examines all possible combinations of feature variables, a sort of brute-force method that can be computationally expensive, and the output can be very, very long if you don't control it. The analyst can specify the maximum number of features in the final model if they so choose, and can request the best, say, two or three models for each number of feature variables: the top three two-variable models, the top three three-variable models, and so on and so forth. We'll talk about that more in the next video.

Now, these methods will not always produce the same best model, and in best subsets you will actually have competing metrics as to which one is the best model; again, we'll talk about that in the next video. But keep in mind that these may not all arrive at the same model.

Remember that the entire goal of regression model building is to reduce the error sum of squares: we want to take the SSE and reduce it to the smallest possible value without overfitting the model. We want the simplest model that explains the maximum variance in our dependent variable. We decide using a threshold value, and in these videos we've been using the p-value of the partial F statistic, though there are others. The partial F statistic represents the unique contribution of that feature variable to the reduction in SSE.
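To make the partial F statistic concrete, here is a minimal Python sketch. This is my own illustration, not code from the video; the function names and the simulated data are made up. It compares the SSE of the full model against the SSE with one feature dropped:

```python
import numpy as np
from scipy import stats

def sse(X, y):
    """Error sum of squares from an ordinary least-squares fit (with intercept)."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return float(resid @ resid)

def partial_f(X_full, y, drop_col):
    """Partial F for one feature: how much SSE rises when that column is dropped."""
    n, p = X_full.shape                    # p features in the full model
    X_reduced = np.delete(X_full, drop_col, axis=1)
    sse_full, sse_red = sse(X_full, y), sse(X_reduced, y)
    f = (sse_red - sse_full) / (sse_full / (n - p - 1))  # 1 numerator df
    p_value = stats.f.sf(f, 1, n - p - 1)  # upper-tail p-value
    return f, p_value

# Simulated data: y depends on the first feature but not the second.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = 3 * X[:, 0] + rng.normal(size=50)
f1, p1 = partial_f(X, y, drop_col=0)  # strong unique contribution
f2, p2 = partial_f(X, y, drop_col=1)  # weak unique contribution
```

A feature whose partial F p-value beats the entry threshold is one whose unique reduction in SSE is large enough to matter.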
Now, forward selection adds features one by one to an empty model (we start empty) until no feature overcomes our threshold value, and once the features are in, they never leave. Backward elimination is the opposite: there we start with a full model and pull variables out one by one if they don't overcome the threshold value, and once a feature leaves, it never returns. So you can see the difference there.
So far, to sum up this video in one slide, it's this: stepwise regression is just forward selection and backward elimination combined into one process, and we'll see how that works here in a second. There are two threshold values this time, one for entering and one for exiting the model. Usually the threshold to exit is set a bit more liberally; for example, we might have 0.05 to enter and 0.10 to exit. What this does is make the model a bit more stable, so we don't have feature variables just flying in and out of the model everywhere.
The process is very stepped; that's why it's called stepwise. We go forward and evaluate our feature variables, then we do a backward step to see if any can be eliminated, then we evaluate and go forward again: forward, evaluate, backward, evaluate, forward, evaluate, backward, evaluate. We just keep repeating that process in stepwise.
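The forward-evaluate-backward loop described above can be sketched in Python. This is my own minimal illustration, not code from the video: entry and exit are judged by the p-value of each feature's partial F, using the example thresholds of 0.05 to enter and 0.10 to exit.

```python
import numpy as np
from scipy import stats

def sse(cols, X, y):
    """Error sum of squares for an OLS fit on the chosen columns (plus intercept)."""
    A = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    r = y - A @ beta
    return float(r @ r)

def partial_f_p(cols, j, X, y):
    """p-value of the partial F for feature j, given the other columns in cols."""
    full = sorted(set(cols) | {j})
    reduced = [c for c in full if c != j]
    n, k = len(y), len(full)
    sse_f, sse_r = sse(full, X, y), sse(reduced, X, y)
    f = (sse_r - sse_f) / (sse_f / (n - k - 1))
    return stats.f.sf(f, 1, n - k - 1)

def stepwise(X, y, p_enter=0.05, p_exit=0.10, max_steps=100):
    selected = []
    for _ in range(max_steps):  # guard against pathological cycling
        changed = False
        # Forward step: add the candidate with the smallest qualifying p-value.
        candidates = [j for j in range(X.shape[1]) if j not in selected]
        pvals = {j: partial_f_p(selected, j, X, y) for j in candidates}
        if pvals:
            best = min(pvals, key=pvals.get)
            if pvals[best] < p_enter:
                selected.append(best)
                changed = True
        # Backward step: drop any selected feature that no longer earns its place.
        for j in list(selected):
            if partial_f_p(selected, j, X, y) > p_exit:
                selected.remove(j)
                changed = True
        if not changed:
            break
    return sorted(selected)

# Simulated demo: y really depends on features 0 and 2 only.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))
y = 2 * X[:, 0] - 3 * X[:, 2] + rng.normal(size=100)
print(stepwise(X, y))
```

The two true signal features are selected; a noise feature that sneaks in under 0.05 at one step can exit later under 0.10 once the sum of squares is reallocated, which is exactly the back-and-forth behavior described above.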
At each step we evaluate the model: if a feature is no longer contributing to the reduction in error (the SSE), it is removed. Then we move forward again, so features can re-enter at a later step, and that's what distinguishes stepwise from the first two methods. Once a variable is removed, the SSE changes; like I said, these models are living, breathing things, and the sum of squares will be reallocated at each step. So if at a later step that variable can again contribute to the reduction in SSE, it can re-enter. That's the difference: stepwise allows re-entry at a later step.
Like most things, there are some pros and cons to stepwise. One big pro is that it's much more flexible than forward selection and backward elimination, because in those two methods, once a variable is in or out, that's it; it can't come back in, or it can't leave again. Stepwise allows us to reevaluate and put variables that were eliminated at one step back in. Stepwise is also very transparent; there's no black-box thing going on here, so we can see in our output exactly what is happening at each step. Our output will almost always tell us: we entered this variable at this step, at this step that one left, then we entered this one, and maybe the one that left earlier comes back in later. It's very transparent, so you can tell exactly what's going on and how your model is changing.
A con of stepwise is that it may produce combinations of features that are a bit strange and don't make sense. Stepwise is much better suited to practical prediction, because if you're trying to predict a value, the manager of the business or someone else probably doesn't care whether the variables make a whole lot of practical sense; they just want good predictions. But that leaves open the weakness of stepwise: it's not very good at building theoretical models. When we build theoretical models, trying to explain a dependent variable with particular sets of variables, or entering variables in sets or something like that, stepwise isn't good, because it can produce some very strange combinations. I would imagine most people watching this video fall in the former category, just trying to make good practical predictions in business or whatever else, so you probably don't need to worry about this that much for your purposes.

Let's take a rough look at how this might work visually. Now, this is a very rough process diagram, so forgive my graphical skills; the thing I want you to get from this is the process, not my artistic abilities.
Let's say we have five feature variables; I'll reference them as v1, v2, v3, v4, and v5 for the rest of this slide. We start with forward selection: we look at our five variables and ask, hmm, which one reduces the SSE the most, and does it meet our threshold value? Because reducing error the most does not mean it meets the threshold value; it has to do both. Let's say v2 is that variable: it reduces error the most and it meets the threshold value for entry into the model. So v2 is in, and we have v1, v3, v4, and v5 still outside the model. There's no real backward step here, because we'd end up right back where we started; why would we remove a variable we just entered when there are no other variables in the model to change it? That's the important thing.

So we look at our other four variables, and let's say at this stage v4 enters the model: it's the next one that reduces error the most and meets our threshold value for entry. Now we have v2 and v4 in the model. Then we pause, because the entry of v4 into the model could have changed v2. So we have to look and ask ourselves: do they both still contribute to a significant reduction in error? If they do, they stay; if one of them falls below the threshold value to exit, then we remove it. Let's say, for the sake of argument, they're both fine.

Now we look at the next three variables: v1, v3, and v5. After doing that, we see that v5 enters the model, so now v2, v4, and v5 are in and v1 and v3 are out. Now we stop, look at v2, v4, and v5, and ask ourselves: do any of those fall below the threshold to exit the model? Let's say at this stage, because we now have three variables and the sum of squares has shifted around between them, v4 exits; it no longer meets our criteria to remain in the model. So we'd remove v4 and look at v1 and v3 to see if they could enter, and it might look like this: v4 exits, v1 enters. Now we have v2, v1, and v5 in our model; v3 is still outside, and v4 was just removed.

Now remember, v4 could return at a later step; in stepwise there's nothing prohibiting that. In our next step, let's say we still have v2, v1, and v5: v3 never makes it into the model, and v4 never overcomes the threshold to enter again, so it remains outside. It could have entered, depending again on how the sum of squares is allocated; just because it exited at one point doesn't mean it can't enter at a later point. So you can see this sort of back-and-forth stepwise process: how variables enter the model, how we decide whether they leave the model, and how they might enter the model again at a later step. That's what we call stepwise regression.
So here's some output; by the way, this is JMP ("jump") by SAS. You can see in the settings that we have a probability threshold to enter and a probability threshold to leave; that's what we discussed before. We can see that we have a "mixed" direction, which means forward and backward. Our SSE and the other measures shown are all ones we have talked about before; we can see our F ratios and our probabilities, and so on and so forth. We can also see our step history down here, which is what I mean by transparency: the software tells us exactly what happened. The action column says entered, entered, entered, and if a variable leaves it will say it exited, and it gives us all of the probabilities as well. It's very transparent.
Then there are some things from previous videos I've done: if you want to know the squared semi-partial correlation, which is the unique ability of each feature to reduce the SSE, you can look at it that way; that's another way of looking at it.
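The squared semi-partial correlation mentioned above can be sketched in a few lines. This is my own illustration with made-up data, not output from the video: it is the drop in R-squared when one feature is removed from the full model, i.e. that feature's unique share of the explained variance.

```python
import numpy as np

def r_squared(X, y):
    """R-squared of an OLS fit with intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    sse = resid @ resid
    sst = ((y - y.mean()) ** 2).sum()
    return float(1 - sse / sst)

def squared_semipartial(X, y, j):
    """Unique contribution of feature j: full-model R^2 minus R^2 without j."""
    return r_squared(X, y) - r_squared(np.delete(X, j, axis=1), y)

# Simulated data: feature 0 matters most, feature 1 a little, feature 2 not at all.
rng = np.random.default_rng(2)
X = rng.normal(size=(80, 3))
y = X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=80)
unique = [squared_semipartial(X, y, j) for j in range(3)]
```

Because the reduced model is nested inside the full one, each value is nonnegative, and a near-zero value flags a feature with no unique contribution.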
All right, that wraps up this video on stepwise regression. Again, it's just forward selection and backward elimination combined into one process, so it's pretty easy to understand. In the next video we'll talk about best subsets, which can get a bit more complicated. I hope you found this video helpful, I hope you learned some things, and I look forward to seeing you again in the next video. Take care. Bye bye.