Statistics 101: Multiple Regression, Backward Elimination

Channel: Brandon Foltz

Hello and namaste! My name is Brandon, and welcome to the next video in my series on basic statistics. If you're new to the channel, welcome; it is great to have you. If you are a returning viewer, it is great to have you back. If you like this video, please give it a thumbs up and share it with classmates, colleagues, friends, or anyone else you think might benefit from watching. Now that we are introduced, let's go ahead and get started.
In this video we're going to pick up where we left off in the last video. In the last video we looked at forward selection, which is one of the four common regression model-building techniques. Forward selection and backward elimination, which is what this video is about, are very similar; they're actually mirror opposites, so if you understand forward selection, you'll get backward elimination. If this is completely new to you, I highly suggest going back and watching that video and then coming back to this one. This video will be a lot shorter because I'm not going to go over all of that again, so let's go ahead and dive in.
There are three common techniques for building iterative regression models: forward selection, backward elimination, and stepwise regression. The fourth common technique is not iterative; it's called best subsets regression. Best subsets examines all possible combinations of feature variables. It's kind of a brute-force method that can be computationally expensive: it's going to try every combination of every number of variables that you have. There can be hundreds and hundreds of different models even for relatively small feature sets, so the analyst can specify the maximum number of features, which can cut down on the output you receive, and the analyst can request the best two or three models for each number of feature variables.
for each number of feature variables now
[110]
it's important to note
[111]
that forward selection in the previous
[113]
video backward elimination in this video
[115]
and of course the next two will cover
[117]
later
[118]
these will not always produce the same
[121]
best model sort of the way each one of
[123]
them works it is quite possible
[125]
that they will produce different models
[128]
and as analysts that's something we have
[130]
to keep in mind when we build these
[131]
models
So, forward versus backward: the process and logic of forward selection and backward elimination are mirror opposites of each other. If you understand forward selection, understanding backward elimination will be very easy. Backward elimination does have the trait of showing, at the start, the individual contribution of each feature variable to the reduction in SSE, the sum of squares due to error. And like I said before, it is important to point out that forward, backward, stepwise, and best subsets may not generate the same "best" model; we'll talk about that more as we look at all four together.
In backward elimination, just like forward selection, feature variables are examined one at a time. The analyst sets a stopping rule; there are several, but we'll use the p-value because it's easy to understand. The stopping rule acts as a hurdle the variable must overcome, in this case to avoid being rejected from the model: every variable is included by default at the start, so the hurdle is about staying in, not getting in. Now, to be removed, the variable we are examining must not reduce error significantly. That's a weird way of thinking about it, because we're used to thinking about it the other way around, which is forward selection. Saying a variable does not reduce error significantly is another way of saying that whether the variable is in the model or not has very little effect on the model's sum of squares due to error. At each step, the feature variable that reduces model error, or SSE, the least is chosen for removal, so long as it fails the stopping rule. Once a variable is out of the model, it stays out; it is never allowed back in. Then the process repeats until all variables are out of the model, or until every remaining variable clears the stopping rule and stays.
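The loop just described (evaluate each variable, drop the one contributing least if it fails the hurdle, never let it back in) can be sketched in Python. This is a minimal illustration, not JMP's implementation: it fits with numpy least squares, uses a fixed partial-F hurdle of 2.8 (roughly the F critical value for p ≈ 0.10 at these degrees of freedom) instead of a p-value so no stats library is needed, and the variable names and data are made up.

```python
import numpy as np

def sse_of(X, y):
    """Residual sum of squares for an ordinary-least-squares fit with intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    r = y - A @ beta
    return float(r @ r)

def backward_eliminate(X, y, names, f_to_remove=2.8):
    """Repeatedly drop the variable whose removal raises SSE the least,
    as long as its partial F is below the hurdle; removals are permanent."""
    keep, removed = list(range(X.shape[1])), []
    while len(keep) > 1:
        full_sse = sse_of(X[:, keep], y)
        mse = full_sse / (len(y) - len(keep) - 1)
        # partial F for each variable: (SSE without it - SSE with it) / MSE
        fs = [((sse_of(X[:, [k for k in keep if k != j]], y) - full_sse) / mse, j)
              for j in keep]
        f_min, j_min = min(fs)
        if f_min < f_to_remove:
            keep.remove(j_min)
            removed.append(names[j_min])
        else:
            break
    return [names[j] for j in keep], removed

# synthetic data: 'beds' (column 2) carries no signal at all
rng = np.random.default_rng(0)
n = 50
X = rng.standard_normal((n, 3))
e = rng.standard_normal(n)
A = np.column_stack([np.ones(n), X])
e -= A @ np.linalg.lstsq(A, e, rcond=None)[0]   # noise orthogonal to the design
y = 3.0 * X[:, 0] + 2.0 * X[:, 1] + e

kept, dropped = backward_eliminate(X, y, ["sqft", "baths", "beds"])
# kept == ["sqft", "baths"]; "beds" is dropped and never returns
```

Making the noise orthogonal to the design matrix is just a trick to keep the toy example deterministic; real data would not be so tidy.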
Just like forward selection, backward elimination does have some flaws. The ability of variables to reduce error can change as other variables exit the model: as feature variables are removed, the remaining ones can overlap and interact in how they explain variance, or, on the flip side, in how much error they do or don't reduce. Since a feature variable is permanently removed from the model once it exits, backward elimination is not flexible. It is possible for a variable that exited early to be able to overcome the stopping rule later, as other variables are removed, but it's out of the model permanently. There's also a temptation factor with your R-squared: as variables are removed, the R-squared will always decrease or stay the same, and you, the model builder, are sitting there watching your R-squared, your explained variance, go down as variables are kicked out. There's always that temptation to want to keep variables in.

So here's a quick, very generic, off-the-cuff example. We have variable one, variable two, variable three, and variable four: V1, V2, V3, and V4. They start in the model by default; that's all of our variables. So let's say V4 is removed, then V3 is removed, and then V2. But now we look at our variables, and it's possible that V4 could, if allowed, get back into the model. But since it has been eliminated, it cannot return.
Squared semi-partial correlations, which we talked about at length in the forward selection video, can tease out these relationships. Backward elimination can also miss suppressor and/or complementary relationships. I don't want to go into this in great detail, but suppressor variables are kind of weird: a first variable correlates with the dependent variable, the target variable, to some degree; I think I used 0.4 in the previous video. That leaves 0.6 of the target variable unexplained. Then a second variable comes in that is not correlated with the target variable, but it is correlated with the 0.6 left over by the first variable. I hope that makes sense: a suppressor variable can actually correlate with part of another variable without correlating with the target variable. And then we have complementary variables, which are negatively correlated with each other but together explain the target variable very well, as sort of a trade-off situation. Again, I talked about that at length in the previous video, so I won't go over it again here.

So, step one: evaluate the full model. We dump all of our variables in right at the start.
right at the start
[401]
and we're using the same house price
[403]
data that i used in the previous video
[405]
it's data i script off the web for the
[407]
area kind of around where i live
[409]
and i changed it up a little bit to
[411]
using these teaching modules
[412]
so we dump all four variables in right
[415]
from the start
[416]
so we evaluate each variable we find
[419]
the small f values which will be aligned
[423]
with
[423]
small sum of squares for that variable
[426]
so we find the small values not the
[428]
large ones
[428]
like we did in forward selection then we
[432]
ask
[432]
is the p-value the probability greater
[435]
than
[436]
in this case 0.05 before
[439]
in forward selection we're looking at
[441]
the probability of less than
[443]
0.05 here's the opposite we're looking
[446]
for
[446]
high probabilities if yes then we remove
[450]
that variable and compare what we have
[452]
left we know our ssc
[454]
our degrees of freedom and our p-value
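That removal question (is the p-value greater than the threshold?) is the whole stopping rule, and it can be sketched in one line of Python. The 0.05 default matches the threshold used in this step; the video later switches to 0.1.

```python
def leaves_model(p_value, prob_to_leave=0.05):
    """Backward elimination asks: is p GREATER than the threshold?
    (Forward selection asks the opposite.) True means the variable exits."""
    return p_value > prob_to_leave

print(leaves_model(0.30))    # True: weak variable, removed
print(leaves_model(0.001))   # False: strong variable, stays
```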
Now, to reiterate: this p-value threshold is completely up to us; it's up to the analyst. We can set it very low or very high, depending on how strict or liberal we want to be in rejecting variables. And there are other criteria besides the p-value that can be used, but to keep it simple for these modules we're just going to keep it at the p-value.

All right, so here is the output from JMP, which is created by SAS. As I said before, I love using it for teaching these regression models, because you can actually click on things and have it do each step one at a time. A couple of things to point out: you'll notice that all feature variables are entered; I went ahead and pressed the Enter All button in the upper right, and you can see down below that Entered is checked next to all four of our variables. Then we set a probability to leave. In this case I actually set it to 0.1; that's a different value than the 0.05 from before, but the principle is the same, it's just a little bit higher. And then for direction we're going backward, so make sure that is set.
Now we look at our SSE and our R-squared. At this stage, with all the variables entered, our SSE is at its minimum; that's as low as the SSE is going to go. As we remove variables, our error will probably increase; it will creep up. At this stage our R-squared is also at its maximum: 0.7358. That is the maximum explained variance we are going to get out of this set of data and these variables, so it can only stay the same or go down from here.
Down below we have the unique ability of each feature variable to reduce SSE. If we look in the SS column down below, that is the sum of squares allocated uniquely to that individual variable; it's not accounting for any shared variance, only the unique contribution of that variable. So based on this list, it appears that square footage, whose sum of squares starts with 971, has by far the highest unique sum of squares. Then we look at the other ones, and what do we see? Number of bedrooms, at the very bottom, has by far the lowest sum of squares allocated to it, and its F ratio is by far the smallest.
So the squared semi-partial correlation is a way of measuring the contribution of each individual variable to the model, and it's very easy to calculate: it's just the F ratio we see in the F Ratio column, divided by the residual degrees of freedom (the degrees of freedom due to error, DFE, which in this output is 95), multiplied by 1 minus the R-squared. And that's it. With that little formula on the right we can find the contribution of each variable, the amount of explained variance it uniquely accounts for, and in the previous video I went into that in great depth.
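As a quick sketch of that formula in Python, using numbers that appear in this output (the exemplary high school variable's F ratio of 11.875, DFE of 95, and the full-model R-squared of 0.7358):

```python
def squared_semipartial(f_ratio, df_error, r_squared):
    """sr^2 = (F / DFE) * (1 - R^2): the share of variance a variable
    uniquely explains, per the formula quoted in the video."""
    return (f_ratio / df_error) * (1.0 - r_squared)

# exemplary high school, from the JMP output shown in this video
sr2 = squared_semipartial(11.875, 95, 0.7358)
print(round(sr2, 4))   # 0.033: about 3.3% of variance uniquely explained
```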
So we look at our variables, find the one with the smallest sum-of-squares contribution and the lowest F ratio, and look at its probability. We can see that the probability for beds is 0.28629; that's well above the threshold we set, 0.1 in this case. So we remove beds: it has the smallest sum of squares, and it fails our stopping rule. Now, what's the next one? We've removed beds (you can see that its Entered checkbox is now empty), and the next smallest is exemplary high school, with a sum of squares of 18,283 and an F ratio of 11.875. That clears the hurdle, so it stays in the model, and now we stop; that's as far as we can go. All the variables still in the model meet our stopping rule, and therefore that's our final model.

Let's quickly take a look at how things changed by removing number of bedrooms from the model.
We can see here that our SSE is 147,815. If you look at the top, the SSE was 146,046 or 146,047, approximately; they're almost the same. So yes, our SSE crept back up, but it only went from about 146,047 to 147,815. It went up by around 1,800, and that's all it did. And if you look down here at the sum of squares for beds, what is it? It's that same amount. We should also point out, and it's important to note, that our root mean square error, which is sort of a measure of how well the model fits the data, went from approximately 39.21 at the top to 39.24. It hardly changed at all. Did it go up a little bit? Absolutely. Did it go up by a lot? Absolutely not. Now look at the R-squared.
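Those RMSE figures can be checked directly from the SSE values quoted above, since RMSE = sqrt(SSE / DFE); DFE is 95 with all four variables in and rises to 96 once beds is out. A minimal check in Python, assuming those on-screen values are exact:

```python
from math import sqrt

def rmse(sse, df_error):
    """Root mean square error of a regression: sqrt(SSE / DFE)."""
    return sqrt(sse / df_error)

print(round(rmse(146047, 95), 2))   # 39.21, full four-variable model
print(round(rmse(147815, 96), 2))   # 39.24, after removing beds
```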
With all the variables, our R-squared was 0.7358. We took out number of bedrooms, and yes, it went down, but it went to 0.7326; it hardly budged at all. That's because number of bedrooms made very little, if any, contribution to the overall model. Even though we took out that variable, we didn't really lose anything. And there is evidence, over on the right in some of the other measures (Mallows' Cp, the AICc, and so on), that this is in fact a better model, but we'll get to those in later videos. So we took out number of bedrooms and didn't really lose anything by doing it.
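The R-squared numbers above are consistent with the SSE numbers, and we can verify that: R-squared = 1 - SSE/SST, and the total sum of squares SST does not change when a variable is removed, so we can back SST out of the full model and predict the reduced model's R-squared. A small check in Python, assuming the on-screen values are exact:

```python
sse_full, r2_full = 146047.0, 0.7358   # all four variables entered
sse_reduced = 147815.0                 # after removing number of bedrooms

sst = sse_full / (1.0 - r2_full)       # total sum of squares, fixed for this data
r2_reduced = 1.0 - sse_reduced / sst

print(round(sse_reduced - sse_full))   # SSE crept up by 1768
print(round(r2_reduced, 4))            # 0.7326, matching the JMP output
```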
All right, so remember how this model did for my own house? Here's our regression equation with our three variables in it. Each square foot is worth $73.20 (we multiply everything by a thousand to get the actual dollar values). Being in an exemplary school district adds $28,930 to the value of the home; that's because exemplary high school is an indicator variable, either one or zero, so if it's a one, it adds $28,930 to the value of the home. And then each bathroom adds $35,470 to the value of the home; that's just a regular number. So if I plug in the numbers for my house, it comes out to a price, or a value, of $174,915. And as I mentioned, I bought my house for $172,300, and that was a few years ago, so for my house this is a very, very good model.
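As a sketch, the final equation can be written as a small Python function. The three coefficients are the dollar values quoted above; the intercept is not stated in the transcript, so it is left as a placeholder argument here rather than guessed.

```python
def predicted_price(sqft, exemplary_hs, baths, intercept=0.0):
    """Plug values into the final three-variable model. Coefficients are the
    per-unit dollar effects quoted in the video; the intercept is NOT given
    in the transcript, so it defaults to a placeholder of zero."""
    return intercept + 73.20 * sqft + 28930 * exemplary_hs + 35470 * baths

# with a zero intercept and zero square feet, one bathroom in an exemplary
# district contributes 28,930 + 35,470 dollars
print(predicted_price(0, 1, 1))   # 64400.0
```

This mostly shows how the indicator variable works: exemplary_hs is either 0 or 1, so its $28,930 effect is all-or-nothing, while square footage and bathrooms scale with their values.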
All right, that wraps up this video on backward elimination. Again, it is very similar to forward selection; it's just sort of the mirror opposite. In forward selection we add one variable at a time, assuming it meets our stopping rule; in backward elimination we remove one variable at a time if it fails to get over the hurdle of our stopping rule. And then we can see how the numbers change as variables are removed. We noticed that a lot of them didn't change very much when that one variable was removed, which is good: we want the simplest possible model that explains the most variance. That's always what we're looking for when we're building multiple regression models like this.

Thank you very much for watching. I appreciate you spending some of your valuable time learning with me, and I look forward to seeing you again in the next video. Take care, bye-bye.