Statistics 101: Logistic Regression, An Introduction - YouTube
Channel: Brandon Foltz
Hello and welcome, Brandon here. Thanks for choosing my video. If you liked the video, please give it a thumbs up. If you think someone you know can also benefit by watching, please share. And as always, please subscribe; I appreciate it very much.
So let's go ahead and get started. Here we are in logistic regression: a very useful, if in my opinion underutilized, statistical procedure that is not all that intuitive, which is maybe why it's underutilized. Now, as with many of my other videos, we're going to start out with an actual problem. This problem is one that I made up, so I made up the text and the data for it; just keep that in mind going forward. However, I do think it has the side benefit of being potentially useful in your everyday life. So let's go ahead and take a look at it.

We'll call it "first-time homebuyer." As a first-time homebuyer, you are busy organizing your financial records so you can apply for a home mortgage. As part of this process, you order a copy of your credit report to check for errors and gauge your credit score, which can range, at least here in the US, from 300 to 850. Now, lenders will factor in your credit score when deciding to approve or not approve you for a mortgage. They will factor in other things too, like your income and how long you've been at your job, but your credit score is definitely an important part of their decision. It turns out your credit score is 720 on that scale of 300 to 850.

While doing your research, which you are dutifully doing as a potential homebuyer, you find some raw data online. There's data floating around the web everywhere, and you are lucky enough to come across a data set that has 1,000 applicant credit scores and whether or not the application was approved, yes or no, for the home mortgage. Using the data you found, you would like to do the following (and we will do all of these as we progress throughout the video series):

1. Develop a model that will provide the probability and the odds of being approved for any given credit score.

2. Discover approximately what credit score is associated with a probability of 50% of being approved, so the odds are even. In other words, if I walk into the bank with a certain credit score, it's basically like flipping a coin: my probability of being approved is 50%, which is the same as saying the odds are even. We want to know what credit score that is on our scale.

3. Input your score of 720 into the model to determine the probability and the odds of you being approved for a mortgage, which of course is very important to you.

4. Finally, determine how improving your credit score from 720 to 750 would affect your probability and odds of being approved for the mortgage. So let's say you find out your score is 720, and you're going to wait a little bit and see if you can get your credit score a bit higher, up to 750, by paying down some debt. Or maybe you know you're going to get a promotion and a higher salary sometime soon, and you think your score may improve. You want to know how that improvement in your score would affect your probability and odds of being approved for the mortgage.
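Probability and odds both appear in these goals, and they are easy to mix up. As a quick refresher (this is my own illustrative sketch, not something shown in the video), odds are the probability of an event divided by the probability of its complement:

```python
def odds_from_prob(p):
    """Convert a probability (0 <= p < 1) to odds = p / (1 - p)."""
    return p / (1.0 - p)

def prob_from_odds(odds):
    """Convert odds back to a probability: p = odds / (1 + odds)."""
    return odds / (1.0 + odds)

# A 50% probability of approval means even odds (1 to 1):
print(odds_from_prob(0.50))   # 1.0
# A 75% probability means odds of 3 to 1 in favor:
print(odds_from_prob(0.75))   # 3.0
# Converting back recovers the probability:
print(prob_from_odds(3.0))    # 0.75
```

This even-odds case is exactly what goal number two asks about: the credit score where the probability of approval is 50%.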
Here is just a little chunk of that 1,000-observation data set. There are only 15 rows here, but I wanted you to see how it's organized: we have the credit score on the left and approved on the right. Again, the n is 1,000, and the credit score is the applicant's credit score, from 300 all the way up to 850. Approved is coded as a 1 for approved and a 0 for not approved, so it is binary; it is a dichotomous variable, and it is mutually exclusive. You're either approved or you're not approved; there's no in-between.
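To get a feel for data organized this way, here is a small sketch. Since the video's made-up data set isn't reproduced here, I generate a synthetic stand-in (the underlying S-shaped relationship is my own assumption), then group credit scores into 100-point bins and compute the fraction approved in each bin. The proportion approved climbs as the score rises:

```python
import random
import math

random.seed(42)

def true_prob(score):
    # Hypothetical underlying relationship (an S-shaped curve), assumed
    # purely to generate illustrative data.
    return 1.0 / (1.0 + math.exp(-(score - 650) / 40.0))

# Each row is (credit_score, approved), with approved coded 1 or 0.
data = []
for _ in range(1000):
    score = random.randint(300, 850)
    approved = 1 if random.random() < true_prob(score) else 0
    data.append((score, approved))

# Fraction approved within each 100-point score bin.
for lo in range(300, 900, 100):
    rows = [a for s, a in data if lo <= s < lo + 100]
    if rows:
        print(f"{lo}-{lo + 99}: {sum(rows) / len(rows):.2f} approved")
```

The binned proportions trace out the rising curve that logistic regression will model directly.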
Now, as a good analyst and a good stat student, or whatever it is you might be, you create a scatterplot of your 1,000 observations, but it looks like this. What is this? If you look on the left-hand side, we have approved, with 0 at the bottom, meaning the application was not approved, and 1 at the top, meaning the application was approved. But we have the data points in two lines against the credit score on the bottom. (FICO score is just a certain type of credit score that's widely used.) So if a dot is on the bottom, that means for that credit score the application was not approved; if it's at the top, it means it was approved. Now, how can we put a best-fit regression line on a scatterplot that looks like this? It doesn't make any sense to do it the way we usually would in normal linear regression. So obviously we're going to come up with some other technique, and that's what logistic regression allows us to do. Now that we have set the stage with the problem, we're going to look at: what is logistic regression?
Logistic regression seeks to do the following, among other things. It seeks to model the probability of an event occurring depending on the values of the independent variables, in this case credit score, which can be categorical or numerical. It seeks to estimate the probability that an event occurs for a randomly selected observation versus the probability that the event does not occur; so for a random observation in the data, or some other observation we want to predict, we estimate the probability that the event occurs versus the probability that it does not. It seeks to predict the effect of a series of variables on a binary response variable; in this case we only have one independent variable, credit score, but we can have more, so logistic regression can work a lot like multiple regression, with several independent variables and one dependent variable that is binary, 0 or 1. And we can also seek to classify observations by estimating the probability that an observation is in a particular category; in this case the applicant is either in the approved category or the not-approved category. So: model, estimate, predict, and classify.
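Those four jobs can all be sketched with the logistic (sigmoid) function itself. The coefficients below are hypothetical, chosen only for illustration; fitting real coefficients to data is what the later videos in the series cover:

```python
import math

# Hypothetical coefficients, for illustration only (not fitted to data):
# log-odds of approval = b0 + b1 * credit_score
b0, b1 = -16.0, 0.025

def prob_approved(score):
    """Model/estimate: map a credit score to a probability of approval
    via the logistic function, which always lands between 0 and 1."""
    log_odds = b0 + b1 * score
    return 1.0 / (1.0 + math.exp(-log_odds))

def classify(score, threshold=0.5):
    """Classify: assign the approved (1) or not-approved (0) category."""
    return 1 if prob_approved(score) >= threshold else 0

# Predict for one applicant with a score of 670:
p = prob_approved(670)
print(f"P(approved | 670) = {p:.3f}, odds = {p / (1 - p):.3f}")
print(f"category: {classify(670)}")
```

Note that whatever score goes in, the probability that comes out stays inside the 0-to-1 range, which is exactly the property ordinary linear regression lacks.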
So let's try to understand and visualize the problem we're working with. In this case we have a bunch of credit scores: an applicant walks into the bank with some sort of credit score. The bank or other lending institution feeds that into their lending model; the credit score goes into the model, and when it comes out, it's either approved or not approved. That black box in the middle is what we're trying to understand. We could ask: what is the probability that an application with a credit score of 670 would end up in the approved category up here on the top? So credit scores get put into some decision model by the bank or other lender, and then the bank or lender puts that application into the approved or not-approved category. That's basically what we're trying to model in this logistic regression problem.
Now, I am kind of making the assumption that if you're studying logistic regression, you have to some extent studied simple linear regression and multiple regression. If you've studied those, you might have a very good question: why can't I use one of those for this type of problem? Well, here's why. Number one, simple linear regression is one quantitative variable predicting another quantitative variable; in this case we have a dichotomous dependent variable, approved or not approved, 1 or 0, which is not a quantitative variable. Multiple regression is just simple regression with more independent variables, so those are basically the same type of problem. We also have nonlinear regression, but that's still two quantitative variables where the data is curvilinear. Now, if we ignored those warnings, running a typical linear regression in the usual way on this type of data has some major problems.
First, binary data, in this case approved or not approved, does not have a normal distribution, and you can see that by looking at the scatterplot; a normal distribution is a condition needed for most other types of regression. Second, the predicted values of the dependent variable can go beyond 0 and 1 in those other types of regression. Remember, in logistic regression we're dealing with probabilities, and the rule of probability is that it has to be between 0 and 1; if we use the other types of regression, the predicted values can fall outside 0 and 1, which obviously is not going to work. Third, probabilities are often not linear, taking shapes such as U shapes, where the probability is very low or very high at the extremes of the x values. You can probably think of different examples. One example could be the probability of contracting the flu: the probability of getting the flu is higher if you're younger, say a baby, infant, or toddler, and if you're older, say in your 60s, 70s, and 80s. So the probability is higher at the extremes than it is in the middle; probabilities often have different shapes in their distribution along the x variable.
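That second problem, predictions escaping the 0-to-1 range, is easy to demonstrate. The sketch below is my own illustration on synthetic 0/1 data (the S-shaped generating curve is an assumption): it fits an ordinary least-squares line to the binary outcomes and shows the fitted line predicting "probabilities" below 0 and above 1 at extreme scores, something a logistic curve can never do:

```python
import random
import math

random.seed(0)

# Synthetic binary data: approval becomes more likely as the score rises.
scores = [random.randint(300, 850) for _ in range(1000)]
approved = [1 if random.random() < 1 / (1 + math.exp(-(s - 600) / 40)) else 0
            for s in scores]

# Ordinary least-squares fit of approved on score (textbook slope and
# intercept formulas for simple linear regression).
n = len(scores)
mean_x = sum(scores) / n
mean_y = sum(approved) / n
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(scores, approved))
         / sum((x - mean_x) ** 2 for x in scores))
intercept = mean_y - slope * mean_x

linear_pred = lambda s: intercept + slope * s

# The straight line leaves the [0, 1] probability range at the extremes:
print(f"linear prediction at 300: {linear_pred(300):.3f}")  # below 0
print(f"linear prediction at 850: {linear_pred(850):.3f}")  # above 1

# A logistic curve, by contrast, is bounded: 0 < logistic(z) < 1 for any z.
logistic = lambda z: 1 / (1 + math.exp(-z))
```

A straight line simply has no way to flatten out near 0 and 1, which is why logistic regression swaps it for an S-shaped curve.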
So now that we have set the stage by introducing our problem and going over the basic conceptual foundation of what logistic regression is, let's talk about where we're going in the next video. In the next video we will do the following. We will review basic probability; we won't go into much depth, we'll just go over the basics, because understanding probability is central to learning about logistic regression. We will learn about what odds are and what the odds ratio is, because again, that's central to understanding logistic regression. We will briefly discuss how to interpret the odds ratio in a logistic regression context. And finally, we will note things we have to keep in mind when interpreting the odds ratio: the odds ratio is related to probability, of course, but there are some dangers in how we interpret it, and we'll definitely discuss that in the next video. So let's go ahead and wrap up this video, and I will see you in the next one.