StatQuest: Logistic Regression - YouTube

Channel: StatQuest with Josh Starmer

[0]
If you can fit a line, you can fit a squiggle. If you can make me laugh, you can make me giggle. StatQuest!
[10]
Hello, I'm Josh Starmer, and welcome to StatQuest. Today we're going to talk about logistic regression.
[16]
This is a technique that can be used for traditional statistics as well as machine learning, so let's get right to it.
[24]
Before we dive into logistic regression, let's take a step back and review linear regression.
[31]
In another StatQuest, we talked about linear regression.
[35]
We had some data:
[37]
weight and size.
[40]
Then we fit a line to it,
[44]
and with that line, we could do a lot of things.
[48]
First, we could calculate R-squared and determine if weight and size are
[53]
correlated. Large values imply a large effect.
[58]
Second, we could calculate a p-value to determine if the R-squared value is statistically significant.
[65]
Third, we could use the line to predict size given weight. If a new mouse has this weight,
[75]
then this is the size that we predict from the weight.
[79]
Although we didn't mention it at the time, using data to predict something falls under the category of machine learning.
[88]
So plain old linear regression is a form of machine learning.
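The fit-then-predict idea described above can be sketched in a few lines of plain Python. This is only an illustration: the weight and size values are made up, and the helper name `fit_line` is hypothetical, not something from the video.

```python
# Hypothetical mouse data, invented for illustration only.
weights = [1.0, 2.0, 3.0, 4.0, 5.0]
sizes = [1.2, 1.9, 3.2, 3.8, 5.1]

def fit_line(x, y):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope = covariance of x and y divided by the variance of x.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = fit_line(weights, sizes)

# Use the line to predict the size of a new mouse from its weight.
new_weight = 3.5
predicted_size = intercept + slope * new_weight
print(f"size = {intercept:.2f} + {slope:.2f} * weight")
print(f"predicted size at weight {new_weight}: {predicted_size:.3f}")
```

Once the slope and intercept are known, predicting for a new mouse is just plugging its weight into the line.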
[93]
We also talked a little bit about multiple regression.
[98]
Now we are trying to predict size using weight and blood volume.
[104]
Alternatively, we could say that we are trying to model size using weight and blood volume.
[111]
Multiple regression did the same things that normal regression did:
[116]
we calculated R-squared,
[119]
we calculated the p-value,
[122]
and we could predict size using weight and blood volume.
[127]
This makes multiple regression a slightly fancier machine learning method.
[133]
We also talked about how we can use discrete measurements, like genotype, to predict size.
[140]
If you're not familiar with the term genotype, don't freak out. It's no big deal. Just know that it refers to different types of mice.
[149]
Lastly, we could compare models.
[153]
So on the left side, we've got normal regression, using weight to predict size,
[159]
and we can compare those predictions to the ones we get from multiple regression, where we're using weight and blood volume to predict size.
[169]
Comparing the simple model to the complicated one tells us if we need to measure weight and blood volume to accurately predict
[176]
size, or if we can get away with just weight.
[181]
Now that we remember all the cool things we can do with linear regression,
[186]
let's talk about logistic regression.
[189]
Logistic regression is similar to linear regression,
[193]
except
[195]
logistic regression predicts whether something is true or false instead of predicting something continuous, like size.
[203]
These mice are obese,
[207]
and these mice are not.
[211]
Also, instead of fitting a line to the data, logistic regression fits an S-shaped logistic function.
[219]
The curve goes from zero to one,
[223]
and that means that the curve tells you the probability that a mouse is obese based on its weight.
[230]
If we weighed a very heavy mouse,
[234]
there is a high probability that the new mouse is obese.
[238]
If we weighed an intermediate mouse,
[242]
then there is only a 50% chance that the mouse is obese.
[248]
Lastly, there's only a small probability that a light mouse is obese.
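As a rough sketch of the S-shaped curve described above, the standard logistic function can be written as follows. The intercept and slope are hypothetical values picked so that an intermediate weight of 3.0 lands at exactly 50%; they are not fitted to any real mice.

```python
import math

def logistic(x):
    """Standard logistic function; the output is always between 0 and 1."""
    # Two branches so that math.exp never receives a large positive argument.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def prob_obese(weight, intercept=-6.0, slope=2.0):
    """Probability that a mouse is obese given its weight.

    The intercept and slope are hypothetical, chosen only for illustration.
    """
    return logistic(intercept + slope * weight)

for label, weight in [("light", 1.0), ("intermediate", 3.0), ("heavy", 5.0)]:
    print(f"{label} mouse (weight {weight}): P(obese) = {prob_obese(weight):.3f}")
```

With these assumed coefficients, the light mouse gets a probability near 0, the intermediate mouse gets 0.5, and the heavy mouse gets a probability near 1.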
[254]
Although logistic regression tells the probability that a mouse is obese or not, it's usually used for classification.
[262]
For example, if the probability that a mouse is obese is greater than 50%,
[267]
then we'll classify it as obese.
[270]
Otherwise, we'll classify it as not obese.
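The 50% rule just described turns a probability into a class label. A minimal sketch, where the function name `classify` and the example probabilities are made up for illustration:

```python
def classify(p_obese, threshold=0.5):
    """Label a mouse as obese only if its probability is above the threshold."""
    return "obese" if p_obese > threshold else "not obese"

# Hypothetical probabilities, as if read off a fitted logistic curve.
for p in (0.92, 0.50, 0.08):
    print(f"P(obese) = {p:.2f} -> {classify(p)}")
```

Note that a probability of exactly 50% falls in the "otherwise" branch here, because the rule as stated requires the probability to be greater than 50%.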
[275]
Just like with linear regression, we can make simple models. In this case, we can have obesity predicted by weight. Or
[284]
more complicated models. In this case, obesity is predicted by weight and genotype.
[291]
In this case, obesity is predicted by weight and genotype and age.
[297]
And lastly, obesity is predicted by weight, genotype, age, and
[302]
astrological sign. In other words, just like linear regression, logistic
[309]
regression can work with continuous data, like weight and age, and discrete data, like genotype and astrological sign.
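One common way to mix continuous and discrete predictors is to code the discrete one as a 0/1 indicator. The sketch below assumes two hypothetical genotype labels, "normal" and "mutant", and made-up coefficients; none of these come from the video.

```python
import math

# Hypothetical coefficients: intercept, weight effect, mutant-genotype effect.
COEFFICIENTS = [-6.0, 2.0, 1.5]

def encode(weight, genotype):
    """Turn one mouse into a numeric feature list.

    Weight is continuous and used as-is; genotype is discrete and becomes
    a 0/1 indicator (1.0 for the hypothetical "mutant" label, else 0.0).
    The leading 1.0 multiplies the intercept.
    """
    return [1.0, weight, 1.0 if genotype == "mutant" else 0.0]

def prob_obese(weight, genotype):
    """Logistic probability from the weighted sum of the encoded features."""
    z = sum(c * f for c, f in zip(COEFFICIENTS, encode(weight, genotype)))
    return 1.0 / (1.0 + math.exp(-z))

print(f"normal, weight 3.0: {prob_obese(3.0, 'normal'):.3f}")
print(f"mutant, weight 3.0: {prob_obese(3.0, 'mutant'):.3f}")
```

With a positive genotype coefficient, two mice of the same weight get different probabilities depending on genotype.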
[318]
We can also test to see if each variable is useful for predicting obesity.
[324]
However,
[325]
unlike normal regression, we can't easily compare the complicated model to the simple model, and we'll talk more about why in a bit.
[334]
Instead, we just test to see if a variable's effect on the prediction is significantly different from zero.
[342]
If not, it means that the variable is not helping the prediction.
[349]
We use Wald's test to figure this out. We'll talk about that in another StatQuest.
[356]
In this case, the astrological sign is totes useless.
[361]
That's statistical jargon for not helping.
[365]
That means we can save time and space in our study by leaving it out.
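The video defers the details of Wald's test to another StatQuest, so the following is only a sketch of its usual large-sample form: divide a coefficient estimate by its standard error and compare the result to a standard normal distribution. The estimates and standard errors below are invented for illustration.

```python
import math

def wald_test(estimate, std_error):
    """Return (z, two-sided p-value) for the hypothesis that the effect is zero.

    Assumes the usual large-sample normal approximation for the estimate.
    """
    z = estimate / std_error
    # Two-sided tail area of a standard normal beyond |z|.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value

# Hypothetical fitted coefficients and standard errors.
fitted = {"weight": (2.0, 0.5), "astrological sign": (0.1, 0.8)}

for name, (estimate, std_error) in fitted.items():
    z, p = wald_test(estimate, std_error)
    verdict = "helps the prediction" if p < 0.05 else "not helping"
    print(f"{name}: z = {z:.2f}, p = {p:.4f} -> {verdict}")
```

With these made-up numbers, weight comes out clearly different from zero, while astrological sign does not.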
[371]
Logistic regression's ability to provide probabilities and classify new samples using continuous and discrete measurements
[378]
makes it a popular machine learning method.
[382]
One big difference between linear regression and logistic regression is how the line is fit to the data.
[390]
With linear regression, we fit the line using least squares.
[396]
In other words, we find the line that minimizes the sum of the squares of these residuals.
[403]
We also use the residuals to calculate R-squared and to compare simple models to complicated models.
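To make the residual idea concrete, here is a small sketch that scores a few candidate lines by their sum of squared residuals and then computes R-squared for the best one. The data are made up, and the candidate lines are hand-picked; the slope 0.97 and intercept 0.13 are what the closed-form least-squares solution gives for this toy data.

```python
# Hypothetical mouse data, invented for illustration only.
weights = [1.0, 2.0, 3.0, 4.0, 5.0]
sizes = [1.2, 1.9, 3.2, 3.8, 5.1]
mean_size = sum(sizes) / len(sizes)

def sum_squared_residuals(slope, intercept):
    """Sum of squared vertical distances from the data to the line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(weights, sizes))

# Candidate lines as (slope, intercept).
candidates = {
    "flat line at the mean": (0.0, mean_size),
    "too steep": (1.5, -1.0),
    "least-squares fit": (0.97, 0.13),
}

ssr = {name: sum_squared_residuals(*line) for name, line in candidates.items()}
best_name = min(ssr, key=ssr.get)

# R-squared compares the residuals around the fit to those around the mean.
r2 = 1.0 - ssr["least-squares fit"] / ssr["flat line at the mean"]

for name, value in ssr.items():
    print(f"{name}: sum of squared residuals = {value:.3f}")
print(f"smallest: {best_name}; R-squared = {r2:.3f}")
```

Real least squares solves for the minimizing line directly rather than trying candidates; the comparison here only shows what is being minimized.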
[411]
Logistic regression doesn't have the same concept of a residual, so it can't use least squares and it can't calculate R-squared.
[421]
Instead, it uses something called maximum likelihood.
[425]
There's a whole StatQuest on maximum likelihood, so see that for details, but in a nutshell:
[432]
You pick a probability, scaled by weight, of observing an obese mouse, just like this curve,
[440]
and you use that to calculate the likelihood of observing a non-obese mouse that weighs this much.
[447]
Then you calculate the likelihood of observing this mouse,
[452]
and you do that for all of the mice.
[456]
Lastly, you multiply all of those likelihoods together. That's the likelihood of the data given this line.
[465]
Then you shift the line and calculate a new likelihood of the data,
[470]
and then shift the line and calculate the likelihood again,
[476]
and again.
[478]
Finally, the curve with the maximum value for the likelihood is selected. BAM!
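The steps above can be sketched as a brute-force search: score each candidate curve by multiplying the per-mouse likelihoods, then keep the curve with the largest product. The data and the grid of candidate intercepts and slopes are made up for illustration, and real software uses a much more efficient optimizer than trying every candidate.

```python
import math

# Hypothetical data: mouse weights and whether each mouse is obese (1) or not (0).
weights = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
obese = [0, 0, 0, 1, 0, 1, 1, 1, 1]

def prob_obese(weight, intercept, slope):
    """Logistic curve: probability of obesity at a given weight."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * weight)))

def likelihood(intercept, slope):
    """Likelihood of all the data given one candidate curve."""
    total = 1.0
    for w, is_obese in zip(weights, obese):
        p = prob_obese(w, intercept, slope)
        # An obese mouse contributes p; a non-obese mouse contributes 1 - p.
        total *= p if is_obese else (1.0 - p)
    return total

# "Shift the line" by trying a grid of candidate curves (an assumed grid).
candidates = [
    (i * 0.5 - 15.0, j * 0.5)  # (intercept, slope)
    for i in range(31)
    for j in range(1, 11)
]

best_intercept, best_slope = max(candidates, key=lambda c: likelihood(*c))
best_likelihood = likelihood(best_intercept, best_slope)
print(f"best curve: intercept = {best_intercept}, slope = {best_slope}")
print(f"likelihood of the data given that curve: {best_likelihood:.5f}")
```

In practice the logarithms of the likelihoods are usually summed instead of multiplying raw values, which avoids very small products when there are many samples.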
[486]
In summary, logistic regression can be used to classify samples,
[492]
and it can use different types of data, like size and/or genotype, to do that classification.
[500]
And it can also be used to assess what variables are useful for classifying samples, i.e.,
[507]
astrological sign is totes useless.
[512]
Hooray! We've made it to the end of another exciting StatQuest. If you like this StatQuest and want to see more, please subscribe.
[519]
And if you have suggestions for future StatQuests, well, put them in the comments below. Until next time, quest on!