Machine Learning Fundamentals: Bias and Variance - YouTube

Channel: StatQuest with Josh Starmer

Hurricane Florence came by while I was working on StatQuest. Dark clouds filled the sky, but that didn't stop StatQuest. StatQuest!

Hello, I'm Josh Starmer, and welcome to StatQuest. Today we're going to be talking about some machine learning fundamentals, bias and variance, and they're going to be clearly explained.
Imagine we measured the weight and height of a bunch of mice and plotted the data on a graph. Light mice tend to be short, and heavier mice tend to be taller, but after a certain weight, mice don't get any taller, just more obese. Given this data, we would like to predict mouse height given its weight. For example, if you told me your mouse weighed this much, then we might predict that the mouse is this tall.

Ideally, we would know the exact mathematical formula that describes the relationship between weight and height, but in this case we don't know the formula, so we're going to use two machine learning methods to approximate this relationship. However, I'll leave the true relationship curve in the figure for reference.

The first thing we do is split the data into two sets: one for training the machine learning algorithms and one for testing them. The blue dots are the training set, and the green dots are the testing set.
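A split like this can be sketched in a few lines. This is a minimal sketch, not the video's actual data: the mouse weights and heights are made up, and the 75/25 split ratio is an assumption.

```python
import random

random.seed(42)

# Hypothetical mouse data: (weight, height) pairs where height follows
# an arc that flattens out at higher weights (all values are made up).
mice = [(w, round(2 * w / (1 + 0.2 * w) + random.uniform(-0.1, 0.1), 2))
        for w in range(1, 13)]

# Shuffle so neither set is biased toward light or heavy mice,
# then hold out 25% of the data for testing.
random.shuffle(mice)
split = int(0.75 * len(mice))
training_set, testing_set = mice[:split], mice[split:]

print(len(training_set), "training mice,", len(testing_set), "testing mice")
```

The shuffle matters: splitting the sorted data would train only on light mice and test only on heavy ones.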
Here's just the training set. The first machine learning algorithm that we will use is linear regression, a.k.a. least squares. Linear regression fits a straight line to the training set. Note that the straight line doesn't have the flexibility to accurately replicate the arc in the true relationship. No matter how we try to fit the line, it will never curve.
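A least squares fit like this one can be sketched as follows, assuming hypothetical training numbers and using `np.polyfit` with degree 1 to play the role of linear regression:

```python
import numpy as np

# Hypothetical training set (the numbers are made up; the true
# relationship arcs upward and then flattens out).
weights = np.array([1.0, 2.0, 3.0, 5.0, 7.0, 9.0])
heights = np.array([1.0, 1.8, 2.3, 2.9, 3.2, 3.3])

# Least squares finds the slope and intercept that minimize the sum of
# squared vertical distances from the line to the training points.
slope, intercept = np.polyfit(weights, heights, deg=1)

def straight_line(w):
    return slope * w + intercept

# Because a straight line can't bend, it misses the arc: here it sits
# below the data in the middle and above it at the lightest and
# heaviest mice, so the residuals never all reach zero.
residuals = heights - straight_line(weights)
```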
Thus, the straight line will never capture the true relationship between weight and height, no matter how well we fit it to the training set. The inability of a machine learning method, like linear regression, to capture the true relationship is called bias. Because the straight line can't be curved like the true relationship, it has a relatively large amount of bias.

Another machine learning method might fit a squiggly line to the training set. The squiggly line is super flexible and hugs the training set along the arc of the true relationship. Because the squiggly line can handle the arc in the true relationship between weight and height, it has very little bias.

We can compare how well the straight line and the squiggly line fit the training set by calculating their sums of squares. In other words, we measure the distances from the fit lines to the data, square them, and add them up. The distances are squared so that negative distances do not cancel out positive distances. Notice how the squiggly line fits the data so well that the distances between the line and the data are all 0.

In the contest to see whether the straight line fits the training set better than the squiggly line, the squiggly line wins. But remember, so far we've only calculated the sums of squares for the training set. We also have a testing set. Now let's calculate the sums of squares for the testing set. In the contest to see whether the straight line fits the testing set better than the squiggly line, the straight line wins.
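Both contests can be reproduced in a small sketch. All the data values here are made up, and a degree-5 polynomial forced through the six training points stands in for the squiggly line:

```python
import numpy as np

# Hypothetical training and testing sets: heights follow an arc that
# flattens out, plus a little noise (all numbers are made up).
train_w = np.array([1.0, 2.0, 4.0, 6.0, 8.0, 10.0])
train_h = np.array([1.0, 2.1, 2.4, 3.3, 3.0, 3.4])
test_w = np.array([3.0, 5.0, 9.0])
test_h = np.array([2.4, 2.9, 3.35])

def sum_of_squares(model, w, h):
    # Measure the distances from the fit line to the data,
    # square them, and add them up.
    return float(np.sum((np.polyval(model, w) - h) ** 2))

# The straight line: least squares linear regression (degree 1).
straight = np.polyfit(train_w, train_h, deg=1)

# The "squiggly line": a polynomial flexible enough to pass through
# every training point exactly (degree = number of points - 1).
squiggly = np.polyfit(train_w, train_h, deg=len(train_w) - 1)

# Training contest: the squiggly line wins (its sum of squares is ~0).
print(sum_of_squares(straight, train_w, train_h),
      sum_of_squares(squiggly, train_w, train_h))

# Testing contest: the straight line wins.
print(sum_of_squares(straight, test_w, test_h),
      sum_of_squares(squiggly, test_w, test_h))
```

With this made-up data, the squiggly line's testing-set sum of squares is noticeably larger than the straight line's, even though its training-set sum of squares is essentially zero.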
Even though the squiggly line did a great job fitting the training set, it did a terrible job fitting the testing set. In machine learning lingo, the difference in fits between data sets is called variance. The squiggly line has low bias, since it is flexible and can adapt to the curve in the relationship between weight and height, but the squiggly line has high variability, because it results in vastly different sums of squares for different data sets. In other words, it's hard to predict how well the squiggly line will perform with future data sets. It might do well sometimes, and other times it might do terribly.

In contrast, the straight line has relatively high bias, since it cannot capture the curve in the relationship between weight and height, but the straight line has relatively low variance, because the sums of squares are very similar for different data sets. In other words, the straight line might only give good predictions, and not great predictions, but they will be consistently good predictions. BAM!

Oh no, terminology alert! Because the squiggly line fits the training set really well, but not the testing set, we say that the squiggly line is overfit.
In machine learning, the ideal algorithm has low bias, and can accurately model the true relationship, and it has low variability, by producing consistent predictions across different data sets. This is done by finding the sweet spot between a simple model and a complex model.

Oh no, another terminology alert! Three commonly used methods for finding the sweet spot between simple and complicated models are regularization, boosting, and bagging. The StatQuests on random forests show an example of bagging in action, and we'll talk about regularization and boosting in future StatQuests. Double BAM!
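Regularization itself isn't demonstrated in this video, but as a hint of how it finds that sweet spot, here is a hypothetical ridge regression sketch. The data, the rescaling, and the penalty value are all assumptions for illustration:

```python
import numpy as np

# Made-up mouse-style data; weights are rescaled to [0, 1] to keep
# the polynomial arithmetic well behaved.
w = np.array([0.1, 0.2, 0.4, 0.6, 0.8, 1.0])
h = np.array([1.0, 2.1, 2.4, 3.3, 3.0, 3.4])

# Degree-5 polynomial features: flexible enough to hit every point.
X = np.vander(w, 6)

def ridge_fit(X, y, penalty):
    # Ridge regression: minimize ||X b - y||^2 + penalty * ||b||^2.
    # The penalty discourages the huge coefficients a squiggly fit needs.
    return np.linalg.solve(X.T @ X + penalty * np.eye(X.shape[1]), X.T @ y)

squiggly = ridge_fit(X, h, penalty=0.0)     # interpolates: low bias, high variance
regularized = ridge_fit(X, h, penalty=0.1)  # smoother: more bias, less variance

# The penalized coefficients are much smaller, so the curve wiggles less,
# at the cost of no longer hitting every training point exactly.
print(np.linalg.norm(squiggly), np.linalg.norm(regularized))
```

Raising the penalty trades a little bias (a worse fit to the training set) for lower variance (a smoother, more consistent curve).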
Hooray! We've made it to the end of another exciting StatQuest. If you liked this StatQuest and want to see more, please subscribe. And if you want to support StatQuest, well, please consider buying one or two of my original songs. All right, until next time, quest on!