Machine Learning Fundamentals: Bias and Variance - YouTube

Channel: StatQuest with Josh Starmer

Hurricane Florence came by while I was working on StatQuest. Dark clouds filled the sky, but that didn't stop StatQuest. StatQuest!

Hello, I'm Josh Starmer, and welcome to StatQuest. Today we're going to be talking about some machine learning fundamentals, bias and variance, and they're going to be clearly explained.
Imagine we measured the weight and height of a bunch of mice and plotted the data on a graph. Light mice tend to be short, and heavier mice tend to be taller, but after a certain weight, mice don't get any taller, just more obese. Given this data, we would like to predict mouse height given its weight. For example, if you told me your mouse weighed this much, then we might predict that the mouse is this tall.

Ideally, we would know the exact mathematical formula that describes the relationship between weight and height, but in this case we don't know the formula, so we're going to use two machine learning methods to approximate this relationship. However, I'll leave the true relationship curve in the figure for reference.

The first thing we do is split the data into two sets: one for training the machine learning algorithms and one for testing them. The blue dots are the training set, and the green dots are the testing set.
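A split like this can be sketched in a few lines. This is a minimal sketch, not the video's actual data: the mouse weights and heights are made up, and the 75/25 split ratio is an assumption.

```python
import random

random.seed(42)

# Hypothetical mouse data: (weight, height) pairs where height follows
# an arc that flattens out at higher weights (all values are made up).
mice = [(w, round(2 * w / (1 + 0.2 * w) + random.uniform(-0.1, 0.1), 2))
        for w in range(1, 13)]

# Shuffle so neither set is biased toward light or heavy mice,
# then hold out 25% of the data for testing.
random.shuffle(mice)
split = int(0.75 * len(mice))
training_set, testing_set = mice[:split], mice[split:]

print(len(training_set), "training mice,", len(testing_set), "testing mice")
```

The shuffle matters: splitting the sorted data would train only on light mice and test only on heavy ones.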
Here's just the training set. The first machine learning algorithm that we will use is linear regression, a.k.a. least squares. Linear regression fits a straight line to the training set. Note that the straight line doesn't have the flexibility to accurately replicate the arc in the true relationship. No matter how we try to fit the line, it will never curve.
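A least squares fit like this one can be sketched as follows, assuming hypothetical training numbers and using `np.polyfit` with degree 1 to play the role of linear regression:

```python
import numpy as np

# Hypothetical training set (the numbers are made up; the true
# relationship arcs upward and then flattens out).
weights = np.array([1.0, 2.0, 3.0, 5.0, 7.0, 9.0])
heights = np.array([1.0, 1.8, 2.3, 2.9, 3.2, 3.3])

# Least squares finds the slope and intercept that minimize the sum of
# squared vertical distances from the line to the training points.
slope, intercept = np.polyfit(weights, heights, deg=1)

def straight_line(w):
    return slope * w + intercept

# Because a straight line can't bend, it misses the arc: here it sits
# below the data in the middle and above it at the lightest and
# heaviest mice, so the residuals never all reach zero.
residuals = heights - straight_line(weights)
```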
Thus, the straight line will never capture the true relationship between weight and height, no matter how well we fit it to the training set. The inability of a machine learning method, like linear regression, to capture the true relationship is called bias. Because the straight line can't be curved like the true relationship, it has a relatively large amount of bias.

Another machine learning method might fit a squiggly line to the training set. The squiggly line is super flexible and hugs the training set along the arc of the true relationship. Because the squiggly line can handle the arc in the true relationship between weight and height, it has very little bias.

We can compare how well the straight line and the squiggly line fit the training set by calculating their sums of squares. In other words, we measure the distances from the fit lines to the data, square them, and add them up. The distances are squared so that negative distances do not cancel out positive distances. Notice how the squiggly line fits the data so well that the distances between the line and the data are all 0.

In the contest to see whether the straight line fits the training set better than the squiggly line, the squiggly line wins. But remember, so far we've only calculated the sums of squares for the training set. We also have a testing set. Now let's calculate the sums of squares for the testing set. In the contest to see whether the straight line fits the testing set better than the squiggly line, the straight line wins.
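Both contests can be reproduced in a small sketch. All the data values here are made up, and a degree-5 polynomial forced through the six training points stands in for the squiggly line:

```python
import numpy as np

# Hypothetical training and testing sets: heights follow an arc that
# flattens out, plus a little noise (all numbers are made up).
train_w = np.array([1.0, 2.0, 4.0, 6.0, 8.0, 10.0])
train_h = np.array([1.0, 2.1, 2.4, 3.3, 3.0, 3.4])
test_w = np.array([3.0, 5.0, 9.0])
test_h = np.array([2.4, 2.9, 3.35])

def sum_of_squares(model, w, h):
    # Measure the distances from the fit line to the data,
    # square them, and add them up.
    return float(np.sum((np.polyval(model, w) - h) ** 2))

# The straight line: least squares linear regression (degree 1).
straight = np.polyfit(train_w, train_h, deg=1)

# The "squiggly line": a polynomial flexible enough to pass through
# every training point exactly (degree = number of points - 1).
squiggly = np.polyfit(train_w, train_h, deg=len(train_w) - 1)

# Training contest: the squiggly line wins (its sum of squares is ~0).
print(sum_of_squares(straight, train_w, train_h),
      sum_of_squares(squiggly, train_w, train_h))

# Testing contest: the straight line wins.
print(sum_of_squares(straight, test_w, test_h),
      sum_of_squares(squiggly, test_w, test_h))
```

With this made-up data, the squiggly line's testing-set sum of squares is noticeably larger than the straight line's, even though its training-set sum of squares is essentially zero.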
Even though the squiggly line did a great job fitting the training set, it did a terrible job fitting the testing set. In machine learning lingo, the difference in fits between data sets is called variance. The squiggly line has low bias, since it is flexible and can adapt to the curve in the relationship between weight and height, but the squiggly line has high variability, because it results in vastly different sums of squares for different data sets. In other words, it's hard to predict how well the squiggly line will perform with future data sets. It might do well sometimes, and other times it might do terribly.

In contrast, the straight line has relatively high bias, since it cannot capture the curve in the relationship between weight and height, but the straight line has relatively low variance, because the sums of squares are very similar for different data sets. In other words, the straight line might only give good predictions, and not great predictions, but they will be consistently good predictions. BAM!

Oh no, terminology alert! Because the squiggly line fits the training set really well, but not the testing set, we say that the squiggly line is overfit.
In machine learning, the ideal algorithm has low bias, and can accurately model the true relationship, and it has low variability, by producing consistent predictions across different data sets. This is done by finding the sweet spot between a simple model and a complex model.

Oh no, another terminology alert! Three commonly used methods for finding the sweet spot between simple and complicated models are regularization, boosting, and bagging. The StatQuests on random forests show an example of bagging in action, and we'll talk about regularization and boosting in future StatQuests. Double BAM!
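Regularization itself isn't demonstrated in this video, but as a hint of how it finds that sweet spot, here is a hypothetical ridge regression sketch. The data, the rescaling, and the penalty value are all assumptions for illustration:

```python
import numpy as np

# Made-up mouse-style data; weights are rescaled to [0, 1] to keep
# the polynomial arithmetic well behaved.
w = np.array([0.1, 0.2, 0.4, 0.6, 0.8, 1.0])
h = np.array([1.0, 2.1, 2.4, 3.3, 3.0, 3.4])

# Degree-5 polynomial features: flexible enough to hit every point.
X = np.vander(w, 6)

def ridge_fit(X, y, penalty):
    # Ridge regression: minimize ||X b - y||^2 + penalty * ||b||^2.
    # The penalty discourages the huge coefficients a squiggly fit needs.
    return np.linalg.solve(X.T @ X + penalty * np.eye(X.shape[1]), X.T @ y)

squiggly = ridge_fit(X, h, penalty=0.0)     # interpolates: low bias, high variance
regularized = ridge_fit(X, h, penalty=0.1)  # smoother: more bias, less variance

# The penalized coefficients are much smaller, so the curve wiggles less,
# at the cost of no longer hitting every training point exactly.
print(np.linalg.norm(squiggly), np.linalg.norm(regularized))
```

Raising the penalty trades a little bias (a worse fit to the training set) for lower variance (a smoother, more consistent curve).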
Hooray! We've made it to the end of another exciting StatQuest. If you liked this StatQuest and want to see more, please subscribe. And if you want to support StatQuest, well, please consider buying one or two of my original songs. All right, until next time, quest on!