F1 Score in Machine Learning - YouTube

Channel: Bhavesh Bhatt

[0]
Hi guys
[1]
We have already seen accuracy, precision and recall as measures of machine learning
[6]
model performance.
[8]
In this video, I'll explain what the F1 score is and why it is required when you already have
[12]
precision and recall, which do a fairly good job of evaluating a machine learning model.
[20]
There is a famous precision-recall trade-off, and I'll explain it using a simple example.
[25]
Say we have a pond of fish, and we know the total number of fish in it.
[29]
Our goal
[30]
is to build a model that catches red fish.
[34]
The first model that I make
[38]
consists of two fishing nets with bait containing food that my red fish like.
[44]
What do we observe?
[45]
We observe that we end up capturing two red fish, which is
[49]
what we desired.
[52]
Precision is: out of all the samples that the classifier classified as positive,
[57]
what portion were actually correct? The test that we have performed
[61]
shows great precision; that means out of the two red fish that we have caught, both of them
[68]
are red. But if you look carefully, there are many more red fish in my pond, so we missed
[73]
out on five red fish which should have been caught by my classifier.
[77]
So, my precision was
[78]
really high here, but my recall was low.
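As a quick sketch in Python (using the counts from this example: 2 red fish caught, no blue fish in the net, and 5 red fish missed), precision and recall look like this:

```python
def precision(tp, fp):
    # Of everything the model flagged as positive, what fraction was right?
    return tp / (tp + fp)

def recall(tp, fn):
    # Of all the actual positives, what fraction did the model find?
    return tp / (tp + fn)

# First model: 2 red fish caught (TP), no blue fish caught (FP),
# 5 red fish missed (FN) -- counts taken from the example above.
print(precision(tp=2, fp=0))  # 1.0    -> very high precision
print(recall(tp=2, fn=5))     # ~0.286 -> low recall
```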
[81]
So what I do now is make a model that increases recall.
[87]
In order to do that, I make a bigger
[90]
fishnet.
[91]
I change the bait, which is now liked by all the fish, and I capture a lot of fish
[97]
using the new bait. Just for a recap, recall is: out of all the positive samples, what portion
[103]
did my classifier pick up?
[105]
Now, if you look at this diagram carefully, say, for example, approximately
[109]
there are eight red fish; out of those eight,
[111]
I'm able to capture six red fish, so the model is doing a good job.
[117]
Now the problem with this model is that I am capturing a lot of blue fish in order to
[123]
capture a large number of red fish.
[133]
So my recall is high, but my precision is very low.
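Plugging the second model into the same formulas: 6 of the 8 red fish are caught, and since the example only says "a lot" of blue fish, the figure of 20 blue fish below is an assumed number for illustration:

```python
# Second model: bigger net, bait every fish likes.
# TP = 6 red fish caught, FN = 2 red fish missed (from the example);
# FP = 20 blue fish caught is an assumed count, the example just says "a lot".
tp, fp, fn = 6, 20, 2

print(tp / (tp + fn))  # recall    = 0.75  -> high
print(tp / (tp + fp))  # precision ~ 0.23  -> low
```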
[138]
So this is the fundamental trade-off between precision and recall. In our model with high
[142]
precision, most or all of the fish we caught were red, but it had a low recall;
[150]
in the model designed for high recall, we got most of the red fish, but we
[155]
had a low precision, since we also caught a lot of blue fish.
[161]
Can we combine precision and recall
[163]
to have an overall score which depicts how well our model is performing?
[168]
In comes my F1 score.
[171]
So the F1 score is basically the harmonic mean of my precision and recall.
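As a minimal sketch, the harmonic mean of precision P and recall R is F1 = 2PR / (P + R):

```python
def f1_score(p, r):
    # F1 is the harmonic mean of precision p and recall r:
    # F1 = 2 * P * R / (P + R)
    return 2 * p * r / (p + r)

print(f1_score(1.0, 2 / 7))    # first model:  ~0.44
print(f1_score(6 / 26, 0.75))  # second model: ~0.35 (with the assumed fish counts)
```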
[175]
Now, you might wonder why we are taking the harmonic mean and not the arithmetic mean or the geometric mean.
[182]
Consider
[183]
a simple example of a thousand samples and a confusion matrix that looks like this.
[188]
I have 85 true positives, 890 false positives, 15 false negatives and 10 true negatives.
[197]
My accuracy is around 9.5%. My precision, based on the formula here, which is TP upon TP plus
[204]
FP,
[205]
which is 85 upon (85 plus 890), is around 9%. But my recall, if you notice carefully, is 85 upon
[214]
85 plus 15, which is 85%. So now if I use the arithmetic mean, my F1 score would turn out
[223]
to be about 47%. But if you look at the confusion matrix, you can easily say that my model is
[228]
not performing well, and about 47% for such a model is not a realistic score.
[233]
When we now look
[234]
at the harmonic mean, it says about 16%, which looks like a much more realistic score for this model
[241]
than about 47%.
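To verify these numbers end to end, here is a small script that recomputes everything from the confusion matrix above:

```python
tp, fp, fn, tn = 85, 890, 15, 10  # confusion matrix from the example

accuracy = (tp + tn) / (tp + fp + fn + tn)  # (85 + 10) / 1000 ~ 0.095
precision = tp / (tp + fp)                  # 85 / 975         ~ 0.087
recall = tp / (tp + fn)                     # 85 / 100         = 0.85

arithmetic_mean = (precision + recall) / 2                       # ~ 0.47
harmonic_mean = 2 * precision * recall / (precision + recall)    # ~ 0.16

print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f}")
print(f"arithmetic mean={arithmetic_mean:.2f} harmonic mean (F1)={harmonic_mean:.2f}")
```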
[244]
What the F1 score does, by taking the harmonic mean,
[249]
is punish extreme values. What we desire of a machine learning model
[256]
is that its F1 score should be really high.
[260]
In order to have a high F1 score, you need to
[263]
have both a high precision as well as a high recall value.
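In practice you would usually not compute this by hand; scikit-learn provides ready-made functions. A short usage sketch with made-up labels (1 = red fish, 0 = blue fish):

```python
from sklearn.metrics import f1_score, precision_score, recall_score

# Toy labels, invented for illustration: 1 = red fish, 0 = blue fish.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]

print(precision_score(y_true, y_pred))  # 0.75
print(recall_score(y_true, y_pred))     # 0.75
print(f1_score(y_true, y_pred))         # 0.75
```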
[268]
Thank you so much for watching the video
[270]
hope it was informative.