Statistics - How to find outliers - YouTube

Channel: MySecretMathTutor

[0]
In this video we want to identify outliers in a set of data.
[4]
If you are not sure what an outlier is, here is the definition.
[8]
An outlier is an extremely high or extremely low value in the data set.
[13]
Now in addition to just being something extremely high or extremely low, you want to make sure
[17]
that it satisfies the following criteria.
[19]
For a value to count as an outlier, it must be greater than Q3 + 1.5(Interquartile Range)
[25]
or it must be lower than Q1 - 1.5(Interquartile Range)
[30]
This is making sure that it really is an extremely high value or extremely low value.
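Written out as a formula, the rule just stated looks like the following. The symbol x for a single data value and the abbreviation IQR are my notation, not something shown in the video; IQR is the interquartile range, Q3 minus Q1.

```latex
\[
x \text{ is an outlier} \iff
x > Q_3 + 1.5\,\mathrm{IQR}
\quad\text{or}\quad
x < Q_1 - 1.5\,\mathrm{IQR},
\qquad \mathrm{IQR} = Q_3 - Q_1 .
\]
```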
[34]
You can see though that you need to compute a few different things like Q3 and Q1 and
[39]
the Interquartile Range if we are going to properly identify one of these outliers.
[44]
So let's look at some data, and see how this works.
[50]
In my data I have a chart of how many phone calls were received on any given day.
[54]
So I have 10 phone calls on the first day, 12 phone calls on the second day, and so on
[59]
and so forth.
[61]
If I'm going to compute things like Q1 and Q3 and the Interquartile Range, it's probably
[65]
a good idea to take all of this data and write it out in order.
[70]
10, 11, 11, 11, 12, 12, 13, 14, 14, 15, 17, 22
[89]
Alright, so you can see that when I list out my data like this, 22 does look like a pretty
[98]
high value and 10 looks like a fairly low value.
[102]
To double check that, you know, one of these might be an outlier or maybe both of them,
[106]
let's go ahead and start breaking down our data to find Q1 and Q3.
[113]
So I want to find the halfway point of my data, and I have twelve data points, so one,
[117]
two, three, four, five, six.
[122]
Alright, so I need the median of the first half and the median of the second half.
[129]
Let's see, the halfway point of the first half, let's call this Q1.
[137]
And looks like that is equal to 11.
[140]
Remember you find that by adding 11 plus 11, dividing by 2.
[145]
The median of the second half, let's call this Q3, would be 14.5, the average of 14 and 15.
[150]
Alright, now to find our interquartile range, we would end
[159]
up subtracting Q1 from Q3, so 14.5 minus 11.
[163]
This would give us 3.5.
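Here is a minimal Python sketch of the median-of-halves steps above, using the sorted phone call data. The function name `quartiles_by_halves` is hypothetical, and the handling of an odd number of values (leaving the middle value out of both halves) is an assumption, since the video only works with an even count. Other quartile conventions exist and can give slightly different numbers.

```python
from statistics import median

# Sorted phone call counts from the video
calls = [10, 11, 11, 11, 12, 12, 13, 14, 14, 15, 17, 22]

def quartiles_by_halves(sorted_values):
    """Return (Q1, Q3) as the medians of the lower and upper halves.

    Assumption: with an odd count, the middle value is excluded
    from both halves. The video only shows the even case.
    """
    n = len(sorted_values)
    half = n // 2
    lower = sorted_values[:half]       # first half of the data
    upper = sorted_values[n - half:]   # second half of the data
    return median(lower), median(upper)

q1, q3 = quartiles_by_halves(calls)
iqr = q3 - q1  # interquartile range
print(q1, q3, iqr)  # 11.0 14.5 3.5
```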
[165]
Alright, we have all of the information we need, now we can work out the cutoff values so
[173]
we can identify the outliers.
[178]
So to look for an extremely high value it must be larger than Q3, which is 14.5 plus
[187]
1.5 times the interquartile range, 3.5.
[193]
And to find an extremely low value I'd take Q1, 11, and I would subtract 1.5 times the interquartile
[204]
range.
[207]
Let's see what these equal.
[214]
1.5 times 3.5 is 5.25, so we get 19.75 and 5.75.
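The same arithmetic can be checked in a couple of lines of Python. The variable names `upper_cutoff` and `lower_cutoff` are my own labels for the two values; the numbers are the ones from the video.

```python
# Quartiles found in the video
q1, q3 = 11, 14.5
iqr = q3 - q1                  # 3.5

upper_cutoff = q3 + 1.5 * iqr  # 14.5 + 5.25
lower_cutoff = q1 - 1.5 * iqr  # 11 - 5.25
print(upper_cutoff, lower_cutoff)  # 19.75 5.75
```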
[235]
Alright, so here is how this works, if I have any data points that are larger than 19.75,
[248]
they are outliers.
[249]
If I have any data points that are smaller than 5.75, those are outliers.
[255]
Well, looking at all of our data, we can see that the 22 is definitely larger than 19.75,
[261]
so it's definitely an outlier.
[264]
Unfortunately I have nothing less than 5.75, so I don't have any lower outliers.
[269]
So this entire set of data only has one outlier and it's just the 22, so it's definitely an
[274]
extreme value.
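As a quick check of that last step, here is a short Python sketch that scans the data against the two cutoffs, 5.75 and 19.75. The variable names are illustrative, not from the video.

```python
# Sorted phone call counts and the cutoffs from the video
calls = [10, 11, 11, 11, 12, 12, 13, 14, 14, 15, 17, 22]
lower_cutoff, upper_cutoff = 5.75, 19.75

# Keep only values strictly outside the cutoffs
outliers = [x for x in calls if x < lower_cutoff or x > upper_cutoff]
print(outliers)  # [22]
```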
[277]
So remember that you have to find a few different bits of information first, but this is how
[281]
you go about finding your outliers.