Examples analyzing clusters, gaps, peaks and outliers for distributions | 6th grade | Khan Academy - YouTube

Channel: Khan Academy

[0]
- [Voiceover] In this video, I wanna do some examples
[1]
looking at distributions, in particular, different features
[4]
in distributions like clusters, gaps, and peaks.
[8]
So over here, I wanna do some examples.
[10]
Which of the following are accurate descriptions
[12]
of the distribution below?
[14]
Select all that apply.
[16]
So the first statement is the distribution has an outlier.
[20]
So an outlier is a data point that's way off
[23]
of where the other data points are,
[24]
it's way larger or way smaller
[27]
than where all of the other data points
[28]
seem to be clustered and if we look over here,
[31]
we have a lot of data points between zero and six.
[34]
And let's just think about what they're measuring:
[35]
this is shelf time for each apple
[38]
at Gorg's Grocier.
[41]
So, for example, we see there's one, two, three, four,
[45]
five, six, seven apples that have a shelf life
[49]
of zero days, so (laughs), they're about to go bad.
[52]
You see you have one, two, three, four, five, six, seven,
[56]
eight apples that are gonna be good for another day.
[59]
You have two apples that are gonna be good
[60]
for another six days, and you have one apple
[63]
that's gonna be good for 10 days, and this is unusual.
[66]
This is an outlier here, it has a way larger shelf life
[70]
than all of the other data, so I would say
[72]
this definitely does have an outlier.
[74]
We just have this one data point
[75]
sitting all the way to the right, way larger,
[78]
way more shelf life than everything else, so it definitely
[80]
has an outlier, and this one would be the outlier.
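One way to make "sitting all the way to the right" concrete is to look at how big the gap is between the end value and its nearest neighbor. The sketch below is only an illustration, not a rule from the video: the function name and the `min_gap=3` threshold are assumptions, and the data holds only the counts read aloud above (the other values on the plot are not stated).

```python
# Shelf life in days -> number of apples, using only the counts
# read aloud in the video. Other values on the plot are omitted.
stated_counts = {0: 7, 1: 8, 6: 2, 10: 1}

def end_points_far_from_rest(counts, min_gap=3):
    """Return the lowest and/or highest value if a gap of at least
    min_gap (an assumed threshold) separates it from the next value
    that has data."""
    values = sorted(v for v, n in counts.items() if n > 0)
    flagged = []
    if len(values) < 2:
        return flagged
    if values[1] - values[0] >= min_gap:
        flagged.append(values[0])
    if values[-1] - values[-2] >= min_gap:
        flagged.append(values[-1])
    return flagged

print(end_points_far_from_rest(stated_counts))  # [10]
```

The 10-day apple is four days past the next value with data, so it is flagged; the zero-day apples sit right next to the one-day apples, so they are not.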
[83]
The distribution has a cluster from four to six days.
[86]
And we indeed do see a cluster from four to six days.
[89]
A cluster, you can imagine, it's a grouping of data
[93]
that's sitting there, or you have a grouping of apples
[95]
that have a shelf life between four and six days,
[97]
and you definitely do see that cluster there.
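A cluster can be sketched as a run of neighboring values that all have data. The example below is a rough illustration under loud assumptions: the counts for 4 and 5 days are placeholders (the video does not state them), it assumes no apples at 2 or 3 days, and the function name and `max_step` parameter are hypothetical.

```python
def clusters(counts, max_step=1):
    """Group values that have data into runs where neighbors are at
    most max_step apart. Returns (low, high) for each run."""
    values = sorted(v for v, n in counts.items() if n > 0)
    groups = []
    for v in values:
        if groups and v - groups[-1][-1] <= max_step:
            groups[-1].append(v)   # close enough: extend current run
        else:
            groups.append([v])     # too far: start a new run
    return [(g[0], g[-1]) for g in groups]

# 0, 1, 6 and 10 use counts stated in the video.
# 4 and 5 are placeholder counts for illustration only.
apple_counts = {0: 7, 1: 8, 4: 3, 5: 4, 6: 2, 10: 1}

print(clusters(apple_counts))  # [(0, 1), (4, 6), (10, 10)]
```

Under these assumptions the run from 4 to 6 comes out as one group, and the lone 10-day apple shows up as a group of its own.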
[100]
And since I already selected two things,
[102]
I'm definitely not gonna select none of the above.
[104]
Let me check my answer.
[105]
Let me do a few more of these.
[110]
Which of the following are accurate descriptions
[112]
of the distribution below?
[114]
And once again we're going to select all that apply.
[117]
So the distribution has an outlier.
[120]
So let's see this distribution.
[122]
I do have a data point here that is at the high end
[124]
and I have another data point here that's at the low end,
[126]
but I don't have any data points that are sitting
[128]
far above or far below the bulk of the data.
[132]
If I had a data point that was out here, then yeah,
[134]
I would say that was an outlier to the right,
[136]
or a positive outlier, if I had a data point way to the left
[139]
off the screen over here, maybe that would be an outlier,
[141]
but I don't really see any obvious outliers.
[143]
All of the data, it's pretty clustered together.
[147]
So I would not say that the distribution has an outlier.
[150]
The distribution has a peak at 22 degrees.
[154]
Yeah, it does indeed look like we have,
[156]
and let's just look at what we're actually measuring:
[158]
high temperature each day in Edgeton, Iowa in July.
[162]
So it does indeed look like we have the most number
[164]
of days that had a high temperature at 22,
[169]
most number of days in July had a high temperature
[171]
at 22 degrees Celsius, so that is a peak.
[175]
You can see it, if you imagine this as kind of a mountain
[177]
this is a peak right here, this is a high point.
[179]
You have, at least locally, the most number of days
[183]
at 22 degrees Celsius.
[186]
So I would say it definitely has a peak there.
[188]
Since I selected something, I'm not gonna select
[190]
none of the above.
[192]
Let's do a couple more of these.
[194]
Which of the following are accurate descriptions
[196]
of the distribution below?
[197]
So the first one, the distribution has an outlier.
[200]
So...
[201]
number of guests by day at Seth's Sandwich Shop.
[206]
So, let's see, the lowest...
[209]
They have no days...
[211]
No days where he had between zero and 19 guests,
[216]
no days where he had between 20 and 39 guests,
[218]
looks like there's about nine days
[220]
where he had between 40 and 59 guests,
[222]
looks like 20 days where he had between 60 and 79 guests,
[225]
all the way where it looks like maybe eight days
[228]
that he had between 180 and 199 guests.
[231]
But the question of outliers, there doesn't seem to be
[234]
any day where he had an unusual number of guests.
[237]
There's not a day that's way out here,
[239]
where he had, like, 500 guests.
[242]
So I would say this distribution does not have an outlier.
[245]
The distribution has a cluster from zero to 39 guests.
[249]
So zero to 39 guests is right over here, zero to 39 guests.
[253]
And there are no days where he had between zero and 39 guests,
[256]
neither zero to 19 nor 20 to 39.
[259]
So there's definitely not a cluster there.
[261]
I would say that the cluster would be between days
[263]
that had between 40 and 199 guests.
[267]
Definitely not zero and 39, there were no days
[270]
that were between zero and 39 guests.
[272]
So I would say none of the above very confidently.
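The check for a cluster from zero to 39 guests comes down to adding up the days in the bins that cover that range. A minimal sketch, assuming the approximate counts read off the histogram above; the bins from 80 to 179 guests are not read out in the video, so they are left out, and the function name is hypothetical.

```python
# (low, high) guest range -> number of days, as estimated aloud
# in the video. Bins from 80 to 179 guests are not stated.
stated_bins = {
    (0, 19): 0,
    (20, 39): 0,
    (40, 59): 9,
    (60, 79): 20,
    (180, 199): 8,
}

def days_in_range(bins, low, high):
    """Total days across bins lying entirely inside [low, high]."""
    return sum(n for (lo, hi), n in bins.items() if lo >= low and hi <= high)

print(days_in_range(stated_bins, 0, 39))   # 0
print(days_in_range(stated_bins, 40, 79))  # 29
```

With zero days in that range there is nothing to group, which matches the conclusion that there is no cluster from zero to 39 guests.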
[275]
Let's do one more of these.
[277]
Which of the following are accurate descriptions
[279]
of the distribution below?
[281]
(laughs) Alright.
[282]
The distribution has a peak from 12 to 13 points.
[285]
Let me see what this is measuring, what this data is about.
[290]
Test scores by student in Mrs. Frine's class.
[294]
So you had one student who got between a zero and a one
[298]
on a 20-point scale, so got between,
[301]
I guess out of 20 questions, got between zero and one point.
[304]
And then you see that no students got
[307]
between two and three, or four and five, or six and seven.
[309]
Then we have another student who got between eight and nine,
[312]
looks like three students got between 10 and 11,
[314]
and then we keep increasing, this looks like about
[316]
12 students got either a 16 or a 17,
[320]
or something in between maybe,
[321]
if you could get decimal points on that test.
[325]
And then it looks like 10 students got from 18 to 19.
[329]
Alright, so this says the distribution has a peak
[331]
from 12 to 13 points, 12 to 13 points,
[335]
there were five students, but this isn't a peak.
[337]
If you just go to 14 to 15 points, you have more students.
[340]
So this is definitely not a peak.
[341]
If you were looking at this as a mountain of some kind,
[343]
you definitely wouldn't describe this point as a peak.
[346]
You would say this distribution has a peak,
[348]
it has the most number of students
[349]
who got between 16 and 17 points,
[351]
so that's the peak right there, not 12 to 13 points.
[354]
So I would not select that first choice.
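The "mountain" test can be sketched by comparing a bin with its neighbors. The counts below follow the numbers read off the histogram, with one loud assumption: the 14 to 15 bin is only described as having "more students" than five, so the 8 used here is a placeholder. The function names are hypothetical.

```python
# (low, high) score range -> number of students.
# The count for (14, 15) is a placeholder; the video only says it is
# more than the five students at 12 to 13.
score_bins = [
    ((0, 1), 1), ((2, 3), 0), ((4, 5), 0), ((6, 7), 0), ((8, 9), 1),
    ((10, 11), 3), ((12, 13), 5), ((14, 15), 8), ((16, 17), 12),
    ((18, 19), 10),
]

def tallest_bin(bins):
    """Return the score range with the most students."""
    return max(bins, key=lambda item: item[1])[0]

def is_peak(bins, target):
    """True if the target bin is nonempty and at least as tall as
    the bins on both sides (missing neighbors count as zero)."""
    ranges = [r for r, _ in bins]
    counts = [n for _, n in bins]
    i = ranges.index(target)
    left = counts[i - 1] if i > 0 else 0
    right = counts[i + 1] if i < len(counts) - 1 else 0
    return counts[i] > 0 and counts[i] >= left and counts[i] >= right

print(is_peak(score_bins, (12, 13)))  # False
print(is_peak(score_bins, (16, 17)))  # True
print(tallest_bin(score_bins))        # (16, 17)
```

The 12 to 13 bin fails because its right-hand neighbor is taller, which holds for any placeholder value above five.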
[357]
The distribution has an outlier.
[359]
Well, yeah, look at this: you have this outlier.
[361]
Most of the students scored between eight and 19 points,
[365]
and then you have this one student
[367]
who got between zero and one, it's really an outlier.
[369]
You even see this when you look at it visually,
[371]
it's not even connected to the rest of the distribution.
[373]
It's way to the left.
[375]
If something is way to the left or way to the right,
[377]
that's an outlier if it's unusually low or unusually high.
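"Not even connected to the rest of the distribution" can be sketched as an end bin with a run of empty bins next to it. This is only an illustration: the `min_empty=2` threshold and the function name are assumptions, and the count for the 14 to 15 bin is a placeholder since the video does not state it.

```python
# Students per two-point bin: 0-1, 2-3, ..., 18-19.
# Index 7 (the 14-15 bin) is a placeholder value.
score_counts = [1, 0, 0, 0, 1, 3, 5, 8, 12, 10]

def detached_end_bins(counts, min_empty=2):
    """Return indexes of the first and/or last nonempty bin if at
    least min_empty empty bins (an assumed threshold) separate it
    from the next nonempty bin."""
    filled = [i for i, n in enumerate(counts) if n > 0]
    flagged = []
    if len(filled) < 2:
        return flagged
    if filled[1] - filled[0] - 1 >= min_empty:
        flagged.append(filled[0])
    if filled[-1] - filled[-2] - 1 >= min_empty:
        flagged.append(filled[-1])
    return flagged

for i in detached_end_bins(score_counts):
    print(f"{2 * i} to {2 * i + 1} points")  # 0 to 1 points
```

Three empty bins sit between the zero-to-one bin and the next student at eight to nine, so only that bin is flagged.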
[381]
So I would say this distribution definitely does
[383]
have an outlier, and I'm not gonna pick none of the above
[386]
since I found a choice.
[389]
And I think we're all done.