Statistics 101: Descriptive Statistics, Mean, Median, and Mode - YouTube
Channel: Brandon Foltz
[0]
hello my name is Brandon and welcome to
[2]
the next video in my series on basic
[3]
statistics if you are new to the channel
[5]
it is great to have you welcome if you're a
[7]
returning viewer it is great to have you
[9]
back if you like the video please
[11]
subscribe give it a thumbs up and share
[13]
it with classmates colleagues or friends
[15]
or anyone else you think might benefit
[18]
from watching so now that we are
[19]
introduced let's go ahead and get
[21]
started
[23]
so this video is about mean median and
[26]
mode now you may have heard of these
[28]
before and on their own mean median and
[31]
mode are not difficult topics however
[34]
what I want to do in this video is three
[36]
fold one I want to visualize them for
[39]
you wherever possible two I want you to
[42]
understand the relationship between the
[44]
three and three I want you to be able to
[46]
develop judgment as to which is
[48]
appropriate given the data you have so
[51]
let's go ahead and get to work now with
[56]
mean median and mode what we are
[58]
beginning to do is measure the center of
[60]
our data now the center of a data set is
[63]
absolutely foundational to everything
[66]
else that you're going to do in
[67]
statistics we use it in hypothesis
[69]
testing we use it in regression and many
[73]
many other things besides that let's go
[76]
ahead and get these fundamental building
[77]
blocks out of the way first a mean
[80]
technically it's called the arithmetic
[82]
mean because there are other types of
[84]
means we can calculate but this is the
[86]
simple one that we are used to and it's
[88]
just the average of all observations in
[90]
the data you've probably calculated the
[93]
average of numbers before it's often
[95]
taught in grade school so it's not
[97]
anything that's all that unfamiliar to
[99]
you probably the next is the median and
[102]
for students just starting in stats this
[105]
can be a new concept but basically the
[107]
median is the middle observation of a
[110]
data set so what we do is we sort them
[113]
from smallest to largest and if there
[115]
are an odd number of observations it's
[118]
the one literally in the middle after
[120]
sorting if the number of observations is
[122]
an even number the median is the mean or
[126]
the average of the two numbers in the
[128]
middle and we'll see that in a couple
[130]
minutes
[131]
the mode is simply the observation that
[133]
occurs most often in the data or the
[136]
most frequently occurring observation
[138]
now a dataset can have one mode it can
[141]
have multiple modes or could have no
[143]
mode at all it simply depends on the
[146]
data you have so first a mean so here
[151]
are some salary data I created so we
[153]
have 12 observations and 12 salaries so
[157]
to calculate the mean it's very simple
[159]
we add or sum up all the observations
[162]
and then divide by the number of
[163]
observations in the data so in this case
[165]
we would add up all the salaries and
[166]
then we would divide by 12 and the
[169]
notation we denote that by what's called
[171]
x-bar so it's an X with a bar over the
[174]
top so step one we sum our observations
[181]
when we sum all of our salary data we
[183]
have 1 million two hundred ninety one
[185]
thousand four hundred dollars step two
[188]
we count our observations now notice I
[191]
have the word length in parentheses
[193]
there the reason I have length is
[195]
because length is often how we describe
[198]
a data set when we're doing
[199]
programming applications so in
[202]
programming when we have a data set like
[205]
this it's basically what's called an
[206]
array so the salary data here is an
[210]
array with a length of 12 so as students
[213]
get more into coding environments I like
[215]
to describe things in ways that you're
[217]
going to see in code but if you're not
[219]
into that or not going into that it's
[221]
simply the count of the number of
[223]
observations so in step three we just
[226]
divide the sum by the count or the
[228]
length of our data set so we have 1
[229]
million two hundred ninety one thousand
[231]
four hundred dollars divided by our
[233]
twelve observations and we end up with a
[235]
sample mean of one hundred and seven
[237]
thousand six hundred sixteen dollars and 67 cents
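The three steps just described (sum, count, divide) can be sketched in a few lines of Python. This is only an illustration: the video reads out some of the twelve salaries (29,500; 54,000 twice; 65,600; 73,600; 78,800; 500,000), and the remaining five values below are hypothetical fillers chosen so that the total matches the stated $1,291,400.

```python
# Salaries in dollars. Seven values come from the video; the other five
# (70000, 82500, 88400, 95000, 100000) are hypothetical placeholders picked
# so that the sum equals the $1,291,400 quoted in the video.
salaries = [65600, 29500, 54000, 54000, 70000, 73600,
            78800, 82500, 88400, 95000, 100000, 500000]

total = sum(salaries)   # step 1: sum the observations
n = len(salaries)       # step 2: count them (the "length" of the array)
x_bar = total / n       # step 3: divide the sum by the count

print(f"sum = {total:,}  n = {n}  mean = {x_bar:,.2f}")
```

Here `len` plays the role of the "length" mentioned in step two, and `x_bar` is just a variable name standing in for the x-bar notation.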
[242]
so that is the mean or the average
[244]
salary of our twelve observations so
[250]
next we have the median so here is our
[252]
data same numbers but I put it in a
[254]
horizontal format and you'll see why
[256]
here in a second so observation one is
[258]
sixty five thousand six hundred and so
[260]
on and so forth so to find the median
[262]
first we sort our data
[265]
from smallest to largest so we can see
[268]
here that we have a small salary of
[270]
twenty nine thousand five hundred
[271]
dollars all the way up to the maximum
[274]
salary of five hundred thousand dollars
[277]
so they are all in order from smallest
[279]
to largest next we ask ourselves is our
[284]
data odd or even length or an odd or
[287]
even count of observations if it is even
[290]
as it is in this case we then just
[292]
divide it in half so you can see here I
[294]
have the first six observations shaded
[297]
in a grey and then I have the last six
[299]
observations shaded in sort of a light
[301]
brown and because this data is an even
[303]
count or an even length we
[306]
just divide it in half six on one side
[308]
six on the other so here's where we left
[313]
off now well since it's even we find the
[316]
mean of the two middle values so we can
[318]
see here that observation six and
[320]
observation seven those are the two
[322]
middle values on each side of that
[324]
dividing line so all we do is find the
[327]
average or the mean of those two values
[329]
so we have seventy three thousand six
[331]
hundred plus seventy eight thousand
[333]
eight hundred and then we just divide
[335]
that by two so our median in this case
[338]
is seventy-six thousand two hundred
[339]
dollars so what if our data set has an
[344]
odd number of observations or is an odd
[346]
length well in this case we just simply
[349]
divide our length or our count of
[351]
observations in half and then round up
[353]
to the next number so in this case we
[354]
have eleven observations we divide that
[356]
in half so we get 5.5 and then we just go
[359]
ahead and round up to six and it's the
[361]
sixth value that is our median so in
[363]
this case it's seventy three thousand
[365]
six hundred if you notice that on either
[368]
side of the sixth value we have five
[370]
below it and five above it and the sixth
[373]
is situated right in the middle
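The even and odd cases above can be sketched as one small Python function. The function name `median` and the salary list are assumptions for illustration; five of the twelve salaries are hypothetical fillers, since the video only reads out some of the values.

```python
def median(values):
    """Middle observation after sorting; mean of the two middle ones if even."""
    ordered = sorted(values)          # sort smallest to largest
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd length: the video rounds n/2 up to get the 1-based position
        # (11 / 2 = 5.5 -> 6th value); n // 2 is that same spot 0-based.
        return ordered[mid]
    # Even length: average the two values on either side of the dividing line.
    return (ordered[mid - 1] + ordered[mid]) / 2


# Seven values are from the video; 70000, 82500, 88400, 95000 and 100000
# are hypothetical placeholders.
salaries = [65600, 29500, 54000, 54000, 70000, 73600,
            78800, 82500, 88400, 95000, 100000, 500000]

even_case = median(salaries)               # 12 observations
odd_case = median(sorted(salaries)[:-1])   # 11 observations (largest dropped)
print(even_case, odd_case)
```

Python's standard library function `statistics.median` applies the same rule, so in practice there is no need to write this by hand.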
[376]
now mode is very straightforward it is
[379]
the observation that occurs the most so
[381]
in this case we have two salaries of
[384]
$54,000 and quite simply that is the
[387]
mode now again datasets can have more
[390]
than one mode so if we had two salaries
[393]
that were seventy eight thousand eight
[394]
hundred dollars we would have two modes
[396]
we'd
[397]
have a $54,000 mode and a $78,800 mode
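A rough Python sketch of the three situations mentioned for the mode (one mode, several modes, no mode at all). The helper name `modes` is an assumption, and it follows the convention used in this video that data with all unique values has no mode. Five of the salaries are hypothetical fillers, as the video only reads out some of them.

```python
from collections import Counter


def modes(values):
    """Return every value tied for the highest count, or [] if none repeats."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:
        # Every value is unique, so by the video's convention there is no mode.
        return []
    return sorted(v for v, c in counts.items() if c == highest)


# Seven values are from the video; 70000, 82500, 88400, 95000 and 100000
# are hypothetical placeholders.
salaries = [65600, 29500, 54000, 54000, 70000, 73600,
            78800, 82500, 88400, 95000, 100000, 500000]

print(modes(salaries))             # one mode
print(modes(salaries + [78800]))   # a second 78,800 salary gives two modes
print(modes([1, 2, 3]))            # all unique: no mode
```

Note that the standard library's `statistics.multimode` uses a different convention: when every value is unique it returns all of them rather than an empty list.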
[401]
and some datasets don't have a mode at
[403]
all if all the values in our dataset are
[405]
unique then none of them has more than
[408]
one observation and therefore there is
[410]
no mode and this is actually a warning
[415]
the mean can be influenced by extreme
[418]
observations either low and/or high that
[421]
are different from the rest of the
[423]
observations so here's our data set from
[425]
before notice we have someone in our
[428]
data set that has a salary of $500,000
[432]
now look at the other 11 observations
[434]
they all tend to be around
[437]
$75,000 give or take so our sample mean
[441]
was one hundred and seven thousand six
[443]
hundred and sixteen dollars and 67 cents
[446]
however our median was seventy-six
[450]
thousand two hundred dollars that is a
[453]
massive difference a huge difference and
[456]
the question is which measure is more
[459]
accurately representing the center of
[461]
our data set so an analyst could
[465]
inaccurately represent the center of the
[467]
salary data this mean over here is much
[470]
higher than the median due to the
[472]
presence of someone making five hundred
[475]
thousand dollars when everyone else is
[477]
around seventy-five thousand dollars now
[480]
also a good analyst would double-check
[484]
that value they would double check the
[486]
data to make sure that extreme
[488]
observation of five hundred thousand
[490]
dollars is not a data entry or recording
[493]
error in this case it could be really
[495]
really easy to type in five hundred
[498]
thousand dollars when it's supposed to
[500]
be fifty thousand dollars so if you do
[503]
have an extreme observation in your
[505]
data
[505]
don't just say to yourself oh that's
[507]
just how it is no go back look at your
[509]
data look at the records you have look
[511]
at everything you have at your disposal
[512]
to make sure that data is valid in the
[515]
first place before beginning with your
[517]
analysis so our final concept is called
[523]
the trimmed mean so here's our data set
[525]
again then what we do is we sort them
[528]
smallest to largest like we did for the
[529]
median
[530]
and then we remove the same number of
[533]
observations from each end of our data
[536]
set now sometimes this is expressed as a
[538]
percentage like a five percent trimmed
[540]
mean or a 10 percent trimmed mean so in
[543]
the end it doesn't really matter so long
[546]
as you are removing the same number of
[548]
observations from each end of the data
[550]
set so in this case we will remove the
[553]
smallest of $29,500 and we will remove
[557]
the largest of $500,000 so from there we
[563]
just calculate the mean again so in this
[566]
case we have a sample mean or original
[568]
sample mean of one hundred and seven
[570]
thousand six hundred sixteen dollars and 67 cents
[573]
our median was 76 thousand two hundred
[575]
dollars and in this case our trimmed
[578]
mean is $76,190 so the median and the trimmed
[585]
mean are almost identical so by
[589]
removing the same number of extreme
[591]
values on both ends in this case we get
[594]
a value where the trimmed mean and the
[596]
median are almost identical so what we
[599]
can do is conclude that that's probably
[600]
the best representation of the center of
[603]
our data and our original sample mean is
[606]
heavily influenced by that person who
[609]
had a five hundred thousand dollar
[610]
salary among eleven other individuals
[612]
who had salaries around seventy-five
[615]
thousand dollars now here's a note
[617]
trimmed means are better for a single
[620]
variable what we call univariate so you
[622]
know meaning one variable once
[626]
you get other variables involved
[627]
relationships between variables are
[630]
created and things become much more
[632]
complicated now however there are
[634]
methods for trimming or removing
[636]
multivariate extremes but that is beyond
[639]
the scope of this video and I do
[640]
actually talk about that in more
[642]
advanced videos in my playlists so when
[644]
you are doing a univariate analysis what
[646]
I would recommend is reporting all three
[648]
of these values report your original
[651]
sample mean the median and then a
[653]
trimmed mean either 5 percent or 10
[655]
percent or something like that so as
[658]
long as you include the original sample
[659]
mean with the trimmed mean that's
[661]
perfectly fine however never leave out
[664]
the original
[664]
mean you got to have both so you can
[666]
compare the two and of course have the
[668]
median in there okay so a quick
[672]
conclusion and then we are done so the
[674]
mean median and mode can provide
[675]
information about the center of your
[677]
data the mean is the most often used
[680]
however the median can sometimes be a
[682]
better measure the mean can easily be
[686]
influenced by extreme values so be
[688]
careful extreme values may be a
[691]
recording or data entry error so double
[694]
check the original data look for a large
[697]
difference between the mean and the
[699]
median it could be a warning sign that
[701]
you have an extreme observation that is
[704]
pulling the mean in one direction or
[706]
another and finally a trimmed mean can
[710]
also be calculated and reported which
[712]
chops off the same number or percentage
[715]
of observations on both ends of the data
[718]
and again always report that with the
[721]
original sample mean okay so that wraps
[726]
up this video on mean median and mode
[728]
again on their own very simple concepts
[731]
however we want to visualize them we
[733]
want to understand the relationship
[734]
between the three of them and then
[736]
choose which one is the best depending
[738]
on the data we have and again that's
[740]
very important so thank you very much
[742]
for watching I appreciate your time and
[744]
I will see you again in the next video
[746]
take care
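The recommendation in this video to report the original mean, the median, and a trimmed mean side by side can be sketched in Python as follows. The helper name `trimmed_mean` and its parameter `k` (the number of observations dropped from each end) are assumptions for illustration, and five of the twelve salaries are hypothetical fillers chosen to match the total quoted in the video.

```python
import statistics


def trimmed_mean(values, k=1):
    """Mean after dropping the k smallest and k largest observations."""
    ordered = sorted(values)
    if k < 0 or 2 * k >= len(ordered):
        raise ValueError("k must leave at least one observation")
    kept = ordered[k:len(ordered) - k]   # same number removed from each end
    return sum(kept) / len(kept)


# Seven values are from the video; 70000, 82500, 88400, 95000 and 100000
# are hypothetical placeholders.
salaries = [65600, 29500, 54000, 54000, 70000, 73600,
            78800, 82500, 88400, 95000, 100000, 500000]

mean = sum(salaries) / len(salaries)
med = statistics.median(salaries)
trimmed = trimmed_mean(salaries, k=1)    # drop 29,500 and 500,000

print(f"mean         = {mean:,.2f}")
print(f"median       = {med:,.2f}")
print(f"trimmed mean = {trimmed:,.2f}")
# A large gap between mean and median is the warning sign for an extreme value.
print(f"mean - median = {mean - med:,.2f}")
```

A percentage trimmed mean works the same way; the percentage only determines how many observations `k` ends up being for a given data set.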