
Statistics 101: Descriptive Statistics, Mean, Median, and Mode - YouTube

Channel: Brandon Foltz

[0]

Hello, my name is Brandon, and welcome to the next video in my series on basic statistics. If you are new to the channel, it is great to have you; welcome. If you're a returning viewer, it is great to have you back. If you like the video, please subscribe, give it a thumbs up, and share it with classmates, colleagues, friends, or anyone else you think might benefit from watching. So now that we are introduced, let's go ahead and get started.

[23]

So this video is about mean, median, and mode. You may have heard of these before, and on their own, mean, median, and mode are not difficult topics. However, what I want to do in this video is threefold. One, I want to visualize them for you wherever possible. Two, I want you to understand the relationship between the three. And three, I want you to be able to develop judgment as to which is appropriate given the data you have. So let's go ahead and get to work.

[56]

With mean, median, and mode, what we are beginning to do is measure the center of our data. The center of a data set is absolutely foundational to everything else that you're going to do in statistics. We use it in hypothesis testing, we use it in regression, and in many, many other things besides that. So let's go ahead and get these fundamental building blocks out of the way.

[77]

First, a mean. Technically, it's called the arithmetic mean, because there are other types of means we can calculate, but this is the simple one that we are used to: it's just the average of all observations in the data. You've probably calculated the average of numbers before; it's often taught in grade school, so it's probably not all that unfamiliar to you.

[99]

Next is the median. For students just starting in stats, this can be a new concept, but basically the median is the middle observation of a data set. We sort the observations from smallest to largest, and if there is an odd number of observations, the median is the one literally in the middle after sorting. If the number of observations is even, the median is the mean, or average, of the two numbers in the middle. We'll see that in a couple of minutes.
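
The two definitions above can be written compactly as formulas. This is a sketch in conventional notation rather than the video's own slides: x_1 through x_n are the observations, x_(1) through x_(n) are the same values sorted from smallest to largest, and x-tilde is simply our label for the median (the video does not name it).

```latex
\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i
\qquad
\tilde{x} =
\begin{cases}
x_{\left(\frac{n+1}{2}\right)} & \text{if } n \text{ is odd} \\[6pt]
\dfrac{x_{\left(\frac{n}{2}\right)} + x_{\left(\frac{n}{2}+1\right)}}{2} & \text{if } n \text{ is even}
\end{cases}
```

For odd n, the position (n + 1)/2 is the same one you get by dividing n by 2 and rounding up; for example, 11 observations give position 6.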

[131]

The mode is simply the observation that occurs most often in the data, or the most frequently occurring observation. A data set can have one mode, it can have multiple modes, or it could have no mode at all; it simply depends on the data you have.

[146]

So, first, the mean. Here are some salary data I created: we have 12 observations, 12 salaries. Calculating the mean is very simple. We add, or sum up, all the observations and then divide by the number of observations in the data. In this case, we would add up all the salaries and then divide by 12. In notation, we denote this by what's called x-bar: an X with a bar over the top (x̄).

[174]

Step one, we sum our observations. When we sum all of our salary data, we have $1,291,400. Step two, we count our observations. Notice I have the word "length" in parentheses there. The reason is that length is often how we describe a data set when we're doing programming applications: in programming, a data set like this is basically what's called an array, so the salary data here is an array with a length of 12. As students get more into coding environments, I like to describe things in ways that you're going to see in code, but if you're not into that, or not going into that, it's simply the count of the number of observations. In step three, we just divide the sum by the count, or the length, of our data set. So we have $1,291,400 divided by our 12 observations, and we end up with a sample mean of $107,616.67. That is the mean, or average, salary of our 12 observations.
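
The three steps map directly onto code. Here is a minimal Python sketch, assuming the observations are held in a plain list (the "array" mentioned above, whose length `len()` returns); the function name `sample_mean` is our own. Because the video does not list all twelve salaries, the usage line reuses only its stated sum and count.

```python
def sample_mean(values):
    """Arithmetic mean (x-bar): sum the observations, then divide by their count."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    total = sum(values)   # step 1: sum the observations
    n = len(values)       # step 2: count them (the "length" of the array)
    return total / n      # step 3: divide the sum by the count


# The video's stated totals: $1,291,400 summed over 12 salaries.
print(round(1_291_400 / 12, 2))  # 107616.67
```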

[250]

Next, we have the median. Here is our data: the same numbers, but I put them in a horizontal format, and you'll see why in a second. Observation one is $65,600, and so on and so forth. To find the median, first we sort our data from smallest to largest. We can see here that we have a smallest salary of $29,500, all the way up to the maximum salary of $500,000, so they are all in order from smallest to largest.

[279]

Next, we ask ourselves: is our data of odd or even length, that is, an odd or even count of observations? If it is even, as it is in this case, we just divide it in half. You can see here I have the first six observations shaded in grey and the last six observations shaded in sort of a light brown. Because this data has an even count, or even length, we just divide it in half: six on one side, six on the other.

[308]

Here's where we left off. Since it's even, we find the mean of the two middle values. We can see here that observation six and observation seven are the two middle values, one on each side of that dividing line, so all we do is find the average, or mean, of those two values: $73,600 plus $78,800, divided by two. So our median in this case is $76,200.

[339]

What if our data set has an odd number of observations, an odd length? In this case, we simply divide our length, or count of observations, in half and then round up to the next whole number. Here we have eleven observations; dividing that in half gives 5.5, which we round up to six, so the sixth value is our median. In this case, it's $73,600. Notice that on either side of the sixth value, we have five below it and five above it, so the sixth is situated right in the middle.
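
The sort-then-pick-the-middle procedure above, covering both the even and the odd case, can be sketched in Python as follows. This is a minimal illustration assuming a non-empty list of numbers; the function name `median` is our own, and Python's standard library also offers `statistics.median`, which gives the same result.

```python
import math


def median(values):
    """Median: the middle observation after sorting from smallest to largest."""
    if not values:
        raise ValueError("the median of an empty data set is undefined")
    ordered = sorted(values)   # sort smallest to largest
    n = len(ordered)
    if n % 2 == 1:
        # Odd length: divide n by 2 and round up to get the 1-based position,
        # e.g. n = 11 -> 5.5 -> 6th value. Subtract 1 for Python's 0-based index.
        position = math.ceil(n / 2)
        return ordered[position - 1]
    # Even length: average the two middle values (positions n/2 and n/2 + 1).
    lower = ordered[n // 2 - 1]
    upper = ordered[n // 2]
    return (lower + upper) / 2


# The video's even-length case: the 6th and 7th sorted salaries.
print((73_600 + 78_800) / 2)  # 76200.0
```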

[376]

Now, the mode is very straightforward: it is the observation that occurs the most. In this case, we have two salaries of $54,000, and quite simply, that is the mode. Again, data sets can have more than one mode. If we had two salaries of $78,800, we would have two modes: a $54,000 mode and a $78,800 mode.
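
A minimal Python sketch of the mode that allows for more than one mode, and that treats a data set whose values are all unique as having no mode. The function name `modes` and the short salary lists are illustrative only, not the video's full data. Python 3.8+ also provides `statistics.multimode`, which differs in one respect: when every value is unique it returns all of them instead of an empty list.

```python
from collections import Counter


def modes(values):
    """Return every most-frequent value, or an empty list if no value repeats."""
    counts = Counter(values)           # value -> number of occurrences
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:
        # Every value occurs exactly once, so there is no mode.
        return []
    # Keep all values tied for the highest count (a data set can be multimodal).
    return sorted(value for value, count in counts.items() if count == highest)


print(modes([54_000, 54_000, 65_600]))                  # [54000]
print(modes([54_000, 54_000, 78_800, 78_800, 65_600]))  # [54000, 78800]
print(modes([29_500, 65_600, 500_000]))                 # []
```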

[401]

Some data sets don't have a mode at all. If all the values in our data set are unique, then no value occurs more than once, and therefore there is no mode.

[410]

This brings us to a warning: the mean can be influenced by extreme observations, either low or high, that are different from the rest of the observations.

[423]

Here's our data set from before. Notice we have someone in our data set with a salary of $500,000. Now look at the other 11 observations: they all tend to be around $75,000, give or take. Our sample mean was $107,616.67; however, our median was $76,200. That is a massive difference, a huge difference, and the question is which measure more accurately represents the center of our data set. An analyst who wants to accurately represent the center of the salary data should notice that this mean is much higher than the median, due to the presence of someone making $500,000 when everyone else is around $75,000.
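
As a rough illustration of comparing the two measures, the sketch below reports how far the mean sits from the median as a fraction of the median. The 20% cutoff in `THRESHOLD` and the function names are hypothetical choices for this example, not a rule from the video; what counts as a "large" gap depends on your data.

```python
import statistics

# Hypothetical cutoff for this illustration: flag a gap above 20% of the median.
THRESHOLD = 0.20


def relative_gap(mean_value, median_value):
    """Distance between mean and median as a fraction of the (nonzero) median."""
    return abs(mean_value - median_value) / abs(median_value)


def mean_median_warning(values, threshold=THRESHOLD):
    """Return True if the mean and median differ by more than `threshold`."""
    gap = relative_gap(statistics.mean(values), statistics.median(values))
    return gap > threshold


# The video's figures: mean $107,616.67 vs. median $76,200.
print(round(relative_gap(107_616.67, 76_200), 2))  # 0.41
```

A flag like this only says the two measures disagree; as the video stresses, the next step is to go back and check the extreme observations themselves.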

[480]

Also, a good analyst would double-check that value, that is, double-check the data, to make sure that the extreme observation of $500,000 is not a data entry or recording error. In this case, it could be really, really easy to type in $500,000 when it's supposed to be $50,000. So if you do have an extreme observation in your data, don't just say to yourself, "Oh, that's just how it is." No: go back and look at your data, look at the records you have, look at everything you have at your disposal to make sure that the data is valid in the first place before beginning your analysis.

[517]

Our final concept is called the trimmed mean. Here's our data set again. What we do is sort the observations from smallest to largest, like we did for the median, and then remove the same number of observations from each end of the data set. Sometimes this is expressed as a percentage, like a 5 percent trimmed mean or a 10 percent trimmed mean, but in the end it doesn't really matter, so long as you are removing the same number of observations from each end of the data set. In this case, we will remove the smallest salary, $29,500, and the largest, $500,000.

[557]

From there, we just calculate the mean again. Our original sample mean was $107,616.67, our median was $76,200, and our trimmed mean is $76,190. The median and the trimmed mean are almost identical. So by removing the same number of extreme values from both ends, in this case we get a trimmed mean that is almost identical to the median. We can conclude that this is probably the best representation of the center of our data, and that our original sample mean is heavily influenced by the one person with a $500,000 salary among eleven other individuals with salaries around $75,000.
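
A minimal Python sketch of a trimmed mean that removes the same number of observations, `k`, from each end after sorting. The function name and the choice to specify the trim as a count rather than a percentage are assumptions for illustration. Since the video does not list every salary, the usage line works from its stated total instead.

```python
def trimmed_mean(values, k):
    """Mean after removing the k smallest and the k largest observations."""
    if k < 0:
        raise ValueError("k must be non-negative")
    if 2 * k >= len(values):
        raise ValueError("trimming would remove every observation")
    ordered = sorted(values)               # sort smallest to largest
    kept = ordered[k:len(ordered) - k]     # drop the same number from each end
    return sum(kept) / len(kept)


# Dropping $29,500 and $500,000 from the 12 salaries totalling $1,291,400
# leaves 10 salaries:
print((1_291_400 - 29_500 - 500_000) / 10)  # 76190.0
```

Slicing with `len(ordered) - k` instead of `-k` keeps the `k = 0` case correct, since `ordered[0:-0]` would be empty.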

[617]

Trimmed means are better suited to a single variable, what we call univariate, meaning one variable. Once you get other variables involved, relationships between variables are created, and things become much more complicated. There are methods for trimming, or removing, multivariate extremes, but that is beyond the scope of this video; I do talk about that in more advanced videos in my playlists.

[642]

So when you are doing a univariate analysis, what I would recommend is reporting all three of these values: your original sample mean, the median, and a trimmed mean, either 5 percent or 10 percent or something like that. As long as you include the original sample mean along with the trimmed mean, that's perfectly fine; however, never leave out the original mean. You've got to have both so you can compare the two, and of course, have the median in there too.
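
To follow the recommendation of reporting the original mean, the median, and a trimmed mean together, here is a minimal Python sketch. It assumes the trim is given as a proportion per end (such as 0.05 or 0.10) and rounds the number of observations to drop down to a whole number, which is one common convention; the function name `center_report` and the sample data are hypothetical.

```python
import math
import statistics


def center_report(values, proportion=0.10):
    """Report the original mean, the median, and a trimmed mean side by side."""
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    ordered = sorted(values)
    n = len(ordered)
    # Number of observations to drop from EACH end, rounded down.
    k = math.floor(proportion * n)
    kept = ordered[k:n - k]
    return {
        "mean": statistics.mean(ordered),
        "median": statistics.median(ordered),
        f"trimmed_mean_{proportion:.0%}": statistics.mean(kept),
    }


# Hypothetical data with one extreme value at the top.
print(center_report([10, 12, 11, 13, 12, 11, 10, 12, 13, 95]))
```

With rounding down, a 5 percent trim of only 12 observations removes nothing (0.05 × 12 = 0.6), so for small data sets it is worth checking how many observations a given percentage actually drops.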

[672]

A quick conclusion, and then we are done. The mean, median, and mode can provide information about the center of your data. The mean is the most often used; however, the median can sometimes be a better measure, because the mean can easily be influenced by extreme values, so be careful. Extreme values may be a recording or data entry error, so double-check the original data. Look for a large difference between the mean and the median; it could be a warning sign that you have an extreme observation pulling the mean in one direction or another. And finally, a trimmed mean, which chops off the same number or percentage of observations on both ends of the data, can also be calculated and reported; again, always report it with the original sample mean.

[721]

Okay, so that wraps up this video on mean, median, and mode. Again, on their own they are very simple concepts; however, we want to visualize them, understand the relationship between the three of them, and then choose which one is best depending on the data we have, and again, that's very important. So thank you very much for watching. I appreciate your time, and I will see you again in the next video. Take care.
