Non-Normal Distribution in Statistics – Skewness and Kurtosis (3-9) - YouTube
Channel: Research By Design
Now that I have explained to you the ubiquity of the normal distribution, its regular appearance in human measurements, you may begin to hope or even expect that all of the distributions we encounter will be normal curves. But if that is your expectation, you will have to get used to disappointment. (Princess Bride reference there.) Because many curves, perhaps most curves, are not normal distributions, we need a way to talk about the shape of distributions when they differ from normality.

The first difference we may find is that the scores in the distribution are more spread out than we expected, or more closely packed together than we expected. The name for the peakedness or flatness of a curve is kurtosis. When the scores are very close together, the curve becomes peaked. We call this a "leptokurtic" curve; think of the scores leaping up - leptokurtic. When the scores are very spread out, the curve becomes flat like a plate; we call this "platykurtic." "Plat" rhymes with flat: platykurtic is a flattened curve in the shape of a plate. A normal curve is mesokurtic; its kurtosis is medium. So kurtosis can be measured as leptokurtic (tall), platykurtic (flat), or mesokurtic (medium). Kurtosis is caused by the variability in the distribution.

Another thing that can happen to a curve is when the scores are pulled out in only one direction.
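The three kurtosis labels above can be checked numerically. Here is a minimal sketch using NumPy; the Laplace and uniform distributions are stand-ins I chose for "peaked" and "flat" curves, and the sample size is arbitrary:

```python
import numpy as np

def excess_kurtosis(x):
    """Fourth standardized moment minus 3 (a normal curve scores about 0)."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()
    return (z ** 4).mean() - 3.0

rng = np.random.default_rng(0)
meso = excess_kurtosis(rng.normal(size=100_000))       # normal curve: near 0
lepto = excess_kurtosis(rng.laplace(size=100_000))     # peaked curve: positive
platy = excess_kurtosis(rng.uniform(-1, 1, 100_000))   # flat, plate-like curve: negative
print(meso, lepto, platy)
```

A positive value flags a leptokurtic (tall) curve, a negative value a platykurtic (flat) one, and a value near zero a mesokurtic, normal-like curve.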
When the scores are dragged down (or rather, out) in only one direction, this creates a skew in our curve, so we need to talk about the skewness of our distribution. Negatively skewed distributions have a higher than expected frequency of high or extreme scores on the right, and the tail is pulled out to the left end of the number line on the x-axis. For example, if we were interested in the running speeds of football players, we might find a lot of very fast players (high scores) but only a few slower runners (low scores). Skewness is always caused by outliers in the direction of the tail.

In a positively skewed distribution, the higher than expected frequencies are on the low end of the curve, and the tail is pulled out on the right, or positive, end of the number line. If we were measuring reaction time, we would expect a large number of very quick responses (low scores) and only a few slower responses, taking more time, further up the positive end of that scale. Skewed distributions are not normal.

How can you remember which direction is positive or negative when we talk about skewness? Stats Cow tells us that the skew is in the tail. Skewness is caused by outliers, extreme scores in the tail of the distribution, and the direction that the tail is pulled out (positive or negative) is the direction of the skew.

Here are two curves. The first one is positively skewed, and the second is negatively skewed. The top curve is positively skewed because the tail is pulled out on the right, the positive direction of the number line. The bottom curve is negatively skewed; its tail is pulled out on the negative, or left, end of the number line. In both of these curves, you can see what happens to the mean and the median in the case of skewness: both are pulled in the direction of the outliers, but the mean is pulled further. That is because the mean is more susceptible to the outliers that are causing the skewness. Mathematically, we can calculate a measure of skewness by comparing the mean and the median, and this will give us a value we can use to quantify the skewness of our curve.

But there are other things that can go wrong with our normal curve!
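That mean-versus-median comparison can be turned into a single number. A minimal sketch using Pearson's second skewness coefficient, one common way to compare the mean and the median; the reaction-time numbers below are made up for illustration:

```python
import numpy as np

def pearson_skew(x):
    """Pearson's second skewness coefficient: 3 * (mean - median) / std."""
    x = np.asarray(x, dtype=float)
    return 3.0 * (x.mean() - np.median(x)) / x.std()

rng = np.random.default_rng(1)
# Reaction times: many quick responses, a few slow ones out in the right tail.
reaction_times = rng.exponential(scale=0.3, size=10_000) + 0.2

skew = pearson_skew(reaction_times)
print(skew)  # positive: the mean sits to the right of the median
```

A positive value means the tail (and the mean) is pulled toward the high end of the number line; a negative value means the tail points left. Flipping the sign of every score flips the sign of the skew.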
Instead of having one peak, sometimes we have two peaks. This occurs when there is more than one most frequently occurring score; we call this type of curve bimodal. A curve can be bimodal when there really are two most frequently occurring scores. For instance, when is the best time to go fishing? At what time of day will you catch the most fish? Probably early in the morning, and then in the evening when the sun is going down. In the middle of the day, when the sun is at its height, you will catch fewer fish. So if we plot the number of fish caught, we will see a peak in the morning at dawn and another peak in the evening at dusk. This would be a true bimodal distribution.

On the other hand, we might have a bimodal distribution when there are actually two distributions overlying each other. When we had both males and females on the football field and we were comparing heights, we saw that there was a distribution for males and another distribution for females. The distributions overlapped - some females were taller than some males - but the average height was greater for males. They really were two distinct distributions that should be separated before being analyzed.

A multimodal distribution has three or more most frequently occurring scores.
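The overlapping-heights case described above is easy to simulate. A minimal sketch; the mean heights and standard deviation here are assumed values for illustration, not measurements:

```python
import numpy as np

rng = np.random.default_rng(2)
# Two overlying distributions: assumed mean heights of 64" and 70", sd 2".
female = rng.normal(64, 2, 5_000)
male = rng.normal(70, 2, 5_000)
combined = np.concatenate([female, male])

# Count scores within half an inch of each point of interest.
near_64 = np.sum(np.abs(combined - 64) < 0.5)
near_67 = np.sum(np.abs(combined - 67) < 0.5)
near_70 = np.sum(np.abs(combined - 70) < 0.5)
print(near_64, near_67, near_70)  # two peaks with a dip between them
```

The combined sample piles up near 64 and near 70 with a dip in between: a bimodal curve that is really two distinct distributions and should be analyzed separately.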
You may wonder why we don't call it a trimodal distribution, or a quadrimodal distribution with four peaks. The answer is that when we start getting three, four, five modes, there is something very wrong in our data set. Three or more modes is multimodal, and it's messed up; we need to figure out what is going on before we try to analyze those data.

Rectangular distributions have the same frequency for all scores. If you roll a single die 100 times, how many times do you expect to get a one? About one-sixth of the time. In fact, you would expect to get each of the scores, one through six, approximately one-sixth of the time. That is a rectangular distribution. Once you add a second die, however, your distribution will begin to look more normal.
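The one-die-versus-two-dice claim can be simulated with the standard library alone; the trial count below is arbitrary:

```python
import random

random.seed(0)

def roll_totals(n_dice, trials=60_000):
    """Frequency of each total when rolling n_dice fair dice."""
    counts = {}
    for _ in range(trials):
        total = sum(random.randint(1, 6) for _ in range(n_dice))
        counts[total] = counts.get(total, 0) + 1
    return counts

one_die = roll_totals(1)   # rectangular: each face turns up about 1/6 of the time
two_dice = roll_totals(2)  # peaked at 7, tapering off toward 2 and 12
print(one_die)
print(two_dice)
```

With one die, every outcome has roughly the same frequency and the histogram has no tails. With two dice, the middle total (7) occurs far more often than the extremes (2 or 12), so the shape starts to resemble a normal curve.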
Rectangular distributions have exactly the same frequency for all scores, and they do not have tails.

Before we conclude, there is one more thing that I want to tell you about the normal curve: it can be overlaid with a number line, and this is where things get really interesting and quite useful. If we have a normal curve, we can add the value of the mean right in the middle, where it belongs. In this example we're going to imagine that our mean is 50, so we could lay out a number line with four-point delineations. Half of our scores will always be above the mean, or above 50, and the remaining half will always be below 50. That is what a measure of central tendency tells us: it is the point at which half of the scores fall above and half fall below (on a symmetric curve like the normal, the mean and the median coincide at that point).

The next thing that we could do is measure the proportion of the scores that fall within a certain range above or below the mean. The proportion is the total area under the normal curve that corresponds to the relative frequency of those scores. To better understand this, let's return to our picture of the people standing on the football field. Remember that everyone (100%) is standing below the rope that represents our distribution. We want to know the proportion of people who are between five foot six and five foot nine inches tall. We ask everyone in those rows - five foot six, seven, eight, and nine - to stay where they are; everyone else, please leave the field. So how many people are in those four rows? Divide the number of people in the four rows by the total number of people and you have a proportion. This is the proportion of people who are in that range underneath the distribution. It would also be the relative frequency of the number of people in that range, and this is going to become a very useful technique when we talk about z-scores. But for now, just remember what we've learned about the frequency table, and specifically how the relative frequency relates to what we know about the normal curve.
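The count-the-rows procedure above is just a relative-frequency calculation. A minimal sketch with made-up heights; the mean and standard deviation are assumptions for illustration, not the video's data:

```python
import numpy as np

rng = np.random.default_rng(3)
# Hypothetical heights in inches, rounded to whole-inch "rows" on the field.
heights = np.round(rng.normal(68, 3, 10_000)).astype(int)

# Keep only the rows for 5'6" (66") through 5'9" (69").
in_range = (heights >= 66) & (heights <= 69)
proportion = in_range.sum() / heights.size
print(proportion)  # relative frequency = area under the curve for that range
```

Dividing the count in the four rows by the total head-count gives a proportion, which is the same number as the relative frequency in a frequency table, and the same number as the area under the curve between those two heights.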