馃攳
Level I CFA Quant: Sampling and Estimation-Lecture 1 - YouTube
Channel: IFT
[0]
Sampling and Estimation. Here are the sections in this reading. We'll talk about how to use sample data to estimate population parameters, and our main focus will be on the population mean, generally denoted by mu.

Let us understand the basic concept of sampling.
Say you have a large population, such as the returns on all stocks in the United States for last year, and you are interested in the mean return, or the average return. That is an example of a parameter, generally denoted by the symbol mu. A parameter such as the mean return is used to describe a population, the population here being the returns on all stocks in the United States last year. You might not have the time to go through every single stock and come up with the average return, so what you could do is pull a sample from the population. Let's say you pull out a sample of 100 stocks and then you find the average of these 100. That average is generally denoted by X bar, and it is called a statistic: a statistic is used to describe the sample.

Simple Random Sampling. A simple random sample is a subset of a larger population such that each element has an equal probability of being selected for the subset. Let's say that this population has a thousand items; simplistically, say that there are a thousand stocks in our overall population. When we create a sample of a hundred, if every stock in the population has an equal probability of being selected, then we say that we have a simple random sample.

The next concept you need to know
is that of Sampling Error. Clearly this population has a certain mean, mu, and when you draw a sample and compute the sample mean, that sample mean is not necessarily going to be exactly the same as the population mean. The difference between the sample mean and the population mean is called the sampling error. To put this formally: sampling error is the difference between the observed value of the statistic and the quantity it is intended to estimate.
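These ideas are easy to see in a small Python simulation. The population values below are synthetic, invented purely for illustration; they are not from the lecture:

```python
import random
import statistics

random.seed(42)

# Synthetic population: last year's returns (in %) for 1,000 hypothetical stocks.
population = [random.gauss(8.0, 20.0) for _ in range(1000)]
mu = statistics.mean(population)        # the parameter (population mean)

# Simple random sample: every stock has an equal chance of being selected.
sample = random.sample(population, 100)
x_bar = statistics.mean(sample)         # the statistic (sample mean)

# Sampling error: observed statistic minus the quantity it estimates.
sampling_error = x_bar - mu
print(f"mu = {mu:.2f}%, x_bar = {x_bar:.2f}%, error = {sampling_error:.2f}%")
```

Re-running with a different seed gives a different X bar, and hence a different sampling error, which is exactly the point of the next section.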
The sampling distribution of a statistic is the distribution of all the distinct possible values that the statistic can assume when computed from a randomly drawn sample. Let's continue with our example. We have this large population which consists of all stocks in the United States, and you are concerned about the population mean. Say you draw the first sample, with a sample size equal to 100, and you come up with the mean for sample 1; let's say that number is 10%. Then you draw another sample of the same size; obviously the mean here will not be exactly the same as the mean for the first sample. Say the mean for sample 2 turns out to be 11%. Then you draw a third sample, and here the mean might be 9%, and so on. As you keep drawing samples, you will notice that there is a certain distribution of the sample means, and that is called the sampling distribution of the statistic. So now the definition should make more sense: the sampling distribution of a statistic is the distribution of all the distinct possible values that the statistic can assume when computed from a randomly drawn sample.
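This repeated-sampling idea can be simulated directly. A minimal sketch, again with synthetic numbers:

```python
import random
import statistics

random.seed(0)

# Synthetic population of 10,000 returns (%) with mean near 10.
population = [random.gauss(10.0, 5.0) for _ in range(10_000)]

# Draw many samples of the same size n and record each sample mean.
n = 100
sample_means = [statistics.mean(random.sample(population, n))
                for _ in range(2000)]

# The collection of sample means approximates the sampling
# distribution of the statistic "sample mean".
print(f"mean of sample means:  {statistics.mean(sample_means):.2f}")
print(f"stdev of sample means: {statistics.stdev(sample_means):.2f}")
```

Notice that the sample means cluster much more tightly around the population mean than the individual returns do; that observation is formalized later in this lecture.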
Let us now take a look at Stratified Random Sampling. Say you want to become a little more sophisticated in picking your sample from this large population of American stocks. You recognize the fact that there are different exchanges, and you want to make sure that your sample has representation from each exchange. What you can do is divide the population into strata based on one or more classification criteria; in my example, the classification criterion is the exchange, so you have exchange 1, exchange 2, exchange 3, and exchange 4. Now you have the different sub-populations, or strata. Next, you draw a simple random sample from each stratum, in sizes proportional to the relative size of each stratum in the population. In other words, if exchange 2 is two times bigger than exchange 1, then the random sample that you draw from exchange 2 must be two times bigger than the one from exchange 1: if from exchange 2 you take 20 stocks, from exchange 1 you would pull a sample of 10, and so on. In my simple example, exchange 3 is the same size as exchange 2, so it also gets 20, and exchange 4 is the same size as exchange 1, so it gets 10. Then you pool these samples to form a stratified random sample. Here your sample of 60 has representation from all four sub-populations, and this would be called a Stratified Random Sample.
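A sketch of proportional allocation in Python, using the made-up exchange sizes from the example (the stock labels are placeholders):

```python
import random

random.seed(1)

# Hypothetical strata: four exchanges; exchanges 2 and 3 are twice
# the size of exchanges 1 and 4, as in the example.
strata = {
    "exchange_1": [f"E1_{i}" for i in range(100)],
    "exchange_2": [f"E2_{i}" for i in range(200)],
    "exchange_3": [f"E3_{i}" for i in range(200)],
    "exchange_4": [f"E4_{i}" for i in range(100)],
}
total = sum(len(members) for members in strata.values())
overall_n = 60  # desired total sample size, as in the example

# Simple random sample from each stratum, proportional to stratum size,
# pooled into one stratified random sample.
stratified_sample = []
for members in strata.values():
    n_k = round(overall_n * len(members) / total)  # proportional allocation
    stratified_sample.extend(random.sample(members, n_k))

print(len(stratified_sample))  # 10 + 20 + 20 + 10 = 60
```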
Let us look at a simple practice question. Paul wants to categorize publicly listed stocks for his research project. He first divides the stocks into 15 industries. Then, within each industry, he categorizes companies into three groups. Finally, he divides these into value vs. growth stocks. How many cells, or strata, does the sampling plan entail? This is a little more complicated than what I just talked about, and here we need to recognize that there are 15 industries, for each there are three groups, and then we have value vs. growth, so we multiply by 2. The correct answer is simply the product of these three numbers, which is 90.
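The counting logic is just the product of the number of levels of each classification criterion:

```python
# Each combination of (industry, group, style) defines one cell, or stratum.
industries = 15
groups = 3
styles = 2  # value vs. growth

cells = industries * groups * styles
print(cells)  # 90
```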
Time Series and Cross-Sectional Data. A time series is a sample of observations taken at specific and equally spaced points in time, for example, the monthly returns on Microsoft stock from January 1995 to January 2005. That is fairly self-explanatory. Cross-sectional data is a sample of observations taken at a single point in time, for example, the sample of reported earnings per share for all NASDAQ companies for 2005. Notice that here the data is for 2005 across a range of companies, and this would be an example of cross-sectional data. For both time series and cross-sectional data, the random sample must be representative of the population we wish to study.

Consider this practice question. A researcher needs to make use of the 2012 household budget data for Scandinavian countries. Is this cross-sectional data, time series, or panel data? Based on what we've just talked about, this is cross-sectional data.

Distribution of the Sample Mean. Let's say
you draw several samples from the population and for every sample you compute the mean. Clearly, there will be a certain distribution for the sample means, and that is what we are going to talk about. For a population with mean mu and variance sigma squared, the sampling distribution of the sample mean, that is, the distribution of the X bars of all possible samples of size n (each sample must have the same size n; say n is 100 in our example), will be approximately normal with mean equal to mu and variance equal to sigma squared over n.

There are three core statements being made. The first is that the distribution of X bar, the sample mean, is approximately normal. The second is that the mean of this distribution is mu, which is the mean of the population. The third is that the variance of this distribution is sigma squared over n.

Hopefully it is fairly obvious that the mean of this distribution is mu: because you are drawing these samples from the population, the expected value of X bar, the sample mean, should be the population mean. In terms of the variance, think of it this way. If the population has a very high variance, so the population data is spread out a lot, then you would expect X bar to also be spread out; in other words, the distribution of X bar would also have a high variance. That is why we have sigma squared in the numerator. On the other hand, if the size of the sample is large, then the distribution of X bar becomes narrower, because larger samples mean that you are more likely to be close to the population mean. In other words, a larger sample means you are more likely to have an X bar, or sample mean, that is close to the population mean. That is why n is in the denominator.
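The sigma-squared-over-n claim can be checked numerically. A small sketch with made-up parameters:

```python
import random
import statistics

random.seed(2)

mu, sigma, n = 0.0, 6.0, 36

# Draw many samples of size n from a population with variance sigma**2
# and compare the variance of the sample means against sigma**2 / n.
means = []
for _ in range(5000):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    means.append(statistics.mean(sample))

observed = statistics.pvariance(means)
predicted = sigma**2 / n  # n in the denominator: bigger samples, tighter means
print(f"observed {observed:.2f} vs predicted {predicted:.2f}")
```

Raising n shrinks both numbers; raising sigma inflates them, matching the intuition above.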
What we have just learned is the Central Limit Theorem, and I will repeat exactly what is stated in the curriculum. Given a population described by any probability distribution, and this is critical because the central limit theorem applies whether or not the distribution is normal, with mean mu and a finite variance sigma squared (this is what we saw on the last slide), the sampling distribution of the sample mean X bar, computed from samples of size n drawn from that population, will be approximately normal with mean equal to mu and variance equal to sigma squared over n, when the sample size is large. Again, I am repeating this because it is important: the sample size always has to be the same. And large means a sample size of 30 or more.
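To see the "any distribution" part in action, here is a sketch using a clearly non-normal population, an exponential distribution, which is strongly right-skewed:

```python
import random
import statistics

random.seed(3)

# Exponential population: mean 1.0, variance 1.0, strongly right-skewed.
# The CLT says means of samples of size n >= 30 are still roughly normal,
# centered near the population mean with stdev near sigma / sqrt(n).
n = 30
means = [statistics.mean(random.expovariate(1.0) for _ in range(n))
         for _ in range(4000)]

print(f"mean of sample means:  {statistics.mean(means):.3f}")   # near 1.0
print(f"stdev of sample means: {statistics.stdev(means):.3f}")  # near 1/sqrt(30)
```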
The standard error of the sample mean is the standard deviation of the distribution of the sample means. This is a new piece of terminology, but the concept is straightforward. We just said that the variance of the distribution of X bar is sigma squared over n. The standard deviation, which is simply the square root of this expression, is sigma over root n; that is the standard error of the sample mean. If we know the population variance, then we simply plug it in and compute the standard deviation of X bar, the sample mean, using this expression. If the population variance is not known, then the standard deviation of the distribution of X bar is s over root n, where s is the standard deviation of the sample. Clearly, if the population variance is not known, we use the standard deviation of the sample as a proxy, and again divide by root n. Both expressions are similar: in the first we use the population standard deviation because it is known; in the second we use the sample standard deviation because we do not know the population standard deviation.
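Both cases can be written as small helper functions. This is a sketch; the function names are mine, not from the curriculum:

```python
import math
import statistics

def standard_error_known_sigma(sigma: float, n: int) -> float:
    """Population standard deviation is known: SE = sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

def standard_error_from_sample(sample: list[float]) -> float:
    """Population sigma unknown: use the sample stdev s as a proxy,
    so SE = s / sqrt(n)."""
    s = statistics.stdev(sample)
    return s / math.sqrt(len(sample))

print(standard_error_known_sigma(5.0, 25))        # 1.0
print(standard_error_from_sample([2.0, 4.0, 6.0, 8.0]))
```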
Let's look at this simple question. You need to use the formula that we just discussed, sigma over root n. Here sigma is 3 and n is 64, so 3 over the square root of 64 should give you A, which is 0.375.
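As a quick check of the arithmetic:

```python
import math

sigma, n = 3.0, 64
standard_error = sigma / math.sqrt(n)  # 3 / 8
print(standard_error)  # 0.375
```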