Sampling - YouTube

Channel: unknown

[0] All right, so for today's lesson we're going to introduce you to different types of sampling methods, and we're going to see how those methods can get rid of any inherent bias. I thought I'd start off today's lesson with an example: a political researcher is trying to estimate who's going to win the next presidential election. To study this, they thought to themselves, "Well, I need to take a sample." So they went to a large shopping center where a lot of different types of people congregate, and they randomly sampled a thousand people by asking them who they were going to vote for. What we need to consider is: is this a good sampling method? If it is, comment on why it's a good method, and if it's not, comment on why you feel that way. So pause the video and spend about a minute trying to figure this one out.
[59] We're back. Hopefully, after you've had a few moments to think about that problem, you've come to the conclusion that this was not a good sampling method, and that's true for a multitude of reasons. While they did randomly sample people at that shopping center, we need to realize that the shopping center was in a specific city, a specific town and a specific state, and only people who went to that center on that particular day and at that particular time could have been selected. This sampling method is biased, and we don't want a biased sampling method; otherwise it's not going to yield accurate results for what we're interested in. This may have given us a good representation of who the people at that shopping center would vote for, but not a good representation of the whole United States.

[117] A bias is any systematic overrepresentation or underrepresentation of certain characteristics or traits from your population of interest. In this case we overrepresented people who went to that center and underrepresented people who didn't. Now, we should think about the different ways we can get rid of inherent bias, because bias is bad. We're going to spend a lot of time in this course, or at least in this next lecture, thinking about ways to get rid of bias. If you have a biased sample, you're not going to accurately estimate what's true of the population you're interested in. The way bias is eliminated is by introducing randomness.
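The claim that randomness removes bias can be checked with a small simulation. The Python sketch below is purely illustrative: the population of 100,000 voters, the town name `mall_town` and the support rates are hypothetical, and the shopping-center survey is modelled crudely as "only residents of `mall_town` can be picked". Averaging each method over many repeated samples separates systematic error (bias) from ordinary luck of the draw.

```python
import random

# Hypothetical electorate, invented for illustration: 5,000 voters live in
# "mall_town" (about 70% support candidate A); 95,000 live elsewhere (about 40%).
def make_population(seed=0):
    rng = random.Random(seed)
    population = [("mall_town", rng.random() < 0.70) for _ in range(5_000)]
    population += [("elsewhere", rng.random() < 0.40) for _ in range(95_000)]
    return population

def support_rate(voters):
    """Fraction of the given voters who support candidate A."""
    return sum(supports for _, supports in voters) / len(voters)

population = make_population()
true_rate = support_rate(population)

# Crude model of the shopping-center survey: only mall_town residents can be picked.
mall_goers = [v for v in population if v[0] == "mall_town"]

rng = random.Random(42)
trials, n = 200, 1000
# Repeat each method many times so we see its average behaviour, not one lucky draw.
convenience = [support_rate(rng.sample(mall_goers, n)) for _ in range(trials)]
srs = [support_rate(rng.sample(population, n)) for _ in range(trials)]

conv_avg = sum(convenience) / trials
srs_avg = sum(srs) / trials
print(f"true rate {true_rate:.3f}, convenience avg {conv_avg:.3f}, SRS avg {srs_avg:.3f}")
```

With these made-up numbers the true support rate is about 0.415. The simple random samples average out close to it, while the mall-based samples sit near 0.70 no matter how often the survey is repeated, which is what "systematic" means in the definition of bias.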
[171] Randomness is key: randomness favors no one, it has no inclination. When you introduce randomness into a sample you get rid of bias, so we're going to talk about three different sampling methods that introduce randomness.

[190] The first sampling method we're going to learn about today is called a simple random sample, or SRS for short, and it gives every single member of the population the same chance of being selected. This gets rid of any inherent bias for us: every single U.S. voter has the same chance of being selected, regardless of where they live, what their ethnicity is or what their income level is. One way we could do this is to put everybody's name into a big hat and randomly pull out, let's say, a thousand names. Or we can assign every U.S. voter a number and randomly select a thousand numbers.

[239] Now, there are some benefits to this sampling method and some downsides. One that we talked about is that everybody can be selected, everybody has the same chance of being selected, and the method is not biased in and of itself. However, there are some things we need to consider. Imagine we did this in practice: we were trying to predict who is going to win the presidential election, and we randomly pulled out a thousand U.S. voters. Is it possible, for instance, that we pulled out a thousand people all from California? Technically it is; it's not likely, but it's possible. So when you perform a simple random sample you can still, unintentionally and by sheer chance, end up with an unrepresentative sample. You didn't intend it to be biased, but it can turn out lopsided anyway. Maybe you selected 500 people from California and 500 people from Texas and didn't represent the other 48 states. We also need to consider how we're going to get a list of all, I don't know, 250 million potential U.S. voters. That's going to take a lot of time, effort, money and resources that we probably don't have. So while this is a completely random method, there are some downsides to performing it.
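A simple random sample is a one-liner with Python's standard library: `random.sample` draws distinct items so that every subset of the requested size is equally likely, and it accepts a `range`, so a numbered voter list never has to be built in memory. The second function works out the "all from California" scenario exactly with the hypergeometric formula. The voter numbering and the toy figures (a population of 1,000 with 120 "California" voters and a sample of 10) are hypothetical, chosen only to keep the numbers readable.

```python
import random
from math import comb

def simple_random_sample(voter_ids, n, seed=None):
    """Draw n distinct IDs; every subset of size n is equally likely."""
    rng = random.Random(seed)
    return rng.sample(voter_ids, n)

def prob_all_from_group(population_size, group_size, n):
    """Exact chance that an SRS of size n lands entirely inside one group."""
    # Ways to choose all n from the group, divided by ways to choose any n.
    return comb(group_size, n) / comb(population_size, n)

# Hypothetical numbering: voters 0..9,999,999 stand in for a full voter list.
voter_ids = range(10_000_000)
sample = simple_random_sample(voter_ids, 1000, seed=1)

# Hypothetical toy case: 1,000 voters, 120 of them from "California", sample of 10.
p = prob_all_from_group(1000, 120, 10)
print(len(sample), p)
```

Even in this toy case the chance is below one in a billion, and for a sample of a thousand drawn from a real voter list it is vanishingly small, which fits "not likely, but possible."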
[337] Another type of sampling method that helps fix some of the issues an SRS has is called a stratified random sample. It breaks up your population into groups, which it refers to as strata (but you can still think of them as groups), and then it performs a simple random sample within each group. For instance, if we were trying to predict the winner of the U.S. presidential election, we could separate United States voters into East, Central and West. Then we would randomly select a certain number of people from each of those three regions, and in doing so we would be able to better represent the United States as a whole.

[386] You don't need to select the same number of people from each group, or stratum; you can select a different number of people based on the size of each stratum. For instance, if more people live in the western United States than in the central United States, you would want to represent that in your sample. So the benefit of this sampling method compared to a simple random sample is that it can guarantee that every group you care about is represented in your sample.
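Sizing each group's sample by that group's share of the population is usually called proportional allocation. The Python sketch below assumes hypothetical strata sizes (4,500 West, 2,500 Central and 3,000 East voters, numbered 0 to 9,999); the helper names are illustrative, and the largest-remainder rounding is just one reasonable way to make the group sizes add up exactly to n.

```python
import random

def proportional_allocation(strata_sizes, n):
    """Split a total sample size n across strata in proportion to their sizes.

    Uses the largest-remainder method so the allocations sum exactly to n.
    """
    total = sum(strata_sizes.values())
    exact = {name: n * size / total for name, size in strata_sizes.items()}
    alloc = {name: int(x) for name, x in exact.items()}
    leftover = n - sum(alloc.values())
    # Hand the remaining units to the strata with the largest fractional parts.
    for name in sorted(exact, key=lambda k: exact[k] - alloc[k], reverse=True)[:leftover]:
        alloc[name] += 1
    return alloc

def stratified_random_sample(strata, n, seed=None):
    """Take a simple random sample inside every stratum, sized proportionally."""
    rng = random.Random(seed)
    alloc = proportional_allocation({k: len(v) for k, v in strata.items()}, n)
    return {name: rng.sample(members, alloc[name]) for name, members in strata.items()}

# Hypothetical strata, invented for illustration.
strata = {
    "West": list(range(0, 4_500)),
    "Central": list(range(4_500, 7_000)),
    "East": list(range(7_000, 10_000)),
}
sample = stratified_random_sample(strata, 1000, seed=7)
print({name: len(chosen) for name, chosen in sample.items()})
# → {'West': 450, 'Central': 250, 'East': 300}
```

Every region is guaranteed a place in the sample, and each region's slice of the sample matches its slice of the population.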
[422] A simple random sample cannot guarantee that you would represent all the groups in your population that you're interested in. So there are a lot of advantages to working with a stratified random sample, in that you break up the observations into groups and randomly select observations from every group. However, there are still some downsides to a stratified random sample. Again, this method can take a lot of time, effort, money and resources. I mean, how are we going to contact random people from the central United States, random people from the eastern United States and random people from the western United States? That's going to take a lot of time, effort, money and resources.

[469] You'll also run into issues if you don't select the correct percentage of observations from each group. For instance, suppose there are more people in the western United States than in the central United States, and you selected the same number of people from the West and from the Central region. In that case your sample would be biased: you would be overrepresenting people from the central United States and underrepresenting people from the western United States. So if you're going to perform a stratified random sample, you want to make sure the share of your sample drawn from each group matches that group's share of the population.
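The allocation mistake can be shown with a few lines of arithmetic. In this Python sketch the regional population shares (70% West, 30% Central) and support rates (35% and 65%) are hypothetical; `expected_estimate` computes what the pooled sample would show on average, assuming each region's sample matches that region's support rate.

```python
def expected_estimate(sample_counts, support_by_stratum):
    """Average support the pooled (unweighted) sample would show."""
    total = sum(sample_counts.values())
    return sum(sample_counts[s] * support_by_stratum[s] for s in sample_counts) / total

# Hypothetical figures, invented for illustration.
population_share = {"West": 0.70, "Central": 0.30}
support = {"West": 0.35, "Central": 0.65}

true_rate = sum(population_share[s] * support[s] for s in support)
equal = expected_estimate({"West": 500, "Central": 500}, support)         # same number per region
proportional = expected_estimate({"West": 700, "Central": 300}, support)  # matches population shares
print(round(true_rate, 3), round(equal, 3), round(proportional, 3))
# → 0.44 0.5 0.44
```

Sampling 500 people from each region gives 0.50 instead of the population's 0.44, because the Central region, which leans toward the candidate in these made-up figures, is overrepresented. Matching the sample shares to the population shares (700 and 300) brings the estimate back to 0.44.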
[515] The two sampling methods we've covered, the simple random sample and the stratified random sample, can both take a lot of time, effort, money and resources. The last one we'll cover, called a random cluster sample, is going to be the easiest for us to accomplish. It's similar to a stratified random sample in that it breaks up your population into groups, which in this case it calls clusters. So you can think of strata, clusters and groups as synonyms; they all essentially mean the same thing. Where a random cluster sample differs from a stratified random sample is that it doesn't select observations from every group.

[559] If we were trying to predict the U.S. presidential election with our three regions, West, Central and East, we could randomly pick just one of those three regions and sample people from it: maybe the West, maybe the East. We can select all the people in a chosen group or only a subset of them; we could even select two whole groups, or take simple random samples from two groups. The only real difference between a stratified random sample and a random cluster sample is that in a cluster sample you're not selecting members from every single group, whereas in a stratified random sample you select from every single group.

[606] Now, this is going to be beneficial for us because it's easier to perform. If you're a researcher in the western part of the United States, it's probably easier to get observations from, or take a survey of, people who live in that region. Because it breaks up the population you're interested in into groups and lets you work with only some of them, it's going to be your easiest way to get a sample. However, the big downside to this, and maybe you already spotted it, is that it's prone to bias. Because you're not selecting observations from every single group, the groups you leave out aren't represented at all, so if they differ from the groups you picked, your sample won't reflect the whole population. The only real reason you would want to do something like this is that it's the only thing you have available: you just don't have the time, effort, money or resources to get an unbiased, representative sample. In reality, companies that try to predict the presidential election will spend millions of dollars to get an unbiased, representative sample of just about 3,000 U.S. voters. It can be very difficult to get that, so a lot of the time the only option some researchers are left with is taking a random cluster sample. Is it ideal? Absolutely not, but sometimes it is the only thing that's possible.

[702] So these were the three sampling methods we'll cover in this course: the simple random sample, the stratified random sample and the random cluster sample.
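To make the random cluster sample concrete, here is a minimal Python sketch covering the options mentioned above: randomly choosing one whole region, or randomly choosing two regions and taking a simple random sample inside each. The region sizes and member labels are hypothetical, and `random_cluster_sample` is an illustrative helper, not a library function.

```python
import random

def random_cluster_sample(clusters, num_clusters, per_cluster=None, seed=None):
    """Randomly choose whole clusters, then keep everyone in them or an SRS from each.

    clusters: dict mapping cluster name -> list of members.
    per_cluster: None keeps every member of a chosen cluster;
                 an int takes a simple random sample of that size from it.
    """
    rng = random.Random(seed)
    # Which clusters get surveyed is itself decided at random.
    chosen = rng.sample(sorted(clusters), num_clusters)
    sample = {}
    for name in chosen:
        members = clusters[name]
        sample[name] = list(members) if per_cluster is None else rng.sample(members, per_cluster)
    return sample

# Hypothetical regions, invented for illustration.
clusters = {
    "West": [f"W{i}" for i in range(4_500)],
    "Central": [f"C{i}" for i in range(2_500)],
    "East": [f"E{i}" for i in range(3_000)],
}
one_whole_region = random_cluster_sample(clusters, num_clusters=1, seed=3)
two_regions_srs = random_cluster_sample(clusters, num_clusters=2, per_cluster=200, seed=3)
print(list(one_whole_region), {k: len(v) for k, v in two_regions_srs.items()})
```

Regions that are not drawn contribute nobody to the sample, which is exactly the weakness discussed above: if the unpicked regions vote differently from the picked ones, the sample won't reflect the country as a whole.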