Sampling Methods and Bias with Surveys: Crash Course Statistics #10
Channel: CrashCourse
Hi, I'm Adriene Hill and welcome back to Crash Course Statistics.
In our last episode we talked about how we use experiments to imitate having two parallel universes to test things.
But sometimes you can't do certain experiments without becoming an all-powerful and evil dictator, and since it's statistically unlikely that any of you are evil dictators, today we'll explore those methods.
Like we mentioned at the beginning of the series, you're not always able to answer the questions you really want to answer using statistics.
For example, it would be great to experimentally test whether getting married increases your lifespan, but you can't randomly assign some people to be married and force another group to be single.
Not only would that be difficult to enforce, it would also be pretty unethical, though I suppose you being evil takes care of that particular concern.
Similarly, we can't assign someone to be a twin, or a Democrat, or a smoker.
But that doesn't mean we should just give up and stop trying to find out more about these topics.
Not at all.
Instead we just need a different method to collect data.
Enter Non-Experimental methods.
INTRO
One of the most common non-experimental methods is the survey.
From user experience surveys on websites, to political polls, to health questionnaires at the doctor's office, you've probably taken hundreds of surveys in your lifetime.
There are two things that can make or break a survey: the questions, and who the researcher gives the questions to.
The goal of a survey is to get specific information.
Say you're walking your dog in a local park, and someone approaches you and asks you to take a survey on local businesses in your town.
When you look at the questions you notice that none of them are about local businesses; instead you find yourself answering questions about your politics and religious beliefs.
Unless the surveyor was lying to you about their purposes, this is not a very good survey... It's also not a very good lie.
A survey should measure what it claims to measure.
It might seem obvious that having only unrelated questions on your survey is problematic, but there are even more subtle ways a question can be biased.
Let's take a look at a few questions from a health survey you might take at a doctor's office.
The first question asks you how often you exercise: never, less than 30 minutes a week, or 30 minutes a day.
So what do you answer if you exercise for half an hour twice a week?
Or if you're on the swim team and exercise for at least an hour a day?
And does walking count as exercise?
Multiple choice questions that don't offer all possible options and/or an "Other" option can cause respondents to either skip the question or feel forced to choose an answer that isn't accurate.
Claims made using these questions aren't as strong as they could be if people were offered a full range of choices.
The next question asks you "Answer yes or no: I don't smoke because I know it's damaging to my health." This is a leading question, since the wording leads you towards the quote "desired" answer.
This is especially effective when a question deals with sensitive issues like smoking, politics, or religion.
People answering the questions want to be seen in a positive light, and so they tend to give the answer they think is "appropriate".
While having people fill surveys out anonymously by themselves can help, it can sometimes be the case that respondents don't want to admit things--even to themselves--that are socially undesirable.
In general terms, good survey questions are worded in a neutral way, such as asking "How often do you exercise?" or "Describe your smoking habits" instead of using wording or options that push survey takers in a certain direction.
And while your doctor wouldn't...or shouldn't...do this...sometimes groups purposely use biased questions in their surveys to get the results that they want.
Apparently, back in 1972, Virginia Slims conducted a poll asking respondents if they would agree with the statement: "There won't be a woman President of the United States for a long time and that's probably just as well."
Not a well-written question.
Biased questions can be more subtle...and can lead to skewed reports of very serious things like sexual assault, or mental health conditions.
It's important to always look for biased questions in surveys, especially when the people giving the survey stand to benefit from a certain response.
Even when researchers have created a non-biased survey, they still need to get it into the right hands.
Ideally, a survey should go to a random sample of the population that they're interested in.
Usually this means using a random number generator to pick who gets the survey.
We do Simple Random Sampling so that there's no pattern or system for selecting respondents, and each respondent has an equal chance of being selected.
For example, telephone surveys often use Random Digit Dialing, which selects 7 random digits and dials them.
When someone picks up, they're asked to take a survey.
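As a rough sketch of what both of those look like in practice (hypothetical population and sample sizes, Python standard library only):

```python
import random

# Simple Random Sampling: every member of the population has an
# equal chance of being picked, with no pattern to the selection.
population = [f"resident_{i}" for i in range(10_000)]  # hypothetical sampling frame
respondents = random.sample(population, k=100)         # 100 people, chosen uniformly

# Random Digit Dialing: build phone numbers from 7 random digits,
# so unlisted numbers have the same chance of being called as listed ones.
def random_phone_number():
    return "".join(str(random.randint(0, 9)) for _ in range(7))

numbers_to_dial = [random_phone_number() for _ in range(100)]
print(respondents[:3], numbers_to_dial[:3])
```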
But here's where we hit our first issue.
If people aren't forced to respond to the survey, we might experience something called Non-Response Bias, in which the people who are most likely to complete a survey are systematically different from those who don't.
For example, people with non-traditional working schedules like retirees, stay-at-home parents, or people who work from home might be more likely to answer a middle-of-the-day phone survey.
This is a huge problem if those groups are different than the population as a whole.
If your survey was on health insurance plans, or political opinions, it's likely that these three groups would have different opinions than the population, but they represent the majority of survey responses, which means your data won't represent the total population very well.
This is also related to Voluntary Response Bias, in which people who choose to respond to voluntary surveys they see on Facebook...or Twitter...are people who, again, are different than the broad population.
This is especially true with things like customer service surveys.
People who respond tend to have either very positive or very negative opinions.
See the comment section below.
The majority of customers with an average experience tend not to respond because the service wasn't noteworthy.
Wait.
Does that mean I'm not noteworthy?
Another source of bias is just plain underrepresentation.
If a group of interest is a minority in the population, random sampling paired with response biases might mean that that minority isn't represented at all in the sample.
Let's say there's a city where 5% of the population is single mothers; it's entirely possible that the sample will contain no single moms.
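To put a rough number on how possible: if each of the n respondents is drawn independently from a population that's 5% single mothers, the chance the sample misses them entirely is 0.95^n. A quick check with toy sample sizes:

```python
# Chance that a simple random sample of size n contains zero single
# mothers, assuming independent draws from a population that is
# 5% single mothers: 0.95 ** n.
for n in (10, 30, 60):
    print(n, round(0.95 ** n, 3))  # 10 -> 0.599, 30 -> 0.215, 60 -> 0.046
```

So even a 30-person sample misses single mothers entirely about one time in five.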
To overcome these issues, we have a couple options.
We could weight people's responses so that they match the population (like counting the few single mothers who do respond multiple times, so that they count for 5% of the total sample).
But this can be problematic for the same reasons that response bias is problematic.
If the few single mothers who respond don't represent all single mothers, our data is still biased.
In a 2016 LA Times/USC political tracking poll, a 19-year-old black man was one of 3,000 panelists who were interviewed week after week about the upcoming presidential election.
Because he was a member of more than one group that was underrepresented in this poll, his response was weighted 30x more than the average respondent's.
According to the New York Times, his survey responses boosted his candidate's margins by an entire percentage point.
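A minimal sketch of how that kind of weighting works, with made-up counts and answers (the real poll's weighting scheme was more involved):

```python
# Hypothetical survey: the population is 5% single mothers, but only
# 1 of our 50 respondents is one (2% of the sample).
population_share = {"single_mother": 0.05, "other": 0.95}
sample_counts = {"single_mother": 1, "other": 49}
yes_counts = {"single_mother": 1, "other": 14}  # hypothetical "yes" answers

n = sum(sample_counts.values())
# weight = group's population share / group's sample share
weights = {g: population_share[g] / (sample_counts[g] / n) for g in sample_counts}

raw = sum(yes_counts.values()) / n
weighted = (sum(weights[g] * yes_counts[g] for g in yes_counts)
            / sum(weights[g] * sample_counts[g] for g in sample_counts))
print(raw, weighted)  # 0.3 vs ~0.321: the one single mother now counts for 5%
```

Her single answer now carries 2.5x the weight of anyone else's, which is exactly how one unusual panelist can move a whole poll.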
Stratified Random Sampling is another option.
It splits the population into groups of interest and randomly selects people from each of the "strata" so that each group in the overall sample is represented appropriately.
Researchers have used stratified sampling to study differences in the way same-sex and different-sex couples parent their kids.
They randomly select people from the same-sex parenting group and... randomly select people from a different-sex group of parents to make sure that they're well represented in the sample.
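A sketch of that idea, with hypothetical strata sizes (standard library only):

```python
import random

# Hypothetical strata: split the sampling frame by group, then draw a
# simple random sample *within* each group so both are represented.
strata = {
    "same_sex_parents": [f"ss_family_{i}" for i in range(500)],
    "different_sex_parents": [f"ds_family_{i}" for i in range(5000)],
}
sample = {group: random.sample(families, k=50)  # 50 from each stratum
          for group, families in strata.items()}
print({group: len(chosen) for group, chosen in sample.items()})
```

With plain random sampling, the smaller group could easily be drowned out; stratifying guarantees it shows up.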
Another issue is that getting surveys to people can be expensive.
If a cereal company wants to see how families react to their new cereal, it would be costly to send some cereal to a random sample of all families in the country.
Instead they use Cluster Sampling, which creates clusters (not Honey Nut Clusters) that are naturally occurring (like schools, or cities) and randomly selects a few clusters to survey instead of randomly selecting individuals.
For this to work, clusters can't be systematically different than the population as a whole, and they should each represent all groups about equally.
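A sketch of cluster sampling, with hypothetical schools as the naturally occurring clusters:

```python
import random

# Hypothetical clusters: families grouped by the school they attend.
schools = {f"school_{i}": [f"family_{i}_{j}" for j in range(30)]
           for i in range(200)}

# Randomly pick a few whole clusters, then survey everyone inside them,
# instead of picking individual families from across the whole country.
chosen_schools = random.sample(list(schools), k=5)
respondents = [family for school in chosen_schools
               for family in schools[school]]
print(chosen_schools, len(respondents))  # 5 schools, 150 families
```

Shipping cereal to 5 schools is a lot cheaper than shipping it to 150 scattered addresses, which is the whole point.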
Issues can also arise when the population being surveyed is very small or difficult to reach, like children with rare genetic disorders, or people addicted to certain drugs.
In this case, surveyors may choose to not use randomness at all, and instead use Snowball Sampling.
That's when current respondents are asked to help recruit people they know from the population of interest... since people tend to know others in their communities and can help researchers get more responses.
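Snowball sampling is less about random number generators and more about following referrals. A toy simulation over a hypothetical "who knows whom" list:

```python
from collections import deque

# Hypothetical referral network: who each respondent can recruit.
knows = {
    "seed_1": ["p2", "p3"],
    "p2": ["p4"],
    "p3": ["p4", "p5"],
    "p4": [],
    "p5": ["p6"],
    "p6": [],
}

# Start from a known member of the population and follow referrals outward.
recruited, queue = set(), deque(["seed_1"])
while queue:
    person = queue.popleft()
    if person not in recruited:
        recruited.add(person)
        queue.extend(knows[person])
print(recruited)  # everyone reachable through chains of referrals
```

The trade-off: the sample is only as broad as the referral chains, so it isn't random and can inherit the social circles of the first few seeds.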
And note that these sampling techniques can be and are used in experiments as well as surveys.
There are also non-experimental data collection methods like a Census.
A Census is a survey that samples an ENTIRE population.
The United States conducts a Census every 10 years, with the next one scheduled to be done in 2020.
It attempts to collect data from every. single. resident of the United States (even undocumented residents, and homeless residents).
As you can imagine, this is hard, and it is not without error.
In Medieval Europe, William I of England conducted a census in order to properly tax the people he had conquered.
In fact, a lot of rulers tended to use censuses to know just how much money they should be demanding.
Until the widespread availability of computers, US census data took almost 10 years to collect and analyze, meaning that the data from the last census wasn't even available until right before the next census.
The length of time it took to complete the census is part of the reason we even have computers...check out our CompSci series for more on that.
So why collect census data--instead of just sampling the population? Especially when, in the US, the Census could cost more than 15 billion dollars in 2020.
There are a lot of reasons.
The Constitution says we have to, but also the census provides the truest measure of the population we can get.
It minimizes sampling error.
It also functions as a benchmark for future studies.
And a census can give researchers really specific information about small groups of the population--information that might be hard to gather with regular sampling methods.
Doing statistics on Census data is different, because most statistical inference aims to take a small sample and use it to make guesses about the population it came from.
But with a census we already have data from the entire population; we don't need to guess if there are differences, we can just see them.
Analysis on Census data is usually more concerned with whether differences we see are large enough to make a difference in everyday life, rather than guessing IF there is a relationship.
The census, as we said, can take years...and entire countries to fund.
That doesn't discount the value of sampling.
But we should be cautious...badly worded polls, fake polls, and biased polls are common.
So are the results of those polls.
The statistics-friendly website FiveThirtyEight put together a great list of advice on how not to fall for a fake poll.
Among its advice: ask yourself if it seems professional.
Check to see who conducted the poll--and if you trust them.
See how the poll was conducted.
Check out the questions they asked...and who they asked.
If it seems fishy, it probably is fishy.
That said, well-done surveys are essential.
They allow us to get information without all the trouble of doing an experiment, and since they're comparatively easy, they're popular ways for businesses, countries, and even YouTube channels to collect information.
In fact, Crash Course Statistics has its own survey! The link is in the description.
And it takes way less time than the Nerdfighteria one. I promise.
Thanks for watching. I'll see you next time.