Sampling Methods and Bias with Surveys: Crash Course Statistics #10 - YouTube

Channel: CrashCourse

Hi, I’m Adriene Hill and welcome back to Crash Course Statistics. In our last episode we talked about how we use experiments to imitate having two parallel universes to test things. But sometimes you can’t do certain experiments without becoming an all-powerful and evil dictator, and since it’s statistically unlikely that any of you are evil dictators, today we’ll explore some other methods.
Like we mentioned at the beginning of the series, you’re not always able to answer the questions you really want to answer using statistics. For example, it would be great to experimentally test whether getting married increases your lifespan, but you can’t randomly assign some people to be married and force another group to be single. Not only would that be difficult to enforce, it would also be pretty unethical, though I suppose you being evil takes care of that particular concern.
Similarly, we can’t assign someone to be a twin, or a Democrat, or a smoker. But that doesn’t mean we should just give up and stop trying to find out more about these topics. Not at all. Instead we just need a different method to collect data. Enter non-experimental methods.

INTRO
One of the most common non-experimental methods is the survey. From user experience surveys on websites, to political polls, to health questionnaires at the doctor’s office, you’ve probably taken hundreds of surveys in your lifetime. There are two things that can make or break a survey: the questions, and who the researcher gives the questions to. The goal of a survey is to get specific information.
Say you’re walking your dog in a local park, and someone approaches you and asks you to take a survey on local businesses in your town. When you look at the questions you notice that none of them are about local businesses; instead you find yourself answering questions about your politics and religious beliefs. Unless the surveyor was lying to you about their purposes, this is not a very good survey. It’s also not a very good lie. A survey should measure what it claims to measure.
It might seem obvious that having only unrelated questions on your survey is problematic, but there are even more subtle ways a question can be biased. Let’s take a look at a few questions from a health survey you might take at a doctor’s office.

The first question asks how often you exercise: never, less than 30 minutes a week, or 30 minutes a day. So what do you answer if you exercise for half an hour twice a week? Or if you’re on the swim team and exercise for at least an hour a day? And does walking count as exercise? Multiple choice questions that don’t offer all possible options and/or an “Other” option can cause respondents to either skip the question or feel forced to choose an answer that isn’t accurate. Claims made using these questions aren’t as strong as they could be if people were offered a full range of choices.
The next question asks you to answer yes or no: “I don’t smoke because I know it’s damaging to my health.” This is a leading question, since the wording leads you toward the quote-unquote “desired” answer. This is especially effective when a question deals with sensitive issues like smoking, politics, or religion. People answering the questions want to be seen in a positive light, and so they tend to give the answer they think is “appropriate”. While having people fill surveys out anonymously by themselves can help, it can sometimes be the case that respondents don’t want to admit things--even to themselves--that are socially undesirable.
In general terms, good survey questions are worded in a neutral way, such as asking “how often do you exercise” or “describe your smoking habits”, instead of using wording or options that push survey takers in a certain direction. And while your doctor wouldn’t...or shouldn’t...do this, sometimes groups purposely use biased questions in their surveys to get the results that they want.
Apparently, back in 1972, Virginia Slims conducted a poll asking respondents if they would agree with the statement: “There won’t be a woman President of the United States for a long time and that’s probably just as well.” Not a well-written question. Biased questions can be more subtle...and can lead to skewed reports of very serious things like sexual assault, or mental health conditions. It’s important to always look for biased questions in surveys, especially when the people giving the survey stand to benefit from a certain response.
Even when researchers have created a non-biased survey, they still need to get it into the right hands. Ideally, a survey should go to a random sample of the population that they’re interested in. Usually this means using a random number generator to pick who gets the survey. We do Simple Random Sampling so that there’s no pattern or system for selecting respondents, and each respondent has an equal chance of being selected. For example, telephone surveys often use Random Digit Dialing, which selects 7 random digits and dials them. When someone picks up, they’re asked to take a survey.
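A simple random sample like this can be sketched in a few lines of code. This is a minimal illustration, not any pollster’s actual system; the population size, sample size, and seed are all made-up numbers.

```python
import random

# Hypothetical sampling frame: 10,000 numbered residents.
population = list(range(10_000))

# Simple random sample of 100: no pattern or system for selecting
# respondents, and everyone has an equal chance of being picked.
rng = random.Random(42)  # fixed seed so the sketch is reproducible
sample = rng.sample(population, k=100)

# Random digit dialing, sketched the same way: draw 7 random digits
# and join them into a number to call.
number = "".join(str(rng.randint(0, 9)) for _ in range(7))
print(len(sample), number)
```

`random.sample` draws without replacement, which matches how a survey would pick distinct respondents rather than calling the same person twice.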
But here’s where we hit our first issue. If people aren’t forced to respond to the survey, we might experience something called Non-Response Bias, in which the people who are most likely to complete a survey are systematically different from those who don’t. For example, people with non-traditional working schedules, like retirees, stay-at-home parents, or people who work from home, might be more likely to answer a middle-of-the-day phone survey. This is a huge problem if those groups are different from the population as a whole. If your survey was on health insurance plans, or political opinions, it’s likely that these three groups would have different opinions than the population, but they represent the majority of survey responses, which means your data won’t represent the total population very well.
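You can see this effect in a toy simulation. All the numbers here are hypothetical: suppose people who are home during the day support some policy at a higher rate than everyone else, and a midday phone survey only reaches them.

```python
import random

# Toy population (hypothetical rates): 20% are home during the day
# and support a policy at 70%; the other 80% support it at 40%.
rng = random.Random(1)
population = (
    [{"home": True,  "supports": rng.random() < 0.70} for _ in range(2_000)]
  + [{"home": False, "supports": rng.random() < 0.40} for _ in range(8_000)]
)

true_rate = sum(p["supports"] for p in population) / len(population)

# A midday phone survey only reaches people who are home, so the
# respondents are systematically different from the non-respondents.
respondents = [p for p in population if p["home"]]
survey_rate = sum(p["supports"] for p in respondents) / len(respondents)

print(round(true_rate, 2), round(survey_rate, 2))
```

The survey estimate lands near 70% even though true support in this toy population is closer to 46%, purely because of who picked up the phone.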
This is also related to Voluntary Response Bias, in which people who choose to respond to voluntary surveys they see on Facebook...or Twitter...are, again, different from the broad population. This is especially true with things like customer service surveys. People who respond tend to have either very positive or very negative opinions. See the comment section below. The majority of customers with an average experience tend not to respond, because the service wasn’t noteworthy. Wait. Does that mean I’m not noteworthy?
Another source of bias is just plain underrepresentation. If a group of interest is a minority in the population, random sampling paired with response biases might mean that that minority isn’t represented at all in the sample. Let’s say there’s a city where 5% of the population is single mothers; it’s entirely possible that the sample will contain no single moms.

To overcome these issues, we have a couple of options. We could weight people’s responses so that they match the population (like counting the few single mothers who do respond multiple times, so that they count for 5% of the total sample). But this can be problematic for the same reasons that response bias is problematic. If the few single mothers who respond don’t represent all single mothers, our data is still biased.
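Here’s a sketch of that kind of weighting with hypothetical numbers: single mothers are 5% of the city but only 1% of our respondents, so each of their answers is made to count five times as much.

```python
# Hypothetical responses: 10 single mothers (all answering 1.0 on some
# question) out of 1,000 respondents; everyone else answers 0.0.
responses = [("single_mother", 1.0)] * 10 + [("other", 0.0)] * 990

# Known population shares vs. observed sample shares.
pop_share = {"single_mother": 0.05, "other": 0.95}
sample_share = {
    g: sum(1 for grp, _ in responses if grp == g) / len(responses)
    for g in pop_share
}

# Each group's weight scales its responses up (or down) to its
# share of the population: 0.05 / 0.01 = 5x for single mothers.
weights = {g: pop_share[g] / sample_share[g] for g in pop_share}

weighted_mean = (
    sum(weights[grp] * ans for grp, ans in responses)
    / sum(weights[grp] for grp, _ in responses)
)
print(round(weighted_mean, 3))
```

After weighting, single mothers contribute 5% of the total instead of the 1% they made up in the raw sample. Notice the caveat from above still applies: the weights amplify whatever those 10 respondents happened to say.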
In a 2016 LA Times/USC political tracking poll, a 19-year-old black man was one of 3,000 panelists who were interviewed week after week about the upcoming presidential election. Because he was a member of more than one group that was underrepresented in this poll, his response was weighted 30x more than the average respondent’s. According to the New York Times, his responses boosted his candidate’s margins by an entire percentage point.
Stratified Random Sampling is another option. It splits the population into groups of interest and randomly selects people from each of the “strata”, so that each group in the overall sample is represented appropriately. Researchers have used stratified sampling to study differences in the way same-sex and different-sex couples parent their kids. They randomly select people from the same-sex parenting group, and randomly select people from the different-sex parenting group, to make sure that both are well represented in the sample.
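Stratified sampling is easy to sketch: split the population into strata first, then run a simple random sample inside each stratum. The group names and sizes below are made up for illustration.

```python
import random

# Hypothetical population: 200 same-sex parents, 1,800 different-sex
# parents, tagged with their stratum.
population = ([("same_sex", i) for i in range(200)]
              + [("different_sex", i) for i in range(1_800)])

# Split the population into strata (groups of interest).
strata = {}
for group, person in population:
    strata.setdefault(group, []).append(person)

# Draw 50 parents from EACH stratum, so the smaller group can't be
# drowned out the way it could be in one big random draw.
rng = random.Random(0)
sample = {group: rng.sample(members, k=50)
          for group, members in strata.items()}
print({g: len(s) for g, s in sample.items()})
```

Compare this with a single simple random sample of 100, which would give only about 10 same-sex parents on average and could easily give far fewer.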
Another issue is that getting surveys to people can be expensive. If a cereal company wants to see how families react to their new cereal, it would be costly to send some cereal to a random sample of all families in the country. Instead they use Cluster Sampling, which takes naturally occurring clusters (not Honey Nut Clusters), like schools or cities, and randomly selects a few clusters to survey, instead of randomly selecting individuals. For this to work, clusters can’t be systematically different from the population as a whole, and they should each roughly represent all groups.
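In code, the only randomness in cluster sampling is over the clusters themselves; everyone inside a chosen cluster gets surveyed. The schools and family counts here are hypothetical.

```python
import random

# Hypothetical clusters: 100 schools, each a naturally occurring
# group of 30 families.
schools = {f"school_{i}": [f"family_{i}_{j}" for j in range(30)]
           for i in range(100)}

# Randomly select a few whole clusters instead of individuals.
rng = random.Random(7)
chosen = rng.sample(sorted(schools), k=5)

# Every family in a chosen cluster gets the survey (and the cereal).
surveyed = [family for school in chosen for family in schools[school]]
print(len(chosen), len(surveyed))
```

Five clusters of 30 yields 150 surveyed families, but the cereal only has to be shipped to 5 places instead of 150 random addresses, which is the cost saving the method is after.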
Issues can also arise when the population being surveyed is very small or difficult to reach, like children with rare genetic disorders, or people addicted to certain drugs. In this case, surveyors may choose not to use randomness at all, and instead use Snowball Sampling. That’s when current respondents are asked to help recruit people they know from the population of interest, since people tend to know others in their communities and can help researchers get more responses.
And note that these sampling techniques can be, and are, used in experiments as well as surveys.

There are also non-experimental data collection methods like a Census. A Census is a survey that samples an ENTIRE population. The United States conducts a Census every 10 years, with the next one scheduled to be done in 2020. It attempts to collect data from every. single. resident of the United States (even undocumented residents, and homeless residents).
As you can imagine, this is hard, and it is not without error. In Medieval Europe, William I of England conducted a census in order to properly tax the people he had conquered. In fact, a lot of rulers tended to use censuses to know just how much money they should be demanding. Until the widespread availability of computers, US census data took almost 10 years to collect and analyze, meaning that the data from the last census wasn’t even available until right before the next census. The length of time it took to complete the census is part of the reason we even have computers...check out our CompSci series for more on that.
So why collect census data instead of just sampling the population, especially when, in the US, the Census could cost more than 15 billion dollars in 2020? There are a lot of reasons. The constitution says we have to, but the census also provides the truest measure of the population we can get. It minimizes sampling error. It also functions as a benchmark for future studies. And a census can give researchers really specific information about small groups of the population--information that might be hard to gather with regular sampling methods.
Doing statistics on Census data is different, because most statistical inference aims to take a small sample and use it to make guesses about the population. But with a census we already have data from the entire population; we don’t need to guess whether there are differences, we can just see them. Analysis of Census data is usually more concerned with whether the differences we see are large enough to make a difference in everyday life, rather than guessing IF there is a relationship. The census, as we said, can take years. And entire countries to fund.
That doesn’t discount the value of sampling. But we should be cautious...badly worded polls, fake polls, and biased polls are common. So are the results of those polls. The statistics-friendly website FiveThirtyEight put together a great list of advice on how not to fall for a fake poll. Among its advice: ask yourself if it seems professional. Check to see who conducted the poll--and if you trust them. See how the poll was conducted. Check out the questions they asked...and who they asked. If it seems fishy, it probably is fishy.
That said, well-done surveys are essential. They allow us to get information without all the trouble of doing an experiment, and since they’re comparatively easy, they’re popular ways for businesses, countries, and even YouTube channels to collect information. In fact, Crash Course Statistics has its own survey! The link is in the description. And it takes way less time than the Nerdfighteria one. I promise.

Thanks for watching. I’ll see you next time.