Probability Part 2: Updating Your Beliefs with Bayes: Crash Course Statistics #14

Channel: CrashCourse

Hi, I’m Adriene Hill, and welcome back to Crash Course Statistics. We ended the last episode by talking about conditional probabilities, which helped us find the probability of one event, given that a second event had already happened. But now I want to give you a better idea of why this is true and how this formula--with a few small tweaks--has revolutionized the field of statistics.
INTRO
In general terms, conditional probability says that the probability of an event, B, given that event A has already happened, is the probability of A and B happening together, divided by the probability of A happening. That’s the general formula, written in symbols below, but let’s give you a concrete example so we can visualize it.
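In symbols, that general formula reads:

\[
P(B \mid A) = \frac{P(A \text{ and } B)}{P(A)}
\]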
Here’s a Venn diagram of two events: an email containing the words “Nigerian Prince,” and an email being spam. So I get an email that has the words “Nigerian Prince” in it, and I want to know the probability that this email is spam, given that I already know the email contains the words “Nigerian Prince.” This is the equation.

Alright, let’s take this apart a little. On the Venn diagram, I can represent the fact that I know the words “Nigerian Prince” already happened by only looking at the events where “Nigerian Prince” occurs--so just this circle.
Now inside this circle I have two areas: areas where the email is spam, and areas where it’s not. According to our formula, the probability of spam given “Nigerian Prince” is the probability of spam AND “Nigerian Prince”--which is this region where they overlap--divided by the probability of “Nigerian Prince,” which is the whole circle that we’re looking at.

Now, if we want to know the proportion of times when an email is spam given that we already know it has the words “Nigerian Prince,” we need to look at how much of the whole “Nigerian Prince” circle is covered by the region with both spam and “Nigerian Prince.”
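To make that ratio concrete with some made-up counts (these numbers are invented for illustration, not from the video):

```python
# Hypothetical counts from an imaginary inbox of 1,000 emails.
total_emails = 1000
nigerian_prince = 50         # emails containing "Nigerian Prince"
spam_and_prince = 45         # of those, emails that are also spam

# P(spam | "Nigerian Prince") = P(spam AND "Nigerian Prince") / P("Nigerian Prince")
p_prince = nigerian_prince / total_emails           # 0.05
p_spam_and_prince = spam_and_prince / total_emails  # 0.045

p_spam_given_prince = p_spam_and_prince / p_prince
print(f"P(spam | 'Nigerian Prince') = {p_spam_given_prince:.2f}")  # 0.90
```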
And actually, some email servers use a slightly more complex version of this example to filter spam. These filters are called Naive Bayes filters, and thanks to them, you don’t have to worry about seeing the desperate pleas of a surprisingly large number of Nigerian Princes.
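Here’s a minimal sketch of the idea behind such a filter. The word frequencies are invented for illustration; a real filter would learn them from thousands of labeled emails and use many more words:

```python
import math

# Hypothetical per-word frequencies a filter might have learned.
p_word_given_spam = {"nigerian": 0.20, "prince": 0.15, "meeting": 0.01}
p_word_given_ham  = {"nigerian": 0.001, "prince": 0.002, "meeting": 0.05}
p_spam = 0.5  # prior probability that any incoming email is spam

def spam_probability(words):
    # "Naive" = assume words are independent given the class, so their
    # likelihoods just multiply. Work in log space for numerical stability.
    log_spam = math.log(p_spam)
    log_ham = math.log(1 - p_spam)
    for w in words:
        if w in p_word_given_spam:
            log_spam += math.log(p_word_given_spam[w])
            log_ham += math.log(p_word_given_ham[w])
    # Bayes' theorem: normalize the two joint probabilities.
    spam, ham = math.exp(log_spam), math.exp(log_ham)
    return spam / (spam + ham)

print(spam_probability(["nigerian", "prince"]))  # close to 1: likely spam
print(spam_probability(["meeting"]))             # much lower: likely Aunt Judy
```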
The Bayes in Naive Bayes comes from the Reverend Thomas Bayes, a Presbyterian minister who broke up his days of prayer with math. His largest contribution to the field of math and statistics is a slightly expanded version of our conditional probability formula.
Bayes’ Theorem states that the probability of B given A is equal to the probability of A given B, times the probability of B, all divided by the probability of A.

You can see that this is just one step away from our conditional probability formula. The only change is in the numerator, where P(A and B) is replaced with P(A|B)P(B). While the math of this equality is more than we’ll go into here, you can see with some Venn-diagram algebra why this is the case--the substitution is sketched below.
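Starting from the conditional probability formula and rewriting the joint probability gives:

\[
P(A \text{ and } B) = P(A \mid B)\,P(B)
\qquad\Longrightarrow\qquad
P(B \mid A) = \frac{P(A \text{ and } B)}{P(A)} = \frac{P(A \mid B)\,P(B)}{P(A)}
\]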
In this form, the equation is known as Bayes’ Theorem, and it has inspired a strong movement in both the statistics and science worlds.

Just like with your emails, Bayes’ Theorem allows us to figure out the probability that you have a piece of spam on your hands, using information that we already have: the presence of the words “Nigerian Prince.” We can also compare that probability to the probability that you just got a perfectly valid email about Nigerian Princes. If you just tried to guess your odds of an email being spam based on the rate of spam to non-spam email, you’d be missing some pretty useful information--the actual words in the email!
Bayesian statistics is all about UPDATING your beliefs based on new information. When you receive an email, you don’t necessarily think it’s spam, but once you see the word “Nigerian” you’re suspicious. It may just be your Aunt Judy telling you what she saw on the news, but as soon as you see “Nigerian” and “Prince” together, you’re pretty convinced that this is junk mail.
Remember our Lady Tasting Tea example, where a woman claimed to have superior taste buds that allowed her to know--with one sip--whether tea or milk was poured into a cup first? When you’re watching this lady predict whether the tea or milk was poured first, each correct guess makes you believe her just a little bit more. A few correct guesses may not convince you, but each correct prediction is a little more evidence that she has some weird super-tasting tea powers.
Reverend Bayes described this idea of “updating” in a thought experiment. Say that you’re standing next to a pool table, but you’re facing away from it, so you can’t see anything on it. You then have your friend randomly drop a ball onto the table--and this is a special, very even table, so the ball has an equal chance of landing anywhere on it. Your mission is to guess how far to the right or left this ball is.

You have your friend drop another ball onto the table and report whether it’s to the left or to the right of the original ball. The new ball is to the right of the original, so we can update our belief about where the original ball is.
If the original is more towards the left, then most of the new balls will fall to the right of our original, just because there’s more area there. And the further to the left it is, the higher the ratio of new rights to lefts. Since this new ball is to the right, there’s a better chance that our original is more toward the left side of the table than the right, since there would be more “room” for the new ball to land.
Each ball that lands to the right of the original is more evidence that our original is towards the left of the table. But if we get a ball landing to the left of our original, then we know the original is not at the very left edge. Again, each new piece of information allows us to change our beliefs about the location of the ball, and changing beliefs is what Bayesian statistics is all about.
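Here’s one way to simulate that thought experiment. This is a sketch: the one-dimensional table on the interval [0, 1] and the grid of candidate positions are simplifications I’m assuming, not details from the video:

```python
import random

random.seed(8)

# The table's left-right axis is the interval [0, 1] (a simplification).
true_position = random.random()  # where the original ball landed; unseen

# Our belief: a uniform prior over 101 candidate positions.
grid = [i / 100 for i in range(101)]
belief = [1 / len(grid)] * len(grid)

for drop in range(20):
    new_ball = random.random()
    landed_right = new_ball > true_position  # the friend's report
    # Likelihood of that report at each candidate position p:
    # P(right | p) = 1 - p (the area to the right); P(left | p) = p.
    likelihood = [(1 - p) if landed_right else p for p in grid]
    belief = [b * l for b, l in zip(belief, likelihood)]
    total = sum(belief)
    belief = [b / total for b in belief]  # renormalize: Bayes' rule

estimate = sum(p * b for p, b in zip(grid, belief))
print(f"true position: {true_position:.2f}, posterior mean: {estimate:.2f}")
```

With each drop, the belief sharpens around the true position, which is exactly the “updating” the thought experiment describes.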
Outside thought experiments, Bayesian statistics is being used in many different ways, from comparing treatments in medical trials to helping robots learn language. It’s being used by cancer researchers, ecologists, and physicists.

And this method of thinking about statistics--updating existing information with what’s come before--may be different from the logic of some of the statistical tests that you’ve heard of, like the t-test. Those Frequentist statistics can sometimes be more like probability done in a vacuum: less reliant on prior knowledge.
When the math of probability gets hard to wrap your head around, we can use simulations to help see these rules in action. Simulations take rules and create a pretend universe that follows those rules.
Let’s say you’re the boss of a company, and you receive news that one of your employees, Joe, has failed a drug test. It’s hard to believe. You remember seeing this thing on YouTube that told you how to figure out the probability that Joe really is on drugs, given that he got a positive test. You can’t remember exactly what the formula is... but you could always run a simulation. Simulations are nice because we can just tell our computer some rules, and it will randomly generate data based on those rules.
For example, we can tell it the base rate of people in our state who are on drugs, the sensitivity of the drug test (how often actual users test positive), and the specificity (how often non-users test negative). Then we ask our computer to generate 10,000 simulated people and tell us what percent of the time people with positive drug tests were actually on drugs.

If the drug Joe tested positive for--in this case, Glitterstim--is only used by about 5% of the population, and the test for Glitterstim has 90% sensitivity and 95% specificity, I can plug that in and ask the computer to simulate 10,000 people according to these rules.
And when we ran this simulation, only 49.2% of the people who tested positive were actually using Glitterstim. So I should probably give Joe another chance... or another test. And if I did the math, I’d see that 49.2% is pretty close, since the theoretical answer is around 48.6%. Simulations can help reveal truths about probability, even without formulas.
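The video doesn’t show its code, but a simulation along the lines described might look like this (the exact percentage you get will vary a bit from run to run):

```python
import random

random.seed(14)            # fixed so the run is repeatable
N = 10_000                 # simulated people
base_rate = 0.05           # 5% of the population uses Glitterstim
sensitivity = 0.90         # P(positive test | user)
specificity = 0.95         # P(negative test | non-user)

positives = 0
true_positives = 0
for _ in range(N):
    user = random.random() < base_rate
    if user:
        positive = random.random() < sensitivity
    else:
        positive = random.random() < (1 - specificity)  # false positive
    if positive:
        positives += 1
        if user:
            true_positives += 1

print(f"simulated P(user | positive) = {true_positives / positives:.3f}")

# The exact answer via Bayes' theorem, for comparison (~0.486):
p_positive = base_rate * sensitivity + (1 - base_rate) * (1 - specificity)
print(f"theoretical: {base_rate * sensitivity / p_positive:.3f}")
```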
They’re a great way to demonstrate probability and create intuition that can stand alone or build on top of more mathematical approaches to probability. Let’s use one to demonstrate an important concept in probability that makes it possible to use samples of data to make inferences about a population: the Law of Large Numbers.
In fact, we were secretly relying on it when we used empirical probabilities--like how many times I got tails when flipping a coin 10 times--to estimate theoretical probabilities--like the true probability of getting tails. In its weak form, the Law of Large Numbers tells us that as our samples of data get bigger and bigger, our sample mean will be ‘arbitrarily’ close to the true population mean.
Before we go into more detail, let’s see a simulation, and if you want to follow along or run it on your own, instructions are in the description below.

In this simulation we’re picking values of a new intelligence test from a normal distribution that has a mean of 50 and a standard deviation of 20. When you have a very small sample size, say 2, your sample means are all over the place. You can see that pretty much anything goes; we see means between 5 and 95. And this makes sense: when we only have two data points in our sample, it’s not that unlikely that we get two really small numbers, or two pretty big numbers, which is why we see both low and high sample means. Though we can tell that a lot of the means are around the true mean of 50, because the histogram is tallest at values around 50.
But once we increase the sample size, even to just 100 values, you can see that the sample means are mostly around the real mean of 50. In fact, all of the sample means are within 10 units of the true population mean. And when we go up to 1,000, just about every sample mean is very, very close to the true mean. And when you run this simulation over and over, you’ll see pretty similar results.
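The Crash Course code itself isn’t reproduced here, but a minimal version of the same experiment could be:

```python
import random
import statistics

random.seed(42)  # fixed seed so results are repeatable

# Draw sample means from a normal(mean=50, sd=20) "intelligence test".
def sample_means(n, trials=1000):
    return [statistics.fmean(random.gauss(50, 20) for _ in range(n))
            for _ in range(trials)]

for n in (2, 100, 1000):
    means = sample_means(n)
    print(f"n={n:>4}: sample means range from "
          f"{min(means):5.1f} to {max(means):5.1f}")
# As n grows, the min and max close in on the true mean of 50.
```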
The neat thing is that the Law of Large Numbers applies to almost any distribution, as long as the distribution doesn’t have infinite variance.

Take the uniform distribution, which looks like a rectangle--imagine a 100-sided die, where every single value is equally probable. Even sample means that are selected from a uniform distribution get closer and closer to the true mean of 50.
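The same check for the uniform case, assuming a continuous uniform distribution from 0 to 100 so that the true mean is exactly 50:

```python
import random
import statistics

random.seed(7)

# Uniform from 0 to 100: every value equally likely, true mean 50.
# (A literal die numbered 1..100 would have mean 50.5; the continuous
# version keeps the true mean at exactly 50, matching the narration.)
for n in (2, 100, 10_000):
    values = [random.uniform(0, 100) for _ in range(n)]
    print(f"n={n:>6}: sample mean = {statistics.fmean(values):.2f}")
# Bigger samples give sample means closer and closer to 50.
```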
The Law of Large Numbers is the evidence we need to feel confident that the mean of the samples we analyze is a pretty good guess for the true population mean. And the bigger our samples are, the better we think the guess is! This property allows us to make guesses about populations based on samples.
It also explains why casinos make money in the long run over hundreds of thousands of payouts and losses, even if the experience of each person varies a lot. The casino looks at a huge sample--every single bet and payout--whereas your sample as an individual is smaller, and therefore less likely to be representative.
Each of these concepts gives us another way to look at the data around us. The Bayesian framework shows us that every event or data point can and should “update” your beliefs, but it doesn’t mean you need to completely change your mind. And simulations allow us to build upon these observations when the underlying mechanics aren’t so clear.
We are continuously accumulating evidence and modifying our beliefs every day, adding today’s events to our conception of how the world works. And hey, maybe one day we’ll all start sincerely emailing each other about Nigerian Princes. Then we’re gonna have to do some belief-updating. Thanks for watching. I’ll see you next time.