Probability Part 2: Updating Your Beliefs with Bayes: Crash Course Statistics #14

Channel: CrashCourse

Hi, I’m Adriene Hill, and welcome back to Crash Course Statistics. We ended the last episode by talking about conditional probabilities, which helped us find the probability of one event, given that a second event had already happened. But now I want to give you a better idea of why this is true and how this formula--with a few small tweaks--has revolutionized the field of statistics.
INTRO
In general terms, conditional probability says that the probability of an event, B, given that event A has already happened, is the probability of A and B happening together, divided by the probability of A happening. That’s the general formula, written in symbols below, but let’s give you a concrete example so we can visualize it.
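In symbols, that general formula reads:

\[
P(B \mid A) = \frac{P(A \text{ and } B)}{P(A)}
\]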
Here’s a Venn diagram of two events: an email containing the words “Nigerian Prince,” and an email being spam. So I get an email that has the words “Nigerian Prince” in it, and I want to know the probability that this email is spam, given that I already know the email contains the words “Nigerian Prince.” This is the equation.

Alright, let’s take this apart a little. On the Venn diagram, I can represent the fact that I know the words “Nigerian Prince” already happened by only looking at the events where “Nigerian Prince” occurs--so just this circle.
Now inside this circle I have two areas: areas where the email is spam, and areas where it’s not. According to our formula, the probability of spam given “Nigerian Prince” is the probability of spam AND “Nigerian Prince”--which is this region where they overlap--divided by the probability of “Nigerian Prince,” which is the whole circle that we’re looking at.

Now, if we want to know the proportion of times when an email is spam given that we already know it has the words “Nigerian Prince,” we need to look at how much of the whole “Nigerian Prince” circle is covered by the region with both spam and “Nigerian Prince.”
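To make that ratio concrete with some made-up counts (these numbers are invented for illustration, not from the video):

```python
# Hypothetical counts from an imaginary inbox of 1,000 emails.
total_emails = 1000
nigerian_prince = 50         # emails containing "Nigerian Prince"
spam_and_prince = 45         # of those, emails that are also spam

# P(spam | "Nigerian Prince") = P(spam AND "Nigerian Prince") / P("Nigerian Prince")
p_prince = nigerian_prince / total_emails           # 0.05
p_spam_and_prince = spam_and_prince / total_emails  # 0.045

p_spam_given_prince = p_spam_and_prince / p_prince
print(f"P(spam | 'Nigerian Prince') = {p_spam_given_prince:.2f}")  # 0.90
```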
And actually, some email servers use a slightly more complex version of this example to filter spam. These filters are called Naive Bayes filters, and thanks to them, you don’t have to worry about seeing the desperate pleas of a surprisingly large number of Nigerian Princes.
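Here’s a minimal sketch of the idea behind such a filter. The word frequencies are invented for illustration; a real filter would learn them from thousands of labeled emails and use many more words:

```python
import math

# Hypothetical per-word frequencies a filter might have learned.
p_word_given_spam = {"nigerian": 0.20, "prince": 0.15, "meeting": 0.01}
p_word_given_ham  = {"nigerian": 0.001, "prince": 0.002, "meeting": 0.05}
p_spam = 0.5  # prior probability that any incoming email is spam

def spam_probability(words):
    # "Naive" = assume words are independent given the class, so their
    # likelihoods just multiply. Work in log space for numerical stability.
    log_spam = math.log(p_spam)
    log_ham = math.log(1 - p_spam)
    for w in words:
        if w in p_word_given_spam:
            log_spam += math.log(p_word_given_spam[w])
            log_ham += math.log(p_word_given_ham[w])
    # Bayes' theorem: normalize the two joint probabilities.
    spam, ham = math.exp(log_spam), math.exp(log_ham)
    return spam / (spam + ham)

print(spam_probability(["nigerian", "prince"]))  # close to 1: likely spam
print(spam_probability(["meeting"]))             # much lower: likely Aunt Judy
```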
The Bayes in Naive Bayes comes from the Reverend Thomas Bayes, a Presbyterian minister who broke up his days of prayer with math. His largest contribution to the field of math and statistics is a slightly expanded version of our conditional probability formula.
Bayes’ Theorem states that the probability of B given A is equal to the probability of A given B, times the probability of B, all divided by the probability of A.

You can see that this is just one step away from our conditional probability formula. The only change is in the numerator, where P(A and B) is replaced with P(A|B)P(B). While the math of this equality is more than we’ll go into here, you can see with some Venn-diagram algebra why this is the case--the substitution is sketched below.
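Starting from the conditional probability formula and rewriting the joint probability gives:

\[
P(A \text{ and } B) = P(A \mid B)\,P(B)
\qquad\Longrightarrow\qquad
P(B \mid A) = \frac{P(A \text{ and } B)}{P(A)} = \frac{P(A \mid B)\,P(B)}{P(A)}
\]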
In this form, the equation is known as Bayes’ Theorem, and it has inspired a strong movement in both the statistics and science worlds.

Just like with your emails, Bayes’ Theorem allows us to figure out the probability that you have a piece of spam on your hands, using information that we already have: the presence of the words “Nigerian Prince.” We can also compare that probability to the probability that you just got a perfectly valid email about Nigerian Princes. If you just tried to guess your odds of an email being spam based on the rate of spam to non-spam email, you’d be missing some pretty useful information--the actual words in the email!
Bayesian statistics is all about UPDATING your beliefs based on new information. When you receive an email, you don’t necessarily think it’s spam, but once you see the word “Nigerian” you’re suspicious. It may just be your Aunt Judy telling you what she saw on the news, but as soon as you see “Nigerian” and “Prince” together, you’re pretty convinced that this is junk mail.
Remember our Lady Tasting Tea example, where a woman claimed to have superior taste buds that allowed her to know--with one sip--whether tea or milk was poured into a cup first? When you’re watching this lady predict whether the tea or milk was poured first, each correct guess makes you believe her just a little bit more. A few correct guesses may not convince you, but each correct prediction is a little more evidence that she has some weird super-tasting tea powers.
Reverend Bayes described this idea of “updating” in a thought experiment. Say that you’re standing next to a pool table, but you’re facing away from it, so you can’t see anything on it. You then have your friend randomly drop a ball onto the table--and this is a special, very even table, so the ball has an equal chance of landing anywhere on it. Your mission is to guess how far to the right or left this ball is.

You have your friend drop another ball onto the table and report whether it’s to the left or to the right of the original ball. The new ball is to the right of the original, so we can update our belief about where the original ball is.
If the original is more towards the left, then most of the new balls will fall to the right of our original, just because there’s more area there. And the further to the left it is, the higher the ratio of new rights to lefts. Since this new ball is to the right, there’s a better chance that our original is more toward the left side of the table than the right, since there would be more “room” for the new ball to land.
Each ball that lands to the right of the original is more evidence that our original is towards the left of the table. But if we get a ball landing to the left of our original, then we know the original is not at the very left edge. Again, each new piece of information allows us to change our beliefs about the location of the ball, and changing beliefs is what Bayesian statistics is all about.
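Here’s one way to simulate that thought experiment. This is a sketch: the one-dimensional table on the interval [0, 1] and the grid of candidate positions are simplifications I’m assuming, not details from the video:

```python
import random

random.seed(8)

# The table's left-right axis is the interval [0, 1] (a simplification).
true_position = random.random()  # where the original ball landed; unseen

# Our belief: a uniform prior over 101 candidate positions.
grid = [i / 100 for i in range(101)]
belief = [1 / len(grid)] * len(grid)

for drop in range(20):
    new_ball = random.random()
    landed_right = new_ball > true_position  # the friend's report
    # Likelihood of that report at each candidate position p:
    # P(right | p) = 1 - p (the area to the right); P(left | p) = p.
    likelihood = [(1 - p) if landed_right else p for p in grid]
    belief = [b * l for b, l in zip(belief, likelihood)]
    total = sum(belief)
    belief = [b / total for b in belief]  # renormalize: Bayes' rule

estimate = sum(p * b for p, b in zip(grid, belief))
print(f"true position: {true_position:.2f}, posterior mean: {estimate:.2f}")
```

With each drop, the belief sharpens around the true position, which is exactly the “updating” the thought experiment describes.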
Outside thought experiments, Bayesian statistics is being used in many different ways, from comparing treatments in medical trials to helping robots learn language. It’s being used by cancer researchers, ecologists, and physicists.

And this method of thinking about statistics--updating existing information with what’s come before--may be different from the logic of some of the statistical tests that you’ve heard of, like the t-test. Those Frequentist statistics can sometimes be more like probability done in a vacuum: less reliant on prior knowledge.
When the math of probability gets hard to wrap your head around, we can use simulations to help see these rules in action. Simulations take rules and create a pretend universe that follows those rules.
Let’s say you’re the boss of a company, and you receive news that one of your employees, Joe, has failed a drug test. It’s hard to believe. You remember seeing this thing on YouTube that told you how to figure out the probability that Joe really is on drugs, given that he got a positive test. You can’t remember exactly what the formula is... but you could always run a simulation. Simulations are nice because we can just tell our computer some rules, and it will randomly generate data based on those rules.
For example, we can tell it the base rate of people in our state who are on drugs, the sensitivity of the drug test (how often actual users test positive), and the specificity (how often non-users test negative). Then we ask our computer to generate 10,000 simulated people and tell us what percent of the time people with positive drug tests were actually on drugs.

If the drug Joe tested positive for--in this case, Glitterstim--is only used by about 5% of the population, and the test for Glitterstim has 90% sensitivity and 95% specificity, I can plug that in and ask the computer to simulate 10,000 people according to these rules.
And when we ran this simulation, only 49.2% of the people who tested positive were actually using Glitterstim. So I should probably give Joe another chance... or another test. And if I did the math, I’d see that 49.2% is pretty close, since the theoretical answer is around 48.6%. Simulations can help reveal truths about probability, even without formulas.
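The video doesn’t show its code, but a simulation along the lines described might look like this (the exact percentage you get will vary a bit from run to run):

```python
import random

random.seed(14)            # fixed so the run is repeatable
N = 10_000                 # simulated people
base_rate = 0.05           # 5% of the population uses Glitterstim
sensitivity = 0.90         # P(positive test | user)
specificity = 0.95         # P(negative test | non-user)

positives = 0
true_positives = 0
for _ in range(N):
    user = random.random() < base_rate
    if user:
        positive = random.random() < sensitivity
    else:
        positive = random.random() < (1 - specificity)  # false positive
    if positive:
        positives += 1
        if user:
            true_positives += 1

print(f"simulated P(user | positive) = {true_positives / positives:.3f}")

# The exact answer via Bayes' theorem, for comparison (~0.486):
p_positive = base_rate * sensitivity + (1 - base_rate) * (1 - specificity)
print(f"theoretical: {base_rate * sensitivity / p_positive:.3f}")
```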
They’re a great way to demonstrate probability and create intuition that can stand alone or build on top of more mathematical approaches to probability. Let’s use one to demonstrate an important concept in probability that makes it possible to use samples of data to make inferences about a population: the Law of Large Numbers.
In fact, we were secretly relying on it when we used empirical probabilities--like how many times I got tails when flipping a coin 10 times--to estimate theoretical probabilities--like the true probability of getting tails. In its weak form, the Law of Large Numbers tells us that as our samples of data get bigger and bigger, our sample mean will be ‘arbitrarily’ close to the true population mean.
Before we go into more detail, let’s see a simulation, and if you want to follow along or run it on your own, instructions are in the description below.

In this simulation we’re picking values of a new intelligence test from a normal distribution that has a mean of 50 and a standard deviation of 20. When you have a very small sample size, say 2, your sample means are all over the place. You can see that pretty much anything goes; we see means between 5 and 95. And this makes sense: when we only have two data points in our sample, it’s not that unlikely that we get two really small numbers, or two pretty big numbers, which is why we see both low and high sample means. Though we can tell that a lot of the means are around the true mean of 50, because the histogram is tallest at values around 50.
But once we increase the sample size, even to just 100 values, you can see that the sample means are mostly around the real mean of 50. In fact, all of the sample means are within 10 units of the true population mean. And when we go up to 1,000, just about every sample mean is very, very close to the true mean. And when you run this simulation over and over, you’ll see pretty similar results.
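The Crash Course code itself isn’t reproduced here, but a minimal version of the same experiment could be:

```python
import random
import statistics

random.seed(42)  # fixed seed so results are repeatable

# Draw sample means from a normal(mean=50, sd=20) "intelligence test".
def sample_means(n, trials=1000):
    return [statistics.fmean(random.gauss(50, 20) for _ in range(n))
            for _ in range(trials)]

for n in (2, 100, 1000):
    means = sample_means(n)
    print(f"n={n:>4}: sample means range from "
          f"{min(means):5.1f} to {max(means):5.1f}")
# As n grows, the min and max close in on the true mean of 50.
```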
The neat thing is that the Law of Large Numbers applies to almost any distribution, as long as the distribution doesn’t have infinite variance.

Take the uniform distribution, which looks like a rectangle--imagine a 100-sided die, where every single value is equally probable. Even sample means that are selected from a uniform distribution get closer and closer to the true mean of 50.
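The same check for the uniform case, assuming a continuous uniform distribution from 0 to 100 so that the true mean is exactly 50:

```python
import random
import statistics

random.seed(7)

# Uniform from 0 to 100: every value equally likely, true mean 50.
# (A literal die numbered 1..100 would have mean 50.5; the continuous
# version keeps the true mean at exactly 50, matching the narration.)
for n in (2, 100, 10_000):
    values = [random.uniform(0, 100) for _ in range(n)]
    print(f"n={n:>6}: sample mean = {statistics.fmean(values):.2f}")
# Bigger samples give sample means closer and closer to 50.
```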
The Law of Large Numbers is the evidence we need to feel confident that the mean of the samples we analyze is a pretty good guess for the true population mean. And the bigger our samples are, the better we think the guess is! This property allows us to make guesses about populations based on samples.
It also explains why casinos make money in the long run over hundreds of thousands of payouts and losses, even if the experience of each person varies a lot. The casino looks at a huge sample--every single bet and payout--whereas your sample as an individual is smaller, and therefore less likely to be representative.
Each of these concepts gives us another way to look at the data around us. The Bayesian framework shows us that every event or data point can and should “update” your beliefs, but it doesn’t mean you need to completely change your mind. And simulations allow us to build upon these observations when the underlying mechanics aren’t so clear.
We are continuously accumulating evidence and modifying our beliefs every day, adding today’s events to our conception of how the world works. And hey, maybe one day we’ll all start sincerely emailing each other about Nigerian Princes. Then we’re gonna have to do some belief-updating. Thanks for watching. I’ll see you next time.