Probability Part 2: Updating Your Beliefs with Bayes: Crash Course Statistics #14 - YouTube
Channel: CrashCourse
Hi, I'm Adriene Hill, and welcome back to Crash Course Statistics. We ended the last episode by talking about conditional probabilities, which helped us find the probability of one event, given that a second event had already happened.

But now I want to give you a better idea of why this is true and how this formula--with a few small tweaks--has revolutionized the field of statistics.
INTRO
In general terms, conditional probability says that the probability of an event, B, given that event A has already happened, is the probability of A and B happening together, divided by the probability of A happening. That's the general formula, but let's give you a concrete example so we can visualize it.
Here's a Venn diagram of two events: an email containing the words "Nigerian Prince" and an email being spam.

So I get an email that has the words "Nigerian Prince" in it, and I want to know what the probability is that this email is spam, given that I already know the email contains the words "Nigerian Prince." This is the equation.

Alright, let's take this apart a little. On the Venn diagram, I can represent the fact that I know the words "Nigerian Prince" already happened by only looking at the events where "Nigerian Prince" occurs--so just this circle.
Now inside this circle I have two areas: areas where the email is spam, and areas where it's not. According to our formula, the probability of spam given "Nigerian Prince" is the probability of spam AND "Nigerian Prince"--which is this region where they overlap--divided by the probability of "Nigerian Prince," which is the whole circle that we're looking at.

Now, if we want to know the proportion of times when an email is spam given that we already know it has the words "Nigerian Prince," we need to look at how much of the whole "Nigerian Prince" circle is covered by the region with both spam and "Nigerian Prince."
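That proportion is easy to compute once you count emails. Here's a minimal sketch in Python, with all counts made up purely for illustration:

```python
# Hypothetical email counts -- these numbers are invented for illustration.
n_prince = 60           # emails containing "Nigerian Prince" (the whole circle)
n_spam_and_prince = 50  # spam emails containing "Nigerian Prince" (the overlap)

# P(spam | "Nigerian Prince") = overlap / whole circle
p_spam_given_prince = n_spam_and_prince / n_prince
print(p_spam_given_prince)  # about 0.83
```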
And actually, some email servers use a slightly more complex version of this example to filter spam. These filters are called Naive Bayes filters, and thanks to them, you don't have to worry about seeing the desperate pleas of a surprisingly large number of Nigerian Princes.

The Bayes in Naive Bayes comes from the Reverend Thomas Bayes, a Presbyterian minister who broke up his days of prayer with math. His largest contribution to the field of math and statistics is a slightly expanded version of our conditional probability formula.
Bayes' theorem states that:

The probability of B given A is equal to the probability of A given B, times the probability of B, all divided by the probability of A.

You can see that this is just one step away from our conditional probability formula. The only change is in the numerator, where P(A and B) is replaced with P(A|B)P(B). While the math of this equality is more than we'll go into here, you can see with some Venn-diagram algebra why this is the case.
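To see that the two forms agree, here's a quick numerical check in Python; all the email counts are hypothetical, chosen only to make the arithmetic concrete:

```python
# Hypothetical counts out of 1,000 emails -- invented for illustration.
total = 1000
spam = 200  # spam emails
prince = 60  # emails containing "Nigerian Prince"
both = 50    # emails that are spam AND mention "Nigerian Prince"

p_spam = spam / total
p_prince = prince / total
p_both = both / total
p_prince_given_spam = both / spam

# Conditional probability form: P(spam | prince) = P(spam and prince) / P(prince)
direct = p_both / p_prince

# Bayes' theorem form: P(spam | prince) = P(prince | spam) * P(spam) / P(prince)
bayes = p_prince_given_spam * p_spam / p_prince

print(direct, bayes)  # the same answer either way: about 0.83
```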
In this form, the equation is known as Bayes' theorem, and it has inspired a strong movement in both the statistics and science worlds.

Just like with your emails, Bayes' theorem allows us to figure out the probability that you have a piece of spam on your hands using information that we already have: the presence of the words "Nigerian Prince."
We can also compare that probability to the probability that you just got a perfectly valid email about Nigerian Princes. If you just tried to guess your odds of an email being spam based on the rate of spam to non-spam email, you'd be missing some pretty useful information--the actual words in the email!

Bayesian statistics is all about UPDATING your beliefs based on new information. When you receive an email, you don't necessarily think it's spam, but once you see the word "Nigerian" you're suspicious. It may just be your Aunt Judy telling you what she saw on the news, but as soon as you see "Nigerian" and "Prince" together, you're pretty convinced that this is junk mail.
Remember our Lady Tasting Tea example, where a woman claimed to have superior taste buds that allowed her to know--with one sip--whether tea or milk was poured into a cup first? When you're watching this lady predict whether the tea or milk was poured first, each correct guess makes you believe her just a little bit more.

A few correct guesses may not convince you, but each correct prediction is a little more evidence that she has some weird super-tasting tea powers.

Reverend Bayes described this idea of "updating" in a thought experiment.
Say that you're standing next to a pool table but facing away from it, so you can't see anything on it. You then have your friend randomly drop a ball onto the table, and this is a special, very even table, so the ball has an equal chance of landing anywhere on it. Your mission is to guess how far to the right or left this ball is.

You have your friend drop another ball onto the table and report whether it's to the left or to the right of the original ball. The new ball is to the right of the original, so we can update our belief about where the ball is.

If the original is more towards the left, then most of the new balls will fall to the right of our original, just because there's more area there. And the further to the left it is, the higher the ratio of new rights to lefts.

Since this new ball is to the right, that means there's a better chance that our original is more toward the left side of the table than the right, since there would be more "room" for the new ball to land.
Each ball that lands to the right of the original is more evidence that our original is towards the left of the table. But if we get a ball landing to the left of our original, then we know the original is not at the very left edge. Again, each new piece of information allows us to change our beliefs about the location of the ball, and changing beliefs is what Bayesian statistics is all about.
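The thought experiment can be run as code. The sketch below is my own discretized version, assuming a table whose width runs from 0 (left edge) to 1 (right edge): we keep a probability for each candidate position and update it after every left-or-right report.

```python
import random

random.seed(1)  # fixed seed so the run is reproducible

true_x = random.random()  # hidden left-right position of the original ball

# Uniform prior over 99 candidate positions across the table.
grid = [i / 100 for i in range(1, 100)]
prior = [1 / len(grid)] * len(grid)

for _ in range(50):  # your friend drops 50 more balls
    obs_right = random.random() > true_x  # did this ball land right of the original?
    # Likelihood of the report at each candidate position x:
    # P(right | x) = 1 - x (the "room" to the right), P(left | x) = x
    likelihood = [(1 - x) if obs_right else x for x in grid]
    posterior = [p * l for p, l in zip(prior, likelihood)]
    total = sum(posterior)
    prior = [p / total for p in posterior]  # today's posterior is tomorrow's prior

# Posterior mean: our updated best guess at the ball's position
estimate = sum(x * p for x, p in zip(grid, prior))
print(f"true position {true_x:.2f}, estimate {estimate:.2f}")
```

Each left-or-right report nudges the probabilities, exactly as each report in the story nudges your belief.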
Outside thought experiments, Bayesian statistics is being used in many different ways, from comparing treatments in medical trials to helping robots learn language. It's being used by cancer researchers, ecologists, and physicists.

And this method of thinking about statistics--updating what's come before with new information--may be different from the logic of some of the statistical tests that you've heard of, like the t-test. Those frequentist statistics can sometimes be more like probability done in a vacuum: less reliant on prior knowledge.
When the math of probability gets hard to wrap your head around, we can use simulations to help see these rules in action. Simulations take rules and create a pretend universe that follows those rules.
Let's say you're the boss of a company, and you receive news that one of your employees, Joe, has failed a drug test. It's hard to believe. You remember seeing this thing on YouTube that told you how to figure out the probability that Joe really is on drugs, given that he got a positive test.

You can't remember exactly what the formula is... but you could always run a simulation. Simulations are nice because we can just tell our computer some rules, and it will randomly generate data based on those rules.
For example, we can tell it the base rate of people in our state who are on drugs, the sensitivity (how many true positives we get) of the drug test, and the specificity (how many true negatives we get). Then we ask our computer to generate 10,000 simulated people and tell us what percent of the time people with positive drug tests were actually on drugs.

If the drug Joe tested positive for--in this case Glitterstim--is only used by about 5% of the population, and the test for Glitterstim has 90% sensitivity and 95% specificity, I can plug that in and ask the computer to simulate 10,000 people according to these rules.
And when we ran this simulation, only 49.2% of the people who tested positive were actually using Glitterstim. So I should probably give Joe another chance... or another test.

And if I did the math, I'd see that 49.2% is pretty close, since the theoretical answer is around 48.6%. Simulations can help reveal truths about probability, even without formulas. They're a great way to demonstrate probability and create intuition that can stand alone or build on top of more mathematical approaches to probability.
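Here's one way that simulation might look in Python, using the numbers from the example (5% base rate, 90% sensitivity, 95% specificity); the exact percentage will wobble a bit from run to run:

```python
import random

random.seed(42)  # fixed seed so the run is reproducible

BASE_RATE = 0.05    # 5% of the population uses Glitterstim
SENSITIVITY = 0.90  # P(test positive | user)
SPECIFICITY = 0.95  # P(test negative | non-user)

positives = 0
true_positives = 0
for _ in range(10_000):  # simulate 10,000 people
    user = random.random() < BASE_RATE
    if user:
        positive = random.random() < SENSITIVITY  # true positive
    else:
        positive = random.random() > SPECIFICITY  # false positive
    if positive:
        positives += 1
        true_positives += user

# Of everyone who tested positive, what fraction actually uses the drug?
print(true_positives / positives)  # should land near the theoretical ~48.6%
```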
Let's use one to demonstrate an important concept in probability that makes it possible to use samples of data to make inferences about a population: the Law of Large Numbers.

In fact, we were secretly relying on it when we used empirical probabilities--like how many times I got tails when flipping a coin 10 times--to estimate theoretical probabilities, like the true probability of getting tails.
In its weak form, the Law of Large Numbers tells us that as our samples of data get bigger and bigger, our sample mean will get arbitrarily close to the true population mean.

Before we go into more detail, let's see a simulation. If you want to follow along or run it on your own, instructions are in the description below.
In this simulation, we're picking values for a new intelligence test from a normal distribution that has a mean of 50 and a standard deviation of 20. When you have a very small sample size, say 2, your sample means are all over the place.

You can see that pretty much anything goes; we see means between 5 and 95. And this makes sense: when we only have two data points in our sample, it's not that unlikely that we get two really small numbers, or two pretty big numbers, which is why we see both low and high sample means. Though we can tell that a lot of the means are around the true mean of 50, because the histogram is tallest at values around 50.
But once we increase the sample size, even to just 100 values, you can see that the sample means are mostly around the real mean of 50. In fact, all of the sample means are within 10 units of the true population mean.

And when we go up to 1,000, just about every sample mean is very, very close to the true mean. And when you run this simulation over and over, you'll see pretty similar results.
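A sketch of that simulation in Python (the mean-50, SD-20 distribution is from the example; the number of repeated samples is my own choice):

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is reproducible

def sample_means(sample_size, n_samples=1000):
    """Draw n_samples samples of the given size from Normal(50, 20)
    and return the mean of each sample."""
    return [statistics.fmean(random.gauss(50, 20) for _ in range(sample_size))
            for _ in range(n_samples)]

for n in (2, 100, 1000):
    means = sample_means(n)
    print(f"sample size {n:>4}: means range from "
          f"{min(means):.1f} to {max(means):.1f}")
```

The printed ranges shrink as the sample size grows: sample means hug the true mean of 50 more and more tightly.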
The neat thing is that the Law of Large Numbers applies to almost any distribution, as long as the distribution doesn't have infinite variance.

Take the uniform distribution, which looks like a rectangle--imagine a 100-sided die, where every single value is equally probable. Even sample means that are selected from a uniform distribution get closer and closer to the true mean of 50.
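The same check works for a uniform distribution. Assuming values spread evenly over 0 to 100 (so the true mean is 50), a quick sketch:

```python
import random
import statistics

random.seed(7)  # fixed seed so the run is reproducible

# Sample means from Uniform(0, 100) creep toward the true mean of 50
# as the sample gets bigger.
for n in (10, 1_000, 100_000):
    mean = statistics.fmean(random.uniform(0, 100) for _ in range(n))
    print(f"n = {n:>6}: sample mean = {mean:.2f}")
```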
The Law of Large Numbers is the evidence we need to feel confident that the mean of the samples we analyze is a pretty good guess for the true population mean. And the bigger our samples are, the better we think the guess is! This property allows us to make guesses about populations based on samples.
It also explains why casinos make money in the long run, over hundreds of thousands of payouts and losses, even if the experience of each person varies a lot. The casino looks at a huge sample--every single bet and payout--whereas your sample as an individual is smaller, and therefore less likely to be representative.
Each of these concepts gives us another way to look at the data around us. The Bayesian framework shows us that every event or data point can and should "update" your beliefs, but that doesn't mean you need to completely change your mind.

And simulations allow us to build upon these observations when the underlying mechanics aren't so clear.
We are continuously accumulating evidence and modifying our beliefs every day, adding today's events to our conception of how the world works. And hey, maybe one day we'll all start sincerely emailing each other about Nigerian Princes. Then we're gonna have to do some belief-updating. Thanks for watching. I'll see you next time.