Bayes theorem, the geometry of changing beliefs - YouTube
Channel: 3Blue1Brown
The goal is for you to come away from this video understanding one of the most important formulas in all of probability, Bayes' theorem. This formula is central to scientific discovery, it's a core tool in machine learning and AI, and it's even been used for treasure hunting, when in the 80s a small team led by Tommy Thompson used Bayesian search tactics to help uncover a ship that had sunk a century and a half earlier carrying what, in today's terms, amounts to $700,000,000 worth of gold.

So it's a formula worth understanding.
[33]
But of course there are multiple possible levels of understanding. At the simplest, there's just knowing what each part means, so you can plug in numbers. Then there's understanding why it's true; later I'm going to show you a certain diagram that's helpful for rediscovering the formula on the fly as needed. Then there's being able to recognize when you need to use it.

With the goal of gaining a deeper understanding, you and I will tackle these in reverse order.
[60]
So before dissecting the formula, or explaining the visual that makes it obvious, I'd like to tell you about a man named Steve. Listen carefully.

Steve is very shy and withdrawn, invariably helpful but with very little interest in people or in the world of reality. A meek and tidy soul, he has a need for order and structure, and a passion for detail.

Which of the following do you find more likely: "Steve is a librarian", or "Steve is a farmer"?
[91]
Some of you may recognize this as an example from a study conducted by the psychologists Daniel Kahneman and Amos Tversky, whose Nobel-prize-winning work was popularized in books like "Thinking, Fast and Slow" and "The Undoing Project". They researched human judgments, with a frequent focus on when these judgments irrationally contradict what the laws of probability suggest they should be.

The example with Steve, the maybe-librarian-maybe-farmer, illustrates one specific type of irrationality. Or maybe I should say "alleged" irrationality; some people debate the conclusion, but more on all that in a moment.
[130]
According to Kahneman and Tversky, after people are given this description of Steve as a "meek and tidy soul", most say he is more likely to be a librarian than a farmer. After all, these traits line up better with the stereotypical view of a librarian than that of a farmer. And according to Kahneman and Tversky, this is irrational.

The point is not whether people hold correct or biased views about the personalities of librarians or farmers; it's that almost no one thinks to incorporate information about the ratio of farmers to librarians into their judgments. In their paper, Kahneman and Tversky said that in the US that ratio is about 20 to 1. The numbers I can find for today put it much higher than that, but let's just run with the 20-to-1 ratio, since it's a bit easier to illustrate and proves the point just as well.
[173]
To be clear, no one asked this question is expected to have perfect information on the actual statistics of farmers, librarians, and their personality traits. But the question is whether people even think to consider this ratio, enough to make a rough estimate. Rationality is not about knowing facts; it's about recognizing which facts are relevant.

If you do think to make this estimate, there's a pretty simple way to reason about the question, which, spoiler alert, involves all the essential reasoning behind Bayes' theorem.
[204]
You might start by picturing a representative sample of farmers and librarians, say, 200 farmers and 10 librarians. Then when you hear the meek and tidy soul description, let's say your gut instinct is that 40% of librarians would fit that description and that 10% of farmers would. That would mean that from your sample, you'd expect about 4 librarians to fit it, and 20 farmers. The probability that a random person who fits this description is a librarian is 4/24, or about 16.7%.
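The arithmetic above is short enough to sketch in a few lines of Python. Keep in mind the 200/10 split and the 40%/10% figures are the gut-instinct estimates from the example, not real statistics:

```python
# Representative sample reflecting the 20-to-1 ratio of farmers to librarians.
farmers = 200
librarians = 10

# Gut-instinct estimates of who fits the "meek and tidy soul" description.
librarians_fitting = 0.40 * librarians   # 4 librarians
farmers_fitting = 0.10 * farmers         # 20 farmers

# Probability that a random person fitting the description is a librarian.
p = librarians_fitting / (librarians_fitting + farmers_fitting)
print(round(p, 3))  # 0.167, i.e. 4/24
```

Changing the sample size doesn't change the answer, only the ratios matter, which is exactly the observation that leads to the formula later on.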
[240]
So even if you think a librarian is 4 times as likely as a farmer to fit this description, that's not enough to overcome the fact that there are way more farmers. The upshot, and this is the key mantra underlying Bayes' theorem, is that new evidence should not completely determine your beliefs in a vacuum; it should update prior beliefs.
[261]
If this line of reasoning makes sense to you, the way seeing evidence restricts the space of possibilities, and the ratio you need to consider after that, then congratulations! You understand the heart of Bayes' theorem. Maybe the numbers you'd estimate would be a little different, but what matters is how you fit the numbers together to update a belief based on evidence. Here, see if you can take a minute to generalize what we just did and write it down as a formula.
[292]
The general situation where Bayes' theorem is relevant is when you have some hypothesis, say that Steve is a librarian, and you see some evidence, say this verbal description of Steve as a "meek and tidy soul", and you want to know the probability that the hypothesis holds given that the evidence is true. In the standard notation, this vertical bar means "given that". As in, we're restricting our view only to the possibilities where the evidence holds.
[320]
The first relevant number is the probability that the hypothesis holds before considering the new evidence. In our example, that was the 1/21, which came from considering the ratio of farmers to librarians in the general population. This is known as the prior.

After that, we need to consider the proportion of librarians that fit this description; the probability we would see the evidence given that the hypothesis is true. Again, when you see this vertical bar, it means we're talking about a proportion of a limited part of the total space of possibilities, in this case, limited to the left side where the hypothesis holds. In the context of Bayes' theorem, this value also has a special name: it's the "likelihood".
[364]
Similarly, we need to know how much of the other side of our space includes the evidence; the probability of seeing the evidence given that our hypothesis isn't true. This little elbow symbol is commonly used to mean "not" in probability.
[380]
Now remember what our final answer was. The probability that our librarian hypothesis is true given the evidence is the total number of librarians fitting the evidence, 4, divided by the total number of people fitting the evidence, 24.

Where does that 4 come from? Well, it's the total number of people, times the prior probability of being a librarian, giving us the 10 total librarians, times the probability that one of those fits the evidence. That same number shows up again in the denominator, but we need to add in the total number of people, times the proportion who are not librarians, times the proportion of those who fit the evidence, which in our example gave 20.

The total number of people in our example, 210, gets canceled out, which of course it should; that was just an arbitrary choice we made for illustration. That leaves us finally with the more abstract representation purely in terms of probabilities:

P(H|E) = P(H) P(E|H) / (P(H) P(E|H) + P(¬H) P(E|¬H))

This, my friends, is Bayes' theorem.
[440]
You often see this big denominator written more simply as P(E), the total probability of seeing the evidence. In practice, to calculate it, you almost always have to break it down into the case where the hypothesis is true and the one where it isn't.

Piling on one final bit of jargon, this final answer is called the "posterior"; it's your belief about the hypothesis after seeing the evidence.
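The formula with its expanded denominator translates directly into code. As a sketch, using the Steve example's prior and gut-instinct likelihoods:

```python
def bayes(prior, like_h, like_not_h):
    """Posterior P(H|E) = P(H)P(E|H) / (P(H)P(E|H) + P(¬H)P(E|¬H))."""
    # P(E), the total probability of the evidence, broken into the two cases.
    p_evidence = prior * like_h + (1 - prior) * like_not_h
    return prior * like_h / p_evidence

# Steve: prior 1/21 of being a librarian, P(E|H) = 0.4, P(E|¬H) = 0.1.
posterior = bayes(1 / 21, 0.4, 0.1)
print(round(posterior, 3))  # 0.167, matching the 4/24 from the sample
```

Note this returns the same answer as counting 4 librarians out of 24 matching people; the sample size has canceled out, exactly as the derivation shows.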
[470]
Writing it all out abstractly might seem more complicated than just thinking through the example directly with a representative sample; and yeah, it is! Keep in mind, though, the value of a formula like this is that it lets you quantify and systematize the idea of changing beliefs. Scientists use this formula when analyzing the extent to which new data validates or invalidates their models; programmers use it in building artificial intelligence, where you sometimes want to explicitly and numerically model a machine's belief. And honestly, just for how you view yourself, your own opinions and what it takes for your mind to change, Bayes' theorem can reframe how you think about thought itself. Putting a formula to it is also all the more important as the examples get more intricate.
[517]
However you end up writing it, I'd actually encourage you not to memorize the formula, but to draw out this diagram as needed. This is sort of the distilled version of thinking with a representative sample, where we think with areas instead of counts, which is more flexible and easier to sketch on the fly.

Rather than bringing to mind some specific number of examples, think of the space of all possibilities as a 1x1 square. Any event occupies some subset of this space, and the probability of that event can be thought of as the area of that subset. For example, I like to think of the hypothesis as filling the left part of this square, with a width of P(H).

I recognize I'm being a bit repetitive, but when you see evidence, the space of possibilities gets restricted. Crucially, that restriction may not happen evenly between the left and the right. So the new probability for the hypothesis is the proportion it occupies in this restricted subspace.
[578]
If you happen to think a farmer is just as likely to fit the evidence as a librarian, then the proportion doesn't change, which should make sense. Irrelevant evidence doesn't change your belief. But when these likelihoods are very different, that's when your belief changes a lot.
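A minimal sketch of this area picture, assuming the same 1/21 prior from the Steve example: the posterior is just the hypothesis side's share of the restricted region.

```python
def restricted_proportion(p_h, like_h, like_not_h):
    """Posterior as a proportion of areas in the 1x1 square."""
    left_area = p_h * like_h               # evidence area on the hypothesis strip
    right_area = (1 - p_h) * like_not_h    # evidence area on the other side
    return left_area / (left_area + right_area)

p_h = 1 / 21
# Equal likelihoods: the restriction is even, so the belief is unchanged.
print(restricted_proportion(p_h, 0.3, 0.3))   # equal to p_h, about 0.0476
# Very different likelihoods shift the belief substantially.
print(restricted_proportion(p_h, 0.9, 0.05))
```

This is, of course, the same computation as Bayes' theorem; the point of the square is that the two rectangles are easy to draw and compare by eye.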
[595]
This is actually a good time to step back and consider a few broader takeaways about how to make probability more intuitive, beyond Bayes' theorem. First off, there's the trick of thinking about a representative sample with a specific number of examples, like our 210 librarians and farmers. There's actually another Kahneman and Tversky result to this effect, which is interesting enough to interject here.
[638]
They did an experiment similar to the one with Steve, but where people were given the following description of a fictitious woman named Linda:

Linda is 31 years old, single, outspoken, and very bright. She majored in philosophy. As a student, she was deeply concerned with issues of discrimination and social justice, and also participated in anti-nuclear demonstrations.

They were then asked what is more likely: that Linda is a bank teller, or that Linda is a bank teller and is active in the feminist movement. 85% of participants said the latter is more likely, even though the set of bank tellers active in the feminist movement is a subset of the set of bank tellers!
[684]
But what's fascinating is that there's a simple way to rephrase the question that dropped this error from 85% to 0. Instead, if participants are told there are 100 people who fit this description, and are asked to estimate how many of those 100 are bank tellers, and how many are bank tellers who are active in the feminist movement, no one makes the error. Everyone correctly assigns a higher number to the first option than to the second.

Somehow a phrase like "40 out of 100" kicks our intuition into gear more effectively than "40%", much less "0.4", or abstractly referencing the idea of something being more or less likely.
[729]
That said, representative samples don't easily capture the continuous nature of probability, so turning to area is a nice alternative, not just because of the continuity, but also because it's way easier to sketch out while you're puzzling over some problem.

You see, people often think of probability as being the study of uncertainty. While that is, of course, how it's applied in science, the actual math of probability is really just the math of proportions, where turning to geometry is exceedingly helpful.
[761]
I mean, if you look at Bayes' theorem as a statement about proportions, whether proportions of people, of areas, whatever, then once you digest what it's saying, it's actually kind of obvious. Both sides tell you to look at all the cases where the evidence is true, and consider the proportion where the hypothesis is also true. That's it. That's all it's saying.

What's noteworthy is that such a straightforward fact about proportions can become hugely significant for science, AI, and any situation where you want to quantify belief. You'll get a better glimpse of this as we get into more examples.
[801]
But before any more examples, we have some unfinished business with Steve. Some psychologists debate Kahneman and Tversky's conclusion that the rational thing to do is to bring to mind the ratio of farmers to librarians. They complain that the context is ambiguous. Who is Steve, exactly? Should you expect he's a randomly sampled American? Or would it be better to assume he's a friend of the two psychologists interrogating you? Or perhaps someone you're personally likely to know? This assumption determines the prior. I, for one, run into many more librarians in a given month than farmers. And needless to say, the probability of a librarian or a farmer fitting this description is highly open to interpretation.

But for our purposes, understanding the math, notice how any question worth debating here can be pictured in the context of the diagram. Questions of context shift around the prior, and questions of personalities and stereotypes shift the relevant likelihoods.
[861]
All that said, whether or not you buy this particular experiment, the ultimate point, that evidence should not determine beliefs but update them, is worth tattooing in your mind. I'm in no position to say whether this does or doesn't run against natural human intuition; we'll leave that to the psychologists. What's more interesting to me is how we can reprogram our intuitions to authentically reflect the implications of math, and bringing to mind the right image can often do just that.