Bayes theorem, the geometry of changing beliefs - YouTube

Channel: 3Blue1Brown

[0]
The goal is for you to come away from this video understanding one of the most important
[3]
formulas in all of probability, Bayes’ theorem.
[7]
This formula is central to scientific discovery, it’s a core tool in machine learning and
[12]
AI, and it’s even been used for treasure hunting, when in the ’80s a small team led
[17]
by Tommy Thompson used Bayesian search tactics to help uncover a ship that had sunk a century
[23]
and a half earlier carrying what, in today’s terms, amounts to $700,000,000 worth of gold.
[30]
So it's a formula worth understanding.
[33]
But of course there are multiple levels of possible understanding.
[37]
At the simplest there’s just knowing what each part means, so you can plug in numbers.
[42]
Then there’s understanding why it’s true; and later I’m gonna show you a certain diagram that’s helpful
[47]
for rediscovering the formula on the fly as needed.
[51]
Then there’s being able to recognize when you need to use it.
[56]
With the goal of gaining a deeper understanding, you and I will tackle these in reverse order.
[60]
So before dissecting the formula, or explaining the visual that makes it obvious, I’d like
[65]
to tell you about a man named Steve. Listen carefully.
[72]
Steve is very shy and withdrawn, invariably helpful but with very little interest in people
[78]
or in the world of reality. A meek and tidy soul, he has a need for order and structure,
[83]
and a passion for detail.
[85]
Which of the following do you find more likely: “Steve is a librarian”, or “Steve is
[89]
a farmer”?
[91]
Some of you may recognize this as an example from a study conducted by the psychologists
[95]
Daniel Kahneman and Amos Tversky, whose Nobel-prize-winning work was popularized in books like “Thinking
[103]
Fast and Slow” and “The Undoing Project”. They researched human
[108]
judgments, with a frequent focus on when these judgments irrationally contradict what the
[113]
laws of probability suggest they should be.
[116]
The example with Steve, the maybe-librarian-maybe-farmer, illustrates one specific type of irrationality.
[122]
Or maybe I should say “alleged” irrationality; some people debate the conclusion, but more
[127]
on all that in a moment.
[130]
According to Kahneman and Tversky, after people are given this description of Steve as a “meek
[134]
and tidy soul”, most say he is more likely to be a librarian than a farmer. After all,
[139]
these traits line up better with the stereotypical view of a librarian than that of a farmer.
[143]
And according to Kahneman and Tversky, this is irrational.
[147]
The point is not whether people hold correct or biased views about the personalities of
[151]
librarians or farmers, it’s that almost no one thinks to incorporate information about
[156]
the ratio of farmers to librarians into their judgments. In their paper, Kahneman and Tversky
[162]
said that in the US that ratio is about 20 to 1. The numbers I can find for today put
[167]
it much higher than that, but let’s just run with the 20 to 1 ratio since it’s a
[171]
bit easier to illustrate, and proves the point just as well.
[173]
To be clear, no one who is asked this question is expected to have perfect information on the
[179]
actual statistics of farmers, librarians, and their personality traits. But the question
[184]
is whether people even think to consider this ratio, enough to make a rough estimate. Rationality
[190]
is not about knowing facts, it’s about recognizing which facts are relevant.
[196]
If you do think to make this estimate, there’s a pretty simple way to reason about the question
[199]
– which, spoiler alert, involves all the essential reasoning behind Bayes’ theorem.
[204]
You might start by picturing a representative sample of farmers and librarians, say, 200
[209]
farmers and 10 librarians. Then when you hear the meek and tidy soul description, let’s
[215]
say your gut instinct is that 40% of librarians would fit that description and that 10% of
[220]
farmers would. That would mean that from your sample, you’d expect that about 4 librarians
[226]
fit it, and that 20 farmers do. The probability that a random person who fits this description
[235]
is a librarian is 4/24, or 16.7%.
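As a sanity check, the arithmetic of this representative sample can be sketched in a few lines of code. The 10/200 split and the 40%/10% gut estimates are the figures from the example above, not real statistics:

```python
# Representative sample from the example: 10 librarians, 200 farmers.
librarians, farmers = 10, 200

# Gut-instinct estimates: 40% of librarians and 10% of farmers
# fit the "meek and tidy soul" description.
librarians_fitting = 0.4 * librarians   # 4
farmers_fitting = 0.1 * farmers         # 20

# Among everyone fitting the description, what fraction are librarians?
p_librarian_given_description = librarians_fitting / (
    librarians_fitting + farmers_fitting
)
print(round(p_librarian_given_description, 3))  # 0.167, i.e. about 16.7%
```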
[240]
So even if you think a librarian is 4 times as likely as a farmer to fit this description,
[245]
that’s not enough to overcome the fact that there are way more farmers. The upshot, and
[250]
this is the key mantra underlying Bayes’ theorem, is that new evidence should not completely
[255]
determine your beliefs in a vacuum; it should update prior beliefs.
[261]
If this line of reasoning makes sense to you, the way seeing evidence restricts the space
[265]
of possibilities, and the ratio you need to consider after that, then congratulations! You understand the heart of Bayes’ theorem.
[273]
Maybe the numbers you’d estimate would be a little bit different, but what matters is how you fit
[277]
the numbers together to update a belief based on evidence. Here, see if you can take a minute
[285]
to generalize what we just did and write it down as a formula.
[292]
The general situation where Bayes’ theorem is relevant is when you have some hypothesis,
[296]
say that Steve is a librarian, and you see some evidence, say this verbal description
[302]
of Steve as a “meek and tidy soul”, and you want to know the probability that the
[306]
hypothesis holds given that the evidence is true. In the standard notation, this vertical
[312]
bar means “given that”. As in, we’re restricting our view only to the possibilities
[317]
where the evidence holds.
[320]
The first relevant number is the probability that the hypothesis holds before considering
[326]
the new evidence. In our example, that was the 1/21, which came from considering the
[331]
ratio of farmers to librarians in the general population. This is known as the prior.
[338]
After that, we needed to consider the proportion of librarians that fit this description; the
[342]
probability we would see the evidence given that the hypothesis is true. Again, when you
[348]
see this vertical bar, it means we’re talking about a proportion of a limited part of the
[353]
total space of possibilities, in this case, limited to the left side where the hypothesis
[358]
holds. In the context of Bayes’ theorem, this value also has a special name, it’s
[363]
the “likelihood”.
[364]
Similarly, we need to know how much of the other side of our space includes the evidence;
[369]
the probability of seeing the evidence given that our hypothesis isn’t true. This little
[375]
elbow symbol is commonly used to mean “not” in probability.
[380]
Now remember what our final answer was. The probability that our librarian hypothesis
[385]
is true given the evidence is the total number of librarians fitting the evidence, 4, divided
[391]
by the total number of people fitting the evidence, 24.
[395]
Where does that 4 come from? Well it’s the total number of people, times the prior probability
[401]
of being a librarian, giving us the 10 total librarians, times the probability that one
[406]
of those fits the evidence. That same number shows up again in the denominator, but we
[412]
need to add in the total number of people times the proportion who are not librarians,
[417]
times the proportion of those who fit the evidence, which in our example gave 20.
[423]
The total number of people in our example, 210, gets canceled out – which of course
[427]
it should, that was just an arbitrary choice we made for illustration – leaving us finally
[432]
with the more abstract representation purely in terms of probabilities. This, my friends,
[438]
is Bayes’ theorem.
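Written symbolically, with the denominator expanded into the two cases, the formula reads:

```latex
P(H \mid E) \;=\; \frac{P(H)\,P(E \mid H)}{P(H)\,P(E \mid H) \;+\; P(\neg H)\,P(E \mid \neg H)}
```

Plugging in the numbers from the example, with prior P(H) = 1/21, likelihood P(E|H) = 0.4, and P(E|¬H) = 0.1, this gives (1/21 · 0.4) / (1/21 · 0.4 + 20/21 · 0.1) = 4/24, about 16.7%, matching the count-based answer.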
[440]
You often see this big denominator written more simply as P(E), the total probability
[446]
of seeing the evidence. In practice, to calculate it, you almost always have to break it down
[454]
into the case where the hypothesis is true, and the one where it isn’t.
[458]
Piling on one final bit of jargon, this final answer is called the “posterior”; it’s
[465]
your belief about the hypothesis after seeing the evidence.
[470]
Writing it all out abstractly might seem more complicated than just thinking through the
[473]
example directly with a representative sample; and yeah, it is! Keep in mind, though, the
[480]
value of a formula like this is that it lets you quantify and systematize the idea of changing
[486]
beliefs. Scientists use this formula when analyzing the extent to which new data validates
[491]
or invalidates their models; programmers use it in building artificial intelligence, where
[497]
you sometimes want to explicitly and numerically model a machine’s belief. And honestly just
[502]
for how you view yourself, your own opinions and what it takes for your mind to change,
[506]
Bayes’ theorem can reframe how you think about thought itself. Putting a formula to
[513]
it is also all the more important as the examples get more intricate.
[517]
However you end up writing it, I’d actually encourage you not to memorize the formula,
[522]
but to draw out this diagram as needed.
[524]
This is sort of the distilled version of thinking with a representative sample where we think
[529]
with areas instead of counts, which is more flexible and easier to sketch on the fly.
[534]
Rather than bringing to mind some specific number of examples, think of the space of
[538]
all possibilities as a 1x1 square. Any event occupies some subset of this space, and the
[546]
probability of that event can be thought about as the area of that subset. For example, I
[552]
like to think of the hypothesis as filling the left part of this square, with a width
[556]
of P(H).
[557]
I recognize I’m being a bit repetitive, but when you see evidence, the space of possibilities
[563]
gets restricted. Crucially, that restriction may not happen evenly between the left and
[568]
the right. So the new probability for the hypothesis is the proportion it occupies in
[574]
this restricted subspace.
[578]
If you happen to think a farmer is just as likely to fit the evidence as a librarian,
[582]
then the proportion doesn’t change, which should make sense. Irrelevant evidence doesn’t
[587]
change your belief. But when these likelihoods are very different, that's when your belief changes a lot.
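That observation is easy to verify numerically: writing the posterior as a proportion of areas, equal likelihoods leave the prior untouched, while very different ones move it a lot. A minimal sketch, where the helper `posterior` and the sample numbers are just for illustration:

```python
def posterior(prior, likelihood, likelihood_not):
    """Bayes' theorem: P(H|E) as the proportion of the
    evidence-fitting area taken up by the hypothesis."""
    evidence = prior * likelihood + (1 - prior) * likelihood_not
    return prior * likelihood / evidence

prior = 1 / 21  # proportion of librarians in the sample

# Irrelevant evidence: a farmer is just as likely to fit as a librarian,
# so the posterior equals the prior.
print(posterior(prior, 0.4, 0.4))

# Very different likelihoods: the belief moves a lot (to 1/6, about 16.7%).
print(posterior(prior, 0.4, 0.1))
```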
[595]
This is actually a good time to step back and consider a few broader takeaways about
[619]
how to make probability more intuitive, beyond Bayes’ theorem. First off, there’s the
[624]
trick of thinking about a representative sample with a specific number of examples, like our
[629]
210 librarians and farmers. There’s actually another Kahneman and Tversky result to this
[635]
effect, which is interesting enough to interject here.
[638]
They did an experiment similar to the one with Steve, but where people were given the
[642]
following description of a fictitious woman named Linda:
[646]
Linda is 31 years old, single, outspoken, and very bright. She majored in philosophy.
[652]
As a student, she was deeply concerned with issues of discrimination and social justice,
[656]
and also participated in anti-nuclear demonstrations.
[660]
They were then asked what is more likely: That Linda is a bank teller, or that Linda
[667]
is a bank teller and is active in the feminist movement. 85% of participants said the latter
[674]
is more likely, even though the set of bank tellers active in the feminist movement is a
[681]
subset of the set of bank tellers!
[684]
But, what’s fascinating is that there’s a simple way to rephrase the question that
[691]
dropped this error from 85% to 0. Instead, if participants are told there are 100 people
[698]
who fit this description, and asked to estimate how many of those 100 are bank
[703]
tellers, and how many are bank tellers who are active in the feminist movement, no one
[707]
makes the error. Everyone correctly assigns a higher number to the first option than to
[712]
the second.
[715]
Somehow a phrase like “40 out of 100” kicks our intuition into gear more effectively
[720]
than “40%”, much less “0.4”, or abstractly referencing the idea of something being more
[727]
or less likely.
[729]
That said, representative samples don’t easily capture the continuous nature of probability,
[734]
so turning to area is a nice alternative, not just because of the continuity, but also
[738]
because it’s way easier to sketch out while you’re puzzling over some problem.
[744]
You see, people often think of probability as being the study of uncertainty. While that
[750]
is, of course, how it’s applied in science, the actual math of probability is really just
[756]
the math of proportions, where turning to geometry is exceedingly helpful.
[761]
I mean, if you look at Bayes’ theorem as a statement about proportions – proportions
[769]
of people, of areas, whatever – once you digest what it’s saying, it’s actually
[773]
kind of obvious. Both sides tell you to look at all the cases where the evidence is true,
[778]
and consider the proportion where the hypothesis is also true. That’s it. That’s all it’s
[785]
saying.
[786]
What’s noteworthy is that such a straightforward fact about proportions can become hugely significant
[792]
for science, AI, and any situation where you want to quantify belief. You’ll get a better
[799]
glimpse of this as we get into more examples.
[801]
But before any more examples, we have some unfinished business with Steve. Some psychologists
[808]
debate Kahneman and Tversky’s conclusion, that the rational thing to do is to bring
[812]
to mind the ratio of farmers to librarians. They complain that the context is ambiguous.
[818]
Who is Steve, exactly? Should you expect he’s a randomly sampled American? Or would you
[823]
be better off assuming he’s a friend of these two psychologists interrogating you?
[827]
Or perhaps someone you’re personally likely to know? This assumption determines the prior.
[832]
I, for one, run into many more librarians in a given month than farmers. And needless
[837]
to say, the probability of a librarian or a farmer fitting this description is highly
[842]
open to interpretation.
[843]
But for our purposes, understanding the math, notice how any questions worth debating can
[850]
be pictured in the context of the diagram. Questions of context shift around the prior,
[855]
and questions of personalities and stereotypes shift the relevant likelihoods.
[861]
All that said, whether or not you buy this particular experiment, the ultimate point that
[865]
evidence should not determine beliefs, but update them, is worth tattooing in your mind.
[871]
I’m in no position to say whether this does or doesn’t run against natural human intuition,
[876]
we’ll leave that to the psychologists. What’s more interesting to me is how we can reprogram
[881]
our intuitions to authentically reflect the implications of math, and bringing to mind
[886]
the right image can often do just that.