Monte Carlo Tree Search p1 - YouTube

[0]
Okay Michael, so if you go all the way back to the very first slide,
[4]
when we started this conversation.
[6]
The slide said something about generalizing generalization.
[9]
But one of the words that was written on there was scalability.
[12]
So one of the things that we've been talking about, sometimes explicitly,
[15]
sometimes implicitly, is this notion of getting scale to actually happen.
[20]
So just because I think it's important to think about scale
[23]
in terms of more than just abstraction.
[25]
I just want to take a moment to talk about an algorithmic approach to getting
[29]
scaling to work.
[30]
That's different from just doing abstraction of the sort that
[33]
we've done before.
[34]
Although it's going to turn out that all the things that we've done with our
[37]
abstraction will actually work in this new algorithmic view of scalability.
[42]
That seem interesting?
[44]
>> Yeah that would be really helpful.
[45]
>> Cool, let's do that, all right, so
[46]
here's a particular algorithm that I want to introduce.
[49]
It's called Monte Carlo Tree Search, and there are four words there, and
[53]
there are two different parts.
[54]
So for the beginning I just want to concentrate on the tree search part, and
[59]
then tell you what I mean by the Monte Carlo part.
[62]
So, when I use this picture on the right, the little tree, all I'm trying
[66]
to get you to think about with this tree, is what we've done in the past.
[70]
Say in an AI course, where you've got nodes representing states.
[74]
There are actions that you might take that would get you to other states.
[78]
And then more actions to get other states and so on and so forth.
[82]
But this particular tree has a nice little form that's kind of
[85]
hidden by the edges and the nodes, and I want to harp on that for a little bit.
[89]
And I think it would help you understand the algorithm that I'm trying
[92]
to get through on this data structure.
[94]
So the algorithm is written on the left, and
[96]
it's a big loop loop loop loop loop loop loop.
[98]
And it works like this, there are four steps, the first step is selection.
[102]
And I'll define what all these things mean in
[105]
a moment, but I just want to go through at a high level what each one is.
[108]
So the first one is selection.
[110]
And selection is basically the way in which you're going to
[114]
decide what actions to take from a particular state.
[118]
What's going to happen is you're going to take a bunch of actions and
[121]
eventually going to get to some point in this tree here
[123]
of possible states and actions you might see,
[126]
where you don't know enough to know how to make a selection.
[131]
Once you're at a place where you don't exactly know how you should make
[134]
a selection, the idea is that you're going to expand the tree there.
[139]
And then do simulation to figure out what it is you ought to be doing
[143]
to make selections.
[145]
And the way that's going to happen, is you're going to estimate
[148]
from that expanded set of nodes the true value of taking actions in those states.
[153]
And then, in a very Bellman equation kind of way,
[159]
you're going to back up what you learned through your simulation.
[161]
And then that will allow you to select what to do next.
[165]
And you'll just keep doing that over and
[167]
over and over again making selections where you know what to do.
[170]
Expanding, simulating the world for a little while, so
[173]
that you learn what to do, and then make those decisions, and so on and so forth.
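The four-step loop he's describing might be sketched like this. This is a minimal illustration, not the lecture's code: the tiny chain world, the step cost, the UCB-style selection rule, and the random rollout policy are all my own hypothetical choices.

```python
import math
import random

GOAL = 4          # toy chain world: states 0..4, reward for reaching state 4
STEP_COST = -0.1  # small per-step cost so shorter paths look better

def actions(state):
    return [-1, +1]

def step(state, action):
    next_state = min(max(state + action, 0), GOAL)
    reward = 1.0 if next_state == GOAL else STEP_COST
    return next_state, reward

class Node:
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children = {}                 # action -> child Node
        self.visits, self.total = 0, 0.0   # visit count, summed returns

def ucb(parent, child, c):
    # A selection rule: prefer high value estimates, but keep trying
    # children we have few statistics for.
    if child.visits == 0:
        return float('inf')                # always try untried actions first
    return (child.total / child.visits
            + c * math.sqrt(math.log(parent.visits) / child.visits))

def rollout(state, depth=10):
    # Simulation: act randomly for a while to estimate the state's value.
    total = 0.0
    for _ in range(depth):
        if state == GOAL:
            break
        state, r = step(state, random.choice(actions(state)))
        total += r
    return total

def mcts(root_state, iterations=500, c=0.5):
    root = Node(root_state)
    for _ in range(iterations):
        node, path_reward = root, 0.0
        # 1. Selection: descend while we have statistics to guide us.
        while node.children:
            a = max(node.children, key=lambda a: ucb(node, node.children[a], c))
            _, r = step(node.state, a)
            path_reward += r
            node = node.children[a]
        # 2. Expansion: grow the tree at the frontier.
        if node.state != GOAL:
            for a in actions(node.state):
                node.children[a] = Node(step(node.state, a)[0], parent=node)
        # 3. Simulation from the frontier node.
        value = path_reward + rollout(node.state)
        # 4. Backup: push the estimate up toward the root, Bellman-style.
        while node is not None:
            node.visits += 1
            node.total += value
            node = node.parent
    # Finally, select the most-visited action at the root.
    return max(root.children, key=lambda a: root.children[a].visits)
```

Run from state 0, the loop concentrates its visits on the action that heads toward the goal, which is the "selections where you know what to do" part becoming better informed each time around.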
[178]
So does that make sense at a very high level?
[179]
See what I'm trying to accomplish here?
[181]
>> Yeah I think so.
[182]
>> Okay. >> It reminds me of a lot of other
[183]
kinds of tree search that I've seen in AI classes.
[186]
>> Like what?
[187]
>> A* is a kind of tree search,
[189]
a game tree search is a kind of tree search.
[191]
They have similar pictures where you repeatedly expand nodes and
[198]
get estimates of values.
[199]
>> Right, and in fact I like the game trees or the games search one.
[202]
This is not game search because we're living in an MDP, and it's just us, and
[205]
so we're not playing against an opponent.
[207]
But something very nice about that particular one,
[210]
is the way it often works in that world.
[212]
You have a particular way of figuring out what action to take or
[215]
you sort of expand and do a search among a bunch of possibilities.
[218]
And then eventually you run out of time, and so
[220]
you have to decide what the value is of whatever leaf nodes you've gotten to.
[224]
And the way you do that is use something like an evaluation function,
[228]
which tells you how good we think this node is.
[231]
And that's just what you've got to work on.
[233]
And you use that and
[234]
you back up the values, and that helps you to make a decision about what to do.
[238]
And then you make a decision, then your opponent makes a decision,
[241]
you end up in a state, and you do it all over again.
[243]
Basically, you search out as far as you have time to.
[246]
When you run out of time, you use some estimate that you got from somewhere of
[251]
how good the node is, and then you back that up.
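That pattern, search as deep as time allows, hit an evaluation function at the leaves, and back the values up, might look like this for the single-agent case. The chain world, the rewards, and the zero evaluation function are my own illustrative stand-ins:

```python
GOAL = 3  # toy chain: states 0..3, reward 1.0 for stepping onto state 3

def actions(state):
    return [-1, +1]

def next_state(state, action):
    return min(max(state + action, 0), GOAL)

def reward(state, action):
    return 1.0 if next_state(state, action) == GOAL else 0.0

def evaluate(state):
    # Stand-in evaluation function: our guess of a leaf's value once we
    # run out of search depth.  Trivially pessimistic here.
    return 0.0

def depth_limited_value(state, depth):
    # Out of search depth (time), or at a terminal: use the evaluation.
    if depth == 0 or state == GOAL:
        return evaluate(state)
    # Back up the best child value -- single agent, so just a max,
    # with no opponent taking a min in between.
    return max(reward(state, a) + depth_limited_value(next_state(state, a), depth - 1)
               for a in actions(state))
```

With depth 3 from state 0 the goal is within reach and the backed-up value is 1.0; with depth 2 it isn't, and the search can only report what the evaluation function guessed.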
[254]
And that's basically what we're going to be doing here,
[256]
except we have another wrinkle.
[258]
And the wrinkle here is where the Monte Carlo part comes in.
[261]
So the tree search is very standard.
[263]
The Monte Carlo part is, well, we've got randomness,
[266]
we've got stochasticity in our transition model.
[269]
And so we're going to have to come up with some way to do this
[272]
estimation that takes that into account.
[274]
And that's where the simulation is going to come from.
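The Monte Carlo idea in isolation is just this: when the transition model is stochastic, estimate an expectation by sampling it many times and averaging. A tiny hypothetical example (the 0.8 success probability is made up for illustration):

```python
import random

def sample_next(state, action):
    # Hypothetical stochastic model: the intended move succeeds with
    # probability 0.8; otherwise the state stays where it is.
    return state + action if random.random() < 0.8 else state

def monte_carlo_estimate(state, action, n=10_000):
    # Average many sampled outcomes to approximate E[next state].
    return sum(sample_next(state, action) for _ in range(n)) / n
```

For state 0 and action +1, the true expectation is 0.8 * 1 + 0.2 * 0 = 0.8, and the sample average converges to it as n grows; that's the same estimation that the simulation step leans on.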
[276]
All right, so let's break that up into its little parts.
[279]
Actually that's a lot of words,
[280]
but I think walking through this picture might help a little bit.