Monte Carlo Tree Search p1 - YouTube

[0]
Okay Michael, so if you go all the way back to the very first slide,
[4]
when we started this conversation.
[6]
The slide said something about generalizing generalization.
[9]
But one of the words that was written on there was scalability.
[12]
So one of the things that we've been talking about, sometimes explicitly,
[15]
sometimes implicitly, is this notion of getting scale to actually happen.
[20]
So just because I think it's important to think about scale
[23]
in terms of more than just abstraction.
[25]
I just want to take a moment to talk about an algorithmic approach to getting
[29]
scaling to work.
[30]
That's different from just doing abstraction of the sort that
[33]
we've done before.
[34]
Although it's going to turn out that all the things that we've done with our
[37]
abstraction will actually work in this new algorithmic view of scalability.
[42]
That seem interesting?
[44]
>> Yeah that would be really helpful.
[45]
>> Cool, let's do that, all right, so
[46]
here's a particular algorithm that I want to introduce.
[49]
It's called Monte Carlo Tree Search, and there are four words there, and
[53]
there are two different parts.
[54]
So for the beginning I just want to concentrate on the tree search part, and
[59]
then tell you what I mean by the Monte Carlo part.
[62]
So, when I use this picture on the right, the little tree, all I'm trying
[66]
to get you to think about with this tree, is what we've done in the past.
[70]
Say in an AI course, where you've got nodes representing states.
[74]
There are actions that you might take that would get you to other states.
[78]
And then more actions to get other states and so on and so forth.
[82]
But this particular tree has a nice little form that's kind of
[85]
hidden by the edges and the nodes, and I want to harp on that for a little bit.
[89]
And I think it would help you understand the algorithm that I'm trying
[92]
to get through on this data structure.
[94]
So the algorithm is written on the left, and
[96]
it's a big loop loop loop loop loop loop loop.
[98]
And it works like this, there are four steps, the first step is selection.
[102]
And I'll define what all these things mean in
[105]
a moment, but I just want to go through at a high level what each one is.
[108]
So the first one is selection.
[110]
And selection is basically the way in which you're going to
[114]
decide what actions to take from a particular state.
[118]
What's going to happen is you're going to take a bunch of actions and
[121]
eventually going to get to some point in this tree here
[123]
of possible states and actions you might see,
[126]
where you don't know enough to know how to make a selection.
[131]
Once you're at a place where you don't exactly know how you should make
[134]
a selection, the idea is that you're going to expand the tree there.
[139]
And then do simulation to figure out what it is you ought to be doing
[143]
to make selections.
[145]
And the way that's going to happen, is you're going to estimate
[148]
from that expanded set of nodes the true value of taking actions in those states.
[153]
And then, in a very Bellman equation kind of way,
[159]
you're going to back up what you learned through your simulation.
[161]
And then that will allow you to select what to do next.
[165]
And you'll just keep doing that over and
[167]
over and over again making selections where you know what to do.
[170]
Expanding, simulating the world for a little while, so
[173]
that you learn what to do, and then make those decisions, and so on and so forth.
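The four-step loop he's describing might be sketched like this. This is a minimal illustration, not the lecture's code: the tiny chain world, the step cost, the UCB-style selection rule, and the random rollout policy are all my own hypothetical choices.

```python
import math
import random

GOAL = 4          # toy chain world: states 0..4, reward for reaching state 4
STEP_COST = -0.1  # small per-step cost so shorter paths look better

def actions(state):
    return [-1, +1]

def step(state, action):
    next_state = min(max(state + action, 0), GOAL)
    reward = 1.0 if next_state == GOAL else STEP_COST
    return next_state, reward

class Node:
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children = {}                 # action -> child Node
        self.visits, self.total = 0, 0.0   # visit count, summed returns

def ucb(parent, child, c):
    # A selection rule: prefer high value estimates, but keep trying
    # children we have few statistics for.
    if child.visits == 0:
        return float('inf')                # always try untried actions first
    return (child.total / child.visits
            + c * math.sqrt(math.log(parent.visits) / child.visits))

def rollout(state, depth=10):
    # Simulation: act randomly for a while to estimate the state's value.
    total = 0.0
    for _ in range(depth):
        if state == GOAL:
            break
        state, r = step(state, random.choice(actions(state)))
        total += r
    return total

def mcts(root_state, iterations=500, c=0.5):
    root = Node(root_state)
    for _ in range(iterations):
        node, path_reward = root, 0.0
        # 1. Selection: descend while we have statistics to guide us.
        while node.children:
            a = max(node.children, key=lambda a: ucb(node, node.children[a], c))
            _, r = step(node.state, a)
            path_reward += r
            node = node.children[a]
        # 2. Expansion: grow the tree at the frontier.
        if node.state != GOAL:
            for a in actions(node.state):
                node.children[a] = Node(step(node.state, a)[0], parent=node)
        # 3. Simulation from the frontier node.
        value = path_reward + rollout(node.state)
        # 4. Backup: push the estimate up toward the root, Bellman-style.
        while node is not None:
            node.visits += 1
            node.total += value
            node = node.parent
    # Finally, select the most-visited action at the root.
    return max(root.children, key=lambda a: root.children[a].visits)
```

Run from state 0, the loop concentrates its visits on the action that heads toward the goal, which is the "selections where you know what to do" part becoming better informed each time around.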
[178]
So does that make sense at a very high level?
[179]
See what I'm trying to accomplish here?
[181]
>> Yeah I think so.
[182]
>> Okay. >> It reminds me of a lot of other
[183]
kinds of tree search that I've seen in AI classes.
[186]
>> Like what?
[187]
>> A* is a kind of tree search,
[189]
a game tree search is a kind of tree search.
[191]
They have similar pictures where you repeatedly expand nodes and
[198]
get estimates of values.
[199]
>> Right, and in fact I like the game trees or the games search one.
[202]
This is not game search because we're living in an MDP, and it's just us, and
[205]
so we're not playing against an opponent.
[207]
But something very nice about that particular one,
[210]
is the way it often works in that world.
[212]
You have a particular way of figuring out what action to take or
[215]
you sort of expand and do a search among a bunch of possibilities.
[218]
And then eventually you run out of time, and so
[220]
you have to decide what the value is of whatever leaf nodes you've gotten to.
[224]
And the way you do that is use something like an evaluation function,
[228]
which tells you how good we think this node is.
[231]
And that's just what you've got to work on.
[233]
And you use that and
[234]
you back up the values, and that helps you to make a decision about what to do.
[238]
And then you make a decision, then your opponent makes a decision,
[241]
you end up in a state, and you do it all over again.
[243]
Basically, you search out as far as you have time to.
[246]
When you run out of time, you use some estimate that you got from somewhere of
[251]
how good the node is, and then you back that up.
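That pattern, search as deep as time allows, hit an evaluation function at the leaves, and back the values up, might look like this for the single-agent case. The chain world, the rewards, and the zero evaluation function are my own illustrative stand-ins:

```python
GOAL = 3  # toy chain: states 0..3, reward 1.0 for stepping onto state 3

def actions(state):
    return [-1, +1]

def next_state(state, action):
    return min(max(state + action, 0), GOAL)

def reward(state, action):
    return 1.0 if next_state(state, action) == GOAL else 0.0

def evaluate(state):
    # Stand-in evaluation function: our guess of a leaf's value once we
    # run out of search depth.  Trivially pessimistic here.
    return 0.0

def depth_limited_value(state, depth):
    # Out of search depth (time), or at a terminal: use the evaluation.
    if depth == 0 or state == GOAL:
        return evaluate(state)
    # Back up the best child value -- single agent, so just a max,
    # with no opponent taking a min in between.
    return max(reward(state, a) + depth_limited_value(next_state(state, a), depth - 1)
               for a in actions(state))
```

With depth 3 from state 0 the goal is within reach and the backed-up value is 1.0; with depth 2 it isn't, and the search can only report what the evaluation function guessed.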
[254]
And that's basically what we're going to be doing here,
[256]
except we have another wrinkle.
[258]
And the wrinkle here is where the Monte Carlo part comes in.
[261]
So the tree search is very standard.
[263]
The Monte Carlo part is, well, we've got randomness,
[266]
we've got stochasticity in our transition model.
[269]
And so we're going to have to come up with some way to do this
[272]
estimation that takes that into account.
[274]
And that's where the simulation is going to come from.
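The Monte Carlo idea in isolation is just this: when the transition model is stochastic, estimate an expectation by sampling it many times and averaging. A tiny hypothetical example (the 0.8 success probability is made up for illustration):

```python
import random

def sample_next(state, action):
    # Hypothetical stochastic model: the intended move succeeds with
    # probability 0.8; otherwise the state stays where it is.
    return state + action if random.random() < 0.8 else state

def monte_carlo_estimate(state, action, n=10_000):
    # Average many sampled outcomes to approximate E[next state].
    return sum(sample_next(state, action) for _ in range(n)) / n
```

For state 0 and action +1, the true expectation is 0.8 * 1 + 0.2 * 0 = 0.8, and the sample average converges to it as n grows; that's the same estimation that the simulation step leans on.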
[276]
All right, so let's break that up into its little parts.
[279]
Actually that's a lot of words,
[280]
but I think walking through this picture might help a little bit.