Monte Carlo Tree Search p1 - YouTube
[0]
Okay Michael, so if you go all
the way back to the very first slide,
[4]
when we started this conversation.
[6]
The slide said something
about generalization.
[9]
But one of the words that was
written on there was scalability.
[12]
So one of the things that we've been
talking about, sometimes explicitly,
[15]
sometimes implicitly, is this notion
of getting scale to actually happen.
[20]
So just because I think it's
important to think about scale
[23]
more than in terms of abstraction.
[25]
I just want to take a moment to talk
about an algorithmic approach to getting
[29]
scaling to work.
[30]
That's different from just doing
abstraction of the sort that
[33]
we've done before.
[34]
Although it's going to turn out that
all the things that we've done with our
[37]
abstraction will actually work in this
new algorithmic view of scalability.
[42]
Does that seem interesting?
[44]
>> Yeah that would be really helpful.
[45]
>> Cool, let's do that. All right, so
[46]
here's a particular algorithm
that I want to introduce.
[49]
It's called Monte Carlo Tree Search,
and there are four words there, and
[53]
there are two different parts.
[54]
So for the beginning I just want to
concentrate on the tree search part, and
[59]
then tell you what I mean
by the Monte Carlo part.
[62]
So, when I use this picture on the
right, the little tree, all I'm trying
[66]
to get you to think about with this
tree, is what we've done in the past.
[70]
Say in an AI course, where you've
got nodes representing states.
[74]
There are actions that you might take
that would get you to other states.
[78]
And then more actions to get other
states and so on and so forth.
[82]
But this particular tree has
a nice little form that's kind of
[85]
hidden by the edges and the nodes, and
I want to harp on that for a little bit.
[89]
And I think it will help you
understand the algorithm that I'm trying
[92]
to get through on this data structure.
[94]
So the algorithm is written on the left,
and
[96]
it's a big loop, loop,
loop, loop, loop, loop, loop.
[98]
And it works like this, there are four
steps, the first step is selection.
[102]
And I'll define what all
these things mean in
[105]
a moment, but I just want to go through
at a high level what each one is.
[108]
So the first one is selection.
[110]
And selection is basically
the way in which you're going to
[114]
decide what actions to take
from a particular state.
[118]
What's going to happen is you're
going to take a bunch of actions and
[121]
eventually get to
some point in this tree
[123]
of possible states and
actions you might see,
[126]
where you don't know enough to
know how to make a selection.
[131]
Once you're at a place where you don't
exactly know how you should make
[134]
a selection, the idea is that you're
going to expand the tree there.
[139]
And then do simulation to figure out
what it is you ought to be doing
[143]
to make selections.
[145]
And the way that's going to happen,
is you're going to estimate
[148]
from that expanded set of nodes the true
value of taking actions in those states.
[153]
And then, in a very
Bellman equation way,
[159]
you're going to back up what you
learned through your simulation.
[161]
And then that will allow you
to select what to do next.
[165]
And you'll just keep doing that over and
[167]
over and over again making selections
where you know what to do.
[170]
Expanding, simulating the world for
a little while, so
[173]
that you learn what to do, and then make
those decisions, and so on and so forth.
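The four-step loop just described (selection, expansion, simulation, backup) can be sketched in code. This is a minimal sketch, not the lecture's implementation: the toy MDP (a five-state chain with a goal at one end), the discount factor, the UCB-style selection rule, and all the constants are assumptions chosen for illustration.

```python
import math
import random

# Toy MDP, invented for illustration: states 0..4 on a chain, actions +1/-1.
# Reaching state 4 (the goal) yields reward 1 and ends the episode.
GOAL, ACTIONS, GAMMA = 4, (+1, -1), 0.9

def step(state, action):
    nxt = max(0, min(GOAL, state + action))
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL

class Node:
    def __init__(self, state):
        self.state = state
        self.children = {}   # action -> Node
        self.visits = 0
        self.value = 0.0     # running mean of backed-up returns

def select_action(node, c=0.5):
    # UCB-style selection: estimated value plus an exploration bonus.
    return max(node.children, key=lambda a: node.children[a].value +
               c * math.sqrt(math.log(node.visits) / node.children[a].visits))

def rollout(state, depth=20):
    # Simulation: act randomly for a while, report the discounted return.
    total, discount = 0.0, 1.0
    for _ in range(depth):
        state, reward, done = step(state, random.choice(ACTIONS))
        total += discount * reward
        discount *= GAMMA
        if done:
            break
    return total

def mcts(root_state, iterations=2000):
    root = Node(root_state)
    for _ in range(iterations):
        node, path = root, [root]
        ret, discount, done = 0.0, 1.0, root.state == GOAL
        # 1. Selection: descend while every action here has been tried.
        while not done and len(node.children) == len(ACTIONS):
            action = select_action(node)
            _, reward, done = step(node.state, action)
            ret += discount * reward
            discount *= GAMMA
            node = node.children[action]
            path.append(node)
        # 2. Expansion: where we don't know enough, try an untried action.
        if not done:
            action = random.choice([a for a in ACTIONS
                                    if a not in node.children])
            nxt, reward, done = step(node.state, action)
            child = Node(nxt)
            node.children[action] = child
            path.append(child)
            ret += discount * reward
            discount *= GAMMA
            # 3. Simulation: estimate the new node's value with a rollout.
            if not done:
                ret += discount * rollout(nxt)
        # 4. Backup: average the sampled return into every node on the path,
        #    in a very Bellman-equation way.
        for n in path:
            n.visits += 1
            n.value += (ret - n.value) / n.visits
    # Finally, select greedily at the root by visit count.
    return max(root.children, key=lambda a: root.children[a].visits)
```

Running `mcts(0)` should favor the +1 action, since it heads toward the goal. Note that the transitions in this toy are deterministic, so the Monte Carlo part shows up only in the random rollouts; a stochastic model would be sampled in `step` as well.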
[178]
So does that make sense
at a very high level?
[179]
See what I'm trying to accomplish here?
[181]
>> Yeah I think so.
[182]
>> Okay.
>> It reminds me of a lot of other
[183]
kinds of tree search that
I've seen in AI classes.
[186]
>> Like what?
[187]
>> A* is a kind of tree search,
[189]
a game tree search is
a kind of tree search.
[191]
They have similar pictures where
you repeatedly expand nodes and
[198]
get estimates of values.
[199]
>> Right, and in fact I like the game
tree, or game search, one.
[202]
This is not game search because we're
living in an MDP, and it's just us, and
[205]
so we're not playing
against an opponent.
[207]
But something very nice
about that particular one
[210]
is the way it often works
in that world.
[212]
You have a particular way of
figuring out what action to take or
[215]
you sort of expand and do a search
among a bunch of possibilities.
[218]
And then eventually you run out of time,
and so
[220]
you have to decide what the value is of
whatever leaf nodes you've gotten to.
[224]
And the way you do that is use
something like an evaluation function,
[228]
which tells you how good
we think this node is.
[231]
And that's just what
you've got to work with.
[233]
And you use that and
[234]
you back up the values, that helps you
to make a decision about what to do.
[238]
And then you make a decision,
then your opponent makes a decision,
[241]
you end up in a state, and
you do it all over again.
[243]
Basically you search out as far
as you have time for.
[246]
When you run out of time, you use some
estimate that you got from somewhere of
[251]
how good the node is, and
then you back that up.
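That recipe, search to a depth limit, apply an evaluation function at the frontier, and back values up while alternating with the opponent, can be sketched as follows. The game and its evaluation function are my own invented example, not one from the lecture: players alternately take 1 or 2 items from a pile, and whoever takes the last item wins.

```python
def evaluate(pile, mover_is_max):
    # Crude evaluation function: an estimate of how good this node is.
    # In this game, a pile that is a multiple of 3 loses for the mover.
    good_for_mover = pile % 3 != 0
    return 1.0 if good_for_mover == mover_is_max else -1.0

def search(pile, depth, maximizing):
    if pile == 0:
        # The previous player took the last item and won.
        return -1.0 if maximizing else 1.0
    if depth == 0:
        # Out of time: fall back on the evaluation function.
        return evaluate(pile, maximizing)
    values = [search(pile - take, depth - 1, not maximizing)
              for take in (1, 2) if take <= pile]
    # Back the values up: max on our turns, min on the opponent's.
    return max(values) if maximizing else min(values)
```

With enough depth this solves the little game exactly; at depth 0 it is the evaluation function alone; in between it is exactly the depth-limited search described above.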
[254]
And that's basically what
we're going to be doing here,
[256]
except we have another wrinkle.
[258]
And the wrinkle here is where
the Monte Carlo part comes in.
[261]
So the tree search is very standard.
[263]
The Monte Carlo part is,
well, we've got randomness,
[266]
we've got stochasticity
in our transition model.
[269]
And so we're going to have to
come up with some way to do this
[272]
estimation that takes that into account.
[274]
And that's where the simulation
is going to come from.
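One concrete way to see the Monte Carlo idea: when the transition model is stochastic, you can estimate a state's value by sampling many rollouts and averaging their returns. The tiny MDP below is an assumption made for illustration, a "slippery" state that reaches the goal with probability `p` each step and otherwise stays put, with reward 1 at the goal and discount 0.9.

```python
import random

P_SUCCESS, GAMMA = 0.5, 0.9   # invented parameters for the example

def sample_return(p=P_SUCCESS, max_steps=100):
    # One simulated episode: each step reaches the goal with probability p.
    discount = 1.0
    for _ in range(max_steps):
        if random.random() < p:
            return discount       # reward 1 at the goal, discounted
        discount *= GAMMA         # stayed put; try again next step
    return 0.0

def monte_carlo_value(p=P_SUCCESS, n_rollouts=10000):
    # Average of sampled returns; converges to the true value
    # V = p / (1 - (1 - p) * GAMMA) as n_rollouts grows.
    return sum(sample_return(p) for _ in range(n_rollouts)) / n_rollouts
```

For p = 0.5 the true value is 0.5 / (1 - 0.45), about 0.909, and the estimate should land close to it. This averaging over stochastic outcomes is the job the simulation step does inside the tree search.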
[276]
All right, so
let's break that up into its little parts.
[279]
Actually that's a lot of words,
[280]
but I think walking through this
picture might help a little bit.