Parsing Bottom Up - Computerphile - YouTube

Channel: Computerphile

[0]
last time
[1]
you did a really good animation about
[4]
top-down parsing we had these sentences
[7]
in this totally artificial grammar which
[9]
has only got about
[10]
30 words in it all in all but we
[14]
started here with sentence is subject
[18]
verb object
[20]
and in the top-down approach you
[21]
basically say okay i'll push those on
[24]
the stack in a reverse order subject
[27]
verb object and starting at the top of
[29]
the stack at the left you say well what
[31]
can a subject be
[33]
so you do these things on your stack top
[36]
down one at a time and you start off at
[39]
the root of the tree and you develop the
[41]
component parts left to right one at a
[44]
time it is a lot easier totally by hand
[49]
to write a top down parser on the bottom
[52]
up one but what i should be getting to
[54]
later on and in fact i'm starting right
[56]
now is to say well
[58]
what happens if instead of starting up
[61]
at the root and developing the leaves of
[63]
the tree as it were
[65]
you start right down at the bottom
[67]
with
[68]
the texturing that you know is correct
[70]
and try and work upwards
[72]
you know can you work back upwards from
[74]
the leaves to the root
[78]
so in fact let's write that
[80]
test sentence below here the
[83]
robot
[84]
stroked to furry and actually since
[88]
layout doesn't matter too much at the
[90]
moment i'll squeeze the word dice in at
[92]
the right there now be clear we're
[94]
talking about bottom up parsing now
[97]
in bottom up parsing you start with
[100]
the
[101]
string that you think is correct and you
[104]
say
[105]
start with a string can i look into the
[107]
rules and see how to
[109]
work up the tree not down the tree
[112]
and yes so therefore you're looking at
[115]
possible
[116]
matches on the right hand side
[120]
for
[121]
components of this string reading from
[123]
left to right okay
[125]
the how many ways are there you can
[128]
match the string the against one of
[130]
these classifications here well the text
[134]
the string the is the right hand side
[137]
possibility of an article that's one way
[139]
to do it oh and then look up here right
[141]
at the top of the grammar if you have an
[144]
article followed by a noun like the
[147]
followed by something else that could be
[149]
a subject and that's looking good
[151]
because right up at the top of the
[152]
grammar we want to end up with subject
[154]
verb object looking just at the robot
[158]
and looking at the grammar right hand
[159]
sides i could do it by saying well it's
[162]
the subject of the sentence it's at the
[163]
left hand side and if i go article noun
[167]
i get the and i get robot
[169]
but what i've done
[171]
just to
[173]
act as a talking point and it
[174]
illustrates a lot of things here i've
[176]
given you a shortcut if you want to you
[179]
can just do the robot with no further
[181]
interior analysis at all it's an allowed
[184]
phrase it's the subject
[186]
now i have to say
[188]
that as we develop this story we will
[191]
get into bottom-up parsing because one
[193]
of the tools we're going to use called
[195]
yak basically produces bottom-up parses
[198]
for you not top-down ones
[200]
and
[201]
it's a yak
[203]
behavior symptom that it loves
[206]
when you're trying to match text strings
[209]
it likes to match the longest one that
[211]
it can manage so it is going to seize on
[214]
the robot all as one phrase as being a
[217]
wonderful long
[219]
solution why doesn't it just do there
[221]
and then wait patiently for now that's
[223]
not the bottom-up way if you can see a
[226]
longer oh and this thing by the way that
[228]
you're looking at is built up on a stack
[230]
of course it's called the handle i get a
[233]
longer handle by going for this option
[236]
here and getting the robot all in one go
[238]
okay the robot then and you've got to
[241]
get used to reading from right to left
[243]
now in bottom up parsing the robot all
[245]
as one phrase is an example of a subject
[249]
so we can now say okay
[252]
the robot is my subject now that act of
[256]
picking up a sub string from your
[258]
sentence and going upwards and making it
[261]
more abstract if you like is called
[263]
reducing in bottom-up parsing
[266]
so
[267]
looking for a longer and longer and
[269]
longer string to get your handle that's
[271]
called shifting because you're shifting
[273]
characters
[275]
one after another and making the string
[276]
longer and longer and saying can i go
[278]
any further start shifting but when you
[281]
say oh that's a nice long string and it
[283]
matches and then you go up and say oh
[285]
that's my subject that is called
[287]
reduction because you're going to
[289]
something simpler further up the tree so
[291]
you can take that off as being done
[293]
bottom up next thing is you see this
[297]
string of characters called stroke and
[299]
once again it's right hand side driven
[301]
what is there on the right hand side and
[304]
which rule is it that could possibly
[306]
match stroked you see in here against
[309]
verb bit kicked or stroked those three
[312]
strings are your possibilities so that's
[313]
fine going right to left you say stroked
[316]
is an example of a verb so we've got our
[321]
verb there
[323]
now again
[325]
cheated but it's wonderful fun i have
[327]
not analyzed two furry dice
[330]
into adjectives and nouns anything like
[332]
that i've just put it in as a
[334]
interesting shortcut to have there and
[336]
it is an example of what i would call an
[339]
object phrase
[341]
some of you who are really good english
[342]
linguists may want to go on about my
[344]
lack of understanding about what a
[345]
direct and indirect object are not to
[348]
mention predicates and so on but
[350]
please forgive me i regard it as being a
[352]
phrase in an object position so i'm
[355]
saying there's a quick match here and
[357]
bottom is gonna love this two fairy dice
[360]
a great long handle oh and if i match it
[362]
there what's the left-hand side it
[364]
corresponds to obj okay then we've won
[368]
we have worked bottom up to having
[370]
subject verb objects on our stack
[373]
starting with the string and once more
[376]
we've exhausted the string now it's the
[378]
end of it there's a sort of full stop
[379]
after that there we are then we've got
[382]
top down
[383]
which tends to be more
[386]
how should we say eager
[388]
you know a top-down pars would very
[390]
probably leap on the word the and not
[392]
bother to go any further because it's
[393]
found a quick match for it whereas
[395]
bottom up is the other way around it's
[398]
basically saying i want the longest
[400]
possible handle even at this stage in
[403]
the late 50s and early 60s there was a
[405]
sneaky suspicion coming around
[408]
that actually bottom up parsing was a
[411]
little bit more powerful
[414]
than top down
[416]
i'm going to put out a set of notes for
[418]
this so that you can look up for
[419]
yourself just examples of why it is more
[422]
powerful but
[423]
roughly speaking i think you can sense
[425]
that because you've not only got
[427]
something you're looking for but you've
[429]
got a handle that you've already
[431]
accumulated it's like gathering more
[433]
contextual information
[436]
going bottom up but on the other hand
[438]
handling the stack and working out
[439]
what's happening is a dot sight more
[441]
complicated if you do it by hand coming
[444]
bottom up around the doodle by hand
[447]
why not
[448]
me and you lot it's a good way to learn
[452]
lex and yak in other words don't write
[454]
the c directly yourself get a software
[457]
tool of these
[459]
like these two to do it for you
[462]
so that's exactly what we're going to do
[463]
i've got the program putty that does ssh
[466]
connected here i'm talking to my other
[468]
linux machine in the other room where i
[471]
have got set up a parser complete parser
[475]
front-end electrical analyzer syntax
[477]
analysis for this furry grammar and all
[480]
legal sentences in it and i know first
[483]
of all you will want me
[485]
to call up the program that implements
[488]
this and the test sentence first of all
[490]
is the robot's throat
[494]
here we go
[496]
so furry it's hanging there it's waiting
[499]
for us to give a correct furry sentence
[502]
dice
[504]
return
[505]
look at that it's happy with it i've
[507]
given it to in subject verb object order
[511]
and i have numbered those rules in the
[514]
grammar as i showed you earlier and i
[516]
now have as it were a map of how it has
[519]
passed it rule three now that is the one
[523]
that effectively says i can do the robot
[526]
all as one phrase it has chosen not to
[530]
go
[531]
for the
[532]
and robot as article and noun as
[534]
separate entities it might well have
[536]
done that had i gone top down but
[538]
because this yak confected parser system
[541]
goes bottom up it's gone for the longest
[543]
possible handle at that stage and it's
[545]
matched it rule four the middle piece is
[548]
matched stroked as the verb and finally
[552]
it just spotted right at the very end
[554]
that i put in another sneaky shortcut
[556]
two furry dices rule six and that is my
[560]
parse
[561]
so should we try one more just to make
[563]
sure go on sean tell me which one to try
[565]
um the
[567]
woman
[568]
bit the dog
[569]
yeah
[570]
the woman bit
[572]
the dog
[574]
hey well look rule two the woman now
[576]
rule two not rule three if it's followed
[578]
rule two it's gone down the article noun
[581]
route which means it knows that's the
[583]
only way to match the war the woman
[586]
there is no shortcut way okay
[588]
rule four verb rule again you chose b
[591]
rule five now this time again there is
[594]
no shortcut to two furry dice at rule
[597]
six you've got to go the long way around
[599]
and following rule five you break it
[601]
down into article now and again the dog
[604]
so
[605]
there we are we
[607]
you could say well you've written a
[609]
compiler for the furry language with the
[611]
help of lex and yak we can go into
[614]
details of that later if need be but not
[617]
now it's fair enough but it's not doing
[619]
anything really is it what more should
[621]
we do with this now we've written it
[622]
this far then sure can you tell me well
[624]
i think next time we need to come up
[626]
with an action it needs to do something
[627]
right it needs to transform that grammar
[630]
in some way those of us those of you who
[634]
in the previous video actually bother to
[637]
look at the extra bits may have had a
[640]
sneak preview as to what we're going to
[642]
do as our much more interesting actions
[645]
now we have recognized the innate
[647]
structure so remember
[650]
always watch the extra bits
[654]
computer scientists began to say well
[656]
you know this is great because
[659]
if we're writing a computer language
[660]
like algol 60
[662]
we can have the whole of the algol 60
[664]
grammar display to us and as we're
[666]
writing it in assembler or whatever we
[669]
could have half an eye on what to do
[671]
next because