In this section we will explore using CFG's (Context Free Grammars) and parsing using NLTK_LITE.
Our first step would be to encode a CFG. For the CFG given below:
S -> NP VP NP -> ART N NP -> ART ADJ N VP -> V VP -> V NP ART -> the ART -> a N -> cat N -> dog N -> fish V -> saw V -> ate V -> cried ADJ -> black
You have to do the following: First import the entire parse module:
from nltk_lite.parse import *
and then:
You can also create a grammar as follows:
>>> S, NP, VP, ART, N, ADJ, V = cfg.nonterminals('S,
NP, VP, ART, N, ADJ, V')
>>> myGrammar = cfg.parse_grammar('''
S -> NP VP
NP -> ART N | ART ADJ N
VP -> V | V NP
ART -> 'the' | 'a'
N -> 'cat' | 'dog' | 'fish'
V -> 'saw' | 'ate' | 'cried'
ADJ -> 'black'
''')
Once you have a grammar, you can try this grammer on different parsing schemes. Below, we will look at one top-down scheme: recursive descent, and one bottom-up scheme: shift-reduce.
You can create an instance of a recursive descent parser for your grammar as follows:
>>> myRDparser = RecursiveDescent(myGrammar)
Next, create a sentence using tokenization:
>>> sent = list(tokenize.whitespace('the dog cried'))
>>> sent
['the', 'dog', 'cried']
To parse this sentence using the parser you created, use:
>>> myRDparser.parse(sent)
(S: (NP: (ART:
'the') (N: 'dog')) (VP: (V: 'cried')))
Similarly, you can create an instance of a shift-reduce parser and then parse:
>>> mySRparser = ShiftReduce(myGrammar)
>>> mySRparser.parse(sent)
(S: (NP: (ART: 'the') (N: 'dog')) (VP: (V: 'cried')))
You can also see the operation of the parser by turning on the tracing feature:
>>> mySRparser.trace()
>>> mySRparser.parse(sent)
Parsing 'the dog cried'
[ * the dog cried]
S [ 'the' * dog cried]
R [ <ART> * dog cried]
S [ <ART> 'dog' * cried]
R [ <ART> <N> * cried]
R [ <NP> * cried]
S [ <NP> 'cried' * ]
R [ <NP> <V> * ]
R [ <NP> <VP> * ]
R [ <S> * ]
(S: (NP: (ART: 'the') (N: 'dog')) (VP: (V: 'cried')))
Try the same for the recursive descent parser above.
Chart Parsing
You can also use the nltk_lite's chart parsing modules similarly to do bottom-up or top-down chart parsing. This is shown below.
First, you creat a bottom-up chart parser, given a specific grammar:
>>> myBUChartParser = ChartParse(myGrammar, BU_STRATEGY)
Then, as above, parse:
>>> myBUChartParser.parse(sent)
(S: (NP: (ART: 'the') (N: 'dog')) (VP: (V: 'cried')))
You can also turn the trace on of off. But this is different from the above.
>>> myBUChartParser = ChartParse(myGrammar, BU_STRATEGY, trace=1)
>>> myBUChartParser.parse(sent)
|. the . dog . cried .|
Bottom Up Init Rule:
- Added 3 edges
Bottom Up Predict Rule:
- Added 3 edges
Fundamental Rule:
- Added 3 edges
Bottom Up Predict Rule:
- Added 4 edges
Fundamental Rule:
- Added 5 edges
Bottom Up Predict Rule:
- Added 1 edges
Fundamental Rule:
- Added 2 edges
(S: (NP: (ART: 'the') (N: 'dog')) (VP: (V: 'cried')))
Similarly, you can create a top-down chart parser as shown below:
>>> myTDChartParser = ChartParse(myGrammar, TD_STRATEGY)
>>> myTDChartParser.parse(sent)
(S: (NP: (ART: 'the') (N: 'dog')) (VP: (V: 'cried')))
>>> myTDChartParser = ChartParse(myGrammar, TD_STRATEGY, trace = 1)
>>> myTDChartParser.parse(sent)
|. the . dog . cried .|
Top Down Init Rule:
- Added 1 edges
Top Down Expand Rule:
- Added 4 edges
Top Down Match Rule:
- Added 1 edges
Fundamental Rule:
- Added 3 edges
Top Down Expand Rule:
- Added 4 edges
Top Down Match Rule:
- Added 1 edges
Fundamental Rule:
- Added 3 edges
Top Down Expand Rule:
- Added 5 edges
Top Down Match Rule:
- Added 1 edges
Fundamental Rule:
- Added 4 edges
Top Down Expand Rule:
- Added 4 edges
(S: (NP: (ART: 'the') (N: 'dog')) (VP: (V: 'cried')))
Go ahead, try these various parsers for CFG's of your own.