Context Free Grammars & Parsing in NLTK_LITE

In this section we will explore using CFG's (Context Free Grammars) and parsing using NLTK_LITE.

Creating a CFG in NLTK_LITE

Our first step would be to encode a CFG. For the CFG given below:

S -> NP VP

NP -> ART N

NP -> ART ADJ N

VP -> V

VP -> V NP



ART -> the

ART -> a

N -> cat

N -> dog

N -> fish

V -> saw

V -> ate

V -> cried

ADJ -> black


You have to do the following: First import the entire parse module:

from nltk_lite.parse import *

and then:

  1. Create non-terminals:

    S, NP, VP, ART, N, ADJ, V = cfg.nonterminals('S, NP, VP, ART, N, ADJ, V')

  2. Make individual productions:

    p1 = cfg.Production(S, [NP, VP])
    p2 = cgf.Production(NP, [ART, N])
    ...
    p6 = cfg.Production(ART, ['the'])
    ...
    p14 = cfg.Production(ADJ, ['black'])


  3. Make the grammar:

    myGrammar = cfg.Grammar(S, [p1, p2, ...., p14])

    Where S, above is defining it as the start symbol.

You can also create a grammar as follows:

>>> S, NP, VP, ART, N, ADJ, V = cfg.nonterminals('S, NP, VP, ART, N, ADJ, V')
>>> myGrammar = cfg.parse_grammar('''
S -> NP VP
NP -> ART N | ART ADJ N
VP -> V | V NP
ART -> 'the' | 'a'
N -> 'cat' | 'dog' | 'fish'
V -> 'saw' | 'ate' | 'cried'
ADJ -> 'black'
''')

Parsing

Once you have a grammar, you can try this grammer on different parsing schemes. Below, we will look at one top-down scheme: recursive descent, and one bottom-up scheme: shift-reduce.

Recursive Descent Parsing

You can create an instance of a recursive descent parser for your grammar as follows:


>>> myRDparser = RecursiveDescent(myGrammar)

Next, create a sentence using tokenization:

>>> sent = list(tokenize.whitespace('the dog cried'))
>>> sent
['the', 'dog', 'cried']

To parse this sentence using the parser you created, use:

>>> myRDparser.parse(sent)
(S: (NP: (ART: 'the') (N: 'dog')) (VP: (V: 'cried')))

Shift-Reduce Parsing

Similarly, you can create an instance of a shift-reduce parser and then parse:


>>> mySRparser = ShiftReduce(myGrammar)
>>> mySRparser.parse(sent)
(S: (NP: (ART: 'the') (N: 'dog')) (VP: (V: 'cried')))

You can also see the operation of the parser by turning on the tracing feature:

>>> mySRparser.trace()
>>> mySRparser.parse(sent)
Parsing 'the dog cried'
[ * the dog cried]
S [ 'the' * dog cried]
R [ <ART> * dog cried]
S [ <ART> 'dog' * cried]
R [ <ART> <N> * cried]
R [ <NP> * cried]
S [ <NP> 'cried' * ]
R [ <NP> <V> * ]
R [ <NP> <VP> * ]
R [ <S> * ]
(S: (NP: (ART: 'the') (N: 'dog')) (VP: (V: 'cried')))

Try the same for the recursive descent parser above.

Chart Parsing

You can also use the nltk_lite's chart parsing modules similarly to do bottom-up or top-down chart parsing. This is shown below.

 

First, you creat a bottom-up chart parser, given a specific grammar:

>>> myBUChartParser = ChartParse(myGrammar, BU_STRATEGY)

Then, as above, parse:


>>> myBUChartParser.parse(sent)
(S: (NP: (ART: 'the') (N: 'dog')) (VP: (V: 'cried')))

You can also turn the trace on of off. But this is different from the above.

>>> myBUChartParser = ChartParse(myGrammar, BU_STRATEGY, trace=1)
>>> myBUChartParser.parse(sent)
|. the . dog . cried .|
Bottom Up Init Rule:
- Added 3 edges
Bottom Up Predict Rule:
- Added 3 edges
Fundamental Rule:
- Added 3 edges
Bottom Up Predict Rule:
- Added 4 edges
Fundamental Rule:
- Added 5 edges
Bottom Up Predict Rule:
- Added 1 edges
Fundamental Rule:
- Added 2 edges
(S: (NP: (ART: 'the') (N: 'dog')) (VP: (V: 'cried')))

Similarly, you can create a top-down chart parser as shown below:

>>> myTDChartParser = ChartParse(myGrammar, TD_STRATEGY)
>>> myTDChartParser.parse(sent)
(S: (NP: (ART: 'the') (N: 'dog')) (VP: (V: 'cried')))
>>> myTDChartParser = ChartParse(myGrammar, TD_STRATEGY, trace = 1)
>>> myTDChartParser.parse(sent)
|. the . dog . cried .|
Top Down Init Rule:
- Added 1 edges
Top Down Expand Rule:
- Added 4 edges
Top Down Match Rule:
- Added 1 edges
Fundamental Rule:
- Added 3 edges
Top Down Expand Rule:
- Added 4 edges
Top Down Match Rule:
- Added 1 edges
Fundamental Rule:
- Added 3 edges
Top Down Expand Rule:
- Added 5 edges
Top Down Match Rule:
- Added 1 edges
Fundamental Rule:
- Added 4 edges
Top Down Expand Rule:
- Added 4 edges
(S: (NP: (ART: 'the') (N: 'dog')) (VP: (V: 'cried')))

Go ahead, try these various parsers for CFG's of your own.