Definite Clause Grammars

7.2. Definite Clause Grammars

If we want to build a parser in Prolog, we need a representation for sentences. Ignoring capitals and punctuation marks, a sentence can simply be represented by the list of its words in the same order, for instance

[the,rapid,turtle,beats,achilles]

Given this representation, a grammar rule like

sentence --> noun_phrase,verb_phrase.

has the following meaning: a list of words represents a sentence, if some first part of it represents a noun phrase, and the rest represents a verb phrase. This statement can easily be expressed as a Prolog clause:

sentence(S):-
    noun_phrase(NP),
    verb_phrase(VP),
    append(NP,VP,S)

Similarly, a grammar rule containing a terminal

verb --> [sleeps].

means: a list of words represents a verb if it is the list consisting of the single word ‘sleeps’. Translated to Prolog:

verb([sleeps]).

Obviously, there is a very close relationship between context-free grammar rules and definite clauses, and any context-free grammar can easily be translated to a set of Prolog clauses. The exciting thing about this is that these Prolog clauses are nothing less than a parsing program: for instance, we could ask the query

?-sentence([the,rapid,turtle,beats,achilles]).

and get an affirmative answer.

We can actually push the correspondence between grammar rules and definite clauses further by employing difference lists (Section 3.6). This allows us to get rid of the append literals:

sentence(NP1-VP2):-
    noun_phrase(NP1-VP1),
    verb_phrase(VP1-VP2)

This clause should be read as follows: NP1 is a sentence followed by VP2, if NP1 is a noun phrase followed by VP1, and VP1 is a verb phrase followed by VP2 (Figure 7.2). Queries now should take the form

?-sentence([the,rapid,turtle,beats,achilles]-[]).

(after parsing the initial part of the list as a sentence, nothing should be left).

../../../_images/image0042.svg

Figure 7.2 The use of difference lists in grammar rules.

We have shown that there is a one-to-one correspondence between context-free grammars and Prolog programs interpreting those grammars. In fact, the translation from the first to the second is so straightforward that it is built into Prolog. That is, meta-level grammar rules like

sentence --> noun_phrase,verb_phrase.

are allowed in Prolog programs. When interpreting these rules, Prolog will invisibly convert them to object-level program clauses like

sentence(L,L0):-
    noun_phrase(L,L1),
    verb_phrase(L1,L0).

in which the additional variable is an accumulator rather than the minus list of a difference list (Section 3.6). Furthermore, Prolog provides the meta-level predicate phrase/2, such that the object-level query ?-sentence(L,[]) can be replaced by the meta-level query ?-phrase(sentence,L) (Figure 7.3).

../../../_images/image0062.svg

Figure 7.3 Meta-level and object-level in Definite Clause Grammars.

These Prolog grammars are known as Definite Clause Grammars (DCG’s). They are an excellent illustration of the power of declarative programming: specifying a grammar gives you the parser for free. That is, a grammar is a declarative specification of the corresponding parser, and Prolog directly converts this specification into an executable parser. Moreover, since a grammar is purely declarative, the program is also a sentence generator: for instance, it is possible to generate every sentence starting with ‘Achilles’ by means of the query ?-phrase(sentence,[achilles|Rest]).

Definite Clause Grammars further extend the power of context-free grammars in two ways:

  1. arguments can be added to non-terminals;

  2. Prolog goals can be added to the body of grammar rules.

As an illustration of the first feature, we show how plurality agreement can be achieved by means of a DCG instead of a context-sensitive grammar:

sentence                      --> noun_phrase(N),verb_phrase(N).
noun_phrase(N)                --> article(N),noun(N).
verb_phrase(N)                --> intransitive_verb(N).
article(singular)             --> [a].
article(singular)             --> [the].
article(plural)               --> [the].
noun(singular)                --> [turtle].
noun(plural)                  --> [turtles].
intransitive_verb(singular)   --> [sleeps].
intransitive_verb(plural)     --> [sleep].
/** <examples> ?- phrase(sentence,L). */

The first rule states that the pluralities of noun phrase and verb phrase should correspond. The second rule states that the plurality of a noun phrase is determined by both article and noun, which should have corresponding pluralities as well. The remaining rules assign pluralities to specific articles, nouns and verbs.

We can also use this feature to construct a parse tree while parsing a sentence. Parse trees can be represented by Prolog terms (Section 4.1):

  • a parse tree for a terminal T of syntactic category S is represented by the term S(T);

  • a parse tree for a sequence N1Nk of non-terminals of syntactic category S is represented by the term S(N1,,Nk).

Thus, a parse tree for the verb ‘sleeps’ is represented by the term verb(sleeps), and a parse tree for the sentence ‘the turtle sleeps’ is represented by the term

s(np(art(the),n(turtle)),vp(iv(sleeps)))

(for brevity, syntactic categories are abbreviated). The following grammar indicates how parse trees are built up from their constituents.

%:-use_rendering(svgtree).  % uncomment to render parse tree graphically

sentence(s(NP,VP))            --> noun_phrase(NP),
                                  verb_phrase(VP).
noun_phrase(np(N))            --> proper_noun(N).
noun_phrase(np(Art,Adj,N))    --> article(Art),
                                  adjective(Adj),
                                  noun(N).
noun_phrase(np(Art,N))        --> article(Art),noun(N).
verb_phrase(vp(IV))           --> intransitive_verb(IV).
verb_phrase(vp(TV,NP))        --> transitive_verb(TV),
                                  noun_phrase(NP).
article(art(the))             --> [the].
adjective(adj(lazy))          --> [lazy].
adjective(adj(rapid))         --> [rapid].
proper_noun(pn(achilles))     --> [achilles].
noun(n(turtle))               --> [turtle].
intransitive_verb(iv(sleeps)) --> [sleeps].
transitive_verb(tv(beats))    --> [beats].
/** <examples> ?- phrase(sentence(T),[achilles,beats,the,lazy,turtle]). ?- phrase(sentence(s(np(pn(achilles)), vp(tv(beats),np(art(the),adj(lazy),n(turtle))) )), L). */

In the query, the argument of the non-terminal sentence will be instantiated to the final parse tree:

?-phrase(sentence(T),[achilles,beats,the,lazy,turtle]).
  T = s(np(pn(achilles)),
        vp(tv(beats),
           np(art(the),
              adj(lazy),
              n(turtle))))

If we use the predicate term_write/1 given in Section 4.1, a nice tree-like output is obtained:

?-phrase(sentence(T),[achilles,beats,the,lazy,turtle]),
  term_write(T).

  ---------s--------np--------pn--achilles
            --------vp--------tv-----beats
                      --------np-------art-------the
                                -------adj------lazy
                                ---------n----turtle

These examples show one way to use arguments of non-terminals: to collect information coming out of the parsing process. In addition, we might want to express that arguments of different non-terminals in a rule are related in some way. To this end, we can add Prolog goals to the body of grammar rules, by enclosing them in curly brackets {}. For instance, suppose we have a grammar for English numerals like ‘one hundred twenty three’, and we want to calculate the number represented by such numerals during parsing. We could write the following DCG:

numeral(N) --> n1_999(N).
numeral(N) --> n1_9(N1),[thousand],n1_999(N2),{N is N1*1000+N2}.

n1_999(N)  --> n1_99(N).
n1_999(N)  --> n1_9(N1),[hundred],n1_99(N2),{N is N1*100+N2}.

n1_99(N)   --> n0_9(N).
n1_99(N)   --> n10_19(N).
n1_99(N)   --> n20_90(N).
n1_99(N)   --> n20_90(N1),n1_9(N2),{N is N1+N2}.

n0_9(0)    --> [].
n0_9(N)    --> n1_9(N).

n1_9(1)    --> [one].
n1_9(2)    --> [two].
n1_9(3)    --> [three].
n1_9(4)    --> [four].
n1_9(5)    --> [five].
n1_9(6)    --> [six].
n1_9(7)    --> [seven].
n1_9(8)    --> [eight].
n1_9(9)    --> [nine].

n10_19(10) --> [ten].
n10_19(11) --> [eleven].
n10_19(12) --> [twelve].
n10_19(13) --> [thirteen].
n10_19(14) --> [fourteen].
n10_19(15) --> [fifteen].
n10_19(16) --> [sixteen].
n10_19(17) --> [seventeen].
n10_19(18) --> [eighteen].
n10_19(19) --> [nineteen].

n20_90(20) --> [twenty].
n20_90(30) --> [thirty].
n20_90(40) --> [fourty].
n20_90(50) --> [fifty].
n20_90(60) --> [sixty].
n20_90(70) --> [seventy].
n20_90(80) --> [eighty].
n20_90(90) --> [ninety].
/** <examples> ?- phrase(numeral(N),[nine,thousand,nine,hundred,ninety,nine]). */

We could use this DCG for parsing a given numeral, but also for generating the numeral corresponding to a given number:

?-phrase(numeral(2211),N).
  N = [two,thousand,two,hundred,eleven]

Exercise 7.3

Write a DCG that parses time indications like ‘twenty minutes to four’, and converts them to terms like 3:40.

Tip

A parser for Roman numerals (adapted from a post on the comp.compilers Google group).

roman(N) --> d(N4,4),d(N3,3),d(N2,2),d(N1,1),{N is N4*1000+N3*100+N2*10+N1}.

i(1) --> [i].
i(2) --> [x].
i(3) --> [c].
i(4) --> [m].

v(1) --> [v].
v(2) --> [l].
v(3) --> [d].

d(0,_) --> [].
d(1,N) --> i(N).
d(2,N) --> i(N),i(N).
d(3,N) --> i(N),i(N),i(N).
d(4,N) --> i(N),v(N).
%d(4,N) --> i(N),i(N),i(N),i(N).
d(5,N) --> v(N).
d(6,N) --> v(N),i(N).
d(7,N) --> v(N),i(N),i(N).
d(8,N) --> v(N),i(N),i(N),i(N).
d(9,N) --> i(N),{N1 is N+1},i(N1).
/** <examples> ?- phrase(roman(1984),R). ?- phrase(roman(Y),[m,c,m,l,x,x,x,i,v]). */

In this section, we have seen that writing parsers in Prolog is easy: just write the context-free grammar, possibly extended by arguments to non-terminals and Prolog goals in the body of grammar rules, and you have a program for parsing and sentence generation. However, parsing is not an end in itself: we want to assign an interpretation to a sentence. This is the topic of the following section.