Generalisation and specialisation

9.1. Generalisation and specialisation

An example is a ground fact for the predicate of which a definition is to be induced. A positive example is true in the intended interpretation, while a negative example is false. Consequently, the inductive \(Hypothesis\) should be such that for every positive example \(p\)

\[ Theory \cup Hypothesis \models p \]

while for every negative example \(n\)

\[ Theory \cup Hypothesis \nvDash n \]

We say that \(p\) is covered by \(Hypothesis\), given \(Theory\). For instance, if \(Hypothesis\) is the standard recursive definition of element/2:

element(X,[X|Z]).
element(X,[Y|Z]):-element(X,Z).

then the example element(b,[a,b]) is covered (with empty \(Theory\)). This can be demonstrated by a simple meta-interpreter for definite clauses. Note that this proof requires both of the above clauses. Alternatively, if element(b,[b]) is also known to be a positive example, we can say that element(b,[a,b]) is covered by the second, recursive clause alone. The first definition of coverage, which refers to the complete hypothesis, is called intensional coverage, while the second, referring to single clauses plus the rest of the examples, is called extensional coverage. In the induction programs to be developed, we will employ both notions of coverage; for the moment, however, the distinction is immaterial.

Exercise 9.2

Write a predicate covers_ex/3 which, given a clause, an example, and a list of positive examples, tests whether the clause extensionally covers the example.

If \(Hypothesis1\) covers at least all the examples covered by \(Hypothesis2\), we say that \(Hypothesis1\) is at least as general as \(Hypothesis2\), or that \(Hypothesis2\) is at least as specific as \(Hypothesis1\). From the definition of coverage, one can see that \(Hypothesis2\) must be a logical consequence of \(Hypothesis1\), given \(Theory\):

\[ Theory \cup Hypothesis1 \models Hypothesis2 \]

Suppose \(p\) is a positive example covered by \(Hypothesis1\) but not by \(Hypothesis2\). This means that \(Hypothesis2\) is too specific; if it is our current hypothesis, it needs to be generalised, for instance to \(Hypothesis1\). Similarly, if a hypothesis covers a negative example, it needs to be specialised. Generalisation and specialisation are the basic operations of induction.

Although we defined generality between hypotheses being sets of clauses, practical approaches to induction usually generalise or specialise single clauses. For instance, the following are clauses of increasing generality:

element(X,[Y|Z]):-element(X,Z).
element(X,V):-element(X,Z).
element(X,V).

This shows that a more specific clause can be constructed by adding a literal, by applying a substitution, or both. This relation of generality between clauses is called \(\theta\)-subsumption. Formally, Clause1 \(\theta\)-subsumes Clause2 if there is a substitution \(\theta\) that can be applied to Clause1, such that every literal in the resulting clause occurs in Clause2.

Notice that \(\theta\) only replaces variables in Clause1, not in Clause2. One way to test if such a \(\theta\) exists is to ground all variables in Clause2, and then unify the ground version of Clause2 with Clause1. Grounding the variables in a term can be done by means of the built-in predicate numbervars/3, which unifies different variables with terms of the form '$VAR(N)'.

theta_subsumes1((H:-B1),(H:-B2)):-
    ground(B2),
    subset(B1,B2).

ground(Term):-
    numbervars(Term,0,N).

%%% subset/2: see Section 10.2 (appendix)

This approach has the disadvantage that one or both clauses are changed after a call to theta_subsumes1/2. To avoid this, we apply the following little programming trick:

theta_subsumes((H1:-B1),(H2:-B2)):-
    not((H1=H2,ground(B2),
         not subset(B1,B2))).

theta_subsumes/2 succeeds exactly when theta_subsumes1/2 does, but by means of the double negation unifications are ‘undone’ after the call succeeds.

Next, we turn to the issue of how to construct generalisations of clauses. First we consider the simpler case of generalising two atoms. Consider the following two ground facts:

element(1,[1]).
element(z,[z,y,x]).

The following atom \(\theta\)-subsumes both of them:

element(X,[X|Y])

Note that this atom is \(\theta\)-subsumed by every other possible generalisation (such as element(X,[Y|Z]) or element(X,Y)). For this reason, it is called a least general generalisation under \(\theta\)-subsumption or \(\theta\)-LGG. \(\theta\)-LGG’s of atoms can be computed by means of anti-unification. This operation is the dual of unification. It operates by comparing the terms occurring at the same position in the two atoms, and replacing them by a new variable if they are different. The terms which have already been replaced by a variable are collected in two lists, because if the same pair of terms is encountered again, it should be replaced by the same variable (see 1 and z in the example above). For obvious reasons, such lists are called inverse substitutions.

:-op(600,xfx,'->'). % operator for inverse substitution

% anti_unify(T1,T2,T) <-  T is the anti-unification
%                         of T1 and T2
anti_unify(Term1,Term2,Term):-
    anti_unify(Term1,Term2,Term,[],_S1,[],_S2).

% anti-unification with inverse subst.s and accumulators
anti_unify(Term1,Term2,Term1,S1,S1,S2,S2):-
    Term1 == Term2,!.                         % same terms
anti_unify(Term1,Term2,V,S1,S1,S2,S2):-
    subs_lookup(S1,S2,Term1,Term2,V),!.       % already substituted
anti_unify(Term1,Term2,Term,S10,S1,S20,S2):-
    nonvar(Term1),nonvar(Term2),
    functor(Term1,F,N),functor(Term2,F,N),!,  % same
    functor(Term,F,N),                        % functor
    anti_unify_args(N,Term1,Term2,Term,S10,S1,S20,S2).
anti_unify(T1,T2,V,S10,[T1->V|S10],S20,[T2->V|S20]).

anti_unify_args(0,_Term1,_Term2,_Term,S1,S1,S2,S2).
anti_unify_args(N,Term1,Term2,Term,S10,S1,S20,S2):-
    N>0,N1 is N-1,
    arg(N,Term1,Arg1),
    arg(N,Term2,Arg2),
    arg(N,Term,Arg),
    anti_unify(Arg1,Arg2,Arg,S10,S11,S20,S21),
    anti_unify_args(N1,Term1,Term2,Term,S11,S1,S21,S2).

subs_lookup([T1->V|_Subs1],[T2->V|_Subs2],Term1,Term2,V):-
    T1 == Term1,
    T2 == Term2,!.  % no alternative solutions needed
subs_lookup([_S1|Subs1],[_S2|Subs2],Term1,Term2,V):-
    subs_lookup(Subs1,Subs2,Term1,Term2,V).
/** <examples> ?-anti_unify(2*2=2+2,2*3=3+3,T). ?-anti_unify(2*2=2+2,2*3=3+3,T,[],S1,[],S2). ?-anti_unify(2*(1/3)=2/3,3*(1/2)=3/2,T,[],S1,[],S2). */

The following query illustrates the operation of the program, including the use of inverse substitutions:

?-anti_unify(2*2=2+2,2*3=3+3,T,[],S1,[],S2).
  T = 2*X=X+X
  S1 = [2->X]
  S2 = [3->X]

Note that the inverse substitution [2->X] does not indicate which occurrences of 2 should be replaced by X. This means that S1 applied to the first term does not yield T (the inverse of S1 applied to T yields the first term, however). Therefore, a proper definition of inverse substitution should include the positions of terms which are to be replaced by variables. We will not elaborate this any further here.

The construction of the \(\theta\)-LGG of two clauses makes use of, but is more complicated than anti-unification. The basic difference with anti-unification is that the body of a clause is logically speaking unordered, whereas subterms within a term have fixed positions. Therefore, we cannot just compare the literals occurring at the same position in the respective bodies, but should consider all pairs of literals, one from each body. For instance, the \(\theta\)-LGG of the following two clauses

element(c,[b,c]):-element(c,[c]).
element(d,[b,c,d]):-element(d,[c,d]),element(d,[d]).

is the clause

element(X,[b,c|Y]):-element(X,[c|Y]),element(X,[X]).

The head of this clause is simply obtained by anti-unifying the heads of the original clauses, and the body is obtained by anti-unification of element(c,[c]) and element(d,[c,d]), giving element(X,[c|Y]), and anti-unification of element(c,[c]) and element(d,[d]), giving element(X,[X]).

The program for constructing \(\theta\)-LGG’s is given below. Note that the inverse substitutions found in each step are passed on to the next, so that the literals share variables.

:-op(900,fy,not).

% theta_lgg(C1,C2,C) <- C is the  θ-LGG of clause C1 and C2
theta_lgg((H1:-B1),(H2:-B2),(H:-B)):-
    anti_unify(H1,H2,H,[],S10,[],S20),             % heads
    theta_lgg_bodies(B1,B2,[],B,S10,_S1,S20,_S2).  % bodies

% select literal from first body...
theta_lgg_bodies([],_B2,B,B,S1,S1,S2,S2).
theta_lgg_bodies([L|B1],B2,B0,B,S10,S1,S20,S2):-
    theta_lgg_literal(L,B2,B0,B00,S10,S11,S20,S21),
    theta_lgg_bodies(B1,B2,B00,B,S11,S1,S21,S2).

% and one from second body
theta_lgg_literal(_L1,[],B,B,S1,S1,S2,S2).
theta_lgg_literal(L1,[L2|B2],B0,B,S10,S1,S20,S2):-
    same_predicate(L1,L2),anti_unify(L1,L2,L,S10,S11,S20,S21),
    theta_lgg_literal(L1,B2,[L|B0],B,S11,S1,S21,S2).
theta_lgg_literal(L1,[L2|B2],B0,B,S10,S1,S20,S2):-
    not same_predicate(L1,L2),
    theta_lgg_literal(L1,B2,B0,B,S10,S1,S20,S2).

% same_predicate(L1,L2) <- literals L1 and L2 have
%                          the same predicate and arity
same_predicate(L1,L2):-functor(L1,P,N),functor(L2,P,N).
/** <examples> ?-theta_lgg((element(c,[b,c]):-[element(c,[c])]), (element(d,[b,c,d]):-[element(d,[c,d]),element(d,[d])]), C). ?-theta_lgg((reverse([2,1],[3],[1,2,3]):-[reverse([1],[2,3],[1,2,3])]), (reverse([a],[],[a]):-[reverse([],[a],[a])]), C). */

To check the above example, we pose the following query:

?-theta_lgg((element(c,[b,c]):-[element(c,[c])]),
            (element(d,[b,c,d]):-
                 [element(d,[c,d]),element(d,[d])]), C).
  C = element(X,[b,c|Y]):-[element(X,[X]),element(X,[c|Y])]

Tip

Use portray_clause/1 to ‘pretty-print’ a clause with readable variable names:

?-theta_lgg((element(c,[b,c]):-[element(c,[c])]),
            (element(d,[b,c,d]):-
                 [element(d,[c,d]),element(d,[d])]), C),
  portray_clause(C).

Technically, this grounds the variables with numbervars/3 which was mentioned earlier.

Exercise 9.3

Determine the \(\theta\)-LGG of the following two clauses:

reverse([2,1],[3],[1,2,3]):-reverse([1],[2,3],[1,2,3]).
reverse([a],[],[a]):-reverse([],[a],[a]).

The relation between \(\theta\)-subsumption and logical consequence

If Clause1 \(\theta\)-subsumes Clause2, then also Clause1 \(\models\) Clause2. The reverse, however, is not always true. Consider the following two clauses:

list([V|W]):-list(W).
list([X,Y|Z]):-list(Z).

Given list([]), the first clause covers lists of arbitrary length, while the second covers only lists of even length. All lists covered by the second clause are also covered by the first, which is therefore more general. However, there is no substitution that can be applied to the first clause to yield the second (such a substitution should map W both to [Y|Z] and to Z, which is impossible).

It may seem that \(\models\) provides a better notion of generality than \(\theta\)-subsumption. However, such a semantic definition of generality introduces two problems. One is that it does not suggest a simple procedure to generalise clauses, as \(\theta\)-subsumption does. The second problem is that LGG’s under logical consequence are not always unique. Consider the two clauses

list([A,B|C]):-list(C).
list([P,Q,R|S]):-list(S).

Under logical consequence, these clauses have two LGG’s: one is list([X|Y]):-list(Y), and the other is list([X,Y|Z]):-list(V). Under \(\theta\)-subsumption, only the latter is an LGG. Note that the first LGG looks in fact more plausible!

In the following section we develop a program which generalises the examples by constructing \(\theta\)-LGG’s. This corresponds to a specific-to-general search of the space of possible predicate definitions; it is also called bottom-up induction. Alternatively, one could start with the most general definition, which is specialised as long as it covers some negative example. A program for top-down induction is given in Section 9.3.