(sec:9.1)=
# Generalisation and specialisation #
An *example* is a ground fact for the predicate of which a definition is to be induced. A *positive* example is true in the intended interpretation, while a *negative* example is false. Consequently, the inductive $Hypothesis$ should be such that for every positive example $p$
$$
Theory \cup Hypothesis \models p
$$
while for every negative example $n$
$$
Theory \cup Hypothesis \nvDash n
$$
We say that $p$ is *covered* by $Hypothesis$, given $Theory$. For instance, if $Hypothesis$ is the standard recursive definition of `element/2`:
```Prolog
element(X,[X|Z]).
element(X,[Y|Z]):-element(X,Z).
```
then the example `element(b,[a,b])` is covered (with empty $Theory$). This can be demonstrated by a simple meta-interpreter for definite clauses. Note that this proof requires **both** of the above clauses. Alternatively, if `element(b,[b])` is also known to be a positive example, we can say that `element(b,[a,b])` is covered by the second, recursive clause alone. The first definition of coverage, which refers to the complete hypothesis, is called *intensional* coverage, while the second, referring to single clauses plus the rest of the examples, is called *extensional* coverage. In the induction programs to be developed, we will employ both notions of coverage; for the moment, however, the distinction is immaterial.
```{exercise} ex:9.2
```
+++
If $Hypothesis1$ covers at least all the examples covered by $Hypothesis2$, we say that $Hypothesis1$ is at least as *general* as $Hypothesis2$, or that $Hypothesis2$ is at least as *specific* as $Hypothesis1$. From the definition of coverage, one can see that $Hypothesis2$ must be a logical consequence of $Hypothesis1$, given $Theory$:
$$
Theory \cup Hypothesis1 \models Hypothesis2
$$
Suppose $p$ is a positive example covered by $Hypothesis1$ but not by $Hypothesis2$. This means that $Hypothesis2$ is too specific; if it is our current hypothesis, it needs to be *generalised*, for instance to $Hypothesis1$. Similarly, if a hypothesis covers a negative example, it needs to be *specialised*. Generalisation and specialisation are the basic operations of induction.
+++
Although we defined generality between hypotheses being **sets** of clauses, practical approaches to induction usually generalise or specialise single clauses. For instance, the following are clauses of increasing generality:
```Prolog
element(X,[Y|Z]):-element(X,Z).
element(X,V):-element(X,Z).
element(X,V).
```
This shows that a more specific clause can be constructed by adding a literal, by applying a substitution, or both. This relation of generality between clauses is called $\theta$-subsumption. Formally, `Clause1` *$\theta$-subsumes* `Clause2` if there is a substitution $\theta$ that can be applied to `Clause1`, such that every literal in the resulting clause occurs in `Clause2`.
+++
Notice that $\theta$ only replaces variables in `Clause1`, not in `Clause2`. One way to test if such a $\theta$ exists is to ground all variables in `Clause2`, and then unify the ground version of `Clause2` with `Clause1`. Grounding the variables in a term can be done by means of the built-in predicate `numbervars/3`, which unifies different variables with terms of the form `'$VAR(N)'`.
```Prolog
theta_subsumes1((H:-B1),(H:-B2)):-
ground(B2),
subset(B1,B2).
ground(Term):-
numbervars(Term,0,N).
%%% subset/2: see Section 10.2 (appendix)
```
This approach has the disadvantage that one or both clauses are changed after a call to `theta_subsumes1/2`. To avoid this, we apply the following little programming trick:
```Prolog
theta_subsumes((H1:-B1),(H2:-B2)):-
not((H1=H2,ground(B2),
not subset(B1,B2))).
```
`theta_subsumes/2` succeeds exactly when `theta_subsumes1/2` does, but by means of the double negation unifications are 'undone' after the call succeeds.
+++
Next, we turn to the issue of how to construct generalisations of clauses. First we consider the simpler case of generalising two atoms. Consider the following two ground facts:
```Prolog
element(1,[1]).
element(z,[z,y,x]).
```
The following atom $\theta$-subsumes both of them:
```Prolog
element(X,[X|Y])
```
Note that this atom is $\theta$-subsumed by every other possible generalisation (such as `element(X,[Y|Z])` or `element(X,Y)`). For this reason, it is called a *least general generalisation under $\theta$-subsumption* or $\theta$-LGG. $\theta$-LGG's of atoms can be computed by means of *anti-unification*. This operation is the dual of unification. It operates by comparing the terms occurring at the same position in the two atoms, and replacing them by a new variable if they are different. The terms which have already been replaced by a variable are collected in two lists, because if the same pair of terms is encountered again, it should be replaced by the same variable (see `1` and `z` in the example above). For obvious reasons, such lists are called *inverse substitutions*.
```{swish} swish:9.1.1
---
query-id: swishq:9.1.1
query-text: ?-anti_unify(2*2=2+2,2*3=3+3,T). ?-anti_unify(2*(1/3)=2/3,3*(1/2)=3/2,T,[],S1,[],S2).
---
```
The following query illustrates the operation of the program, including the use of inverse substitutions:
```{swish-query} swishq:9.1.1
?-anti_unify(2*2=2+2,2*3=3+3,T,[],S1,[],S2).
T = 2*X=X+X
S1 = [2->X]
S2 = [3->X]
```
Note that the inverse substitution `[2->X]` does not indicate which occurrences of `2` should be replaced by `X`. This means that `S1` applied to the first term does not yield `T` (the inverse of `S1` applied to `T` yields the first term, however). Therefore, a proper definition of inverse substitution should include the positions of terms which are to be replaced by variables. We will not elaborate this any further here.
+++
The construction of the $\theta$-LGG of two clauses makes use of, but is more complicated than anti-unification. The basic difference with anti-unification is that the body of a clause is logically speaking unordered, whereas subterms within a term have fixed positions. Therefore, we cannot just compare the literals occurring at the same position in the respective bodies, but should consider all pairs of literals, one from each body. For instance, the $\theta$-LGG of the following two clauses
```Prolog
element(c,[b,c]):-element(c,[c]).
element(d,[b,c,d]):-element(d,[c,d]),element(d,[d]).
```
is the clause
```Prolog
element(X,[b,c|Y]):-element(X,[c|Y]),element(X,[X]).
```
The head of this clause is simply obtained by anti-unifying the heads of the original clauses, and the body is obtained by anti-unification of `element(c,[c])` and `element(d,[c,d])`, giving `element(X,[c|Y])`, and anti-unification of `element(c,[c])` and `element(d,[d])`, giving `element(X,[X])`.
+++
The program for constructing $\theta$-LGG's is given below. Note that the inverse substitutions found in each step are passed on to the next, so that the literals share variables.
```{swish} swish:9.1.2
---
inherit-id: swish:9.1.1
---
```
To check the above example, we pose the following query:
```Prolog
?-theta_lgg((element(c,[b,c]):-[element(c,[c])]),
(element(d,[b,c,d]):-
[element(d,[c,d]),element(d,[d])]), C).
C = element(X,[b,c|Y]):-[element(X,[X]),element(X,[c|Y])]
```
````{tip}
Use `portray_clause/1` to 'pretty-print' a clause with readable variable names:
```Prolog
?-theta_lgg((element(c,[b,c]):-[element(c,[c])]),
(element(d,[b,c,d]):-
[element(d,[c,d]),element(d,[d])]), C),
portray_clause(C).
```
Technically, this grounds the variables with `numbervars/3` which was mentioned earlier.
````
```{exercise} ex:9.3
```
````{infobox}
---
title: The relation between $\theta$-subsumption and logical consequence
---
If `Clause1` $\theta$-subsumes `Clause2`, then also `Clause1` $\models$ `Clause2`. The reverse, however, is not always true. Consider the following two clauses:
```Prolog
list([V|W]):-list(W).
list([X,Y|Z]):-list(Z).
```
Given `list([])`, the first clause covers lists of arbitrary length, while the second covers only lists of even length. All lists covered by the second clause are also covered by the first, which is therefore more general. However, there is no substitution that can be applied to the first clause to yield the second (such a substitution should map `W` both to `[Y|Z]` and to `Z`, which is impossible).
+++
It may seem that $\models$ provides a better notion of generality than $\theta$-subsumption. However, such a semantic definition of generality introduces two problems. One is that it does not suggest a simple procedure to generalise clauses, as $\theta$-subsumption does. The second problem is that LGG's under logical consequence are not always unique. Consider the two clauses
```Prolog
list([A,B|C]):-list(C).
list([P,Q,R|S]):-list(S).
```
Under logical consequence, these clauses have two LGG's: one is `list([X|Y]):-list(Y)`, and the other is `list([X,Y|Z]):-list(V)`. Under $\theta$-subsumption, only the latter is an LGG. Note that the first LGG looks in fact more plausible!
````
+++
In the following section we develop a program which generalises the examples by constructing $\theta$-LGG's. This corresponds to a *specific-to-general* search of the space of possible predicate definitions; it is also called *bottom-up* induction. Alternatively, one could start with the most general definition, which is specialised as long as it covers some negative example. A program for *top-down* induction is given in {numref}`sec:9.3`.