III

In Part I, we introduced the formalism of clausal logic and showed how it can be used in practice to perform logical inferences. In Part II, we discussed the basic issues one encounters when writing a program to solve some reasoning task: how to represent the knowledge needed to solve the task, and how to search the space of possible solutions. In Part III, we will go beyond the power of clausal logic in a number of ways.

Why would one want to have a formalism more powerful than first-order clausal logic? One reason could be that we want to perform inferences that are simply not expressible in first-order clausal logic. We might want to express knowledge such as ‘he inherited all his father’s bad characteristics’, which is a second-order statement (section 2.5). We might want to express statements like ‘Peter believes his boss knows he is writing a book’, where ‘Peter’s boss knows’ is a modality of the formula ‘Peter is writing a book’, and ‘Peter believes’ is a modality of the formula ‘Peter’s boss knows Peter is writing a book’. We might want to reason about sequences of events happening over time. Each of these examples requires a specialised logic extending the syntax of first-order logic. Needless to say, this increased expressiveness also requires more powerful semantics and proof theory.

There are also reasoning tasks which use the language of clausal logic, but differ nonetheless from standard clausal logic in the validity of the conclusions drawn. For instance, the truth of a conclusion might not be guaranteed but only plausible, given the premises. Alternatively, a conclusion might be a general theory derived from a number of specific observations, a theory which might be falsified by new contradicting evidence. Typically, such non-standard reasoning tasks require a more elaborate semantic and proof-theoretic characterisation than required for standard logical inference.

Thirdly, the knowledge required for the reasoning task might be available in non-logical form only: think of pictures, spoken or written text, or video images. In such cases, the reasoning task requires pre- and postprocessing stages, in which the non-logical data are converted to and from logical formulas.

In the following three chapters, we will encounter each of these three types of reasoning. Chapter 7 is devoted to reasoning with knowledge expressed in natural language. We demonstrate how to translate sentences like ‘Socrates is human’ and ‘all humans are mortal’, and questions like ‘is Socrates mortal?’ into clausal logic, and how to obtain a natural language sentence as the answer. In Chapter 8, we discuss a number of approaches to reasoning with incomplete information. Most of these approaches are of the second type, extending semantics and proof theory but not syntax of clausal logic; one approach extends syntax as well. We provide a detailed discussion of how these approaches are related. Finally, inductive reasoning is discussed in Chapter 9. Induction aims at completing partial knowledge about specific instances of a theory, and is therefore, although related to, much harder than the forms of reasoning with incomplete knowledge discussed in Chapter 8. We give an in-depth analysis of the problem, and develop two Prolog programs that can inductively infer simple predicate definitions from exampes.

7

## Reasoning with natural language

A language which is used for communication between humans is commonly called a natural language, in order to distinguish it from an artificial computer language. Despite their apparent differences, artificial and natural language can be described by the same tools, some of which will be studied in this chapter.

Natural language can be described on a number of different levels:

(i)   Prosody: rhythm and intonation of spoken language;

(ii)  Phonology: how to combine simple sounds (phonemes) in spoken language;

(iiiMorphology: how to build words from meaningful components (morphemes);

(ivSyntax: how to build sentences from words;

(v)  Semantics: how to assign meaning to words and sentences;

(viPragmatics: how to use sentences in communication.

Here, we are mainly concerned with written language, so we will not talk about prosody and phonology. Morphology tells us, for instance, that in English the plural of many nouns can be obtained by adding the suffix -s (house–houses, chair–chairs). Syntax allows us to distinguish well-formed sentences (like ‘I sleep’) from ill-formed ones (like ‘me sleeps’), and to discover their grammatical structure. Semantics allows us to understand sentences like ‘time flies like an arrow, but fruit flies like a banana’. Pragmatics tells us that ‘yes’ is in general not a very helpful answer to questions of the form ‘do you know …?’.

It should be noted that this distinction between different levels is not as clear-cut as it may seem. For instance, the sentences ‘time flies like an arrow’ and ‘fruit flies like a banana’ look very similar; yet, semantic analysis shows that they have a different grammatical structure: ‘time (noun) flies (verb) like an arrow’ in the first case, and ‘fruit flies (noun) like (verb) a banana (noun phrase)’ in the second. That is, both sentences have at least two possible grammatical structures, and we need semantics to prefer one over the other.

Without doubt, the language level which has been formalised most successfully is the syntactic level. The process of deriving the grammatical structure of a sentence is called parsing. The outcome of the parsing process is a parse tree, showing the grammatical constituents of the sentence, like verb phrase and noun phrase. This grammatical structure can be further used in the semantic analyis of the sentence. The reverse process, starting from a semantic representation and producing a sentence, is called sentence generation. It is applied in dialogue systems, where answers to queries must be formulated in natural language.

### 7.1   Grammars and parsing

The syntax of a language is specified by a grammar, which is a set of grammar rules of the form

Category1 --> Category2,Category3

Category2 --> [Terminal]

Here, CategoryX denotes a syntactic category, specifying the type of a sentence part (e.g. noun, noun phrase, etc.). The first rule states that a Category2 followed by a Category3 is a Category1. For instance, the fact that a sentence may consist of a noun phrase followed by a verb phrase is expressed by the rule

sentence --> noun_phrase,verb_phrase

A terminal is any word which occurs in the language. The second rule above assigns a syntactic category to a word. For instance:

noun --> [bicycle]

Syntactic categories are also called non-terminals.

A grammar which specifies a tiny bit of the English language is given below. As in clausal logic, grammar rules are separated by periods.

```sentence             --> noun_phrase,verb_phrase.
noun_phrase          --> proper_noun.
noun_phrase          --> article,noun.
verb_phrase          --> intransitive_verb.
verb_phrase          --> transitive_verb,noun_phrase.
article              --> [the].
proper_noun          --> [achilles].
noun                 --> [turtle].
intransitive_verb    --> [sleeps].
transitive_verb      --> [beats].
```

Some sentences generated by this grammar are: ‘the lazy turtle sleeps’, ‘Achilles beats the turtle’, and ‘the rapid turtle beats Achilles’. The grammatical structure of these sentences can be described by a parse tree, which is a tree containing the words of the sentence as leaves, and the syntactic categories assigned to parts of the sentence as nodes (fig. 7.1). Figure 7.1. Parse tree for the sentence ‘the rapid turtle beats Achilles’.

Exercise 7.1. Redraw this parse tree in the manner of an SLD proof tree, where ‘resolvents’ are partially parsed sentences such as
[the],[rapid],noun,verb_phrase
and ‘clauses’ are grammar rules.

Such a parse tree can be constructed by starting with the non-terminal sentence, and repeatedly replacing non-terminals by the righthand side of an applicable rule, until the given sentence is obtained as a sequence of terminals. This method is called top-down parsing. Alternatively, we could start with the sentence and look for parts of it which occur on the righthand side of a rule, and replace that part of the sentence with the non-terminal on the lefthand side of the rule, until we obtain the single non-terminal sentence. This procedure is called bottom-up parsing. It should be noted that both methods require search: at any stage, several rules might be applicable.

Exercise 7.2. Draw the search space generated by the above grammar for a top-down parse, if grammar rules are applied to sentences from left to right. Discuss the similarities and differences with SLD-trees.

In general, grammar rules are allowed to be recursive. For instance, a noun phrase can contain several adjectives, as described by the following rules:

```sentence             --> noun_phrase.
noun_phrase          --> article,noun_phrase2.
noun_phrase2         --> noun.
article              --> [the].
noun                 --> [turtle].
```

This set of rules allows ‘the lazy rapid turtle’ as a noun phrase. Recursion extends the descriptive power of a grammar considerably, by allowing repetitive structures.

Grammars like the ones we have seen are called context-free grammars. This name derives from the fact that only one non-terminal is allowed on the left of a grammar rule. A grammar rule which contains several non-terminals on its lefthand side is called context-sensitive: some of those non-terminals act as a context for the others, allowing the rule to be used only when that context is present. As an example, consider a grammar which would rule out sentences like ‘the turtles sleeps’, in which the ‘plurality’ (singular, plural) of noun and verb disagree. A candidate would be:

```noun_phrase 			--> article,noun.
plurality 			--> singular.
plurality			--> plural.
verb_phrase			--> intransitive_verb.
article				--> [the].
noun,singular			--> [turtle],singular.
noun,plural			--> [turtles],plural.
singular,intransitive_verb	--> [sleeps].
plural,intransitive_verb	--> [sleep].
```

In this grammar, the non-terminal plurality creates a context for the applicability of the rewrite rules for noun and intransitive verb. Procedural programming languages like Pascal are also, to some extent, context-sensitive: statements like X:=10 can only be parsed in the context created by the declaration of the variable X (if it is declared to be a Boolean, the statement is illegal). Apart from this, such programming languages are context-free: each statement can be parsed without referring to its context.

Context-sensitive grammars greatly increase the complexity of the parsing task; moreover, the grammatical structure of sentences cannot be simply described by a parse tree. In this chapter, we will restrict attention to context-free grammars, extended with some Prolog-specific features. The resulting grammars are called Definite Clause Grammars, and will be introduced in the next section.

### 7.2   Definite Clause Grammars

If we want to build a parser in Prolog, we need a representation for sentences. Ignoring capitals and punctuation marks, a sentence can simply be represented by the list of its words in the same order, for instance

[the,rapid,turtle,beats,achilles]

Given this representation, a grammar rule like

sentence --> noun_phrase,verb_phrase

has the following meaning: a list of words representes a sentence, if some first part of it represents a noun phrase, and the rest represents a verb phrase. This statement can easily be expressed as a Prolog clause:

sentence(S):-
noun_phrase(NP),
verb_phrase(VP),
append(NP,VP,S)

Similarly, a grammar rule containing a terminal

verb --> [sleeps]

means: a list of words represents a verb if it is the list consisting of the single word ‘sleeps’. Translated to Prolog:

verb([sleeps])

Obviously, there is a very close relationship between context-free grammar rules and definite clauses, and any context-free grammar can easily be translated to a set of Prolog clauses. The exciting thing about this is that these Prolog clauses are nothing less than a parsing program: for instance, we could ask the query

?-sentence([the,rapid,turtle,beats,achilles]). Figure 7.2. The use of difference lists in grammar rules.

We can actually push the correspondence between grammar rules and definite clauses further by employing difference lists (section 3.6). This allows us to get rid of the append literals:

sentence(NP1-VP2):-
noun_phrase(NP1-VP1),
verb_phrase(VP1-VP2)

This clause should be read as follows: NP1 is a sentence followed by VP2, if NP1 is a noun phrase followed by VP1, and VP1 is a verb phrase followed by VP2 (fig. 7.2). Queries now should take the form

?-sentence([the,rapid,turtle,beats,achilles]-[])

(after parsing the initial part of the list as a sentence, nothing should be left).

We have shown that there is a one-to-one correspondence between context-free grammars and Prolog programs interpreting those grammars. In fact, the translation from the first to the second is so straightforward that it is built into Prolog. That is, meta-level grammar rules like

sentence --> noun_phrase,verb_phrase

are allowed in Prolog programs. When interpreting these rules, Prolog will invisibly convert them to object-level program clauses like

sentence(L,L0):-
noun_phrase(L,L1),
verb_phrase(L1,L0)

in which the additional variable is an accumulator rather than the minus list of a difference list (section 3.6). Furthermore, Prolog provides the meta-level predicate phrase/2, such that the object-level query ?-sentence(L,[]) can be replaced by the meta-level query ?-phrase(sentence,L) (fig. 7.3). Figure 7.3. Meta-level and object-level in Definite Clause Grammars.

These Prolog grammars are known as Definite Clause Grammars (DCG’s). They are an excellent illustration of the power of declarative programming: specifying a grammar gives you the parser for free. That is, a grammar is a declarative specification of the corresponding parser, and Prolog directly converts this specification into an executable parser. Moreover, since a grammar is purely declarative, the program is also a sentence generator: for instance, it is possible to generate every sentence starting with ‘Achilles’ by means of the query ?-phrase(sentence,[achilles|Rest]).

Definite Clause Grammars further extend the power of context-free grammars in two ways:

(i)   arguments can be added to non-terminals;

(ii)  Prolog goals can be added to the body of grammar rules.

As an illustration of the first feature, we show how plurality agreement can be achieved by means of a DCG instead of a context-sensitive grammar:

```sentence                      --> noun_phrase(N),verb_phrase(N).
noun_phrase(N)                --> article(N),noun(N).
verb_phrase(N)                --> intransitive_verb(N).
article(singular)             --> [a].
article(singular)             --> [the].
article(plural)               --> [the].
noun(singular)                --> [turtle].
noun(plural)                  --> [turtles].
intransitive_verb(singular)   --> [sleeps].
intransitive_verb(plural)     --> [sleep].
```

The first rule states that the pluralities of noun phrase and verb phrase should correspond. The second rule states that the plurality of a noun phrase is determined by both article and noun, which should have corresponding pluralities as well. The remaining rules assign pluralities to specific articles, nouns and verbs.

We can also use this feature to construct a parse tree while parsing a sentence. Parse trees can be represented by Prolog terms (section 4.1):

• a parse tree for a terminal T of syntactic category S is represented by the term S(T);

• a parse tree for a sequence N1Nk of non-terminals of syntactic category S is represented by the term S(N1,,Nk).

Thus, a parse tree for the verb ‘sleeps’ is represented by the term verb(sleeps), and a parse tree for the sentence ‘the turtle sleeps’ is represented by the term

s(np(art(the),n(turtle)),vp(iv(sleeps)))

(for brevity, syntactic categories are abbreviated). The following grammar indicates how parse trees are built up from their constituents.

```%:-use_rendering(svgtree).	% uncomment to render parse tree graphically

sentence(s(NP,VP))            --> noun_phrase(NP),
verb_phrase(VP).
noun_phrase(np(N))            --> proper_noun(N).
noun(N).
noun_phrase(np(Art,N))        --> article(Art),noun(N).
verb_phrase(vp(IV))           --> intransitive_verb(IV).
verb_phrase(vp(TV,NP))        --> transitive_verb(TV),
noun_phrase(NP).
article(art(the))             --> [the].
proper_noun(pn(achilles))     --> [achilles].
noun(n(turtle))               --> [turtle].
intransitive_verb(iv(sleeps)) --> [sleeps].
transitive_verb(tv(beats))    --> [beats].
```

In the query, the argument of the non-terminal sentence will be instantiated to the final parse tree:

?-phrase(sentence(T),[achilles,beats,the,lazy,turtle]).
T = s(np(pn(achilles)),
vp(tv(beats),
np(art(the),
n(turtle))))

If we use the predicate term_write/1 given in section 4.1, a nice tree-like output is obtained:

?-phrase(sentence(T),[achilles,beats,the,lazy,turtle]),
term_write(T).

---------s--------np--------pn--achilles
--------vp--------tv-----beats
--------np-------art-------the
---------n----turtle

These examples show one way to use arguments of non-terminals: to collect information coming out of the parsing process. In addition, we might want to express that arguments of different non-terminals in a rule are related in some way. To this end, we can add Prolog goals to the body of grammar rules, by enclosing them in curly brackets {}. For instance, suppose we have a grammar for English numerals like ‘one hundred twenty three’, and we want to calculate the number represented by such numerals during parsing. We could write the following DCG:

```numeral(N) --> n1_999(N).
numeral(N) --> n1_9(N1),[thousand],n1_999(N2),{N is N1*1000+N2}.

n1_999(N)  --> n1_99(N).
n1_999(N)  --> n1_9(N1),[hundred],n1_99(N2),{N is N1*100+N2}.

n1_99(N)   --> n0_9(N).
n1_99(N)   --> n10_19(N).
n1_99(N)   --> n20_90(N).
n1_99(N)   --> n20_90(N1),n1_9(N2),{N is N1+N2}.

n0_9(0)    --> [].
n0_9(N)    --> n1_9(N).

n1_9(1)    --> [one].
n1_9(2)    --> [two].
n1_9(3)    --> [three].
n1_9(4)    --> [four].
n1_9(5)    --> [five].
n1_9(6)    --> [six].
n1_9(7)    --> [seven].
n1_9(8)    --> [eight].
n1_9(9)    --> [nine].

n10_19(10) --> [ten].
n10_19(11) --> [eleven].
n10_19(12) --> [twelve].
n10_19(13) --> [thirteen].
n10_19(14) --> [fourteen].
n10_19(15) --> [fifteen].
n10_19(16) --> [sixteen].
n10_19(17) --> [seventeen].
n10_19(18) --> [eighteen].
n10_19(19) --> [nineteen].

n20_90(20) --> [twenty].
n20_90(30) --> [thirty].
n20_90(40) --> [fourty].
n20_90(50) --> [fifty].
n20_90(60) --> [sixty].
n20_90(70) --> [seventy].
n20_90(80) --> [eighty].
n20_90(90) --> [ninety].
```

We could use this DCG for parsing a given numeral, but also for generating the numeral corresponding to a given number:

```?- phrase(numeral(2211),N).
```

N = [two,thousand,two,hundred,eleven]

Exercise 7.3. Write a DCG that parses time indications like ‘twenty minutes to four’, and converts them to terms like 3:40.

In this section, we have seen that writing parsers in Prolog is easy: just write the context-free grammar, possibly extended by arguments to non-terminals and Prolog goals in the body of grammar rules, and you have a program for parsing and sentence generation. However, parsing is not an end in itself: we want to assign an interpretation to a sentence. This is the topic of the following section.

### 7.3   Interpretation of natural language

Suppose we want to build a rulebase consisting of rules like ‘every human is mortal’ and ‘Socrates is a human’. A small grammar for rules of this form is given below.

```sentence      --> determiner,noun,verb_phrase.
sentence      --> proper_noun,verb_phrase.
verb_phrase   --> [is],property.
property      --> [a],noun.
property      --> [mortal].
determiner    --> [every].
proper_noun   --> [socrates].
noun          --> [human].
```

If the rulebase consists of Prolog clauses, then we need a way to convert natural language rules to clauses. For instance, ‘every man is human’ must be translated to the clause human(X):-man(X). The clause represents the meaning of the sentence, and assigning clauses to sentences can be seen as interpreting the sentences.

We will build such an interpreter by extending each non-terminal in the above grammar with one or more arguments, which give the meaning of that non-terminal. We start with the simplest case: the meaning of the proper noun ‘Socrates’ is the term socrates:

proper_noun(socrates) --> [socrates]

Proper nouns occur in the second rule for sentences:

sentence --> proper_noun,verb_phrase

which can be used to construct the sentence ‘Socrates is a human’. The meaning of this sentence is the clause human(socrates):-true, which can be constructed as follows:

sentence((P(X):-true)) --> proper_noun(X),verb_phrase(P)

This rule states: P(X):-true is the meaning of a sentence if it is composed of a proper noun with meaning X followed by a verb phrase with meaning P.

However, there are several problems with this grammar rule. For one thing, not every Prolog interpreter allows a variable in functor position, as in P(X). This could be solved by constructing the literal P(X) separately by means of a Prolog goal:

sentence((L:-true)) --> proper_noun(X),verb_phrase(P),
{L=..[P,X]}

A more serious problem, however, is that verb phrases are not necessarily interpreted as unary predicates. For instance, transitive verbs are interpreted as binary predicates, and the meaning of the verb phrase ‘likes Achilles’ is the literal likes(X,achilles), where X is the meaning of the proper noun preceding the verb phrase.

In general, a verb phrase defines a mapping from a term X to a literal L:

sentence((L:-true)) --> proper_noun(X),verb_phrase(X=>L)

The declarative reading of this rule is: a sentence is interpreted as L:-true if it starts with a proper noun with meaning X, and it ends with a verb phrase whose meaning is applied to X to yield L. The meaning of the verb phrase is a mapping from terms to literals indicated as X=>L, where ‘ => ’ is a user-defined operator. In our case, the mapping is determined by the property in the verb phrase:

verb_phrase(M)               --> [is],property(M).
property(M)                  --> [a],noun(M).
property(X=>mortal(X))       --> [mortal].
noun(X=>human(X))            --> [human].

For instance, the declarative reading of the last rule is: the meaning of the noun ‘human’ is a mapping from X to human(X).

Exercise 7.4. Extend the following grammar rules with arguments expressing their interpretation:
verb_phrase     --> transitive_verb,proper_noun.
transitive_verb --> [likes].

It remains to consider the first rule for sentences:

sentence     --> determiner,noun,verb_phrase

which constructs sentences like ‘every human is mortal’. As explained above, the meaning of the noun in this sentence is the mapping from X to human(X), and the meaning of the verb phrase is the mapping from X to mortal(X). These two mappings are ‘glued together’ by the non-terminal determiner:

```sentence(C)   			--> 	determiner(M1,M2,C), noun(M1),verb_phrase(M2).
determiner(X=>B,X=>H,(H:-B))  	--> 	[every].
```

One could say that the meaning of the determiner ‘every’ is a second-order mapping which, given the mappings defined by the noun and verb phrase, determines a clause. Note that the noun determines the body literal, while the verb phrase determines the head; note also that the variables in the two literals are unified in the determiner rule.

With this DCG, the query ?-phrase(sentence(C),S) now produces the following answers:

C = human(X):-human(X)
S = [every,human,is,a,human];
C = mortal(X):-human(X)
S = [every,human,is,mortal];
C = human(socrates):-true
S = [socrates,is,a,human];
C = mortal(socrates):-true
S = [socrates,is,mortal]

Note that this very simple language already allows some form of reasoning: for instance, given the second and third sentence, we could conclude the fourth. We will implement a program which performs this kind of reasoning, taking sentences and questions in natural language, converting them to clausal logic, and converting the answers back to natural language. In order to make the program a bit more interesting, we will extend the grammar with existentially quantified sentences.

Consider the sentence ‘some living beings are mortal’, where ‘some’ is a determiner. The meaning of this sentence is ‘some things are living beings, and they are mortal’, which can be expressed by two clauses:

living_being(sk):-true
mortal(sk):-true.

where sk is a Skolem constant introducing a new name for the things known to exist (see section 2.5). The two head literals in these clauses are determined by the noun and the verb phrase, and the only thing we have to do is to substitute the Skolem constant and add the empty body:

determiner(sk=>H1,sk=>H2,[(H1:-true),(H2:-true)])--> [some]

The complete DCG is given below. Since the determiner ‘some’ requires a plural form of noun and verb phrase, an argument for plurality (s for singular, p for plural) has been added to each non-terminal. Furthermore, since the determiner ‘some’ results in a list of clauses, the other rules for determiner and sentence have been changed accordingly.

```:-op(600,xfy,'=>').
sentence(C)                   --> determiner(N,M1,M2,C), noun(N,M1), verb_phrase(N,M2).
sentence([(L:-true)])         --> proper_noun(N,X), verb_phrase(N,X=>L).
verb_phrase(s,M)              --> [is],property(s,M).
verb_phrase(p,M)              --> [are],property(p,M).
property(s,M)                 --> [a],noun(s,M).
property(p,M)                 --> noun(p,M).
property(_N,X=>mortal(X))      --> [mortal].
determiner(s,X=>B,X=>H,[(H:-B)]) --> [every].
determiner(p,sk=>H1,sk=>H2,[(H1:-true),(H2:-true)]) --> [some].
proper_noun(s,socrates)       --> [socrates].
noun(s,X=>human(X))           --> [human].
noun(p,X=>human(X))           --> [humans].
noun(s,X=>living_being(X))    --> [living],[being].
noun(p,X=>living_being(X))    --> [living],[beings].
```

In addition, we give a small grammar for allowable questions, which are of the form ‘who is mortal?’, ‘is Socrates mortal?’, and ‘are some living beings mortal?’:

```question(Q)          --> [who],[is],property(s,_X=>Q).
question(Q)          --> [is],proper_noun(N,X),
property(N,X=>Q).
question((Q1,Q2))    --> [are],[some],noun(p,sk=>Q1),
property(p,sk=>Q2).
```

The program below is a shell for interactively building up and querying a small rulebase. User inputs are handled by the predicate handle_input/2; possible inputs are ‘stop’, ‘show’, a new rule, or a question. For the latter to be answered, we use a simple depth-first meta-interpreter, which possibly instantiates variables in the query. For instance, the question ‘who is mortal’ is interpreted as the goal mortal(X), which is instantiated by the meta-interpreter to mortal(socrates).

Interestingly, for transforming this answer back to natural language we do not need a separate grammar for answers: we can use the existing grammar for sentences! For instance, we can generate the answer ‘Socrates is mortal’ by means of the query

Therefore, the only thing we have to do after the meta-interpreter has found an answer is to transform the instantiated query (a conjunction of literals) to a list of clauses with empty body (see predicate transform/2). Again, we encounter the declarative power of DCG’s, which can at the same time be used for interpreting natural language sentences, and for constructing sentences that express a certain logical meaning.

```% natural language shell
nl_shell(Rulebase):- get_input(Input),handle_input(Input,Rulebase).

% handle input from user
handle_input(stop,_Rulebase):-!.
handle_input(show,Rulebase):-!,show_rules(Rulebase),nl_shell(Rulebase).
handle_input(Sentence,Rulebase):-phrase(sentence(Rule),Sentence),!,nl_shell([Rule|Rulebase]).
handle_input(Question,Rulebase):-phrase(question(Query),Question),prove_rb(Query,Rulebase),!,

% show current rulebase
show_rules([]).

% meta-interpreter
prove_rb(true,_Rulebase):-!.
prove_rb((A,B),Rulebase):-!,prove_rb(A,Rulebase),prove_rb(B,Rulebase).
prove_rb(A,Rulebase):-find_clause((A:-B),Rulebase),prove_rb(B,Rulebase).

% find applicable clause in rulebase
find_clause(Clause,[Rule|_Rules]):- copy_element(Clause,Rule).
find_clause(Clause,[_Rule|Rules]):- find_clause(Clause,Rules).

copy_element(X,Ys):-element(X1,Ys),copy_term(X1,X).

% element(X,Ys) <- X is an element of the list Ys
element(X,[X|_Ys]).
element(X,[_Y|Ys]):-element(X,Ys).

transform((A,B),[(A:-true)|Rest]):-!,transform(B,Rest).
transform(A,[(A:-true)]).

% get input from user

```

A conversation with this program might proceed as follows (following ? is user input, following ! is program output):

? [every,human,is,mortal].
? [socrates,is,a,human].
? [who,is,mortal].
! [socrates,is,mortal]
? [some,living,beings,are,humans].
? show.
! [some,living,beings,are,humans]
! [socrates,is,a,human]
! [every,human,is,mortal]
? [are,some,living,beings,mortal].
! [some,living,beings,are,mortal]
? stop.

Exercise 7.5. The predicates for user-interaction nl_shell/1 and handle_input/2 are mutually recursive. This might cause memory
problems in longer sessions. Rewrite the interactive loop into a
so-called failure-driven loop:
shell:-repeat,get_input(X),handle_input(X).
handle_input(stop):-!.
handle_input(X):- /* do something */,fail.
handle_input/1
is now a predicate which always fails, unless the loop should be terminated. Upon its failure, the first clause will backtrack to repeat, which is a built-in predicate which succeeds an indefinite number of times. Thus, get_input/1 will again be called.
(NB. Since it is impossible to pass arguments on to the next iteration, the changes to the rulebase have to be made through side-effects, i.e. by means of assert/1 and retract/1.)

(Pereira & Warren, 1980) contains a detailed discussion of the DCG formalism. More Prolog programs for natural language processing can be found in (Gazdar & Mellish, 1989) and (Pereira & Shieber, 1987).

G. Gazdar & C. Mellish ( 1989) , Natural Language Processing in Prolog, Addison-Wesley.

F.C.N. Pereira & D.H.D. Warren ( 1980) , ‘Definite Clause Grammars for language analysis: a survey of the formalism and a comparison with Augmented Transition Networks’, Artificial Intelligence 13: 231-278.

F.C.N. Pereira & S.M. Shieber ( 1987) , Prolog and Natural-language Analysis, Center for the Study of Language and Information, Menlo Park, CA.

8

## Reasoning with incomplete information

In everyday life, we use a surprising number of different reasoning methods, exemplified by the following arguments:

— ‘It is getting dark already, it must be after five.’

— ‘If I push this button, the light in my room will switch on.’

— ‘The light doesn’t switch on!? The lightbulb must be broken!’

The first argument is based on general knowledge about the regular hours of sunset. This knowledge is reached after numerous observations, and it is embedded in a theory about the movements of the heavenly bodies. We are pretty confident that this theory is true; that is, it accurately describes the actual state of affairs. This justifies its use to predict events in the future. However, it should be noted that we can never be absolutely sure that the theory is true: tomorrow the sun may set at eleven in the morning, falsifying our theory. The theory is reached by induction: given a number of distinct but similar observations, conclude that they are governed by a general law. Induction is an important reasoning method in the natural sciences, and it also underlies some forms of learning, like learning from examples. Despite this common usage, it is surprisingly hard to formalise inductive reasoning: in particular, what it takes to justify an inductive hypothesis remains a largely unresolved question.

The second argument above seems perfectly alright, given knowledge about how the switch is connected to the lightbulb and the power supply. However, this argument requires a lot of implicit assumptions: the switch is not broken, the connections are in order, the lightbulb is not broken, there is a supply of power, and so on. The argument is not in general true, but it describes the normal case; there might be some exceptional circumstance, invalidating the argument. Typically, we assume things to be normal, unless there is evidence to the contrary. We call this default reasoning.

In the third argument, we give an explanation for the observation that the light doesn’t switch on. It is a sort of reversed implication: we know that if the lightbulb is broken, the light won’t switch on; we observe that the light doesn’t work, so we conclude that the lightbulb must be broken. This is but one of several possible explanations, however: the switch might be broken, or the power supply might be down. This process of finding explanations for observed facts is called abduction.

The common characteristic of these three types of reasoning is that their conclusions, however plausible they may seem, are not guaranteed to be true in the intended interpretation, because the information we have is incomplete. In default reasoning, the conclusion might be false because the state of affairs is not so normal as it is assumed to be. In abduction, there might be several alternative explanations, and we do not know which one to choose. In induction, we typically base our conclusion on only a fraction of the possible observations. Thus, the general rule (e.g. all swans are white) might be invalidated by the next observation (a black swan).

In other words, such common-sense arguments are unsound. Recall that an inference rule is sound if the truth of its conclusion is guaranteed by the truth of its premises. Sound reasoning is also called deduction; it is the only allowed form of reasoning in fields where rigorous arguments are needed, like mathematics. However, deductive conclusions only make explicit what is already implicitly present in the premises (e.g. the mathematical axioms, or a logic program). In everyday reasoning we often want to reach conclusions which contain new information, information that is not present in the premises. In this chapter, we will take a closer look at various forms of reasoning with incomplete information, such as default reasoning, abduction, and diagnostic reasoning. Inductive reasoning is a subject which deserves a chapter of its own (Chapter 9).

### 8.1   Default reasoning

Consider the following argument:

‘Tweety is a bird.’

‘Normally, birds fly.’

‘Therefore, Tweety flies.’

There are several ways to translate this argument into logic. One is to read the second statement as ‘normal birds fly’, such that the following clauses represent the premises of the argument:

bird(tweety).
flies(X):-bird(X),normal(X).

Can we draw the conclusion that Tweety flies? There are three models:

{ bird(tweety) }
{ bird(tweety) , flies(tweety) }
{ bird(tweety) , flies(tweety) , normal(tweety) }

In the first two models, Tweety is a bird but not normal; hence, it might or might not fly. In the third model, Tweety is a normal flying bird. Since flies(tweety) is not true in every model, it is not a logical consequence of the program.

If we want to conclude that Tweety flies, we must explicitly state that Tweety is a normal bird, thus ruling out the first two of the above models. However, in default reasoning we do not want to say that a case is normal: rather, we assume a case to be normal, unless it is known to be abormal. Therefore, it is more natural to use a predicate abnormal/1 representing the negation of normal/1. Adding abnormal(X) to the head of the clause leads to the indefinite clause

flies(X);abnormal(X):-bird(X)

As has already been indicated in section 2.4, such indefinite clauses can be transformed into ‘pseudo-definite’ or general clause s by moving all but one of the positive literals to the body of the clause, preceded by the negation symbol not. This results in the following program:

bird(tweety).
flies(X):-bird(X),not abnormal(X).

Since general clauses extend the language of definite clauses, we must extend both proof theory and semantics to deal with the negation symbol not. A practical way to do this has been discussed in section 3.3, where we treated not/1 as a Prolog meta-predicate, implemented by means of cut. Under this interpretation, we can prove that Tweety flies
(fig. 8.1). Figure 8.1. Tweety flies by negation as failure.

What happens if we learn that Tweety is an ostrich, and that ostriches are non-flying birds? We should add a clause which says that ostriches are abnormal (when it comes to flying):

bird(tweety).
ostrich(tweety).
flies(X):-bird(X),not abnormal(X).
abnormal(X):-ostrich(X).

As the SLD-tree in fig. 8.2 shows, Prolog is now unable to prove that Tweety flies, since Tweety is provably abnormal. We say that the default rule ‘normally birds fly’ is cancelled by a more specific rule (about ostriches). Figure 8.2. Tweety doesn’t fly, since it is an ostrich.

Exercise 8.1. Give the models of this program (interpreting the general clause as the corresponding indefinite clause). Which one is the intended model (see section 2.4)?

This example shows that in default reasoning, new information can invalidate previous conclusions, if these conclusions are based on unprovable assumptions which are contradicted by the new information. This property clearly distinguishes default reasoning from deductive reasoning, which is monotonic in the following sense:

TheoryConclusion       ⇒      Theory { AnyFormula }⊢ Conclusion

That is, adding AnyFormula to a set of formulas Theory does not invalidate any Conclusion drawn from Theory alone. If we define the deductive closure of a theory as the set of conclusions that can be drawn from it:

Closure (Theory) = { Conclusion | TheoryConclusion }

then the property of monotonicity can also be stated as a relation between theories and their closures:

Theory1 Theory2        ⇒      Closure (Theory1) Closure (Theory2)

This formulation clearly demonstrates the use of the term ‘monotonic’. Since default reasoning lacks this property, it is often called non-monotonic reasoning.

Although Prolog’s not/1 meta-predicate can handle default arguments such as the above, there are a couple of problems. First of all, as has been shown in section 3.3, the implementation of not/1 by means of cut may misbehave if the goal to be negated contains variables. The second problem is that, since cut is a procedural feature without declarative semantics, we likewise have no declarative semantics for not implemented by means of cut. Thus, even if we avoid the first problem by a clever re-ordering of literals in a clause, we do not know what we are computing! This problem will be addressed in the next section.

An alternative to handling possible exceptions to rules via negation as failure, is to distinguish between two possible types of rules, those with exceptions, and those without exceptions. For instance, the rule ‘penguins are birds’ is a rule without exceptions, whereas the rule ‘birds fly’ is a rule with exceptions. Let us call a rule with exceptions a default rule, or simply a default. Rules and defaults are then treated differently when trying to prove something: a rule is applied whenever possible, while a default is applied only when it does not lead to an inconsistency. So, if we only know that Tweety is a bird, the default ‘birds fly’ can be used to conclude that Tweety flies, but if we also know that Tweety is a penguin and that penguins don’t fly, the default cannot be applied. Thus, instead of expressing our knowledge as a general program and using Prolog to derive conclusions, we will extend the syntax of clausal logic to distinguish between defaults and rules. We will develop a meta-interpreter which implements the inference rules for this extended logic.

The Tweety example can be expressed in terms of rules and defaults as follows.

default((flies(X):-bird(X))).
rule((not flies(X):-penguin(X))).
rule((bird(X):-penguin(X))).
rule((penguin(tweety):-true)).
rule((bird(opus):-true)).

In order to explain why Opus flies but Tweety doesn’t, we use two meta-interpreters. One is the familiar prove meta-interpreter for definite clauses, extended with two arguments to collect the rules used in the proof. The other meta-interpreter applies a default whenever it does not lead to a contradiction.

```:-op(900,fy,not).

% explain(F,E) <- E explains F from rules and defaults
explain(F,E):-
explain(F,[],E).

% meta-interpreter for rules and defaults
explain(true,E,E):-!.
explain((A,B),E0,E):-!,
explain(A,E0,E1),
explain(B,E1,E).
explain(A,E0,E):-
prove_e(A,E0,E).         % explain by rules only
explain(A,E0,[default((A:-B))|E]):-
default((A:-B)),
explain(B,E0,E),
not contradiction(A,E).  % A consistent with E

% meta-interpreter for rules
prove_e(true,E,E):-!.
prove_e((A,B),E0,E):-!,
prove_e(A,E0,E1),
prove_e(B,E1,E).
prove_e(A,E0,[rule((A:-B))|E]):-
rule((A:-B)),
prove_e(B,E0,E).

prove_e(A,E,_E1).
prove_e(not A,E,_E1).
```

The query ?-explain(flies(X),E) has only one answer:

X = opus
E = [ default((flies(opus):-bird(opus))),
rule((bird(opus):-true)) ]

Tweety does not fly, since not flies(tweety) is provable from the rules:

?-explain(not flies(X), E)
X = tweety
E = [ rule((not flies(tweety):-penguin(tweety))),
rule((penguin(tweety):-true)) ]

Sometimes, both a fact and its negation can be explained. Consider the following set of defaults and rules:

```default((not flies(X):-mammal(X))).
default((flies(X):-bat(X))).
rule((mammal(X):-bat(X))).
rule((bat(dracula):-true)).
```

Does Dracula fly or not? One explanation claims he does, because he is a bat, and bats typically fly:

?-explain(flies(dracula),E)
E = [ default((flies(dracula):-bat(dracula))),
rule((bat(dracula):-true)) ]

However, there are also two explanations stating that Dracula doesn’t fly; after all, he’s not only a mammal, and mammals typically don’t fly, but he’s also dead, and dead things typically don’t fly either:

?-explain(not flies(dracula), E)
E = [ default((not flies(dracula):-mammal(dracula))),
rule((mammal(dracula):-bat(dracula))),
rule((bat(dracula):-true)) ];

It seems that only the third of these explanations is acceptable. Thus, we need a way to cancel particular defaults in certain situations.

This can be done by attaching names to defaults, which are parametrised with the variables in the default. Then, we can refer to a default in the conclusion of a rule:

% default(Name,Rule)
default(mammals_dont_fly(X),(not flies(X):-mammal(X))).
default(bats_fly(X),(flies(X):-bat(X))).
rule((mammal(X):-bat(X))).
rule((bat(dracula):-true)).
% bats are flying mammals
rule((not mammals_dont_fly(X):-bat(X))).

We change the fourth clause of the explain/3 predicate accordingly:

```explain(A,E0,[default(Name)|E]):-
default(Name,(A:-B)),       % explain by default rule
explain(B,E0,E),
not contradiction(A,E).     % A consistent with E
```

There are two changes: (i) when applying a default, its name is tested for consistency with the rules, and (ii) the name of the default is added to the explanation, instead of the default itself. The above queries are now handled correctly:

?-explain(flies(dracula),E)
No.

?-explain(not flies(dracula), E)
No more solutions.

We thus see that it is the programmer’s responsibility to avoid inconsistencies by specifying appropriate cancellation rules.

### 8.2   The semantics of incomplete information

In this section, we present a way to interpret not as a logical symbol rather than a meta-predicate. In this way, it can be assigned a declarative semantics of its own, without reference to procedural features like cut. The basic idea is to transform the given program into an intended (possibly indefinite) program, which explicitly captures the intended meaning of the original general program. We will see that the intended program is complete, in the sense that for every ground fact in the Herbrand base, either that fact or its negation is a logical consequence of the intended program. Consequently, the intended program will have exactly one model, which is taken to be the intended model of the original program. We will discuss two methods to construct a complete program. The first, simple method is called the Closed World Assumption; it is simple in the sense that it only works for definite clauses without negation. The second method is called Predicate Completion; it can handle general programs with negated literals in the body of clauses.

Informally, the Closed World Assumption (CWA) states that everything that is not known to be true, must be false. Under the CWA, we need not say that something is not true: we simply say nothing about it. This is motivated by the assumption that, in general, there are many more false statements that can be made than true statements. Let us state the CWA more precisely. It suffices to know the truth or falsity of every ground atom in the Herbrand base, since this results in a single model from which the truth or falsity of any clause can be determined. Saying that such a ground atom A is false, is the same as saying that :-A is true. Thus, if P is a program and B is its Herbrand base, then we define the CWA-closure CWA (P) of P as

CWA (P) = P { :-A | A B and PA }

We refer to CWA (P) - P as the CWA-complement of P. CWA (P) is the intended program according to the Closed World Assumption.

For instance, if P is the program

likes(peter,S):-student_of(S,peter).
student_of(paul,peter).

then the ground atoms which are logical consequences of P are likes(peter,paul) and student_of(paul,peter).

Exercise 8.2. Give the models of P.

The remaining ground atoms in the Herbrand base are not known to be true, and we add their negation to obtain CWA (P):

likes(peter,S):-student_of(S,peter).
student_of(paul,peter).
:-student_of(paul,paul).
:-student_of(peter,paul).
:-student_of(peter,peter).
:-likes(paul,paul).
:-likes(paul,peter).
:-likes(peter,peter).

Note that CWA (P) has only one model:

{ student_of(paul,peter), likes(peter,paul) }

That is, CWA (P) is a complete program, assigning true or false to every ground atom in the Herbrand base. While our original program had several, alternative models, the extended program has exactly one model. This model is then declared to be the intended model of the original program.

If we add the clause C = likes(paul,X) to P, we find that CWA (P { C }) is

likes(peter,S):-student_of(S,peter).
student_of(paul,peter).
likes(paul,X).
:-student_of(paul,paul).
:-student_of(peter,paul).
:-student_of(peter,peter).
:-likes(peter,peter).

This example shows that extending the set of clauses results in a smaller CWA-complement, just as we would expect from a non-monotonic form of reasoning.

The CWA is limited to definite clauses: if it is applied to indefinite clauses, the resulting CWA-closure will be inconsistent. For instance, let P be

bird(tweety).
flies(X);abnormal(X):-bird(X).

then the Herbrand base is

{bird(tweety), abnormal(tweety), flies(tweety) }

of which only the first ground atom follows logically from P. Thus, CWA (P) is

bird(tweety).
flies(X);abnormal(X):-bird(X).
:-flies(tweety)
:-abnormal(tweety)

which is inconsistent: it does not have a model, since the first two clauses require that at least one of abnormal(tweety), flies(tweety) is true. Since the Closed World Assumption is unable to handle indefinite clauses, it is equally unable to handle general clauses with negated literals in the body. The CWA originates from the field of databases, where all information is stored in the form of ground atoms, so that indefinite (disjunctive) information does not occur.

A more sophisticated way to construct complete programs is called Predicate Completion. The basic idea of Predicate Completion is to view each clause as part of the definition of a specific predicate. For instance, a clause like

likes(peter,S):-student_of(S,peter)

is seen as part of the definition of the likes predicate. Such a clause gives values for X and Y in which likes(X,Y) is true. In other words, it belongs to the if part of the definition: ‘ X likes Y if …’. This definition can be completed by adding the only-if parts, resulting in a full definition: ‘ X likes Y if and only if …’. Such a full definition is most easily expressed in Predicate Logic. For instance, the above clause could be completed to the following full definition:

X S:likes(X,S) X=peter student_of(S,peter)

In words: ‘ X likes S if and only if X is Peter, and S is a student of Peter’, that is, Peter is the only one who likes people, and the people Peter likes are his students, and nobody else. We can translate this formula back to clausal form (see section 2.5), which yields a set of clauses

likes(peter,S):-student_of(S,peter).
X=peter:-likes(X,S).
student_of(S,peter):-likes(X,S).

The first clause was originally given; the other two are added by the Completion process.

In general, the procedure for completing a predicate definition consists of the following steps (a Prolog program which performs Predicate Completion is given in Appendix B.2):

(1)  make sure that every argument of the predicate in the head of each clause is a distinct variable, by adding literals of the form Var=Term to the body;

(2)  if there are several clauses, combine them into a single formula with a disjunctive body (this is possible since after step (1) each clause has the same head);

(3)  turn the implication in this formula into an equivalence.

Step (3) is the actual Completion step; the first two steps are preparatory.

As an example, consider the following set of clauses:

likes(peter,S):-student_of(S,peter).
likes(X,Y):-friend(Y,X).

The first step results in the clauses

likes(X,S):-X=peter,student_of(S,peter).
likes(X,Y):-friend(Y,X).

In the second step, these clauses are combined into a single formula in Predicate Logic:

X Y:likes(X,Y) ((X=peter student_of(Y,peter)) friend(Y,X))

This is a formula which is logically equivalent with the original set of clauses . The Completion step is done by turning the implication into an equivalence.

Care should be taken if one of the original clauses contains variables in the body which do not occur in the head, for example

ancestor(X,Y):-parent(X,Y).
ancestor(X,Y):-parent(X,Z),ancestor(Z,Y).

Here, the second clause is equivalent with the formula

X Y Z:ancestor(X,Y) (parent(X,Z) ancestor(Z,Y))

but also with the formula

X Y:ancestor(X,Y) ( Z:parent(X,Z) ancestor(Z,Y))

For this reason, variables which occur in the body of a clause but not in the head are often called existential variables. When performing Predicate Completion we must use the second formula, with explicit existential quantification in the body, because we want all clauses to have exactly the same head. The two original clauses are thus converted to

X Y:ancestor(X,Y) (parent(X,Y) ( Z:parent(X,Z) ancestor(Z,Y)))

A program P consisting of several predicate definitions is completed by completing each predicate definition separately; for those predicates P(X1,,Xn) which occur in the body of clauses but are themselves not defined, a clause :‑P(X1,,Xn) is added. The resulting set of clauses is denoted Comp (P). For instance, if P is

likes(peter,S):-student_of(S,peter).
student_of(paul,peter).

then Comp (P) is

likes(peter,S):-student_of(S,peter).
X=peter:-likes(X,S).
student_of(S,peter):-likes(X,S).
student_of(paul,peter).
X=paul:-student_of(X,Y).
Y=peter:-student_of(X,Y).

It is easily checked that the completed program has only one model:

{student_of(paul,peter), likes(peter,paul) }

and is thus complete. As we saw earlier, this is also the single model of CWA (P), which means that, in this case, Comp (P) and CWA (P) are logically equivalent. This is true in general, provided P is a set of definite clauses.

Predicate Completion extends the Closed World Assumption by also being able to handle programs containing general clauses, like

bird(tweety).
flies(X):-bird(X),not abnormal(X).

Predicate Completion produces the following formulas:

X:bird(X) X=tweety
X:flies(X) (bird(X) ¬abnormal(X))
X:¬abnormal(X)

In words: Tweety is the only bird, something flies if and only if it is a bird which is not abnormal, and there are no abnormal birds. The last formula is added because there is no predicate definition for abnormal. The only model of this set of formulas is

{bird(tweety), flies(tweety)}

However, there are also general clauses which Predicate Completion cannot handle. One such a clause is the following:

friendly(peter):-not friendly(peter)

This clause states that the assumption that Peter is not friendly leads to a contradiction; therefore Peter must be friendly, and friendly(peter) should be a logical consquence of the intended program associated with this clause. Predicate Completion will construct the formula

X: friendly(X) (X=peter ¬friendly(peter))

It is easy to see that this formula is inconsistent.

Admittedly, the above clause is a bit awkward, since it is logically equivalent with

friendly(peter)

However, there are many programs which exhibit the same problem. Basically, the problem is caused by ‘recursion through negation’. For instance, the completion of the following two clauses is also inconsistent:

wise(X):-not teacher(X).
teacher(peter):-wise(peter).

These clauses say ‘anybody who is not a teacher is wise’ and ‘if Peter is wise, he is a teacher’. Assuming that Peter is not a teacher leads to a contradiction; therefore, he must be a teacher (and he may or may not be wise). However, Predicate Completion leads to inconsistencies.

Exercise 8.3. Apply Predicate Completion to this program.

A stratified program is a program without recursion through negation. One can prove that for stratified programs, Predicate Completion never results in inconsistencies.

### 8.3   Abduction and diagnostic reasoning

Abduction extends default reasoning by not only making assumptions about what is false, but also about what is true. For instance, in the lightbulb example given earlier, we know that if the lightbulb is broken, the light doesn’t switch on. If we observe that the light doesn’t switch on, a possible explanation is that the lightbulb is broken. Since this is only one of the possible explanations, it cannot be guaranteed to be true. For instance, there might be a problem with the power supply instead, or the switch might be broken.

The general problem of abduction can be stated as follows. Given a Theory and an Observation, find an Explanation such that

Theory ExplanationObservation

i.e. the Observation follows logically from the Theory extended with the Explanation. For instance, if Theory consists of the following clauses

likes(peter,S):-student_of(S,peter).
likes(X,Y):-friend(Y,X).

and we have the Observation likes(peter,paul), then possible Explanations are { student_of(paul,peter) } and { friend(paul,peter) }.

Other Explanations which satisfy the problem specification are { likes(X,paul) } and { likes(X,Y):-friendly(Y), friendly(paul) }. However, abductive explanations are usually restricted to ground literals with predicates that are undefined in Theory (such literals are called abducibles). Inferring general rules from specific observations is called induction, and is discussed in the next chapter.

Procedurally, we can construct an abductive explanation by trying to prove the Observation from the initial Theory alone: whenever we encounter a literal for which there is no clause to resolve with, we add the literal to the Explanation. This leads to the following abductive meta-interpreter.

```% abduce(O,E) <- observation O follows by SLD-resolution
%                from the theory defined by clause/2,
%                extended with a list of unit clauses E
abduce(O,E) :-
abduce(O,[],E).

% with accumulator for explanations
abduce(true,E,E):-!.
abduce((A,B),E0,E):-!,
abduce(A,E0,E1),
abduce(B,E1,E).
abduce(A,E0,E):-
cl(A,B),	% query clauses enumerated by cl/2
abduce(B,E0,E).
abduce(A,E,E):-
element(A,E).
abduce(A,E,[A|E]):-
not element(A,E),
abducible(A).

abducible(A):-
not cl(A,_B).
```

The last two clauses of abduce/3 extend the original depth-first meta-interpreter. The program uses an accumulator containing the partial explanation found so far, such that literals are not unnecessarily duplicated in the final explanation. The query

?-abduce(likes(peter,paul),Explanation)

Explanation = [student_of(paul,peter)];
Explanation = [friend(paul,peter)]

Interestingly, this abductive meta-interpreter also works for general clauses, but it does not always produce correct explanations. For instance, suppose the initial Theory contains a general clause:

flies(X):-bird(X),not abnormal(X).
abnormal(X):-penguin(X).
bird(X):-penguin(X).
bird(X):-sparrow(X).

If asked to explain flies(tweety), the above program will try to find a clause explaining not(abnormal(tweety)); since there is no such clause, this negated literal will be added to the explanation. As a result, the program will give the following explanations:

Explanation = [not abnormal(tweety),penguin(tweety)];
Explanation = [not abnormal(tweety),sparrow(tweety)]

There are two problems with these explanations. First of all, the first explanation is inconsistent with the theory. Secondly, abnormal/1 is not an abducible predicate, and should not appear in an abductive explanation. For these reasons, we have to deal explicitly with negated literals in our abduction program.

As a first try, we can extend our abductive meta-interpreter with negation as failure, by adding the following clause (see also section 3.8):

abduce(not(A),E,E):-% E explains not(A)
not abduce(A,E,E).  % if E doesn't explain A

In order to prevent the query abducible(not(A)) from succeeding, we change the definition of abducible/1 to

```abducible(A):-A \= not(B),not cl(A,B).
```

With this extended abductive meta-interpreter, the query

?-abduce(flies(tweety),Explanation).

now results in the following, correct answer:

Explanation = [sparrow(tweety)]

The explanation [penguin(tweety)] is found to be inconsistent, since

?-abduce(not(abnormal(tweety)),
[penguin(tweety)],[penguin(tweety)])

will fail, as it should.

However, this approach relies on the fact that negated literals are checked after the abductive explanation has been constructed. To illustrate this, supppose that Theory is extended with the following clause:

flies1(X):-not abnormal(X),bird(X)

Since

?-abduce(not(abnormal(tweety)),[],[]).

succeeds, any explanation of bird(tweety) will also be an explanation of flies1(tweety), which is of course wrong. The problem here is that the fact that abnormal(tweety) is considered to be false is not reflected in the explanation. Thus, we need a separate predicate abduce_not/3 for building explanations for literals assumed to be false.

The full program is given below. There are two changes in abduce/3: in the fifth clause, an abducible A is only added to the explanation E if it is consistent with it; i.e. if E does not explain not(A). In the sixth clause, an explicit explanation for not(A) is constructed.

% abduce(O,E0,E) <- E is abductive explanation of O, given
%                  E0 (works also for general programs)

abduce(true,E,E):-!.

abduce((A,B),E0,E):-!,
abduce(A,E0,E1),
abduce(B,E1,E).

abduce(A,E0,E):-
clause(A,B),
abduce(B,E0,E).

abduce(A,E,E):-

abduce(A,E,[A|E]):-        % A can be added to E
not element(A,E),      % if it's not already there,
abducible(A),          % if it's abducible,
not abduce_not(A,E,E). % and E doesn't explain not(A)

abduce(not(A),E0,E):-      % find explanation for not(A)
not element(A,E0),     % should be consistent
abduce_not(A,E0,E).

The definition of abduce_not/3 closely mirrors the clauses for abduce/3:

(i)   a negated conjunction not((A,B)) is explained by either explaining not(A) or not(B);

(ii)  if there are clauses for A, then not(A) is explained by constructing an explanation for not(B), for every body B;

(iiinot(A) is explained if it is already part of the explanation;

(iv) otherwise, not(A) is explained by itself, if A is abducible and not explained;

(v)  not(not(A)) is explained by explaining A.

There is no clause for true, since not(true) cannot be explained.

% abduce_not(O,E0,E) <- E is abductive expl. of not(O)

abduce_not((A,B),E0,E):-!,
abduce_not(A,E0,E);      % disjunction
abduce_not(B,E0,E).

abduce_not(A,E0,E):-
setof(B,clause(A,B),L),
abduce_not_l(L,E0,E).

abduce_not(A,E,E):-

abduce_not(A,E,[not(A)|E]):- % not(A) can be added to E
not element(not(A),E),   % if it's not already there,
abducible(A),            % if A is abducible
not abduce(A,E,E).       % and E doesn't explain A

abduce_not(not(A),E0,E):-    % find explanation for A
not element(not(A),E0),  % should be consistent
abduce(A,E0,E).

abduce_not_l([],E,E).

abduce_not_l([B|Bs],E0,E):-
abduce_not(B,E0,E1),
abduce_not_l(Bs,E1,E).

We illustrate the program on the following set of clauses. Notice that there are several explanations for abnormal(tweety).

```cl(flies1(X),(not(abnormal(X)),bird(X))).
cl(flies(X),(bird(X),not(abnormal(X)))).
cl(abnormal(X),penguin(X)).
cl(bird(X),penguin(X)).
cl(bird(X),sparrow(X)).
```

The following queries show that the order of unnegated and negated literals in a clause only influences the order in which abducibles are added to the explanation, but not the explanation itself:

?-abduce(flies(tweety),Explanation).
Explanation =

?-abduce(flies1(tweety),Explanation).
Explanation =

Exercise 8.4. The abductive meta-interpreter will loop on the program
wise(X):-not teacher(X).
teacher(peter):-wise(peter).
with the query ?-abduce(teacher(peter),E) (see section 8.2). Change the interpreter such that this query is handled correctly, by adding all literals collected in the proof to the abductive explanation.

Abduction can be used for formulating hypotheses about faulty components in a malfunctioning system. Here, the Theory is a description of the operation of the system, an Observation is a combination of input values and the observed output values, and Explanation is a diagnosis, telling us which components are malfunctioning. As an example we consider a logical circuit for adding three binary digits. Such a circuit can be built from two XOR-gates, two AND-gates, and an OR-gate (fig. 8.3). Its behaviour can be described logically as follows:

xor(X,Y,S),
xor(Z,S,Sum),
and(X,Y,C1),
and(Z,S,C2),
or(C1,C2,Carry).

xor(0,0,0).    and(0,0,0).       or(0,0,0).

xor(0,1,1).    and(0,1,0).       or(0,1,1).

xor(1,0,1).    and(1,0,0).       or(1,0,1).

xor(1,1,0).    and(1,1,1).       or(1,1,1).

These clauses describe the normal operation of the system. However, since diagnosis deals with faulty operation of components, we have to extend the system description with a so-called fault model. Such a fault model describes the behaviour of each component when it is in a faulty state. We distinguish two faulty states: the output of a component can be stuck at 0, or it can be stuck at 1. Faulty states are expressed by literals of the form fault(Name=State), where State is either s0 (stuck at 0) or s1 (stuck at 1). The Name of a component is given by the system that contains it. Since components might be nested (e.g. the adder might itself be part of a circuit that adds two 8-bits binary numbers), the names of the components of a sub-system are prefixed by the name of that sub-system. This results in the following system description: Figure 8.3. A 3-bit adder.

xorg(N-xor1,X,Y,S),
xorg(N-xor2,Z,S,Sum),
andg(N-and1,X,Y,C1),
andg(N-and2,Z,S,C2),
org(N-or1,C1,C2,Carry).

xorg(N,X,Y,Z):-xor(X,Y,Z).

xorg(N,0,0,1):-fault(N=s1).

xorg(N,0,1,0):-fault(N=s0).

xorg(N,1,0,0):-fault(N=s0).

xorg(N,1,1,1):-fault(N=s1).

andg(N,X,Y,Z):-and(X,Y,Z).

andg(N,0,0,1):-fault(N=s1).

andg(N,0,1,1):-fault(N=s1).

andg(N,1,0,1):-fault(N=s1).

andg(N,1,1,0):-fault(N=s0).

org(N,X,Y,Z):-or(X,Y,Z).

org(N,0,0,1):-fault(N=s1).

org(N,0,1,0):-fault(N=s0).

org(N,1,0,0):-fault(N=s0).

org(N,1,1,0):-fault(N=s0).

Such a fault model, which includes all possible faulty behaviours, is called a strong fault model.

In order to diagnose the system, we declare fault/1 as the (only) abducible predicate, and we make a call to abduce/2:

```diagnosis(Observation,Diagnosis):-
abduce(Observation,Diagnosis).

abducible(fault(_X)).
```

For instance, suppose the inputs X=0, Y=0 and Z=1 result in the outputs Sum=0 and Carry=1 (a double fault). In order to diagnose this behaviour, we formulate the following query:

D = [fault(a-or1=s1),fault(a-xor2=s0)];
D = [fault(a-and2=s1),fault(a-xor2=s0)];
D = [fault(a-and1=s1),fault(a-xor2=s0)];
D = [fault(a-and2=s1),fault(a-and1=s1),fault(a-xor2=s0)];
D = [fault(a-xor1=s1)];
D = [fault(a-or1=s1),fault(a-and2=s0),fault(a-xor1=s1)];
D = [fault(a-and1=s1),fault(a-xor1=s1)];
D = [fault(a-and2=s0),fault(a-and1=s1),fault(a-xor1=s1)];
No more solutions

The first diagnosis is very obvious: it states that or1 (which calculates Carry) is stuck at 1, and xor2 (which calculates Sum) is stuck at 0. But the fault in the output of or1 might also be caused by and2 or and1, and even by both! The fifth diagnosis is an interesting one: if xor1 is stuck at 1, this accounts for both faults in the outputs of the adder. The remaining three diagnoses are considerably less interesting, since each of them makes unnecessary assumptions about additional faulty components.

The predicate diagnosis/2 generates every possible diagnosis; it does not make any assumptions about the relative plausibility of each of them. Several such assumptions can be made. For instance, we might be interested in the diagnoses with the least number of faulty components (there is only one smallest diagnosis in the example, but there may be several in general). Alternatively, we might want to consider only non-redundant or minimal diagnoses: those of which no proper subset is also a diagnosis. This is readily expressed in Prolog:

```min_diagnosis(O,D):-
diagnosis(O,D),
not((diagnosis(O,D1),proper_subset(D1,D))).
```

D = [fault(a-or1=s1),fault(a-xor2=s0)];
D = [fault(a-and2=s1),fault(a-xor2=s0)];
D = [fault(a-and1=s1),fault(a-xor2=s0)];
D = [fault(a-xor1=s1)];
No more solutions

It should be noted that the predicate min_diagnosis/2 is quite inefficient, since it needs time quadratic in the number of diagnoses (for each possible diagnosis, it generates in the worst case each possible diagnosis to see if the second is a proper subset of the first). In turn, the number of diagnoses is exponential in the number of components. More efficient ways of generating minimal diagnoses can be found in the literature; they fall outside the scope of this book.

### 8.4   The complete picture

In this chapter we studied several ways of dealing with imcomplete information. Incompleteness occurs whenever there is a ground fact in the Herbrand base of which we do not know the truth value. In order to extend our knowledge, we need to make assumptions about the truth value of such ground facts. The simplest approach is to assume that everything that is not known to be true must be false. The procedural equivalent of this is negation as failure: everything that is not provable is assumed to be false. Thus, a negated literal not L in the body of a general clause is assumed to be proved if a proof of L fails. The resulting proof procedure is called SLDNF-resolution .

If we strengthen our proof procedure, we must strengthen the semantics accordingly. Since the original program is incomplete it has several models, one of which we need to choose. One way to do this is to transform the original program into a new, complete program, which we declare to be the intended program. The only model of this complete program is taken as the intended model of the original program. The Closed World Assumption is a rather naive way to achieve this, while Predicate Completion can also handle a restricted subclass of the class of general programs (so-called stratified programs).

The relation between SLDNF-resolution and Predicate Completion is as follows. Let P be a general program, let Comp (P) denote the completion of P, and let ⊢ SLDNF  denote provability by SLDNF-resolution, treating negated literals in the body of clauses by negation as failure; then the following relation holds:

P SLDNF  q       ⇒      Comp (P) ⊨ q

This is a soundness result for SLDNF-resolution. The corresponding completeness result is not so easily proved, and holds only for specific sub-classes of programs.

Default reasoning is reasoning with typical cases and exceptions. A practical approach to default reasoning is by explicitly listing the exceptions to a rule by means of abnormality predicates. The rule describing the typical case is represented by a general clause, containing the negation of the abnormality predicate. An alternative approach is to distinguish between rules which always hold, and rules which typically hold (so-called defaults). A default is applicable whenever it does not lead to inconsistencies. In order to prevent the applicability of defaults in certain cases, they are assigned names. These names can then be used in other rules to refer to a specific default.

There is a close relation between abnormality predicates and names of defaults, demonstrated by the following translation of default rules to general clauses. The default rule

default(bats_fly(X),(flies(X):-bat(X)))

is first translated to a clause

flies(X):-bat(X),bats_fly(X)

after which the predicate bats_fly/1, indicating the normal case, is converted to a negated abnormality predicate:

flies(X):-bat(X),not nonflying_bat(X)

Furthermore, for each negated conclusion in a rule like

a new predicate is introduced:

Thus, the complete set of rules and defaults about Dracula is translated to the following general program:

notflies(X):-mammal(X),not flying_mammal(X).
flies(X):-bat(X),not nonflying_bat(X).
mammal(X):-bat(X).
bat(dracula).
flying_mammal(X):-bat(X).

Exercise 8.5. Draw the SLD-trees for the queries ?-flies(X) and ?‑notflies(X).

What this shows is the close relationship between assuming that something is false unless the opposite can be proved (negation as failure), and assuming that a default rule is applicable unless this leads to inconsistencies.

Abduction generalises negation as failure by formulating assumptions about either truth or falsity of specific literals (abducibles). For instance, the Dracula example can be handled by the abductive meta-interpreter of section 8.3 without any problem, if we declare the abnormality predicates as abducibles:

```abducible(flying_mammal(_X)).
abducible(nonflying_bat(_X)).
```

?-abduce(flies(X),E)
No.

?-abduce(notflies(X),E)
X = dracula
No more solutions.

Exercise 8.6. Remove the last two clauses from the program, and again determine the answers to the queries ?-abduce(flies(X),E) and ?‑abduce(notflies(X),E).

This shows that negation as failure is a special case of abduction. Moreover, it shows that making assumptions about the applicability of a default rule is a form of abduction. We can therefore conclude that abduction is the most general form of reasoning with incomplete information among the ones discussed in this chapter. However, inductive reasoning extends abduction by hypothesising complete predicate definitions rather than sets of ground literals. This will be the subject of the next chapter.

Negation as failure and Predicate Completion are discussed by Clark (1978). In the same volume, the Closed World Assumption was formally introduced by Reiter (1978). The approach to default reasoning by means of defaults and rules is due to Poole (1988). In (Poole, 1991), a more elaborate Prolog implementation of this approach is presented. (Sombé, 1990) gives a detailed comparison of formalisms for reasoning with incomplete information, using a single example.

An extensive overview of different approaches to abduction and their relation to other forms of reasoning with incomplete information can be found in (Kakas et al., 1992). The abductive meta-interpreter in section 8.3 is based on ideas from the same paper, as well as parts of the analysis in section 8.4. (MozetiË, 1992) presents an efficient algorithm for the computation of minimal diagnoses.

K.L. Clark ( 1978) , ‘Negation as failure’. In Logic and Databases, H. Gallaire & J. Minker (eds), pp. 293-322, Plenum Press.

A.C. Kakas, R.A. Kowalski & F. Toni ( 1992) , ‘Abductive Logic Programming’, Journal of Logic and Computation 2 (6): 719-770.

I. Mozeti ( 1992) , ‘A polynomial-time algorithm for model-based diagnosis’. In Proc. Tenth European Conference on Artificial Intelligence, ECAI’92, B. Neumann (ed.), pp. 729-733, John Wiley.

D. Poole ( 1988) , ‘A logical framework for default reasoning’, Artificial Intelligence 36: 27-47.

D. Poole ( 1991) , ‘Compiling a default reasoning system into Prolog’, New Generation Computing 9: 3-38.

R. Reiter ( 1978) , ‘On closed world databases’. In Logic and Databases, H. Gallaire & J. Minker (eds), pp. 55-76, Plenum Press.

Léa Sombé (1990), Reasoning under Incomplete Information in Artificial Intelligence, John Wiley. Also International Journal of Intelligent Systems 5 (4).

9

## Inductive reasoning

Induction is a form of reasoning which infers general rules from specific observations. For instance, given the following Theory

bird(tweety).            bird(polly).
has_feathers(tweety).    has_beak(polly).

we might want to infer a Hypothesis explaining why both Tweety and Polly fly:

flies(X):-bird(X)

There is a strong similarity between induction and abduction: if the Examples, which induction seeks to explain, are the ground facts flies(tweety) and flies(polly) then the following relation holds:

Theory HypothesisExamples

The main difference with abduction is that Hypothesis is allowed to be a set of clauses, rather than a set of ground facts as in abduction.

Given this similarity, we will try to adopt the abductive meta-interpreter developed in section 8.3 to perform induction. We assume that the set of possible hypotheses is given by means of the predicate inducible/1.

```% induce(E,H) <- H is inductive explanation of E
induce(E,H):-induce(E,[],H).

induce(true,H,H):-!.
induce((A,B),H0,H):-!,
induce(A,H0,H1),
induce(B,H1,H).
induce(A,H0,H):-
/* not A=true, not A=(_,_) */
clause(A,B),
induce(B,H0,H).
induce(A,H0,H):-
induce(B,H0,H).         % proceed with body of rule
induce(A,H0,[(A:-B)|H]):-       % A:-B can be added to H
inducible((A:-B)),      % if it's inducible, and
not element((A:-B),H0), % if it's not already there
induce(B,H0,H).         % proceed with body of rule
```

Whenever a clause is added to the inductive hypothesis, we proceed by constructing an inductive explanation of its body.

Suppose inducible/1 is defined as follows:

inducible((flies(X):-bird(X),has_feathers(X),has_beak(X))).
inducible((flies(X):-has_feathers(X),has_beak(X))).
inducible((flies(X):-bird(X),has_beak(X))).
inducible((flies(X):-bird(X),has_feathers(X))).
inducible((flies(X):-bird(X))).
inducible((flies(X):-has_feathers(X))).
inducible((flies(X):-has_beak(X))).
inducible((flies(X):-true)).

These facts state that every clause with flies/1 in its head and some of the predicates in Theory in its body is a possible inductive hypothesis. We can use induce/2 to find out which of these clauses account for the fact that Tweety and Polly fly:

?-induce(flies(tweety),H).
H = [(flies(tweety):-bird(tweety),has_feathers(tweety))];
H = [(flies(tweety):-bird(tweety))];
H = [(flies(tweety):-has_feathers(tweety))];
H = [(flies(tweety):-true)];
No more solutions

?-induce(flies(polly),H).
H = [(flies(polly):-bird(polly),has_beak(polly))];
H = [(flies(polly):-bird(polly))];
H = [(flies(polly):-has_beak(polly))];
H = [(flies(polly):-true)];
No more solutions

We can combine the answers to these queries in order to find a single clause which explains both flies(tweety) and flies(polly). One way to do this is by generalisation, as will be explained later. Another way is to process all the examples at once.

Exercise 9.1. Change induce/3 so that it handles a list of examples rather than a single example. Moreover, the inductive hypothesis should contain uninstantiated clauses, so that the same clause can be used to explain several examples.

However, a serious problem with this approach is the impracticality of listing every possible hypothesis by means of the predicate inducible/1. In general, the inductive hypothesis can consist of several clauses, and might be recursive. The hypothesis space of possible sets of clauses is typically very large, and even infinite when functors are involved. This space needs to be searched in a systematic manner. Another complication is the possibility of overgeneralisations like the clause flies(X):-true. In order to prevent overgeneralisation, negative examples need to be included in the induction process (here: non-flying objects). For these reasons, induction requires a more sophisticated search strategy than abduction. We will take a closer look at the structure of the search space in the next section. Then, we will develop two programs that can induce definitions for predicates like append/3 from examples.

### 9.1   Generalisation and specialisation

An example is a ground fact for the predicate of which a definition is to be induced. A positive example is true in the intended interpretation, while a negative example is false. Consequently, the inductive Hypothesis should be such that for every positive example p

Theory Hypothesisp

while for every negative example n

Theory Hypothesisn

We say that p is covered by Hypothesis, given Theory. For instance, if Hypothesis is the standard recursive definition of element/2:

element(X,[X|Z]).
element(X,[Y|Z]):-element(X,Z).

then the example element(b,[a,b]) is covered (with empty Theory). This can be demonstrated by a simple meta-interpreter for definite clauses. Note that this proof requires both of the above clauses. Alternatively, if element(b,[b]) is also known to be a positive example, we can say that element(b,[a,b]) is covered by the second, recursive clause alone. The first definition of coverage, which refers to the complete hypothesis, is called intensional coverage, while the second, referring to single clauses plus the rest of the examples, is called extensional coverage. In the induction programs to be developed, we will employ both notions of coverage; for the moment, however, the distinction is immaterial.

Exercise 9.2. Write a predicate covers_ex/3 which, given a clause, an example, and a list of positive examples, tests whether the clause extensionally covers the example.

If Hypothesis1 covers at least all the examples covered by Hypothesis2, we say that Hypothesis1 is at least as general as Hypothesis2, or that Hypothesis2 is at least as specific as Hypothesis1. From the definition of coverage, one can see that Hypothesis2 must be a logical consequence of Hypothesis1, given Theory:

Theory Hypothesis1Hypothesis2

Suppose p is a positive example covered by Hypothesis1 but not by Hypothesis2. This means that Hypothesis2 is too specific; if it is our current hypothesis, it needs to be generalised, for instance to Hypothesis1. Similarly, if a hypothesis covers a negative example, it needs to be specialised. Generalisation and specialisation are the basic operations of induction.

Although we defined generality between hypotheses being sets of clauses, practical approaches to induction usually generalise or specialise single clauses. For instance, the following are clauses of increasing generality:

element(X,[Y|Z]):-element(X,Z).
element(X,V):-element(X,Z).
element(X,V).

This shows that a more specific clause can be constructed by adding a literal, by applying a substitution, or both. This relation of generality between clauses is called θ-subsumption. Formally, Clause1 θ-subsumes Clause2 if there is a substitution  θ that can be applied to Clause1, such that every literal in the resulting clause occurs in Clause2.

Notice that θ only replaces variables in Clause1, not in Clause2. One way to test if such a θ exists is to ground all variables in Clause2, and then unify the ground version of Clause2 with Clause1. Grounding the variables in a term can be done by means of the built-in predicate numbervars/3, which unifies different variables with terms of the form '\$VAR(N)'.

theta_subsumes1((H:-B1),(H:-B2)):-
ground(B2),
subset(B1,B2).

ground(Term):-
numbervars(Term,0,N).

%%% subset/2: see Appendix A.2

This approach has the disadvantage that one or both clauses are changed after a call to theta_subsumes1/2. To avoid this, we apply the following little programming trick:

theta_subsumes((H1:-B1),(H2:-B2)):-
not((H1=H2,ground(B2),
not subset(B1,B2))).

theta_subsumes/2 succeeds exactly when theta_subsumes1/2 does, but by means of the double negation unifications are ‘undone’ after the call succeeds.

Next, we turn to the issue of how to construct generalisations of clauses. First we consider the simpler case of generalising two atoms. Consider the following two ground facts:

element(1,)
element(z,[z,y,x])

The following atom θ-subsumes both of them:

element(X,[X|Y])

Note that this atom is θ-subsumed by every other possible generalisation (such as element(X,[Y|Z]) or element(X,Y)). For this reason, it is called a least general generalisation under θ-subsumption or θ-LGG. θ-LGG’s of atoms can be computed by means of anti-unification. This operation is the dual of unification. It operates by comparing the terms occurring at the same position in the two atoms, and replacing them by a new variable if they are different. The terms which have already been replaced by a variable are collected in two lists, because if the same pair of terms is encountered again, it should be replaced by the same variable (see 1 and z in the example above). For obvious reasons, such lists are called inverse substitutions.

```:-op(600,xfx,'->'). % operator for inverse substitution

% anti_unify(T1,T2,T) <-  T is the anti-unification
%                         of T1 and T2
anti_unify(Term1,Term2,Term):-
anti_unify(Term1,Term2,Term,[],_S1,[],_S2).

% anti-unification with inverse subst.s and accumulators
anti_unify(Term1,Term2,Term1,S1,S1,S2,S2):-
Term1 == Term2,!.                        % same terms
anti_unify(Term1,Term2,V,S1,S1,S2,S2):-
anti_unify(Term1,Term2,Term,S10,S1,S20,S2):-
nonvar(Term1),nonvar(Term2),
functor(Term1,F,N),functor(Term2,F,N),!, % same
functor(Term,F,N),                       % functor
anti_unify_args(N,Term1,Term2,Term,S10,S1,S20,S2).
anti_unify(T1,T2,V,S10,[T1->V|S10],S20,[T2->V|S20]).

anti_unify_args(0,_Term1,_Term2,_Term,S1,S1,S2,S2).
anti_unify_args(N,Term1,Term2,Term,S10,S1,S20,S2):-
N>0,N1 is N-1,
arg(N,Term1,Arg1),
arg(N,Term2,Arg2),
arg(N,Term,Arg),
anti_unify(Arg1,Arg2,Arg,S10,S11,S20,S21),
anti_unify_args(N1,Term1,Term2,Term,S11,S1,S21,S2).

subs_lookup([T1->V|_Subs1],[T2->V|_Subs2],Term1,Term2,V):-
T1 == Term1,
T2 == Term2,!.  % no alternative solutions needed
subs_lookup([_S1|Subs1],[_S2|Subs2],Term1,Term2,V):-
subs_lookup(Subs1,Subs2,Term1,Term2,V).
```

The following query illustrates the operation of the program, including the use of inverse substitutions:

?-anti_unify(2*2=2+2,2*3=3+3,T,[],S1,[],S2)
T = 2*X=X+X
S1 = [2->X]
S2 = [3->X]

Note that the inverse substitution [2->X] does not indicate which occurrences of 2 should be replaced by X. This means that S1 applied to the first term does not yield T (the inverse of S1 applied to T yields the first term, however). Therefore, a proper definition of inverse substitution should include the positions of terms which are to be replaced by variables. We will not elaborate this any further here.

The construction of the θ-LGG of two clauses makes use of, but is more complicated than anti-unification. The basic difference with anti-unification is that the body of a clause is logically speaking unordered, whereas subterms within a term have fixed positions. Therefore, we cannot just compare the literals occurring at the same position in the respective bodies, but should consider all pairs of literals, one from each body. For instance, the θ-LGG of the following two clauses

element(c,[b,c]):-element(c,[c])

element(d,[b,c,d]):-element(d,[c,d]),element(d,[d])

is the clause

element(X,[b,c|Y]):-element(X,[c|Y]),element(X,[X])

The head of this clause is simply obtained by anti-unifying the heads of the original clauses, and the body is obtained by anti-unification of element(c,[c]) and element(d,[c,d]), giving element(X,[c|Y]), and anti-unification of element(c,[c]) and element(d,[d]), giving element(X,[X]).

The program for constructing θ-LGG’s is given below. Note that the inverse substitutions found in each step are passed on to the next, so that the literals share variables.

```:-op(900,fy,not).

% theta_lgg(C1,C2,C) <- C is the θ-LGG of clause C1 and C2
theta_lgg((H1:-B1),(H2:-B2),(H:-B)):-
theta_lgg_bodies(B1,B2,[],B,S10,_S1,S20,_S2). % bodies

% select literal from first body...
theta_lgg_bodies([],_B2,B,B,S1,S1,S2,S2).
theta_lgg_bodies([L|B1],B2,B0,B,S10,S1,S20,S2):-
theta_lgg_literal(L,B2,B0,B00,S10,S11,S20,S21),
theta_lgg_bodies(B1,B2,B00,B,S11,S1,S21,S2).

% and one from second body
theta_lgg_literal(_L1,[],B,B,S1,S1,S2,S2).
theta_lgg_literal(L1,[L2|B2],B0,B,S10,S1,S20,S2):-
same_predicate(L1,L2),anti_unify(L1,L2,L,S10,S11,S20,S21),
theta_lgg_literal(L1,B2,[L|B0],B,S11,S1,S21,S2).
theta_lgg_literal(L1,[L2|B2],B0,B,S10,S1,S20,S2):-
not same_predicate(L1,L2),
theta_lgg_literal(L1,B2,B0,B,S10,S1,S20,S2).

% same_predicate(L1,L2) <- literals L1 and L2 have
%                          the same predicate and arity
same_predicate(L1,L2):-functor(L1,P,N),functor(L2,P,N).
```

To check the above example, we pose the following query:

?-theta_lgg((element(c,[b,c]):-[element(c,[c])]),
(element(d,[b,c,d]):-
[element(d,[c,d]),element(d,[d])]),
C)
C = element(X,[b,c|Y]):-[element(X,[X]),element(X,[c|Y])]

 The relation between θ-subsumption and logical consequence If Clause1 θ-subsumes Clause2, then also Clause1 ⊨ Clause2. The reverse, however, is not always true. Consider the following two clauses: list([V|W]):-list(W) list([X,Y|Z]):-list(Z) Given list([]), the first clause covers lists of arbitrary length, while the second covers only lists of even length. All lists covered by the second clause are also covered by the first, which is therefore more general. However, there is no substitution that can be applied to the first clause to yield the second (such a substitution should map W both to [Y|Z] and to Z, which is impossible). It may seem that |= provides a better notion of generality than θ-subsumption. However, such a semantic definition of generality introduces two problems. One is that it does not suggest a simple procedure to generalise clauses, as θ-subsumption does. The second problem is that LGG’s under logical consequence are not always unique. Consider the two clauses list([A,B|C]):-list(C) list([P,Q,R|S]):-list(S) Under logical consequence, these clauses have two LGG’s: one is list([X|Y]):-list(Y), and the other is list([X,Y|Z]):-list(V). Under θ-subsumption, only the latter is an LGG. Note that the first LGG looks in fact more plausible!

Exercise 9.3. Determine the θ-LGG of the following two clauses:
reverse([2,1],,[1,2,3]):-reverse(,[2,3],[1,2,3])
reverse([a],[],[a]):-reverse([],[a],[a])

In the following section we develop a program which generalises the examples by constructing θ-LGG’s. This corresponds to a specific-to-general search of the space of possible predicate definitions; it is also called bottom-up  induction. Alternatively, one could start with the most general definition, which is specialised as long as it covers some negative example. A program for top-down induction is given in section 9.3.

### 9.2   Bottom-up induction

The induction program we will develop in this section constructs θ-LGG’s of two examples, relative to a partial model M which consists of all positive examples plus ground facts for the background predicates, of which the definitions are given beforehand. Such θ-LGG’s are called relative least general generalisations or RLGG’s. Typically, RLGG’s are quite big clauses, that contain many redundant or otherwise useless literals, but also one or two useful literals. For instance, suppose M consists of the following positive examples for the predicate append/3:

append([1,2],[3,4],[1,2,3,4])   append([a],[],[a])
append([],[],[])                append(,[3,4],[2,3,4])

The RLGG of two examples E 1  and E 2  relative to a model M is defined as the θ-LGG of the clauses E 1 :- Conj (M) and E 2 :- Conj (M), where Conj (M) denotes the conjunction of the ground facts in M. So, the RLGG of the first two examples above is the θ-LGG of the following two clauses:

append([1,2],[3,4],[1,2,3,4]):-
append([1,2],[3,4],[1,2,3,4]),append([a],[],[a]),
append([],[],[]),append(,[3,4],[2,3,4])

append([a],[],[a]):-
append([1,2],[3,4],[1,2,3,4]),append([a],[],[a]),
append([],[],[]),append(,[3,4],[2,3,4])

The body of the resulting clause consists of 16 literals, constructed by pairwise anti-unification of facts in M:

append([A|B],C,[A|D]):-
append([1,2],[3,4],[1,2,3,4]),append([A|B],C,[A|D]),
append(W,C,X),append([S|B],[3,4],[S,T,U|V]),
append([R|G],K,[R|L]),append([a],[],[a]),
append(Q,[],Q),append([P],K,[P|K]),append(N,K,O),
append(M,[],M),append([],[],[]),append(G,K,L),
append([F|G],[3,4],[F,H,I|J]),append([E],C,[E|C]),
append(B,C,D),append(,[3,4],[2,3,4])

Clearly, this clause contains many redundant literals. First of all, removing the ground facts from M does not change the logical meaning of the clause, since they are known to be true. Furthermore, note that most literals introduce new variables, that do not appear in the head of the clause . For simplicity, we will assume that this does not occur in the intended program, i.e. all variables in the body of a hypothesis clause also occur in the head. Such clauses are also called constrained. Under this assumption, the clause can be considerably reduced:

append([A|B],C,[A|D]):-
append([A|B],C,[A|D]),append(B,C,D),

Note that the first body literal turns the clause into a tautology: a clause that is true by definition. We will exclude this literal as well by assuming that hypothesis clauses are strictly constrained, i.e. the set of body variables is a proper subset of the set of head variables (see Exercise 9.4 for a discussion of the kind of program excluded by this restriction). Under this assumption, we arrive at the recursive clause for append/3:

append([A|B],C,[A|D]):-
append(B,C,D)

It is interesting to trace the literal append(B,C,D) back to its origin: it is the anti-unification of the facts append([],[],[]) and append(,[3,4],[2,3,4]). These are exactly the ground bodies of the last clause, if we unify its head with the two original examples!

The program for computing the RLGG of two examples is given below. It is a slight modification of the program for computing θ-LGG’s, given in the previous section. After the head of the clause is constructed, the variables in the head are passed on to the predicate rlgg_bodies/9, which will only construct literals of which all the variables occur in the head.

% rlgg(E1,E2,M,C) <- C is RLGG of E1 and E2 relative to M

rlgg(E1,E2,M,(H:-B)):-
anti_unify(E1,E2,H,[],S10,[],S20),
varsin(H,V), % determine variables in head of clause
rlgg_bodies(M,M,[],B,S10,S1,S20,S2,V).

% varsin(T,V) <- V is list of variables occuring in term T
%               (standard predicate in many Prologs)

rlgg_bodies([],B2,B,B,S1,S1,S2,S2,V).

rlgg_bodies([L|B1],B2,B0,B,S10,S1,S20,S2,V):-
rlgg_literal(L,B2,B0,B00,S10,S11,S20,S21,V),
rlgg_bodies(B1,B2,B00,B,S11,S1,S21,S2,V).

rlgg_literal(L1,[],B,B,S1,S1,S2,S2,V).

rlgg_literal(L1,[L2|B2],B0,B,S10,S1,S20,S2,V):-
same_predicate(L1,L2),
anti_unify(L1,L2,L,S10,S11,S20,S21),
varsin(L,Vars),
var_proper_subset(Vars,V),   % no new variables
!,rlgg_literal(L1,B2,[L|B0],B,S11,S1,S21,S2,V).

rlgg_literal(L1,[L2|B2],B0,B,S10,S1,S20,S2,V):-
rlgg_literal(L1,B2,B0,B,S10,S1,S20,S2,V).

%%% var_… uses == rather than unification (Appendix A.2)

For simplicity, the body of the RLGG thus constructed is a list of literals rather than a conjunction.

The main algorithm of the RLGG-program is relatively simple: construct the RLGG of two positive examples, and remove all positive examples that are extensionally covered by this clause. Such an algorithm, which induces each clause separately, is also called a covering algorithm. Positive and negative examples, identified by a sign, are first separated by means of the predicate pos_neg/3, and the positive examples are combined with a (possibly empty)background model for the background predicates, to yield the model to be used for construction of RLGG’s .

induce_rlgg(Exs,Clauses):-
pos_neg(Exs,Poss,Negs), % split pos. & neg. examples
bg_model(BG),           % ground background model
append(Poss,BG,Model),  % Model includes pos.exs.
induce_rlgg(Poss,Negs,Model,Clauses).

induce_rlgg(Poss,Negs,Model,Clauses):-
covering(Poss,Negs,Model,[],Clauses).

% split positive and negative examples

pos_neg([],[],[]).

pos_neg([+E|Exs],[E|Poss],Negs):-
pos_neg(Exs,Poss,Negs).

pos_neg([-E|Exs],Poss,[E|Negs]):-
pos_neg(Exs,Poss,Negs).

% covering algorithm

covering(Poss,Negs,Model,H0,H):-
construct_hypothesis(Poss,Negs,Model,Hyp),!,
remove_pos(Poss,Model,Hyp,NewPoss),
covering(NewPoss,Negs,Model,[Hyp|H0],H).

covering(P,N,M,H0,H):-
append(H0,P,H).% add uncovered examples to hypothesis

% remove covered positive examples

remove_pos([],M,H,[]).

remove_pos([P|Ps],Model,Hyp,NewP):-
covers_ex(Hyp,P,Model),!,
write('Covered example: '),write(P),nl,
remove_pos(Ps,Model,Hyp,NewP).

remove_pos([P|Ps],Model,Hyp,[P|NewP]):-
remove_pos(Ps,Model,Hyp,NewP).

The two predicates called by the covering algorithm are construct_hypothesis/4 to construct a new clause, and covers_ex/3 to check extensional coverage.

% extensional coverage, relative to a ground model

forall(element(L,Body),element(L,Model)))).

% construct a clause by means of RLGG

construct_hypothesis([E1,E2|Es],Negs,Model,Clause):-
write('RLGG of '),write(E1),
write(' and '),write(E2),write(' is'),
rlgg(E1,E2,Model,Cl),
reduce(Cl,Negs,Model,Clause),!,   % no backtracking
nl,tab(5),write(Clause),nl.

construct_hypothesis([E1,E2|Es],Negs,Model,Clause):-
write(' too general'),nl,
construct_hypothesis([E2|Es],Negs,Model,Clause).

try(Goal) succeeds if and only if Goal succeeds, but without instantiating variables in Goal (see Appendix A.2).

The remaining predicate is reduce/4. This predicate first removes all the ground facts in the background model from the body of the clause. In a second step, the clause is further generalised by removing as many literals as possible, as long as the resulting clause does not cover any negative example (this is the only point where negative examples are used). This is needed because an RLGG might still contain redundant literals. For instance, given the following model

append([1,2],[3,4],[1,2,3,4])   append([a],[],[a])
append([],[],[])                append([],[1,2,3],[1,2,3])
append(,[3,4],[2,3,4])       append([],[3,4],[3,4])

the RLGG of the first two facts is

append([A|B],C,[A|E]):-
append(B,C,D),append([],C,C)

This clause contains the redundant literal append([],C,C), which is true in the intended interpretation. Therefore, removing it will not change the meaning of the clause in the intended interpretation.

% remove redundant literals

reduce((H:-B0),Negs,M,(H:-B)):-
setof0(L,(element(L,B0),not var_element(L,M)),B1),
reduce_negs(H,B1,[],B,Negs,M).

% reduce_negs(H,B1,B0,B,N,M) <- B is a subsequence of B1
%                              such that H:-B does not
%                              cover elements of N

reduce_negs(H,[L|B0],In,B,Negs,M):-
append(In,B0,Body),
not covers_neg((H:-Body),Negs,M,N),!,% remove L
reduce_negs(H,B0,In,B,Negs,M).

reduce_negs(H,[L|B0],In,B,Negs,M):-       % keep L
reduce_negs(H,B0,[L|In],B,Negs,M).

reduce_negs(H,[],Body,Body,Negs,M):-      % fail if clause
not covers_neg((H:-Body),Negs,M,N).  % covers neg.ex.

covers_neg(Clause,Negs,Model,N):-
element(N,Negs),
covers_ex(Clause,N,Model).

%%% var_element/2: see Appendix A.2

We illustrate the program by applying it to two induction problems, one without and one with additional background predicates. The first example is the familiar append/3 predicate.

bg_model([]).

?-induce_rlgg([ +append([1,2],[3,4],[1,2,3,4]),
+append([a],[],[a]),
+append([],[],[]),
+append([],[1,2,3],[1,2,3]),
+append(,[3,4],[2,3,4]),
+append([],[3,4],[3,4]),
-append([a],[b],[b]),
-append([c],[b],[c,a]),
-append([1,2],[],[1,3])   ],Clauses).

RLGG of append([1,2],[3,4],[1,2,3,4]) and append([a],[],[a]) is
append([X|Xs],Ys,[X|Zs]):-[append(Xs,Ys,Zs)]
Covered example: append([1,2],[3,4],[1,2,3,4])
Covered example: append([a],[],[a])
Covered example: append(,[3,4],[2,3,4])

RLGG of append([],[],[]) and append([],[1,2,3],[1,2,3]) is
append([],Y,Y):-[]
Covered example: append([],[],[])
Covered example: append([],[1,2,3],[1,2,3])
Covered example: append([],[3,4],[3,4])

Clauses = [(append([],Y,Y):-[]),
(append([X|Xs],Ys,[X|Zs]):-[append(Xs,Ys,Zs)])]

Note that, because of the use of extensional coverage, we have to provide complete ‘recursive chains’ like

append([1,2],[3,4],[1,2,3,4])
append(,[3,4],[2,3,4])
append([],[3,4],[3,4])

Note also that the recursive clause is induced before the non-recursive one. This is due to the order in which the examples are presented; of course, it is only possible if we apply extensional coverage rather than intensional coverage.

The second example concerns the use of a non-empty background model. The background predicate num/2 converts the numbers 1…5 to the numerals one…five and vice versa; the predicate listnum/2, which does the same for lists of numbers and numerals, is to be induced.

bg_model([ num(1,one),
num(2,two),
num(3,three),
num(4,four),
num(5,five) ]).

?-induce_rlgg([ +listnum([],[]),
+listnum([2,three,4],[two,3,four]),
+listnum(,[four]),
+listnum([three,4],[3,four]),
+listnum([two],),
-listnum([1,4],[1,four]),
-listnum([2,three,4],[two]),
-listnum([five],[5,5])],Clauses).

RLGG of listnum([],[]) and listnum([2,three,4],[two,3,four]) is
too general

RLGG of listnum([2,three,4],[two,3,four]) and listnum(,[four]) is
listnum([X|Xs],[Y|Ys]):-[num(X,Y),listnum(Xs,Ys)]
Covered example: listnum([2,three,4],[two,3,four])
Covered example: listnum(,[four])

RLGG of listnum([],[]) and listnum([three,4],[3,four]) is
too general

RLGG of listnum([three,4],[3,four]) and listnum([two],) is
listnum([V|Vs],[W|Ws]):-[num(W,V),listnum(Vs,Ws)]
Covered example: listnum([three,4],[3,four])
Covered example: listnum([two],)

Clauses =
[ (listnum([V|Vs],[W|Ws]):-[num(W,V),listnum(Vs,Ws)]),
(listnum([X|Xs],[Y|Ys]):-[num(X,Y),listnum(Xs,Ys)]),
listnum([],[]) ]

The RLGG of the first two examples is listnum(X,Y):-[], which is too general since it covers the negative examples. Therefore, the first example is temporarily discarded. After construction of the first clause, it is tried again, without success. Finally, since all examples except the first are covered by the two clauses found, the first example is simply added to the hypothesis as a ground fact.

Exercise 9.4. The restriction that the head of a hypothesis clause contains at least one variable that does not occur in the body excludes many useful programs with accumulators, like reverse/3 (section 3.6). Choose another method to exclude tautological clauses, and demonstrate that your program can learn reverse/3.

### 9.3   Top-down induction

We introduce the second induction method by means of an example. Suppose we want to construct a definition of the predicate element/2 by means of induction. After receiving the first example +element(a,[a,b]), we formulate the simplest hypothesis possible:

element(X,Y)

This hypothesis states that everything is an element of everything. Suppose our next example is a negative one: -element(x,[a,b]). Since this negative example is covered by our current hypothesis, we conclude that it is too general and has to be specialised. Under θ-subsumption, there are two ways to specialise a clause:

(i)   apply a substitution to variables in the clause;

(ii)  add a literal to the body of the clause.

We can thus specialise our hypothesis in several ways: we can apply substitutions like { X [] }, { Y X } or { Y [V|W] }, or we can add a literal like element(Y,X) to the body of the clause. So, the set of specialisations of the above clause includes, among others, the following clauses:

element([],Y)
element(X,X)
element(X,[V|W])
element(X,Y):-element(Y,X)

Note that each of these clauses is a minimal specialisation, in the following sense: each of them is θ-subsumed by the original clause, and there exist no more-general clauses which are also θ-subsumed by the original clause.

Suppose for the moment that we choose the third clause as our next hypothesis:

element(X,[V|W])

This hypothesis expresses that anything is an element of a non-empty list. Obviously, this clause is again too general, since it still covers the negative example. Possible minimal specialisations include

element(X,[V])
element(X,[X|W])
element(X,[V|X])
element(X,[V|W]):-element(X,W)

The second of these clauses is true in the intended interpretation, and will therefore never cover any negative example. Since it also covers the only positive example seen up till now, we decide to adopt it as our next hypothesis. Notice that the recursive clause is also among the above specialisations; it will be found if we supply a positive example like +element(b,[a,b]).

Thus, we see that the operation of specialisation generates a search space in which the correct clauses defining element/2 are to be found. Part of this search space, which we will call the specialisation graph, is depicted in fig. 9.1. Notice that, in order to generate the specialisation graph, we need to specify the hypothesis language: the set of predicates, functors and constants that can occur in the hypothesis. We can further restrict the search space by assigning types to the arguments of predicates and functors. For instance, by assigning X and Y in element(X,Y) and [X|Y] the types ‘item’ and ‘list of items’, respectively, it becomes clear that X and Y should not be unified in a specialisation step, and neither should X be substituted by [] or [V|W]. Such typing would rule out three clauses in fig. 9.1. Figure 9.1. Part of the specialisation graph for element/2.

Even with such typing restrictions, the branching factor in the specialisation graph is typically quite large, increasing with the number of variables in a clause. Therefore, an agenda-based search procedure will require large amounts of memory. Instead, we will employ an iterative deepening search strategy with backtracking. Each time a clause in the hypothesis is found to be too general, we search the specialisation graph for an alternative, starting from the root and increasing the depth bound until a suitable clause is found. Identifying and removing the too-general clause is a specialisation operation; searching for an alternative and adding it to the hypothesis is a generalisation step.

The program below implements this top-down induction procedure. Its main loop is given by the predicate process_examples/4. This predicate processes the examples one by one. Whenever the hypothesis is changed by generalisation or specialisation, the new hypothesis should be checked against all previous examples, which are therefore passed in the list Done.

induce_spec(Examples,Clauses):-
process_examples([],[],Examples,Clauses).

% process the examples

process_examples(Clauses,Done,[],Clauses).

process_examples(Cls1,Done,[Ex|Exs],Clauses):-
process_example(Cls1,Done,Ex,Cls2),
process_examples(Cls2,[Ex|Done],Exs,Clauses).

% process one example

process_example(Clauses,Done,+Example,Clauses):-
covers_d(Clauses,Example).

process_example(Cls,Done,+Example,Clauses):-
not covers_d(Cls,Example),
generalise(Cls,Done,Example,Clauses).

process_example(Cls,Done,-Example,Clauses):-
covers_d(Cls,Example),
specialise(Cls,Done,Example,Clauses).

process_example(Clauses,Done,-Example,Clauses):-
not covers_d(Clauses,Example).

Intensional coverage of an example by a set of clauses is checked by a simple meta-interpreter. Since the current hypothesis might include circular clauses like element(X,Y):-element(Y,X), the meta-interpreter employs a depth bound to cut off the search for a proof after a fixed number of steps. Additionally, abackground theory might be defined by means of the meta-predicate bg/1; we will assume that this background theory is non-circular, and does not contain the predicate to be induced.

% covers_d(Clauses,Ex) <- Ex can be proved from Clauses and
%                        background theory (max. 10 steps)

covers_d(Clauses,Example):-
prove_d(10,Clauses,Example).

prove_d(D,Cls,true):-!.

prove_d(D,Cls,(A,B)):-!,
prove_d(D,Cls,A),
prove_d(D,Cls,B).

prove_d(D,Cls,A):-
D>0,D1 is D-1,
copy_element((A:-B),Cls),    % make copy of clause
prove_d(D1,Cls,B).

prove_d(D,Cls,A):-
prove_bg(A).

prove_bg(true):-!.

prove_bg((A,B)):-!,
prove_bg(A),
prove_bg(B).

prove_bg(A):-
bg((A:-B)),
prove_bg(B).

%%% copy_element/2: see Appendix A.2

If the current hypothesis covers a negative example, it follows that it contains at least one clause which is false in the intended interpretation. The predicate specialise/4 identifies such a false clause by examining the proof of the negative example. Once such a clause is found, it is simply thrown out of the hypothesis. Since this is quite a coarse specialisation step, some of the previous positive examples will now become uncovered, and the predicate process_examples/4 is called again.

specialise(Cls,Done,Example,Clauses):-
false_clause(Cls,Done,Example,C),
remove_one(C,Cls,Cls1),
write('.....refuted: '),write(C),nl,
process_examples(Cls1,[],[-Example|Done],Clauses).

% false_clause(Cs,Exs,E,C) <- C is a false clause
%                            in the proof of E

false_clause(Cls,Exs,true,ok):-!. % empty proof

false_clause(Cls,Exs,(A,B),X):-!,
false_clause(Cls,Exs,A,Xa),  % try first conjunct
( Xa = ok   -> false_clause(Cls,Exs,B,X)  % 2nd one
; otherwise -> X = Xa
).

false_clause(Cls,Exs,E,ok):-    % no false clause for
element(+E,Exs),!.         % positive examples

false_clause(Cls,Exs,A,ok):-    % no false clause for
bg((A:-B)),!.              % background literals

false_clause(Cls,Exs,A,X):-
copy_element((A:-B),Cls),
false_clause(Cls,Exs,B,Xb),% false clause in proof B?
( Xb \= ok  -> X = Xb     % yes
; otherwise -> X = (A:-B) % no; return this clause
).

As explained above, the predicate generalise/4 searches the specialisation graph for a clause covering an uncovered positive example. Since there might be several uncovered positive examples, the generalised hypothesis is again tested against all previous examples.

generalise(Cls,Done,Example,Clauses):-
search_clause(Done,Example,Cl),
write('Found clause: '),write(Cl),nl,
process_examples([Cl|Cls],[],[+Example|Done],Clauses).

The current node in the search process is represented by a term a(Clause,Vars), where Vars is the list of variables occurring in Clause, together with their types (see below).

% search_clause(Exs,E,C) <- C is a clause covering E and
%                          not covering negative examples
%                          (iterative deepening search)

search_clause(Exs,Example,Clause):-
literal(Head,Vars),   % root of specialisation graph
Exs,Example,Clause).

search_clause(D,Current,Exs,Example,Clause):-
write(D),write('..'),
search_clause_d(D,Current,Exs,Example,Clause),!.

search_clause(D,Current,Exs,Example,Clause):-
D1 is D+1,
!,search_clause(D1,Current,Exs,Example,Clause).

The search ends when a clause is found that covers the uncovered example, while not covering any of the negative examples.

search_clause_d(D,a(Clause,Vars),Exs,Example,Clause):-
covers_ex(Clause,Example,Exs),   % goal
not((element(-N,Exs),covers_ex(Clause,N,Exs))),!.

search_clause_d(D,Current,Exs,Example,Clause):-
D>0,D1 is D-1,
specialise_clause(Current,Spec), % specialise
search_clause_d(D1,Spec,Exs,Example,Clause).

Here, extensional coverage is tested against the examples and the background theory:

covers_ex(true,Exs):-!.

covers_ex((A,B),Exs):-!,
covers_ex(A,Exs),
covers_ex(B,Exs).

covers_ex(A,Exs):-
element(+A,Exs).

covers_ex(A,Exs):-
prove_bg(A).

The following predicates generate the specialisation graph. The literals that can be added to the body of a clause are given by the predicate literal/2. The first argument of literal/2 is a literal; the second argument specifies the types of variables in the literal. Thus, for the predicate element/2 the following fact should be added:

literal(element(X,Y),[item(X),list(Y)]).

Likewise, the possible terms to be used in a substitution are specified with their types by the predicate term/2:

term(list([]),[]).
term(list([X|Y]),[item(X),list(Y)]).

For instance, the clause element(X,[V|W]):-true is represented during the search process as

a((element(X,[V|W]):-true),[item(X),item(V),list(W)])

Consequently, X and V can be unified with each other but not with W, and W can be substituted by [] or [Y|Z], but X and V cannot. To restrict the search further, we will again make the assumption that hypothesis clauses are strictly constrained; i.e. the set of variables in a newly added literal is a proper subset of the set of variables in the head of the clause.

% specialise_clause(C,S) <- S is minimal specialisation
%                          of C under theta-subsumption

specialise_clause(Current,Spec):-

specialise_clause(Current,Spec):-
apply_subs(Current,Spec).

literal(L,LVars),
proper_subset(LVars,Vars). % no new variables in L

literal(L,LVars),
proper_subset(LVars,Vars). % no new variables in L

apply_subs(a(Clause,Vars),a(Spec,SVars)):-
copy_term(a(Clause,Vars),a(Spec,Vs)),  % don’t change
apply_subs1(Vs,SVars).                 % Clause

apply_subs1(Vars,SVars):-
unify_two(Vars,SVars). % unify two variables

apply_subs1(Vars,SVars):-
subs_term(Vars,SVars). % subs. term for variable

unify_two([X|Vars],Vars):-  % not both X and Y in Vars
element(Y,Vars),
X=Y.

unify_two([X|Vars],[X|SVars]):-
unify_two(Vars,SVars).

subs_term(Vars,SVars):-
remove_one(X,Vars,Vs),
term(Term,TVars),
X=Term,
append(Vs,TVars,SVars).% TVars instead of X in Vars

We illustrate the program by applying it to the induction problems of the previous section. The first problem is to induce a definition of the predicate append/3. The hypothesis language is specified by the literals and terms to be used, together with the types of their arguments:

literal(append(X,Y,Z),[list(X),list(Y),list(Z)]).
term(list([]),[]).
term(list([X|Y]),[item(X),list(Y)]).

The following query demonstrates that append/3 can be induced from two positive and four negative examples:

?-induce_spec([ +append([],[b,c],[b,c]),
-append([],[a,b],[c,d]),
-append([a,b],[c,d],[c,d]),
-append([a],[b,c],[d,b,c]),
-append([a],[b,c],[a,d,e]),
+append([a],[b,c],[a,b,c])   ],Clauses)

3..Found clause: append(X,Y,Z):-true
...refuted: append([],[a,b],[c,d]):-true

3..Found clause: append(X,Y,Y):-true
...refuted: append([a,b],[c,d],[c,d]):-true

3..Found clause: append([],Y,Y):-true

3..4..Found clause: append([X|Xs],Ys,[X|Zs]):-append(Xs,Ys,Zs)

Clauses = [ (append([X|Xs],Ys,[X|Zs]):-append(Xs,Ys,Zs)),
(append([],Y,Y):-true) ]

The numbers indicate the level of iterative deepening at which the clauses are found. The first two negative examples are needed for the construction of the non-recursive clause, and the remaining two are needed for the construction of the recursive clause.

The second induction problem concerns the predicate listnum/2. The hypothesis language is declared as follows:

literal(listnum(X,Y),[list(X),list(Y)]).
literal(num(X,Y),[item(X),item(Y)]).

term(list([]),[]).
term(list([X|Y]),[item(X),list(Y)]).

We supply the following background theory:

bg((num(1,one):-true)).
bg((num(2,two):-true)).
bg((num(3,three):-true)).
bg((num(4,four):-true)).
bg((num(5,five):-true)).

The predicate listnum/2 can be learned from six well-chosen examples:

?-induce_spec([ +listnum([],[]),
-listnum([one],[one]),
-listnum([1,two],[one,two]),
+listnum(,[one]),
-listnum([five,two],[5,two]),
+listnum([five],)],Clauses)

3..Found clause: listnum(X,Y):-true
...refuted: listnum([one],[one]):-true

3..Found clause: listnum([],[]):-true

3..4..Found clause: listnum([V|Vs],[W|Ws]):-num(V,W),listnum(Vs,Ws)

3..4..Found clause: listnum([X|Xs],[Y|Ys]):-num(Y,X),listnum(Xs,Ys)

Clauses =
[ (listnum([X|Xs],[Y|Ys]):-num(Y,X),listnum(Xs,Ys)),
(listnum([V|Vs],[W|Ws]):-num(V,W),listnum(Vs,Ws)),
(listnum([],[]):-true) ]

It should again be noted that the examples need to be well-chosen and well-ordered. This is particularly true for the recursive clause. Because of the use of extensional coverage, all positive examples occurring in a proof should be given; moreover, it is good practice to supply negative examples for a particular recursive clause before the positive ones. For this induction program, which induces by specialising overly general clauses, negative examples are particularly crucial.

Exercise 9.5. Replace the iterative deepening search strategy with beam search (see the article by Quinlan, referred to below, for a possible heuristic).

The program induce_rlgg/2 is based on the GOLEM system described in (Muggleton & Feng, 1990). The program induce_spec/2 is based on the MIS system described in (Shapiro, 1983). (Quinlan, 1990) discusses a hill-climbing heuristic for top-down induction. The notion of generalisation in logic programs is discussed in (Niblett, 1988). (Gottlob, 1987) precisely characterises the difference between θ-subsumption and logical consequence.

The subject of inductively inferring logic programs has been recently named Inductive Logic Programming. (Muggleton, 1992) is the first collection of papers on this subject. Recent books are (De Raedt, 1992) and (LavraË& Dæeroski, 1994).

G. Gottlob ( 1987) , ‘Subsumption and implication’, Information Processing Letters 24: 109–111.

N. Lavra & S. D æ eroski ( 1994) , Inductive Logic Programming: Techniques and Applications, Ellis Horwood.

S.H. Muggleton & C. Feng ( 1990) , ‘Efficient induction of logic programs’. In Proc. First Conference on Algorithmic Learning Theory, Ohmsha, Tokyo. Also in (Muggleton, 1992), pp. 261-280.

S.H. Muggleton (ed.) ( 1992) , Inductive Logic Programming, Academic Press.

T. Niblett ( 1988) , ‘A study of generalisation in logic programs’. In Proc. European Working Sessions on Learning, D. Sleeman (ed.), pp. 131-138, Pitman.

J.R. Quinlan ( 1990) , ‘Learning logical definitions from relations’, Machine Learning 5 (3): 239-266.

L. de Raedt ( 1992) , Interactive Theory Revision: an Inductive Logic Programming Approach, Academic Press.

E.Y. Shapiro ( 1983) , Algorithmic Program Debugging, MIT Press.

 Ground literals of the form t 1 = t 2  are true in an interpretation if and only if t 1  and t 2  are the same ground term. Thus, the predicate = (which represents, as usual, syntactical identity) is not explicitly represented in a model.

 In SLDNF resolution, not is treated as belonging to the language of general clauses, rather than as a meta-predicate.

 If X is a variable occurring in Body but not in Head, the formula X: Head ¬ Body is logically equivalent with Head ¬∃ X:Body. Such variables are called existential variables.

## Intelligent Reasoning by Example

This book discusses methods to implement intelligent reasoning by means of Prolog programs. The book is written from the shared viewpoints of Computational Logic, which aims at automating various kinds of reasoning, and Artificial Intelligence, which seeks to implement aspects of intelligent behaviour on a computer.