II
A physical object is structured if it consists of several components having certain spatial relationships to each other. Likewise, knowledge is structured if its components have certain logical relationships. For instance, a description of the London underground system consists of a list of stations (the components) plus a list of connections between stations (the relationships). As can be seen in fig. 1.1 in Chapter 1, such structured knowledge has a convenient graphical representation, in which components are represented by points or nodes, and relationships are represented by lines or arcs between nodes. In mathematics, such graphical structures are called graphs.
A characteristic property of structured knowledge is the distinction that is made between explicit and implicit relationships. For instance, in the underground example the direct connections which exist between two stations are the explicit relationships. All other relationships (i.e. connections between stations that are further apart) are only implicitly represented, and must be reconstructed from the explicit relationships. Therefore, reasoning forms an integral part of any form of structured knowledge.
Other examples of structured knowledge, encountered in Part I, include Prolog terms, proof trees, and SLD-trees. Among these, SLD-trees constitute a special case, since they are not given a priori as part of the knowledge describing a certain Universe of Discourse, but are instead derived from problem specifications of the form ‘given program P, find all answers to query Q’. By means of SLD-trees, such problems are translated to problems of the form ‘given SLD-tree T, find all paths from the root of the tree to the empty clause’. Problems of the latter kind are called search problems, and the graph being searched is called a search space. Most problems in intelligent reasoning are search problems of one kind or another.

In principle, any given problem can be defined as a search problem. To this end, we must identify:
(i) the nodes in the search space;
(ii) the arcs between nodes;
(iii) the starting node;
(iv) the goal node.
For instance, when searching for an answer to a query by means of SLD-resolution, the nodes in the search space are resolvents, the arcs are resolution steps by means of a program clause, the starting node is the query, and the goal node is the empty clause. As another example, we consider the puzzle known as The Towers of Hanoi. This puzzle consists of three pegs and n disks of decreasing size. Initially, all the disks are on the left peg, such that no disk is placed on a smaller one. This rule is to be obeyed throughout the game. The goal is to move all the disks to the right peg by moving one disk at a time. This problem is easily reformulated as a search problem, where nodes are allowed positions, and arcs are moves of the upper disk on one peg to another peg. Starting node and goal node are as in fig. II.1.
An analytic solution to the Towers of Hanoi

In the case of the Towers of Hanoi, there is a simple analytic solution, based on the following observation: suppose we are able to solve the problem for n−1 disks; then we can solve it for n disks also: move the upper n−1 disks from the left to the middle peg [12], move the remaining disk on the left peg to the right peg, and move the n−1 disks from the middle peg to the right peg. Since we are able to solve the problem for 0 disks, it follows by complete induction that we can solve the problem for any number of disks. The inductive nature of this argument is nicely reflected in the following recursive program:

:-op(900,xfx,to).

% hanoi(N,A,B,C,Moves) <- Moves is the list of moves to
%                         move N disks from peg A to peg C,
%                         using peg B as intermediary peg
hanoi(0,A,B,C,[]).
hanoi(N,A,B,C,Moves):-
    N1 is N-1,
    hanoi(N1,A,C,B,Moves1),
    hanoi(N1,B,A,C,Moves2),
    append(Moves1,[A to C|Moves2],Moves).

For instance, the query ?-hanoi(3,left,middle,right,M) yields the answer

M = [left to right, left to middle, right to middle,
     left to right, middle to left, middle to right,
     left to right]

The first three moves move the upper two disks from the left to the middle peg; then the largest disk is moved to the right peg; and again three moves are needed to move the two disks on the middle peg to the right peg.
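The length of this solution follows directly from the recursive structure: solving for n disks takes twice the number of moves needed for n−1 disks, plus one. The following predicate (a sketch of ours, not part of the original program; the name hanoi_moves is hypothetical) computes this number:

```prolog
% hanoi_moves(N,M) <- M is the number of moves needed to
%                     solve the Towers of Hanoi with N disks,
%                     following the recurrence M(n) = 2*M(n-1)+1
hanoi_moves(0,0).
hanoi_moves(N,M):-
    N > 0,
    N1 is N-1,
    hanoi_moves(N1,M1),
    M is 2*M1+1.
```

The query ?-hanoi_moves(3,M) yields M = 7, in agreement with the seven moves listed above; in general, n disks require 2^n−1 moves.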
Since the number of allowed positions is 3^n, the search space for the Towers of Hanoi grows exponentially with the number of disks. In practice, this means that the problem will be unsolvable for large n, no matter how efficient the search program, or how powerful the computer. This is a common characteristic of search problems. Search is a problem solving method which, although applicable to almost any problem, has considerable practical limitations. Therefore, search is only applied to problems for which no analytic solutions are known.
For many problems in intelligent reasoning such analytic solutions simply do not exist, and search is the best we can do. In Chapters 5 and 6, we will present and analyse various methods for searching graphs. Since graphs are not only important for search problems, but for all forms of structured knowledge, Chapter 4 is devoted to a discussion of various ways to represent structured knowledge in clausal logic.
4
In this chapter we will discuss various ways to represent structured knowledge in Prolog. The central notion is that of a graph, which is the mathematical abstraction of the graphical representation of structured knowledge. A graph consists of nodes, and arcs between nodes. Nodes are identified by their name, and arcs are identified by the pair of nodes they connect. By convention, arcs are taken to be directed, which means that an arc from n1 to n2 is not the same as an arc from n2 to n1. Undirected arcs (as in the London Underground example of Chapter 1) can be viewed as consisting of two directed arcs, one in each direction. If an arc is directed from n1 to n2, then n1 is called the parent of n2, and n2 is called the child of n1.
A path in a graph is a sequence of nodes, such that for each consecutive pair ni, nj in the sequence the graph contains an arc from ni to nj. If there is a path from nk to nl, then nk is called an ancestor of nl, and nl is called a descendant of nk. A cycle is a path from a node to itself. Obviously, when a path from ni to nj passes through a node which is also on a cycle, there are infinitely many different paths from ni to nj. Thus, a graph consisting of a limited number of nodes and arcs can generate infinite behaviour. This is something to keep in mind when searching such cyclic graphs!
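The standard way to keep a search through a possibly cyclic graph finite is to maintain a list of nodes already visited, and to refuse any arc that leads back to one of them. The following sketch illustrates the idea; the predicate name safe_path is ours, and it assumes a graph given by an arc/2 predicate as introduced later in this chapter:

```prolog
% safe_path(N1,N2,P) <- P is an acyclic path from N1 to N2
%                       in the graph given by arc/2
safe_path(Node1,Node2,Path):-
    safe_path(Node1,Node2,[Node1],Path).

% third argument: visited nodes, in reverse order
safe_path(Node,Node,Visited,Path):-
    reverse(Visited,Path).
safe_path(Node1,Node2,Visited,Path):-
    arc(Node1,Node3),
    not(member(Node3,Visited)),   % never revisit a node
    safe_path(Node3,Node2,[Node3|Visited],Path).
```

Since every step adds a new node to the visited list, and the graph has finitely many nodes, the search is guaranteed to terminate even in the presence of cycles.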
A tree is a special kind of graph which contains a root such that there is a unique path from the root to any other node. From this it follows that for any two nodes in a tree, either there is no path between them, or there is exactly one. Thus, trees are necessarily noncyclic or acyclic. A leaf is a node without children. Often, leaves are goal nodes in search spaces like SLD-trees. Strictly speaking, an SLD-tree is not a tree, because there might be several ways to construct the same resolvent. By convention, however, resolvents constructed in a different way are considered to be distinct nodes in the SLD-tree. Usually, trees are drawn upside down, with the root node at the top; arcs are implicitly understood to be directed from top to bottom. Note that, if n is the root of a tree, each of its children is the root of a subtree (fig. 4.1).

Recall from section 1.3 that complex Prolog terms like
route(tottenham_court_road,route(leicester_square,noroute))
can be viewed as a tree, with the functor route acting as the root of (sub)trees, and tottenham_court_road, leicester_square, and noroute as leaves (fig. 1.6). Conversely, trees can be represented by Prolog terms.
Exercise 4.1. Draw the tree represented by the term 1(2(4),3(5,6)).
A tree is traversed by first visiting its root, and then recursively traversing all of its subtrees. A list of subtrees is obtained by decomposing the complex term by means of the =.. predicate (see section 3.7):
% term_tree(T,R,S) <- term T represents a tree with root R
%                     and list of subtrees S
term_tree(Tree,Root,Subtrees):-
    Tree=..[Root|Subtrees].

% term_root(T,R) <- R is the root of tree T
term_root(Tree,Root):-
    term_tree(Tree,Root,_S).

% term_subtree(T,S) <- S is a subtree of tree T
term_subtree(Tree,Subtree):-
    term_tree(Tree,_R,S),
    member(Subtree,S).
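Note that term_tree/3 works in both directions: it decomposes a given tree into its root and list of subtrees, but it can equally well construct a tree from them. For instance (the tree a(b(d),c) is just an illustration of ours):

```prolog
?-term_tree(Tree,a,[b(d),c]).
  Tree = a(b(d),c)

?-term_tree(a(b(d),c),Root,Subtrees).
  Root = a
  Subtrees = [b(d),c]
```

This bidirectionality is a direct consequence of the relational nature of =.., and it is inherited by the predicates defined in terms of term_tree/3.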
Data abstraction
The principle of data abstraction prescribes keeping datastructures local to specific predicates such as term_tree, term_root and term_subtree, and accessing the datastructures only through these predicates. The main advantage of this design principle is modularity: if we choose to change the representation of a tree, we just have to modify these specific predicates, but the predicates which call them need not be changed. In contrast, suppose we unfold term_tree, term_root and term_subtree into the definition of term_arc:

term_arc(Tree,[Root,SR]):-
    Tree=..[Root|Subtrees],
    member(Subtree,Subtrees),
    Subtree=..[SR|_S].
term_arc(Tree,Arc):-
    Tree=..[_R|Subtrees],
    member(Subtree,Subtrees),
    term_arc(Subtree,Arc).

This program fragment is badly designed, because term_arc explicitly mentions the way trees are represented by Prolog terms. Consequently, if we change this representation, term_arc needs to be changed as well. This illustrates that the design of good datastructures is as important in Prolog as it is in any other programming language.
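To illustrate the point, suppose we decided to represent a tree by a term node(Root,Subtrees) instead (a hypothetical alternative representation of ours, not used elsewhere in this chapter). Under data abstraction, only term_tree/3 needs to be redefined; term_root, term_subtree, and every predicate built on top of them would continue to work unchanged:

```prolog
% alternative representation: a tree is a term
% node(Root,Subtrees), e.g. node(1,[node(2,[]),node(3,[])])
term_tree(node(Root,Subtrees),Root,Subtrees).
```

The unfolded version of term_arc above, by contrast, would have to be rewritten clause by clause.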
By means of these simple predicates, we can write a program for finding arcs and paths in a tree. Paths are represented as lists of nodes, and an arc is simply a path consisting of two nodes:
% term_arc(T,A) <- T is a tree, and A is an arc in T
term_arc(Tree,[Root,SR]):-           % Arc from Root to Subtree
    term_root(Tree,Root),
    term_subtree(Tree,Subtree),
    term_root(Subtree,SR).
term_arc(Tree,Arc):-                 % Arc in Subtree
    term_subtree(Tree,Subtree),
    term_arc(Subtree,Arc).

% term_path(T,P) <- T is a tree, and P is a path in T
term_path(Tree,Arc):-                % path consisting of one arc
    term_arc(Tree,Arc).
term_path(Tree,[Node1,Node2|Nodes]):-  % path consisting of several arcs
    term_arc(Tree,[Node1,Node2]),
    term_path(Tree,[Node2|Nodes]).
Exercise 4.2. Give a term Tree, such that it contains the tree of exercise 4.1, and such that Path=[1,2,7,8] is an answer to the query
?-term_path(Tree,Path).

Consider the tree in fig. 4.2. The following query lists all the paths in this tree:
?-term_path(1(2(4,5(7),6),3(8,9(10))),Path).
Path = [1,2];
Path = [1,3];
Path = [2,4];
Path = [2,5];
Path = [2,6];
Path = [5,7];
Path = [3,8];
Path = [3,9];
Path = [9,10];
Path = [1,2,4];
Path = [1,2,5];
Path = [1,2,6];
Path = [1,2,5,7];
Path = [1,3,8];
Path = [1,3,9];
Path = [1,3,9,10];
Path = [2,5,7];
Path = [3,9,10];
No more solutions
Exercise 4.3. Explain the order in which these paths are found.
It would be convenient to have a program for printing Prolog terms which represent trees in a tree-like way. The nicest way to do this is to print from the root down; however, this requires a rather elaborate program [13]. A reasonable alternative is to print the tree rotated 90 degrees, from the root to the right. A program to do this is given below.
term_write(Tree):-
    term_write(0,Tree),nl.

% write a Tree at position Pos
term_write(Pos,Tree):-
    term_tree(Tree,Root,Subtrees),    % decompose Tree
    term_write_node(Pos,Pos2,Root),   % write Root
    term_writes(Pos2,Subtrees).       % new position

% write a list of trees at position Pos
term_writes(Pos,[]).
term_writes(Pos,[Tree]):-!,           % no newline here
    term_write(Pos,Tree).
term_writes(Pos,[Tree|Subtrees]):-
    term_write(Pos,Tree),
    nl,tab(Pos),                      % skip to position Pos
    term_writes(Pos,Subtrees).

% write a Node from Begin to End
term_write_node(Begin,End,Node):-
    name(Node,L),length(L,N),         % N is length of Nodename
    End is Begin+10,
    N1 is End-Begin-N,                % N1 is length of line
    write_line(N1),
    write(Node).

% write a line of given length
write_line(0).
write_line(N):-
    N>0,N1 is N-1,
    write('-'),
    write_line(N1).
name/2 [14] is a built-in predicate, converting an atom into a list of ASCII codes. In combination with length/2, it is used to determine the number of characters in an atom. The query ?-term_write(1(2(4,5(7),6),3(8,9(10)))) displays the tree as follows:
---------1---------2---------4
                    ---------5---------7
                    ---------6
          ---------3---------8
                    ---------9--------10
In the preceding section, a tree was represented by a Prolog term. This is convenient for relatively small trees such as proof trees, that are processed and passed around as a unit. However, for bigger trees it is a better idea not to represent them explicitly by a Prolog term, but implicitly by a set of ground facts, listing the arcs in the graph. An additional advantage of this representation is the possibility of representing graphs that are not trees.
As an example of this representation, the tree in fig. 4.2 would be represented by the following facts:
arc(1,2).
arc(1,3).
arc(2,4).
arc(2,5).
arc(2,6).
arc(5,7).
arc(3,8).
arc(3,9).
arc(9,10).
The predicate for finding a path in a graph now needs a few minor adjustments: the graph is not passed on as an argument, and arc/2 is used rather than term_arc/2:
% path(P) <- P is a path in the graph given by arc/2
path([Node1,Node2]):-
    arc(Node1,Node2).
path([Node1,Node2|Nodes]):-
    arc(Node1,Node2),
    path([Node2|Nodes]).
Exercise 4.4. Draw the SLD-tree for the query ?-path([1|Path]).
path/1 will generate paths between any two connected nodes. When searching a graph such as an SLD-tree, we are normally only interested in paths which start at a given node (for instance, the root of a tree) and end in a leaf. The following program will do the job. Note that this program differs from the previous one in that it allows for paths consisting of a single node.
% path_leaf(N,P) <- P is a path starting at node N, ending
%                   in a leaf in the graph given by arc/2
path_leaf(Leaf,[Leaf]):-
    leaf(Leaf).
path_leaf(Node1,[Node1|Nodes]):-
    arc(Node1,Node2),
    path_leaf(Node2,Nodes).

leaf(Leaf):-
    not(arc(Leaf,_SomeNode)).
The query ?-path_leaf(1,Path) will lead to the following answers:
Path = [1,2,4];
Path = [1,2,5,7];
Path = [1,2,6];
Path = [1,3,8];
Path = [1,3,9,10];
No more solutions
Exercise 4.5. Draw the SLDtree for this query.
Notice the order in which the paths to the leaves are found: the longer path [1,2,5,7] is found before the shorter path [1,2,6]. This kind of search is called depth-first search, because the deepest unvisited nodes are preferred. In contrast, breadth-first search tries all nodes on a given level before going one level deeper; consequently, shortest paths are found first. Of course, the order in which nodes are visited can only be understood procedurally: logically speaking, there is nothing in the program which prescribes such an order. It is only because Prolog itself searches the SLD-tree in a depth-first fashion that programs like the above perform depth-first search.
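Chapter 5 treats search strategies in detail, but to give a taste: a breadth-first variant of leaf-path search can be obtained by maintaining an agenda of partial paths and always expanding the oldest one first. The following is a sketch of ours (the names bfs_path_leaf and bfs are not from this chapter); it assumes the arc/2 facts and the leaf/1 predicate defined above:

```prolog
% bfs_path_leaf(N,P) <- P is a path from node N to a leaf,
%                       found by breadth-first search
bfs_path_leaf(Node,Path):-
    bfs([[Node]],RevPath),
    reverse(RevPath,Path).

% the agenda is a list of paths, each in reverse order
bfs([[Node|Rest]|_Agenda],[Node|Rest]):-
    leaf(Node).
bfs([[Node|Rest]|Agenda],Path):-
    findall([Next,Node|Rest],arc(Node,Next),Children),
    append(Agenda,Children,NewAgenda), % children go to the back
    bfs(NewAgenda,Path).
```

With the graph of fig. 4.2, ?-bfs_path_leaf(1,Path) finds the short paths [1,2,4], [1,2,6] and [1,3,8] before the longer paths [1,2,5,7] and [1,3,9,10], in contrast with the depth-first order shown above.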
In real life, graphs are often infinite. For instance, many SLD-trees are infinite, even for very simple programs such as (‘br’ abbreviates brother):

br(X,Y):-br(X,Z),br(Z,Y).
br(paul,peter).
Exercise 4.6. Sketch the SLD-tree for the query ?-br(paul,B).
SLD-trees are graphs, with resolvents as nodes. Representing a resolvent by the list of its literals, we would need an infinite number of facts to represent SLD-trees, for instance:
arc([br(paul,B)],[br(paul,Z),br(Z,B)]).
arc([br(paul,B)],[]).
arc([br(paul,Z),br(Z,B)],[br(paul,Z1),br(Z1,Z),br(Z,B)]).
arc([br(paul,Z),br(Z,B)],[br(peter,B)]).
arc([br(paul,Z),br(Z,B)],[br(paul,paul)]).
...
arc([br(peter,B)],[br(peter,Z),br(Z,B)]).
...
arc([br(paul,paul)],[br(paul,Z),br(Z,paul)]).
...
In such cases, it is a better idea to write a program which generates these facts. In other words, we need a logical definition of arc/2.
Now, arc(A,B) is true if A and B are lists of negative literals interpreted as resolvents, and one resolution step applied to A and a clause for br/2 yields B. We can write this down by means of the predicate resolve/3, which performs one resolution step, and the two clauses for br/2 in the appropriate representation. This gives the following program:
arc(A,B):-
    resolve(A,(br(X,Y):-[br(X,Z),br(Z,Y)]),B).
arc(A,B):-
    resolve(A,(br(paul,peter):-[]),B).

% resolve(G,C,NewG) <- the goal G (a list of atoms)
%                      resolves with the clause C (body
%                      is a list) to yield the goal NewG
resolve([H1|T],(H2:-Body),B):-
    H1=H2,          % literal in goal unifies with head of clause
    append(Body,T,B).
resolve([H|T],Clause,[H|B]):-
    resolve(T,Clause,B).  % try next literal
The query ?-arc([br(paul,B)],N) results in the answers
B = Y
N = [br(paul,Z),br(Z,Y)];
B = peter
N = []
as expected.
Note that a query of the form ?-arc(R,[]) asks for a path from R to a success branch in the SLD-tree, thus simulating a query :-R. That is, the above program for arc/2 is simply a meta-interpreter (with the object-level program hard-wired in its clauses). In section 5.3, we encounter a similar meta-interpreter for full clausal logic.
In the foregoing sections, we studied two kinds of graphs: trees represented by Prolog terms, and graphs generated by predicate definitions. In both cases, the main inference step is to search for a path satisfying certain conditions. In this section, we study a type of structured knowledge called an inheritance hierarchy, which differs from the previous cases in that it requires a more elaborate kind of reasoning. Basically, this is because a node in such a hierarchy is a more complex entity with various kinds of properties. Lower nodes in the hierarchy inherit properties from ancestor nodes, unless they are assigned a property of their own. Thus, reasoning about inheritance hierarchies not only requires searching for a path, but also collecting properties found along a path.
Figure 4.3. An inheritance hierarchy of musical instruments. Nodes in the tree denote classes; at the bottom, instances for each class are listed. 
Fig. 4.3 displays an inheritance hierarchy of a variety of musical instruments. The topmost node represents the class of all instruments in the Universe of Discourse, which has three subclasses: wind instruments, string instruments, and percussion instruments. In turn, wind instruments are divided into woodwinds and brass instruments, and so on. At the bottom of the figure, instances are listed for each most specific subclass. Thus, guitar, lute and harp are instances of the class ‘plucked instruments’, and thus also of the classes ‘string instruments’ and ‘instruments’.
If we want to represent such hierarchies in Prolog, we have to choose a representation for instances and classes. By far the most natural choice is to represent an instance by a constant, and a class by a unary predicate. A class–superclass relation is then expressed by a clause, and an instance–class relation is expressed by a ground fact:
% Classes
instrument(X):-wind(X).
instrument(X):-string(X).
instrument(X):-percussion(X).
wind(X):-woodwind(X).
wind(X):-brass(X).
string(X):-plucked(X).
string(X):-bowed(X).
string(X):-keyboard(X).
percussion(X):-tuned(X).
percussion(X):-untuned(X).

% Instances
woodwind(recorder).      woodwind(flute).
woodwind(oboe).          woodwind(saxophone).
brass(trumpet).          brass(trombone).
brass(horn).
plucked(guitar).         plucked(lute).
plucked(harp).
bowed(violin).           bowed(cello).
keyboard(harpsichord).   keyboard(piano).
tuned(triangle).         tuned(kettledrum).
untuned(cymbal).         untuned(snaredrum).
With these clauses, it is possible to ask questions about instances and (super)classes. For example, we can find out what instruments there are by means of the query
?-instrument(X).
As was remarked above, nodes (and instances) in an inheritance hierarchy can be assigned properties, where a property is an attribute–value pair. For instance, the material an instrument is made of can be an attribute, with possible values ‘wood’ and ‘metal’. The statement ‘saxophones are made of metal’ is represented by the ground fact
material(saxophone,metal).
The statement ‘instances of the class of string instruments are made of wood’ is represented by the clause
material(X,wood):-string(X).
Since string(piano) is a logical consequence of the previous clauses expressing the hierarchy, we can now prove material(piano,wood). Thus, the chosen representation takes care of the inheritance of properties, as required.
In our musical Universe of Discourse, we consider three attributes: the function of an instrument (all instruments have a musical function), the material of an instrument (wood or metal), and the way the instrument produces sound, expressed by the attribute action:
function(X,musical):-instrument(X).

% Materials
material(flute,metal).
material(saxophone,metal).
material(X,wood):-woodwind(X).
material(X,metal):-brass(X).
material(X,wood):-string(X).
material(X,metal):-percussion(X).

% Actions
action(oboe,reed(double)).
action(saxophone,reed(single)).
action(harpsichord,plucked).
action(piano,hammered).
action(X,reed(lip)):-brass(X).
action(X,plucked):-plucked(X).
action(X,bowed):-bowed(X).
action(X,hammered):-percussion(X).
For instance, all brass instruments have lip-reeds, while some woodwinds have a double reed (oboes, for example) or a single reed (saxophones).
Note that there is a potential conflict in the above clauses: woodwinds are generally made of wood, but flutes and saxophones are made of metal. Thus, the query
?-material(flute,M)
has two answers:
M = metal;
M = wood
The order in which these answers are found is, of course, determined by the order of the clauses above. Since we put the ground facts listing properties of instances before the clauses listing properties assigned to classes (and the clauses pertaining to classes before those pertaining to superclasses), the answers are found by climbing the inheritance hierarchy from bottom to top, and the first property found is the desired one. It should be noted, however, that things are not always that simple. If more sophisticated inheritance strategies are needed, alternative representations, like the ones to be discussed later in this section, are to be preferred.
A typical thing one would like to know regarding an inheritance hierarchy is: what are the properties of a given instance? In principle, this requires a second-order query

?-Attr(Inst,Value)
which is not allowed in Prolog if Attr is not instantiated. We can get around this by maintaining a list of all attributes, and constructing the appropriate goal for each attribute by means of the predicate get_value/3:
properties(Inst,Props):-
    attributes(Attrs),
    properties(Attrs,Inst,Props).

properties([],Inst,[]).
properties([Attr|Attrs],Inst,[Attr=Value|Props]):-
    get_value(Attr,Inst,Value),!,   % only first answer
    properties(Attrs,Inst,Props).

attributes([function,material,action]).

get_value(A,B,C):-
    Goal =.. [A,B,C],
    call(Goal).
For instance, the query ?-properties(saxophone,P) yields the answer
P = [function=musical,material=metal,action=reed(single)]
Only the most specific property regarding material is found, because of the cut in the recursive clause of properties/3.
As indicated above, the representation of inheritance hierarchies by means of clauses only allows a relatively simple inheritance strategy. Moreover, since classes are represented by predicates, reasoning about classes becomes a second-order logical inference. For example, the question ‘what are the subclasses of the class of instruments?’ is not easily handled in the above representation. Both shortcomings can be alleviated if classes and attributes are represented by terms instead of predicates. In effect, this will result in a clearer separation of declarative knowledge describing the hierarchy, and procedural knowledge describing the inheritance strategy. This can be done in several ways; two possibilities are worked out below.
The first idea is to represent the tree in fig. 4.3 according to the first method in section 4.2, i.e. by a set of ground facts listing the arcs in the tree. Thus, nodes (classes) are represented by constants, and arcs (class–superclass relations) are represented by means of the predicate isa/2:
% Classes
isa(wind,instrument). isa(string,instrument).
isa(percussion,instrument). isa(woodwind,wind).
isa(brass,wind). isa(plucked,string).
isa(bowed,string). isa(keyboard,string).
isa(tuned,percussion). isa(untuned,percussion).
Instance–class vs. class–superclass
In this representation there appears to be no difference between instance–class relations and class–superclass relations. Indeed, we could have treated instances just as classes, and used the isa/2 predicate for both. However, this obscures the semantic difference between instances and classes, which can lead to problems. For example, instances of one class can be composed of instances of other classes (a bicycle is composed of two wheels and a frame), but this is not true for classes.
Instances are listed by means of the predicate inst/2:
% Instances
inst(recorder,woodwind). inst(flute,woodwind).
inst(oboe,woodwind). inst(saxophone,woodwind).
inst(trumpet,brass). inst(trombone,brass).
inst(horn,brass). inst(guitar,plucked).
inst(lute,plucked). inst(harp,plucked).
inst(violin,bowed). inst(cello,bowed).
inst(harpsichord,keyboard). inst(piano,keyboard).
inst(triangle,tuned). inst(kettledrum,tuned).
inst(cymbal,untuned). inst(snaredrum,untuned).
The difference between inheritance hierarchies and ordinary graphs lies in the additional meaning assigned to classes and instances by means of properties. Therefore, a graph extended with properties is commonly called a semantic network. Properties are represented by means of the predicate prop/3:
% Class properties
prop(instrument,function,musical).
prop(string,material,wood).
prop(percussion,material,metal).
prop(percussion,action,hammered).
prop(woodwind,material,wood).
prop(brass,material,metal).
prop(brass,action,reed(lip)).
prop(plucked,action,plucked).
prop(bowed,action,bowed).
% Instance properties
prop(flute,material,metal).
prop(oboe,action,reed(double)).
prop(saxophone,material,metal).
prop(saxophone,action,reed(single)).
prop(harpsichord,action,plucked).
prop(piano,action,hammered).
Since we will be using a more sophisticated inheritance strategy, the order of these facts is now immaterial.
The inheritance strategy is to collect the properties of instances before properties inherited from classes:
properties_sn(Inst,Props):-
    props(Inst,InstProps),             % properties of instance
    inst(Inst,Class),
    inherit_sn(Class,InstProps,Props). % inherit the rest

props(IC,Props):-
    findall(Attr=Value,prop(IC,Attr,Value),Props).
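For instance, props/2 collects exactly the properties stated for the saxophone instance itself, in the order of the prop/3 facts above:

```prolog
?-props(saxophone,Props).
  Props = [material=metal,action=reed(single)]
```

For an instance or class with no prop/3 facts of its own (the recorder, say), findall/3 simply yields the empty list, so props/2 never fails.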
In turn, inherited properties are collected from bottom to top in the hierarchy, so that specific properties are found before general properties:
inherit_sn(top,Props,Props).
inherit_sn(Class,SpecificProps,AllProps):-
    props(Class,GeneralProps),         % properties of this class
    override(SpecificProps,GeneralProps,Props),
    isa(Class,SuperClass),             % climb hierarchy
    inherit_sn(SuperClass,Props,AllProps). % inherit rest
top refers to the root of the universal inheritance hierarchy, which should be added as the root of any subhierarchy:
isa(instrument,top).
The predicate override/3 checks for every general property whether a more specific property has already been found. If so, we say that the specific property overrides the general property:
override(Props,[],Props).
override(Specific,[Attr=_Val|General],Props):-
    member(Attr=_V,Specific),          % overriding
    override(Specific,General,Props).
override(Specific,[Attr=Val|General],[Attr=Val|Props]):-
    not(member(Attr=_V,Specific)),     % no overriding
    override(Specific,General,Props).
Again, the query ?-properties_sn(saxophone,P) yields the answer
P = [function=musical,material=metal,action=reed(single)]
What we gained with this representation, however, is a declarative specification of the inheritance strategy, which is therefore also amenable to change. For instance, if the inheritance hierarchy is not a tree, a class could be a subclass of two or more other classes. In this case, different values for the same attribute could be inherited along different paths; this is called multiple inheritance. Such conflicts need to be resolved (or at least signalled) by the inheritance strategy.
Exercise 4.7. Implement a multiple inheritance strategy.
A slightly different but related representation is obtained if we group all information about one class or instance together in a so-called frame. A frame representation is obtained from the semantic network representation by adding a list of properties to each arc in the network. Below, class frames are defined by the predicate class/3, and instance frames are defined by the predicate instance/3:
% Classes
class(instrument,top,[]).
class(wind,instrument,[function=musical]).
class(string,instrument,[material=wood]).
class(percussion,instrument,[material=metal,action=hammered]).
class(woodwind,wind,[material=wood]).
class(brass,wind,[material=metal,action=reed(lip)]).
class(plucked,string,[action=plucked]).
class(bowed,string,[action=bowed]).
class(keyboard,string,[]).
class(tuned,percussion,[]).
class(untuned,percussion,[]).
% Instances
instance(recorder,woodwind,[]).
instance(flute,woodwind,[material=metal]).
instance(oboe,woodwind,[action=reed(double)]).
instance(saxophone,woodwind,[material=metal,action=reed(single)]).
/* etcetera... */
instance(cymbal,untuned,[]).
instance(snaredrum,untuned,[]).
Inheritance is as easily implemented as in the semantic network representation:
properties_fr(Inst,Props):-
    instance(Inst,Class,InstProps),    % instance properties
    inherit_fr(Class,InstProps,Props). % inherit the rest

inherit_fr(top,Props,Props).
inherit_fr(Class,SpecificProps,AllProps):-
    class(Class,SuperClass,GeneralProps), % this class
    override(SpecificProps,GeneralProps,Props),
    inherit_fr(SuperClass,Props,AllProps). % inherit rest
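Working through the frames for the flute, for example, the instance property material=metal overrides the class property material=wood of woodwind, while function=musical is inherited from the wind class:

```prolog
?-properties_fr(flute,Props).
  Props = [function=musical,material=metal]
```

Note that no action property is found, since neither the flute instance nor any class on its path to top carries one in the frames listed above.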
Historically, semantic network and frame-based representations were proposed in quite different contexts. We see that their representation in Prolog is very similar.
An introduction to Knowledge Representation can be found in (Ringland & Duce, 1989). (Brachman & Levesque, 1985) is a collection of papers discussing various aspects of Knowledge Representation, such as the difference between isa-links and instance-of-links in semantic networks. Papers about inheritance hierarchies can be found in (Lenzerini et al., 1991). LOGIN is an extension of Prolog in which inheritance is represented by terms rather than clauses (Aït-Kaci & Nasr, 1986).
H. Aït-Kaci & R. Nasr (1986), ‘LOGIN: a logic programming language with built-in inheritance’, Journal of Logic Programming 1986(3): 185–215.
R.J. Brachman & H.J. Levesque (eds) (1985), Readings in Knowledge Representation, Morgan Kaufmann.
M. Lenzerini, D. Nardi & M. Simi (eds) (1991), Inheritance Hierarchies in Knowledge Representation and Programming Languages, John Wiley.
G.A. Ringland & D.A. Duce (eds) (1989), Approaches to Knowledge Representation: an Introduction, Research Studies Press.
5
As explained earlier, a search problem is defined by a search space, which is a graph with one or more starting nodes and one or more goal nodes. Given a search space, a solution is a path from a starting node to a goal node. A cost function c assigns a number to each arc from n1 to n2, specifying the cost of moving from n1 to n2. The cost of a path is the sum of the costs of the arcs in the path. Given a search space and a cost function, an optimal solution is a solution with minimal cost. A trivial example of a cost function is c(a) = 1 for each arc a, in which case the cost of a path equals the length of the path, and an optimal solution is a shortest path. For SLD proofs, such a cost function would measure the depth of the proof tree.
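These definitions can be sketched directly in Prolog (the predicate names cost/2 and path_cost/2 are ours, not from the programs in this book). With the trivial cost function, every arc costs 1, so the cost of a path is its number of arcs:

```prolog
% cost(A,C) <- C is the cost of arc A;
%              here the trivial cost function
cost([_Node1,_Node2],1).

% path_cost(P,C) <- C is the sum of the costs of the arcs
%                   in path P (a non-empty list of nodes)
path_cost([_Node],0).
path_cost([Node1,Node2|Nodes],Cost):-
    cost([Node1,Node2],ArcCost),
    path_cost([Node2|Nodes],Cost1),
    Cost is ArcCost+Cost1.
```

For instance, ?-path_cost([1,2,5,7],C) yields C = 3, the length of the path; a non-trivial cost function would simply replace the single cost/2 fact.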
In this chapter, we will discuss and implement some basic techniques for finding solutions in search spaces. Their common denominator is that they are exhaustive: that is, in the worst case they will eventually visit every node in the search space along every possible path, before finding a solution. On the other hand, they differ with regard to:
• completeness — will a solution always be found?
• optimality — will shorter paths be found before longer ones?
• efficiency — what are the runtime and memory requirements?
We start with a general discussion of the problem of search. Then, we will discuss the basic exhaustive search strategies: depth-first search, breadth-first search, and forward chaining.
Imagine a visit with a friend to the Staatsgalerie in Stuttgart. It is very crowded in this beautiful art museum, and while admiring the Mondriaan works you lose sight of each other. Having been through situations like this before, you had made the agreement that she would stay where she was, while you would go looking for her. What strategy would you employ?
First of all, to make sure that you don’t miss any room, you have to visit them in some systematic way. You don’t have a global map of the building, so you decide to never leave a room through the door through which you entered. Thinking about it, you recognise that this procedure won’t fully work, because a room might have just one door: the one through which you entered. Assuming that there are still rooms not yet visited, you have to leave such a room through the same door through which you entered, and find a room you’ve visited before, with a door not yet taken. Such a procedure, however, requires that, for each room you visit, you remember the door through which you entered the room (in order to go back to a room you’ve been in before), and the doors you tried already (in order to try a remaining door).
Figure 5.1. Searching for a friend. 
Luckily enough, you carry a piece of paper and a pencil with you, so you can stick little papers saying ‘entrance’ or ‘exit’ on the appropriate doors. However, the amount of paper you have is limited, so a better idea is to mark the doors not yet tried, and to remove the paper when you try a door, so that you can use the paper again. By reusing those pieces of paper that become obsolete, you minimise the amount of paper needed. Similarly, if you return to a room in which there are no remaining doors, you will never return to that room, so you might want to remove the paper saying ‘entrance’ as well. On the other hand, leaving one paper might be a good idea, just in case you return to the room later via a ‘circular’ route; you are then able to see that you already tried all the doors in that room.
So you decide to employ the following procedure:
1. mark every door in the starting room as ‘exit’;
2. examine the current room;
3. if you find your friend, stop;
4. otherwise, if there are any doors marked ‘exit’ in the room,
4a. choose one of them;
4b. remove the mark ‘exit’;
4c. go through it;
4d. if one of the doors in this room is already marked ‘entrance’, go back to the previous room, and go to step 4;
4e. otherwise, mark the door you just came through as ‘entrance’;
4f. mark all other doors as ‘exit’;
4g. go to step 2;
5. otherwise, take the door marked ‘entrance’, and go to step 4.
Figure 5.2. You find her by systematically searching the rooms, backtracking when all the rooms reachable from the room you’re in have already been visited. 
Steps 1-3 are obvious enough. In step 4, you check whether there are any untried doors left; if not, you have to go back to a previously visited room, and do the same there (step 5). This process of reconsidering previous decisions is called backtracking. It is an essential step in any exhaustive search procedure. If there are any alternatives left, you have to check whether you have been there already via some other route (step 4d). This step is called loop detection, and is only needed for cyclic search spaces. If you omit this step in such cases, you risk walking in circles forever. If you are in a yet unvisited room, you do some bookkeeping and proceed in the same way.
How does this search procedure work in practice? Suppose you are in the Miró room (fig. 5.1). You decide to try the doors in that room in a clockwise order. You first check the Léger room, then the Kupka room, and finally the Kandinsky room. When you enter the Léger room again from Kandinsky, you realise that you’ve been there before, because there’s a door marked ‘entrance’. So you backtrack to Léger (because there are no alternatives left in Kandinsky and Kupka), and try the next door. This one leads you straight to Kandinsky again, and your little papers remind you that you have been there already. You backtrack again to Léger, and try the Matisse room. From there, Klee is a dead end, so you backtrack and finally find your friend still admiring the Mondriaan paintings! The route you walked is shown in fig. 5.2 (thin lines denote backtracking).
In a computer implementation of such a search procedure, you don’t walk from room to room. Instead of marking nodes and returning to them later, the search program stores a description of those nodes in memory. In the above example, the number of marks needed corresponds to the amount of memory required during search, and just as marks can be used several times, memory space can be reclaimed once all the children of a node have been put on the list. This list of nodes to be tried next is called the agenda; this is an important concept, which can be used to describe any backtracking search procedure. Such a general-purpose agenda-based search algorithm operates as follows (for simplicity, we have omitted loop detection):
1. take the next node from the agenda;
2. if it is a goal node, stop;
3. otherwise,
3a. generate its children;
3b. put them on the agenda;
3c. go to step 1.
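The agenda mechanism itself is language-neutral. As an illustration, here is a minimal Python sketch of the steps above (the function names and the toy graph in the usage note are my own, not part of the text); whether the agenda behaves as a stack or as a queue is left as a parameter.

```python
from collections import deque

def search(start_nodes, children, is_goal, depth_first=True):
    # the agenda holds the nodes still to be tried
    agenda = deque(start_nodes)
    while agenda:
        node = agenda.popleft()                # 1. take the next node
        if is_goal(node):                      # 2. goal node: stop
            return node
        kids = children(node)                  # 3a. generate its children
        if depth_first:
            agenda.extendleft(reversed(kids))  # put them in front (stack)
        else:
            agenda.extend(kids)                # put them at the end (queue)
    return None                                # agenda exhausted: no goal found
```

For instance, with arcs given by a dictionary such as arcs = {'a': ['b','c'], 'b': ['d'], 'c': [], 'd': []}, the call search(['a'], lambda n: arcs[n], lambda n: n == 'd') finds the goal node 'd'.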
This procedure can be almost directly translated into a Prolog program:
% search(Agenda,Goal) <- Goal is a goal node, and a
%                        descendant of one of the nodes
%                        on the Agenda
search(Agenda,Goal):-
    next(Agenda,Goal,Rest),
    goal(Goal).
search(Agenda,Goal):-
    next(Agenda,Current,Rest),
    children(Current,Children),
    add(Children,Rest,NewAgenda),
    search(NewAgenda,Goal).
In this program, we have abstracted from the way the agenda is represented. Furthermore, as remarked above, by specifying the order in which nodes are added to and removed from the agenda, we obtain specific search strategies. In the Staatsgalerie example, doors marked most recently are tried first. In other words, the agenda is a last in–first out data structure, or a stack. In this example, it seems the most reasonable approach, because it minimises the amount of walking needed to backtrack to another room. The result is a depth-first search procedure, moving away as quickly as possible from the initial room, only coming closer again when backtracking.
On the other hand, the shortest path between your initial position and your friend is Miró – Mondriaan, while you finally reach your friend along the path Miró – Léger – Matisse – Mondriaan [15]. You would have found your friend sooner if you had examined all rooms next to Miró first. But suppose your friend was two rooms away, e.g. in the Matisse room? Well, in that case you would have gone to the rooms next to Miró (Léger and Mondriaan), and then to all rooms next to those (Kupka, Kandinsky and Matisse). That is, doors marked most recently are tried last: a first in–first out strategy, implemented by a data structure called a queue. Thus you would have found your friend along one of the two shortest paths (Miró – Léger – Matisse). This second method is an example of breadth-first search.
Finally, a third approach called best-first search orders the doors to be tried next according to some criterion called a heuristic. For instance, suppose you saw your friend last in the Mondriaan room. In this case it would be wise to overrule the default clockwise ordering, and to try Mondriaan before Léger. Consequently, you would have found your friend along the path Miró – Mondriaan – Matisse. In the following sections, we will take a closer look at depth-first and breadth-first search. The use of heuristics will be studied in Chapter 6.
We obtain a depth-first search strategy if the agenda is implemented as a last in–first out data structure. The obvious way to do this in Prolog is to represent the agenda by a list of nodes, and to add and remove nodes from the front of the list:
% depth-first search
search_df([Goal|Rest],Goal):-
    goal(Goal).
search_df([Current|Rest],Goal):-
    children(Current,Children),
    append(Children,Rest,NewAgenda),
    search_df(NewAgenda,Goal).
The children/2 predicate finds all children of a given node. If arcs in the search space are defined as before by the arc/2 predicate, we could define children/2 as
children(Node,Children):-
    findall(C,arc(Node,C),Children).
In this way, all children of the current node are generated and stored on the agenda before examining the next node.
% an example of depth-first search
search_df([Goal|Rest],Goal):-
    goal(Goal).
search_df([Current|Rest],Goal):-
    children(Current,Children),
    append(Children,Rest,NewAgenda),
    search_df(NewAgenda,Goal).

children(Node,Children):-
    findall(C,arc(Node,C),Children).

% nodes are lists of letters
arc(T,[H|T]):-
    length(T,N),N<11,
    member(H,[a,d,i,m]).

% find palindromes
goal(L):-
    reverse(L,L).
This depth-first search program can be refined in several ways, of which we will consider two: returning a path to the goal, and loop detection. In the above implementation, it is impossible to return a path if we discover a goal node on the agenda, because we do not know how that goal node was reached. Instead of putting a single node on the agenda, we will store a complete path to that node. This is simply accomplished by changing the children/2 predicate as follows:
children([Node|Path],Children):-
    findall([C,Node|Path],arc(Node,C),Children).
Of course, the goal/1 predicate must be changed accordingly, because its argument is now a path instead of a single node. A query now takes the form
?-search_df([[InitialNode]],PathToGoal).
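The same refinement can be sketched outside Prolog as well: the agenda holds complete paths, most recent node first, and the path is reversed on success. The following Python version is an illustrative sketch, not from the text; the helper names are my own.

```python
def search_df_path(start, children, is_goal):
    # each agenda item is a path, most recent node first
    agenda = [[start]]
    while agenda:
        path = agenda.pop(0)
        node = path[0]
        if is_goal(node):
            return list(reversed(path))    # root-to-goal order
        # extend the path to each child, and put the new paths in front
        agenda = [[c] + path for c in children(node)] + agenda
    return None
```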
The second refinement concerns loop detection. In order to check whether a node has been investigated before, we must maintain a list of visited nodes. We only add nodes to the agenda which do not already occur on this list (or on the agenda):
% depth-first search with loop detection
search_df_loop([Goal|Rest],Visited,Goal):-
    goal(Goal).
search_df_loop([Current|Rest],Visited,Goal):-
    children(Current,Children),
    add_df(Children,Rest,Visited,NewAgenda),
    search_df_loop(NewAgenda,[Current|Visited],Goal).

add_df([],Agenda,Visited,Agenda).
add_df([Child|Rest],OldAgenda,Visited,[Child|NewAgenda]):-
    not element(Child,OldAgenda),
    not element(Child,Visited),
    add_df(Rest,OldAgenda,Visited,NewAgenda).
add_df([Child|Rest],OldAgenda,Visited,NewAgenda):-
    element(Child,OldAgenda),
    add_df(Rest,OldAgenda,Visited,NewAgenda).
add_df([Child|Rest],OldAgenda,Visited,NewAgenda):-
    element(Child,Visited),
    add_df(Rest,OldAgenda,Visited,NewAgenda).
Note that the combination of loop detection and path construction allows the following optimisation: instead of maintaining complete paths to a node on the agenda and the list of visited nodes, we only store a node together with its parent. Once we encounter a goal, all its parents are on the list of visited nodes, which allows us to reconstruct the path.
Exercise 5.1. Modify the predicate search_df_loop/3 such that it reconstructs the path to a goal in this way.
We now analyse depth-first search with respect to completeness, optimality and efficiency. A search strategy is complete if it is guaranteed to find every goal. Obviously, any exhaustive strategy is complete for finite search spaces. However, in an infinite search space depth-first search might get trapped in an infinite branch before having found all the solutions. For instance, reconsider the infinite SLD-tree in fig. 3.2. A left-to-right depth-first search strategy would dive deeper and deeper into the tree, taking the left branch at every node, and never find the goals in the branches to the right. So, depth-first search is, in general, incomplete. Since Prolog itself employs depth-first search, Prolog is also incomplete. Often, however, the incompleteness of Prolog can be avoided by reordering the clauses such that goals are found before infinite branches (for instance, by putting the recursive clause last), and by cutting away the infinite parts of the search space.
If there is no cost function, a search strategy is optimal if it is guaranteed to reach any goal along the shortest path possible. The Staatsgalerie example already showed that this is not true for depth-first search: you found your friend but, while she was in a room next to your initial position, you finally reached that room through two other rooms. Thus, depth-first search does not always find a shortest solution path. Finally, we can estimate the memory requirements for depth-first search as follows. Suppose we are searching a tree in which each node has, on average, B children. The number B is known as the branching factor. Generating the children of a node adds B nodes to the agenda. We are interested in the following question: if a goal is found at depth n (i.e. the path from the root to the goal has length n), how many nodes are there on the agenda? Since at each level only the children of a single node are generated, the size of the agenda is of the order B×n, that is, a linear function of the depth of the tree. The time complexity of depth-first search is of the order B^n, since the runtime is proportional to the number of nodes searched, and in the worst case the goal is found in the last branch, after searching B^n nodes. Of course, we cannot hope to achieve any better for blind exhaustive search!
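The B×n versus B^n contrast can be checked empirically. The following Python experiment (illustrative, not from the text) searches a complete tree of branching factor b down to depth n and records the largest agenda it ever holds, once with the agenda as a stack and once as a queue:

```python
def max_agenda_size(b, n, depth_first):
    # nodes are tuples of child indices; () is the root
    agenda = [()]
    biggest = 1
    while agenda:
        node = agenda.pop(0)
        if len(node) == n:
            continue                         # leaf: nothing to expand
        kids = [node + (i,) for i in range(b)]
        if depth_first:
            agenda = kids + agenda           # stack: children in front
        else:
            agenda = agenda + kids           # queue: children at the end
        biggest = max(biggest, len(agenda))
    return biggest
```

For b=2 and n=10, the depth-first agenda never exceeds (b-1)×n+1 = 11 nodes, while the queue-based agenda grows to b^n = 1024 nodes.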
In practice, depth-first search is only implemented as above if loop detection is an absolute must. Otherwise, the agenda is represented implicitly by means of Prolog’s internal goal stack. Children of a given node are generated one at a time, by means of Prolog’s backtracking mechanism, and examined immediately upon generation:
% depth-first search by means of backtracking
search_bt(Goal,Goal):-
    goal(Goal).
search_bt(Current,Goal):-
    arc(Current,Child),
    search_bt(Child,Goal).
If there is a chance that the search program gets trapped in an infinite loop, it might be a good idea to employ a predefined depth bound:
% backtracking depth-first search with depth bound
search_d(D,Goal,Goal):-
    goal(Goal).
search_d(D,Current,Goal):-
    D>0, D1 is D-1,
    arc(Current,Child),
    search_d(D1,Child,Goal).
In this way the search process is guaranteed to halt, but solutions which appear beyond the depth bound are missed.
Iterative deepening is a form of depth-first search which employs a depth bound that is increased on each iteration. That is, after performing a depth-first search with depth bound d, search starts all over again from the starting nodes with an increased depth bound d+n. The predicate search_id/2 implements iterative deepening for n=1.
% iterative deepening
search_id(First,Goal):-
    search_id(1,First,Goal).    % start with depth 1
search_id(D,Current,Goal):-
    search_d(D,Current,Goal).
search_id(D,Current,Goal):-
    D1 is D+1,                  % increase depth
    search_id(D1,Current,Goal).
% an example of iterative deepening
search_id(First,Goal):-
    search_id(1,First,Goal).    % start with depth 1
search_id(D,Current,Goal):-
    search_d(D,Current,Goal).
search_id(D,Current,Goal):-
    D1 is D+1,                  % increase depth
    search_id(D1,Current,Goal).

% backtracking depth-first search with depth bound
search_d(D,Goal,Goal):-
    goal(Goal).
search_d(D,Current,Goal):-
    D>0, D1 is D-1,
    arc(Current,Child),
    search_d(D1,Child,Goal).

children(Node,Children):-
    findall(C,arc(Node,C),Children).

% nodes are lists of letters
arc(T,[H|T]):-
    % length(T,N),N<11,    % depth bound no longer needed
    member(H,[a,d,i,m]).

% find palindromes
goal(L):-
    reverse(L,L).
A big advantage of iterative deepening over simple depthfirst search is that iterative deepening is complete: it will find all the goals at depth d and less before proceeding to depth d + n. Moreover, if we set the depth increment n to 1, iterative deepening is also optimal: it will find shorter paths first. A disadvantage of iterative deepening is that upper parts of the search space are searched more than once (and goals in those upper parts are found more than once as well).
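The control structure of iterative deepening is easy to mimic in other languages. Here is a Python sketch (illustrative, not from the text) that returns the path to the first goal found, with the depth increment fixed at n=1 as in search_id/2:

```python
def search_id(start, children, is_goal, max_depth=50):
    def search_d(node, depth):
        # depth-bounded depth-first search, returning a path or None
        if is_goal(node):
            return [node]
        if depth == 0:
            return None                    # depth bound reached
        for child in children(node):
            path = search_d(child, depth - 1)
            if path is not None:
                return [node] + path
        return None

    # repeat with depth bounds 1, 2, 3, ...
    for bound in range(1, max_depth + 1):
        path = search_d(start, bound)
        if path is not None:
            return path                    # first found at the smallest bound
    return None
```

Note that a cyclic arc such as 'b' back to 'a' does not cause looping here, because the depth bound cuts every branch off eventually.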
Breadth-first search is realised by implementing the agenda as a first in–first out data structure. That is, nodes are removed from the front of the list, but added at the end:
% breadth-first search
search_bf([Goal|Rest],Goal):-
    goal(Goal).
search_bf([Current|Rest],Goal):-
    children(Current,Children),
    append(Rest,Children,NewAgenda),
    search_bf(NewAgenda,Goal).
% an example of breadth-first search
search_bf([Goal|Rest],Goal):-
    goal(Goal).
search_bf([Current|Rest],Goal):-
    children(Current,Children),
    append(Rest,Children,NewAgenda),
    search_bf(NewAgenda,Goal).

children(Node,Children):-
    findall(C,arc(Node,C),Children).

% nodes are lists of letters
arc(T,[H|T]):-
    length(T,N),N<11,    % try removing the depth bound
    member(H,[a,d,i,m]).

% find palindromes
goal(L):-
    reverse(L,L).
Exercise 5.2. Implement the predicate term_write_bf/1, which writes the tree represented by a term from the root downward (as opposed to the predicate term_write/1 of section 4.1, which writes from left to right). Employ breadth-first search with two agendas, one for nodes at depth n and the other for nodes at depth n+1.
In breadth-first search, the agenda is implemented as a queue. This means that the nodes on the agenda are ordered according to increasing depth: all the nodes at depth n occur before the nodes at depth n+1. This has profound consequences with regard to the properties of breadth-first search. First of all, breadth-first search is complete, even for infinite search spaces. This is so because every goal at depth n will be found before descending to depth n+1. Secondly, breadth-first search always finds a shortest solution path. It may seem that breadth-first search is much better than depth-first search. However, like every coin this one has a reverse side as well: the number of nodes at depth n is B^n, so that breadth-first search requires much more memory than depth-first search.
We will now show how to change Prolog into a complete SLD prover, by employing breadth-first search. We start from the meta-interpreter prove_r/1 given in section 3.8:
prove_r(true):-!.
prove_r((A,B)):-!,
    clause(A,C),
    conj_append(C,B,D),
    prove_r(D).
prove_r(A):-
    clause(A,B),
    prove_r(B).
As explained in that section, this meta-interpreter operates on the complete resolvent, which is exactly what we need. This predicate is turned into an agenda-based depth-first search procedure as follows:
% agenda-based version of prove_r/1
prove_df(Goal):-
    prove_df_a([Goal]).

prove_df_a([true|Agenda]).
prove_df_a([(A,B)|Agenda]):-!,
    findall(D,(clause(A,C),conj_append(C,B,D)),Children),
    append(Children,Agenda,NewAgenda),
    prove_df_a(NewAgenda).
prove_df_a([A|Agenda]):-
    findall(B,clause(A,B),Children),
    append(Children,Agenda,NewAgenda),
    prove_df_a(NewAgenda).
The changes are relatively straightforward: all solutions to the calls in the bodies of the second and third prove_r clauses are collected by means of the predicate findall/3, and added to the front of the agenda.
In order to search in a breadth-first fashion, we swap the first two arguments of the append/3 literals. One additional improvement is required, since prove_df/1 succeeds for every proof that can be found, but it does not return an answer substitution for the variables in the query. This is because the call findall(X,G,L) creates new variables for the unbound variables in the instantiation of X before putting it in the list L. In order to obtain an answer substitution, we should maintain the agenda as a list of pairs
a(Literals,OrigGoal)
where OrigGoal is a copy of the original goal. To illustrate this, suppose the following clauses are given:
likes(peter,Y):-student(Y),friendly(Y).
likes(X,Y):-friend(Y,X).
student(maria).
student(paul).
friendly(maria).
friend(paul,peter).
Below, the agenda obtained after each breadth-first search iteration is given for the query ?-likes(X,Y):
[ a((student(Y1),friendly(Y1)), likes(peter,Y1)),
a(friend(Y2,X2), likes(X2,Y2)) ]
[ a(friend(Y2,X2), likes(X2,Y2)),
a(friendly(maria), likes(peter,maria)),
a(friendly(paul), likes(peter,paul)) ]
[ a(friendly(maria), likes(peter,maria)),
a(friendly(paul), likes(peter,paul)),
a(true, likes(peter,paul)) ]
[ a(friendly(paul), likes(peter,paul)),
a(true, likes(peter,paul)),
a(true, likes(peter,maria)) ]
[ a(true, likes(peter,paul)),
a(true, likes(peter,maria)) ]
Here, Y1, X2 and Y2 denote new variables introduced by findall/3. It can be clearly seen that for each item a(R,G) on the agenda, R and G share the right variables — thus, whenever the resolvent gets more instantiated during the proof, the corresponding copy of the goal is instantiated correspondingly. In particular, if the empty clause is found on the agenda in the form of a term a(true,Goal), then Goal will contain the correct answer substitutions.
The final, complete SLD prover looks as follows:
% breadth-first version of prove_r/1 + answer substitution
prove_bf(Goal):-
    prove_bf_a([a(Goal,Goal)],Goal).

prove_bf_a([a(true,Goal)|Agenda],Goal).
prove_bf_a([a((A,B),G)|Agenda],Goal):-!,
    findall(a(D,G),
            (clause(A,C),conj_append(C,B,D)),
            Children),
    append(Agenda,Children,NewAgenda),    % breadth-first
    prove_bf_a(NewAgenda,Goal).
prove_bf_a([a(A,G)|Agenda],Goal):-
    findall(a(B,G),clause(A,B),Children),
    append(Agenda,Children,NewAgenda),    % breadth-first
    prove_bf_a(NewAgenda,Goal).
Notice that this program is able to find alternative solutions, since it will backtrack from the first clause into the third and, being unable to find a clause for the predicate true/0, findall/3 will generate an empty list of children and search will proceed with the rest of the agenda.
Exercise 5.3. Consider the following program:
brother(peter,paul).
brother(adrian,paul).
brother(X,Y):-brother(Y,X).
brother(X,Y):-brother(X,Z),brother(Z,Y).
Compare and explain the behaviour of prove_bf/1 and Prolog on the query ?-brother(peter,adrian). Can you reorder the clauses, such that Prolog succeeds?
As a second, related example of a breadth-first search program, we give a program for finding refutation proofs in full clausal logic. Object-level clauses are given by the predicate cl/1. Note that true denotes the empty body, while false denotes the empty head; thus, false:-true denotes the empty clause.
% refute_bf(Clause) <- Clause is refuted by clauses
%                      defined by cl/1
%                      (breadth-first search strategy)
refute_bf(Clause):-
    refute_bf_a([a(Clause,Clause)],Clause).

refute_bf_a([a((false:-true),Clause)|Rest],Clause).
refute_bf_a([a(A,C)|Rest],Clause):-
    findall(a(R,C),(cl(Cl),resolve(A,Cl,R)),Children),
    append(Rest,Children,NewAgenda),    % breadth-first
    refute_bf_a(NewAgenda,Clause).

% resolve(C1,C2,R) <- R is the resolvent of C1 and C2.
resolve((H1:-B1),(H2:-B2),(ResHead:-ResBody)):-
    resolve(H1,B2,R1,R2),
    disj_append(R1,H2,ResHead),
    conj_append(B1,R2,ResBody).
resolve((H1:-B1),(H2:-B2),(ResHead:-ResBody)):-
    resolve(H2,B1,R2,R1),
    disj_append(H1,R2,ResHead),
    conj_append(R1,B2,ResBody).

resolve((A;B),C,B,E):-
    conj_remove_one(A,C,E).
resolve((A;B),C,(A;D),E):-
    resolve(B,C,D,E).
resolve(A,C,false,E):-
    conj_remove_one(A,C,E).

%%% disj_append/3, conj_remove_one/3: see Appendix A.2
For instance, given the following clauses:
cl((bachelor(X);married(X):-man(X),adult(X))).
cl((has_wife(X):-man(X),married(X))).
cl((false:-has_wife(paul))).
cl((man(paul):-true)).
cl((adult(paul):-true)).
and the query ?-refute_bf((false:-bachelor(X))) (refute that no one is a bachelor), the program answers X=paul. Note that there are many proofs for this answer!
Exercise 5.4. Extend the meta-interpreter, such that it returns a proof tree (see section 3.8). In order to ensure correct variable substitutions, each item on the agenda must be extended with a partial proof tree.
As a search program, the above program is complete. As a theorem prover, however, the program is incomplete. This is due to the resolution strategy used, in which every resolvent has at least one given clause as its parent. This strategy is called input resolution; it is refutation complete for definite clauses, but not for indefinite clauses.
Search programs involving if-then rules, such as meta-interpreters and theorem provers, can use these rules in either of two directions: from body to head or forward, and from head to body or backward. The meta-interpreters we encountered up till now apply clauses backward, just like Prolog; they are said to perform backward chaining. For checking if a given formula follows logically from a given theory, this is usually the best strategy.
However, in some cases we must rather perform forward chaining, because we do not have a goal to start from. For instance, consider the problem of constructing a model of a given theory. It would not be feasible to generate all the ground atoms in the Herbrand base and follow the chains back to the theory. Rather, we would generate the model incrementally by forward chaining. The procedure is as follows:
(i) search for a violated clause of which the body is true in the current model, but the head is not (such a clause is said to fire);
(ii) add a literal from the head to the model [16] .
By step (ii), the head (a disjunction) is made true in the model, so that this clause is no longer violated. The procedure iterates back to step (i); if no violated clauses remain, the model is complete.
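For ground clauses the two steps reduce to a simple loop. The Python sketch below is an illustrative approximation, not from the text: it represents each clause as a pair (body, head) of atom lists, reads the head disjunctively, and always picks the first head literal, without backtracking over that choice.

```python
def generate_model(clauses):
    # clauses: list of (body, head) pairs; body a conjunction, head a disjunction
    model = set()
    changed = True
    while changed:
        changed = False
        for body, head in clauses:
            fires = all(b in model for b in body)            # body true in model?
            if fires and not any(h in model for h in head):  # head not yet true?
                if not head:
                    return None    # empty head: clause cannot be satisfied
                model.add(head[0]) # step (ii): add a head literal to the model
                changed = True     # step (i): look for violated clauses again
    return model
```

On a ground, propositional rendering of the married/bachelor clauses used later in this section, this constructs the model in which paul is married and has a wife.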
The program for model generation by forward chaining is given below. It is a fairly simple forward chainer, in the sense that it simply chooses the first clause which fires. More sophisticated forward chainers use conflict resolution strategies in order to choose among the rules which fire at a certain stage.
% model(M) <- M is a model of the clauses defined by cl/1
model(M):-
    model([],M).

model(M0,M):-
    is_violated(Head,M0),!,    % instance of violated clause
    disj_element(L,Head),      % L: ground literal from head
    model([L|M0],M).           % add L to the model
model(M,M).                    % no more violated clauses

is_violated(H,M):-
    cl((H:-B)),
    satisfied_body(B,M),       % grounds the variables
    \+ satisfied_head(H,M).

satisfied_body(true,_M).       % body is a conjunction
satisfied_body(A,M):-
    member(A,M).
satisfied_body((A,B),M):-
    member(A,M),
    satisfied_body(B,M).

satisfied_head(A,M):-          % head is a disjunction
    member(A,M).
satisfied_head((A;_B),M):-
    member(A,M).
satisfied_head((_A;B),M):-
    satisfied_head(B,M).

%%% from Appendix A.2
disj_element(X,X):-            % single-element disjunction
    X\=false, X\=(_;_).
disj_element(X,(X;_Ys)).
disj_element(X,(_Y;Ys)):-
    disj_element(X,Ys).

%%% Example disjunctive clauses to compute model for
cl((married(X);bachelor(X):-man(X),adult(X))).
cl((has_wife(X):-married(X),man(X))).
cl((man(paul):-true)).
cl((adult(paul):-true)).
Given the following clauses:
cl((married(X);bachelor(X):-man(X),adult(X))).
cl((has_wife(X):-married(X),man(X))).
cl((man(paul):-true)).
cl((adult(paul):-true)).
and the query ?-model(M), the program constructs the following models (on backtracking):
M = [has_wife(paul),married(paul),adult(paul),man(paul)];
M = [bachelor(paul),adult(paul),man(paul)]
Notice that these are the two minimal models of the program.
Exercise 5.5. Give the remaining models of the program.
Not every model generated by model/1 is minimal. Consider the following set of clauses:
cl((likes(peter,maria):-true)).
cl((student(maria):-true)).
cl((teacher(X);friendly(Y):-likes(X,Y),student(Y))).
cl((friendly(Y):-teacher(X),likes(X,Y))).
is_violated/2 will first succeed for the third clause, returning the instantiated head teacher(peter);friendly(maria). The first literal in this head will be added to the model. Next, the fourth clause is violated, and friendly(maria) is added to the model. This results in the following model:
[friendly(maria),teacher(peter),
student(maria),likes(peter,maria)]
However, this is not a minimal model since teacher(peter) can be removed from it, yielding the model
[friendly(maria),student(maria),likes(peter,maria)]
which will be returned as the second answer.
Exercise 5.6. Are all minimal models always constructed by model/1?
It should be noted that the program only works properly for a restricted class of clauses, namely those clauses for which grounding the body also grounds the head. Otherwise, a head literal from a violated clause might still contain variables. Adding a nonground literal to the model could result in incorrect behaviour. Consider the following set of clauses:
cl((man(X);woman(X):-true)).
cl((false:-man(maria))).
cl((false:-woman(peter))).
Since the first clause is violated by the empty model, the program will attempt to add man(X) to the model. This leads to the second clause being violated, and since this clause has an empty head, it cannot be satisfied by adding a literal to the model. Upon backtracking woman(X) is tried instead, but this leads to a similar problem with the third clause. Consequently, model/1 will fail to construct a model, although there exists one, namely { man(peter), woman(maria) }.
The solution is to add a literal to the body of the first clause, which serves to enumerate the possible values for X:
cl((man(X);woman(X):-person(X))).
cl((person(maria):-true)).
cl((person(peter):-true)).
cl((false:-man(maria))).
cl((false:-woman(peter))).
In this way, the first clause is violated only under the substitutions { X → peter } and { X → maria }. Thus, all literals which are added to the model are ground, and the program constructs the correct model
[man(peter),person(peter),woman(maria),person(maria)]
Clauses in which all variables in the head also occur in the body are called range-restricted. Every set of clauses can be transformed into a set of range-restricted clauses by adding domain predicates enumerating the domains of variables, as above. The two sets of clauses are equivalent in the sense that there exists a one-to-one correspondence between their models:
• any model of the original clauses provides an enumeration of all the domains;
• any model of the rangerestricted clauses can be transformed to a model of the original clauses by dropping the domain literals.
Obviously, model/1 loops if the model being constructed is infinite. This will happen, for instance, with the following set of clauses, representing a range-restricted version of the append predicate:
cl((append([],Y,Y):-list(Y))).
cl((append([X|Xs],Ys,[X|Zs]):-thing(X),append(Xs,Ys,Zs))).
cl((list([]):-true)).
cl((list([X|Y]):-thing(X),list(Y))).
cl((thing(a):-true)).
cl((thing(b):-true)).
cl((thing(c):-true)).
Instead of the complete, infinite model, we might be interested in a subset over a universe of lists up to a given length. Such a ‘submodel’ can be computed by a forward chaining procedure which stops after a pre-specified number of steps. In this way, the procedure gets more of a ‘breadth-first’ flavour. The program is given below:
% model_d(D,M) <- M is a submodel of the clauses
%                 defined by cl/1
model_d(D,M):-
    model_d(D,[],M).

model_d(0,M,M).
model_d(D,M0,M):-
    D>0, D1 is D-1,
    findall(H,is_violated(H,M0),Heads),
    satisfy_clauses(Heads,M0,M1),
    model_d(D1,M1,M).

satisfy_clauses([],M,M).
satisfy_clauses([H|Hs],M0,M):-
    disj_element(D,H),
    satisfy_clauses(Hs,[D|M0],M).
model/1 is replaced by model_d/2, which has an additional depth parameter. On each iteration, all the violated clauses are generated and satisfied.
Below, we illustrate the operation of the program on the above set of clauses, setting the depth to 4:
?-model_d(4,M)
M = [list([a,c,a]), list([a,c,b]), list([a,c,c]),% D=4 %
list([a,b,a]), list([a,b,b]), list([a,b,c]),
list([a,a,a]), list([a,a,b]), list([a,a,c]),
list([b,c,a]), list([b,c,b]), list([b,c,c]),
list([b,b,a]), list([b,b,b]), list([b,b,c]),
list([b,a,a]), list([b,a,b]), list([b,a,c]),
list([c,c,a]), list([c,c,b]), list([c,c,c]),
list([c,b,a]), list([c,b,b]), list([c,b,c]),
list([c,a,a]), list([c,a,b]), list([c,a,c]),
append([a],[a],[a,a]), append([a],[b],[a,b]),
append([a],[c],[a,c]), append([a,c],[],[a,c]),
append([a,b],[],[a,b]), append([a,a],[],[a,a]),
append([b],[a],[b,a]), append([b],[b],[b,b]),
append([b],[c],[b,c]), append([b,c],[],[b,c]),
append([b,b],[],[b,b]), append([b,a],[],[b,a]),
append([c],[a],[c,a]), append([c],[b],[c,b]),
append([c],[c],[c,c]), append([c,c],[],[c,c]),
append([c,b],[],[c,b]), append([c,a],[],[c,a]),
append([],[c,a],[c,a]), append([],[c,b],[c,b]),
append([],[c,c],[c,c]), append([],[b,a],[b,a]),
append([],[b,b],[b,b]), append([],[b,c],[b,c]),
append([],[a,a],[a,a]), append([],[a,b],[a,b]),
append([],[a,c],[a,c]),
list([a,c]), list([a,b]), list([a,a]), % D=3 %
list([b,c]), list([b,b]), list([b,a]),
list([c,c]), list([c,b]), list([c,a]),
append([a],[],[a]), append([b],[],[b]),
append([c],[],[c]), append([],[c],[c]),
append([],[b],[b]), append([],[a],[a]),
list([a]), list([b]), list([c]), % D=2 %
append([],[],[]),
thing(c), thing(b), thing(a), % D=1 %
list([]) ]
At depth 1, only domain clauses are satisfied; at depth 2 the first append literal appears. Depths 3 and 4 add list literals for all lists of length 2 and 3, and append literals for all lists of length 1 and 2, respectively.
Korf (1987) gives a comprehensive overview of search methods in Artificial Intelligence. He is also the originator of the iterative deepening search strategy (Korf, 1985). The model generation program in section 5.4 is adapted from (Manthey & Bry, 1988).
R.E. Korf (1985), ‘Depth-first iterative-deepening: an optimal admissible tree search’, Artificial Intelligence 27: 97-109.
R.E. Korf (1987), ‘Search’. In Encyclopedia of Artificial Intelligence, S.C. Shapiro (ed.), pp. 994-998, John Wiley.
R. Manthey & F. Bry (1988), ‘SATCHMO: a theorem prover implemented in Prolog’. In E. Lusk & R. Overbeek (eds), Proc. 9th International Conference on Automated Deduction, Lecture Notes in Computer Science 310, pp. 415-434, Springer-Verlag.
6
The search strategies of the previous chapter do not make any assumptions about the plausibility of a certain node in the search space leading to a goal. Such a form of search is called blind search. Alternatively, search strategies which do make such assumptions are called informed search strategies. The extra information which is incorporated in the search process is provided by an evaluation function h called a heuristic, which estimates how far a given node is from a goal. This information can be used in several ways. If we use it to order the nodes on the agenda, such that the most promising nodes are tried first, the resulting search method is called best-first search. In section 6.2, we will discuss a complete variant of best-first search called the A algorithm, and investigate the conditions under which this algorithm is optimal. In section 6.3, we will discuss non-exhaustive informed search strategies that can be derived from best-first search by limiting the size of the agenda.
We will assume that a predicate eval/2 is defined, which returns for a given node in the search space an estimate of the distance between that node and a goal node. The children of the current node are added to the agenda according to their heuristic evaluation (lowest values first). Thus, the agenda will always be sorted.
% best-first search
% goal/1, children/2 and eval/2 depend on
% the search problem at hand
search_bstf([Goal|Rest],Goal):-
    goal(Goal).
search_bstf([Current|Rest],Goal):-
    children(Current,Children),
    add_bstf(Children,Rest,NewAgenda),
    search_bstf(NewAgenda,Goal).

% add_bstf(A,B,C) <- C contains the elements of A and B
%                    (B and C sorted according to eval/2)
add_bstf([],Agenda,Agenda).
add_bstf([Child|Children],OldAgenda,NewAgenda):-
    add_one(Child,OldAgenda,TmpAgenda),
    add_bstf(Children,TmpAgenda,NewAgenda).

% add_one(S,A,B) <- B is A with S inserted acc. to eval/2
add_one(Child,OldAgenda,NewAgenda):-
    eval(Child,Value),
    add_one(Value,Child,OldAgenda,NewAgenda).

add_one(Value,Child,[],[Child]).
add_one(Value,Child,[Node|Rest],[Child,Node|Rest]):-
    eval(Node,V),
    Value < V.
add_one(Value,Child,[Node|Rest],[Node|NewRest]):-
    eval(Node,V),
    Value >= V,
    add_one(Value,Child,Rest,NewRest).
add_bstf/3 operates by inserting the new children one by one in the current agenda. Note that if the list of children were already sorted, it could more efficiently be merged with the current agenda.
Exercise 6.1. Suppose the call children(Current,Children) results in an ordered list of children. Write a predicate merge/3 which directly merges this list with the current agenda.
As an application of best-first search, consider the following puzzle. We have a board consisting of seven consecutive squares, three black tiles and three white tiles, initially placed on the board as in fig. 6.1. The goal is to move the tiles in such a way that the black tiles are to the right of the white tiles (the position of the empty square is immaterial). Each move consists of moving one tile into the empty square, which is allowed if there are at most two other tiles in between. The cost of such a move is 1 if there are no tiles in between, and equals the number of tiles jumped over otherwise.
This puzzle defines a search space, in which nodes are board positions and arcs are single moves. We choose a simple list representation for board positions: e.g. [b,b,b,e,w,w,w] represents the starting position of fig. 6.1. The following predicates examine and manipulate board positions:
% get_tile(P,N,T) <- pos. P contains tile T at square N
get_tile(Pos,N,T):-
    get_tile(Pos,1,N,T).
get_tile([X|_Xs],N,N,X).
get_tile([_X|Xs],N0,N,Y):-
    N1 is N0+1,
    get_tile(Xs,N1,N,Y).

% replace(P,N,T,P1) <- P1 is P with tile T at square N
replace([_X|Xs],1,Y,[Y|Xs]).
replace([X|Xs],N,Y,[X|Zs]):-
    N>1, N1 is N-1,
    replace(Xs,N1,Y,Zs).
When not to use lists. Recall (section 1.3) that [b,b,b,e,w,w,w] is an alternative notation for the term .(b,.(b,.(b,.(e,.(w,.(w,.(w,[]))))))). This term contains, besides the seven constants in the linear notation, one additional constant (‘[]’) and seven functors (‘.’), each with two arguments. In contrast, a ‘flat’ term p(b,b,b,e,w,w,w) contains only one additional functor, with seven arguments. Recursive data structures like lists are useful if the number of items to be stored is not fixed, but they require significantly more storage space. In general, if the number of items is fixed, a non-recursive data structure is preferred as far as memory is concerned. Given a term T holding the items, the call arg(N,T,A) retrieves the Nth argument A. However, arg/3 requires N to be instantiated, and cannot be used to generate all arguments on backtracking. Therefore, lists are sometimes used even if the nature of the data is non-recursive.
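As a small sketch of this trade-off, consider accessing a flat board term with the standard built-in arg/3 (the functor name p is only illustrative; between/3 is a built-in or library predicate in most Prologs):

```prolog
% Direct access requires N to be instantiated:
?- arg(4,p(b,b,b,e,w,w,w),T).
T = e

% To enumerate all arguments on backtracking, the index
% must be generated explicitly:
?- between(1,7,N), arg(N,p(b,b,b,e,w,w,w),T).
N = 1, T = b ;
N = 2, T = b ;
...
```

With a list representation, member/2 or get_tile/3 enumerates the items directly, which is why lists are used in the program below.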
We use the above best-first search procedure, with a number of changes. First, rather than returning the goal position found, the program should construct a sequence of moves by which the goal position is reached. Therefore, nodes that are examined during the search process are collected in the list Visited. After a goal position has been found, the solution path and its total cost are reconstructed from the list Visited by means of the predicate construct_moves/6.
Secondly, the items on the agenda are represented as pairs v(Value,Move), where Value is the heuristic evaluation of the position reached by Move. Children of the current position are generated by means of the setof/3 predicate, which yields a sorted list. By putting the heuristic Value as the first argument of the functor v, the list Children is sorted according to increasing heuristic value, so it can simply be merged with the current agenda to yield the new agenda. The program thus looks as follows:
% tiles(M,C) <- moves M lead to a goal position at cost C
%               (best-first search strategy)
tiles(Moves,Cost):-
    start(Start),
    eval(Start,Value),
    tiles_a([v(Value,Start)],Final,[],Visited),
    construct_moves(Final,Visited,[],Moves,0,Cost).
% tiles_a(A,M,V0,V) <- goal position can be reached from
%                      one of the positions on A with last
%                      move M (best-first strategy)
tiles_a([v(V,LastMove)|Rest],LastMove,Visited,Visited):-
    goal(LastMove).
tiles_a([v(V,LastMove)|Rest],Goal,Visited0,Visited):-
    show_move(LastMove,V),
    setof0(v(Value,NextMove),
           ( move(LastMove,NextMove),
             eval(NextMove,Value) ),
           Children),
    merge(Children,Rest,NewAgenda),    % best-first
    tiles_a(NewAgenda,Goal,[LastMove|Visited0],Visited).

%%% merge/3: see exercise 6.1
setof0/3 is a variant of setof/3 which succeeds with the empty list if no solutions can be found (see Appendix A.2).
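A minimal definition of such a variant could look as follows (a sketch in the spirit of Appendix A.2; the cut commits to the non-empty answer set found by setof/3):

```prolog
% setof0(X,G,L) <- L is the sorted list of X's satisfying G,
%                  or the empty list if there are none
setof0(X,G,L):-
    setof(X,G,L),!.
setof0(_X,_G,[]).
```

This avoids the failure of tiles_a/4 in positions without legal moves.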
A move from OldPos to NewPos is represented by a triple
m(OldPos,NewPos,Cost)
where Cost specifies the cost of the move. According to the principle of data abstraction, this representation is kept local to the following predicates:
% move(m(X,P,Y),m(P,NP,C)) <- position NP can be reached
%                             from position P in one move
%                             at cost C
move(m(OldPos,Pos,OldCost),m(Pos,NewPos,Cost)):-
    get_tile(Pos,Ne,e),get_tile(Pos,Nbw,BW),not(BW=e),
    Diff is abs(Ne-Nbw),Diff<4,
    replace(Pos,Ne,BW,Pos1),
    replace(Pos1,Nbw,e,NewPos),
    ( Diff=1    -> Cost=1
    ; otherwise -> Cost is Diff-1 ).
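For instance, from the starting position the empty square is at square 4, so (with a typical left-to-right enumeration of tiles, assuming get_tile/3 and replace/4 as defined above) the first move found slides the leftmost black tile into the empty square at cost 2, and further legal moves are enumerated on backtracking:

```prolog
?- move(m(noparent,[b,b,b,e,w,w,w],0),NextMove).
NextMove = m([b,b,b,e,w,w,w],[e,b,b,b,w,w,w],2) ;
NextMove = m([b,b,b,e,w,w,w],[b,e,b,b,w,w,w],1) ;
...
```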
start(m(noparent,[b,b,b,e,w,w,w],0)).
% reconstruct total cost and path from visited nodes
construct_moves(m(noparent,Start,0),V,Ms,[Start|Ms],Cost,Cost).
construct_moves(m(P,Pos,C),Visited,Ms0,Ms,Cost0,Cost):-
    element(m(GP,P,C1),Visited),    % GP is parent of P
    Cost1 is Cost0+C,
    construct_moves(m(GP,P,C1),Visited,[Pos|Ms0],Ms,Cost1,Cost).

show_move(m(P,Pos,C),Value):-
    write(Pos-Value),nl.
Finally, we have to choose a heuristic evaluation function. A first idea is to count, for each white tile, the number of black tiles to the left of it:
goal(LastMove):-
    eval(LastMove,0).

eval(m(P,Pos,C),Value):-
    bLeftOfw(Pos,Value).

bLeftOfw(Pos,Value):-
    findall((Nb,Nw),
            ( get_tile(Pos,Nb,b), get_tile(Pos,Nw,w), Nb<Nw ),
            L),
    length(L,Value).
Note that this program actually counts the number of solutions to the query
?-get_tile(Pos,Nb,b),get_tile(Pos,Nw,w),Nb<Nw.
by determining the length of the list that is returned by the second-order predicate findall/3.
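For example, in the starting position each of the three white tiles has all three black tiles to its left, giving nine such pairs:

```prolog
?- bLeftOfw([b,b,b,e,w,w,w],Value).
Value = 9
```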
Exercise 6.2. Rewrite bLeftOfw/2 such that it uses only firstorder predicates.
The program writes every move it considers on the screen, together with its heuristic evaluation. For instance, the query
?-tiles(M,C).
results in the following output:
[b,b,b,e,w,w,w]-9
[b,b,b,w,e,w,w]-9
[b,b,e,w,b,w,w]-8
[b,b,w,w,b,e,w]-7
[b,b,w,w,b,w,e]-7
[b,b,w,w,e,w,b]-6
[b,e,w,w,b,w,b]-4
[b,w,e,w,b,w,b]-4
[e,w,b,w,b,w,b]-3
[w,w,b,e,b,w,b]-2
[w,w,b,w,b,e,b]-1
M = [[b,b,b,e,w,w,w],[b,b,b,w,e,w,w],[b,b,e,w,b,w,w],
[b,b,w,w,b,e,w],[b,b,w,w,b,w,e],[b,b,w,w,e,w,b],
[b,e,w,w,b,w,b],[b,w,e,w,b,w,b],[e,w,b,w,b,w,b],
[w,w,b,e,b,w,b],[w,w,b,w,b,e,b],[w,w,e,w,b,b,b]]
C = 15
Since the only moves that are considered are those that are on the final solution path, there is no backtracking. This seems to suggest that the heuristic works quite well. On the other hand, the first few moves seem a bit awkward: in particular, the first and the fourth move are relatively expensive.
Let’s try another heuristic, which counts the number of tiles out of place: a wrong tile on the first or seventh square gives 3, on the second or sixth square 2, and on the third or fifth square 1.
eval(m(P,Pos,C),Value):-
    outOfPlace(Pos,1,0,Value).

outOfPlace(Pos,8,N,N).
outOfPlace(Pos,K,N0,N):-
    K<8, K1 is K+1,
    ( K<4, get_tile(Pos,K,b) -> N1 is N0-(K-4)
    ; K>4, get_tile(Pos,K,w) -> N1 is N0+(K-4)
    ; otherwise              -> N1=N0 ),
    outOfPlace(Pos,K1,N1,N).
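As a check, in the starting position the black tiles on squares 1-3 contribute 3+2+1 and the white tiles on squares 5-7 contribute 1+2+3:

```prolog
?- outOfPlace([b,b,b,e,w,w,w],1,0,Value).
Value = 12
```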
We get the following result:
[b,b,b,e,w,w,w]-12
[b,b,b,w,w,w,e]-9
[e,b,b,b,w,w,w]-9
[b,b,b,w,w,e,w]-10
[b,b,b,w,w,w,e]-9
[b,b,e,w,w,b,w]-9
[e,b,b,w,w,b,w]-7
[w,b,b,e,w,b,w]-7
[w,b,b,w,w,b,e]-4
[w,b,b,w,w,e,b]-4
[w,b,e,w,w,b,b]-3
[w,b,w,w,e,b,b]-2
M = [[b,b,b,e,w,w,w],[b,b,b,w,w,e,w],[b,b,e,w,w,b,w],
[e,b,b,w,w,b,w],[w,b,b,e,w,b,w],[w,b,b,w,w,b,e],
[w,b,b,w,w,e,b],[w,b,e,w,w,b,b],[w,b,w,w,e,b,b],
[w,e,w,w,b,b,b]]
C = 14
We observe a couple of differences with the previous heuristic. First of all, there is backtracking: the first, second and fourth moves are not pursued any further. Furthermore, the solution found requires two moves less, and is also cheaper.
Figure 6.2. Solutions found for different heuristics.
This improvement seems to suggest that an increased punishment for wrongly placed tiles might lead to an even cheaper solution. For instance, we could increase the punishment to 4, 3 and 2, respectively, by adapting the predicate outOfPlace/4 (try it!). This leads to the following sequence of moves:
[b,b,b,e,w,w,w]-18
[b,b,b,w,w,w,e]-14
[e,b,b,b,w,w,w]-14
[b,b,b,w,w,e,w]-15
[b,b,e,w,w,b,w]-13
[b,b,w,w,e,b,w]-11
[b,e,w,w,b,b,w]-8
[e,b,w,w,b,b,w]-7
[w,b,e,w,b,b,w]-7
[w,e,b,w,b,b,w]-6
[e,w,b,w,b,b,w]-6
[w,w,b,e,b,b,w]-6
[w,w,b,w,b,b,e]-2
[w,w,b,w,b,e,b]-2
M = [[b,b,b,e,w,w,w],[b,b,b,w,w,e,w],[b,b,e,w,w,b,w],
[b,b,w,w,e,b,w],[b,e,w,w,b,b,w],[e,b,w,w,b,b,w],
[w,b,e,w,b,b,w],[w,e,b,w,b,b,w],[w,w,b,e,b,b,w],
[w,w,b,w,b,b,e],[w,w,b,w,b,e,b],[w,w,e,w,b,b,b]]
C = 15
Obviously, this heuristic works no better than the previous two: it does not find an optimal solution, and it investigates more moves than the first heuristic. In fig. 6.2, the solutions found by the three heuristics are compared. In the next section, we will investigate the conditions under which a heuristic is guaranteed to find an optimal solution.
Best-first search is an exhaustive search strategy, with a possible behaviour ranging from depth-first search to breadth-first search, depending on the heuristic used. By itself, best-first search is not complete: the heuristic might consistently assign lower values to the nodes on an infinite path. This is because the heuristic evaluation only takes into account an estimate of the distance to a goal, while we are actually interested in minimising the total cost of reaching a goal along a particular path. In order to obtain a complete best-first search algorithm, we use an evaluation function f consisting of two components:
f(n) = g(n) + h(n)
Here, h(n) is the heuristic estimate of the cost of reaching a goal node from node n, as it was introduced before. g(n) is the actual cost of reaching n from the starting node. Their sum f(n) is used to order the nodes on the agenda.
A best-first search algorithm which uses such an evaluation function f to estimate the total cost of a path is called an A algorithm. An A algorithm is complete, since the depth count g(n) will prevent search from getting trapped in an infinite path. In effect, the depth count will give the search strategy more of a breadth-first flavour. Indeed, breadth-first search is a special case of an A algorithm, with h(n)=0 for every node n. A disadvantage of A algorithms is the decreased efficiency associated with this breadth-first flavour.
Exercise 6.3. Change the tiles program into an A algorithm, by associating with each move the g value of the position reached by that move (i.e. the cost of the path leading to that position, instead of the cost of the last move). Demonstrate the decreased efficiency of the search.
Breadth-first search is not only complete, it is also optimal: it always returns a shortest solution path [17]. Do A algorithms inherit this property from breadth-first search? Obviously, this depends on the function h: if a node n1 on the cheapest path gets an h estimate that is too high, other nodes will be tried instead, and a solution along a non-optimal path may be found first. We say that the heuristic was too pessimistic regarding n1. Conversely, a heuristic which never assigns a value to a node that is higher than the actual cost of reaching a goal state from that node is called optimistic.
For instance, consider the first heuristic for the puzzle in the previous section, which counts for each white tile the number of black tiles to the left of it. Suppose one black tile has w white tiles to its right, which adds w to the heuristic value for that position. In order to reach a goal position, the black tile has to jump over some of the white tiles, while the remaining white tiles have to jump over the black tile; this has a cost of at least w. Therefore, this heuristic is optimistic. The second heuristic, calculating a weighted sum of tiles out of place, is also optimistic. For instance, suppose that a black tile is at the first square, then there are three white tiles to its right, over which it must jump. Analogously, if it is on the second square, then there are at least two white tiles to jump over. In contrast, the weights used in the third heuristic are too high.
Exercise 6.4. Find a position for which the third heuristic is too pessimistic.
It is possible to prove the following important result: an A algorithm with an optimistic heuristic h always results in an optimal solution. The resulting algorithm is called A* (A star); both A* search and optimistic heuristics are said to be admissible. This should not be mistaken to suggest that better heuristics are more optimistic! On the contrary, a good heuristic is as pessimistic as possible, without becoming inadmissible. In general, if h1(n) ≥ h2(n) for every node n, then we call heuristic h1 at least as informed as h2. It can be shown that a more informed heuristic indeed searches a smaller part of the search space.
As a small example, consider the search space in fig. 6.3. The h values for each node are as indicated; the cost per arc is 1. The heuristic is optimistic, so A* search will return the shortest path start-r-s-goal. However, this path is not found immediately: since both p and q have a lower f value than r, they are investigated first. After q has been investigated, s is put on the agenda with f value 3+1=4. Since r has a lower f value of 3, it is the next one to be investigated. Now s will again be added to the agenda, this time with f value 2+1=3! In fact, it is this latter occurrence of s which, being on the optimal path, leads to the goal.
Thus, although admissible search leads to optimal solutions, it is not necessarily the case that every node on an optimal path is immediately reached along that optimal path. In fig. 6.3, this is caused by ‘local pessimism’ of the heuristic, which estimates the cost of moving from start to p as 3-1=2, while the actual cost is 1. Indeed, if p had an h value of 2, s would have been reached the first time along the shortest path. This is true in general: if the heuristic estimates the cost of moving from one node to another optimistically, then any node is reached along the cheapest path first. This property is called monotonicity, since one can show that the f values are monotonically non-decreasing along a path.
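The last claim can be spelled out in one line. Call h monotonic if h(n) ≤ c(n,n') + h(n') for every arc from n to a child n', where c(n,n') is the cost of that arc; then for such a child

```latex
f(n') = g(n') + h(n') = g(n) + c(n,n') + h(n') \geq g(n) + h(n) = f(n)
```

so the f values never decrease along a path, and the first time a node is taken from the agenda it has been reached along a cheapest path.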
The first heuristic of the previous section is monotonic, while the second is not. This can be concluded from the following two evaluations:
[b,b,e,w,w,b,w]-9
[e,b,b,w,w,b,w]-7
The heuristic estimates the cost of this move as 9-7=2, while the actual cost is 1. Since monotonicity implies admissibility, the third heuristic is not monotonic either.
Exercise 6.5. Implement a Prolog meta-interpreter which employs an A search algorithm. Use h(R) = |R| (the number of literals in resolvent R) as heuristic. Is this heuristic admissible and monotonic?
The search strategies discussed until now are all exhaustive: in the worst case, they will all search the complete search space. This is so because all children of a certain node will be put on the agenda, in some order. Exhaustive search is often impractical, since the size of the agenda grows exponentially with the search depth. The use of a heuristic offers the possibility of keeping only a selection of the best nodes on the agenda. Such non-exhaustive search strategies are, of course, not guaranteed to be complete, and should only be applied in combination with a reasonably informed heuristic.
Beam search is a form of best-first search in which the number of nodes on the agenda is limited. In its most simple form, the agenda is of fixed size. Alternatively, one could allow the agenda to grow polynomially (instead of exponentially) with the search depth. The effect of this strategy is that only a ‘beam’ of the search space is searched:
search_beam(Agenda,Goal):-
    search_beam(1,Agenda,[],Goal).

search_beam(D,[],NextLayer,Goal):-
    D1 is D+1,
    search_beam(D1,NextLayer,[],Goal).
search_beam(D,[Goal|Rest],NextLayer,Goal):-
    goal(Goal).
search_beam(D,[Current|Rest],NextLayer,Goal):-
    children(Current,Children),
    add_beam(D,Children,NextLayer,NewNextLayer),
    search_beam(D,Rest,NewNextLayer,Goal).
In this program, two agendas are maintained: one for the current level, and one for the children of the nodes on the current level. Once the current level is exhausted, the agendas are swapped and the depth count is increased. The depth count is passed on to the predicate add_beam/4, in order to decide how many children to add to the agenda for the next level.
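One possible definition of add_beam/4 keeps the next-level agenda at a fixed size; the width 10, the helper first_n/3 and the reuse of add_bstf/3 from above are assumptions, not the only choice (a depth-dependent limit would make use of D):

```prolog
beam_width(10).    % assumed fixed beam width

% add_beam(D,Cs,A0,A) <- A is A0 with the children Cs inserted
%                        in order of eval/2, truncated to the
%                        beam width
add_beam(_D,Children,Agenda,NewAgenda):-
    add_bstf(Children,Agenda,TmpAgenda),
    beam_width(W),
    first_n(W,TmpAgenda,NewAgenda).

% first_n(N,Xs,Ys) <- Ys is the list of the first N elements of Xs
first_n(0,_Xs,[]):-!.
first_n(_N,[],[]).
first_n(N,[X|Xs],[X|Ys]):-
    N>0, N1 is N-1,
    first_n(N1,Xs,Ys).
```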
Exercise 6.6. Extend the program of exercise 6.3 with beam search with fixed agenda size. Demonstrate the nonoptimality of the search strategy.
If we limit the size of the agenda to 1, we arrive at a search strategy called hill-climbing. It is also called greedy search, since there is no backtracking involved. Hill-climbing is the type of search employed by a wanderer who wants to reach the top of a hill by always moving in the steepest direction. Clearly, she will reach the top of a hill (and never get off it), but it is not necessarily the highest one.
The predicate search_hc/2 below implements a hill-climbing search strategy. Instead of maintaining an agenda of nodes yet to be investigated, it maintains a single node in its first argument. Therefore, hill-climbing has some similarity with depth-first search with an implicit agenda:
search_hc(Goal,Goal):-
    goal(Goal).
search_hc(Current,Goal):-
    children(Current,Children),
    select_best(Children,Best),
    search_hc(Best,Goal).
The predicate select_best/2 selects the best child of the current node, according to the heuristic value to be optimised. To stress that backtracking is not needed after the best child has been selected, one can place a cut before the recursive call in the second clause.
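A possible definition of select_best/2, assuming an eval/2 predicate as before in which lower values are better, compares the candidates pairwise:

```prolog
% select_best(Cs,B) <- B is the node in Cs with the
%                      lowest heuristic evaluation
select_best([Node],Node).
select_best([N1,N2|Nodes],Best):-
    eval(N1,V1), eval(N2,V2),
    ( V1 < V2   -> select_best([N1|Nodes],Best)
    ; otherwise -> select_best([N2|Nodes],Best)
    ).
```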
Nilsson (1980) gives a gentle introduction to the use of heuristics and their properties. (Pearl, 1984) is the main source for mathematical results on heuristics. The sliding tiles puzzle was taken from (Luger & Stubblefield, 1993).
G.F. Luger & W.A. Stubblefield ( 1993) , Artificial Intelligence: Structures and Strategies for Complex Problem Solving, Benjamin/Cummings, second edition.
N.J. Nilsson ( 1980) , Principles of Artificial Intelligence, Tioga Press.
J. Pearl (1984), Heuristics: Intelligent Search Strategies for Computer Problem Solving, Addison-Wesley.
[12] The remaining disk on A can safely be ignored, since it is the largest.
[13] Such a program should perform breadth-first search; see Exercise 5.2.
[14] From now on, we denote a Predicate with Arity as Predicate/Arity. This is because predicates with different arity are different predicates, even if they share the same predicate name.
[15] Here, we refer to the resultant path, ignoring backtracking.
[16] We will assume for the moment that the head literals are ground by the substitution which makes the body true; a more detailed discussion follows below.
[17] If arcs can have different costs, breadthfirst search does not necessarily return the cheapest solution path.
This book discusses methods to implement intelligent reasoning by means of Prolog programs. The book is written from the shared viewpoints of Computational Logic, which aims at automating various kinds of reasoning, and Artificial Intelligence, which seeks to implement aspects of intelligent behaviour on a computer.