Reasoning with structured knowledge

A physical object is structured if it consists of several components having certain spatial relationships to each other. Likewise, knowledge is structured if its components have certain logical relationships. For instance, a description of the London underground system consists of a list of stations (the components) plus a list of connections between stations (the relationships). As can be seen in fig. 1.1 in Chapter 1, such structured knowledge has a convenient graphical representation, in which components are represented by points or nodes, and relationships are represented by lines or arcs between nodes. In mathematics, such graphical structures are called graphs.

A characteristic property of structured knowledge is the distinction that is made between explicit and implicit relationships. For instance, in the underground example the direct connections which exist between two stations are the explicit relationships. All other relationships (i.e. connections between stations that are further apart) are only implicitly represented, and must be reconstructed from the explicit relationships. Therefore, reasoning forms an integral part of any form of structured knowledge.
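This kind of reasoning is easy to express in Prolog. The following sketch reconstructs implicit connections by chaining explicit ones; the link/2 facts and the predicate names are illustrative assumptions, not the actual underground description of Chapter 1:

```prolog
% link(X,Y) <- illustrative facts: a direct connection from X to Y
link(bond_street,oxford_circus).
link(oxford_circus,tottenham_court_road).

% connected(X,Y) <- station Y can be reached from station X
connected(X,Y):-link(X,Y).                % explicit relationship
connected(X,Y):-link(X,Z),connected(Z,Y). % implicit: reconstructed by reasoning
```

The query ?-connected(bond_street,tottenham_court_road) succeeds, although this connection is nowhere explicitly listed.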

Other examples of structured knowledge, encountered in Part I, include Prolog terms, proof trees, and SLD-trees. Among these, SLD-trees constitute a special case, since they are not given a priori as part of the knowledge describing a certain Universe of Discourse, but are instead derived from problem specifications of the form ‘given program P, find all answers to query Q’. By means of SLD-trees, such problems are translated to problems of the form ‘given SLD-tree T, find all paths from the root of the tree to the empty clause’. Problems of the latter kind are called search problems, and the graph being searched is called a search space. Most problems in intelligent reasoning are search problems of one kind or another.

(a) Starting position

(b) Intermediate position

(c) Goal position

Figure II.1. The Towers of Hanoi.

In principle, any given problem can be defined as a search problem. To this end, we must identify:

(i)   the nodes in the search space;

(ii)  the arcs between nodes;

(iii) the starting node;

(iv) the goal node.

For instance, when searching for an answer to a query by means of SLD-resolution, the nodes in the search space are resolvents, the arcs are resolution steps by means of a program clause, the starting node is the query, and the goal node is the empty clause. As another example, we consider the puzzle known as The Towers of Hanoi. This puzzle consists of three pegs and n disks of decreasing size. Initially, all the disks are on the left peg, such that no disk is placed on a smaller one. This rule is to be obeyed throughout the game. The goal is to move all the disks to the right peg by moving one disk at a time. This problem is easily reformulated as a search problem, where nodes are allowed positions, and arcs are moves of the upper disk on one peg to another. Starting node and goal node are as in fig. II.1.
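To illustrate, the arcs of this search space can be described by a Prolog relation over positions. In the following sketch, the representation of a position by a pegs/3 term and the predicate names move/2 and ok/2 are assumptions made for illustration: each disk is a number, and each peg is a list of disks with the smallest on top.

```prolog
% move(Pos1,Pos2) <- Pos2 is reached from Pos1 by moving one disk
move(pegs([D|L],M,R),pegs(L,[D|M],R)):-ok(D,M).
move(pegs([D|L],M,R),pegs(L,M,[D|R])):-ok(D,R).
move(pegs(L,[D|M],R),pegs([D|L],M,R)):-ok(D,L).
move(pegs(L,[D|M],R),pegs(L,M,[D|R])):-ok(D,R).
move(pegs(L,M,[D|R]),pegs([D|L],M,R)):-ok(D,L).
move(pegs(L,M,[D|R]),pegs(L,[D|M],R)):-ok(D,M).

% ok(D,P) <- disk D may be placed on peg P
ok(_D,[]).            % empty peg
ok(D,[D1|_]):-D<D1.   % D is smaller than the top disk
```

With three disks, the starting node is pegs([1,2,3],[],[]) and the goal node is pegs([],[],[1,2,3]).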

An analytic solution to the Towers of Hanoi

In the case of the Towers of Hanoi, there is a simple analytic solution based on the following observation: suppose we are able to solve the problem for n-1 disks, then we can solve it for n disks also: move the upper n-1 disks from the left to the middle peg [12], move the remaining disk on the left peg to the right peg, and move the n-1 disks from the middle peg to the right peg. Since we are able to solve the problem for 0 disks, it follows by complete induction that we can solve the problem for any number of disks. The inductive nature of this argument is nicely reflected in the following recursive program:


  :-op(900,xfx,to).  % so that 'left to right' is a valid term

  % hanoi(N,A,B,C,Moves) <- Moves is the list of moves to
  %                         move N disks from peg A to peg C,
  %                         using peg B as intermediary peg
  hanoi(0,_A,_B,_C,[]).
  hanoi(N,A,B,C,Moves):-
    N1 is N-1,
    hanoi(N1,A,C,B,Moves1),
    hanoi(N1,B,A,C,Moves2),
    append(Moves1,[A to C|Moves2],Moves).

For instance, the query ?-hanoi(3,left,middle,right,M) yields the answer

  M = [left to right, left to middle, right to middle,
       left to right, middle to left, middle to right,
       left to right]

The first three moves move the upper two disks from the left to the middle peg, then the largest disk is moved to the right peg, and again three moves are needed to move the two disks on the middle peg to the right peg.

Since the number of allowed positions is 3^n, the search space for the Towers of Hanoi grows exponentially with the number of disks. In practice, this means that the problem will be unsolvable for large n, no matter how efficient the search program, or how powerful the computer. This is a common characteristic of search problems. Search is a problem-solving method which, although applicable to almost any problem, has considerable practical limitations. Therefore, search is only applied to problems for which no analytic solutions are known.

For many problems in intelligent reasoning such analytic solutions simply do not exist, and search is the best we can do. In Chapters 5 and 6, we will present and analyse various methods for searching graphs. Since graphs are not only important for search problems, but for all forms of structured knowledge, Chapter 4 is devoted to a discussion of various ways to represent structured knowledge in clausal logic.


Representing structured knowledge

In this chapter we will discuss various ways to represent structured knowledge in Prolog. The central notion is that of a graph, which is the mathematical abstraction of the graphical representation of structured knowledge. A graph consists of nodes, and arcs between nodes. Nodes are identified by their name, and arcs are identified by the pair of nodes they connect. By convention, arcs are taken to be directed, which means that an arc from n1 to n2 is not the same as an arc from n2 to n1. Undirected arcs (as in the London Underground example of Chapter 1) can be viewed as consisting of two directed arcs, one in each direction. If an arc is directed from n1 to n2, then n1 is called the parent of n2, and n2 is called the child of n1.

A path in a graph is a sequence of nodes, such that for each consecutive pair ni, nj in the sequence the graph contains an arc from ni to nj. If there is a path from nk to nl, then nk is called an ancestor of nl, and nl is called a descendant of nk. A cycle is a path from a node to itself. Obviously, when a path from ni to nj passes through a node which is also on a cycle, there are infinitely many different paths from ni to nj. Thus, a graph consisting of a limited number of nodes and arcs can generate infinite behaviour. This is something to keep in mind when searching such cyclic graphs!

A tree is a special kind of graph which contains a root such that there is a unique path from the root to any other node. From this it follows that for any two nodes in a tree, either there is no path between them, or there is exactly one. Thus, trees are necessarily non-cyclic or acyclic. A leaf is a node without children. Often, leaves are goal nodes in search spaces like SLD-trees. Strictly speaking, an SLD-tree is not a tree, because there might be several ways to construct the same resolvent. By convention, however, resolvents constructed in a different way are considered to be distinct nodes in the SLD-tree. Usually, trees are drawn upside down, with the root node at the top; arcs are implicitly understood to be directed from top to bottom. Note that, if n is the root of a tree, each of its children is the root of a subtree (fig. 4.1).

Figure 4.1. A tree with two subtrees.

4.1   Trees as terms

Recall from section 1.3 that complex Prolog terms like

  route(tottenham_court_road,route(leicester_square,noroute))

can be viewed as a tree, with the functor route acting as the root of (sub)trees, and tottenham_court_road, leicester_square, and noroute as leaves (fig. 1.6). Conversely, trees can be represented by Prolog terms.

Exercise 4.1. Draw the tree represented by the term 1(2(4),3(5,6)).

A tree is traversed by first visiting its root, and then recursively traversing all of its subtrees. A list of subtrees is obtained by decomposing the complex term by means of the =.. predicate (see section 3.7):

% term_tree(T,R,S) <- term T represents a tree with root R
%                     and list of subtrees S
term_tree(Tree,Root,Subtrees):-
  Tree =.. [Root|Subtrees].

% term_root(T,R) <- R is the root of tree T
term_root(Tree,Root):-
  term_tree(Tree,Root,_Subtrees).

% term_subtree(T,S) <- S is a subtree of tree T
term_subtree(Tree,Subtree):-
  term_tree(Tree,_Root,Subtrees),
  member(Subtree,Subtrees).
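For instance, the =.. predicate decomposes such a term into its root and its list of subtrees (assuming, as elsewhere in this book, a Prolog system that accepts numbers as functors):

```prolog
?-1(2(4),3(5,6)) =.. [Root|Subtrees].
  Root = 1
  Subtrees = [2(4),3(5,6)]
```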

Data abstraction

The principle of data abstraction prescribes keeping datastructures local to specific predicates such as term_tree, term_root and term_subtree, and accessing the datastructures only through these predicates. The main advantage of this design principle is modularity: if we choose to change the representation of a tree, we just have to modify these specific predicates, but the predicates which call them need not be changed. In contrast, if we unfold term_tree, term_root and term_subtree into the definition of term_arc, we get the following piece of code:

term_arc(Tree,[Root,SR]):-
  Tree =.. [Root|Subtrees],
  member(Subtree,Subtrees),
  Subtree =.. [SR|_].
term_arc(Tree,Arc):-
  Tree =.. [_Root|Subtrees],
  member(Subtree,Subtrees),
  term_arc(Subtree,Arc).

This program fragment is badly designed, because term_arc explicitly mentions the way trees are represented by Prolog terms. Consequently, if we change this representation, term_arc needs to be changed as well. This illustrates that the design of good datastructures is as important in Prolog as it is in any other programming language.

By means of these simple predicates, we can write a program for finding arcs and paths in a tree. Paths are represented as lists of nodes, and an arc is simply a path consisting of two nodes:

% term_arc(T,A) <- T is a tree, and A is an arc in T
term_arc(Tree,[Root,SR]):-            % Arc from Root to Subtree
  term_root(Tree,Root),
  term_subtree(Tree,Subtree),
  term_root(Subtree,SR).
term_arc(Tree,Arc):-                  % Arc in Subtree
  term_subtree(Tree,Subtree),
  term_arc(Subtree,Arc).

% term_path(T,P) <- T is a tree, and P is a path in T
term_path(Tree,Arc):-                 % a path
  term_arc(Tree,Arc).                 % consisting of one arc
term_path(Tree,[Node1,Node2|Nodes]):- % a path consisting of
  term_arc(Tree,[Node1,Node2]),       % several arcs
  term_path(Tree,[Node2|Nodes]).
Exercise 4.2. Give a term Tree, such that it contains the tree of exercise 4.1, and such that Path=[1,2,7,8] is an answer to the query ?-term_path(Tree,Path).

Figure 4.2. Which are the paths through this tree?

Consider the tree in fig. 4.2. The following query lists all the paths in this tree:

?-term_path(1(2(4,5(7),6),3(8,9(10))),Path).
Path = [1,2];
Path = [1,3];
Path = [2,4];
Path = [2,5];
Path = [2,6];
Path = [5,7];
Path = [3,8];
Path = [3,9];
Path = [9,10];
Path = [1,2,4];
Path = [1,2,5];
Path = [1,2,6];
Path = [1,2,5,7];
Path = [1,3,8];
Path = [1,3,9];
Path = [1,3,9,10];
Path = [2,5,7];
Path = [3,9,10];
No more solutions


Exercise 4.3. Explain the order in which these paths are found.

It would be convenient to have a program for printing Prolog terms which represent trees in a tree-like way. The nicest way to do this is to print from the root down; however, this requires a rather elaborate program [13]. A reasonable alternative is to print the tree rotated 90 degrees, from the root to the right. A program to do this is given below.


term_write(Tree):-
  term_write(0,Tree),nl.

% write a Tree at position Pos
term_write(Pos,Tree):-
  term_tree(Tree,Root,Subtrees),  % decompose Tree
  term_write_node(Pos,Pos2,Root), % write Root
  term_writes(Pos2,Subtrees).     % new position

% write a list of trees at position Pos
term_writes(_Pos,[]).
term_writes(Pos,[Tree]):-!,       % no newline here
  term_write(Pos,Tree).
term_writes(Pos,[Tree|Trees]):-
  term_write(Pos,Tree),
  nl,tab(Pos),                    % skip to position Pos
  term_writes(Pos,Trees).

% write a Node from Begin to End
term_write_node(Begin,End,Node):-
  name(Node,L),length(L,N),       % N is length of Nodename
  End is Begin+10,
  N1 is End-Begin-N,              % N1 is length of line
  write_line(N1),
  write(Node).

% write a line of given length
write_line(0).
write_line(N):-
  N>0,N1 is N-1,
  write('-'),
  write_line(N1).

name/2 [14] is a built-in predicate, converting an atom into a list of ASCII-codes. In combination with length/2, it is used to determine the number of characters in an atom. The query ?-term_write(1(2(4,5(7),6),3(8,9(10)))) displays the tree as follows:


4.2   Graphs generated by a predicate

In the preceding section, a tree was represented by a Prolog term. This is convenient for relatively small trees such as proof trees, that are processed and passed around as a unit. However, for bigger trees it is a better idea not to represent them explicitly by a Prolog term, but implicitly by a set of ground facts, listing the arcs in the graph. An additional advantage of this representation is the possibility of representing graphs that are not trees.

As an example of this representation, the tree in fig. 4.2 would be represented by the following facts:

arc(1,2).   arc(1,3).   arc(2,4).
arc(2,5).   arc(2,6).   arc(5,7).
arc(3,8).   arc(3,9).   arc(9,10).
The predicate for finding a path in a graph now needs a few minor adjustments: the graph is not passed on as an argument, and arc/2 is used rather than term_arc/2:

% path(P) <- P is a path in the graph given by arc/2
path([Node1,Node2]):-
  arc(Node1,Node2).
path([Node1,Node2|Nodes]):-
  arc(Node1,Node2),
  path([Node2|Nodes]).
Exercise 4.4. Draw the SLD-tree for the query ?-path([1|Path]).

path/1 will generate paths between any two connected nodes. When searching a graph such as an SLD-tree, we are normally only interested in paths which start at a given node (for instance, the root of a tree), and end in a leaf. The following program will do the job. Note that this program differs from the previous one in that it allows for paths consisting of one node only.

% path_leaf(N,P) <- P is a path starting at node N, ending
%                   in a leaf in the graph given by arc/2
path_leaf(Leaf,[Leaf]):-
  leaf(Leaf).
path_leaf(Node1,[Node1|Nodes]):-
  arc(Node1,Node2),
  path_leaf(Node2,Nodes).

leaf(Leaf):-
  not(arc(Leaf,_SomeNode)).
The query ?-path_leaf(1,Path) will lead to the following answers:

Path = [1,2,4];
Path = [1,2,5,7];
Path = [1,2,6];
Path = [1,3,8];
Path = [1,3,9,10];
No more solutions

Exercise 4.5. Draw the SLD-tree for this query.

Notice the order in which the paths to the leaves are found — the longer path [1,2,5,7] is found before the shorter path [1,2,6]. This kind of search is called depth-first search, because the deepest unvisited nodes are preferred. In contrast, breadth-first search tries all nodes on a given level before going one level deeper; consequently, shortest paths are found first. Of course, the order in which nodes are visited can only be understood procedurally — logically speaking, there is nothing in the program which prescribes such an order. It is only because Prolog itself searches the SLD-tree in a depth-first fashion, that programs like the above perform depth-first search.
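For comparison, breadth-first search can be programmed explicitly by maintaining a list of partial paths to be extended, adding newly generated paths at the back of the list. The following is only a sketch: the predicate names are assumptions, and the standard predicates findall/3, append/3 and reverse/2 are taken as given. An agenda-based treatment of such strategies follows in Chapter 5.

```prolog
% path_leaf_bf(N,P) <- as path_leaf/2, but shortest paths are found first
path_leaf_bf(Node,Path):-
  bf([[Node]],RevPath),            % partial paths are kept in reverse
  reverse(RevPath,Path).

bf([[Node|Rest]|_Paths],[Node|Rest]):-
  not(arc(Node,_Child)).           % Node is a leaf: a complete path
bf([[Node|Rest]|Paths],Path):-
  findall([Child,Node|Rest],arc(Node,Child),Children),
  append(Paths,Children,NewPaths), % new paths at the back
  bf(NewPaths,Path).
```

With the arc/2 facts for fig. 4.2, the query ?-path_leaf_bf(1,Path) now returns the shorter path [1,2,6] before the longer path [1,2,5,7].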

In real life, graphs are often infinite. For instance, many SLD-trees are infinite, even for very simple programs such as (‘br’ abbreviates brother):

br(X,Y):-br(X,Z),br(Z,Y).
br(paul,peter).
Exercise 4.6. Sketch the SLD-tree for the query ?-br(paul,B).

SLD-trees are graphs, with resolvents as nodes. Representing a resolvent by the list of its literals, we would need an infinite number of facts to represent SLD-trees, for instance:

arc([br(paul,B)],[br(paul,Z),br(Z,B)]).
arc([br(paul,B)],[]).
arc([br(paul,Z),br(Z,B)],[br(paul,Z1),br(Z1,Z),br(Z,B)]).
/* etcetera... */
In such cases, it is a better idea to write a program which generates these facts. In other words, we need a logical definition of arc/2.

Now, arc(A,B) is true if A and B are lists of negative literals interpreted as resolvents, and one resolution step applied to A and a clause for br/2 yields B. We can write this down by means of the predicate resolve/3, which performs one resolution step, and the two clauses for br/2 in the appropriate representation. This gives the following program:

arc(A,B):- resolve(A,(br(X,Y):-[br(X,Z),br(Z,Y)]),B).
arc(A,B):- resolve(A,(br(paul,peter):-[]),B).

% resolve(G,C,NewG) <- the goal G (a list of atoms)
%                      resolves with the clause C (body
%                      is a list) to yield the goal NewG
resolve([H1|T],(H2:-Body),B):-
  H1=H2,                % literal in goal unifies with head of clause
  append(Body,T,B).     % new goal: clause body plus rest of old goal
resolve([H1|T],Clause,[H1|B]):-
  resolve(T,Clause,B).  % try next literal

The query ?-arc([br(paul,B)],N) results in the answers

B = Y
N = [br(paul,Z),br(Z,Y)];

B = peter
N = []

as expected.

Note that a query of the form ?-arc(R,[]) asks for a path from R to a success branch in the SLD-tree, thus simulating a query :-R. That is, the above program for arc/2 is simply a meta-interpreter (with the object-level program hardwired in its clauses). In section 5.3, we encounter a similar meta-interpreter for full clausal logic.

4.3   Inheritance hierarchies

In the foregoing sections, we studied two kinds of graphs: trees represented by Prolog terms, and graphs generated by predicate definitions. In both cases, the main inference step is to search for a path satisfying certain conditions. In this section, we study a type of structured knowledge called an inheritance hierarchy, which differs from the previous cases in that it requires a more elaborate kind of reasoning. Basically, this is because a node in such a hierarchy is a more complex entity with various kinds of properties. Lower nodes in the hierarchy inherit properties from ancestor nodes, unless they are assigned a property of their own. Thus, reasoning about inheritance hierarchies not only requires searching for a path, but also collecting properties found along a path.

Figure 4.3. An inheritance hierarchy of musical instruments. Nodes in the tree denote classes; at the bottom, instances for each class are listed.

Fig. 4.3 displays an inheritance hierarchy of a variety of musical instruments. The topmost node represents the class of all instruments in the Universe of Discourse, which has three subclasses: wind instruments, string instruments, and percussion instruments. In turn, wind instruments are divided into woodwinds and brass instruments, and so on. At the bottom of the figure, instances are listed for each most specific subclass. Thus, guitar, lute and harp are instances of the class ‘plucked instruments’, and thus also of the classes ‘string instruments’ and ‘instruments’.

If we want to represent such hierarchies in Prolog, we have to choose a representation for instances and classes. By far the most natural choice is to represent an instance by a constant, and a class by a unary predicate. A class–superclass relation is then expressed by a clause, and an instance–class relation is expressed by a ground fact:

% Classes
instrument(X):-wind(X).        instrument(X):-string(X).
instrument(X):-percussion(X).  wind(X):-woodwind(X).
wind(X):-brass(X).             string(X):-plucked(X).
string(X):-bowed(X).           string(X):-keyboard(X).
percussion(X):-tuned(X).       percussion(X):-untuned(X).

% Instances
woodwind(recorder).     woodwind(flute).
woodwind(oboe).         woodwind(saxophone).
brass(trumpet).         brass(trombone).
brass(horn).            plucked(guitar).
plucked(lute).          plucked(harp).
bowed(violin).          bowed(cello).
keyboard(harpsichord).  keyboard(piano).
tuned(triangle).        tuned(kettledrum).
untuned(cymbal).        untuned(snaredrum).
With these clauses, it is possible to ask questions about instances and (super)classes. For example, we can find out what instruments there are by means of the query

?-instrument(X).
As was remarked above, nodes (and instances) in an inheritance hierarchy can be assigned properties, where a property is an attribute–value pair. For instance, the material an instrument is made of can be an attribute, with possible values ‘wood’ and ‘metal’. The statement ‘saxophones are made of metal’ is represented by the ground fact

material(saxophone,metal).
The statement ‘instances of the class of string instruments are made of wood’ is represented by the clause

material(X,wood):-string(X).
Since string(piano) is a logical consequence of the previous clauses expressing the hierarchy, we can now prove material(piano,wood). Thus, the chosen representation takes care of the inheritance of properties, as required.

In our musical Universe of Discourse, we consider three attributes: the function of an instrument (all instruments have a musical function), the material of an instrument (wood or metal), and the way the instrument produces sound, expressed by the attribute action:


% Functions
function(X,musical):-instrument(X).

% Materials
material(flute,metal).          material(saxophone,metal).
material(X,wood):-woodwind(X).  material(X,metal):-brass(X).
material(X,wood):-string(X).

% Actions
action(oboe,reed(double)).      action(saxophone,reed(single)).
action(X,reed(lip)):-brass(X).
For instance, all brass instruments have lip-reeds, while some woodwinds have a double reed (oboes, for example) or a single reed (saxophones).

Note that there is a potential conflict in the above clauses: woodwinds are generally made of wood, but flutes and saxophones are made of metal. Thus, the query

?-material(flute,M)
has two answers:

M = metal;
M = wood

The order in which these answers are found is, of course, determined by the order of the clauses above. Since we put the ground facts listing properties of instances before the clauses listing properties assigned to classes (and the clauses pertaining to classes before those pertaining to superclasses), the answers are found by climbing the inheritance hierarchy from bottom to top, and the first property found is the desired one. It should be noted, however, that things are not always that simple. If more sophisticated inheritance strategies are needed, alternative representations, like the ones to be discussed later in this section, are to be preferred.

A typical thing one would like to know regarding an inheritance hierarchy is: what are the properties of a given instance? In principle, this requires a second-order query

?-Attr(Inst,Value)
which is not allowed in Prolog if Attr is not instantiated. We can get around this by maintaining a list of all attributes, and constructing the appropriate goal for each attribute by means of the predicate get_value/3:

properties(Inst,Props):-
  attributes(Attrs),              % list of all attributes
  properties(Attrs,Inst,Props).

properties([],_Inst,[]).
properties([Attr|Attrs],Inst,[Attr=Value|Props]):-
  get_value(Attr,Inst,Value),!,   % only first answer
  properties(Attrs,Inst,Props).
properties([_Attr|Attrs],Inst,Props):- % no value for this attribute
  properties(Attrs,Inst,Props).

attributes([function,material,action]).

get_value(A,B,C):-
  Goal =.. [A,B,C],
  call(Goal).
For instance, the query ?-properties(saxophone,P) yields the answer

P = [function=musical,material=metal,action=reed(single)]

Only the most specific property regarding material is found, because of the cut in the recursive clause of properties/3.

As indicated above, the representation of inheritance hierarchies by means of clauses only allows a relatively simple inheritance strategy. Moreover, since classes are represented by predicates, reasoning about classes becomes a second-order logical inference. For example, the question ‘what are the subclasses of the class of instruments’ is not easily handled in the above representation. Both shortcomings can be alleviated if classes and attributes are represented by terms instead of predicates. In effect, this will result in a clearer separation of declarative knowledge describing the hierarchy, and procedural knowledge describing the inheritance strategy. This can be done in several ways; two possibilities are worked out below.

The first idea is to represent the tree in fig. 4.3 according to the first method in section 4.2, i.e. by a set of ground facts listing the arcs in the tree. Thus, nodes (classes) are represented by constants, and arcs (class–superclass relations) are represented by means of the predicate isa/2:

% Classes

isa(wind,instrument).        isa(string,instrument).
isa(percussion,instrument).  isa(woodwind,wind).
isa(brass,wind).             isa(plucked,string).
isa(bowed,string).           isa(keyboard,string).
isa(tuned,percussion).       isa(untuned,percussion).

Instance–class vs. class–superclass

In this representation there appears to be no difference between instance–class relations and class–superclass relations. Indeed, we could have treated instances just as classes, and used the isa/2 predicate for both. However, this obscures the semantic difference between instances and classes, which can lead to problems. For example, instances of one class can be composed of instances of other classes (a bicycle is composed of two wheels and a frame), but this is not true for classes (the class of bicycles is not composed of the class of wheels and the class of frames).

Instances are listed by means of the predicate inst/2:

% Instances

inst(recorder,woodwind).     inst(flute,woodwind).
inst(oboe,woodwind).         inst(saxophone,woodwind).
inst(trumpet,brass).         inst(trombone,brass).
inst(horn,brass).            inst(guitar,plucked).
inst(lute,plucked).          inst(harp,plucked).
inst(violin,bowed).          inst(cello,bowed).
inst(harpsichord,keyboard).  inst(piano,keyboard).
inst(triangle,tuned).        inst(kettledrum,tuned).
inst(cymbal,untuned).        inst(snaredrum,untuned).

The difference between inheritance hierarchies and ordinary graphs lies in the additional meaning assigned to classes and instances by means of properties. Therefore, a graph extended with properties is commonly called a semantic network. Properties are represented by means of the predicate prop/3:

% Class properties
prop(instrument,function,musical).
prop(string,material,wood).
prop(woodwind,material,wood).
prop(brass,material,metal).
prop(brass,action,reed(lip)).

% Instance properties
prop(flute,material,metal).
prop(saxophone,material,metal).
prop(oboe,action,reed(double)).
prop(saxophone,action,reed(single)).
Since we will be using a more sophisticated inheritance strategy, the order of these facts is now immaterial.

The inheritance strategy is to collect the properties of instances before properties inherited from classes:

properties_sn(Inst,Props):-
  inst(Inst,Class),        % Inst is an instance of Class
  props(Inst,InstProps),   % properties of instance
  inherit_sn(Class,InstProps,Props).% inherit the rest

props(IC,Props):-          % all properties of instance or class IC
  findall(Attr=Value,prop(IC,Attr,Value),Props).
In turn, inherited properties are collected from bottom to top in the hierarchy, so that specific properties are found before general properties:

inherit_sn(top,Props,Props).  % top of the hierarchy
inherit_sn(Class,SpecificProps,AllProps):-
  props(Class,GeneralProps),% properties of this class
  override(SpecificProps,GeneralProps,Props), % most specific wins
  isa(Class,SuperClass),    % climb hierarchy
  inherit_sn(SuperClass,Props,AllProps).% inherit rest

top refers to the root of the universal inheritance hierarchy, which should be added as the root of any sub-hierarchy:

isa(instrument,top).
The predicate override/3 checks for every general property whether a more specific property has already been found. If so, we say that the specific property overrides the general property:

override(Props,[],Props).
override(Specific,[Attr=_Val|General],Props):-
  member(Attr=_V,Specific),        % overriding
  override(Specific,General,Props).
override(Specific,[Attr=Val|General],[Attr=Val|Props]):-
  not(member(Attr=_V,Specific)),   % no overriding
  override(Specific,General,Props).

Again, the query ?-properties_sn(saxophone,P) yields the answer

P = [function=musical,material=metal,action=reed(single)]

What we gained with this representation, however, is a declarative specification of the inheritance strategy, which is therefore also amenable to change. For instance, if the inheritance hierarchy is not a tree, a class could be a subclass of two or more other classes. In this case, different values for the same attribute could be inherited along different paths; this is called multiple inheritance. Such conflicts need to be resolved (or at least signalled) by the inheritance strategy.

Exercise 4.7. Implement a multiple inheritance strategy.

A slightly different but related representation is obtained if we group all information about one class or instance together in a so-called frame. A frame representation is obtained from the semantic network representation by adding a list of properties to each arc in the network. Below, class frames are defined by the predicate class/3, and instance frames are defined by the predicate instance/3:

% Classes
class(instrument,top,[function=musical]).
class(wind,instrument,[]).
class(string,instrument,[material=wood]).
class(woodwind,wind,[material=wood]).
class(brass,wind,[material=metal,action=reed(lip)]).
/* etcetera... */

% Instances
instance(flute,woodwind,[material=metal]).
instance(oboe,woodwind,[action=reed(double)]).
instance(saxophone,woodwind,
         [material=metal,action=reed(single)]).
/* etcetera... */



Inheritance is as easily implemented as in the semantic network representation:

properties_fr(Inst,Props):-
  instance(Inst,Class,InstProps),       % instance properties
  inherit_fr(Class,InstProps,Props).    % inherit the rest

inherit_fr(top,Props,Props).            % top of the hierarchy
inherit_fr(Class,SpecificProps,AllProps):-
  class(Class,SuperClass,GeneralProps),         % this class
  override(SpecificProps,GeneralProps,Props),   % most specific wins
  inherit_fr(SuperClass,Props,AllProps).        % inherit rest

Historically, semantic network and frame-based representations were proposed in quite different contexts. We see that their representation in Prolog is very similar.

Further reading

An introduction to Knowledge Representation can be found in (Ringland & Duce, 1989). (Brachman & Levesque, 1985) is a collection of papers discussing various aspects of Knowledge Representation, such as the difference between isa-links and instance-of-links in semantic networks. Papers about inheritance hierarchies can be found in (Lenzerini et al., 1991). LOGIN is an extension of Prolog in which inheritance is represented by terms rather than clauses (Aït-Kaci & Nasr, 1986).

H. Aït-Kaci & R. Nasr (1986), ‘LOGIN: a logic programming language with built-in inheritance’, Journal of Logic Programming 1986(3): 185-215.

R.J. Brachman & H.J. Levesque (eds) (1985), Readings in Knowledge Representation, Morgan Kaufmann.

M. Lenzerini, D. Nardi & M. Simi (eds) (1991), Inheritance Hierarchies in Knowledge Representation and Programming Languages, John Wiley.

G.A. Ringland & D.A. Duce (eds) (1989), Approaches to Knowledge Representation: an Introduction, Research Studies Press.


Searching graphs

As explained earlier, a search problem is defined by a search space, which is a graph with one or more starting nodes and one or more goal nodes. Given a search space, a solution is a path from a starting node to a goal node. A cost function c assigns a number to each arc from n1 to n2, specifying the cost of moving from n1 to n2. The cost of a path is the sum of the costs of the arcs in the path. Given a search space and a cost function, an optimal solution is a solution with minimal cost. A trivial example of a cost function is c(a) = 1 for each arc a, in which case the cost of a path equals the length of the path, and an optimal solution is a shortest path. For SLD proofs, such a cost function would measure the depth of the proof tree.
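The cost of a path is easily computed along these lines. In the following sketch, the predicate names path_cost/2 and cost/3 are assumptions made for illustration; cost/3 is taken to be given by facts describing the cost of each arc:

```prolog
% cost(N1,N2,C) <- assumed facts: moving from node N1 to N2 costs C
% path_cost(Path,Cost) <- Cost is the sum of the arc costs along Path
path_cost([_Node],0).             % a single node costs nothing
path_cost([N1,N2|Nodes],Cost):-
  cost(N1,N2,C),                  % cost of the first arc
  path_cost([N2|Nodes],Cost1),    % cost of the rest of the path
  Cost is C+Cost1.
```

With the trivial cost function cost(_N1,_N2,1), the cost of a path is one less than its length.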

In this chapter, we will discuss and implement some basic techniques for finding solutions in search spaces. Their common denominator is that they are exhaustive: that is, in the worst case they will eventually visit every node in the search space along every possible path, before finding a solution. On the other hand, they differ with regard to:

• completeness — will a solution always be found?

• optimality — will shorter paths be found before longer ones?

• efficiency — what are the runtime and memory requirements?

We start with a general discussion of the problem of search. Then, we will discuss the basic exhaustive search strategies: depth-first search, breadth-first search, and forward chaining.

5.1   A general search procedure

Imagine a visit with a friend to the Staatsgalerie in Stuttgart. It is very crowded in this beautiful art museum, and while admiring the Mondriaan works you lose sight of each other. Having been through situations like this before, you had agreed that she would stay where she was, while you would go looking for her. What strategy would you employ?

First of all, to make sure that you don’t miss any room, you have to visit them in some systematic way. You don’t have a global map of the building, so you decide to never leave a room through the door through which you entered. Thinking about it, you recognise that this procedure won’t fully work, because a room might have just one door: the one through which you entered. Assuming that there are still rooms not yet visited, you have to leave such a room through the same door through which you entered, and find a room you’ve visited before, with a door not yet taken. Such a procedure, however, requires that, for each room you visit, you remember the door through which you entered the room (in order to go back to a room you’ve been in before), and the doors you tried already (in order to try a remaining door).

Figure 5.1. Searching for a friend.

Luckily enough, you carry a piece of paper and a pencil with you, so you can stick little papers saying ‘entrance’ or ‘exit’ on the appropriate doors. However, the amount of paper you have is limited, so a better idea is to mark the doors not yet tried, and to remove the paper when you try a door, so that you can use the paper again. By reusing those pieces of paper that become obsolete, you minimise the amount of paper needed. Similarly, if you return to a room in which there are no remaining doors, you will never return to that room, so you might want to remove the paper saying ‘entrance’ as well. On the other hand, leaving one paper might be a good idea, just in case you return to the room later via a ‘circular’ route; you are then able to see that you already tried all the doors in that room.

So you decide to employ the following procedure:

1.   mark every door in the starting room as ‘exit’;

2.   examine the current room;

3.   if you find your friend, stop;

4.   otherwise, if there are any doors marked ‘exit’ in the room,

4a.  choose one of them;

4b.  remove the mark ‘exit’;

4c.  go through it;

4d.  if one of the doors in this room is already marked ‘entrance’, go back to the previous room, and go to step 4;

4e.  otherwise, mark the door you just came through as ‘entrance’;

4f.  mark all other doors as ‘exit’;

4g.  go to step 2;

5.   otherwise, take the door marked ‘entrance’, and go to step 4.

Figure 5.2. You find her by systematically searching the rooms, backtracking when all the rooms reachable from the room you’re in have been visited already (thin lines).

Steps 1-3 are obvious enough. In step 4, you check whether there are any untried doors left; if not, you have to go back to a previously visited room, and do the same there (step 5). This process of reconsidering previous decisions is called backtracking. It is an essential step in any exhaustive search procedure. If there are any alternatives left, you have to check whether you have been there already via some other route (step 4d). This step is called loop detection, and is only needed for cyclic search spaces. If you omit this step in such cases, you risk walking in circles forever. If you are in a yet unvisited room, you do some bookkeeping and proceed in the same way.

How does this search procedure work in practice? Suppose you are in the Miró room (fig. 5.1). You decide to try the doors in that room in a clockwise order. You first check the Léger room, then the Kupka room, and finally the Kandinsky room. When you enter the Léger room again from Kandinsky, you realise that you’ve been there before, because there’s a door marked ‘entrance’. So you backtrack to Léger (because there are no alternatives left in Kandinsky and Kupka), and try the next door. This one leads you straight to Kandinsky again, and your little papers remind you that you have been there already. You backtrack again to Léger, and try the Matisse room. From there, Klee is a dead end, so you backtrack and finally find your friend still admiring the Mondriaan paintings! The route you walked is shown in fig. 5.2 (thin lines denote backtracking).

In a computer implementation of such a search procedure, you don’t walk from room to room. Instead of marking nodes and returning to them later, the search program stores a description of those nodes in memory. In the above example, the number of marks needed corresponds to the amount of memory required during search, and just as marks can be used several times, memory space can be reclaimed once all the children of a node have been put on the list. This list of nodes to be tried next is called the agenda; this is an important concept, which can be used to describe any backtracking search procedure. Such a general-purpose agenda-based search algorithm operates as follows (for simplicity, we have omitted loop detection):

1.   take the next node from the agenda;

2.   if it is a goal node, stop;

3.   otherwise,

3a.  generate its children;

3b.  put them on the agenda;

3c.  go to step 1.

This procedure can be almost directly translated into a Prolog program:

% search(Agenda,Goal) <- Goal is a goal node, and a
%                       descendant of one of the nodes
%                       on the Agenda



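One way to fill in this translation is the following sketch, which deliberately leaves the agenda representation abstract: next/3 is assumed to pick the next node from the agenda, and add/3 to store new nodes on it (both names are assumptions here, to be instantiated by a particular search strategy).

```prolog
% search(Agenda,Goal) <- Goal is a goal node, and a descendant
%                        of one of the nodes on the Agenda
search(Agenda,Goal):-
    next(Agenda,Goal,_Rest),      % take the next node
    goal(Goal).                   % it is a goal node: stop
search(Agenda,Goal):-
    next(Agenda,Current,Rest),
    children(Current,Children),   % generate its children
    add(Children,Rest,NewAgenda), % put them on the agenda
    search(NewAgenda,Goal).
```

Defining next/3 and add/3 to operate on the front or the back of a list then yields the specific strategies discussed below.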
In this program, we have abstracted from the way the agenda is represented. Furthermore, as remarked above, by specifying the order in which nodes are added to and removed from the agenda, we obtain specific search strategies. In the Staatsgalerie example, doors marked most recently are tried first. In other words, the agenda is a last in–first out datastructure, or a stack. In this example, it seems the most reasonable approach, because it minimises the amount of walking needed to backtrack to another room. The result is a depth-first search procedure, moving away as quickly as possible from the initial room, only coming closer again when backtracking.

On the other hand, the shortest path between your initial position and your friend is Miró - Mondriaan, while you finally reach your friend along the path Miró - Léger - Matisse - Mondriaan [15]. You would have found your friend sooner had you examined all rooms next to Miró first. But suppose your friend was two rooms away, e.g. in the Matisse room? In that case you would have gone to the rooms next to Miró (Léger and Mondriaan), and then to all rooms next to those (Kupka, Kandinsky and Matisse). That is, doors marked most recently are tried last: a first in–first out strategy, implemented by a datastructure called a queue. Thus you would have found your friend along one of the two shortest paths (Miró - Léger - Matisse). This second method is an example of breadth-first search.

Finally, a third approach, called best-first search, orders the doors to be tried next according to some criterion called a heuristic. For instance, suppose you saw your friend last in the Mondriaan room. In this case it would be wise to overrule the default clockwise ordering, and to try Mondriaan before Léger. Consequently, you would have found your friend along the path Miró - Mondriaan - Matisse. In the following sections, we will take a closer look at depth-first and breadth-first search. The use of heuristics will be studied in Chapter 6.

We obtain a depth-first search strategy if the agenda is implemented as a last in–first out datastructure. The obvious way to do this in Prolog is to represent the agenda by a list of nodes, and to add and remove nodes from the front of the list:

% depth-first search



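A minimal sketch of such a depth-first search predicate (goal/1 and children/2 are assumed to be supplied by the search problem at hand):

```prolog
% depth-first search
search_df([Goal|_Rest],Goal):-
    goal(Goal).
search_df([Current|Rest],Goal):-
    children(Current,Children),
    append(Children,Rest,NewAgenda),  % new children in front
    search_df(NewAgenda,Goal).
```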
The children/2 predicate finds all children of a given node. If arcs in the search space are defined as before by the arc/2 predicate, we could define children/2 as


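A one-clause sketch, collecting all arc/2 successors of a node by means of findall/3:

```prolog
children(Node,Children):-
    findall(C,arc(Node,C),Children).
```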
In this way, all children of the current node are generated and stored on the agenda before examining the next node.

% an example of depth-first search


% nodes are lists of letters

% find palindromes

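The example definitions were not shown above; the following is a sketch consistent with the surrounding text, in which nodes are lists of letters grown one letter at a time (the alphabet [a,b,c] and the minimum palindrome length are assumptions for illustration; the depth bound length(T,N), N<11 is taken from the later variants of this example).

```prolog
% an example of depth-first search
% nodes are lists of letters
arc(T,[H|T]):-
    length(T,N), N<11,      % depth bound
    member(H,[a,b,c]).

% find palindromes
goal(L):-
    length(L,N), N>1,       % avoid trivial palindromes
    reverse(L,L).
```

A query such as ?-search_df([[]],Goal) would then enumerate palindromes by depth-first search.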
This depth-first search program can be refined in several ways, of which we will consider two: returning a path to the goal, and loop detection. In the above implementation, it is impossible to return a path if we discover a goal node on the agenda, because we do not know how that goal node was reached. Instead of putting a single node on the agenda, we will store a complete path to that node. This is simply accomplished by changing the children/2 predicate as follows:


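A sketch of this changed predicate: each agenda item is now a path, represented as a list of nodes with the most recent node in front.

```prolog
children([Node|Path],Children):-
    findall([C,Node|Path],arc(Node,C),Children).
```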
Of course, the goal/1 predicate must be changed accordingly, because its argument is now a path instead of a single node. A query now takes the form


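For instance, with a hypothetical starting node initial_node, such a query might read:

```prolog
% the agenda initially contains a single one-node path
?-search_df([[initial_node]],PathToGoal).
```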
The second refinement concerns loop detection. In order to check whether a node has been investigated before, we must maintain a list of visited nodes. We only add nodes to the agenda which do not already occur on this list (or on the agenda):

% depth-first search with loop detection




not element(Child,OldAgenda),
not element(Child,Visited),



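Putting the pieces together, such a predicate could look as follows (a sketch built around the two membership tests shown above; element/2 is assumed to test list membership, as in Appendix A.2):

```prolog
% depth-first search with loop detection
search_df_loop([Goal|_Rest],_Visited,Goal):-
    goal(Goal).
search_df_loop([Current|Rest],Visited,Goal):-
    children(Current,Children),
    add_df(Children,Rest,Visited,NewAgenda),
    search_df_loop(NewAgenda,[Current|Visited],Goal).

% add children not yet visited and not yet on the agenda
add_df([],Agenda,_Visited,Agenda).
add_df([Child|Children],OldAgenda,Visited,[Child|NewAgenda]):-
    not element(Child,OldAgenda),
    not element(Child,Visited),
    add_df(Children,OldAgenda,Visited,NewAgenda).
add_df([Child|Children],OldAgenda,Visited,NewAgenda):-
    element(Child,OldAgenda),
    add_df(Children,OldAgenda,Visited,NewAgenda).
add_df([Child|Children],OldAgenda,Visited,NewAgenda):-
    element(Child,Visited),
    add_df(Children,OldAgenda,Visited,NewAgenda).
```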
Note that the combination of loop detection and path construction allows the following optimisation: instead of maintaining complete paths to a node on the agenda and the list of visited nodes, we only store a node together with its parent. Once we encounter a goal, all its parents are on the list of visited nodes, which allows us to reconstruct the path.

Exercise 5.1. Modify the predicate search_df_loop/3 such that it reconstructs the path to a goal in this way.

We now analyse depth-first search with respect to completeness, optimality and efficiency. A search strategy is complete if it is guaranteed to find every goal. Obviously, any exhaustive strategy is complete for finite search spaces. However, in an infinite search space depth-first search might get trapped in an infinite branch before having found all the solutions. For instance, reconsider the infinite SLD-tree in fig. 3.2. A left-to-right depth-first search strategy would dive deeper and deeper into the tree, taking the left branch at every node, and never find the goals in the branches to the right. So, depth-first search is, in general, incomplete. Since Prolog itself employs depth-first search, Prolog is also incomplete. Often, however, the incompleteness of Prolog can be avoided by reordering the clauses such that goals are found before infinite branches (for instance, by putting the recursive clause last), and by cutting away the infinite parts of the search space.

If there is no cost function, a search strategy is optimal if it is guaranteed to reach any goal along the shortest path possible. The Staatsgalerie example already showed that this is not true for depth-first search: you found your friend but, while she was in a room next to your initial position, you finally reached that room through two other rooms. Thus, depth-first search does not always find a shortest solution path. Finally, we can estimate the memory requirements for depth-first search as follows. Suppose we are searching a tree in which each node has, on the average, B children. The number B is known as the branching factor. Generating the children of a node adds B nodes to the agenda. We are interested in the following question: if a goal is found at depth n (i.e. the path from the root to the goal has length n), how many nodes are there on the agenda? Since at each level only the children of a single node are generated, the size of the agenda is of the order B × n, that is, a linear function of the depth of the tree. The time complexity of depth-first search is of the order B^n, since the runtime is proportional to the number of nodes searched, and in the worst case the goal is found in the last branch, after searching B^n nodes. Of course, we cannot hope to achieve any better for blind exhaustive search!

In practice, depth-first search is only implemented as above if loop detection is an absolute must. Otherwise, the agenda is represented implicitly by means of Prolog’s internal goal stack. Children of a given node are generated one at a time, by means of Prolog’s backtracking mechanism, and examined immediately upon generation:

% depth-first search by means of backtracking



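A sketch of this backtracking variant, in which the agenda is left implicit and alternatives are generated one at a time through arc/2:

```prolog
% depth-first search by means of backtracking
search_bt(Goal,Goal):-
    goal(Goal).
search_bt(Current,Goal):-
    arc(Current,Child),     % one child at a time, on backtracking
    search_bt(Child,Goal).
```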
If there is a chance that the search program gets trapped in an infinite loop, it might be a good idea to employ a predefined depth bound:

% backtracking depth-first search with depth bound


D>0, D1 is D-1,

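Completing the fragment shown above, such a bounded search could be sketched as follows (arc/2 and goal/1 as before):

```prolog
% backtracking depth-first search with depth bound
search_d(_D,Goal,Goal):-
    goal(Goal).
search_d(D,Current,Goal):-
    D>0, D1 is D-1,         % decrement the depth bound
    arc(Current,Child),
    search_d(D1,Child,Goal).
```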
In this way the search process is guaranteed to halt, but solutions which appear beyond the depth bound are missed.

Iterative deepening is a form of depth-first search which employs a depth bound that is increased on each iteration. That is, after performing a depth-first search with depth bound d, search starts all over again from the starting nodes with an increased depth bound d + n. The predicate search_id/2 implements iterative deepening for n = 1.

% iterative deepening

search_id(1,First,Goal).     % start with depth 1


D1 is D+1,                   % increase depth

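Assembling the fragments above, the predicate can be sketched as follows (it relies on the depth-bounded search_d/3 just discussed):

```prolog
% iterative deepening
search_id(First,Goal):-
    search_id(1,First,Goal).     % start with depth 1
search_id(D,Current,Goal):-
    search_d(D,Current,Goal).    % bounded depth-first search
search_id(D,Current,Goal):-
    D1 is D+1,                   % increase depth
    search_id(D1,Current,Goal).
```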
% an example of iterative deepening
	search_id(1,First,Goal).     % start with depth 1

	D1 is D+1,                   % increase depth

% backtracking depth-first search with depth bound
	D>0, D1 is D-1,


% nodes are lists of letters
	%length(T,N),N<11,	% depth bound no longer needed

% find palindromes

A big advantage of iterative deepening over simple depth-first search is that iterative deepening is complete: it will find all the goals at depth d or less before proceeding to depth d + n. Moreover, if we set the depth increment n to 1, iterative deepening is also optimal: it will find shorter paths first. A disadvantage of iterative deepening is that the upper parts of the search space are searched more than once (and goals in those upper parts are found more than once as well).

Breadth-first search is realised by implementing the agenda as a first in–first out datastructure. That is, while removing nodes from the front of the list, they are added at the end:

% breadth-first search



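A minimal sketch, differing from the depth-first version only in the order of the arguments of append/3:

```prolog
% breadth-first search
search_bf([Goal|_Rest],Goal):-
    goal(Goal).
search_bf([Current|Rest],Goal):-
    children(Current,Children),
    append(Rest,Children,NewAgenda),  % new children at the back
    search_bf(NewAgenda,Goal).
```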
% an example of breadth-first search


% nodes are lists of letters
	length(T,N),N<11,	% try removing the depth bound

% find palindromes

Exercise 5.2. Implement the predicate term_write_bf/1, which writes the tree represented by a term from the root downward (as opposed to the predicate term_write/1 of section 4.1, which writes from left to right). Employ breadth-first search with two agendas, one for nodes at depth n and the other for nodes at depth n +1.

In breadth-first search, the agenda is implemented as a queue. This means that the nodes on the agenda are ordered according to increasing depth: all the nodes at depth n occur before the nodes at depth n+1. This has profound consequences for the properties of breadth-first search. First of all, breadth-first search is complete, even for infinite search spaces, because every goal at depth n will be found before descending to depth n+1. Secondly, breadth-first search always finds a shortest solution path. It may seem that breadth-first search is much better than depth-first search, but like every coin this one has a reverse side: the number of nodes at depth n is B^n, so that breadth-first search requires much more memory than depth-first search.

We will now show how to change Prolog into a complete SLD prover, by employing breadth-first search. We start from the meta-interpreter prove_r/1 given in section 3.8:




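For reference, this meta-interpreter can be sketched as follows (conj_append/3, which concatenates two conjunctions, is assumed to be available from Appendix A.2):

```prolog
prove_r(true):-!.
prove_r((A,B)):-!,
    clause(A,C),
    conj_append(C,B,D),     % resolve the leftmost literal
    prove_r(D).
prove_r(A):-
    clause(A,B),
    prove_r(B).
```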
As explained in that section, this meta-interpreter operates on the complete resolvent, which is exactly what we need. This predicate is turned into an agenda-based depth-first search procedure as follows:

% agenda-based version of prove_r/1





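The agenda-based version can be sketched as follows (again assuming conj_append/3 from Appendix A.2; the agenda is a list of resolvents, with new resolvents added in front):

```prolog
% agenda-based version of prove_r/1
prove_df(Goal):-
    prove_df_a([Goal]).

prove_df_a([true|_Agenda]).
prove_df_a([(A,B)|Agenda]):-!,
    findall(D,(clause(A,C),conj_append(C,B,D)),Children),
    append(Children,Agenda,NewAgenda),  % depth-first
    prove_df_a(NewAgenda).
prove_df_a([A|Agenda]):-
    findall(B,clause(A,B),Children),
    append(Children,Agenda,NewAgenda),  % depth-first
    prove_df_a(NewAgenda).
```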
The changes are relatively straightforward: all solutions to the calls in the bodies of the second and third prove_r clauses are collected by means of the predicate findall/3, and added to the front of the agenda.

In order to search in a breadth-first fashion, we swap the first two arguments of the append/3 literals. One additional improvement is required, since prove_df/1 succeeds for every proof that can be found, but it does not return an answer substitution for the variables in the query. This is because the call findall(X,G,L) creates new variables for the unbound variables in the instantiation of X before putting it in the list L. In order to obtain an answer substitution, we should maintain the agenda as a list of pairs

a(Resolvent,OrigGoal)
where OrigGoal is a copy of the original goal. To illustrate this, suppose the following clauses are given:


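The clauses were not listed above; the following reconstruction is consistent with the agenda trace below:

```prolog
likes(peter,S):-student(S),friendly(S).
likes(X,Y):-friend(Y,X).
student(maria).
student(paul).
friendly(maria).
friend(paul,peter).
```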
Below, the agenda obtained after each breadth-first search iteration is given for the query ?‑likes(X,Y):

[ a((student(Y1),friendly(Y1)), likes(peter,Y1)),
 a(friend(Y2,X2), likes(X2,Y2)) ]

[ a(friend(Y2,X2), likes(X2,Y2)),
 a(friendly(maria), likes(peter,maria)),
 a(friendly(paul), likes(peter,paul)) ]

[ a(friendly(maria), likes(peter,maria)),
 a(friendly(paul), likes(peter,paul)),
 a(true, likes(peter,paul)) ]

[ a(friendly(paul), likes(peter,paul)),
 a(true, likes(peter,paul)),
 a(true, likes(peter,maria)) ]

[ a(true, likes(peter,paul)),
 a(true, likes(peter,maria)) ]

Here, Y1, X2 and Y2 denote new variables introduced by findall/3. It can be clearly seen that for each item a(R,G) on the agenda, R and G share the right variables — thus, whenever the resolvent gets more instantiated during the proof, the corresponding copy of the goal is instantiated correspondingly. In particular, if the empty clause is found on the agenda in the form of a term a(true,Goal), then Goal will contain the correct answer substitutions.

The final, complete SLD prover looks as follows:

% breadth-first version of prove_r/1 + answer substitution



append(Agenda,Children,NewAgenda),% breadth-first

append(Agenda,Children,NewAgenda),% breadth-first

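Combining these ideas, the prover can be sketched as follows (a reconstruction around the two append/3 calls shown above; conj_append/3 as in Appendix A.2):

```prolog
% breadth-first version of prove_r/1 + answer substitution
prove_bf(Goal):-
    prove_bf_a([a(Goal,Goal)],Goal).

prove_bf_a([a(true,Goal)|_Agenda],Goal).
prove_bf_a([a((A,B),G)|Agenda],Goal):-!,
    findall(a(D,G),(clause(A,C),conj_append(C,B,D)),Children),
    append(Agenda,Children,NewAgenda),% breadth-first
    prove_bf_a(NewAgenda,Goal).
prove_bf_a([a(A,G)|Agenda],Goal):-
    findall(a(B,G),clause(A,B),Children),
    append(Agenda,Children,NewAgenda),% breadth-first
    prove_bf_a(NewAgenda,Goal).
```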
Notice that this program is able to find alternative solutions, since it will backtrack from the first clause into the third and, being unable to find a clause for the predicate true/0, findall/3 will generate an empty list of children and search will proceed with the rest of the agenda.

Exercise 5.3. Consider the following program:
Compare and explain the behaviour of prove_bf/1 and Prolog on the query ?‑brother(peter,adrian). Can you re-order the clauses, such that Prolog succeeds?

As a second, related example of a breadth-first search program, we give a program for finding refutation proofs in full clausal logic. Object-level clauses are given by the predicate cl/1. Note that true denotes the empty body, while false denotes the empty head; thus, false:-true denotes the empty clause.

% refute_bf(Clause) <- Clause is refuted by clauses
%                     defined by cl/1
%                     (breadth-first search strategy)



append(Rest,Children,NewAgenda),  % breadth-first

% resolve(C1,C2,R) <- R is the resolvent of C1 and C2.






%%% disj_append/3, conj_remove_one/3: see Appendix A.2
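The agenda loop of this prover can be sketched as follows; the clauses of resolve/3 are not reconstructed here, and the a(Resolvent,OrigClause) pairs play the same role as in prove_bf/1 above.

```prolog
% refute_bf(Clause) <- Clause is refuted by clauses
%                      defined by cl/1 (breadth-first)
refute_bf(Clause):-
    refute_bf_a([a(Clause,Clause)],Clause).

refute_bf_a([a((false:-true),Clause)|_Rest],Clause).  % empty clause
refute_bf_a([a(A,C)|Rest],Clause):-
    findall(a(R,C),(cl(Cl),resolve(A,Cl,R)),Children),
    append(Rest,Children,NewAgenda),  % breadth-first
    refute_bf_a(NewAgenda,Clause).
```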

For instance, given the following clauses:


and the query ?-refute_bf((false:-bachelor(X))) (refute that no-one is a bachelor), the program answers X=paul. Note that there are many proofs for this answer!

Exercise 5.4. Extend the meta-interpreter, such that it returns a proof tree (see section 3.8). In order to ensure correct variable substitutions, each item on the agenda must be extended with a partial proof tree.

As a search program, the above program is complete. As a theorem prover, however, the program is incomplete. This is due to the resolution strategy used, in which every resolvent has at least one given clause as its parent. This strategy is called input resolution; it is refutation complete for definite clauses, but not for indefinite clauses.

5.4   Forward chaining

Search programs involving if-then rules, such as meta-interpreters and theorem provers, can use these rules in either of two directions: from body to head or forward, and from head to body or backward. The meta-interpreters we encountered up till now apply clauses backward, just like Prolog; they are said to perform backward chaining. For checking if a given formula follows logically from a given theory, this is usually the best strategy.

However, in some cases we must perform forward chaining instead, because we do not have a goal to start from. For instance, consider the problem of constructing a model of a given theory. It would not be feasible to generate all the ground atoms in the Herbrand base and follow the chains back to the theory. Rather, we would generate the model incrementally by forward chaining. The procedure is as follows:

(i)   search for a violated clause of which the body is true in the current model, but the head is not (such a clause is said to fire);

(ii)  add a literal from the head to the model [16] .

By step (ii), the head (a disjunction) is made true in the model, so that this clause is no longer violated. The procedure iterates back to step (i); if no violated clauses remain, the model is complete.

The program for model generation by forward chaining is given below. It is a fairly simple forward chainer, in the sense that it simply chooses the first clause which fires. More sophisticated forward chainers use conflict resolution strategies in order to choose among the rules which fire at a certain stage.

% model(M) <- M is a model of the clauses defined by cl/1
model(M):-
	model([],M).
model(M0,M):-
	is_violated(Head,M0),!,% instance of violated clause
	disj_element(L,Head),  % L: ground literal from head
	model([L|M0],M).       % add L to the model
model(M,M).                 % no more violated clauses

is_violated(H,M):-
	cl((H:-B)),
	satisfied_body(B,M),   % grounds the variables
	\+ satisfied_head(H,M).

satisfied_body(true,_M).     % body is a conjunction
satisfied_body(A,M):- element(A,M).
satisfied_body((A,B),M):- element(A,M), satisfied_body(B,M).

satisfied_head(A,M):- element(A,M).  % head is a disjunction
satisfied_head((A;_B),M):- element(A,M).
satisfied_head((_A;B),M):- satisfied_head(B,M).

%%% from Appendix A.2
disj_element(X,X):- \+ X=(_;_).  % single-element disjunction
disj_element(X,(X;_Ys)).
disj_element(X,(_Y;Ys)):- disj_element(X,Ys).

%%% Example disjunctive clauses to compute model for

Given the following clauses:


and the query ?-model(M), the program constructs the following models (on backtracking):

M = [has_wife(paul),married(paul),adult(paul),man(paul)];

M = [bachelor(paul),adult(paul),man(paul)]

Notice that these are the two minimal models of the program.

Exercise 5.5. Give the remaining models of the program.

Not every model generated by model/1 is minimal. Consider the following set of clauses:


is_violated/2 will first succeed for the third clause, returning the instantiated head teacher(peter);friendly(maria). The first literal in this head will be added to the model. Next, the fourth clause is violated, and friendly(maria) is added to the model. This results in the following model:


However, this is not a minimal model since teacher(peter) can be removed from it, yielding the model


which will be returned as the second answer.

Exercise 5.6. Are all minimal models always constructed by model/1?

It should be noted that the program only works properly for a restricted class of clauses, namely those clauses for which grounding the body also grounds the head. Otherwise, a head literal from a violated clause might still contain variables. Adding a non-ground literal to the model could result in incorrect behaviour. Consider the following set of clauses:


Since the first clause is violated by the empty model, the program will attempt to add man(X) to the model. This leads to the second clause being violated, and since this clause has an empty head, it cannot be satisfied by adding a literal to the model. Upon backtracking woman(X) is tried instead, but this leads to a similar problem with the third clause. Consequently, model/1 will fail to construct a model, although there exists one, namely { man(peter), woman(maria) }.

The solution is to add a literal to the body of the first clause, which serves to enumerate the possible values for X:


In this way, the first clause is violated only under the substitutions {X → peter} and {X → maria}. Thus, all literals which are added to the model are ground, and the program constructs the correct model


Clauses of which all variables in the head occur also in the body are called range-restricted. Every set of clauses can be transformed into a set of range-restricted clauses by adding domain predicates enumerating the domains of variables, as above. The two sets of clauses are equivalent in the sense that there exists a one-to-one correspondence between their models:

•     any model of the original clauses provides an enumeration of all the domains;

•     any model of the range-restricted clauses can be transformed to a model of the original clauses by dropping the domain literals.

Obviously, model/1 loops if the model being constructed is infinite. This will happen, for instance, with the following set of clauses, representing a range-restricted version of the append predicate:


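The clause set was not listed above; one possible reconstruction, consistent with the submodel computed below (assuming the domain {a,b,c}), is:

```prolog
cl((thing(a):-true)).
cl((thing(b):-true)).
cl((thing(c):-true)).
cl((list([]):-true)).
cl((list([X|Y]):-thing(X),list(Y))).
cl((append([],Y,Y):-list(Y))).
cl((append([X|Xs],Ys,[X|Zs]):-thing(X),append(Xs,Ys,Zs))).
```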
Instead of the complete, infinite model, we might be interested in a subset over a universe of lists up to a given length. Such a ‘submodel’ can be computed by a forward chaining procedure which stops after a prespecified number of steps. In this way, the procedure gets more of a ‘breadth-first’ flavour. The program is given below:

% model_d(D,M) <- M is a submodel of the clauses
%                defined by cl/1



D>0,D1 is D-1,



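A sketch of the bounded forward chainer: on each iteration all currently violated clause instances are collected, each is satisfied by adding one head literal, and the depth counter is decremented (is_violated/2 and disj_element/2 as before).

```prolog
% model_d(D,M) <- M is a submodel of the clauses
%                 defined by cl/1
model_d(D,M):-
    model_d(D,[],M).

model_d(0,M,M).
model_d(D,M0,M):-
    D>0, D1 is D-1,
    findall(H,is_violated(H,M0),Heads),
    satisfy_clauses(Heads,M0,M1),   % satisfy all violated clauses
    model_d(D1,M1,M).

satisfy_clauses([],M,M).
satisfy_clauses([H|Hs],M0,M):-
    disj_element(L,H),
    satisfy_clauses(Hs,[L|M0],M).
```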
model/1 is replaced by model_d/2, which has an additional depth parameter. On each iteration, all the violated clauses are generated and satisfied.

Below, we illustrate the operation of the program on the above set of clauses, setting the depth to 4:


M = [list([a,c,a]), list([a,c,b]), list([a,c,c]),% D=4 %
list([a,b,a]), list([a,b,b]), list([a,b,c]),
list([a,a,a]), list([a,a,b]), list([a,a,c]),
list([b,c,a]), list([b,c,b]), list([b,c,c]),
list([b,b,a]), list([b,b,b]), list([b,b,c]),
list([b,a,a]), list([b,a,b]), list([b,a,c]),
list([c,c,a]), list([c,c,b]), list([c,c,c]),
list([c,b,a]), list([c,b,b]), list([c,b,c]),
list([c,a,a]), list([c,a,b]), list([c,a,c]),
append([a],[a],[a,a]), append([a],[b],[a,b]),
append([a],[c],[a,c]), append([a,c],[],[a,c]),
append([a,b],[],[a,b]), append([a,a],[],[a,a]),
append([b],[a],[b,a]), append([b],[b],[b,b]),
append([b],[c],[b,c]), append([b,c],[],[b,c]),
append([b,b],[],[b,b]), append([b,a],[],[b,a]),
append([c],[a],[c,a]), append([c],[b],[c,b]),
append([c],[c],[c,c]), append([c,c],[],[c,c]),
append([c,b],[],[c,b]), append([c,a],[],[c,a]),
append([],[c,a],[c,a]), append([],[c,b],[c,b]),
append([],[c,c],[c,c]), append([],[b,a],[b,a]),
append([],[b,b],[b,b]), append([],[b,c],[b,c]),
append([],[a,a],[a,a]), append([],[a,b],[a,b]),
list([a,c]), list([a,b]), list([a,a]),      % D=3 %
list([b,c]), list([b,b]), list([b,a]),
list([c,c]), list([c,b]), list([c,a]),
append([a],[],[a]), append([b],[],[b]),
append([c],[],[c]), append([],[c],[c]),
append([],[b],[b]), append([],[a],[a]),
list([a]), list([b]), list([c]),            % D=2 %
thing(c), thing(b), thing(a),               % D=1 %
list([]) ]

At depth 1, only domain clauses are satisfied; at depth 2 the first append literal appears. Depths 3 and 4 add list literals for all lists of length 2 and 3, and append literals for all lists of length 1 and 2, respectively.

Further reading

Korf (1987) gives a comprehensive overview of search methods in Artificial Intelligence. He is also the originator of the iterative deepening search strategy (Korf, 1985). The model generation program in section 5.4 is adapted from (Manthey & Bry, 1988).

R.E. Korf (1985), ‘Depth-first iterative deepening: an optimal admissible tree search’, Artificial Intelligence 27: 97-109.

R.E. Korf (1987), ‘Search’. In Encyclopedia of Artificial Intelligence, S.C. Shapiro (ed.), pp. 994-998, John Wiley.

R. Manthey & F. Bry (1988), ‘SATCHMO: a theorem prover implemented in Prolog’. In E. Lusk & R. Overbeek (eds), Proc. 9th International Conference on Automated Deduction, Lecture Notes in Computer Science 310, pp. 415-434, Springer‑Verlag.

The search strategies of the previous chapter do not make any assumptions about the plausibility of a certain node in the search space leading to a goal. Such a form of search is called blind search. Alternatively, search strategies which do make such assumptions are called informed search strategies. The extra information which is incorporated in the search process is provided by an evaluation function h called a heuristic, which estimates how far a given node is from a goal. This information can be used in several ways. If we use it to order the nodes on the agenda, such that the most promising nodes are tried first, the resulting search method is called best-first search. In section 6.2, we will discuss a complete variant of best-first search called the A algorithm, and investigate the conditions under which this algorithm is optimal. In section 6.3, we will discuss non-exhaustive informed search strategies, which can be derived from best-first search by limiting the size of the agenda.

We will assume that a predicate eval/2 is defined, which returns for a given node in the search space an estimate of the distance between that node and a goal node. The children of the current node are added to the agenda according to their heuristic evaluation (lowest values first). Thus, the agenda will always be sorted.

% best-first search
% goal/1, children/2 and eval/2 depend on
% the search problem at hand



% add_bstf(A,B,C) <- C contains the elements of A and B
%                   (B and C sorted according to eval/2)



% add_one(S,A,B) <- B is A with S inserted acc. to eval/2





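The best-first procedure and its insertion predicates can be sketched as follows (goal/1, children/2 and eval/2 depend on the search problem at hand):

```prolog
% best-first search
search_bstf([Goal|_Rest],Goal):-
    goal(Goal).
search_bstf([Current|Rest],Goal):-
    children(Current,Children),
    add_bstf(Children,Rest,NewAgenda),
    search_bstf(NewAgenda,Goal).

% add_bstf(A,B,C) <- C contains the elements of A and B
%                    (B and C sorted according to eval/2)
add_bstf([],Agenda,Agenda).
add_bstf([Child|Children],OldAgenda,NewAgenda):-
    add_one(Child,OldAgenda,TmpAgenda),
    add_bstf(Children,TmpAgenda,NewAgenda).

% add_one(S,A,B) <- B is A with S inserted acc. to eval/2
add_one(Child,[],[Child]).
add_one(Child,[Node|Rest],[Child,Node|Rest]):-
    eval(Child,ChildValue),
    eval(Node,NodeValue),
    ChildValue =< NodeValue.      % insert before Node
add_one(Child,[Node|Rest],[Node|NewRest]):-
    eval(Child,ChildValue),
    eval(Node,NodeValue),
    ChildValue > NodeValue,
    add_one(Child,Rest,NewRest).
```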
add_bstf/3 operates by inserting the new children one by one in the current agenda. Note that if the list of children were already sorted, it could more efficiently be merged with the current agenda.

Exercise 6.1. Suppose the call children(Current,Children) results in an ordered list of children. Write a predicate merge/3 which directly merges this list with the current agenda.

Figure 6.1. Initial board position.

As an application of best-first search, consider the following puzzle. We have a board consisting of seven consecutive squares, three black tiles and three white tiles, initially placed on the board as in fig. 6.1. The goal is to move the tiles in such a way that the black tiles are to the right of the white tiles (the position of the empty square is immaterial). Each move consists of moving one tile into the empty square, which is allowed if there are at most two other tiles in between. The cost of such a move is 1 if there are no tiles in between, and equals the number of tiles jumped over otherwise.

This puzzle defines a search space, in which nodes are board positions and arcs are single moves. We choose a simple list representation for board positions: e.g. [b,b,b,e,w,w,w] represents the starting position of fig. 6.1. The following predicates examine and manipulate board positions:

% get_tile(P,N,T) <- pos. P contains tile T at square N

	N1 is N0+1,

% replace(P,N,T,P1) <- P1 is P with tile T at square N
	N>1, N1 is N-1,

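Completing the fragments above, these predicates can be sketched as follows:

```prolog
% get_tile(P,N,T) <- pos. P contains tile T at square N
get_tile(Pos,N,T):-
    get_tile(Pos,1,N,T).
get_tile([T|_Ts],N,N,T).
get_tile([_T|Ts],N0,N,T):-
    N1 is N0+1,
    get_tile(Ts,N1,N,T).

% replace(P,N,T,P1) <- P1 is P with tile T at square N
replace([_X|Xs],1,T,[T|Xs]).
replace([X|Xs],N,T,[X|Ys]):-
    N>1, N1 is N-1,
    replace(Xs,N1,T,Ys).
```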
When not to use lists

Recall (section 1.3) that [b,b,b,e,w,w,w] is an alternative notation for the term .(b,.(b,.(b,.(e,.(w,.(w,.(w,[]))))))). This term contains, besides the seven constants in the linear notation, one additional constant (‘[]’) and seven functors (‘.’), each with two arguments. In contrast, a ‘flat’ term p(b,b,b,e,w,w,w) contains only one additional functor, with seven arguments. Recursive datastructures like lists are useful if the number of items to be stored is not fixed, but they require significantly more storage space. In general, if the number of items is fixed, a non-recursive datastructure is preferred as far as memory is concerned. Given a term T holding the items, the call arg(N,T,A) retrieves the Nth argument A. However, arg/3 requires N to be instantiated, and cannot be used to generate all arguments on backtracking. Therefore, lists are sometimes used even if the nature of the data is non-recursive.

We use the above best-first search procedure, with a number of changes. First, rather than returning the goal position found, the program should construct a sequence of moves by which the goal position is reached. Therefore, nodes that are examined during the search process are collected in the list Visited. After a goal position has been found, the solution path and its total cost are reconstructed from the list Visited by means of the predicate construct_moves/6.

Secondly, the items on the agenda are represented as pairs v(Value,Move), where Value is the heuristic evaluation of the position reached by Move. Children of the current position are generated by means of the setof/3 predicate, which yields a sorted list. By putting the heuristic Value as the first argument of the functor v, the list Children is therefore sorted according to increasing heuristic value. Therefore, this list can be simply merged with the current agenda to yield the new agenda. The program thus looks as follows:

% tiles(M,C) <- moves M lead to a goal position at cost C
%              (best-first search strategy)


% tiles_a(A,M,V0,V) <- goal position can be reached from

%                     one of the positions on A with last

%                     move M (best-first strategy)


      ( move(LastMove,NextMove),
        eval(NextMove,Value) ),
merge(Children,Rest,NewAgenda),   % best-first

%%% merge/3: see exercise 6.1

setof0/3 is a variant of setof/3 which succeeds with the empty list if no solutions can be found (see Appendix A.2).

A move from OldPos to NewPos is represented by a triple

m(OldPos,NewPos,Cost)
where Cost specifies the cost of the move. According to the principle of data abstraction, this representation is kept local to the following predicates:

% move(m(X,P,Y),m(P,NP,C)) <- position NP can be reached
%                            from position P in one move
%                            at cost C

Diff is abs(Ne-Nbw),Diff<4,
( Diff=1   -> Cost=1
; otherwise -> Cost is Diff-1 ).

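One way to complete move/2 around the surviving cost computation (a sketch; get_tile/3 and replace/4 as above, with e denoting the empty square):

```prolog
move(m(_OldPos,Pos,_C),m(Pos,NewPos,Cost)):-
    get_tile(Pos,Ne,e),           % position of the empty square
    get_tile(Pos,Nbw,BW),         % tile to be moved
    BW \= e,
    Diff is abs(Ne-Nbw), Diff<4,  % at most two tiles in between
    ( Diff=1   -> Cost=1
    ; otherwise -> Cost is Diff-1 ),
    replace(Pos,Ne,BW,Pos1),      % move tile into empty square
    replace(Pos1,Nbw,e,NewPos).
```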

% reconstruct total cost and path from visited nodes


element(m(GP,P,C1),Visited), % GP is parent of P
Cost1 is Cost0+C,


Finally, we have to choose a heuristic evaluation function. A first idea is to count, for each white tile, the number of black tiles to the left of it:




Note that this program actually counts the number of solutions to the query


by determining the length of the list that is returned by the second-order predicate findall/3.
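This first heuristic can be sketched as follows, counting the pairs of a black tile to the left of a white tile:

```prolog
eval(m(_OldPos,Pos,_C),Value):-
    bLeftOfw(Pos,Value).

bLeftOfw(Pos,Value):-
    findall((Nb,Nw),
            ( get_tile(Pos,Nb,b),
              get_tile(Pos,Nw,w),
              Nb < Nw ),
            Pairs),
    length(Pairs,Value).
```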

Exercise 6.2. Rewrite bLeftOfw/2 such that it uses only first-order predicates.

The program writes every move it considers on the screen, together with its heuristic evaluation. For instance, the query

%%% all above code taken together %%%


results in the following output:



M = [[b,b,b,e,w,w,w],[b,b,b,w,e,w,w],[b,b,e,w,b,w,w],

C = 15

Since the only moves that are considered are those that are on the final solution path, there is no backtracking. This seems to suggest that the heuristic works quite well. On the other hand, the first few moves seem a bit awkward: in particular, the first and the fourth move are relatively expensive.

Let’s try another heuristic, which counts the number of tiles out of place: a wrong tile on the first or seventh square gives 3, on the second or sixth square 2, and on the third or fifth square 1.



K<8, K1 is K+1,
( K<4,get_tile(Pos,K,b) -> N1 is N0-(K-4)
; K>4,get_tile(Pos,K,w) -> N1 is N0+(K-4)
; otherwise -> N1=N0 ),

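A possible completion of this second heuristic around the surviving fragment (the two clause heads are assumptions; otherwise/0 as in Appendix A.2):

```prolog
eval(m(_OldPos,Pos,_C),Value):-
    outOfPlace(1,Pos,0,Value).

outOfPlace(8,_Pos,N,N).               % squares 1..7 done
outOfPlace(K,Pos,N0,N):-
    K<8, K1 is K+1,
    ( K<4,get_tile(Pos,K,b) -> N1 is N0-(K-4)   % penalty 4-K
    ; K>4,get_tile(Pos,K,w) -> N1 is N0+(K-4)   % penalty K-4
    ; otherwise -> N1=N0 ),
    outOfPlace(K1,Pos,N1,N).
```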
We get the following result:


M = [[b,b,b,e,w,w,w],[b,b,b,w,w,e,w],[b,b,e,w,w,b,w],

C = 14

We observe a couple of differences with the previous heuristic. First of all, there is backtracking: the first, second and fourth moves are not pursued any further. Furthermore, the solution found requires two moves less, and is also cheaper.

Figure 6.2. Solutions found for different heuristics.

This improvement seems to suggest that an increased punishment for wrongly placed tiles might lead to an even cheaper solution. For instance, we could increase the punishment to 4, 3 and 2, respectively, by adapting the predicate outOfPlace/4 (try it!). This leads to the following sequence of moves:


M = [[b,b,b,e,w,w,w],[b,b,b,w,w,e,w],[b,b,e,w,w,b,w],

C = 15

Obviously, this heuristic works no better than the previous two: it does not find an optimal solution, and it investigates more moves than the first heuristic. In fig. 6.2, the solutions found by the three heuristics are compared. In the next section, we will investigate the conditions under which a heuristic is guaranteed to find an optimal solution.

Best-first search is an exhaustive search strategy, with a possible behaviour ranging from depth-first search to breadth-first search, depending on the heuristic used. By itself, best-first search is not complete: the heuristic might consistently assign lower values to the nodes on an infinite path. This is because the heuristic evaluation only takes into account an estimate of the distance to a goal, while we are actually interested in minimising the total cost of reaching a goal along a particular path. In order to obtain a complete best-first search algorithm, we use an evaluation function f consisting of two components:

f(n) = g(n) + h(n)

Here, h(n) is the heuristic estimate of the cost of reaching a goal node from node n, as it was introduced before. g(n) is the actual cost of reaching n from the starting node. Their sum f(n) is used to order the nodes on the agenda.

A best-first search algorithm which uses such an evaluation function f to estimate the total cost of a path is called an A algorithm. An A algorithm is complete, since the depth count g(n) will prevent search from getting trapped in an infinite path. In effect, the depth count will give the search strategy more of a breadth-first flavour. Indeed, breadth-first search is a special case of an A algorithm, with h(n)=0 for every node n. A disadvantage of A algorithms is the decreased efficiency associated with this breadth-first flavour.
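As an illustration, adding a child to the agenda might look as follows (a sketch only: the predicates add_A/4, insert/3 and eval/2, and the F-Node pairing, are assumed names, not taken from the text):

% add_A(Child,G,Agenda,NewAgenda) <- insert Child, reached at cost G,
%                                    into Agenda ordered by f = g+h
add_A(Child,G,Agenda,NewAgenda):-
    eval(Child,H),              % heuristic estimate h(Child)
    F is G+H,
    insert(F-Child,Agenda,NewAgenda).

insert(F-N,[],[F-N]).
insert(F-N,[F1-N1|Rest],[F-N,F1-N1|Rest]):-
    F =< F1.
insert(F-N,[F1-N1|Rest0],[F1-N1|Rest]):-
    F > F1,
    insert(F-N,Rest0,Rest).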

Exercise 6.3. Change the tiles program into an A algorithm, by associating with each move the g-value of the position reached by that move (i.e. the cost of the path leading to that position, instead of the cost of the last move). Demonstrate the decreased efficiency of the search.

Breadth-first search is not only complete, it is also optimal: it always returns a shortest solution path [17]. Do A algorithms inherit this property from breadth-first search? Obviously, this depends on the function h: if a node n1 on the cheapest path gets an h-estimate that is too high, other nodes will be tried instead, and a solution along a non-optimal path may be found first. We say that the heuristic was too pessimistic regarding n1. Conversely, a heuristic which never assigns a value to a node that is higher than the actual cost of reaching a goal state from that node is called optimistic.

For instance, consider the first heuristic for the puzzle in the previous section, which counts for each white tile the number of black tiles to the left of it. Suppose one black tile has w white tiles to its right, which adds w to the heuristic value for that position. In order to reach a goal position, the black tile has to jump over some of the white tiles, while the remaining white tiles have to jump over the black tile; this has a cost of at least w. Therefore, this heuristic is optimistic. The second heuristic, calculating a weighted sum of tiles out of place, is also optimistic. For instance, suppose that a black tile is at the first square, then there are three white tiles to its right, over which it must jump. Analogously, if it is on the second square, then there are at least two white tiles to jump over. In contrast, the weights used in the third heuristic are too high.

Exercise 6.4. Find a position for which the third heuristic is too pessimistic.

It is possible to prove the following important result: an A algorithm with an optimistic heuristic h always results in an optimal solution. The resulting algorithm is called A* (A star); both A* search and optimistic heuristics are said to be admissible. This should not be taken to suggest that better heuristics are more optimistic! On the contrary, a good heuristic is as pessimistic as possible, without becoming non-admissible. In general, if h1(n) ≥ h2(n) for any node n, then we call heuristic h1 at least as informed as h2. It can be shown that a more informed heuristic indeed searches a smaller part of the search space.

As a small example, consider the search space in fig. 6.3. The h-values for each node are as indicated; the cost per arc is 1. The heuristic is optimistic, so A* search will return the shortest path start-r-s-goal. However, this path is not found immediately: since both p and q have a lower f-value than r, they are investigated first. After q has been investigated, s is put on the agenda with f-value 3+1=4. Since r has a lower f-value of 3, it is the next one to be investigated. Now s will again be added to the agenda, this time with f-value 2+1=3! In fact, it is this latter s which, being on the optimal path, leads to the goal.

Thus, although admissible search leads to optimal solutions, it is not necessarily the case that every node on an optimal path is immediately reached along that optimal path. In fig. 6.3, this is caused by ‘local pessimism’ of the heuristic, which estimates the cost of moving from start to p as 3-1=2, while the actual cost is 1. Indeed, if p had an h-value of 2, s would have been reached the first time along the shortest path. This is true in general: if the heuristic estimates the cost of moving from one node to another optimistically, then any node is reached along the cheapest path first. This property is called monotonicity, since one can show that the f-values are monotonically non-decreasing along a path.
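In symbols, monotonicity requires that for every arc from a node n to a child m with cost c(n,m), the heuristic satisfies h(n) ≤ c(n,m) + h(m). The non-decrease of the f-values along a path then follows directly:

f(m) = g(m) + h(m) = g(n) + c(n,m) + h(m) ≥ g(n) + h(n) = f(n)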

Figure 6.3. A heuristic which is not monotonic.

The first heuristic of the previous section is monotonic, while the second is not. This can be concluded from the following two evaluations:


The heuristic estimates the cost of this move as 9-7=2, while the actual cost is 1. Since monotonicity implies admissibility, the third heuristic is not monotonic either.

Exercise 6.5. Implement a Prolog meta-interpreter which employs an A search algorithm. Use h(R) = |R| (the number of literals in resolvent R) as heuristic. Is this heuristic admissible and monotonic?

The search strategies discussed until now are all exhaustive: they will all search the complete search space in the worst case. This is so because all children of a certain node will be put on the agenda, in some order. Exhaustive search is often impractical, since the size of the agenda grows exponentially with the search depth. The use of a heuristic offers the possibility of keeping only a selection of best nodes on the agenda. Such non-exhaustive search strategies are, of course, not guaranteed to be complete, and should only be applied in combination with a reasonably informed heuristic.

Beam search is a form of best-first search in which the number of nodes on the agenda is limited. In its simplest form, the agenda is of fixed size. Alternatively, one could allow the agenda to grow polynomially (instead of exponentially) with the search depth. The effect of this strategy is that only a ‘beam’ of the search space is searched:


D1 is D+1,



In this program, two agendas are maintained: one for the current level, and one for the children of the nodes on the current level. Once the current level is exhausted, the agendas are swapped and the depth count is increased. The depth count is passed on to the predicate add_beam/4, in order to decide how many children to add to the agenda for the next level.
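The two-agenda scheme can be sketched as follows (add_beam/4 as described above; the predicate names search_beam/2, search_beam/4, children/2 and goal/1 are assumptions):

% search_beam(D,CurrentLevel,NextLevel,Goal) <- beam search (sketch)
search_beam(Agenda,Goal):-
    search_beam(1,Agenda,[],Goal).

search_beam(_D,[Goal|_Rest],_NextLevel,Goal):-
    goal(Goal).
search_beam(D,[Current|Rest],NextLevel0,Goal):-
    children(Current,Children),
    add_beam(D,Children,NextLevel0,NextLevel),
    search_beam(D,Rest,NextLevel,Goal).
search_beam(D,[],NextLevel,Goal):-  % current level exhausted:
    D1 is D+1,                      % swap agendas, increase depth count
    search_beam(D1,NextLevel,[],Goal).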

Exercise 6.6. Extend the program of exercise 6.3 with beam search with fixed agenda size. Demonstrate the non-optimality of the search strategy.

If we limit the size of the agenda to 1, we arrive at a search strategy called hill-climbing. It is also called greedy search, since there is no backtracking involved. Hill-climbing is the type of search employed by a wanderer who wants to reach the top of a hill by always moving in the steepest direction. Clearly, she will reach the top of a hill (and never get off it), but it is not necessarily the highest one.

The predicate search_hc/2 below implements a hill-climbing search strategy. Instead of maintaining an agenda of nodes yet to be investigated, it maintains a single node in its first argument. Therefore, hill-climbing has some similarity with depth-first search with implicit agenda:



The predicate select_best/2 selects the best child of the current node, according to the heuristic value to be optimised. To stress that backtracking is not needed after the best child has been selected, one can place a cut before the recursive call in the second clause.
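A sketch of such a predicate, with the cut in place (select_best/2 as described above; goal/1 and children/2 are assumed names):

% search_hc(Current,Goal) <- hill-climbing: keep a single node
%                            instead of an agenda (sketch)
search_hc(Goal,Goal):-
    goal(Goal).
search_hc(Current,Goal):-
    children(Current,Children),
    select_best(Children,Best),
    !,                          % greedy: commit to the best child
    search_hc(Best,Goal).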

Further reading

Nilsson (1980) gives a gentle introduction to the use of heuristics and their properties. Pearl (1984) is the main source for mathematical results on heuristics. The sliding tiles puzzle was taken from Luger & Stubblefield (1993).

G.F. Luger & W.A. Stubblefield (1993), Artificial Intelligence: Structures and Strategies for Complex Problem Solving, Benjamin/Cummings, second edition.

N.J. Nilsson (1980), Principles of Artificial Intelligence, Tioga Press.

J. Pearl (1984), Heuristics: Intelligent Search Strategies for Computer Problem Solving, Addison-Wesley.

[12] The remaining disk on A can safely be ignored, since it is the largest.

[13] Such a program should perform breadth-first search; see Exercise 5.2.

[14] From now on, we denote a Predicate with Arity as Predicate/Arity. This is because predicates with different arity are different predicates, even if they share the same predicate name.

[15] Here, we refer to the resultant path, ignoring backtracking.

[16] We will assume for the moment that the head literals are ground by the substitution which makes the body true; a more detailed discussion follows below.

[17] If arcs can have different costs, breadth-first search does not necessarily return the cheapest solution path.

Simply Logical

Peter Flach,
University of Bristol

Intelligent Reasoning by Example

This book discusses methods to implement intelligent reasoning by means of Prolog programs. The book is written from the shared viewpoints of Computational Logic, which aims at automating various kinds of reasoning, and Artificial Intelligence, which seeks to implement aspects of intelligent behaviour on a computer.
