<!--H3: Section 5.2-->
(sec:5.2)=
# Depth-first search #

We obtain a depth-first search strategy if the agenda is implemented as a last in--first out datastructure. The obvious way to do this in Prolog is to represent the agenda by a list of nodes, and to add and remove nodes from the front of the list:
```Prolog
% depth-first search
search_df([Goal|Rest],Goal):-
    goal(Goal).
search_df([Current|Rest],Goal):-
    children(Current,Children),
    append(Children,Rest,NewAgenda),
    search_df(NewAgenda,Goal).
```
The `children/2` predicate finds all children of a given node. If arcs in the search space are defined as before by the `arc/2` predicate, we could define `children/2` as
```Prolog
children(Node,Children):-
    findall(C,arc(Node,C),Children).
```
In this way, all children of the current node are generated and stored on the agenda before examining the next node.
````{tip}
Here is an example of using depth-first search to find palindromes ending in a given series of letters.
(This is just for illustrative purposes, as a much better algorithm would be to extend the given letters with a prefix, reverse the extended string and append the two together.)
Note the depth bound is needed to get any answers at all.
```{swish} swish:search_df
---
source-text-start: search_df-start
---
```
````

+++

This depth-first search program can be refined in several ways, of which we will consider two: returning a path to the goal, and loop detection. In the above implementation, it is impossible to return a path if we discover a goal node on the agenda, because we do not know how that goal node was reached. Instead of putting a single node on the agenda, we will store a complete path to that node. This is simply accomplished by changing the `children/2` predicate as follows:
```Prolog
children([Node|Path],Children):-
    findall([C,Node|Path],arc(Node,C),Children).
```
Of course, the `goal/1` predicate must be changed accordingly, because its argument is now a path instead of a single node. A query now takes the form
```Prolog
?-search_df([[InitialNode]],PathToGoal).
```

+++

The second refinement concerns loop detection. In order to check whether a node has been investigated before, we must maintain a list of visited nodes. We only add nodes to the agenda which do not already occur on this list (or on the agenda):
```Prolog
% depth-first search with loop detection
search_df_loop([Goal|Rest],Visited,Goal):-
    goal(Goal).
search_df_loop([Current|Rest],Visited,Goal):-
    children(Current,Children),
    add_df(Children,Rest,Visited,NewAgenda),
    search_df_loop(NewAgenda,[Current|Visited],Goal).

add_df([],Agenda,Visited,Agenda).
add_df([Child|Rest],OldAgenda,Visited,[Child|NewAgenda]):-
    not element(Child,OldAgenda),
    not element(Child,Visited),
    add_df(Rest,OldAgenda,Visited,NewAgenda).
add_df([Child|Rest],OldAgenda,Visited,NewAgenda):-
    element(Child,OldAgenda),
    add_df(Rest,OldAgenda,Visited,NewAgenda).
add_df([Child|Rest],OldAgenda,Visited,NewAgenda):-
    element(Child,Visited),
    add_df(Rest,OldAgenda,Visited,NewAgenda).
```

+++

Note that the combination of loop detection and path construction allows the following optimisation: instead of maintaining complete paths to a node on the agenda and the list of visited nodes, we only store a node together with its parent. Once we encounter a goal, all its parents are on the list of visited nodes, which allows us to reconstruct the path.

```{exercise} ex:5.1
```

+++

We now analyse depth-first search with respect to completeness, optimality and efficiency. A search strategy is *complete* if it is guaranteed to find every goal. Obviously, any exhaustive strategy is complete for finite search spaces. However, in an infinite search space depth-first search might get trapped in an infinite branch before having found all the solutions. For instance, reconsider the infinite SLD-tree in {numref}`fig:3.2`. A left-to-right depth-first search strategy would dive deeper and deeper into the tree, taking the left branch at every node, and never find the goals in the branches to the right. So, *depth-first search is, in general, incomplete*. Since Prolog itself employs depth-first search, Prolog is also incomplete. Often, however, the incompleteness of Prolog can be avoided by reordering the clauses such that goals are found before infinite branches (for instance, by putting the recursive clause last), and to cut away the infinite parts of the search space.

+++

If there is no cost function, a search strategy is optimal if it is guaranteed to reach any goal along the shortest path possible. The Staatsgalerie example already showed that this is not true for depth-first search: you found your friend but, while she was in a room next to your initial position, you finally reached that room through two other rooms. Thus, *depth-first search does not always find a shortest solution path*. Finally, we can estimate the memory requirements for depth-first search as follows. Suppose we are searching a tree in which each node has, on the average, $B$ children. The number $B$ is known as the *branching factor*. Generating the children of a node adds $B$ nodes to the agenda. We are interested in the following question: if a goal is found at depth $n$ (i.e. the path from the root to the goal has length $n$), how many nodes are there on the agenda? Since at each level only the children of a single node are generated, the size of the agenda is of the order $B \times n$, that is, a linear function of the depth of the tree. The time complexity of depth-first search is of the order $B^n$, since the runtime is proportional to the number of nodes searched, and in the worst case the goal is found in the last branch, after searching $B^n$ nodes. Of course, we cannot hope to achieve any better for blind exhaustive search!

+++

In practice, depth-first search is only implemented as above if loop detection is an absolute must. Otherwise, the agenda is represented *implicitly* by means of Prolog's internal goal stack. Children of a given node are generated one at a time, by means of Prolog's backtracking mechanism, and examined immediately upon generation:
```Prolog
% depth-first search by means of backtracking
search_bt(Goal,Goal):-
    goal(Goal).
search_bt(Current,Goal):-
    arc(Current,Child),
    search_bt(Child,Goal).
```
If there is a chance that the search program gets trapped in an infinite loop, it might be a good idea to employ a predefined *depth bound*:
```Prolog
% backtracking depth-first search with depth bound
search_d(D,Goal,Goal):-
    goal(Goal).
search_d(D,Current,Goal):-
    D>0, D1 is D-1,
    arc(Current,Child),
    search_d(D1,Child,Goal).
```
In this way the search process is guaranteed to halt, but solutions which appear beyond the depth bound are missed.

+++

*Iterative deepening* is a form of depth-first search which employs a depth bound that is increased on each iteration. That is, after performing a depth-first search with depth bound $d$, search starts all over again from the starting nodes with an increased depth bound $d + n$. The predicate `search_id/2` implements iterative deepening for $n = 1$.
```Prolog
% iterative deepening
search_id(First,Goal):-
    search_id(1,First,Goal).     % start with depth 1

search_id(D,Current,Goal):-
    search_d(D,Current,Goal).
search_id(D,Current,Goal):-
    D1 is D+1,                   % increase depth
    search_id(D1,Current,Goal).
```
````{tip}
Here is iterative deepening applied to the palindrome example. Note no depth bound is needed.
Also note that the query just provides the starting node, not an initial agenda, due to the use of backtracking search within iterative deepening (`search_d/3`).
```{swish} swish:search_id
---
source-text-start: search_id-start
---
```
````
A big advantage of iterative deepening over simple depth-first search is that iterative deepening is complete: it will find all the goals at depth $d$ and less before proceeding to depth $d + n$. Moreover, if we set the depth increment $n$ to $1$, iterative deepening is also optimal: it will find shorter paths first. A disadvantage of iterative deepening is that upper parts of the search space are searched more than once (and goals in those upper parts are found more than once as well).