6.2. Optimal best-first search#
Best-first search is an exhaustive search strategy, with a possible behaviour ranging from depth-first search to breadth-first search, depending on the heuristic used. By itself, best-first search is not complete: the heuristic might consistently assign lower values to the nodes on an infinite path. This is because the heuristic evaluation only takes into account an estimate of the distance to a goal, while we are actually interested in minimising the total cost of reaching a goal along a particular path. In order to obtain a complete best-first search algorithm, we use an evaluation function \(f\) consisting of two components:
Here, \(h(n)\) is the heuristic estimate of the cost of reaching a goal node from node \(n\), as it was introduced before. \(g(n)\) is the actual cost of reaching \(n\) from the starting node. Their sum \(f(n)\) is used to order the nodes on the agenda.
A best-first search algorithm which uses such an evaluation function \(f\) to estimate the total cost of a path is called an A algorithm. An A algorithm is complete, since the depth count \(g(n)\) will prevent search from getting trapped in an infinite path. In effect, the depth count will give the search strategy more of a breadth-first flavour. Indeed, breadth-first search is a special case of an A algorithm, with \(h(n) = 0\) for every node \(n\). A disadvantage of A algorithms is the decreased efficiency associated with this breadth-first flavour.
Change the tiles
program into an A algorithm, by associating with each move the \(g\)-value of the position reached by that move (i.e. the cost of the path leading to that position, instead of the cost of the last move). Demonstrate the decreased efficiency of the search.
Breadth-first search is not only complete, it is also optimal: it always returns a shortest solution path[1]. Do A algorithms inherit this property from breadth-first search? Obviously, this depends on the function \(h\): if a node \(n_1\) on the cheapest path gets an \(h\)-estimate that is too high, other nodes will be tried instead, and a solution along a non-optimal path may be found first. We say that the heuristic was too pessimistic regarding \(n_1\). Conversely, a heuristic which never assigns a value to a node that is higher than the actual cost of reaching a goal state from that node is called optimistic.
For instance, consider the first heuristic for the puzzle in the previous section, which counts for each white tile the number of black tiles to the left of it. Suppose one black tile has \(w\) white tiles to its right, which adds \(w\) to the heuristic value for that position. In order to reach a goal position, the black tile has to jump over some of the white tiles, while the remaining white tiles have to jump over the black tile; this has a cost of at least \(w\). Therefore, this heuristic is optimistic. The second heuristic, calculating a weighted sum of tiles out of place, is also optimistic. For instance, suppose that a black tile is at the first square, then there are three white tiles to its right, over which it must jump. Analogously, if it is on the second square, then there are at least two white tiles to jump over. In contrast, the weights used in the third heuristic are too high.
It is possible to prove the following important result: an A algorithm with an optimistic heuristic h always results in an optimal solution. The resulting algorithm is called A* (A star); both A* search and optimistic heuristics are said to be admissible. This should not be mistaken to suggest that better heuristics are more optimistic! On the contrary, a good heuristic is as pessimistic as possible, without becoming non-admissible. In general, if \(h_1(n) \geq h_2(n)\) for any node \(n\), then we call heuristic \(h_1\) at least as informed as \(h_2\). It can be shown that a more informed heuristic indeed searches a smaller part of the search space.
As a small example, consider the search space in Figure 6.3. The \(h\)-values for each node are as indicated; the cost per arc is 1. The heuristic is optimistic, so A* search will return the shortest path start-r-s-goal. However, this path is not found immediately: since both \(p\) and \(q\) have a lower \(f\)-value than \(r\), they are investigated first. After \(q\) has been investigated, \(s\) is put on the agenda with \(f\)-value \(3+1=4\). Since \(r\) has a lower \(f\)-value of \(3\), it is the next one to be investigated. Now \(s\) will again be added to the agenda, this time with \(f\)-value \(2+1=3\)! In fact, it is this latter \(s\) which, being on the optimal path, leads to the goal.
Thus, although admissible search leads to optimal solutions, it is not necessarily the case that every node on an optimal path is immediately reached along that optimal path. In Figure 6.3, this is caused by ‘local pessimism’ of the heuristic, which estimates the cost of moving from start to \(p\) as \(3-1=2\), while the actual cost is \(1\). Indeed, if \(p\) would have an \(h\)-value of \(2\), \(s\) would have been reached the first time along the shortest path. This is true in general: if the heuristic estimates the cost of moving from one node to another optimistically, then any node is reached along the cheapest path first. This property is called monotonicity, since one can show that the \(f\)-values are monotonically non-decreasing along a path.
The first heuristic of the previous section is monotonic, while the second is not. This can be concluded from the following two evaluations:
[b,b,e,w,w,b,w]-9
[e,b,b,w,w,b,w]-7
The heuristic estimates the cost of this move as \(9-7=2\), while the actual cost is \(1\). Since monotonicity implies admissibility, the third heuristic is not monotonic either.
Implement a Prolog meta-interpreter which employs an A search algorithm. Use \(h(R) = |R|\) (the number of literals in resolvent \(R\)) as heuristic. Is this heuristic admissible and monotonic?