Heuristic function overestimates cost, leading to suboptimal plans. #9

steadmon · 2017-06-16T04:06:56Z

A* with a closed set can select suboptimal plans if the heuristic function overestimates the remaining cost of an intermediate state. The current heuristic of counting the conflicting states can be an overestimate in some cases. Consider the following setup:

Six state values: a, b, c, d, e, f
Initial state: all six values are False
Goal state: all six values are True
Four actions:

"setA", no preconditions, sets a:=True, cost of 2
"setAB", no preconditions, sets a:=True and b:=True, cost of 1
"optimal", precondition a==True and b==False, sets all six state values to True, cost of 1
"trap", precondition a==True and b==False, sets all six state values to True, cost of 5

The optimal plan is "setA" followed by "optimal", with a total cost of 3. However, with the current heuristic, the code actually selects a plan of "setAB" followed by "trap", with a total cost of 6. This is because the heuristic overestimates the remaining cost after the action "setA" as having a likely remaining cost of 5, when in fact the remaining cost is only 1.

This can be fixed by changing the heuristic function as follows:

After all actions and costs are known, find the action with the minimum (cost / number of output states) ratio. Save this optimal ratio, and pass it as an additional argument to the heuristic function. The heuristic function should then return the number of conflicting states multiplied by the cost/output ratio.

stolk · 2017-06-16T23:35:43Z

Good point.
Thanks for pointing this out.

Sorrien · 2018-09-09T15:21:13Z

Has this been resolved?

stolk · 2018-09-09T18:06:03Z

This issue is still present in the current code base.

You could follow steadmon's approach, or for an even quicker fix: returning a constant value 1 (or better: the lowest cost of all actions) for heuristic will also get rid of suboptimal paths.

To thoroughly fix, you would have to determine lowest cost of flipping on a

per atom basis
separately for both flip directions: 0->1 and 1->0.

Then dynamically, add this cost for each differing atom value based on direction.

This will slow down the calculation of the heuristic of course, so if you can live with occasional sub optimal paths, the current code may be better.

Note: If you decide to go steadmon's route, you will need to use floating point values for the heuristic.

kampeador · 2022-10-14T15:56:48Z

#include <stdio.h>
#include <assert.h>

#include "goap.h"
#include "astar.h"

void check_goap(bool err)
{
    assert(err == true);
}

int main(int argc, char* argv[]) {
    actionplanner_t ap;
    goap_actionplanner_clear(&ap);

    // intial state
    worldstate_t ws_initial;
    goap_worldstate_clear(&ws_initial);
    check_goap(goap_worldstate_set(&ap, &ws_initial, "a", false));
    check_goap(goap_worldstate_set(&ap, &ws_initial, "b", false));
    check_goap(goap_worldstate_set(&ap, &ws_initial, "c", false));
    check_goap(goap_worldstate_set(&ap, &ws_initial, "d", false));
    check_goap(goap_worldstate_set(&ap, &ws_initial, "e", false));
    check_goap(goap_worldstate_set(&ap, &ws_initial, "f", false));

    // goal state
    worldstate_t ws_goal;
    goap_worldstate_clear(&ws_goal);
    check_goap(goap_worldstate_set(&ap, &ws_goal, "a", true));
    check_goap(goap_worldstate_set(&ap, &ws_goal, "b", true));
    check_goap(goap_worldstate_set(&ap, &ws_goal, "c", true));
    check_goap(goap_worldstate_set(&ap, &ws_goal, "d", true));
    check_goap(goap_worldstate_set(&ap, &ws_goal, "e", true));
    check_goap(goap_worldstate_set(&ap, &ws_goal, "f", true));

    // actions
    // "setA", no preconditions, sets a:=True, cost of 2
    goap_set_pst(&ap, "setA", "a", true);
    goap_set_cost(&ap, "setA", 2);

    // "setAB", no preconditions, sets a:=True and b:=True, cost of 1
    goap_set_pst(&ap, "setAB", "a", true);
    goap_set_pst(&ap, "setAB", "b", true);
    goap_set_cost(&ap, "setAB", 1);

    // "optimal", precondition a==True and b==False, sets all six state values to True, cost of 1
    goap_set_pre(&ap, "optimal", "a", true);
    goap_set_pre(&ap, "optimal", "b", false);
    goap_set_pst(&ap, "optimal", "a", true);
    goap_set_pst(&ap, "optimal", "b", true);
    goap_set_pst(&ap, "optimal", "c", true);
    goap_set_pst(&ap, "optimal", "d", true);
    goap_set_pst(&ap, "optimal", "e", true);
    goap_set_pst(&ap, "optimal", "f", true);
    goap_set_cost(&ap, "optimal", 1);

    // "trap", precondition a==True and b==False, sets all six state values to True, cost of 5
    goap_set_pre(&ap, "trap", "a", true);
    goap_set_pre(&ap, "trap", "b", false);
    goap_set_pst(&ap, "trap", "a", true);
    goap_set_pst(&ap, "trap", "b", true);
    goap_set_pst(&ap, "trap", "c", true);
    goap_set_pst(&ap, "trap", "d", true);
    goap_set_pst(&ap, "trap", "e", true);
    goap_set_pst(&ap, "trap", "f", true);
    goap_set_cost(&ap, "trap", 5);


    char desc[4096];
    worldstate_t states[16];
    const char* plan[16];
    int plansz = 16;
    const int plancost = astar_plan(&ap, ws_initial, ws_goal, plan, states, &plansz);
    LOGI( "plancost = %d", plancost );
    goap_worldstate_description(&ap, &ws_initial, desc, sizeof(desc));
    LOGI( "%-23s%s", "", desc );
    for (int i = 0; i < plansz && i < 16; ++i)
    {
        goap_worldstate_description(&ap, states+i, desc, sizeof(desc));
        LOGI("%d: %-20s%s", i, plan[i], desc);
    }

    puts("Exit success");
    fflush(stdout);

    return 0;
}

Output:

plancost = 3
                       a,b,c,d,e,f,
0: setA                A,b,c,d,e,f,
1: optimal             A,B,C,D,E,F,

Am I missing something?

stolk · 2022-10-14T16:11:17Z

Maybe OP's example was not a good one, but it is a valid point that overestimating the remaining cost can cause suboptimal plans.

Personally, I don't worry much about those cases. In practice, they probably don't occur often, and if they do, you still end up with a working plan.

If you are hell-bent on always getting the best plan, you can make the search un-directed, and always put the estimate at '1'. But then you get a combinatorial explosion.

Playing with the heuristic is left as an exercise for the reader. Use what works for you. What's in the repo now works for me.

kampeador · 2022-10-14T16:53:47Z

It would be nice if issues like this one had practical unit test coverage, rather than theoretical.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Heuristic function overestimates cost, leading to suboptimal plans. #9

Heuristic function overestimates cost, leading to suboptimal plans. #9

steadmon commented Jun 16, 2017

stolk commented Jun 16, 2017

Sorrien commented Sep 9, 2018

stolk commented Sep 9, 2018

kampeador commented Oct 14, 2022

stolk commented Oct 14, 2022

kampeador commented Oct 14, 2022

Heuristic function overestimates cost, leading to suboptimal plans. #9

Heuristic function overestimates cost, leading to suboptimal plans. #9

Comments

steadmon commented Jun 16, 2017

stolk commented Jun 16, 2017

Sorrien commented Sep 9, 2018

stolk commented Sep 9, 2018

kampeador commented Oct 14, 2022

stolk commented Oct 14, 2022

kampeador commented Oct 14, 2022