MPPI #257
Conversation
Take a look at the action plots for the OP3 and Walker examples. They seem to be incorrect. Probably something related to …
Can you add the additional cost term? Should be something like:

```cpp
// accumulate the control-cost correction for rollout i
double c = 0.0;
double* u = trajectory[i].actions.data();
double* eps = noise.data() + i * (model->nu * kMaxTrajectoryHorizon);
for (int t = 0; t < horizon - 1; t++) {
  for (int j = 0; j < model->nu; j++) {
    c += u[t * model->nu + j] * eps[t * model->nu + j] / noise_exploration;
  }
}
trajectory[i].total_return += gamma * c;
```

added after each rollout to …
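For context, this appears to correspond to the control-cost term of the standard MPPI update; a sketch, assuming a diagonal sampling covariance $\Sigma = \sigma^2 I$ with `noise_exploration` standing in for $\sigma^2$ and `gamma` for the weight on the term:

$$
\text{total\_return}_i \;\mathrel{+}=\; \gamma \sum_{t=0}^{T-2} u_t^\top \Sigma^{-1} \varepsilon_t^{(i)}, \qquad \Sigma = \sigma^2 I .
$$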
One wrinkle here is that …
Let's add the cost term from the algorithm and default …
But then when …
Let's add the cost term. The user can still set the task control cost terms to zero.
@alberthli can you update this branch so we can merge?
I'll put some time into it next week - currently don't have access to my workstation.
Implements a vanilla version of MPPI (only updates the mean of the policy distribution - the variance is fixed and diagonal). There are no bells and whistles attached to the policy update - the weights are strictly computed using the return over each noisy rollout. There are a number of possible improvements/variations in the literature.
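For illustration, here is a minimal sketch of the vanilla weight computation and mean update described above, assuming each rollout's total cost is in `returns[i]` and its flattened noisy action sequence is in `actions[i]`; the names (`returns`, `actions`, `mean`, `lambda`, `MppiMeanUpdate`) are illustrative, not the planner's actual API:

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Sketch of a vanilla MPPI mean update: exponentiate baseline-shifted rollout
// costs to get weights, then set the new nominal sequence to the weighted
// average of the noisy action sequences. Names are illustrative only.
void MppiMeanUpdate(const std::vector<double>& returns,               // total cost per rollout
                    const std::vector<std::vector<double>>& actions,  // flattened noisy actions per rollout
                    std::vector<double>& mean,                        // nominal action sequence, updated in place
                    double lambda) {                                  // temperature
  const int num_rollouts = static_cast<int>(returns.size());
  const double best = *std::min_element(returns.begin(), returns.end());

  // w_i ~ exp(-(R_i - R_best) / lambda); subtracting the best cost avoids underflow
  std::vector<double> weights(num_rollouts);
  double total = 0.0;
  for (int i = 0; i < num_rollouts; i++) {
    weights[i] = std::exp(-(returns[i] - best) / lambda);
    total += weights[i];
  }

  // new mean = normalized weighted average of the sampled action sequences
  std::fill(mean.begin(), mean.end(), 0.0);
  for (int i = 0; i < num_rollouts; i++) {
    const double w = weights[i] / total;
    for (size_t k = 0; k < mean.size(); k++) {
      mean[k] += w * actions[i][k];
    }
  }
}
```

In this form, a large `lambda` flattens the weights toward uniform (bad rollouts contribute almost as much as good ones), while a small `lambda` concentrates nearly all the weight on the single best rollout, which matches the sensitivity described below.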
Currently seems extremely sensitive to the temperature parameter `lambda`. If this is too high, then the weight associated with bad trajectories will be too high and the planner will do nothing. If this is too low, then the planner essentially becomes predictive sampling.

Features to consider: