The Soft Actor-Critic paper (arXiv v2) says, in the last paragraph on page 5:
"We then use the minimum of the Q-functions for the value gradient in Equation 6 and policy gradient in Equation 13."
However, the code in sac/algos/sac.py uses only one of the Q-functions in the policy-gradient loss; it does use the minimum in the value-gradient loss.
Is there a reason for the discrepancy? Thanks!
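For reference, here is a minimal sketch of the two places the minimum could enter, written as per-sample Python functions. The names (q1, q2, log_pi, value_target, policy_loss_*) are hypothetical and are not the repository's actual TensorFlow code; they just restate the discrepancy described above.

```python
# Minimal sketch, not sac/algos/sac.py itself. q1 and q2 are callables
# mapping (state, action) -> scalar Q estimate; log_pi is log pi(action | state).

def value_target(q1, q2, log_pi, state, action):
    # Both the paper and the code use the minimum here (value gradient, Equation 6).
    return min(q1(state, action), q2(state, action)) - log_pi

def policy_loss_as_in_code(q1, log_pi, state, action):
    # As described in this issue: only the first Q-function appears in the policy objective.
    return log_pi - q1(state, action)

def policy_loss_as_in_paper(q1, q2, log_pi, state, action):
    # As stated in the paper (arXiv v2, p. 5): use min(Q1, Q2) in the policy gradient (Equation 13).
    return log_pi - min(q1(state, action), q2(state, action))
```

Taking the minimum of the two Q estimates is the clipped double-Q idea for reducing overestimation bias, which is why one might expect the choice to matter; the reply below suggests it made little empirical difference in practice.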
Good catch! We actually tried both versions and did not find much difference between them. We'll fix the code in the next release.