The Soft Actor-Critic paper (arXiv v2) says, in the last paragraph on page 5:
"We then use the minimum of the Q-functions for the value gradient in Equation 6 and policy gradient in Equation 13."
However, the code in sac/algos/sac.py uses only one of the Q-functions in the policy-gradient loss; it does use the minimum in the value-gradient loss.
Is there a reason for the discrepancy? Thanks!
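For reference, here is a minimal sketch of the two places the minimum could enter, written as per-sample Python functions. The names (q1, q2, log_pi, value_target, policy_loss_*) are hypothetical and are not the repository's actual TensorFlow code; they just restate the discrepancy described above.

```python
# Minimal sketch, not sac/algos/sac.py itself. q1 and q2 are callables
# mapping (state, action) -> scalar Q estimate; log_pi is log pi(action | state).

def value_target(q1, q2, log_pi, state, action):
    # Both the paper and the code use the minimum here (value gradient, Equation 6).
    return min(q1(state, action), q2(state, action)) - log_pi

def policy_loss_as_in_code(q1, log_pi, state, action):
    # As described in this issue: only the first Q-function appears in the policy objective.
    return log_pi - q1(state, action)

def policy_loss_as_in_paper(q1, q2, log_pi, state, action):
    # As stated in the paper (arXiv v2, p. 5): use min(Q1, Q2) in the policy gradient (Equation 13).
    return log_pi - min(q1(state, action), q2(state, action))
```

Taking the minimum of the two Q estimates is the clipped double-Q idea for reducing overestimation bias, which is why one might expect the choice to matter; the reply below suggests it made little empirical difference in practice.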
Good catch! We actually tried both versions and did not find much difference between them. We'll fix the code in the next release.