For the SAC-discrete version, is it possible to update the model with state and action as input, just like the SAC-continuous version? #62

Open
dbsxdbsx opened this issue Oct 13, 2020 · 0 comments

dbsxdbsx commented Oct 13, 2020

Currently, I am trying to merge the models for the SAC discrete and continuous versions into a single model.
In SAC discrete, the critic_model takes only the state as input and outputs one value per action (a distribution over actions). To make it consistent with the continuous version, I modified it to take both the state and the action as input and to output only the single Q-value q(s,a), just as the continuous version does. With that change, the training code can be shared too, without handling action distributions when updating the parameters. BUT the modified SAC for discrete actions just doesn't converge!
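In the continuous version the critic receives the state and the action concatenated. For a discrete action, a common way to build the same kind of input is to one-hot encode the action index; feeding the raw index as a single number would impose an ordering on the actions that they don't actually have. The issue doesn't say how the action is encoded for the modified critic, so this is only a sketch under the assumption of one-hot encoding. `one_hot` and `critic_input` are hypothetical helpers, written with plain lists so the idea stays framework-independent.

```python
from typing import List, Sequence


def one_hot(action: int, num_actions: int) -> List[float]:
    """Encode a discrete action index as a one-hot vector of length num_actions."""
    if not 0 <= action < num_actions:
        raise ValueError(f"action {action} is outside [0, {num_actions})")
    vec = [0.0] * num_actions
    vec[action] = 1.0
    return vec


def critic_input(state: Sequence[float], action: int, num_actions: int) -> List[float]:
    """Build the [state, action] input a Q(s, a) critic would receive, as in continuous SAC."""
    return list(state) + one_hot(action, num_actions)


if __name__ == "__main__":
    # A 2-dimensional state and action 2 out of 4 possible actions.
    print(critic_input([0.5, -1.2], action=2, num_actions=4))
    # prints [0.5, -1.2, 0.0, 0.0, 1.0, 0.0]
```

In PyTorch, `torch.nn.functional.one_hot(actions, num_classes)` produces the same encoding for a batch of index tensors; it returns an integer tensor, so it would need `.float()` before being concatenated with the state.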

Below is part of what I modified to make the discrete version behave like the continuous one.
Since it doesn't converge, I suspect something is wrong with log_prob:

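```python
_dist = self.distribution(action_dist)  # torch.distributions.Categorical
actions = _dist.sample()  # sampled action indices, shape [batch]
# modified version
actions = actions.unsqueeze(1)  # shape [batch] -> [batch, 1]
self.log_prob = torch.log(actions + (actions == 0.0).float() * 1e-8)  # log of the sampled action values themselves
# original version
# z = (action_dist == 0.0).float() * 1e-8  # avoid log(0) for zero-probability actions
# self.log_prob = torch.log(action_dist + z)  # log-probabilities of every action, shape [batch, n_actions]
# actions = actions.unsqueeze(1)  # add a trailing dim: [batch] -> [batch, 1]
```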
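For comparison, here is a minimal sketch (plain Python, hypothetical helper names) of the per-sample quantity a continuous-style update needs, log π(a|s) for the sampled action a: look up the probability of that action first, then take its log. In PyTorch, `torch.distributions.Categorical.log_prob(actions)` performs this lookup. The `log_of_index` helper mirrors the modified line above, which applies `torch.log` to the sampled action values rather than to their probabilities; it can even return a positive number (log 2 for action 2), which no log-probability can be. Also, because `Categorical.sample()` returns indices that carry no gradient, that term would contribute no gradient to the policy, assuming `self.log_prob` feeds the actor loss.

```python
import math
from typing import Sequence

EPS = 1e-8  # guards against log(0); the same constant the issue's code uses


def log_prob_of_action(probs: Sequence[float], action: int) -> float:
    """log pi(a|s) for one sampled action: probability lookup first, then the log."""
    return math.log(probs[action] + EPS)


def log_of_index(action: int) -> float:
    """Mirrors the modified line: the log of the action index itself (EPS added only when it is 0)."""
    return math.log(action + (EPS if action == 0 else 0.0))


if __name__ == "__main__":
    probs = [0.7, 0.2, 0.1]  # made-up pi(.|s) over 3 actions
    for a in range(len(probs)):
        # action, correct log-probability, log of the index
        print(a, round(log_prob_of_action(probs, a), 4), round(log_of_index(a), 4))
```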

Is it possible to make the SAC-discrete version update the same way as SAC-continuous?
If so, I would be happy to use almost the same code for both the discrete and continuous versions, which is what I'm after.
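On the broader question, one relevant difference is that a categorical distribution cannot be reparameterized (in PyTorch, `Categorical.has_rsample` is `False`), so the gradient path the continuous actor uses through a reparameterized sample isn't available. What the discrete setting offers instead is that expectations over actions can be computed exactly. The sketch below (plain Python; the probabilities, Q-values and `alpha` are made-up illustrative numbers, and the function names are hypothetical) compares the exact soft state value V(s) = Σ_a π(a|s) (Q(s,a) - α log π(a|s)) with the one-sample estimate Q(s,a) - α log π(a|s), a ~ π(·|s), that a continuous-style update relies on.

```python
import math
import random
from typing import Sequence


def soft_value_exact(probs: Sequence[float], q_values: Sequence[float], alpha: float) -> float:
    """V(s) = sum over all actions of pi(a|s) * (Q(s,a) - alpha * log pi(a|s))."""
    # Zero-probability actions contribute nothing (p * log p -> 0 as p -> 0).
    return sum(p * (q - alpha * math.log(p)) for p, q in zip(probs, q_values) if p > 0.0)


def soft_value_one_sample(probs: Sequence[float], q_values: Sequence[float],
                          alpha: float, rng: random.Random) -> float:
    """Sample a ~ pi(.|s) and return Q(s,a) - alpha * log pi(a|s), as in the continuous case."""
    a = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return q_values[a] - alpha * math.log(probs[a])


if __name__ == "__main__":
    probs = [0.7, 0.2, 0.1]     # pi(.|s), made-up example
    q_values = [1.0, 2.0, 0.5]  # Q(s,.), made-up example
    alpha = 0.2                 # entropy temperature
    rng = random.Random(0)
    exact = soft_value_exact(probs, q_values, alpha)
    n = 20_000
    estimate = sum(soft_value_one_sample(probs, q_values, alpha, rng) for _ in range(n)) / n
    print(f"exact={exact:.4f}  one-sample average over {n} draws={estimate:.4f}")
```

The one-sample estimate is unbiased but noisier per update. With a critic that takes (s, a) as input, the exact sum can still be computed by evaluating the critic once per action, which is cheap when the action set is small.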
