Fix inconsistency of layers/net_arch usage in cnn policy between different algorithms #587
Conversation
Hello, I'm not sure whether this is a breaking change or not.
This is a breaking change, and I would then change DDPG/SAC/TD3 for consistency so we can fix #526. EDIT: `layers` should be `[]` by default in the case of a CNN, and `[64, 64]` to match previous behavior.
@araffin
I agree with @araffin: we should change DDPG/SAC/TD3 to match the behavior of the other policies (DQN, A2C, PPO, etc.). I.e. if …
@araffin @Miffyli
I do not understand what you mean by "you have to allow adding layers after both advantage and value streams". Do you refer to how …? You can add layers after …
@Miffyli In the CNN policy, the network is composed of 4 parts. Defining the scope of the CNN extractor has consequences for the dueling network, which I've run into recently. As you can see from the code below, (1)-(3)-(4) is considered the main network and (2) is an optional sub-network for dueling. In the current implementation, `layers` are appended only to the V(s) stream, which is consistent with the conclusion so far (i.e. the CNN extractor has to describe the whole network, so `layers` are not appended to A(s,a) in the main network). I guess (and I can see from my own test case) that A(s,a) needs to be a more complex (or at least equally complex) function than V(s).

stable-baselines/stable_baselines/deepq/policies.py, lines 103 to 132 in 04c35e1
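For reference, the combination step that joins the V(s) and A(s,a) streams discussed above can be sketched as follows (a minimal NumPy illustration of the standard dueling aggregation, not the stable-baselines code; all array values are made up):

```python
import numpy as np

# Hypothetical outputs of the two streams for a batch of 2 states
# and 4 actions: a state-value estimate V(s) and per-action
# advantages A(s, a).
value = np.array([[1.0], [0.5]])                 # shape (batch, 1)
advantages = np.array([[0.2, -0.2, 0.6, -0.2],
                       [0.4,  0.0, 0.0,  0.0]])  # shape (batch, n_actions)

# Dueling aggregation: subtract the mean advantage so that V and A
# are identifiable, then add the state value to every action.
q_values = value + (advantages - advantages.mean(axis=1, keepdims=True))
```

By construction the mean of `q_values - value` is zero in each row, which is what makes the decomposition well defined.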
So, about this question:
Actually, I didn't mean that we have to be able to define V(s) and A(s,a) separately. I wanted to point out that we need to append `layers` not only to V(s) but also to A(s,a) in dueling DQN. If `layers`/`net_arch` is to be totally ignored after the CNN extractor, don't we need some other named parameter for dueling DQN? I agree that too much modification would cost more, but I feel the current implementation of dueling DQN is a somewhat degenerate structure with respect to the CNN policy (more layers, like [256, 256, 256], do not increase the capacity of A(s,a)). Of course we can fully describe the whole dueling network in a custom CNN extractor as you said, but then we cannot utilize this option that has already been implemented.
Ah yes, I see the issue here. I do not think we should add extra layers only for one branch of the dueling architecture. The best thing is to follow the structure/diagrams of the original paper, where you feed the output of the CNN extractor into two different fully connected layers (for V(s) and A(a, s)) and then combine the two.

As for the other algorithms (DDPG/SAC/TD3): follow this architecture of either only using …

PS: Sorry for the long silence.
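The paper's structure described here can be sketched in NumPy (a toy forward pass with random weights, purely illustrative; dimensions and names are assumptions, not the stable-baselines implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
feat_dim, hidden, n_actions = 64, 512, 4

def fc_relu(x, w, b):
    # A dense layer followed by ReLU.
    return np.maximum(x @ w + b, 0.0)

# Stand-in for the flattened CNN-extractor output (batch of 2).
features = rng.normal(size=(2, feat_dim))

# One separate 512-unit fully connected layer per stream, as in
# the original dueling paper, then a linear output head each.
w_v, b_v = 0.01 * rng.normal(size=(feat_dim, hidden)), np.zeros(hidden)
w_a, b_a = 0.01 * rng.normal(size=(feat_dim, hidden)), np.zeros(hidden)
value = fc_relu(features, w_v, b_v) @ (0.01 * rng.normal(size=(hidden, 1)))
adv = fc_relu(features, w_a, b_a) @ (0.01 * rng.normal(size=(hidden, n_actions)))

# Combine the two streams into Q-values.
q = value + adv - adv.mean(axis=1, keepdims=True)
```

The point of the suggestion is that both streams get their own hidden layer after the shared CNN features, rather than appending layers to only one of them.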
For some reason I completely missed the layers you have highlighted in the figure (the 512-unit ones for each head), my bad ^^. I think @araffin …
@Miffyli I don't have much time for that issue right now; I trust you to make the right decision ;) (unless you really want my point of view, in which case you will need to wait some time).
I think we should deviate from the paper a little bit by splitting the 512-unit output from …
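One possible reading of this suggestion is to let both streams branch off a single shared 512-unit layer instead of giving each stream its own, which could be sketched like this (a hypothetical NumPy illustration; the exact split being proposed is truncated in the comment above, so this is only one interpretation):

```python
import numpy as np

rng = np.random.default_rng(1)
feat_dim, hidden, n_actions = 64, 512, 4

# Stand-in for the flattened CNN-extractor output (batch of 2).
features = rng.normal(size=(2, feat_dim))

# A single shared 512-unit layer after the CNN features...
shared = np.maximum(
    features @ (0.01 * rng.normal(size=(feat_dim, hidden))), 0.0
)

# ...whose output is then split into the V(s) and A(s, a) heads.
value = shared @ (0.01 * rng.normal(size=(hidden, 1)))
adv = shared @ (0.01 * rng.normal(size=(hidden, n_actions)))
q = value + adv - adv.mean(axis=1, keepdims=True)
```

Compared with the paper's two separate per-stream layers, this variant shares parameters between the streams, trading some capacity for fewer weights.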
Hello, I just ran into this inconsistency while implementing a project this week; we basically copied the FeedForwardPolicy code from DQN and adapted it for our needs. I am free now and can work on this issue if you want some help!
Which issue are you referring to here, exactly? The confusion about how net_arch/layers behave with different algorithms, or the part related to the dueling network architecture?
@Miffyli Now that I have read the issue a second time, I don't think it is worth it to change the behavior of the … What I wanted to tackle is the fact that the …
I agree that we should avoid bigger changes to the current version. I am confused, though: what is it exactly that …
I can come up with three reasons for the change:
As for the usefulness of having this level of control, I will admit I'm not well placed to answer, as I'm still inexperienced with deep RL. I think the use case of dueling architectures for deeper MLP policies would require it, but I don't know how common they are. If you believe it is not worth it, we could at least update the Custom Policies page.
Sorry for the late reply! Such messy times ^^
But I see the benefit of …
Description
Motivation and Context
Closes [Question] 'layers' in 'policy_kwargs' does not functional for cnn features in general, but it works in some implementation like DDPG, SAC #526
Types of changes
Checklist:
- `pytest` and `pytype` both pass.