Fix inconsistency of layers/net_arch usage in cnn policy between different algorithms #587
Conversation
Hello, I'm not sure whether this is a breaking change or not.
This is a breaking change, and I would then change DDPG/SAC/TD3 for consistency so we can fix #526. EDIT: `layers` should be `[]` by default in the case of a CNN, and `[64, 64]` to match previous behavior.
@araffin
I agree with @araffin: we should change DDPG/SAC/TD3 to match the behavior of the other policies (DQN, A2C, PPO, etc.). I.e. if …
@araffin @Miffyli
I do not understand what you mean by "you have to allow adding layers after both advantage and value streams". Do you refer to how …? You can add layers after …
@Miffyli In the CNN policy, the network is composed of 4 parts. Defining the scope of the CNN extractor has consequences for the dueling network, which I've run into recently. As you can see from the code below, (1)-(3)-(4) is considered the main network and (2) is an optional sub-network for dueling. In the current implementation, `layers` are appended only to the V(s) stream, which is consistent with the conclusion so far (i.e. the CNN extractor has to describe the whole network, so `layers` are not appended to A(s,a) in the main network). I guess (and I can see from my own test case) that A(s,a) needs to be a more complex (or at least equally complex) function than V(s).

stable-baselines/stable_baselines/deepq/policies.py, lines 103 to 132 in 04c35e1
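For reference, the combination step that joins the V(s) and A(s,a) streams discussed above can be sketched as follows (a minimal NumPy illustration of the standard dueling aggregation, not the stable-baselines code; all array values are made up):

```python
import numpy as np

# Hypothetical outputs of the two streams for a batch of 2 states
# and 4 actions: a state-value estimate V(s) and per-action
# advantages A(s, a).
value = np.array([[1.0], [0.5]])                 # shape (batch, 1)
advantages = np.array([[0.2, -0.2, 0.6, -0.2],
                       [0.4,  0.0, 0.0,  0.0]])  # shape (batch, n_actions)

# Dueling aggregation: subtract the mean advantage so that V and A
# are identifiable, then add the state value to every action.
q_values = value + (advantages - advantages.mean(axis=1, keepdims=True))
```

By construction the mean of `q_values - value` is zero in each row, which is what makes the decomposition well defined.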
So, about this question:
Actually, I didn't mean that we have to be able to define V(s) and A(s,a) separately. I wanted to point out that we need to append `layers` not only to V(s) but also to A(s,a) in dueling DQN. If `layers`/`net_arch` is to be totally ignored after the CNN extractor, don't we need some other named parameter for dueling DQN? I agree that too much modification would cost more, but I feel the current implementation of dueling DQN is a somewhat degenerate structure with respect to the CNN policy (more layers, like [256, 256, 256], do not increase the capacity of A(s,a)). Of course we can fully describe the whole dueling network in a custom CNN extractor as you said, but then we cannot utilize this option that has already been implemented.
Ah yes, I see the issue here. I do not think we should add extra layers only for one branch of the dueling architecture. The best thing is to follow the structure/diagrams of the original paper, where you feed the output of the CNN extractor into two different fully connected layers (for V(s) and A(a, s)) and then combine the two.

As for the other algorithms (DDPG/SAC/TD3): follow this architecture of either only using …

PS: Sorry for the long silence.
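The paper's structure described here can be sketched in NumPy (a toy forward pass with random weights, purely illustrative; dimensions and names are assumptions, not the stable-baselines implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
feat_dim, hidden, n_actions = 64, 512, 4

def fc_relu(x, w, b):
    # A dense layer followed by ReLU.
    return np.maximum(x @ w + b, 0.0)

# Stand-in for the flattened CNN-extractor output (batch of 2).
features = rng.normal(size=(2, feat_dim))

# One separate 512-unit fully connected layer per stream, as in
# the original dueling paper, then a linear output head each.
w_v, b_v = 0.01 * rng.normal(size=(feat_dim, hidden)), np.zeros(hidden)
w_a, b_a = 0.01 * rng.normal(size=(feat_dim, hidden)), np.zeros(hidden)
value = fc_relu(features, w_v, b_v) @ (0.01 * rng.normal(size=(hidden, 1)))
adv = fc_relu(features, w_a, b_a) @ (0.01 * rng.normal(size=(hidden, n_actions)))

# Combine the two streams into Q-values.
q = value + adv - adv.mean(axis=1, keepdims=True)
```

The point of the suggestion is that both streams get their own hidden layer after the shared CNN features, rather than appending layers to only one of them.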
For some reason I completely missed the layers you have highlighted in the figure (the 512-unit ones for each head), my bad ^^. I think @araffin …
@Miffyli I don't have much time for that issue right now; I trust you to make the right decision ;) (unless you really want my point of view, in which case you will need to wait some time).
I think we should deviate from the paper a little bit by splitting the 512-unit output from …
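One possible reading of this suggestion is to let both streams branch off a single shared 512-unit layer instead of giving each stream its own, which could be sketched like this (a hypothetical NumPy illustration; the exact split being proposed is truncated in the comment above, so this is only one interpretation):

```python
import numpy as np

rng = np.random.default_rng(1)
feat_dim, hidden, n_actions = 64, 512, 4

# Stand-in for the flattened CNN-extractor output (batch of 2).
features = rng.normal(size=(2, feat_dim))

# A single shared 512-unit layer after the CNN features...
shared = np.maximum(
    features @ (0.01 * rng.normal(size=(feat_dim, hidden))), 0.0
)

# ...whose output is then split into the V(s) and A(s, a) heads.
value = shared @ (0.01 * rng.normal(size=(hidden, 1)))
adv = shared @ (0.01 * rng.normal(size=(hidden, n_actions)))
q = value + adv - adv.mean(axis=1, keepdims=True)
```

Compared with the paper's two separate per-stream layers, this variant shares parameters between the streams, trading some capacity for fewer weights.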
Hello, I just ran into this inconsistency while implementing a project this week; we basically copied the FeedForwardPolicy code from DQN and adapted it for our needs. I am free now and can work on this issue if you want some help!
Which issue are you referring to here, exactly? The confusion about how net_arch/layers behave with different algorithms, or the part related to the dueling network architecture?
@Miffyli Now that I have read the issue a second time, I don't think it is worth it to change the behavior of the … What I wanted to tackle is the fact that the …
I agree that we should avoid bigger changes to the current version. I am confused, though: what is it exactly that …
I can come up with three reasons for the change:
As for the usefulness of having this level of control, I will admit I'm not well placed to answer, as I'm still inexperienced with deep RL. I think the use case of dueling architectures for deeper MLP policies would require it, but I don't know how common they are. If you believe it is not worth it, we could at least update the Custom Policies page.
Sorry for the late reply! Such messy times ^^
But I see the benefit of …
Description
Motivation and Context
Closes [Question] 'layers' in 'policy_kwargs' does not functional for cnn features in general, but it works in some implementation like DDPG, SAC #526
Types of changes
Checklist:
- `pytest` and `pytype` both pass.