
Centralized learning-decentralized execution clarification (engineering perspective) #79

Open
Kimonili opened this issue Aug 26, 2020 · 0 comments

Comments

@Kimonili

Hi everyone,

I am implementing the PPO algorithm on this environment. I have successfully run a few experiments in the simple single-agent environment, which I used for debugging. Now I am trying to scale the code so that it is compatible with the multi-agent setting as well.

I understand the theoretical concept of the centralized learning-decentralized execution approach, but I am quite confused about the coding/engineering changes needed in the network updates of the PPO algorithm.

I think that each actor network (assuming no shared layers) will be updated using its own agent's actor loss, but how are the critics updated? Should I calculate a cumulative critic loss and backpropagate it through every critic network? A rough sketch of what I mean is below.
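To make the question concrete, here is a rough PyTorch-style sketch of the setup I have in mind (all names, e.g. `actors`, `critics`, `batch`, are placeholders from my own code, not from this repository):

```python
import torch
import torch.nn.functional as F

def ppo_update(actors, critics, actor_opts, critic_opt, batch, clip_eps=0.2):
    # Decentralized actors: each agent's policy is updated with its own
    # clipped PPO surrogate loss, computed from its local observations.
    for i, (actor, opt) in enumerate(zip(actors, actor_opts)):
        dist = actor(batch["obs"][i])            # per-agent action distribution
        logp = dist.log_prob(batch["actions"][i])
        ratio = torch.exp(logp - batch["logp_old"][i])
        adv = batch["advantages"][i]
        actor_loss = -torch.min(
            ratio * adv,
            torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * adv,
        ).mean()
        opt.zero_grad()
        actor_loss.backward()
        opt.step()

    # Centralized critics: this is the part I am unsure about. Here I sum
    # the per-agent value losses into one cumulative loss and backpropagate
    # it through every critic network (each critic sees the global state).
    # Is this the right approach, or should each critic be updated with its
    # own loss only?
    critic_loss = sum(
        F.mse_loss(critic(batch["global_state"]).squeeze(-1), batch["returns"][i])
        for i, critic in enumerate(critics)
    )
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()
```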
