Hi everyone,
I am implementing the PPO algorithm on this environment. I successfully ran a few experiments in the single-agent simple environment, which I used for debugging. Now I am trying to scale the code so that it is also compatible with the multi-agent setting.
I understand the theoretical concept of the centralized-training, decentralized-execution approach, but I am quite confused about the engineering changes needed in the network updates of the PPO algorithm.
I think the actor network (assuming no shared layers) will use each agent's own actor loss to update its parameters, but how are the critics updated? Should I compute a cumulative critic loss and backpropagate it through every critic network?
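Concretely, something like the sketch below is what I currently have in mind (PyTorch; the dimensions, the batch keys such as `joint_obs` / `advantages` / `returns`, and the class names are my own hypothetical placeholders, not from this repo): each actor is trained with its own clipped surrogate loss on its own trajectories, while a single centralized critic that sees the joint observation is regressed onto the return targets. Is this the right structure, or should each agent keep its own critic with a summed loss?

```python
import torch
import torch.nn as nn

# Hypothetical sizes -- adjust to the environment.
N_AGENTS, OBS_DIM, ACT_DIM = 3, 8, 4
JOINT_OBS_DIM = N_AGENTS * OBS_DIM  # concatenation of all agents' observations


class Actor(nn.Module):
    """Decentralized actor: conditions only on its own agent's observation."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(OBS_DIM, 64), nn.Tanh(), nn.Linear(64, ACT_DIM))

    def forward(self, obs):
        return torch.distributions.Categorical(logits=self.net(obs))


class CentralCritic(nn.Module):
    """Centralized critic: conditions on the joint observation of all agents."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(JOINT_OBS_DIM, 64), nn.Tanh(), nn.Linear(64, 1))

    def forward(self, joint_obs):
        return self.net(joint_obs).squeeze(-1)


actors = [Actor() for _ in range(N_AGENTS)]
critic = CentralCritic()
actor_opts = [torch.optim.Adam(a.parameters(), lr=3e-4) for a in actors]
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)


def ppo_update(batch, clip_eps=0.2):
    """One PPO update over a rollout batch (all batch keys are hypothetical names)."""
    # Each actor is updated only with its own clipped surrogate loss.
    for i, (actor, opt) in enumerate(zip(actors, actor_opts)):
        dist = actor(batch["obs"][:, i])                      # agent i's observations
        logp = dist.log_prob(batch["actions"][:, i])
        ratio = torch.exp(logp - batch["old_log_probs"][:, i])
        adv = batch["advantages"][:, i]
        actor_loss = -torch.min(ratio * adv,
                                torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * adv).mean()
        opt.zero_grad()
        actor_loss.backward()
        opt.step()

    # The single centralized critic is regressed onto the return targets.
    # (If each agent had its own return target, the per-agent value losses
    # could be averaged or summed here before backpropagating.)
    values = critic(batch["joint_obs"])
    critic_loss = ((values - batch["returns"]) ** 2).mean()
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()
```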