Hi everyone,
I am implementing the PPO algorithm on this environment. I successfully ran a few experiments in the single-agent simple environment, which I used for debugging. Now I am trying to scale the code so that it is also compatible with the multi-agent setting.
I understand the theoretical concept of the centralized-training, decentralized-execution approach, but I am quite confused about the engineering changes needed in the network updates of the PPO algorithm.
I think the actor network (assuming no shared layers) will use each agent's own actor loss to update its parameters, but how are the critics updated? Should I compute a cumulative critic loss and backpropagate it through every critic network?
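Concretely, something like the sketch below is what I currently have in mind (PyTorch; the dimensions, the batch keys such as `joint_obs` / `advantages` / `returns`, and the class names are my own hypothetical placeholders, not from this repo): each actor is trained with its own clipped surrogate loss on its own trajectories, while a single centralized critic that sees the joint observation is regressed onto the return targets. Is this the right structure, or should each agent keep its own critic with a summed loss?

```python
import torch
import torch.nn as nn

# Hypothetical sizes -- adjust to the environment.
N_AGENTS, OBS_DIM, ACT_DIM = 3, 8, 4
JOINT_OBS_DIM = N_AGENTS * OBS_DIM  # concatenation of all agents' observations


class Actor(nn.Module):
    """Decentralized actor: conditions only on its own agent's observation."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(OBS_DIM, 64), nn.Tanh(), nn.Linear(64, ACT_DIM))

    def forward(self, obs):
        return torch.distributions.Categorical(logits=self.net(obs))


class CentralCritic(nn.Module):
    """Centralized critic: conditions on the joint observation of all agents."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(JOINT_OBS_DIM, 64), nn.Tanh(), nn.Linear(64, 1))

    def forward(self, joint_obs):
        return self.net(joint_obs).squeeze(-1)


actors = [Actor() for _ in range(N_AGENTS)]
critic = CentralCritic()
actor_opts = [torch.optim.Adam(a.parameters(), lr=3e-4) for a in actors]
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)


def ppo_update(batch, clip_eps=0.2):
    """One PPO update over a rollout batch (all batch keys are hypothetical names)."""
    # Each actor is updated only with its own clipped surrogate loss.
    for i, (actor, opt) in enumerate(zip(actors, actor_opts)):
        dist = actor(batch["obs"][:, i])                      # agent i's observations
        logp = dist.log_prob(batch["actions"][:, i])
        ratio = torch.exp(logp - batch["old_log_probs"][:, i])
        adv = batch["advantages"][:, i]
        actor_loss = -torch.min(ratio * adv,
                                torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * adv).mean()
        opt.zero_grad()
        actor_loss.backward()
        opt.step()

    # The single centralized critic is regressed onto the return targets.
    # (If each agent had its own return target, the per-agent value losses
    # could be averaged or summed here before backpropagating.)
    values = critic(batch["joint_obs"])
    critic_loss = ((values - batch["returns"]) ** 2).mean()
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()
```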