Why does "return" remain constant in my custom environment? #38

SaraRezaei · 2022-11-17T15:58:38Z

The "return_mean" value stays constant at 45.0 across all steps in metric.json, with either COMA or MAA2C. Why?

class ClassName(MultiAgentEnv):
    def __init__(self):
        self.n_agents = 10
        self.observation_space = gym.spaces.Tuple(tuple(
            [gym.spaces.Box(np.array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0]),
                            np.array([3, 108, 6, 8, 3, 2, 3, 17, 19, 17]),
                            shape=(10,), dtype=np.int64)] * self.n_agents))
        self.action_space = gym.spaces.Tuple(tuple([
            gym.spaces.Discrete(4),
            gym.spaces.Discrete(109),
            gym.spaces.Discrete(7),
            gym.spaces.Discrete(9),
            gym.spaces.Discrete(4),
            gym.spaces.Discrete(3),
            gym.spaces.Discrete(4),
            gym.spaces.Discrete(18),
            gym.spaces.Discrete(20),
            gym.spaces.Discrete(18)
        ]))
        self.state = np.array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
        self.episode = 0
        self.thereshold = -10000
        super().__init__()

    def reset(self):
        self.state = np.array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
        return self.observation_space.sample()

    def step(self, actions):
        cooperation_reward = Compute_reward_function(actions)
        self.state = actions
        obs, rew, dones, info = {}, {}, {}, {}
        for i in range(10):
            obs[i] = self.observation_space.sample()
            rew[i] = cooperation_reward
            dones[i] = False
            info[i] = {}
        if cooperation_reward > self.thereshold:
            self.thereshold = cooperation_reward
            dones = self.n_agents * [True]
        self.episode += 1
        dones = self.n_agents * [True]
        return obs, rew, dones, info
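
One thing visible in `step()` above: `dones` is reassigned to all `True` unconditionally just before the `return`, so every episode appears to end after a single step, and the episode return would then equal one step's reward. A minimal sketch for checking this follows. It is an assumption-laden stand-in, not the real environment: `toy_reward` is a hypothetical replacement for `Compute_reward_function` (whose definition is not shown), and `OneStepEnv` and `run_episode` are illustrative names that copy only the termination logic, with no dependency on gym or the training framework.

```python
import random

N_AGENTS = 10
# Sizes of the ten Discrete action spaces from the environment above.
ACTION_SIZES = [4, 109, 7, 9, 4, 3, 4, 18, 20, 18]


def toy_reward(actions):
    """Hypothetical stand-in for Compute_reward_function (not shown in the post)."""
    return float(sum(actions))


class OneStepEnv:
    """Mirrors only the termination logic of step() above."""

    def __init__(self):
        self.thereshold = -10000

    def step(self, actions):
        reward = toy_reward(actions)
        dones = [False] * N_AGENTS
        if reward > self.thereshold:
            self.thereshold = reward
            dones = [True] * N_AGENTS
        # Same as the original: this overrides whatever was decided above.
        dones = [True] * N_AGENTS
        return reward, dones


def run_episode(env, rng, max_steps=50):
    """Return (episode_length, episode_return) under random actions."""
    total, steps = 0.0, 0
    for _ in range(max_steps):
        actions = [rng.randrange(n) for n in ACTION_SIZES]
        reward, dones = env.step(actions)
        total += reward
        steps += 1
        if all(dones):
            break
    return steps, total


rng = random.Random(0)
env = OneStepEnv()
episodes = [run_episode(env, rng) for _ in range(5)]
lengths = [length for length, _ in episodes]
print(lengths)  # every episode stops after one step
```

If the real environment behaves the same way, the logged return is just the reward of the first step. It may then be worth printing the output of `Compute_reward_function` for a few different action tuples to see whether it ever differs from 45.0.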
