
Implementation of MOPPO fails #102

Closed

thomaslautenb opened this issue Jun 6, 2024 · 3 comments

Comments

@thomaslautenb

I encountered a problem with the implementation of MO_PPO.
There was a mismatch in the dimensions of the _rewards vector in the sync_vector_env environment.

I worked around the issue by extending the dimension of the reward vector by the number of objectives (reward_dim). It's hardcoded and not pretty.

You might want to have a look at this.

I can provide a more extensive report on the issue if required.

def step_wait(self) -> Tuple[Any, NDArray[Any], NDArray[Any], NDArray[Any], dict]:
        """Steps through each of the environments returning the batched results.

        Returns:
            The batched environment step results
        """
        # Workaround: resize the reward buffer to (num_envs, reward_dim) so it can
        # hold vector rewards; the number of objectives is hardcoded here.
        reward_dim = 2
        self._rewards = np.zeros((self.num_envs, reward_dim), dtype=np.float64)
        observations, infos = [], {}
        for i, (env, action) in enumerate(zip(self.envs, self._actions)):
            (
                observation,
                self._rewards[i],
                self._terminateds[i],
                self._truncateds[i],
                info,
            ) = env.step(action)

            if self._terminateds[i] or self._truncateds[i]:
                old_observation, old_info = observation, info
                observation, info = env.reset()
                info["final_observation"] = old_observation
                info["final_info"] = old_info
            observations.append(observation)
            infos = self._add_info(infos, info, i)
        self.observations = concatenate(
            self.single_observation_space, observations, self.observations
        )

        return (
            deepcopy(self.observations) if self.copy else self.observations,
            np.copy(self._rewards),
            np.copy(self._terminateds),
            np.copy(self._truncateds),
            infos,
        )
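
For reference, the objective count could be read from the wrapped environments instead of being hardcoded. A minimal sketch of the same buffer allocation, assuming the sub-environments are MO-Gymnasium environments that expose a reward_space whose shape gives the number of objectives (if your envs differ, that attribute is an assumption):

        # Sketch: derive the reward dimension from the first sub-environment
        # instead of hardcoding it (assumes each env exposes a reward_space).
        reward_dim = self.envs[0].unwrapped.reward_space.shape[0]
        self._rewards = np.zeros((self.num_envs, reward_dim), dtype=np.float64)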

@ffelten
Collaborator

ffelten commented Jun 6, 2024

Hi @thomaslautenb,

Thanks for reporting this.

Which version of Gymnasium are you using? The vector wrapper relies heavily on Gymnasium (and that part of Gymnasium changed recently). The last time I ran that code it worked; that was (I think) with Gymnasium 0.28.1.
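
A quick way to check which versions are installed (a minimal snippet; it assumes the PyPI distribution names gymnasium and mo-gymnasium):

    # Print the installed Gymnasium and MO-Gymnasium versions.
    from importlib.metadata import version
    print("gymnasium:", version("gymnasium"))
    print("mo-gymnasium:", version("mo-gymnasium"))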

Now, for the longer term, we are revamping the wrapper implementations to match Gymnasium 1.0. The PR is here: Farama-Foundation/MO-Gymnasium#95. It would be great if we could validate that the new vector wrappers work with MORL-Baselines.

I don't have time to finish this month as I'm defending my thesis and moving abroad, but it is on my todo list for this summer. If you have time, you could also try it on your own by working from the current PR on MO-Gymnasium and Gymnasium 1.0 :-).

@ffelten
Collaborator

ffelten commented Aug 8, 2024

Hi @thomaslautenb,

First, the PR for the migration to 1.0 is moving forward; see #109. Second, I've talked with several people running PGMORL recently, and they did not experience any issues. Could you tell me whether it is still buggy for you?

@thomaslautenb
Author

thomaslautenb commented Aug 9, 2024

Hi @ffelten,

Excuse me for not getting back to you. I was indeed using an older Gymnasium version; running with 0.28.1 worked.

You might close the issue!

Thanks for the feedback!
