
Implementation of MOPPO fails #102

Closed

thomaslautenb opened this issue Jun 6, 2024 · 3 comments

Comments

@thomaslautenb

I encountered a problem with the implementation of MO_PPO.
There was a mismatch in the dimensions of the _rewards vector in the sync_vector_env environment.

I worked around the issue by extending the dimension of the reward vector by the number of objectives (reward_dim). It's hardcoded and not pretty.

You might want to have a look at this.

I can provide a more extensive report on the issue if required.

def step_wait(self) -> Tuple[Any, NDArray[Any], NDArray[Any], NDArray[Any], dict]:
        """Steps through each of the environments returning the batched results.

        Returns:
            The batched environment step results
        """
        # Workaround: resize the reward buffer to (num_envs, reward_dim) so it can
        # hold vector rewards; the number of objectives is hardcoded here.
        reward_dim = 2
        self._rewards = np.zeros((self.num_envs, reward_dim), dtype=np.float64)
        observations, infos = [], {}
        for i, (env, action) in enumerate(zip(self.envs, self._actions)):
            (
                observation,
                self._rewards[i],
                self._terminateds[i],
                self._truncateds[i],
                info,
            ) = env.step(action)

            if self._terminateds[i] or self._truncateds[i]:
                old_observation, old_info = observation, info
                observation, info = env.reset()
                info["final_observation"] = old_observation
                info["final_info"] = old_info
            observations.append(observation)
            infos = self._add_info(infos, info, i)
        self.observations = concatenate(
            self.single_observation_space, observations, self.observations
        )

        return (
            deepcopy(self.observations) if self.copy else self.observations,
            np.copy(self._rewards),
            np.copy(self._terminateds),
            np.copy(self._truncateds),
            infos,
        )
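
For reference, the objective count could be read from the wrapped environments instead of being hardcoded. A minimal sketch of the same buffer allocation, assuming the sub-environments are MO-Gymnasium environments that expose a reward_space whose shape gives the number of objectives (if your envs differ, that attribute is an assumption):

        # Sketch: derive the reward dimension from the first sub-environment
        # instead of hardcoding it (assumes each env exposes a reward_space).
        reward_dim = self.envs[0].unwrapped.reward_space.shape[0]
        self._rewards = np.zeros((self.num_envs, reward_dim), dtype=np.float64)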

@ffelten
Collaborator

ffelten commented Jun 6, 2024

Hi @thomaslautenb,

Thanks for reporting this.

Which version of Gymnasium are you using? The vector wrapper relies heavily on Gymnasium (and that part of Gymnasium changed recently). The last time I ran that code it worked; that was (I think) with Gymnasium 0.28.1.
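
A quick way to check which versions are installed (a minimal snippet; it assumes the PyPI distribution names gymnasium and mo-gymnasium):

    # Print the installed Gymnasium and MO-Gymnasium versions.
    from importlib.metadata import version
    print("gymnasium:", version("gymnasium"))
    print("mo-gymnasium:", version("mo-gymnasium"))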

Now, for the longer term, we are revamping the wrapper implementations to match Gymnasium 1.0. The PR is here: Farama-Foundation/MO-Gymnasium#95. It would be great if we could validate that the new vector wrappers work with MORL-Baselines.

I don't have time to finish this month as I'm defending my thesis and moving abroad, but it is on my todo list for this summer. If you have time, you could also try it on your own by working from the current PR on MO-Gymnasium and Gymnasium 1.0 :-).

@ffelten
Collaborator

ffelten commented Aug 8, 2024

Hi @thomaslautenb,

First, the PR for the migration to 1.0 is moving forward; see #109. Second, I've talked with several people running PGMORL recently, and they did not experience any issues. Could you tell me whether it is still buggy for you?

@thomaslautenb
Author

thomaslautenb commented Aug 9, 2024

Hi @ffelten,

Excuse me for not getting back to you. I was indeed using an older Gymnasium version; running with 0.28.1 worked.

You might close the issue!

Thanks for the feedback!
