Add MuJoCo v5 environments #572

Merged

Collaborator

@Kallinteris-Andreas Kallinteris-Andreas commented Jun 27, 2023

cont from: Farama-Foundation/Gymnasium-Robotics#104

Description

Adds the v5 version of the mujoco environments.

Changelog

  • Minimum mujoco version is now 2.3.3.

All v5 environments

  • Added support for fully custom/third party mujoco models using the xml_file argument (previously only a few changes could be made to the existing models).
  • Added default_camera_config argument, a dictionary for setting the mj_camera properties, mainly useful for custom environments.
  • Added env.observation_structure, a dictionary specifying the composition of the observation space (e.g. qpos, qvel), useful for building tooling and wrappers for the MuJoCo environments.
  • reset() now returns a non-empty info (previously an empty dictionary was returned); the new keys carry the same state information as step().
  • Added frame_skip argument, used to configure the dt (duration of step()); the default varies by environment, check the environment documentation pages.
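
As a rough illustration of the arguments listed above, the sketch below builds an Ant-v5 environment with a custom frame_skip and camera, and splits a flat observation using env.observation_structure. It assumes that observation_structure maps component names to lengths in observation order, and that a "skipped_qpos" entry counts elements left out of the observation. split_observation and make_ant are hypothetical helper names, not part of Gymnasium.

```python
from typing import Dict, Optional, Sequence


def split_observation(obs: Sequence[float], structure: Dict[str, int]) -> Dict[str, list]:
    """Slice a flat observation into named parts.

    Assumption: `structure` maps names to lengths, in observation order, and
    "skipped_qpos" counts elements that are NOT present in `obs`.
    """
    parts, start = {}, 0
    for name, size in structure.items():
        if name == "skipped_qpos":
            continue  # these elements were excluded from the observation
        parts[name] = list(obs[start:start + size])
        start += size
    return parts


def make_ant(xml_file: Optional[str] = None):
    """Create Ant-v5 with a few of the new arguments (needs gymnasium[mujoco])."""
    import gymnasium as gym  # imported lazily so split_observation works without it

    kwargs = {"frame_skip": 5, "default_camera_config": {"distance": 4.0}}
    if xml_file is not None:  # path to a custom MuJoCo model, supplied by the caller
        kwargs["xml_file"] = xml_file
    return gym.make("Ant-v5", **kwargs)


# Usage (requires MuJoCo to be installed):
#   env = make_ant()
#   obs, info = env.reset(seed=0)  # info is non-empty in v5
#   parts = split_observation(obs, env.unwrapped.observation_structure)
```

The values chosen for frame_skip and default_camera_config are placeholders; the per-environment documentation pages list the actual defaults.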

Ant

  • Fixed bug: healthy_reward was given on every step (even if the Ant is unhealthy), now it is only given when the Ant is healthy. The info["reward_survive"] is updated with this change (related Github issue).
  • The reward function now always includes contact_cost, before it was only included if use_contact_forces=True (can be set to 0 with contact_cost_weight=0).
  • Excluded the cfrc_ext of worldbody from the observation space as it was always 0 and thus provided no useful information to the agent, resulting in slightly faster training (related Github issue).
  • Added the main_body argument, which specifies the body used to compute the forward reward (mainly useful for custom MuJoCo models).
  • Added the forward_reward_weight argument, which defaults to 1 (effectively the same behavior as in v4).
  • Added the include_cfrc_ext_in_observation argument, previously in v4 the inclusion of cfrc_ext observations was controlled by use_contact_forces which defaulted to False, while include_cfrc_ext_in_observation defaults to True.
  • Removed the use_contact_forces argument (note: its functionality has been replaced by include_cfrc_ext_in_observation and contact_cost_weight) (related Github issue).
  • Fixed info["reward_ctrl"] sometimes containing contact_cost instead of ctrl_cost.
  • Fixed info["x_position"] & info["y_position"] & info["distance_from_origin"] giving xpos instead of qpos observations (xpos observations lag 1 mj_step() behind, more here) (related Github issue #1 & Github issue #2).
  • Removed info["forward_reward"] as it is equivalent to info["reward_forward"].
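
For code that still passes use_contact_forces, the replacement described above could be sketched as a small translation step. migrate_ant_kwargs is a hypothetical helper written only from the changelog entries in this list; it is not part of Gymnasium.

```python
def migrate_ant_kwargs(v4_kwargs: dict) -> dict:
    """Translate Ant-v4 keyword arguments into a roughly equivalent Ant-v5 set."""
    v5_kwargs = dict(v4_kwargs)  # copy so the caller's dict is left untouched
    # In v4, use_contact_forces defaulted to False.
    use_contact_forces = v5_kwargs.pop("use_contact_forces", False)
    # The flag used to control whether cfrc_ext was part of the observation...
    v5_kwargs.setdefault("include_cfrc_ext_in_observation", use_contact_forces)
    # ...and whether contact_cost entered the reward. v5 always includes the
    # term, so a weight of 0 reproduces the "disabled" case.
    if not use_contact_forces:
        v5_kwargs["contact_cost_weight"] = 0
    return v5_kwargs
```

The result only approximates v4 behavior: the other fixes in this list (for example the healthy_reward fix and the excluded worldbody cfrc_ext) still apply in v5.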

HalfCheetah

  • Restored the xml_file argument (was removed in v4).
  • Renamed info["reward_run"] to info["reward_forward"] to be consistent with the other environments.

Hopper

  • Fixed bug: healthy_reward was given on every step (even if the Hopper was unhealthy), now it is only given when the Hopper is healthy. The info["reward_survive"] is updated with this change (related Github issue).
  • Restored the xml_file argument (was removed in v4).
  • Added individual reward terms in info (info["reward_forward"], info["reward_ctrl"], info["reward_survive"]).
  • Added info["z_distance_from_origin"] which is equal to the vertical distance of the "torso" body from its initial position.
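
The healthy_reward fix listed for Hopper (and for several other environments in this changelog) can be illustrated with a simplified sketch. Both functions are hypothetical and reduce the environments' logic to the single behavior the changelog describes; the default of 1.0 is a placeholder.

```python
def survive_reward_before_fix(is_healthy: bool, healthy_reward: float = 1.0) -> float:
    """Behavior described as the bug: the bonus is paid on every step."""
    return healthy_reward


def survive_reward_v5(is_healthy: bool, healthy_reward: float = 1.0) -> float:
    """v5 behavior per the changelog: the bonus is paid only while healthy."""
    return healthy_reward if is_healthy else 0.0
```

The two functions differ only on unhealthy steps, which is where info["reward_survive"] changes between versions.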

Humanoid

  • Fixed bug: healthy_reward was given on every step (even if the Humanoid was unhealthy), now it is only given when the Humanoid is healthy. The info["reward_survive"] is updated with this change (related Github issue).
  • Restored contact_cost and the corresponding contact_cost_weight and contact_cost_range arguments, with the same defaults as in Humanoid-v3 (was removed in v4) (related Github issue).
  • Excluded the cinert & cvel & cfrc_ext of worldbody and root/freejoint qfrc_actuator from the observation space, as they were always 0 and thus provided no useful information to the agent, resulting in slightly faster training (related Github issue).
  • Restored the xml_file argument (was removed in v4).
  • Added include_cinert_in_observation, include_cvel_in_observation, include_qfrc_actuator_in_observation, include_cfrc_ext_in_observation arguments to allow for the exclusion of observation elements from the observation space.
  • Fixed info["x_position"] & info["y_position"] & info["distance_from_origin"] returning xpos instead of qpos based observations (xpos observations lag 1 mj_step() behind, more here) (related Github issue #1 & Github issue #2).
  • Added info["tendon_length"] and info["tendon_velocity"] containing observations of the Humanoid's 2 tendons connecting the hips to the knees.
  • Renamed info["reward_alive"] to info["reward_survive"] to be consistent with the other environments.
  • Renamed info["reward_linvel"] to info["reward_forward"] to be consistent with the other environments.
  • Renamed info["reward_quadctrl"] to info["reward_ctrl"] to be consistent with the other environments.
  • Removed info["forward_reward"] as it is equivalent to info["reward_forward"].
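
Logging or analysis code written against the Humanoid-v4 info keys could be adapted with a small mapping such as the one below. This is a minimal sketch based only on the renames and the removal listed above; HUMANOID_INFO_RENAMES and to_v5_info_keys are hypothetical names.

```python
HUMANOID_INFO_RENAMES = {
    "reward_alive": "reward_survive",
    "reward_linvel": "reward_forward",
    "reward_quadctrl": "reward_ctrl",
}


def to_v5_info_keys(info: dict, renames: dict = HUMANOID_INFO_RENAMES) -> dict:
    """Return a copy of a v4-style info dict that uses the v5 key names."""
    converted = {renames.get(key, key): value for key, value in info.items()}
    # info["forward_reward"] was removed in v5 (equivalent to reward_forward).
    converted.pop("forward_reward", None)
    return converted
```

A different mapping can be passed as renames for the other environments, for example {"reward_run": "reward_forward"} for HalfCheetah.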

Humanoid Standup

  • Excluded the cinert & cvel & cfrc_ext of worldbody and root/freejoint qfrc_actuator from the observation space, as they were always 0 and thus provided no useful information to the agent, resulting in slightly faster training (related Github issue).
  • Restored the xml_file argument (was removed in v4).
  • Added xml_file argument.
  • Added uph_cost_weight, ctrl_cost_weight, impact_cost_weight, impact_cost_range arguments, to configure the reward function (defaults are effectively the same as in v4).
  • Added reset_noise_scale argument, to set the range of initial states.
  • Added include_cinert_in_observation, include_cvel_in_observation, include_qfrc_actuator_in_observation, include_cfrc_ext_in_observation arguments to allow for the exclusion of observation elements from the observation space.
  • Added info["tendon_length"] and info["tendon_velocity"] containing observations of the Humanoid's 2 tendons connecting the hips to the knees.
  • Added info["x_position"] & info["y_position"], which contain the observations excluded when exclude_current_positions_from_observation == True.
  • Added info["z_distance_from_origin"] which is equal to the vertical distance of the "torso" body from its initial position.

InvertedDoublePendulum

  • Fixed bug: healthy_reward was given on every step (even if the DoublePendulum is unhealthy), now it is only given if the DoublePendulum is healthy (not terminated) (related Github issue).
  • Excluded the qfrc_constraint ("constraint force") of the hinges from the observation space (as it was always 0, thus providing no useful information to the agent, resulting in slightly faster training) (related Github issue).
  • Added xml_file argument.
  • Added reset_noise_scale argument, to set the range of initial states.
  • Added healthy_reward argument to configure the reward function (defaults are effectively the same as in v4).
  • Added individual reward terms in info (info["reward_survive"], info["distance_penalty"], info["velocity_penalty"]).

InvertedPendulum

  • Fixed bug: healthy_reward was given on every step (even if the Pendulum is unhealthy), now it is only given if the Pendulum is healthy (not terminated) (related Github issue).
  • Added xml_file argument.
  • Added reset_noise_scale argument to set the range of initial states.
  • Added info["reward_survive"] which contains the reward.

Pusher

  • Added xml_file argument.
  • Added reward_near_weight, reward_dist_weight, reward_control_weight arguments, to configure the reward function (defaults are effectively the same as in v4).
  • Fixed info["reward_ctrl"] not being multiplied by the reward weight.
  • Added info["reward_near"] which is equal to the reward term reward_near.

Reacher

  • Removed "z - position_fingertip" from the observation space since it is always 0 and therefore provides no useful information to the agent; this should result in slightly faster training (related Github issue).
  • Added xml_file argument.
  • Added reward_dist_weight, reward_control_weight arguments, to configure the reward function (defaults are effectively the same as in v4).
  • Fixed info["reward_ctrl"] not being multiplied by the reward weight.
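
To show how the new weights and the info["reward_ctrl"] fix fit together, here is a minimal sketch of a weighted Reacher-style reward. The functional form (negative distance to the target plus negative squared action norm) is an assumption, reacher_reward is a hypothetical function, and the default weights are placeholders; the environment documentation has the authoritative definition.

```python
import math
from typing import Sequence, Tuple


def reacher_reward(
    fingertip_to_target: Sequence[float],
    action: Sequence[float],
    reward_dist_weight: float = 1.0,
    reward_control_weight: float = 1.0,
) -> Tuple[float, dict]:
    """Weighted Reacher-style reward (assumed form, not the library's code)."""
    # Distance term: negative Euclidean norm of the fingertip-to-target vector.
    reward_dist = -math.sqrt(sum(d * d for d in fingertip_to_target)) * reward_dist_weight
    # Control term: negative squared norm of the action.
    reward_ctrl = -sum(a * a for a in action) * reward_control_weight
    # The info entries hold the weighted terms, so they sum to the total reward.
    info = {"reward_dist": reward_dist, "reward_ctrl": reward_ctrl}
    return reward_dist + reward_ctrl, info
```

Reporting the weighted terms in info keeps the logged components consistent with the total reward, which is the point of the fix above.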

Swimmer

  • Restored the xml_file argument (was removed in v4).
  • Added forward_reward_weight, ctrl_cost_weight, to configure the reward function (defaults are effectively the same as in v4).
  • Added reset_noise_scale argument to set the range of initial states.
  • Added exclude_current_positions_from_observation argument.
  • Replaced info["reward_fwd"] and info["forward_reward"] with info["reward_forward"] to be consistent with the other environments.

Walker2D

  • In v2, v3 and v4 the models have different friction values for the two feet (left foot friction == 1.9 and right foot friction == 0.9). The Walker2d-v5 model is updated to have the same friction for both feet (set to 1.9). This causes the Walker2d's right foot to slide less on the surface and therefore require more force to move (related Github issue).
  • Fixed bug: healthy_reward was given on every step (even if the Walker2d is unhealthy), now it is only given if the Walker2d is healthy. The info["reward_survive"] is updated with this change (related Github issue).
  • Restored the xml_file argument (was removed in v4).
  • Added individual reward terms in info (info["reward_forward"], info["reward_ctrl"], info["reward_survive"]).
  • Added info["z_distance_from_origin"] which is equal to the vertical distance of the "torso" body from its initial position.

Type of change

add new revision of MuJoCo environments.

Checklist:

  • I have run the pre-commit checks with pre-commit run --all-files (see CONTRIBUTING.md instructions to set it up)
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

Benchmarks

issues fixed:

TODO

Finished environments

  • Ant
  • Half Cheetah
  • Hopper
  • Humanoid
  • Humanoid Standup
  • Inverted Double Pendulum
  • Inverted Pendulum
  • Reacher
  • Swimmer
  • Pusher
  • Walker2D

Cutting room floor (not included in the v5 release)

  • Add option to observe tendons in Humanoids (include_tendon_in_observation).
  • Update kinematics of Ant & Humanoid after step.
  • Add ManySegmentSwimmer & CoupledHalfCheetah environments.
  • Add reset_noise_scale to manipulation environments (Pusher & Reacher).
  • Increase configurability of the manipulation environments (Pusher & Reacher).
  • Add termination conditions to the manipulation environments (Pusher & Reacher).
  • Add more arguments to control the reward function of InvertedDoublePendulum.
  • Reduce the observation space limits of angles.
  • Add noisy actions & observations & rewards
  • define healthy_z_range's body
  • HumanoidStandup.uph_cost based on self.dt and not opt.timestep
  • HumanoidStandup.model.left_hip_y range fix

Credits

Lead Developer: @Kallinteris-Andreas
Specifications/Requirements & Code Review: @pseudo-rnd-thoughts
Debugging assistance: @rodrigodelazcano

@Kallinteris-Andreas Kallinteris-Andreas marked this pull request as draft June 27, 2023 19:26
@Kallinteris-Andreas Kallinteris-Andreas changed the title adding Mujoco v5 environments add Mujoco v5 environments Jun 30, 2023
Member

@pseudo-rnd-thoughts pseudo-rnd-thoughts left a comment

The documentation's version history doesn't include the v5 changes.

Are we planning on updating all of the assets to use the compile tool?

gymnasium/envs/mujoco/assets/walker2d_v5_old.xml (1 resolved review thread, outdated)
@Kallinteris-Andreas
Collaborator Author

  1. the version history is in the opening comment of this PR (for now)
  2. No we will keep the current assets, no benefit in changing

@pseudo-rnd-thoughts
Member

@Kallinteris-Andreas Could you update Ant (and all other environments) to follow the detail that I outline in Farama-Foundation/Gymnasium-Robotics#104 (comment)

Member

@pseudo-rnd-thoughts pseudo-rnd-thoughts left a comment

Given the number of suggested changes, I have only done Humanoid but can look to do more soon

gymnasium/envs/mujoco/humanoid_v5.py (10 resolved review threads, 6 of them outdated)
@Kallinteris-Andreas
Collaborator Author

@pseudo-rnd-thoughts wow, that was way more comments than I expected, many of which apply to multiple environments, so do not review any other environments until I have resolved all the issues in Humanoid.

If you still want to review something, review the test_mujoco_v5.py

Member

@pseudo-rnd-thoughts pseudo-rnd-thoughts left a comment

The tests are very comprehensive and impressive, nice job on them, only two points about them

tests/envs/mujoco/test_mujoco_v5.py (2 resolved review threads, 1 of them outdated)
@Kallinteris-Andreas
Collaborator Author

Kallinteris-Andreas commented Jul 14, 2023

@pseudo-rnd-thoughts
I have applied most of the changes requested for Humanoid to all environments (where applicable),
a second pass over Humanoid should be enough, and then review HumanoidStandup (Note: for HumanoidStandup, you can skip the action & observation space sections since they are a copy and paste from Humanoid)

Member

@pseudo-rnd-thoughts pseudo-rnd-thoughts left a comment

I had a quick scroll through all of the environments and the changes are super impressive. Amazing job. I have two small requested changes; otherwise it looks good to merge to me.

gymnasium/envs/mujoco/half_cheetah_v5.py (1 resolved review thread, outdated)
gymnasium/envs/mujoco/humanoid_v5.py (1 resolved review thread, outdated)
@Kallinteris-Andreas
Collaborator Author

For future reference, an additional bug has been fixed in Gymnasium/MuJoCo-v5 in this PR #832

@Kallinteris-Andreas
Collaborator Author

For reference:
In the MuJoCo-v5 release, this PR was also added to fix a bug with Pusher-v5 and mujoco>=3.0.0 #1019
