Add MuJoCo v5 environments #572
Conversation
`tests/env/mujoco/test_mojoco_v3.py`
The documentation doesn't include the v5 changes in the version history.
Are we planning on updating all of the assets to use the compile tool?
@Kallinteris-Andreas Could you update Ant (and all other environments) to follow the details that I outline in Farama-Foundation/Gymnasium-Robotics#104 (comment)
Given the number of suggested changes, I have only done Humanoid but can look to do more soon
@pseudo-rnd-thoughts wow, that was way more comments than I expected, many of which apply to multiple environments, so do not review other environments until I have resolved all the issues in. If you still want to review something, review the
The tests are very comprehensive and impressive, nice job on them. Only two points about them:
@pseudo-rnd-thoughts
I had a quick scroll through all of the environments and the changes are super impressive. Amazing job. I have two small requested changes, otherwise it looks good to merge to me.
For future reference, an additional bug has been fixed in
For reference:
cont from: Farama-Foundation/Gymnasium-Robotics#104
Description

Adds the `v5` version of the `mujoco` environments.

Changelog
- The minimum `mujoco` version is now 2.3.3.

All v5 environments
- Added support for fully custom/third party `mujoco` models using the `xml_file` argument (previously only a few changes could be made to the existing models).
- Added the `default_camera_config` argument, a dictionary for setting the `mj_camera` properties, mainly useful for custom environments.
- Added `env.observation_structure`, a dictionary for specifying the composition of the observation space (e.g. `qpos`, `qvel`), useful for building tooling and wrappers for the MuJoCo environments.
- Return a non-empty `info` with `reset()`; previously an empty dictionary was returned, the new keys are the same state information as `step()`.
- Added the `frame_skip` argument, used to configure the `dt` (duration of `step()`); the default varies by environment, check the environment documentation pages (a usage sketch of these common arguments is shown below).
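A minimal usage sketch of these common arguments. This is not part of the PR itself; it assumes a Gymnasium build that already ships the v5 environments, and the values (and the custom model path) are placeholders.

```python
import gymnasium as gym

env = gym.make(
    "Ant-v5",
    frame_skip=5,                              # dt = frame_skip * model timestep
    default_camera_config={"distance": 4.0},   # forwarded to the MuJoCo camera
    # xml_file="./my_custom_ant.xml",          # hypothetical path to a fully custom model
)

# `observation_structure` describes how the observation vector is composed.
print(env.unwrapped.observation_structure)

# reset() now returns a non-empty `info`, with the same state keys as step().
obs, info = env.reset(seed=0)
print(sorted(info.keys()))
env.close()
```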
Ant
- Fixed bug: `healthy_reward` was given on every step (even if the Ant is unhealthy); now it is only given when the Ant is healthy. The `info["reward_survive"]` is updated with this change (related Github issue).
- The reward function now always includes `contact_cost`; before, it was only included if `use_contact_forces=True` (can be set to `0` with `contact_cost_weight=0`).
- Excluded the `cfrc_ext` of `worldbody` from the observation space, as it was always 0 and thus provided no useful information to the agent, resulting in slightly faster training (related Github issue).
- Added the `main_body` argument, which specifies the body used to compute the forward reward (mainly useful for custom MuJoCo models).
- Added the `forward_reward_weight` argument, which defaults to `1` (effectively the same behavior as in `v4`).
- Added the `include_cfrc_ext_in_observation` argument; previously, in `v4`, the inclusion of `cfrc_ext` observations was controlled by `use_contact_forces`, which defaulted to `False`, while `include_cfrc_ext_in_observation` defaults to `True`.
- Removed the `use_contact_forces` argument (note: its functionality has been replaced by `include_cfrc_ext_in_observation` and `contact_cost_weight`) (related Github issue) (a migration sketch is shown below).
- Fixed bug: `info["reward_ctrl"]` sometimes containing `contact_cost` instead of `ctrl_cost`.
- Fixed bug: `info["x_position"]` & `info["y_position"]` & `info["distance_from_origin"]` giving `xpos` instead of `qpos` based observations (`xpos` observations are behind by 1 `mj_step()`, more here) (related Github issue #1 & Github issue #2).
- Removed `info["forward_reward"]`, as it is equivalent to `info["reward_forward"]`.
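A sketch of how the removed `use_contact_forces` maps onto the new v5 arguments. This is an illustration rather than the PR's code, and the values shown are placeholders, not the documented defaults.

```python
import gymnasium as gym

# v4 (for comparison): one flag toggled both the cfrc_ext observations and contact_cost.
# env = gym.make("Ant-v4", use_contact_forces=True)

# v5: the observation contents and the reward term are configured independently.
env = gym.make(
    "Ant-v5",
    include_cfrc_ext_in_observation=True,  # keep contact forces in the observation
    contact_cost_weight=5e-4,              # set to 0 to effectively remove contact_cost
    main_body="torso",                     # body whose displacement defines the forward reward
    forward_reward_weight=1.0,
)
obs, _ = env.reset(seed=0)
_, reward, _, _, info = env.step(env.action_space.sample())
print(info["reward_ctrl"])  # now always the control cost, never contact_cost
env.close()
```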
HalfCheetah
- Restored the `xml_file` argument (was removed in `v4`).
- Renamed `info["reward_run"]` to `info["reward_forward"]` to be consistent with the other environments.
Hopper
- Fixed bug: `healthy_reward` was given on every step (even if the Hopper was unhealthy); now it is only given when the Hopper is healthy. The `info["reward_survive"]` is updated with this change (related Github issue).
- Restored the `xml_file` argument (was removed in `v4`).
- Added individual reward terms in `info` (`info["reward_forward"]`, `info["reward_ctrl"]`, `info["reward_survive"]`) (see the sketch below).
- Added `info["z_distance_from_origin"]`, which is equal to the vertical distance of the "torso" body from its initial position.
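A short sketch of the individual reward terms that Hopper-v5 now reports in `info`. Not part of the PR; the key names are taken from the changelog above.

```python
import gymnasium as gym

env = gym.make("Hopper-v5")
obs, info = env.reset(seed=0)
obs, reward, terminated, truncated, info = env.step(env.action_space.sample())

# The total reward decomposes into the reported terms.
print(reward, info["reward_forward"], info["reward_ctrl"], info["reward_survive"])
print(info["z_distance_from_origin"])  # vertical displacement of the torso from its start height
env.close()
```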
Humanoid
- Fixed bug: `healthy_reward` was given on every step (even if the Humanoid was unhealthy); now it is only given when the Humanoid is healthy. The `info["reward_survive"]` is updated with this change (related Github issue).
- Restored `contact_cost` and the corresponding `contact_cost_weight` and `contact_cost_range` arguments, with the same defaults as in `Humanoid-v3` (was removed in `v4`) (related Github issue).
- Excluded the `cinert` & `cvel` & `cfrc_ext` of `worldbody` and the `root`/`freejoint` `qfrc_actuator` from the observation space (as it was always 0, and thus provided no useful information to the agent, resulting in slightly faster training) (related Github issue).
- Restored the `xml_file` argument (was removed in `v4`).
- Added the `include_cinert_in_observation`, `include_cvel_in_observation`, `include_qfrc_actuator_in_observation`, `include_cfrc_ext_in_observation` arguments to allow for the exclusion of observation elements from the observation space (see the sketch below).
- Fixed bug: `info["x_position"]` & `info["y_position"]` & `info["distance_from_origin"]` returning `xpos` instead of `qpos` based observations (`xpos` observations are behind by 1 `mj_step()`, more here) (related Github issue #1 & Github issue #2).
- Added `info["tendon_length"]` and `info["tendon_velocity"]`, containing observations of the Humanoid's 2 tendons connecting the hips to the knees.
- Renamed `info["reward_alive"]` to `info["reward_survive"]` to be consistent with the other environments.
- Renamed `info["reward_linvel"]` to `info["reward_forward"]` to be consistent with the other environments.
- Renamed `info["reward_quadctrl"]` to `info["reward_ctrl"]` to be consistent with the other environments.
- Removed `info["forward_reward"]`, as it is equivalent to `info["reward_forward"]`.
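A sketch of trimming the Humanoid-v5 observation with the new `include_*` flags and inspecting the resulting layout via `observation_structure`. This is illustrative only and not taken from the PR.

```python
import gymnasium as gym

full = gym.make("Humanoid-v5")
slim = gym.make(
    "Humanoid-v5",
    include_cinert_in_observation=False,
    include_cvel_in_observation=False,
    include_qfrc_actuator_in_observation=False,
    include_cfrc_ext_in_observation=False,
)
print(full.observation_space.shape, slim.observation_space.shape)  # the slim space is smaller
print(slim.unwrapped.observation_structure)                        # per-component sizes
full.close()
slim.close()
```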
Humanoid Standup
- Excluded the `cinert` & `cvel` & `cfrc_ext` of `worldbody` and the `root`/`freejoint` `qfrc_actuator` from the observation space (as it was always 0, and thus provided no useful information to the agent, resulting in slightly faster training) (related Github issue).
- Restored the `xml_file` argument (was removed in `v4`).
- Added the `xml_file` argument.
- Added the `uph_cost_weight`, `ctrl_cost_weight`, `impact_cost_weight`, `impact_cost_range` arguments, to configure the reward function (defaults are effectively the same as in `v4`) (see the sketch below).
- Added the `reset_noise_scale` argument, to set the range of initial states.
- Added the `include_cinert_in_observation`, `include_cvel_in_observation`, `include_qfrc_actuator_in_observation`, `include_cfrc_ext_in_observation` arguments to allow for the exclusion of observation elements from the observation space.
- Added `info["tendon_length"]` and `info["tendon_velocity"]`, containing observations of the Humanoid's 2 tendons connecting the hips to the knees.
- Added `info["x_position"]` & `info["y_position"]`, which contain the observations excluded when `exclude_current_positions_from_observation == True`.
- Added `info["z_distance_from_origin"]`, which is equal to the vertical distance of the "torso" body from its initial position.
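A sketch of the new HumanoidStandup-v5 reward-shaping and reset arguments. The numbers are placeholders chosen for illustration, not the documented defaults.

```python
import gymnasium as gym

env = gym.make(
    "HumanoidStandup-v5",
    uph_cost_weight=1.0,                        # weight of the upward-motion term
    ctrl_cost_weight=0.1,                       # weight of the control penalty
    impact_cost_weight=0.5e-6,                  # weight of the impact penalty
    impact_cost_range=(float("-inf"), 10.0),    # clipping range of the impact penalty
    reset_noise_scale=1e-2,                     # range of the initial-state noise
)
obs, info = env.reset(seed=0)
_, _, _, _, info = env.step(env.action_space.sample())
print(info["tendon_length"], info["tendon_velocity"])  # new tendon observations in info
env.close()
```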
InvertedDoublePendulum
- Fixed bug: `healthy_reward` was given on every step (even if the Pendulum is unhealthy); now it is only given if the DoublePendulum is healthy (not terminated) (related Github issue).
- Excluded the `qfrc_constraint` ("constraint force") of the hinges from the observation space (as it was always 0, thus providing no useful information to the agent, resulting in slightly faster training) (related Github issue).
- Added the `xml_file` argument.
- Added the `reset_noise_scale` argument, to set the range of initial states.
- Added the `healthy_reward` argument to configure the reward function (defaults are effectively the same as in `v4`).
- Added individual reward terms in `info` (`info["reward_survive"]`, `info["distance_penalty"]`, `info["velocity_penalty"]`) (see the sketch below).
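A sketch of the new InvertedDoublePendulum-v5 arguments and per-term `info`. The argument values are illustrative placeholders; the key names come from the changelog above.

```python
import gymnasium as gym

env = gym.make("InvertedDoublePendulum-v5", healthy_reward=10.0, reset_noise_scale=0.1)
obs, info = env.reset(seed=0)
obs, reward, terminated, truncated, info = env.step(env.action_space.sample())

# The survive bonus is only paid while the pendulum is healthy (not terminated);
# the distance and velocity penalties reduce the total reward.
print(info["reward_survive"], info["distance_penalty"], info["velocity_penalty"])
env.close()
```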
InvertedPendulum
- Fixed bug: `healthy_reward` was given on every step (even if the Pendulum is unhealthy); now it is only given if the Pendulum is healthy (not terminated) (related Github issue).
- Added the `xml_file` argument.
- Added the `reset_noise_scale` argument to set the range of initial states.
- Added `info["reward_survive"]`, which contains the reward.
Pusher
- Added the `xml_file` argument.
- Added the `reward_near_weight`, `reward_dist_weight`, `reward_control_weight` arguments, to configure the reward function (defaults are effectively the same as in `v4`).
- Fixed bug: `info["reward_ctrl"]` not being multiplied by the reward weight.
- Added `info["reward_near"]`, which is equal to the reward term `reward_near`.
"z - position_fingertip"
from the observation space since it is always 0, and therefore provides no useful information to the agent, this should result is slightly faster training (related Github issue).xml_file
argument.reward_dist_weight
,reward_control_weight
arguments, to configure the reward function (defaults are effectively the same as inv4
).info["reward_ctrl"]
being not being multiplied by the reward weight.Swimmer
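A sketch of the new reward-weight arguments of Pusher-v5 and Reacher-v5. The weights shown are placeholders rather than the documented defaults.

```python
import gymnasium as gym

pusher = gym.make("Pusher-v5", reward_near_weight=0.5, reward_dist_weight=1.0, reward_control_weight=0.1)
reacher = gym.make("Reacher-v5", reward_dist_weight=1.0, reward_control_weight=0.1)

obs, info = reacher.reset(seed=0)
_, reward, _, _, info = reacher.step(reacher.action_space.sample())
print(info["reward_ctrl"])  # now scaled by reward_control_weight (the v4 bug left it unscaled)
pusher.close()
reacher.close()
```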
Swimmer
- Restored the `xml_file` argument (was removed in `v4`).
- Added the `forward_reward_weight`, `ctrl_cost_weight` arguments, to configure the reward function (defaults are effectively the same as in `v4`).
- Added the `reset_noise_scale` argument to set the range of initial states.
- Added the `exclude_current_positions_from_observation` argument.
- Replaced `info["reward_fwd"]` and `info["forward_reward"]` with `info["reward_forward"]`, to be consistent with the other environments (see the sketch below).
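A sketch of the restored and new Swimmer-v5 arguments, with illustrative values only (not the documented defaults).

```python
import gymnasium as gym

env = gym.make(
    "Swimmer-v5",
    forward_reward_weight=1.0,
    ctrl_cost_weight=1e-4,
    reset_noise_scale=0.1,
    exclude_current_positions_from_observation=False,  # keep the x/y position in the observation
)
obs, info = env.reset(seed=0)
_, reward, _, _, info = env.step(env.action_space.sample())
print(info["reward_forward"])  # replaces the old info["reward_fwd"] / info["forward_reward"] keys
env.close()
```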
Walker2D
- The `Walker-v5` model is updated to have the same friction for both feet (set to 1.9). This causes the Walker2d's right foot to slide less on the surface and therefore require more force to move (related Github issue).
- Fixed bug: `healthy_reward` was given on every step (even if the Walker2D was unhealthy); now it is only given if the Walker2d is healthy. The `info["reward_survive"]` is updated with this change (related Github issue).
- Restored the `xml_file` argument (was removed in `v4`).
- Added individual reward terms in `info` (`info["reward_forward"]`, `info["reward_ctrl"]`, `info["reward_survive"]`).
- Added `info["z_distance_from_origin"]`, which is equal to the vertical distance of the "torso" body from its initial position.
Type of change
- Add a new revision of the `MuJoCo` environments.
Checklist:
- I have run the `pre-commit` checks with `pre-commit run --all-files` (see `CONTRIBUTING.md` instructions to set it up).
Benchmarks
(`v3` → `v4`): https://github.com/Kallinteris-Andreas/gymnasium-mujuco-v5-envs-validation
Issues fixed:
- Replacing `"global"` with `"local"` coordinate system google-deepmind/mujoco#833
- `Humanoid` & `Ant` Have wrong `info["distance_from_origin"]` #539
- `Ant` & `Humanoid` have wrong "x_position" & "y_position" `info` #521
- `Humanoid-v4` does not have `contact_cost` #504
- `InvertedDoublePendulumEnv` and `InvertedPendulumEnv` always gives "alive_bonus" #500
- `MuJoCo/Walker2d` left foot has different friction than right foot #477
- `mujoco.InvertedDoublePendulum` last 2 observations (constraints) are const 0 #228
- `MuJoCo.Ant` contact forces being off by default is based on a wrong experiment #214
- [`MuJoCo`] Reacher And Pusher reward is calculated prior to transition #821 (fixed in Fix `Reacher-v5` & `Pusher-v5` reward function being calculated using previous state #832)
TODO

Finished environments

Cutting room floor (not included in the `v5` release)
- Tendon observation toggle for the `Humanoid`s (`include_tendon_in_observation`).
- `Ant` & `Humanoid` after step.
- Adding the `ManySegmentSwimmer` & `CoupledHalfCheetah` environments.
- Adding `reset_noise_scale` to the manipulation environments (`Pusher` & `Reacher`).
- (`Pusher` & `Reacher`).
- `healthy_z_range`'s body.
- `HumanoidStandup.uph_cost` based on `self.dt` and not `opt.timestep`.
- `HumanoidStandup.model.left_hip_y` range fix.

Credits
Lead Developer: @Kallinteris-Andreas
Specifications/Requirements & Code Review: @pseudo-rnd-thoughts
Debugging assistance: @rodrigodelazcano