
Issue with Mobile Aloha Inference in MuJoCo: Robot Wandering Without Performing Actions #24

Open
yongzhengqi opened this issue Nov 18, 2024 · 18 comments


@yongzhengqi

Hi folks,

First off, I want to say amazing work—I'm really impressed by this project!

I've been trying to perform inference on the Mobile Aloha robot in MuJoCo, but I'm encountering an issue: the robot seems to wander aimlessly and doesn't perform any meaningful actions. Do you have any suggestions for resolving this?

Here’s my setup:

  • I’ve wrapped MuJoCo in a ROS node to interact with agilex_inference.py (a rough sketch of such a wrapper follows this list).
  • The simulation setup is cloned from Agilex's GitHub repository.
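
For reference, such a wrapper looks roughly like the sketch below. This is only illustrative: the topic names, camera name, and XML path are placeholders and have to match whatever agilex_inference.py actually publishes and subscribes to.

```python
# Illustrative MuJoCo-to-ROS bridge (placeholder topic/camera names and paths,
# not the actual interface of agilex_inference.py).
import mujoco
import numpy as np
import rospy
from cv_bridge import CvBridge
from sensor_msgs.msg import Image, JointState


class MujocoAlohaBridge:
    def __init__(self, xml_path):
        self.model = mujoco.MjModel.from_xml_path(xml_path)
        self.data = mujoco.MjData(self.model)
        # 480x640 matches the camera image size used in this setup.
        self.renderer = mujoco.Renderer(self.model, height=480, width=640)
        self.bridge = CvBridge()
        self.cmd = None

        # Placeholder topics; rename to whatever agilex_inference.py expects.
        self.img_pub = rospy.Publisher("/camera_front/image_raw", Image, queue_size=1)
        self.state_pub = rospy.Publisher("/puppet/joint_states", JointState, queue_size=1)
        rospy.Subscriber("/policy/joint_cmd", JointState, self._on_cmd, queue_size=1)

    def _on_cmd(self, msg):
        self.cmd = np.asarray(msg.position)

    def step(self):
        if self.cmd is not None:
            # Assumes position actuators ordered like the command vector.
            self.data.ctrl[: len(self.cmd)] = self.cmd
        mujoco.mj_step(self.model, self.data)

        # Render one camera and publish it as an RGB image.
        self.renderer.update_scene(self.data, camera="front")  # camera name is a placeholder
        frame = self.renderer.render()  # (480, 640, 3) uint8
        self.img_pub.publish(self.bridge.cv2_to_imgmsg(frame, encoding="rgb8"))

        # Publish the current joint positions for the policy's state input.
        js = JointState()
        js.position = self.data.qpos[:14].tolist()  # 14-dim layout assumed
        self.state_pub.publish(js)


if __name__ == "__main__":
    rospy.init_node("mujoco_aloha_bridge")
    node = MujocoAlohaBridge("aloha_scene.xml")  # placeholder scene file
    rate = rospy.Rate(25)  # control frequency (assumed)
    while not rospy.is_shutdown():
        node.step()
        rate.sleep()
```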

The only modifications I’ve made to RDT's code are adjustments to align the gripper action range with Agilex's setup (0 to 0.0475). Specifically (a rough sketch follows the list):

  1. In _format_joint_to_state(), I changed:
    [[[1, 1, 1, 1, 1, 1, 4.7908, 1, 1, 1, 1, 1, 1, 4.7888]]]
    to:
    [[[1, 1, 1, 1, 1, 1, 0.0475, 1, 1, 1, 1, 1, 1, 0.0475]]]
  2. In _unformat_action_to_joint(), I changed:
    [[[1, 1, 1, 1, 1, 1, 11.8997, 1, 1, 1, 1, 1, 1, 13.9231]]]
    to:
    [[[1, 1, 1, 1, 1, 1, 0.0475, 1, 1, 1, 1, 1, 1, 0.0475]]]
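
Roughly, the change amounts to the following sketch (not the exact RDT code; I'm assuming the first vector is applied as an element-wise divisor on the state and the second as an element-wise multiplier on the action, with the arm joints kept at a factor of 1):

```python
# Sketch of the gripper rescaling (assumed element-wise use of the vectors).
import numpy as np

# State side: divide raw joint readings so the gripper lands in [0, 1].
# Originally 4.7908 / 4.7888 (real Agilex gripper range); in MuJoCo the
# gripper joint runs from 0 to 0.0475, hence the change.
STATE_SCALE = np.array([[[1, 1, 1, 1, 1, 1, 0.0475,
                          1, 1, 1, 1, 1, 1, 0.0475]]])

# Action side: multiply the policy's normalized gripper output back into
# simulator units (originally 11.8997 / 13.9231 for the real robot).
ACTION_SCALE = np.array([[[1, 1, 1, 1, 1, 1, 0.0475,
                           1, 1, 1, 1, 1, 1, 0.0475]]])


def format_joint_to_state(joints):
    """Normalize a (1, 1, 14) joint vector before feeding it to the policy."""
    return joints / STATE_SCALE


def unformat_action_to_joint(action):
    """Rescale a (1, 1, 14) policy output back into simulator joint commands."""
    return action * ACTION_SCALE
```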

Could this modification be causing the issue? Or is there another step I might have missed in setting up Mobile Aloha in MuJoCo for inference?

Thanks again for the great work—I'm eager to get this working and achieve good results in MuJoCo!

@yongzhengqi
Author

Here's the video for the current result.

The command is "Grab the soda can and put it in the plate." and I'm running your 1B model.
Screencast from 11-18-2024 12:02:33 AM.webm

@yongzhengqi
Author

yongzhengqi commented Nov 18, 2024

And here are the images that each camera captures at frame 0. Each camera image is 480 × 640 × 3 (height, width, channels).

Left Camera (image attached)

Right Camera (image attached)

Front Camera (image attached)

@yongzhengqi
Author

yongzhengqi commented Nov 18, 2024

I noticed a discrepancy between the simulation environment and the joint range reported in your paper. For joint 3 of the arm, the paper specifies a range of -3.05433 to 0, while Agilex’s simulation setup defines the range as 0 to 3.14.

To address this, I tried flipping the joint position in _format_joint_to_state() and _unformat_action_to_joint(). However, despite this adjustment, the robot continues to exhibit meaningless, erratic movements. I also attempted to apply an offset of 3.054 in both functions, but unfortunately, that did not work either.
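
Concretely, the two variants I tried look roughly like this (the joint indices below are an assumption about the 14-dim layout: six arm joints plus one gripper per arm):

```python
# Sketch of the two joint-3 remaps tried above (sign flip vs. constant offset).
import numpy as np

JOINT3_IDX = [2, 9]  # joint 3 of the left and right arm in a 14-dim vector (assumed layout)

def sim_to_model(joints, mode="flip"):
    """Map simulator joint 3 (range 0 to 3.14) toward the paper's range (-3.05433 to 0)."""
    out = np.array(joints, dtype=float, copy=True)
    if mode == "flip":
        out[..., JOINT3_IDX] *= -1.0      # 0..3.14  ->  -3.14..0
    elif mode == "offset":
        out[..., JOINT3_IDX] -= 3.054     # 0..3.14  ->  -3.054..0.086
    return out

def model_to_sim(actions, mode="flip"):
    """Inverse mapping applied to the policy's output before sending it to MuJoCo."""
    out = np.array(actions, dtype=float, copy=True)
    if mode == "flip":
        out[..., JOINT3_IDX] *= -1.0
    elif mode == "offset":
        out[..., JOINT3_IDX] += 3.054
    return out
```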

Please let me know if you need more information.

@yongzhengqi
Author

At this point, I’m thinking that fine-tuning might be required to run inference in simulation environments. OpenVLA also seems to require fine-tuning to achieve reasonable performance on Libero.

On the other hand, @chjchjchjchjchj's pull request gives me hope: while the WidowX arm didn’t complete the task, it at least approached the spool (the task instruction is to place the spool on the towel).

@csuastt
Collaborator

csuastt commented Nov 18, 2024

Yes, fine-tuning is needed.

By the way, do not use SimplerEnv currently. The origin of their coordinate system is different from that of the real-world data. We are trying to build simulation inference. Stay tuned!

@yongzhengqi
Author

Thank you for your prompt reply! I’ll look forward to the good news.

Regarding the need for fine-tuning, do you think it’s primarily required on the perception side, the control side, or both?

@csuastt
Collaborator

csuastt commented Nov 20, 2024

Both. We usually call it the embodiment gap.

@yongzhengqi
Author

Philosophically speaking, is it correct to understand that a more diverse set of robots in the training set leads to a smaller embodiment gap in practice (i.e., less fine-tuning needed) when adapting to new robots?

@csuastt
Collaborator

csuastt commented Nov 21, 2024

Yes, I think so. However, at present, the embodiment diversity of the pre-training datasets is far from sufficient.

@yongzhengqi
Author

If the fine-tuning is solely for closing the embodiment gap, is it correct to assume that a diverse set of objects or tasks is not strictly necessary (although, of course, it would be beneficial to include them)? Can I assume the model has already learned aspects beyond the embodiment gap (e.g., visual reasoning, task planning, etc.) during pre-training?

@csuastt
Collaborator

csuastt commented Nov 22, 2024

Yes, you are right.

@zzl410

zzl410 commented Nov 24, 2024

We encountered the same issue in a real-world scenario and are unsure where the problem lies. We hope to receive some help.
The command is "Pour water from the bottle into the mug."

Here's the video of our current result:

WeChat_20241124224257.mp4

@csuastt
Collaborator

csuastt commented Nov 25, 2024

@zzl410 Have you fine-tuned the model? It seems quite abnormal...

@zzl410

zzl410 commented Nov 25, 2024

Thank you for your attention. We did not fine-tune the models; we used only the two base models, rdt-1b and rdt-170m. We tested the following three instructions:

  • Pour water from the bottle into the mug.
  • Pick up the black marker on the right and put it into the packaging box on the left.
  • Fold the basketball shorts into a rectangle.

In all cases, a similar issue occurred: the robotic arm moved upwards during the grasping motion.
The command is:

python -m scripts.agilex_inference \
    --use_actions_interpolation \
    --pretrained_model_name_or_path /home/mobilealoha/RoboticsDiffusionTransformer/robotics-diffusion-transformer/rdt-1b \
    --lang_embeddings_path outs/Pour_water.pt \
    --ctrl_freq 25

@zzl410

zzl410 commented Nov 25, 2024

Interestingly, even in a completely dark experimental environment, the robotic arm still exhibits the same issue. We verified the reception of camera image data, confirming that it is complete and accurate.

@csuastt
Collaborator

csuastt commented Nov 25, 2024

You should fine-tune first, since the pre-trained checkpoint has never seen your embodiment before.

@ROSKING

ROSKING commented Nov 27, 2024

> Yes, fine-tuning is needed.
>
> By the way, do not use SimplerEnv currently. The origin of their coordinate system is different from that of the real-world data. We are trying to build simulation inference. Stay tuned!

Nice! When will the simulation inference version be released?

@zzl410

zzl410 commented Nov 27, 2024

> You should fine-tune first, since the pre-trained checkpoint has never seen your embodiment before.

Thank you, it works.
