
Wonderful Work!! A few questions regarding modelling and simulation tasks #42

gaodechen opened this issue Dec 11, 2024 · 1 comment


@gaodechen

Hi,

Thank you for your fantastic work and for making the code and checkpoints publicly available! I was quite surprised by the training stability and the generalizability to simulation environments, even though the pre-training data does not seem to include any sim data.

I’m currently fine-tuning RDT-1B and testing it on several simulation tasks in AV-ALOHA. I would greatly appreciate your insights on a few questions:

  • Action Jerkiness: RDT-1B seems to use 64 as the default chunk size. Have you encountered any jerkiness between action chunks, as shown in the attached video (e.g., the robotic arm suddenly moving to an unintended position at 4 s)? A smoothing sketch I've been experimenting with is included after this list.
  • Fine-Tuning Steps: Based on your extensive experiments, how many demonstrations or training steps are typically required to achieve a reasonable policy for few-shot learning on downstream tasks? I understand RDT-1B can learn from only 5 or 6 demos, as mentioned in the paper. I'm currently reusing the finetune.sh script, and it took around 200 steps (bs=16, #GPU=4) for the loss to converge on 50 demos, yet the policy was still somewhat jerky. So I'm wondering whether you used separate hyper-parameters for your few-shot experiments based on your empirical results.
  • Model Choice: Could you share the rationale for selecting T5 as the text backbone instead of other LLMs?
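
Regarding the jerkiness point, here is a minimal sketch of the ACT-style temporal ensembling I've been trying to smooth chunk boundaries. `policy.predict_chunk` and the gym-style `env` interface are placeholders of mine, not APIs from the RDT or AV-ALOHA codebases:

```python
import numpy as np

CHUNK_SIZE = 64      # RDT-1B's default action horizon
DECAY = 0.01         # exponential weighting over overlapping predictions

def temporal_ensemble_rollout(env, policy, max_steps=400):
    """Query the policy every step and blend all chunks that cover timestep t."""
    obs = env.reset()
    chunks = {}  # query time -> predicted chunk of shape (CHUNK_SIZE, action_dim)
    for t in range(max_steps):
        chunks[t] = policy.predict_chunk(obs)  # hypothetical inference wrapper

        # Gather every past prediction that still covers the current timestep t
        preds, weights = [], []
        for t0, chunk in chunks.items():
            offset = t - t0
            preds.append(chunk[offset])
            weights.append(np.exp(-DECAY * offset))
        weights = np.asarray(weights) / np.sum(weights)
        action = np.average(np.stack(preds), axis=0, weights=weights)

        obs, reward, done, info = env.step(action)  # gym-style 4-tuple assumed
        if done:
            break
        # Drop chunks that no longer cover the next timestep
        chunks = {t0: c for t0, c in chunks.items() if t + 1 - t0 < CHUNK_SIZE}
    return info
```

Re-querying a 1B-parameter model every step is obviously expensive, so I'm mostly using this to check whether the jerkiness comes from chunk boundaries rather than from the policy itself.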

Looking forward to your inputs. Thanks again for this amazing contribution!

[image attached: image_720]

[video attached: rollout_9.mp4]

gaodechen commented Dec 11, 2024

We’ve been training on a few downstream ALOHA tasks in simulation, each with 50 demonstrations. The model typically starts yielding a few successful episodes after a few hundred steps (batch size = 32), but the success rate remains low (~10%, lower than ACT). While the loss and sampling error continue to decrease throughout training, we haven’t been able to pinpoint the cause of the suboptimality. Have you encountered similar issues before, or do you have any insights into what might be going wrong?

An example episode after fine-tuning on 50 demos; the hyper-parameters were batch size = 32 and 400 training steps on 4 GPUs:

[video attached: rollout_1.mp4]
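
For reference, the evaluation loop behind these numbers looks roughly like the sketch below (receding-horizon execution: only the first few steps of each 64-step chunk are executed before re-querying). `make_env` and `policy.predict_chunk` are my own placeholders, not actual AV-ALOHA or RDT APIs:

```python
def evaluate(policy, make_env, num_episodes=50, max_steps=400, replan_every=16):
    """Receding-horizon evaluation: execute only the first `replan_every`
    steps of each predicted 64-step chunk, then re-query the policy."""
    successes = 0
    for _ in range(num_episodes):
        env = make_env()
        obs = env.reset()
        done, success, t = False, False, 0
        while not done and t < max_steps:
            chunk = policy.predict_chunk(obs)      # shape (64, action_dim)
            for action in chunk[:replan_every]:    # execute only a prefix
                obs, reward, done, info = env.step(action)
                success = success or bool(info.get("success", False))
                t += 1
                if done or t >= max_steps:
                    break
        successes += int(success)
    return successes / num_episodes
```

With replan_every = 64 this reduces to the fully open-loop execution used for the video above; shrinking it trades inference cost for smoother, more reactive behaviour.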
