Thank you for your fantastic work and for making the code and checkpoints publicly available! I was quite surprised by the training stability and the generalizability to simulation environments, even though the pre-training does not seem to include sim data.
I’m currently fine-tuning RDT-1B and testing it on several simulation tasks in AV-ALOHA. I would greatly appreciate your insights on a few questions:
Action Jerkiness: RDT-1B seems to use 64 as the default chunk size. Have you encountered any jerkiness between action chunks, as shown in the attached video (e.g., the robotic arm suddenly moving to an unintended position at 4s)?
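For reference, a crude workaround I've been experimenting with is to blend the first few actions of each new chunk with the tail of the previous one, similar in spirit to ACT's temporal ensembling. This is purely my own sketch, not anything from the RDT repo, and it assumes you re-query the policy before the previous chunk is fully executed so that consecutive chunks overlap:

```python
# Minimal sketch of crossfading consecutive action chunks to hide boundary jumps.
# Purely illustrative -- all names here are my own, not RDT code.
import numpy as np

def blend_chunks(prev_overlap: np.ndarray, new_chunk: np.ndarray,
                 blend_steps: int = 8) -> np.ndarray:
    """Smooth the first `blend_steps` actions of a freshly predicted chunk.

    prev_overlap: (blend_steps, action_dim) actions the previous chunk predicted
                  for the timesteps on which the new chunk starts.
    new_chunk:    (chunk_size, action_dim) chunk just sampled from the policy.
    """
    smoothed = new_chunk.copy()
    # Linear crossfade: start fully on the previous chunk, end fully on the new one.
    w_new = np.linspace(0.0, 1.0, blend_steps)[:, None]
    smoothed[:blend_steps] = w_new * new_chunk[:blend_steps] + (1.0 - w_new) * prev_overlap
    return smoothed
```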
Fine-Tuning Steps: Based on your extensive experiments, how many demonstrations or training steps are typically required to achieve a reasonable policy when few-shot learning downstream tasks? I understand RDT-1B can learn from only 5 or 6 demos, as mentioned in the paper. I'm currently reusing the finetune.sh script, and it took around 200 steps (batch size = 16, 4 GPUs) for the loss to converge on 50 demos, yet the resulting policy was still somewhat jerky. So I'm wondering whether you used separate hyper-parameters for your few-shot experiments, based on your empirical results.
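For context, here is the back-of-the-envelope arithmetic I used to sanity-check how much data those 200 steps actually cover; the ~400-step average episode length is just a guess for my AV-ALOHA tasks, so treat the numbers as illustrative:

```python
# Rough data-coverage check for the fine-tuning run described above.
num_demos = 50
avg_episode_len = 400      # assumed average timesteps per demo; adjust for your data
per_gpu_batch = 16
num_gpus = 4
train_steps = 200

samples_seen = train_steps * per_gpu_batch * num_gpus    # 12,800 sampled windows
dataset_windows = num_demos * avg_episode_len            # ~20,000 possible chunk starts
print(f"effective epochs ~= {samples_seen / dataset_windows:.2f}")  # ~0.64
```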
Model Choice: Could you share the rationale for selecting T5 as the backbone instead of other LLMs?
Looking forward to your inputs. Thanks again for this amazing contribution!
rollout_9.mp4
We've been training on a few downstream ALOHA tasks in simulation, each with 50 demonstrations. The model typically starts yielding a few successful episodes after a few hundred steps (batch size = 32), but the success rate remains low (~10%, lower than ACT). While the loss and sampling error continue to decrease throughout training, we haven't been able to pinpoint the cause of the suboptimal performance. Have you encountered similar issues before, or do you have any insight into what might be going wrong?
An example episode after fine-tuning on 50 demos; hyper-parameters: batch size = 32, 400 training steps, 4 GPUs.
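In case it's useful context, this is roughly how we measure success rate per checkpoint, since the diffusion loss alone hasn't been predictive of rollout quality. The `env` and `policy` objects are whatever our own AV-ALOHA eval harness provides (duck-typed placeholders here, not RDT APIs):

```python
# Sketch of our rollout-based checkpoint evaluation; `env` and `policy` are
# placeholders for our own eval harness, assuming a gym-style step() that
# returns (obs, reward, done, info) and a policy that predicts action chunks.
def rollout_success_rate(env, policy, num_episodes: int = 50) -> float:
    successes = 0
    for _ in range(num_episodes):
        obs, done, info = env.reset(), False, {}
        while not done:
            chunk = policy.predict_action_chunk(obs)   # (chunk_size, action_dim)
            for action in chunk:
                obs, reward, done, info = env.step(action)
                if done:
                    break
        successes += int(info.get("success", False))
    return successes / num_episodes
```

We check this every few hundred training steps, which is how we see the ~10% success rate despite the loss still decreasing.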