
Wonderful Work!! A few questions regarding modelling and simulation tasks #42

gaodechen opened this issue Dec 11, 2024 · 1 comment


@gaodechen

Hi,

Thank you for your fantastic work and for making the code and checkpoints publicly available! I was quite surprised by the training stability and the generalizability to simulation environments, even though the pre-training data does not seem to include any sim data.

I’m currently fine-tuning RDT-1B and testing it on several simulation tasks in AV-ALOHA. I would greatly appreciate your insights on a few questions:

  • Action Jerkiness: RDT-1B seems to use 64 as the default chunk size. Have you encountered any jerkiness between action chunks, as shown in the attached video (e.g., the robotic arm suddenly moving to an unintended position at 4 s)? A smoothing sketch I've been experimenting with is included after this list.
  • Fine-Tuning Steps: Based on your extensive experiments, how many demonstrations or training steps are typically required to achieve a reasonable policy for few-shot learning on downstream tasks? I understand RDT-1B can learn from only 5 or 6 demos, as mentioned in the paper. I'm currently reusing the finetune.sh script, and it took around 200 steps (bs=16, #GPU=4) for the loss to converge on 50 demos, yet the policy was still somewhat jerky. So I'm wondering whether you used separate hyper-parameters for your few-shot experiments based on your empirical results.
  • Model Choice: Could you share the rationale for selecting T5 as the text backbone instead of other LLMs?
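
Regarding the jerkiness point, here is a minimal sketch of the ACT-style temporal ensembling I've been trying to smooth chunk boundaries. `policy.predict_chunk` and the gym-style `env` interface are placeholders of mine, not APIs from the RDT or AV-ALOHA codebases:

```python
import numpy as np

CHUNK_SIZE = 64      # RDT-1B's default action horizon
DECAY = 0.01         # exponential weighting over overlapping predictions

def temporal_ensemble_rollout(env, policy, max_steps=400):
    """Query the policy every step and blend all chunks that cover timestep t."""
    obs = env.reset()
    chunks = {}  # query time -> predicted chunk of shape (CHUNK_SIZE, action_dim)
    for t in range(max_steps):
        chunks[t] = policy.predict_chunk(obs)  # hypothetical inference wrapper

        # Gather every past prediction that still covers the current timestep t
        preds, weights = [], []
        for t0, chunk in chunks.items():
            offset = t - t0
            preds.append(chunk[offset])
            weights.append(np.exp(-DECAY * offset))
        weights = np.asarray(weights) / np.sum(weights)
        action = np.average(np.stack(preds), axis=0, weights=weights)

        obs, reward, done, info = env.step(action)  # gym-style 4-tuple assumed
        if done:
            break
        # Drop chunks that no longer cover the next timestep
        chunks = {t0: c for t0, c in chunks.items() if t + 1 - t0 < CHUNK_SIZE}
    return info
```

Re-querying a 1B-parameter model every step is obviously expensive, so I'm mostly using this to check whether the jerkiness comes from chunk boundaries rather than from the policy itself.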

Looking forward to your inputs. Thanks again for this amazing contribution!

[image attached: image_720]

[video attached: rollout_9.mp4]

gaodechen commented Dec 11, 2024

We’ve been training on a few downstream ALOHA tasks in simulation, each with 50 demonstrations. The model typically starts yielding a few successful episodes after a few hundred steps (batch size = 32), but the success rate remains low (~10%, lower than ACT). While the loss and sampling error continue to decrease throughout training, we haven’t been able to pinpoint the cause of the suboptimality. Have you encountered similar issues before, or do you have any insights into what might be going wrong?

An example episode after fine-tuning on 50 demos; the hyper-parameters were batch size = 32 and 400 training steps on 4 GPUs:

[video attached: rollout_1.mp4]
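
For reference, the evaluation loop behind these numbers looks roughly like the sketch below (receding-horizon execution: only the first few steps of each 64-step chunk are executed before re-querying). `make_env` and `policy.predict_chunk` are my own placeholders, not actual AV-ALOHA or RDT APIs:

```python
def evaluate(policy, make_env, num_episodes=50, max_steps=400, replan_every=16):
    """Receding-horizon evaluation: execute only the first `replan_every`
    steps of each predicted 64-step chunk, then re-query the policy."""
    successes = 0
    for _ in range(num_episodes):
        env = make_env()
        obs = env.reset()
        done, success, t = False, False, 0
        while not done and t < max_steps:
            chunk = policy.predict_chunk(obs)      # shape (64, action_dim)
            for action in chunk[:replan_every]:    # execute only a prefix
                obs, reward, done, info = env.step(action)
                success = success or bool(info.get("success", False))
                t += 1
                if done or t >= max_steps:
                    break
        successes += int(success)
    return successes / num_episodes
```

With replan_every = 64 this reduces to the fully open-loop execution used for the video above; shrinking it trades inference cost for smoother, more reactive behaviour.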
