
Fine-Tuning in Relative Action Space #14

Open
lakomchik opened this issue Nov 6, 2024 · 9 comments

@lakomchik

I would like to fine-tune RDT in a relative action space and have a question regarding the best method for mapping actions and proprioception.

Question: For fine-tuning a model in relative action space, would it be preferable to:

  • Map relative joint positions directly into a unified action space, as per existing guidelines?
  • Normalize values (e.g., scaling from -1 to 1) and then project these normalized values into the unified action space?

Using relative actions results in a smaller range for proprioception and action values. I’m curious if normalizing these values could help them better align with the model’s expected action space.
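To make the second option concrete, I mean something like per-dimension min-max scaling into [-1, 1] (a sketch; the min/max statistics would come from the fine-tuning dataset):

```python
import numpy as np

# Sketch of option 2: per-dimension min-max scaling to [-1, 1].
# x_min/x_max would be computed over the fine-tuning dataset.
def normalize_to_unit_range(x, x_min, x_max, eps=1e-8):
    x = np.asarray(x, dtype=np.float32)
    return 2.0 * (x - x_min) / (x_max - x_min + eps) - 1.0

# e.g., delta joint positions with a small physical range
delta_q = np.array([0.01, -0.02, 0.005])
print(normalize_to_unit_range(delta_q, -0.05, 0.05))  # values in [-1, 1]
```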

@csuastt
Collaborator

csuastt commented Nov 8, 2024

Map them into the velocity slots of the unified action space (e.g., delta EEF positions should go into the EEF position velocity slots). You could apply normalization when fine-tuning.
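A minimal sketch of this mapping, assuming STATE_VEC_IDX_MAPPING from configs/state_vec.py and a 128-dimensional unified vector (the gripper_open key name is an assumption to double-check):

```python
import numpy as np

from configs.state_vec import STATE_VEC_IDX_MAPPING  # RDT repo

UNI_VEC_DIM = 128  # dimensionality of the unified action space

def delta_action_to_unified(delta_pos, delta_rpy, gripper_open):
    vec = np.zeros(UNI_VEC_DIM, dtype=np.float32)
    # Delta EEF positions go into the linear-velocity slots
    for axis, v in zip(("x", "y", "z"), delta_pos):
        vec[STATE_VEC_IDX_MAPPING[f"eef_vel_{axis}"]] = v
    # Delta RPY goes into the angular-velocity slots
    for axis, v in zip(("roll", "pitch", "yaw"), delta_rpy):
        vec[STATE_VEC_IDX_MAPPING[f"eef_angular_vel_{axis}"]] = v
    # Gripper slot; key name assumed, check STATE_VEC_IDX_MAPPING
    vec[STATE_VEC_IDX_MAPPING["gripper_open"]] = gripper_open
    return vec
```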

@lakomchik
Author

@csuastt Thank you for your answer!

@budzianowski

budzianowski commented Nov 14, 2024

@csuastt - all the preprocessing scripts use eef_delta_pos_x, for example:

"eef_delta_pos_x, eef_delta_pos_y, eef_delta_pos_z, eef_delta_angle_x, eef_delta_angle_y, eef_delta_angle_z, eef_delta_angle_w, gripper_open"

(and all the other preprocess_scripts for OXE as well). I can't find the place where these slots are mapped to eef_vel_x?

@alik-git

@csuastt Follow-up question: in the example above, the model is predicting eef_ang_x, y, z, w for a quaternion, but I don't see a way to map these directly to velocities, because in STATE_VEC_IDX_MAPPING the angular velocities are roll, pitch, yaw only, see here:

'eef_angular_vel_roll': 42,

Could you please clarify which indices you use and how exactly you map the quaternion eef_ang_x, y, z, w into STATE_VEC_IDX_MAPPING? That would be greatly appreciated. Thank you!

@csuastt
Collaborator

csuastt commented Nov 15, 2024

@alik-git @budzianowski Sorry, it is our mistake :( In the current implementation, we do not use any actions in the TFDataset; we use future states instead. To use actions, you may need to make some modifications:

  1. In this line, remove the function converting RPY to quat; it was a mistake and we forgot to delete it:

eef_ang = euler_to_quaternion(eef_ang)

The original action is already delta RPY, which is the angular velocity.

  2. You may need to modify the follow-up preprocessing script to make the producer generate the actions instead of the future states. See this readme:

https://github.com/thu-ml/RoboticsDiffusionTransformer/blob/main/docs/pretrain.md?plain=1#L242
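A rough sketch of the corrected step (after removing the quaternion conversion, the action keeps its original layout of 3 delta positions, 3 delta RPY angles, and the gripper; helper and variable names here are placeholders):

```python
import numpy as np

def process_action(raw_action):
    eef_pos = raw_action[:3]   # delta EEF position
    eef_ang = raw_action[3:6]  # delta RPY; already an angular velocity
    # eef_ang = euler_to_quaternion(eef_ang)  # <- mistaken line, removed
    gripper = raw_action[6:]   # gripper_open
    return np.concatenate([eef_pos, eef_ang, gripper]).astype(np.float32)
```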

@budzianowski

Thanks for the prompt reply, this is very helpful! One more question - does the model used in the demos from the paper also follow this logic, or was the fine-tuning performed with the modified logic?

@ethan-iai
Contributor

ethan-iai commented Nov 16, 2024

To clarify: in the demos mentioned in the paper, we predict the actions rather than the future states. It depends on your robot; on our robot (ALOHA), the future states and the actions are different. Please let me know if you’d like further details!

@budzianowski

@ethan-iai Thanks for the helpful explanation! If that's the case, I'm still puzzled by the agilex fine-tuning setup, where actions are used?

@alik-git

@ethan-iai @csuastt I just want to clarify: when you say "predict the actions", are you saying that the neural network directly outputs action deltas as the logits? Or are you saying that the model directly outputs future states, and you then manually compute the action deltas (future_state - current_state = action_deltas)?

The reason I ask is that during pretraining the model predicts future states as the logits (please correct me if that's wrong), so why not keep that consistent during fine-tuning as well?

Just for context: we are trying to evaluate RDT on controlling a WidowX robot arm. We are wondering whether, during our fine-tuning, it would be better to have the ground-truth labels be the future states and then compute the action deltas manually during deployment, or to fine-tune with action deltas directly as the ground-truth labels. My naive assumption was that fine-tuning with action deltas directly would be worse, since the model has to relearn more (due to differences in scale and representation, e.g., smaller ranges for action deltas compared to joint positions).
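For concreteness, by "compute the action deltas manually" I mean something like this sketch (names are placeholders):

```python
import numpy as np

def states_to_action_deltas(current_state, pred_future_states):
    # current_state: (D,), pred_future_states: (T, D)
    traj = np.concatenate([current_state[None], pred_future_states], axis=0)
    return np.diff(traj, axis=0)  # deltas[0] = pred[0] - current_state, etc.
```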

But if you fine-tuned directly on action deltas (and empirically found it to be better), then we should reconsider our approach of fine-tuning on future states. Sorry for the long question; I just wanted to be extra clear about what the confusion is. Thank you for your time in answering all these questions, we greatly appreciate it!
