The role of dirty bit #36

COST-97 · 2024-12-05T03:03:16Z

Hello:
I read through the data-loading code in _safe_load carefully, and the dirty bit seems to be used only to distinguish samples that have not yet been selected. Once a sample is selected for training, it is marked dirty by this code:

dirty_bit = read_dirty_bit(read_chunk_dir)
dirty_bit[read_chunk_item_index] = 1
save_dirty_bit(read_chunk_dir, dirty_bit)

So, if I understand correctly, as training proceeds, more and more samples are marked dirty and fewer and fewer remain available for training. This differs from the usual approach of sampling the data uniformly. Why is this data-reading method used?

Could you give me some advice? Thank you very much!

csuastt · 2024-12-10T18:58:39Z

We use a buffer to temporarily store samples. The dirty bit records which samples have already been loaded. When the consumer reads from the buffer, it skips dirty samples to avoid loading the same sample twice. The producer is responsible for replacing dirty samples with new ones.
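
The consumer/producer split described above can be illustrated with a minimal in-memory sketch. The class and method names (`DirtyBitBuffer`, `consume`, `refill`) are hypothetical and are not the repository's code; the snippets quoted in this thread read and save the dirty bits per chunk directory, whereas this sketch assumes a plain Python list.

```python
import random


class DirtyBitBuffer:
    """Toy buffer with one dirty bit per slot (hypothetical names, not the repo's API)."""

    def __init__(self, samples):
        self.samples = list(samples)
        # 0 = clean (not yet read), 1 = dirty (already read by the consumer)
        self.dirty = [0] * len(self.samples)

    def consume(self, rng):
        """Consumer: pick a random clean slot, mark it dirty, return its sample."""
        clean = [i for i, bit in enumerate(self.dirty) if bit == 0]
        if not clean:
            return None  # nothing fresh to read until the producer refills
        i = rng.choice(clean)
        self.dirty[i] = 1
        return self.samples[i]

    def refill(self, new_samples):
        """Producer: overwrite dirty slots with new samples and clear their bits."""
        new_iter = iter(new_samples)
        replaced = 0
        for i, bit in enumerate(self.dirty):
            if bit == 1:
                try:
                    fresh = next(new_iter)
                except StopIteration:
                    break  # ran out of new samples; remaining slots stay dirty
                self.samples[i] = fresh
                self.dirty[i] = 0
                replaced += 1
        return replaced
```

In this sketch the pool of clean samples shrinks only between refills: every call to `refill` turns dirty slots back into clean ones holding new data, so the buffer acts as a sliding window over a larger dataset rather than a fixed set that is gradually exhausted.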

COST-97 · 2024-12-12T11:47:43Z

Hello:
Thank you for your reply.
I understand that "The dirty bit is used to tell which samples have been loaded.", but generally speaking, during training, batches are sampled uniformly from the dataset (or buffer), so some samples may well be drawn more than once. Why does your method avoid sampling the same data repeatedly?

Moreover, in _safe_load I only see the code that marks the read data as dirty:

dirty_bit = read_dirty_bit(read_chunk_dir)
dirty_bit[read_chunk_item_index] = 1
save_dirty_bit(read_chunk_dir, dirty_bit)

I have not found the implementation of "replacing those dirty samples with new samples".
Maybe I missed some details, which is causing my confusion. I look forward to your feedback.
Thank you so much!

csuastt · 2024-12-12T19:25:34Z

See:

https://github.com/thu-ml/RoboticsDiffusionTransformer/blob/main/data/producer.py#L233
