Why do you delete the CoT at the token level rather than the step level? #4
In this way, the model will see lots of intermediate steps without semantic information. Will this affect other capabilities of the model?

Moreover, in the appendix experiment you showed that the model was unstable when 8 tokens were deleted in each epoch. Could this instability be related to the confusion of semantic information? If there were CoT in the first epoch and none in the second, would the same instability appear?

Yeah, I think removing at the step level is worth investigating, but we removed at the token level for simplicity of the approach (so that we don't need to find out where the step boundaries are), and empirically this seems to work very well. That said, I think removing at the step level (or at the minimal meaningful unit) might be better tailored to each problem. Regarding instability, I think that's more likely related to removing too quickly. Empirically we found that removing more slowly is generally more stable than removing quickly, but at the cost of longer training schedules.
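For concreteness, here is a minimal sketch contrasting the two removal schedules being discussed. Everything in it is an illustrative assumption rather than the repo's actual code: the function names are hypothetical, removal is assumed to proceed from the front of the CoT, and the rate is a fixed number of tokens (or steps) per epoch.

```python
# Hypothetical sketch of token-level vs. step-level CoT removal schedules.
# Assumptions (not from the repo): removal starts at the front of the CoT,
# and proceeds at a fixed per-epoch rate.

def remove_tokens(cot_tokens: list[str], epoch: int,
                  tokens_per_epoch: int = 1) -> list[str]:
    """Token-level removal: drop a fixed count of CoT tokens each epoch.
    Needs no step segmentation; per the discussion above, slower rates
    (small tokens_per_epoch) were reported to train more stably than
    faster ones (e.g. 8 tokens/epoch)."""
    n_removed = min(epoch * tokens_per_epoch, len(cot_tokens))
    return cot_tokens[n_removed:]

def remove_steps(cot_steps: list[list[str]], epoch: int) -> list[str]:
    """Step-level removal: drop whole reasoning steps, which first
    requires segmenting the CoT into steps -- the extra complexity
    the token-level approach avoids."""
    remaining = cot_steps[min(epoch, len(cot_steps)):]
    return [tok for step in remaining for tok in step]

# Usage: by epoch 2, token-level removal has dropped 2 tokens, while
# step-level removal has dropped 2 whole steps.
cot = ["a", "+", "b", "=", "c", ";", "c", "*", "2", "=", "d"]
steps = [["a", "+", "b", "=", "c", ";"], ["c", "*", "2", "=", "d"]]
print(remove_tokens(cot, epoch=2))   # ['b', '=', 'c', ';', 'c', '*', '2', '=', 'd']
print(remove_steps(steps, epoch=2))  # []
```

The token-level variant never sees a step boundary, which is exactly the simplicity argument made above: the trade-off is that truncated examples can end mid-step, leaving partial steps without semantic coherence.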