
Why do you delete the CoT at the token level rather than the step level? #4

Open
Monstarrr opened this issue Jul 29, 2024 · 1 comment

Comments

@Monstarrr

In this way, the model will see lots of intermediate steps without semantic information. Will this affect other capabilities of the model?

Moreover, in the appendix experiment, you showed that the model was unstable when 8 tokens were deleted per epoch. Could this instability be related to the confusion of semantic information? If there were a CoT in the first epoch and none in the second epoch, would the same instability appear?

@da03
Owner

da03 commented Aug 27, 2024

Yeah, I think removing at the step level is worth investigating, but we removed at the token level for simplicity of the approach (so that we don't need to find out where the step boundaries are), and empirically this seems to work very well. That said, removing at the step level (or at some minimal meaning unit) might be better tailored to each problem.

Regarding instability, I think that's more likely to be related to removing too quickly. Empirically we found that removing more slowly is generally more stable than removing quickly, but at the cost of a longer training schedule.
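To make the distinction concrete, here is a minimal sketch (not the authors' actual code) of what the two removal schemes discussed above might look like. The function names, the fixed per-epoch rate, and the step-delimiter convention are all illustrative assumptions; the key point is that the token-level scheme needs no notion of step boundaries, while the step-level variant does.

```python
# Hypothetical sketch of token-level vs. step-level CoT removal.
# Not the repository's implementation; names and rates are assumptions.

def token_level_removal(cot_tokens, epoch, tokens_per_epoch=8):
    """Drop the first `tokens_per_epoch * epoch` CoT tokens.

    This is the token-level scheme: a fixed number of tokens is removed
    from the front of the chain of thought each epoch, regardless of
    where reasoning-step boundaries fall. A smaller rate removes more
    slowly (more stable, longer training schedule).
    """
    n_removed = min(tokens_per_epoch * epoch, len(cot_tokens))
    return cot_tokens[n_removed:]

def step_level_removal(cot_text, epoch, steps_per_epoch=1, delimiter="\n"):
    """Drop whole reasoning steps instead of raw tokens.

    This variant requires identifying step boundaries (here, naively,
    lines split on a delimiter), which is exactly the extra machinery
    the token-level approach avoids.
    """
    steps = cot_text.split(delimiter)
    n_removed = min(steps_per_epoch * epoch, len(steps))
    return delimiter.join(steps[n_removed:])
```

For example, with a 20-token CoT and the 8-tokens-per-epoch rate, the model sees 12 CoT tokens at epoch 1, 4 at epoch 2, and none from epoch 3 onward; the step-level variant instead peels off one complete step per epoch.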
