Why do you delete the CoT at the token level rather than the step level? #4
In this way, the model will see lots of intermediate steps without semantic information. Will this affect other capabilities of the model?

Moreover, in the appendix experiment you showed that the model was unstable when 8 tokens were deleted in each epoch. Could this instability be related to the confusion of semantic information? If there were CoT in the first epoch and none in the second, would the same instability appear?

Yeah, I think removing at the step level is worth investigating, but we removed at the token level for simplicity of the approach (so that we don't need to find out where the step boundaries are), and empirically this seems to work very well. That said, I think removing at the step level (or at the minimal meaningful unit) might be better tailored to each problem. Regarding instability, I think that's more likely related to removing too quickly. Empirically we found that removing more slowly is generally more stable than removing quickly, but at the cost of longer training schedules.
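For concreteness, here is a minimal sketch contrasting the two removal schedules being discussed. Everything in it is an illustrative assumption rather than the repo's actual code: the function names are hypothetical, removal is assumed to proceed from the front of the CoT, and the rate is a fixed number of tokens (or steps) per epoch.

```python
# Hypothetical sketch of token-level vs. step-level CoT removal schedules.
# Assumptions (not from the repo): removal starts at the front of the CoT,
# and proceeds at a fixed per-epoch rate.

def remove_tokens(cot_tokens: list[str], epoch: int,
                  tokens_per_epoch: int = 1) -> list[str]:
    """Token-level removal: drop a fixed count of CoT tokens each epoch.
    Needs no step segmentation; per the discussion above, slower rates
    (small tokens_per_epoch) were reported to train more stably than
    faster ones (e.g. 8 tokens/epoch)."""
    n_removed = min(epoch * tokens_per_epoch, len(cot_tokens))
    return cot_tokens[n_removed:]

def remove_steps(cot_steps: list[list[str]], epoch: int) -> list[str]:
    """Step-level removal: drop whole reasoning steps, which first
    requires segmenting the CoT into steps -- the extra complexity
    the token-level approach avoids."""
    remaining = cot_steps[min(epoch, len(cot_steps)):]
    return [tok for step in remaining for tok in step]

# Usage: by epoch 2, token-level removal has dropped 2 tokens, while
# step-level removal has dropped 2 whole steps.
cot = ["a", "+", "b", "=", "c", ";", "c", "*", "2", "=", "d"]
steps = [["a", "+", "b", "=", "c", ";"], ["c", "*", "2", "=", "d"]]
print(remove_tokens(cot, epoch=2))   # ['b', '=', 'c', ';', 'c', '*', '2', '=', 'd']
print(remove_steps(steps, epoch=2))  # []
```

The token-level variant never sees a step boundary, which is exactly the simplicity argument made above: the trade-off is that truncated examples can end mid-step, leaving partial steps without semantic coherence.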