Suggestion to generate training data (avoid training wrong answers) #27

ArneDeutsch · 2024-09-14T17:59:11Z

If I understand correctly, you plan to create training samples like this:

<INPUT>[User]QuestionThatRequiresThinking[/User]</INPUT>
<OUTPUT>[LM]WrongAnswer[/LM]
[LM]VerifyAnswer[/LM]
[LM]SecondTryWrong[/LM]
[LM]VerifySecondTry[/LM]
[LM]ThirdTryCorrect[/LM]
[LM]VerifyThirdTry[/LM]
</OUTPUT>

Training on samples like this would teach the LM to produce wrong answers first and then correct itself several times before finally reaching the correct solution. That is probably not what you want, because it harms the level-one thinking system (the ability to produce a good answer on the first try).

I would suggest splitting the thinking process into multiple training samples like this (note where <OUTPUT> begins in each one):

<INPUT>[User]QuestionThatRequiresThinking[/User]</INPUT>
<OUTPUT>[LM]CorrectAnswer[/LM]
[LM]VerifyAnswer[/LM]
</OUTPUT>

<INPUT>[User]QuestionThatRequiresThinking[/User]</INPUT>
[LM]WrongAnswer[/LM]
<OUTPUT>[LM]VerifyAnswer[/LM]
[LM]SecondTryCorrect[/LM]
[LM]VerifySecondTry[/LM]
</OUTPUT>

<INPUT>[User]QuestionThatRequiresThinking[/User]</INPUT>
[LM]WrongAnswer[/LM]
[LM]VerifyAnswer[/LM]
[LM]SecondTryWrong[/LM]
<OUTPUT>[LM]VerifySecondTry[/LM]
[LM]ThirdTryCorrect[/LM]
[LM]VerifyThirdTry[/LM]
</OUTPUT>

This way you preserve the ability to produce high-quality answers on the first try while still training the verification loop for cases where an answer is not correct right away. Do not train the LM to produce bad answers on the first try; train only the verification/correction loop!
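To make the idea concrete, here is a minimal sketch of how such split samples could be generated from one question. It assumes that everything before <OUTPUT> is context the model is not trained to produce and that only the <OUTPUT> part contributes to the loss. The names `lm` and `build_samples` are hypothetical and not part of this project.

```python
def lm(text):
    """Wrap one model turn in the [LM] tags used in the samples above."""
    return f"[LM]{text}[/LM]"


def build_samples(question, wrong_attempts, correct_answer, correct_verification):
    """Build (context, target) pairs in which a wrong answer never appears in a target.

    wrong_attempts is a list of (answer, verification) tuples, in the order
    the attempts would occur. This is an illustrative assumption about the
    data layout, not the project's actual format.
    """
    user = f"[User]{question}[/User]"
    samples = []
    # k = number of wrong attempts that are placed in the untrained context.
    for k in range(len(wrong_attempts) + 1):
        context = [user]
        for i in range(k):
            answer, verification = wrong_attempts[i]
            context.append(lm(answer))
            # The verification of the last wrong attempt moves to the target.
            if i < k - 1:
                context.append(lm(verification))
        target = []
        if k > 0:
            # Train the model to recognise the most recent wrong answer...
            target.append(lm(wrong_attempts[k - 1][1]))
        # ...and then to give the correct answer and verify it.
        target.append(lm(correct_answer))
        target.append(lm(correct_verification))
        samples.append(("\n".join(context), "\n".join(target)))
    return samples


if __name__ == "__main__":
    demo = build_samples(
        "QuestionThatRequiresThinking",
        [("WrongAnswer", "VerifyAnswer"), ("SecondTryWrong", "VerifySecondTry")],
        "CorrectAnswer",
        "VerifyCorrectAnswer",
    )
    for context, target in demo:
        print(f"<INPUT>{context}</INPUT>\n<OUTPUT>{target}</OUTPUT>\n")
```

With two wrong attempts this yields three samples that mirror the three shown above: the wrong attempts appear only in the context, and every target begins either with the correct answer or with the verification of the preceding wrong answer.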

daveshap (Maintainer) · 2024-09-14T21:04:58Z

This has been brought up multiple times, but it is a nonstarter for numerous reasons:

We want the dataset to include recognition of wrong answers
We want the model to see backtracking and self-correction
This is how OpenAI did it

Thank you for the suggestion, but this is not the direction we're going at present.
