Suggestion to generate training data (avoid training wrong answers) #27
Closed
ArneDeutsch
started this conversation in
Data
Replies: 1 comment
-
This has been brought up multiple times, but it is a nonstarter for numerous reasons:
Thank you for the suggestion, but this is not the direction we're going at present.
-
If I understand correctly, you plan to create training questions like this:
<INPUT>[User]QuestionThatRequiresThinking[/User]</INPUT>
<OUTPUT>[LM]WrongAnswer[/LM]
[LM]VerifyAnswer[/LM]
[LM]SecondTryWrong[/LM]
[LM]VerifySecondTry[/LM]
[LM]ThirdTryCorrect[/LM]
[LM]VerifyThirdTry[/LM]
</OUTPUT>
Doing it like this would train the LM to produce wrong answers first and then correct itself several times before arriving at the correct solution. This is probably not what you want, because it harms the level one thinking system (the ability to produce good answers on the first try).
I would suggest splitting the thinking process into multiple training steps like this (note where <OUTPUT> starts in each example):
<INPUT>[User]QuestionThatRequiresThinking[/User]</INPUT>
<OUTPUT>[LM]CorrectAnswer[/LM]
[LM]VerifyAnswer[/LM]
</OUTPUT>
<INPUT>[User]QuestionThatRequiresThinking[/User]</INPUT>
[LM]WrongAnswer[/LM]
<OUTPUT>[LM]VerifyAnswer[/LM]
[LM]SecondTryCorrect[/LM]
[LM]VerifySecondTry[/LM]
</OUTPUT>
<INPUT>[User]QuestionThatRequiresThinking[/User]</INPUT>
[LM]WrongAnswer[/LM]
[LM]VerifyAnswer[/LM]
[LM]SecondTryWrong[/LM]
<OUTPUT>[LM]VerifySecondTry[/LM]
[LM]ThirdTryCorrect[/LM]
[LM]VerifyThirdTry[/LM]
</OUTPUT>
This way you would preserve the ability to produce high-quality answers on the first try while still training the verification loop for cases where an answer is not correct right away. Do not train the LM to produce bad answers on the first try; train only the verification/correction loop!
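To make the split concrete, here is a rough sketch of how one recorded attempt trace could be turned into such a training example. Everything in it is my own assumption for illustration, not the project's actual data pipeline: the function name `build_example`, the `(answer, verification)` tuple layout, and the rule that the last attempt in the list is the correct one. Wrong answers only ever land in the context (no loss); the target starts at the verification of the last wrong answer.

```python
def lm(text):
    """Wrap a segment in the [LM]...[/LM] markers used in the examples above."""
    return f"[LM]{text}[/LM]"


def build_example(question, attempts):
    """Split one attempt trace into (context, target) segment lists.

    `attempts` is a hypothetical list of (answer, verification) pairs in the
    order they were produced. Assumption: every pair except the last holds a
    wrong answer, and the last pair holds the correct one.
    """
    if not attempts:
        raise ValueError("need at least one attempt")

    *wrong, (final_answer, final_check) = attempts
    context = [f"[User]{question}[/User]"]  # conditioned on, never trained on
    target = []                             # what the LM is trained to emit

    for i, (answer, check) in enumerate(wrong):
        context.append(lm(answer))          # wrong answers stay in the context
        if i < len(wrong) - 1:
            context.append(lm(check))       # earlier verifications are context too
        else:
            target.append(lm(check))        # training starts at the last verification

    target.append(lm(final_answer))
    target.append(lm(final_check))
    return context, target


if __name__ == "__main__":
    ctx, tgt = build_example(
        "QuestionThatRequiresThinking",
        [
            ("WrongAnswer", "VerifyAnswer"),
            ("SecondTryWrong", "VerifySecondTry"),
            ("ThirdTryCorrect", "VerifyThirdTry"),
        ],
    )
    print("context:", ctx)
    print("target: ", tgt)
```

With a single attempt the context is just the question and the target is the correct answer plus its verification, which matches the first example above. In a real training setup the same split would typically be expressed as a loss mask over the token sequence rather than as two separate lists.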