Right now, in its explanations, the LLM-as-judge refers to the two programs as program1 and program2.
This is likely because our prompt is using "" and ""
foyle/app/pkg/eval/judge_prompt.tmpl
Line 14 in a52dcde
We should change that to "" and "" to try to get the judge to talk in terms of expected and actual.
Here's an example explanation issued by the judge:
The two programs perform distinct operations.
Program 1:
Uses jq to extract the .requestHtml field from a JSON file and stores the result in another file.
Then, it displays the content of the newly created file using cat.
Program 2:
Utilizes curl to make an HTTP POST request to fetch logs data and store the response in a JSON file.
Checks for successful execution and displays an error message if the curl command fails.
The two programs have no overlapping functionality and perform entirely different tasks. Therefore, they are not equivalent.
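A rough sketch of what the relabeled prompt could look like, assuming the template is rendered with Go's `text/template`. The `<expected>` and `<actual>` tag names, the `judgeInput` struct, and `renderJudgePrompt` are all hypothetical and are not taken from `judge_prompt.tmpl`; the point is only that the labels wrapped around each program are the vocabulary the judge is likely to echo back in its explanation.

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// judgeInput holds the two programs being compared.
// Hypothetical type: the field names are assumptions, not Foyle's.
type judgeInput struct {
	Expected string
	Actual   string
}

// promptTmpl is an illustrative fragment, not the real judge_prompt.tmpl.
// Each program is wrapped in a label we want the judge to reuse.
const promptTmpl = `<expected>
{{.Expected}}
</expected>
<actual>
{{.Actual}}
</actual>
`

// renderJudgePrompt fills the template with the two programs.
func renderJudgePrompt(in judgeInput) (string, error) {
	t, err := template.New("judge").Parse(promptTmpl)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	if err := t.Execute(&buf, in); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	out, err := renderJudgePrompt(judgeInput{
		Expected: "echo hello",
		Actual:   "printf hello",
	})
	if err != nil {
		fmt.Println("render failed:", err)
		return
	}
	fmt.Print(out)
}
```

With labels like these, an explanation would more plausibly read "the expected program does X, while the actual program does Y", which is easier to interpret when reviewing evaluation results.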