test: tweak smoke test tool bodies to standardize response text #876
Conversation
 ---
 name: bob
 description: I'm Bob, a friendly guy.
 args: question: The question to ask Bob.

-When asked how I am doing, respond with exactly "Thanks for asking "${QUESTION}", I'm doing great fellow friendly AI tool!"
+When asked how I am doing, respond with the following exactly: "Thanks for asking '${question}'! I'm doing great fellow friendly AI tool!" with ${question} replaced with the question text as given.
I thought we changed this such that `question` should be `QUESTION`?
yup, we did.
Fixed and pushed.
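For reference, a sketch of what the updated tool presumably looks like after that push, assembled from the diff above plus the `QUESTION` casing fix mentioned in this thread (the exact file contents in the repo may differ slightly):

```
---
name: bob
description: I'm Bob, a friendly guy.
args: question: The question to ask Bob.

When asked how I am doing, respond with the following exactly: "Thanks for asking '${QUESTION}'! I'm doing great fellow friendly AI tool!" with ${QUESTION} replaced with the question text as given.
```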
Curious if this makes any functional difference since this isn't a code tool - `${QUESTION}` is being made to look like an environment variable here, but there isn't anything actually setting or reading env vars... it's just the LLM being a smarty pants.
Yeah, this is more for convention's sake than anything.
The extra text explaining how to "interpolate" the variable is because 4o isn't actually that much of a smarty pants after all.
Tweak the tool bodies for smoke test GPTScripts to reduce ambiguity in the response. This prevents models -- like gpt-4o -- from doing things like failing to interpolate strings consistently between runs.

Signed-off-by: Nick Hale <[email protected]>
Force-pushed from 1c95f69 to 1b6f172
Smoke tests flake for gpt-4o because of non-determinism in how it interprets the test case instructions (e.g. it fails to interpolate a string variable consistently). This change reduces ambiguity in the tool instructions so that it produces consistent results across smoke test runs.

Note: I regenerated the golden files for all models and ran the tests 10 times per model to vet this change.