test: tweak smoke test tool bodies to standardize response text #876

Merged
merged 2 commits into from
Oct 15, 2024

Conversation

Member

@njhale njhale commented Oct 14, 2024

Smoke tests flake for gpt-4o because of non-determinism in how it interprets the test case instructions (e.g. it fails to interpolate a string variable consistently). This change reduces ambiguity in the tool instructions so that the model produces consistent results across smoke test runs.

Note: I regenerated golden files across all models and ran the tests 10 times per model to vet this change.
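For readers who want to reproduce this kind of vetting, here is a minimal sketch of a repeat-run stability check. It is not the project's smoke test harness: `run_once`, `is_stable`, `count_distinct_outputs`, and the stand-in `flaky_run` are hypothetical names invented for illustration, and the golden text is assumed from the tool body in this PR.

```python
from collections import Counter
from itertools import cycle
from typing import Callable


def count_distinct_outputs(run_once: Callable[[], str], runs: int = 10) -> Counter:
    """Call run_once `runs` times and tally each distinct (trimmed) output."""
    return Counter(run_once().strip() for _ in range(runs))


def is_stable(run_once: Callable[[], str], golden: str, runs: int = 10) -> bool:
    """Return True only if every run reproduced the golden text exactly."""
    tally = count_distinct_outputs(run_once, runs)
    return set(tally) == {golden.strip()}


# Assumed golden text, for illustration only.
GOLDEN = "Thanks for asking 'how are you doing'! I'm doing great fellow friendly AI tool!"

# Hypothetical stand-in for a model that sometimes leaves the placeholder uninterpolated.
_responses = cycle([GOLDEN, "Thanks for asking '${question}'! I'm doing great fellow friendly AI tool!"])


def flaky_run() -> str:
    return next(_responses)
```

In a real run, `run_once` would invoke the smoke test for a single model and return the tool's response text; a tally with more than one key points at the kind of flakiness described above.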

@njhale njhale requested a review from thedadams October 14, 2024 21:59
@njhale njhale changed the title from "test/smoke tweak tc bodies" to "test: tweak smoke test tool bodies to standardize response text" Oct 14, 2024
thedadams previously approved these changes Oct 14, 2024
@njhale njhale requested review from thedadams and removed request for iwilltry42 and ryanhopperlowe October 14, 2024 22:41

---
name: bob
description: I'm Bob, a friendly guy.
args: question: The question to ask Bob.

- When asked how I am doing, respond with exactly "Thanks for asking "${QUESTION}", I'm doing great fellow friendly AI tool!"
+ When asked how I am doing, respond with the following exactly: "Thanks for asking '${question}'! I'm doing great fellow friendly AI tool!" with ${question} replaced with the question text as given.
Contributor

I thought we changed this such that question should be QUESTION?

Member Author

yup, we did.

Fixed and pushed.

Contributor

@drpebcak drpebcak Oct 15, 2024

Curious if this makes any functional difference since this isn't a code tool - ${QUESTION} is being made to look like an environment variable here, but there isn't anything actually setting or reading env vars... it's just the LLM being a smarty pants.

Member Author

@njhale njhale Oct 15, 2024

Yeah, this is more for convention's sake than anything.

The extra text explaining how to "interpolate" the variable is there because 4o isn't actually that much of a smarty pants after all.
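To make the point in this thread concrete: nothing in a non-code tool body substitutes `${QUESTION}`; the model is simply asked to do it. The sketch below shows what deterministic substitution of the same placeholder would produce, using Python's standard `string.Template`. The template wording is an assumption based on the new line in the diff with the variable uppercased as agreed above, and `expected_response` is a hypothetical helper, not part of GPTScript.

```python
from string import Template

# Assumed response wording, taken from the diff with ${question} uppercased.
TOOL_RESPONSE = Template(
    "Thanks for asking '${QUESTION}'! I'm doing great fellow friendly AI tool!"
)


def expected_response(question: str) -> str:
    """Substitute the question text verbatim, as the tool body asks the model to do."""
    return TOOL_RESPONSE.substitute(QUESTION=question)


print(expected_response("how are you doing"))
```

A golden file for this test case would hold text of this shape, so any run where the model echoes the literal placeholder or rephrases the question shows up as a mismatch.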

Tweak the tool bodies for smoke test GPTScripts to reduce ambiguity in
the response. This prevents models -- like gpt-4o -- from doing things
like failing to interpolate strings consistently between runs.

Signed-off-by: Nick Hale <[email protected]>
@njhale njhale merged commit b7d31f2 into gptscript-ai:main Oct 15, 2024
10 checks passed
3 participants