Fs 104/Fix-Webagent #44
base: main
Changes from 1 commit
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -13,4 +13,7 @@ Output your result in the following json format: | |
"question": "string of the original question", | ||
"user_intent": "string of the intent of the user's question", | ||
"questions": array of singular objective questions or if the question mentions csv, dataset or database an empty array | ||
} | ||
} | ||
|
||
Guidelines: | ||
Review comment: Why do we want this?
Reply: I saw that the intent was not being identified correctly for ESG-related tasks: the request was going to the Datastore agent when there was more than one question.
Review comment: Do you have an example of when it was wrong, and how? I'm still not quite getting the problem.
||
- If the user has asked to check online, then each question in the questions array should also specify that. |
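To make the output contract in this prompt concrete, here is a minimal sketch of how a caller might check the JSON the template asks for. Only the three keys (`question`, `user_intent`, `questions`) come from the template; the function name `parse_intent_response` and the sample payload are hypothetical and are not taken from the repository.

```python
import json


def parse_intent_response(raw: str) -> dict:
    """Parse the intent JSON and check the three keys named in the template."""
    data = json.loads(raw)
    for key in ("question", "user_intent", "questions"):
        if key not in data:
            raise ValueError(f"missing key: {key}")
    # The template describes "questions" as an array (possibly empty).
    if not isinstance(data["questions"], list):
        raise ValueError("'questions' must be an array")
    return data


# Hypothetical model output for a request that asks to check online.
# Per the new guideline, the sub-question repeats the "online" requirement.
sample = json.dumps({
    "question": "Check online: what is Apple's ESG score?",
    "user_intent": "find Apple's ESG score using a web search",
    "questions": ["What is Apple's ESG score according to online sources?"],
})
parsed = parse_intent_response(sample)
```

A check like this could sit in a test so that a malformed response fails loudly rather than being routed to the wrong agent, though whether the project validates the output this way is an assumption.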
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -2,7 +2,7 @@ You are an expert validator. You can help with validating the answers to the tas | |
|
||
Your entire purpose is to return a "true" or "false" value to indicate if the answer has fulfilled the task, along with a reasoning to explain your decision. | ||
|
||
-You will be passed a task and an answer. You need to determine if the answer is correct or not. | ||
+You will be passed a task and an answer. You need to determine if the answer is correct or not, ensuring that the task's specific requirements are addressed. | ||
Review comment: Why are there changes to the validator template if we are disabling it for the webAgent? Again, promptfoo tests should be added for these changes.
Reply: I disabled the second validation. I will add promptfoo tests.
Review comment: I see, thanks for explaining.
||
|
||
Output format: | ||
|
||
|
@@ -14,10 +14,13 @@ json | |
} | ||
|
||
**Validation Guidelines:** | ||
-- Be lenient - if the answer looks reasonably accurate, return "true". | ||
-- If multiple entities have the same highest score and this matches the query intent, return "true". | ||
+- The answer must fulfill the specific intent of the task, not just provide related information. | ||
+- Be lenient if the answer is reasonably accurate and fulfills the task's intent, even if it lacks minor details. | ||
+- If specific data (like a list of companies) is requested but missing, return "false." | ||
+- If multiple entities have the same highest score and this matches the query intent, return "true." | ||
+- Spending is negative; ensure any calculations involving spending reflect this if relevant to the task. | ||
|
||
|
||
Example: | ||
Task: What is 2 + 2? | ||
Answer: 4 | ||
|
@@ -33,6 +36,20 @@ Answer: 5 | |
"reasoning": "The answer is incorrect; 2 + 2 equals 4, not 5." | ||
} | ||
|
||
Task: Provide a list of companies with the highest ESG scores in the Technology sector. | ||
Answer: As of the end of 2023, the Technology sector had the highest weighted-average ESG score among all sectors, according to the MSCI ACWI SRI Index. However, I don't have a specific list of individual companies with the highest scores. | ||
{ | ||
"response": "false", | ||
"reasoning": "The answer provides general information about ESG scores in the Technology sector but fails to fulfill the task's intent of listing companies with the highest scores." | ||
} | ||
|
||
Task: Provide a list of companies with the highest ESG scores in the Technology sector. | ||
Answer: Here are the companies with the highest ESG scores in the Technology sector: 1. Apple Inc., 2. Microsoft Corp., 3. Alphabet Inc. | ||
{ | ||
"response": "true", | ||
"reasoning": "The answer lists companies with the highest ESG scores in the Technology sector, fulfilling the task's intent." | ||
} | ||
|
||
Task: What are Apple's ESG scores? | ||
Answer: Apple's ESG (Environmental, Social, and Governance) scores are as follows: Environmental Score of 95.0, Social Score of 90.0, Governance Score of 92.0. | ||
{ | ||
|
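Because the validator template returns its verdict as the string "true" or "false" rather than a JSON boolean, the caller has to convert it. The following is a minimal sketch of that conversion, assuming the response is plain JSON with the `response` and `reasoning` keys shown in the examples above. The helper name `is_valid` is hypothetical and is not taken from the repository.

```python
import json


def is_valid(raw: str) -> bool:
    """Return True only when the validator's "response" field is the string "true"."""
    result = json.loads(raw)
    # A missing or unexpected value is treated as a failed validation.
    return str(result.get("response", "")).strip().lower() == "true"


# The two payloads mirror the ESG examples added to the template in this diff.
rejected = json.dumps({
    "response": "false",
    "reasoning": "The answer provides general information but no list of companies.",
})
accepted = json.dumps({
    "response": "true",
    "reasoning": "The answer lists companies with the highest ESG scores.",
})

rejected_ok = is_valid(rejected)
accepted_ok = is_valid(accepted)
```

Treating anything other than "true" as a failure is a design choice made for this sketch; the project may handle malformed validator output differently.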
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -13,7 +13,7 @@ | |
engine = PromptEngine() | ||
|
||
|
||
-async def search_urls(search_query, num_results=10) -> str: | ||
+async def search_urls(search_query, num_results=30) -> str: | ||
Review comment: Does this change slow down the web agent significantly? We are looking to improve the web agent search in general with https://scottlogic.atlassian.net/browse/FS-46
||
logger.info(f"Searching the web for: {search_query}") | ||
try: | ||
https_urls = [str(url) for url in search(search_query, num_results=num_results) if str(url).startswith("https")] | ||
|
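The comprehension in this hunk keeps only results whose string form starts with "https", so raising `num_results` from 10 to 30 enlarges the pool of candidates before that filter runs. The sketch below isolates the filtering step. `fake_search` is a hypothetical stand-in for the real `search` call, whose library and behavior are not shown in this diff, and `keep_https` is an illustrative name.

```python
from typing import Iterable, List


def keep_https(urls: Iterable[object]) -> List[str]:
    """Keep only results whose string form starts with "https", as the diff's comprehension does."""
    return [str(url) for url in urls if str(url).startswith("https")]


def fake_search(query: str, num_results: int = 30) -> List[str]:
    """Hypothetical stand-in for the real search call; returns at most num_results URLs."""
    candidates = [
        "https://example.com/a",
        "http://example.com/b",  # dropped by the filter: not https
        "https://example.org/c",
    ]
    return candidates[:num_results]


https_urls = keep_https(fake_search("ESG scores", num_results=30))
```

Whether the larger result count noticeably slows the agent depends on what happens to each URL afterwards, which this sketch does not model.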
Review comment: Why are we disabling validation for the web agent?
Reply: @evpearce - Because we have already validated it on line 84.