Fs 104/Fix-Webagent #44

Open: wants to merge 3 commits into base: main
Changes from 1 commit

backend/src/agents/web_agent.py: 2 changes, 1 addition & 1 deletion
@@ -86,7 +86,7 @@ async def web_general_search_core(search_query, llm, model) -> str:
continue # Skip if the summarization is not valid
response = {
"content": summary,
- "ignore_validation": "false"
+ "ignore_validation": "true" # This is to ignore the validation of the answer again by the supervisor
Collaborator

Why are we disabling validation for the web agent?

Collaborator Author

@evpearce Because we have already validated it on line 84.

}
return json.dumps(response, indent=4)
return "No relevant information found on the internet for the given query."
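A note on the flag: `ignore_validation` is serialized as the string `"true"`/`"false"`, not a JSON boolean, so whatever reads it has to compare strings (`bool("false")` is truthy in Python). The supervisor code is not part of this diff, so `should_validate` below is a hypothetical sketch of how a consumer might honor the flag; only the response dict mirrors the change.

```python
import json


def build_web_agent_response(summary: str) -> str:
    # Mirrors the response dict in web_general_search_core after this change:
    # the summary was already validated inside the web agent, so the flag
    # asks the supervisor not to validate it a second time.
    response = {
        "content": summary,
        "ignore_validation": "true",
    }
    return json.dumps(response, indent=4)


def should_validate(agent_output: str) -> bool:
    # Hypothetical supervisor-side check; not part of this PR.
    try:
        payload = json.loads(agent_output)
    except json.JSONDecodeError:
        # Plain-text answers such as "No relevant information found..."
        # carry no flag, so they are validated as usual.
        return True
    if not isinstance(payload, dict):
        return True
    # The flag is a string, so compare text rather than calling bool().
    flag = str(payload.get("ignore_validation", "false")).strip().lower()
    return flag != "true"


print(should_validate(build_web_agent_response("Apple's ESG scores are ...")))  # False
print(should_validate("No relevant information found on the internet for the given query."))  # True
```

Treating unparseable output as "validate" keeps the default safe: only an explicit `"true"` skips the second check.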

backend/src/prompts/templates/intent-system.j2: 5 changes, 4 additions & 1 deletion
@@ -13,4 +13,7 @@ Output your result in the following json format:
"question": "string of the original question",
"user_intent": "string of the intent of the user's question",
"questions": array of singular objective questions or if the question mentions csv, dataset or database an empty array
- }
+ }
+
+ Guidelines:
Collaborator

Why do we want this?
If this is a beneficial change, then I would expect more tests to be added to intent_config.yaml to prove that this is working as expected.

Collaborator Author

I noticed that the intent was not being identified correctly for ESG-related tasks: when there was more than one question, the query was routed to the Datastore agent.
Sorry, I wasn't aware of intent_config.yaml; I'll take a look at it and add more related tests there.

Collaborator

Do you have an example of when it was wrong, and how? I'm still not quite getting the problem.

+ - If the user has asked to check online, then each question in the questions array should also specify that.
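The new guideline can be expressed as a check on the template's JSON output, which is the kind of assertion a test in intent_config.yaml might make. This is plain Python, not promptfoo configuration; the helper names and the keyword list used to detect "check online" are assumptions, since the template does not define them.

```python
import json

# Assumed phrases that signal "check online"; the template does not define a list.
ONLINE_KEYWORDS = ("online", "internet", "web")


def mentions_online(text: str) -> bool:
    lowered = text.lower()
    return any(keyword in lowered for keyword in ONLINE_KEYWORDS)


def online_request_propagated(intent_json: str) -> bool:
    """True if the intent output follows the new guideline: when the original
    question asks to check online, every sub-question says so as well."""
    intent = json.loads(intent_json)
    if not mentions_online(intent["question"]):
        return True  # the guideline does not apply
    return all(mentions_online(q) for q in intent["questions"])


good = {
    "question": "Check online: what are Apple's and Microsoft's ESG scores?",
    "user_intent": "Find the ESG scores of Apple and Microsoft online",
    "questions": [
        "Check online: what is Apple's ESG score?",
        "Check online: what is Microsoft's ESG score?",
    ],
}
bad = dict(good, questions=["What is Apple's ESG score?", "What is Microsoft's ESG score?"])

print(online_request_propagated(json.dumps(good)))  # True
print(online_request_propagated(json.dumps(bad)))   # False
```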

backend/src/prompts/templates/validator.j2: 23 changes, 20 additions & 3 deletions
@@ -2,7 +2,7 @@ You are an expert validator. You can help with validating the answers to the tas

Your entire purpose is to return a "true" or "false" value to indicate if the answer has fulfilled the task, along with a reasoning to explain your decision.

- You will be passed a task and an answer. You need to determine if the answer is correct or not.
+ You will be passed a task and an answer. You need to determine if the answer is correct or not, ensuring that the task's specific requirements are addressed.
Collaborator

Why are there changes to the validator template if we are disabling it for the webAgent? Again, promptfoo tests should be added for these changes.

Collaborator Author

I only disabled the second validation. I will add promptfoo tests.

Collaborator

I see, thanks for explaining.


Output format:

@@ -14,10 +14,13 @@ json
}

**Validation Guidelines:**
- - Be lenient - if the answer looks reasonably accurate, return "true".
- - If multiple entities have the same highest score and this matches the query intent, return "true".
+ - The answer must fulfill the specific intent of the task, not just provide related information.
+ - Be lenient if the answer is reasonably accurate and fulfills the task's intent, even if it lacks minor details.
+ - If specific data (like a list of companies) is requested but missing, return "false."
+ - If multiple entities have the same highest score and this matches the query intent, return "true."
+ - Spending is negative; ensure any calculations involving spending reflect this if relevant to the task.


Example:
Task: What is 2 + 2?
Answer: 4
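The last two new guidelines describe data conventions the validator is told to accept. A minimal sketch with hypothetical helpers and made-up scores and amounts: every entity tied for the highest score is kept, and spending recorded as negative amounts is negated when totalled. How the agents actually store spending is an assumption here.

```python
from typing import Dict, List


def top_entities(scores: Dict[str, float]) -> List[str]:
    """All entities sharing the highest score; ties are kept, not broken."""
    if not scores:
        return []
    best = max(scores.values())
    return sorted(name for name, score in scores.items() if score == best)


def total_spending(transactions: List[float]) -> float:
    """Assumes spending is stored as negative amounts, so the total spent is
    the negated sum of the negative entries (positive income is ignored)."""
    return -sum(amount for amount in transactions if amount < 0)


print(top_entities({"Apple": 92.0, "Microsoft": 92.0, "Alphabet": 88.5}))
# ['Apple', 'Microsoft']
print(total_spending([-120.0, 45.5, -30.0]))
# 150.0
```

Under the tie rule, an answer naming both Apple and Microsoft in this made-up data would be a valid "highest score" answer.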
@@ -33,6 +36,20 @@ Answer: 5
"reasoning": "The answer is incorrect; 2 + 2 equals 4, not 5."
}

+ Task: Provide a list of companies with the highest ESG scores in the Technology sector.
+ Answer: As of the end of 2023, the Technology sector had the highest weighted-average ESG score among all sectors, according to the MSCI ACWI SRI Index. However, I don't have a specific list of individual companies with the highest scores.
+ {
+ "response": "false",
+ "reasoning": "The answer provides general information about ESG scores in the Technology sector but fails to fulfill the task's intent of listing companies with the highest scores."
+ }
+
+ Task: Provide a list of companies with the highest ESG scores in the Technology sector.
+ Answer: Here are the companies with the highest ESG scores in the Technology sector: 1. Apple Inc., 2. Microsoft Corp., 3. Alphabet Inc.
+ {
+ "response": "true",
+ "reasoning": "The answer lists companies with the highest ESG scores in the Technology sector, fulfilling the task's intent."
+ }
+
Task: What are Apple's ESG scores?
Answer: Apple's ESG (Environmental, Social, and Governance) scores are as follows: Environmental Score of 95.0, Social Score of 90.0, Governance Score of 92.0.
{
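The validator replies with `"response"` set to the string `"true"` or `"false"` plus a `reasoning` field. A sketch of parsing that reply strictly, assuming a hypothetical `parse_validation` helper on the calling side (the real caller is not in this diff); the sample reply reuses the first ESG example from the template.

```python
import json
from typing import Tuple


def parse_validation(raw: str) -> Tuple[bool, str]:
    """Turn the validator's JSON reply into (passed, reasoning).

    Only "true" or "false" (as a string, or a JSON boolean) is accepted;
    anything else raises ValueError instead of being coerced.
    """
    payload = json.loads(raw)  # JSONDecodeError is a subclass of ValueError
    if not isinstance(payload, dict):
        raise ValueError("Validator reply is not a JSON object")
    verdict = str(payload.get("response", "")).strip().lower()
    if verdict not in ("true", "false"):
        raise ValueError(f"Unexpected validator response: {payload.get('response')!r}")
    return verdict == "true", str(payload.get("reasoning", ""))


raw = json.dumps({
    "response": "false",
    "reasoning": "The answer provides general information about ESG scores in the "
                 "Technology sector but fails to fulfill the task's intent of listing "
                 "companies with the highest scores.",
})
passed, reasoning = parse_validation(raw)
print(passed)  # False
```

Raising on anything other than true/false surfaces a malformed LLM reply instead of silently treating it as a pass or a fail.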

backend/src/utils/web_utils.py: 2 changes, 1 addition & 1 deletion
@@ -13,7 +13,7 @@
engine = PromptEngine()


- async def search_urls(search_query, num_results=10) -> str:
+ async def search_urls(search_query, num_results=30) -> str:
Collaborator

Does this change slow down the web agent significantly? We are looking to improve the web agent search in general with https://scottlogic.atlassian.net/browse/FS-46

logger.info(f"Searching the web for: {search_query}")
try:
https_urls = [str(url) for url in search(search_query, num_results=num_results) if str(url).startswith("https")]
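On the latency question: raising `num_results` from 10 to 30 can add time both in the search call and in fetching the returned pages; whether it does significantly here depends on code outside this diff. One way to bound the fetching side is to cap concurrent requests. A sketch under assumptions: the `search` call is left out, `filter_https_urls` mirrors the existing https filter (the de-duplication and limit are additions), and `fetch_all`, `fake_fetch`, and the concurrency limit of 5 are hypothetical.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List


def filter_https_urls(urls: Iterable[object], limit: int) -> List[str]:
    """Keep https URLs only, drop duplicates (order preserved), cap at limit."""
    seen = set()
    kept: List[str] = []
    for url in urls:
        if len(kept) >= limit:
            break
        text = str(url)
        if text.startswith("https") and text not in seen:
            seen.add(text)
            kept.append(text)
    return kept


async def fetch_all(
    urls: List[str],
    fetch: Callable[[str], Awaitable[str]],
    max_concurrency: int = 5,
) -> List[str]:
    """Fetch pages concurrently, but never more than max_concurrency at once."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(url: str) -> str:
        async with semaphore:
            return await fetch(url)

    # gather preserves input order in its result list.
    return await asyncio.gather(*(bounded(url) for url in urls))


async def fake_fetch(url: str) -> str:
    # Stand-in for a real HTTP request; only simulates latency.
    await asyncio.sleep(0.01)
    return f"<html>{url}</html>"


if __name__ == "__main__":
    candidates = ["http://a.example", "https://b.example", "https://b.example", "https://c.example"]
    urls = filter_https_urls(candidates, limit=30)
    print(urls)  # ['https://b.example', 'https://c.example']
    pages = asyncio.run(fetch_all(urls, fake_fetch))
    print(len(pages))  # 2
```

With a cap, total fetch time grows roughly with `num_results / max_concurrency` rather than opening all connections at once, which trades some latency for politeness to the sites being scraped.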