Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fs 80/use https for webagent search #21

Merged
merged 15 commits into from
Nov 22, 2024
Merged
Show file tree
Hide file tree
Changes from 10 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 2 additions & 2 deletions backend/src/agents/web_agent.py
Original file line number Diff line number Diff line change
Expand Up @@ -38,8 +38,8 @@ async def web_general_search_core(search_query, llm, model) -> str:
}
return json.dumps(response, indent=4)
logger.info(f'Answer found successfully {answer_result}')
valid_answer = json.loads(answer_result["response"]).get("is_valid", "")
if valid_answer:
search_more = json.loads(answer_result["response"]).get("search_more", "")
if not search_more:
Copy link
Collaborator

@IMladjenovic IMladjenovic Nov 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For a bit of extra safety, can we also update perform_scrape method in this class to verify that the urls it has been given are https? This method would be a good one to add tests for as well

final_answer = json.loads(answer_result["response"]).get("answer", "")
if not final_answer:
return "No answer found."
Expand Down
22 changes: 11 additions & 11 deletions backend/src/prompts/templates/answer-user-ques.j2
Original file line number Diff line number Diff line change
Expand Up @@ -9,23 +9,24 @@ User's question is:
Once you generate an answer:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this file (and all references) be renamed to "answer-user-question"

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Sorry to add some housekeeping!)

- **Check** if the answer completely and accurately addresses the user's question.
- **Determine** if the answer is valid, based on the content provided.
- **Provide** a reason for the validity or invalidity of the answer.
- **Indicate** if a more general web search is required then indicate that.

Reply only in JSON format with the following structure:

```json
{
"answer": "The answer to the user's question, based on the content provided",
"is_valid": true or false,
"validation_reason": "A sentence explaining whether the answer is valid or not, and why"
"search_more": true or false,
"search_more_reason": "A sentence explaining whether the answer is valid or not, and why"
}



### **Explanation:**

1. **Answer**: The LLM generates an answer based on the user’s question and the provided content.
2. **Validity Check**: The LLM checks if its generated answer is complete and correct. This could be based on factual accuracy, coverage of the query, or relevance to the user's question.
3. **Validation Reason**: The LLM explains why the answer is valid or invalid.
2. **Search More**: If a more general web search using a web search engine is required then indicate that, it should set this field to true.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be renamed "should perform web search"?

Am I correct in understanding that when we trigger a web_general_search from the WebAgent we're getting to this prompt first and attempting to answer the question using the underlying models training data, and if it doesn't have info we then perform a web search?

3. **Search More Reason**: The LLM explains why further web search is required.

### **Example of Usage:**

Expand All @@ -40,15 +41,14 @@ Reply only in JSON format with the following structure:
```json
{
"answer": "Tesla's revenue since its creation is: 2008: $15 million, 2009: $30 million, ..., 2023: $81 billion.",
"is_valid": true,
"validation_reason": "The answer includes Tesla's revenue for every year since its creation, based on the data provided."
"search_more": false,
"search_more_reason": "The answer includes Tesla's revenue for every year since its creation, based on the data provided. No further search is required."
}

{
"answer": "Tesla's revenue for 2010 to 2023 is available, but data for the earlier years is missing.",
"is_valid": false,
"validation_reason": "The answer is incomplete because data for Tesla's early years is missing."
"search_more": true,
"search_more_reason": "The answer is incomplete because data for Tesla's early years is missing. A more general web search is required."
}


Important: If the question is related to real time data, the LLM should provide is_valid is false.
Important: If the question is related to real time data, the LLM should provide search_more is true.
6 changes: 2 additions & 4 deletions backend/src/utils/web_utils.py
Original file line number Diff line number Diff line change
Expand Up @@ -15,14 +15,12 @@

async def search_urls(search_query, num_results=10) -> str:
logger.info(f"Searching the web for: {search_query}")
urls = []
try:
for url in search(search_query, num_results=num_results):
urls.append(url)
https_urls = [str(url) for url in search(search_query, num_results=num_results) if str(url).startswith("https")]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to test this by mocking out search library?

return json.dumps(
{
"status": "success",
"urls": urls,
"urls": https_urls,
"error": None,
}
)
Expand Down
73 changes: 0 additions & 73 deletions backend/tests/websockets/user_confirmer_test.py

This file was deleted.

Loading