feat: Support Sidebar - Max #27091
Open
slshults wants to merge 43 commits into master from support-sidebar-max-integration
+3,505
−553
Commits (43)
55b8ac9 (slshults) Support sidebar Max AI - not yet integrated
a386201 (slshults) Fixed the `Send` button, no longer looks disabled.
eb2a395 (slshults) Fixed autoscrolling
9c62eaa (slshults) Created an exception to allow to work on links in Max's responses. A…
f764a7c (slshults) Put a stop to the weird autoscroll upward and
2204e21 (slshults) Restoring error handling for 500 errors
8e3058a (slshults) On the support sidebar, hid the docs ToC behind
4264bbf (slshults) Fixed links to the support form, generated by Max, so that they open …
b8f3401 (slshults) Starting to integrate Max. He works in the previous commit, broken in…
cd6e53d (slshults) A step in bringing this branch closer to up-to-date with master
39da5d1 (slshults) Another step in bringing up to date with
6e3c9f5 (slshults) another step in bringing the branch up to date with master. Server c…
828143d (slshults) Max is now integrated and working on my local, steps remain (secrets …
76d5364 (slshults) catching up with master
7e911a9 (Twixes) Simplify settings
6df7860 (slshults) merged Max's requirements, fixed session handling, simplified convers…
671b350 (slshults) Merge remote-tracking branch 'origin/master' into support-sidebar-max…
3273fbc (slshults) Raising the feature flag. ⛳️
0028ae4 (slshults) catching up merged requirements-dev.txt
624069c (Twixes) Merge branch 'master' into support-sidebar-max-integration
04c2256 (Twixes) Remove redundant new deps
5ae5b1d (Twixes) Roll back some redundant frontend changes
3206da7 (github-actions[bot]) Update query snapshots
51a6bcf (github-actions[bot]) Update UI snapshots for `chromium` (1)
7be9b10 (Twixes) Roll back some redundant frontend changes
b0273dc (Twixes) Merge branch 'support-sidebar-max-integration' of https://github.com/…
e00b48c (slshults) steps toward moving functions from python to django. The connection t…
5487d9b (github-actions[bot]) Update query snapshots
0657a19 (github-actions[bot]) Update UI snapshots for `chromium` (1)
78bbc9f (Twixes) Merge branch 'master' into support-sidebar-max-integration
a47ce70 (Twixes) Remove some dead code
f286857 (Twixes) Fix views.py
601cc4d (slshults) Merge branch 'master' into support-sidebar-max-integration
8be4487 (slshults) Finished moving functions from Flask to Django, fixed resulting bugs,…
4de70d0 (slshults) Merge remote-tracking branch 'origin/master' into support-sidebar-max…
1983eb8 (slshults) Resolving 'ERR_PNPM_LOCKFILE_CONFIG_MISMATCH'
9eeff2a (github-actions[bot]) Update query snapshots
1158d70 (slshults) chore: update dependencies from master
f77cd57 (slshults) Merge branch 'support-sidebar-max-integration' of https://github.com/…
5d9532e (slshults) Merge branch 'master' into support-sidebar-max-integration
e489b1e (slshults) Reducing token usage by 33% to 66% per turn with a clever system prom…
ab99ce4 (slshults) upgrading to Anthropic Python SDK v 0.42.0, corrected some logging, d…
79fcceb (slshults) Merge branch 'master' into support-sidebar-max-integration
@@ -0,0 +1 @@
# This file is intentionally empty to mark the directory as a Python package.
@@ -0,0 +1,175 @@
import requests
from bs4 import BeautifulSoup  # type: ignore
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

SITEMAP_URL = "https://posthog.com/sitemap/sitemap-0.xml"

STATUS_PAGE_URL = "https://status.posthog.com"

HOGQL_PRIORITY_URLS = [
    "https://posthog.com/docs/hogql",
    "https://posthog.com/docs/hogql/aggregations",
    "https://posthog.com/docs/hogql/clickhouse-functions",
    "https://posthog.com/docs/hogql/expressions",
    "https://posthog.com/docs/product-analytics/sql",
]


def is_hogql_query(query):
    hogql_keywords = ["hogql", "sql", "query", "aggregate", "function", "expression"]
    return any(keyword in query.lower() for keyword in hogql_keywords)


def is_status_query(query):
    status_keywords = ["status", "incident", "outage", "downtime", "ingestion", "slow", "lag", "delays"]
    return any(keyword in query.lower() for keyword in status_keywords)


def get_relevant_urls(query):
    urls = []

    try:
        response = requests.get(SITEMAP_URL)
        response.raise_for_status()
        soup = BeautifulSoup(response.content, "xml")
        for url in soup.find_all("loc"):
            loc = url.text
            if "/questions/" not in loc:
                urls.append(loc)
        if is_hogql_query(query):
            urls.extend(HOGQL_PRIORITY_URLS)
        urls.append(STATUS_PAGE_URL)
        return urls
    except requests.RequestException as e:
        logger.error(f"Error fetching sitemap: {str(e)}")  # noqa: TRY400
        return urls


def prioritize_urls(urls, query):
    priority_dirs = {
        "docs": ["docs", "tutorials"],
        "how": ["docs", "tutorials"],
        "pricing": ["pricing"],
        "jobs": ["careers"],
        "history": ["about", "handbook", "blog"],
        "teams": ["teams"],
    }

    query_type = "docs"  # default
    for key in priority_dirs:
        if key in query.lower():
            query_type = key
            break

    def calculate_relevance(url):
        query_words = query.lower().split()
        url_lower = url.lower()
        word_match_score = sum(3 if word in url_lower else 1 for word in query_words if word in url_lower)
        url_depth = len(url.strip("/").split("/"))
        depth_score = min(url_depth, 5)
        priority_score = 5 if any(dir in url for dir in priority_dirs[query_type]) else 0

        if is_hogql_query(query) and url in HOGQL_PRIORITY_URLS:
            priority_score += 10

        if is_status_query(query) and url == STATUS_PAGE_URL:
            priority_score += 15

        return (word_match_score * 2) + (depth_score * 1.5) + priority_score

    return sorted(urls, key=calculate_relevance, reverse=True)


def max_search_tool(query):
    relevant_urls = get_relevant_urls(query)
    prioritized_urls = prioritize_urls(relevant_urls, query)
    results = []
    errors = []

    max_urls_to_process = 30
    max_chars = 10000
    relevance_threshold = 0.6
    min_results = 5

    def has_highly_relevant_results(results, threshold=2):
        return len(results) >= threshold and all(
            len(result["relevant_passages"]) >= 2 for result in results[:threshold]
        )

    for url in prioritized_urls[:max_urls_to_process]:
        try:
            logger.info(f"Searching {url}")
            response = requests.get(url, allow_redirects=True, timeout=10)
            response.raise_for_status()
            soup = BeautifulSoup(response.content, "html.parser")

            for script in soup(["script", "style"]):
                script.decompose()
            text = soup.get_text()
            lines = (line.strip() for line in text.splitlines())
            chunks = (phrase.strip() for line in lines for phrase in line.split("  "))
            text = "\n".join(chunk for chunk in chunks if chunk)

            paragraphs = text.split("\n\n")
            relevant_passages = []
            for i, paragraph in enumerate(paragraphs):
                relevance_score = sum(word.lower() in paragraph.lower() for word in query.split())
                if relevance_score > 0:
                    relevant_text = paragraph
                    char_count = len(relevant_text)

                    for j in range(i + 1, min(i + 5, len(paragraphs))):
                        if char_count + len(paragraphs[j]) <= max_chars:
                            relevant_text += "\n\n" + paragraphs[j]
                            char_count += len(paragraphs[j])
                        else:
                            break

                    heading = "Unknown Section"
                    for tag in soup.find_all(["h1", "h2", "h3", "h4", "h5", "h6"]):
                        if tag.string and tag.string in paragraph:
                            heading = tag.string
                            break

                    relevant_passages.append(
                        {
                            "text": relevant_text[:10000],
                            "url": url,
                            "heading": heading,
                            "relevance_score": relevance_score,
                        }
                    )

            if relevant_passages:
                relevant_passages.sort(key=lambda x: x["relevance_score"], reverse=True)
                result = {
                    "page_title": soup.title.string if soup.title else "Untitled",
                    "url": url,
                    "relevant_passages": relevant_passages[:4],
                }
                results.append(result)

                if len(results) >= min_results and relevant_passages[0]["relevance_score"] > relevance_threshold:
                    logger.info("Found sufficient relevant results so stopping search.")
                    break

                if has_highly_relevant_results(results):
                    logger.info("Found highly relevant results so stopping search.")
                    break

        except requests.RequestException as e:
            error_message = f"Error fetching {url}: {str(e)}"
            logger.error(error_message)  # noqa: TRY400
            errors.append(error_message)

    if not results and not errors:
        return (
            "Well this is odd. My searches aren't finding anything for that. Could you try asking with different words?"
        )
    elif errors and not results:
        return f"Oof. Sorry about this. I ran into errors when trying to search: {'; '.join(errors)}"
    else:
        return results[:5]
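The keyword routing and URL scoring in the diff above can be exercised offline. Below is a minimal sketch: `is_hogql_query` and `is_status_query` are copied from the diff, while `score_url` is a simplified, hypothetical stand-in for the diff's `calculate_relevance` (no sitemap fetch, so no network is needed). Note that in the original, the `else 1` branch of the word-match score can never fire, because the generator already filters on `word in url_lower`; the simplification below is therefore equivalent.

```python
# Standalone sketch of the sidebar search tool's URL routing and scoring.
HOGQL_PRIORITY_URLS = [
    "https://posthog.com/docs/hogql",
    "https://posthog.com/docs/hogql/expressions",
]
STATUS_PAGE_URL = "https://status.posthog.com"


def is_hogql_query(query):
    hogql_keywords = ["hogql", "sql", "query", "aggregate", "function", "expression"]
    return any(keyword in query.lower() for keyword in hogql_keywords)


def is_status_query(query):
    status_keywords = ["status", "incident", "outage", "downtime", "ingestion", "slow", "lag", "delays"]
    return any(keyword in query.lower() for keyword in status_keywords)


def score_url(url, query):
    # Each query word found in the URL is worth 3 (the diff's `else 1` is dead
    # code, since the filter already requires a match); depth caps at 5; the
    # keyword routers add flat bonuses, mirroring calculate_relevance.
    query_words = query.lower().split()
    url_lower = url.lower()
    word_match_score = sum(3 for word in query_words if word in url_lower)
    depth_score = min(len(url.strip("/").split("/")), 5)
    priority_score = 10 if is_hogql_query(query) and url in HOGQL_PRIORITY_URLS else 0
    if is_status_query(query) and url == STATUS_PAGE_URL:
        priority_score += 15
    return (word_match_score * 2) + (depth_score * 1.5) + priority_score


urls = ["https://posthog.com/docs/feature-flags", *HOGQL_PRIORITY_URLS, STATUS_PAGE_URL]
ranked = sorted(urls, key=lambda u: score_url(u, "how do I write a hogql expression"), reverse=True)
print(ranked[0])  # the HogQL expressions doc wins on word matches plus the priority bonus
```

Because the bonuses dominate the depth term, a HogQL-flavored question reliably pulls the HogQL docs to the front of the crawl order even when few query words match the URL.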
@@ -0,0 +1,78 @@
import time
import logging


class ConversationHistory:
    def __init__(self):
        self.turns = []
        self.last_access = time.time()  # Add timestamp

    def touch(self):
        """Update last access time"""
        self.last_access = time.time()

    def add_turn_user(self, content):
        self.touch()  # Update timestamp on activity
        self.turns.append(
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": content,
                    }
                ],
            }
        )

    def add_turn_assistant(self, content):
        self.touch()  # Update timestamp on activity
        if isinstance(content, list):
            # Content is already properly structured
            self.turns.append({"role": "assistant", "content": content})
        else:
            # Simple text responses
            self.turns.append(
                {
                    "role": "assistant",
                    "content": [
                        {
                            "type": "text",
                            "text": content,
                        }
                    ],
                }
            )

    def get_turns(self):
        self.touch()  # Update timestamp on activity
        return self.turns


# Active logging configuration used by ViewSet
logging.basicConfig(
    level=logging.INFO, format="%(asctime)s [%(levelname)s] %(message)s", handlers=[logging.StreamHandler()]
)
logger = logging.getLogger(__name__)

# Active tool definition used by ViewSet
max_search_tool_tool = {
    "name": "max_search_tool",
    "description": (
        "Searches the PostHog documentation at https://posthog.com/docs, "
        "https://posthog.com/tutorials, to find information relevant to the "
        "user's question. The search query should be a question specific to using "
        "and configuring PostHog."
    ),
    "cache_control": {"type": "ephemeral"},
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": "The search query, in the form of a question, related to PostHog usage and configuration.",
            }
        },
        "required": ["query"],
    },
}
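The turns that `ConversationHistory` accumulates are shaped as Anthropic Messages API content blocks, which is what lets the ViewSet pass `history.get_turns()` straight into a model call alongside `max_search_tool_tool`. A minimal sketch of that bookkeeping (the class body is trimmed from the diff; the model call itself is only indicated by a comment, with a canned reply standing in for it):

```python
import time


# Trimmed copy of the diff's ConversationHistory, enough to show the turn shape.
class ConversationHistory:
    def __init__(self):
        self.turns = []
        self.last_access = time.time()

    def add_turn_user(self, content):
        self.last_access = time.time()
        self.turns.append({"role": "user", "content": [{"type": "text", "text": content}]})

    def add_turn_assistant(self, content):
        self.last_access = time.time()
        if isinstance(content, list):
            # Already-structured content blocks (e.g. tool_use results)
            self.turns.append({"role": "assistant", "content": content})
        else:
            self.turns.append({"role": "assistant", "content": [{"type": "text", "text": content}]})

    def get_turns(self):
        self.last_access = time.time()
        return self.turns


history = ConversationHistory()
history.add_turn_user("How do I create a feature flag?")
# In the ViewSet, this is where something like
#   client.messages.create(..., messages=history.get_turns(), tools=[max_search_tool_tool])
# would run; the string below is a stand-in for the model's reply.
history.add_turn_assistant("You can create one under Feature flags in the PostHog app.")

print(len(history.get_turns()))        # 2
print(history.turns[0]["role"])        # user
```

Keeping `last_access` fresh on every read and write is what allows stale sessions to be expired by comparing the timestamp against a cutoff.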
I don't think any of these additions should be needed after we've moved this Max out of Flask and into Django – let's remove
Some functions were moved, but the main Python script is still in use and still in ee/support_sidebar_max (sidebar_max_AI.py), so I still don't want to be pushing those items from my local. (I can and will remove max-test-venv though, done with that.)