feat: Support Sidebar - Max #27091

Open
wants to merge 43 commits into
base: master
43 commits
55b8ac9
Support sidebar Max AI - not yet integrated
slshults Dec 11, 2024
a386201
Fixed the `Send` button, no longer looks disabled.
slshults Dec 11, 2024
eb2a395
Fixed autoscrolling
slshults Dec 12, 2024
9c62eaa
Created an exception to allow to work on links in Max's responses. A…
slshults Dec 12, 2024
f764a7c
Put a stop to the weird autoscroll upward and
slshults Dec 13, 2024
2204e21
Restoring error handling for 500 errors
slshults Dec 14, 2024
8e3058a
On the support sidebar, hid the docs ToC behind
slshults Dec 14, 2024
4264bbf
Fixed links to the support form, generated by Max, so that they open …
slshults Dec 15, 2024
b8f3401
Starting to integrate Max. He works in the previous commit, broken in…
slshults Dec 16, 2024
cd6e53d
A step in bringing this branch closer to up-to-date with master
slshults Dec 16, 2024
39da5d1
Another step in bringing up to date with
slshults Dec 17, 2024
6e3c9f5
another step in bringing the branch up to date with master. Server c…
slshults Dec 17, 2024
828143d
Max is now integrated and working on my local, steps remain (secrets …
slshults Dec 17, 2024
76d5364
catching up with master
slshults Dec 17, 2024
7e911a9
Simplify settings
Twixes Dec 17, 2024
6df7860
merged Max's requirements, fixed session handling, simplified convers…
slshults Dec 18, 2024
671b350
Merge remote-tracking branch 'origin/master' into support-sidebar-max…
slshults Dec 18, 2024
3273fbc
Raising the feature flag. ⛳️
slshults Dec 18, 2024
0028ae4
catching up merged requirements-dev.txt
slshults Dec 18, 2024
624069c
Merge branch 'master' into support-sidebar-max-integration
Twixes Dec 18, 2024
04c2256
Remove redundant new deps
Twixes Dec 18, 2024
5ae5b1d
Roll back some redundant frontend changes
Twixes Dec 18, 2024
3206da7
Update query snapshots
github-actions[bot] Dec 18, 2024
51a6bcf
Update UI snapshots for `chromium` (1)
github-actions[bot] Dec 18, 2024
7be9b10
Roll back some redundant frontend changes
Twixes Dec 18, 2024
b0273dc
Merge branch 'support-sidebar-max-integration' of https://github.com/…
Twixes Dec 18, 2024
e00b48c
steps toward moving functions from python to django. The connection t…
slshults Dec 19, 2024
5487d9b
Update query snapshots
github-actions[bot] Dec 19, 2024
0657a19
Update UI snapshots for `chromium` (1)
github-actions[bot] Dec 19, 2024
78bbc9f
Merge branch 'master' into support-sidebar-max-integration
Twixes Dec 19, 2024
a47ce70
Remove some dead code
Twixes Dec 19, 2024
f286857
Fix views.py
Twixes Dec 19, 2024
601cc4d
Merge branch 'master' into support-sidebar-max-integration
slshults Dec 19, 2024
8be4487
Finished moving functions from Flask to Django, fixed resulting bugs,…
slshults Dec 20, 2024
4de70d0
Merge remote-tracking branch 'origin/master' into support-sidebar-max…
slshults Dec 20, 2024
1983eb8
Resolving 'ERR_PNPM_LOCKFILE_CONFIG_MISMATCH'
slshults Dec 20, 2024
9eeff2a
Update query snapshots
github-actions[bot] Dec 20, 2024
1158d70
chore: update dependencies from master
slshults Dec 20, 2024
f77cd57
Merge branch 'support-sidebar-max-integration' of https://github.com/…
slshults Dec 20, 2024
5d9532e
Merge branch 'master' into support-sidebar-max-integration
slshults Dec 20, 2024
e489b1e
Reducing token usage by 33% to 66% per turn with a clever system prom…
slshults Dec 20, 2024
ab99ce4
upgrading to Anthropic Python SDK v 0.42.0, corrected some logging, d…
slshults Dec 23, 2024
79fcceb
Merge branch 'master' into support-sidebar-max-integration
slshults Dec 24, 2024
7 changes: 7 additions & 0 deletions .gitignore
@@ -68,6 +68,13 @@ plugin-transpiler/dist
*.log
# pyright config (keep this until we have a standardized one)
pyrightconfig.json

# Max-specific entries
ee/support_sidebar_max/max-venv/
ee/support_sidebar_max/.vscode
ee/support_sidebar_max/.vscode/settings.json
max-test-venv/
ee/support_sidebar_max/.env
Comment on lines +71 to +77
Member

I don't think any of these additions will be needed once we've moved Max out of Flask and into Django, so let's remove them.

Contributor Author

Some functions were moved, but the main Python script (sidebar_max_AI.py) is still in use and still lives in ee/support_sidebar_max, so I still don't want to push those items from my local environment. (I can and will remove max-test-venv though, since I'm done with that.)

# Assistant Evaluation with Deepeval
.deepeval
.deepeval-cache.json
2 changes: 2 additions & 0 deletions ee/settings.py
@@ -73,3 +73,5 @@
LANGFUSE_PUBLIC_KEY = get_from_env("LANGFUSE_PUBLIC_KEY", "", type_cast=str)
LANGFUSE_SECRET_KEY = get_from_env("LANGFUSE_SECRET_KEY", "", type_cast=str)
LANGFUSE_HOST = get_from_env("LANGFUSE_HOST", "https://us.cloud.langfuse.com", type_cast=str)

ANTHROPIC_API_KEY = get_from_env("ANTHROPIC_API_KEY", "")
1 change: 1 addition & 0 deletions ee/support_sidebar_max/__init__.py
@@ -0,0 +1 @@
# This file is intentionally empty to mark the directory as a Python package.
175 changes: 175 additions & 0 deletions ee/support_sidebar_max/max_search_tool.py
@@ -0,0 +1,175 @@
import requests
from bs4 import BeautifulSoup # type: ignore

Check failure on line 2 in ee/support_sidebar_max/max_search_tool.py (GitHub Actions / Python code quality checks): Unused "type: ignore" comment

import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

SITEMAP_URL = "https://posthog.com/sitemap/sitemap-0.xml"

STATUS_PAGE_URL = "https://status.posthog.com"

HOGQL_PRIORITY_URLS = [
    "https://posthog.com/docs/hogql",
    "https://posthog.com/docs/hogql/aggregations",
    "https://posthog.com/docs/hogql/clickhouse-functions",
    "https://posthog.com/docs/hogql/expressions",
    "https://posthog.com/docs/product-analytics/sql",
]


def is_hogql_query(query):
    hogql_keywords = ["hogql", "sql", "query", "aggregate", "function", "expression"]
    return any(keyword in query.lower() for keyword in hogql_keywords)


def is_status_query(query):
    status_keywords = ["status", "incident", "outage", "downtime", "ingestion", "slow", "lag", "delays"]
    return any(keyword in query.lower() for keyword in status_keywords)


def get_relevant_urls(query):
    urls = []

    try:
        # Bound the request so a slow sitemap cannot hang the search
        response = requests.get(SITEMAP_URL, timeout=10)
        response.raise_for_status()
        soup = BeautifulSoup(response.content, "xml")
        for url in soup.find_all("loc"):
            loc = url.text
            # Skip community question pages
            if "/questions/" not in loc:
                urls.append(loc)
        if is_hogql_query(query):
            urls.extend(HOGQL_PRIORITY_URLS)
        urls.append(STATUS_PAGE_URL)
        return urls
    except requests.RequestException as e:
        logger.error(f"Error fetching sitemap: {str(e)}")  # noqa: TRY400
        return urls


def prioritize_urls(urls, query):
    priority_dirs = {
        "docs": ["docs", "tutorials"],
        "how": ["docs", "tutorials"],
        "pricing": ["pricing"],
        "jobs": ["careers"],
        "history": ["about", "handbook", "blog"],
        "teams": ["teams"],
    }

    query_type = "docs"  # default
    for key in priority_dirs:
        if key in query.lower():
            query_type = key
            break

    def calculate_relevance(url):
        query_words = query.lower().split()
        url_lower = url.lower()
        # 3 points for each query word that appears (as a substring) in the URL
        word_match_score = sum(3 for word in query_words if word in url_lower)
        url_depth = len(url.strip("/").split("/"))
        depth_score = min(url_depth, 5)
        # `directory` rather than `dir`, to avoid shadowing the builtin
        priority_score = 5 if any(directory in url for directory in priority_dirs[query_type]) else 0

        if is_hogql_query(query) and url in HOGQL_PRIORITY_URLS:
            priority_score += 10

        if is_status_query(query) and url == STATUS_PAGE_URL:
            priority_score += 15

        return (word_match_score * 2) + (depth_score * 1.5) + priority_score

    return sorted(urls, key=calculate_relevance, reverse=True)


def max_search_tool(query):
    relevant_urls = get_relevant_urls(query)
    prioritized_urls = prioritize_urls(relevant_urls, query)
    results = []
    errors = []

    max_urls_to_process = 30
    max_chars = 10000
    # relevance_score below is an integer word count, so any positive score exceeds this threshold
    relevance_threshold = 0.6
    min_results = 5

    def has_highly_relevant_results(results, threshold=2):
        return len(results) >= threshold and all(
            len(result["relevant_passages"]) >= 2 for result in results[:threshold]
        )

    for url in prioritized_urls[:max_urls_to_process]:
        try:
            logger.info(f"Searching {url}")
            response = requests.get(url, allow_redirects=True, timeout=10)
            response.raise_for_status()
            soup = BeautifulSoup(response.content, "html.parser")

            # Drop non-content tags before extracting text
            for script in soup(["script", "style"]):
                script.decompose()
            text = soup.get_text()
            lines = (line.strip() for line in text.splitlines())
            chunks = (phrase.strip() for line in lines for phrase in line.split(" "))
            text = "\n".join(chunk for chunk in chunks if chunk)

            # Empty chunks were dropped above, so `text` contains no blank lines
            # and this split returns a single paragraph.
            paragraphs = text.split("\n\n")
            relevant_passages = []
            for i, paragraph in enumerate(paragraphs):
                # Count how many query words occur in the paragraph
                relevance_score = sum(word.lower() in paragraph.lower() for word in query.split())
                if relevance_score > 0:
                    relevant_text = paragraph
                    char_count = len(relevant_text)

                    # Append up to four following paragraphs while staying under max_chars
                    for j in range(i + 1, min(i + 5, len(paragraphs))):
                        if char_count + len(paragraphs[j]) <= max_chars:
                            relevant_text += "\n\n" + paragraphs[j]
                            char_count += len(paragraphs[j])
                        else:
                            break

                    heading = "Unknown Section"
                    for tag in soup.find_all(["h1", "h2", "h3", "h4", "h5", "h6"]):
                        if tag.string and tag.string in paragraph:
                            heading = tag.string
                            break

                    relevant_passages.append(
                        {
                            "text": relevant_text[:max_chars],
                            "url": url,
                            "heading": heading,
                            "relevance_score": relevance_score,
                        }
                    )

            if relevant_passages:
                relevant_passages.sort(key=lambda x: x["relevance_score"], reverse=True)
                result = {
                    "page_title": soup.title.string if soup.title else "Untitled",
                    "url": url,
                    "relevant_passages": relevant_passages[:4],
                }
                results.append(result)

                if len(results) >= min_results and relevant_passages[0]["relevance_score"] > relevance_threshold:
                    logger.info("Found sufficient relevant results so stopping search.")
                    break

                if has_highly_relevant_results(results):
                    logger.info("Found highly relevant results so stopping search.")
                    break

        except requests.RequestException as e:
            error_message = f"Error fetching {url}: {str(e)}"
            logger.error(error_message)  # noqa: TRY400
            errors.append(error_message)

    if not results and not errors:
        return (
            "Well this is odd. My searches aren't finding anything for that. Could you try asking with different words?"
        )
    elif errors and not results:
        return f"Oof. Sorry about this. I ran into errors when trying to search: {'; '.join(errors)}"
    else:
        return results[:5]
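
To make the ranking in `prioritize_urls` easier to reason about without hitting the network, here is a minimal offline sketch of the same scoring formula. It is an illustration under assumptions, not the PR's code: `score_url`, `rank_urls`, and `sample_urls` are hypothetical names, `PRIORITY_DIRS` is trimmed to three keys, and the HogQL bonus is left out for brevity.

```python
STATUS_PAGE_URL = "https://status.posthog.com"
STATUS_KEYWORDS = ["status", "incident", "outage", "downtime", "ingestion", "slow", "lag", "delays"]
# Trimmed copy of the priority_dirs mapping from the diff (assumption: three keys are enough here)
PRIORITY_DIRS = {
    "docs": ["docs", "tutorials"],
    "how": ["docs", "tutorials"],
    "pricing": ["pricing"],
}


def is_status_query(query: str) -> bool:
    return any(keyword in query.lower() for keyword in STATUS_KEYWORDS)


def score_url(url: str, query: str) -> float:
    """Hypothetical standalone version of the nested calculate_relevance helper."""
    query_lower = query.lower()
    # First matching key wins; fall back to "docs", as the diff does
    query_type = next((key for key in PRIORITY_DIRS if key in query_lower), "docs")

    url_lower = url.lower()
    # 3 points per query word found as a substring of the URL
    word_match_score = sum(3 for word in query_lower.split() if word in url_lower)
    # Deeper URLs score higher, capped at 5 path segments
    depth_score = min(len(url.strip("/").split("/")), 5)
    priority_score = 5 if any(directory in url for directory in PRIORITY_DIRS[query_type]) else 0
    # Status questions push the status page to the top
    if is_status_query(query) and url == STATUS_PAGE_URL:
        priority_score += 15
    return (word_match_score * 2) + (depth_score * 1.5) + priority_score


def rank_urls(urls: list[str], query: str) -> list[str]:
    return sorted(urls, key=lambda url: score_url(url, query), reverse=True)


sample_urls = [
    "https://posthog.com/pricing",
    "https://posthog.com/docs/session-replay",
    STATUS_PAGE_URL,
]

for ranked_url in rank_urls(sample_urls, "is there an outage right now?"):
    print(ranked_url)
```

One thing the sketch makes visible: word matching is by substring, so short query words such as "i" or "do" match many URLs ("do" matches "docs"). Matching on whole path segments might be worth considering if ranking turns out noisy.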
78 changes: 78 additions & 0 deletions ee/support_sidebar_max/sidebar_max_AI.py
@@ -0,0 +1,78 @@
import time
import logging


class ConversationHistory:
    def __init__(self):
        self.turns = []
        self.last_access = time.time()  # Add timestamp

    def touch(self):
        """Update last access time"""
        self.last_access = time.time()

    def add_turn_user(self, content):
        self.touch()  # Update timestamp on activity
        self.turns.append(
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": content,
                    }
                ],
            }
        )

    def add_turn_assistant(self, content):
        self.touch()  # Update timestamp on activity
        if isinstance(content, list):
            # Content is already properly structured
            self.turns.append({"role": "assistant", "content": content})
        else:
            # Simple text responses
            self.turns.append(
                {
                    "role": "assistant",
                    "content": [
                        {
                            "type": "text",
                            "text": content,
                        }
                    ],
                }
            )

    def get_turns(self):
        self.touch()  # Update timestamp on activity
        return self.turns


# Active logging configuration used by ViewSet
logging.basicConfig(
    level=logging.INFO, format="%(asctime)s [%(levelname)s] %(message)s", handlers=[logging.StreamHandler()]
)
logger = logging.getLogger(__name__)

# Active tool definition used by ViewSet
max_search_tool_tool = {
    "name": "max_search_tool",
    "description": (
        "Searches the PostHog documentation at https://posthog.com/docs, "
        "https://posthog.com/tutorials, to find information relevant to the "
        "user's question. The search query should be a question specific to using "
        "and configuring PostHog."
    ),
    "cache_control": {"type": "ephemeral"},
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": "The search query, in the form of a question, related to PostHog usage and configuration.",
            }
        },
        "required": ["query"],
    },
}
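
The diff does not show how `last_access` is consumed. One plausible use, sketched below purely as an assumption, is expiring conversations that have been idle too long. `prune_stale`, `max_idle_seconds`, and the session-id keys are hypothetical and do not appear in the PR; the class is a trimmed copy of `ConversationHistory` so the example runs on its own.

```python
import time


class ConversationHistory:
    """Trimmed copy of the class in the diff, kept only so this sketch is self-contained."""

    def __init__(self):
        self.turns = []
        self.last_access = time.time()

    def touch(self):
        self.last_access = time.time()

    def add_turn_user(self, content):
        self.touch()
        self.turns.append({"role": "user", "content": [{"type": "text", "text": content}]})


def prune_stale(histories, max_idle_seconds, now=None):
    """Hypothetical helper: delete histories idle longer than max_idle_seconds.

    Returns the list of removed session ids.
    """
    now = time.time() if now is None else now
    stale = [
        session_id
        for session_id, history in histories.items()
        if now - history.last_access > max_idle_seconds
    ]
    for session_id in stale:
        del histories[session_id]
    return stale


# Two sessions; pretend "a" was last used two hours ago
histories = {"a": ConversationHistory(), "b": ConversationHistory()}
histories["a"].last_access -= 7200
histories["b"].add_turn_user("How do I enable session replay?")

removed = prune_stale(histories, max_idle_seconds=3600)
print(removed, list(histories))
```

Because every read and write calls `touch()`, a helper like this would only drop conversations nobody has interacted with inside the idle window.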