-
Notifications
You must be signed in to change notification settings - Fork 70
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Call Analyst to perform validation #187
Conversation
|
||
|
||
@st.cache_data(ttl=60, show_spinner=False) | ||
def send_message( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just moved send_message
out of iteration.py
and into chat.py
so that we could share it with the validation code.
Most of the code is the same except for how we handle errors; instead of preprocessing it into a "Failed request with status" message, we attempt to pick out the message
field from the error payload and render it, which hopefully shows a bit nicer in the UI.
logger.info(f"Checking logical table: {table.name}") | ||
try: | ||
validate_all_cols(table) | ||
sqls = generate_select(table, 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Logan recently added an "explain select" PR in orchestrator which does this generate select behavior.
It is implemented as part of the parallel task flow (so further away from the validation path). Can we test the behavior and see if we also get a useful error message displayed to the user if the physical columns dont exist on the table?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the physical column doesn't exist (e.g. i just renamed one of the dimensions to something fake), it seems that the validation error isn't thrown. Looking through the orchestrator logs I do see that there is an error being warned but it doesn't look like it is propagated as a fatal error, I've asked Logan some questions on which warnings are returned from Analyst.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The warning is temporary (like baby sit it in prod) but i think he plans to return error eventually. Good to confirm with him
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Logan confirmed what you mentioned - for now it just logs the error but doesn't return yet, so we don't get a validation error yet. Once that's changed on the orchestrator side I think we'll see errors here properly
|
||
logger.info("Successfully validated!") | ||
dummy_request = [ | ||
{"role": "user", "content": [{"type": "text", "text": "SMG app validation"}]} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you plan to handle this special string of "SMG app validation" on orchestrator side?
the one part to confirm is whether we will reliable get the "explain select" behavior in parallel task flow, or whether, if the classification module returns first (which is unlikely) we would not get the right error message back to user
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah interesting, thanks for calling that out. I do think we see the explain select
behavior, in the orchestrator logs I can see Begin validating semantic context
and End validating semantic context
with this dummy question.
I don't think I want to do any special casing for this question in the orchestrator inference paths (besides from perhaps preventing logs from being emitted, or something). Is there a generic question that'd work better to make sure we hit the relevant validation codepaths?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok this is good enough if you saw it working.
I like your sample question now because it is basically guaranted to be non-sql (and thus not block on sqlgen latency)
We currently maintain copies of the validation logic in both the internal Analyst codepaths as well as this OSS app. Often, the OSS app can become out of date. Instead of performing validation locally, we will simply call Analyst with the current YAML string, as it performs validation at inference time. Any error returned is shown to the user.
The diff for this PR seems big but it's mostly deleting unnecessary code + tests.