Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Call Analyst to perform validation #187

Merged
merged 2 commits into from
Oct 23, 2024
Merged

Conversation

sfc-gh-cnivera
Copy link
Collaborator

@sfc-gh-cnivera sfc-gh-cnivera commented Oct 18, 2024

We currently maintain copies of the validation logic in both the internal Analyst codepaths as well as this OSS app. Often, the OSS app can become out of date. Instead of performing validation locally, we will simply call Analyst with the current YAML string, as it performs validation at inference time. Any error returned is shown to the user.

The diff for this PR seems big but it's mostly deleting unnecessary code + tests.

@sfc-gh-cnivera sfc-gh-cnivera linked an issue Oct 18, 2024 that may be closed by this pull request


@st.cache_data(ttl=60, show_spinner=False)
def send_message(
Copy link
Collaborator Author

@sfc-gh-cnivera sfc-gh-cnivera Oct 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just moved send_message out of iteration.py and into chat.py so that we could share it with the validation code.

Most of the code is the same except for how we handle errors; instead of preprocessing it into a "Failed request with status" message, we attempt to pick out the message field from the error payload and render it, which hopefully shows a bit nicer in the UI.

logger.info(f"Checking logical table: {table.name}")
try:
validate_all_cols(table)
sqls = generate_select(table, 1)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logan recently added an "explain select" PR in orchestrator which does this generate select behavior.
It is implemented as part of the parallel task flow (so further away from the validation path). Can we test the behavior and see if we also get a useful error message displayed to the user if the physical columns dont exist on the table?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the physical column doesn't exist (e.g. i just renamed one of the dimensions to something fake), it seems that the validation error isn't thrown. Looking through the orchestrator logs I do see that there is an error being warned but it doesn't look like it is propagated as a fatal error, I've asked Logan some questions on which warnings are returned from Analyst.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The warning is temporary (like baby sit it in prod) but i think he plans to return error eventually. Good to confirm with him

Copy link
Collaborator Author

@sfc-gh-cnivera sfc-gh-cnivera Oct 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logan confirmed what you mentioned - for now it just logs the error but doesn't return yet, so we don't get a validation error yet. Once that's changed on the orchestrator side I think we'll see errors here properly


logger.info("Successfully validated!")
dummy_request = [
{"role": "user", "content": [{"type": "text", "text": "SMG app validation"}]}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you plan to handle this special string of "SMG app validation" on orchestrator side?

the one part to confirm is whether we will reliable get the "explain select" behavior in parallel task flow, or whether, if the classification module returns first (which is unlikely) we would not get the right error message back to user

Copy link
Collaborator Author

@sfc-gh-cnivera sfc-gh-cnivera Oct 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah interesting, thanks for calling that out. I do think we see the explain select behavior, in the orchestrator logs I can see Begin validating semantic context and End validating semantic context with this dummy question.

I don't think I want to do any special casing for this question in the orchestrator inference paths (besides from perhaps preventing logs from being emitted, or something). Is there a generic question that'd work better to make sure we hit the relevant validation codepaths?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok this is good enough if you saw it working.
I like your sample question now because it is basically guaranted to be non-sql (and thus not block on sqlgen latency)

@sfc-gh-cnivera sfc-gh-cnivera merged commit 2f1f675 into main Oct 23, 2024
2 checks passed
@sfc-gh-cnivera sfc-gh-cnivera deleted the cnivera/remote-validation branch October 23, 2024 22:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Migrate local validation to Analyst call
2 participants