[NL-to-ESQL] improve correctCommonEsqlMistakes
#198942
Comments
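One common mistake that has been reported is the model using the wrong wildcard when using the `LIKE` operator (the SQL wildcards `_` and `%` instead of `?` and `*`). We should probably try to auto-correct this.
Another common mistake that has been reported is the model using strings instead of timespan literals when using date functions such as `DATE_TRUNC` and `BUCKET`, e.g. `DATE_TRUNC("1 year", date)` instead of `DATE_TRUNC(1 year, date)`. We should try to autocorrect such grammar errors.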
…correct (#202190)

## Summary

Part of #198942

Fixes bad grammar regarding using string literals instead of timespan literals for `DATE_TRUNC` and `BUCKET` functions.

This PR also paves the way for additional AST-based grammar corrections

**Example**

*Input*
```esql
FROM logs
| EVAL trunc_year = DATE_TRUNC("1 year", date)
| EVAL trunc_month = DATE_TRUNC("month", date)
| STATS hires = COUNT(*) BY hour = BUCKET(hire_date, "3 HOUR")
```

*Output*
```esql
FROM logs
| EVAL trunc_year = DATE_TRUNC(1 year, date)
| EVAL trunc_month = DATE_TRUNC(1 month, date)
| STATS hires = COUNT(*) BY hour = BUCKET(hire_date, 3 hour)
```
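To make the correction described above concrete, here is a minimal sketch of the string-to-timespan rewrite. It is regex-based for brevity, whereas the PR itself works on the ES|QL AST, so this is not the actual Kibana implementation. The names `TIME_UNITS`, `toTimespanLiteral` and `correctTimespanLiterals` are hypothetical, and the list of accepted units is an assumption that may not cover every unit or abbreviation ES|QL supports.

```typescript
// Hypothetical sketch, not the Kibana implementation (which is AST-based).
// Assumed list of time units; ES|QL may accept more (e.g. abbreviations).
const TIME_UNITS = "millisecond|second|minute|hour|day|week|month|quarter|year";

// Converts the content of a quoted interval such as `1 year`, `month` or
// `3 HOUR` into a timespan literal. Returns undefined if it is not an interval.
function toTimespanLiteral(quoted: string): string | undefined {
  const m = new RegExp(`^\\s*(\\d+)?\\s*((?:${TIME_UNITS})s?)\\s*$`, "i").exec(quoted);
  if (!m) return undefined;
  // A bare unit such as "month" is treated as "1 month".
  const amount = m[1] ?? "1";
  return `${amount} ${m[2].toLowerCase()}`;
}

function correctTimespanLiterals(query: string): string {
  // DATE_TRUNC takes the interval as its first argument.
  const dateTrunc = /\b(DATE_TRUNC\s*\(\s*)"([^"]*)"/gi;
  // BUCKET takes the interval as its second argument. This simple pattern
  // only handles a plain field name (no nested call) as the first argument.
  const bucket = /\b(BUCKET\s*\(\s*[^,()"]+,\s*)"([^"]*)"/gi;

  const fix = (match: string, prefix: string, quoted: string): string => {
    const literal = toTimespanLiteral(quoted);
    // Leave strings that are not intervals untouched.
    return literal === undefined ? match : prefix + literal;
  };

  return query.replace(dateTrunc, fix).replace(bucket, fix);
}
```

A regex cannot reliably handle nested function calls or strings containing escaped quotes, which is one reason an AST-based approach is the more robust choice for this kind of correction.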
## Summary

Part of #198942

Add autocorrect for wrong `LIKE` wildcard.

The LLM can mistakenly use SQL wildcards for `LIKE` operators (`_` instead of `?` and `%` instead of `*`).

Examples

**generated**
```
FROM logs
| WHERE message LIKE "a%" AND TO_UPPER(level) LIKE "err%"
| WHERE foo LIKE "ba_"
```

**corrected**
```
FROM logs
| WHERE message LIKE "a*" AND TO_UPPER(level) LIKE "err*"
| WHERE foo LIKE "ba?"
```

---------

Co-authored-by: kibanamachine <[email protected]>
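The wildcard correction described above can be sketched as follows. This is an illustrative, regex-based approximation and not the code from the PR. The function name `correctLikeWildcards` is hypothetical, and the sketch assumes patterns are written as ordinary double-quoted strings (triple-quoted strings are left unchanged).

```typescript
// Hypothetical sketch, not the actual Kibana implementation.
// Rewrites SQL-style wildcards inside the string pattern that follows LIKE.
function correctLikeWildcards(query: string): string {
  // Match LIKE (case-insensitive) followed by a double-quoted string literal.
  // The literal may contain escaped characters such as \" or \\.
  // The leading \b keeps RLIKE from matching.
  const likePattern = /\bLIKE\s+"((?:[^"\\]|\\.)*)"/gi;

  return query.replace(likePattern, (match: string, pattern: string) => {
    // Escaped characters (e.g. \% or \_) are kept; bare % and _ are converted.
    const fixed = pattern.replace(/\\.|%|_/g, (token) => {
      if (token === "%") return "*";
      if (token === "_") return "?";
      return token;
    });
    // Rebuild the match: everything before the pattern, the fixed pattern,
    // then the closing quote.
    const prefix = match.slice(0, match.length - pattern.length - 1);
    return prefix + fixed + '"';
  });
}
```

Only the string that directly follows `LIKE` is rewritten, so underscores in field names such as `host_name` are not affected.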
We are currently not tracking any more potential improvements, but I will keep the issue open, as we will likely get feedback about other potential improvements later.
We have a `correctCommonEsqlMistakes` function that performs some programmatic (no LLM) fixes of generated queries, to fix things like bad quotes, wrong field name escaping and so on:
kibana/x-pack/plugins/inference/common/tasks/nl_to_esql/correct_common_esql_mistakes.ts
Lines 228 to 232 in 5c298a1
We should track the kinds of improvements we could make, and which other common generation mistakes we could easily fix programmatically.
This probably can't be done without proper utilization feedback.
Identified common mistakes
List of common mistakes that are currently identified and should ideally be addressed:
- using string literals instead of timespan literals in `DATE_TRUNC` and `BUCKET`
- using SQL wildcards in `LIKE` (`_` instead of `?` and `%` instead of `*`)