[NL-to-ESQL] improve `correctCommonEsqlMistakes` #198942

pgayvallet · 2024-11-05T13:18:01Z

We have a `correctCommonEsqlMistakes` function that performs programmatic (no LLM) fixes of generated queries, correcting things like bad quotes, wrong field name escaping, and so on:

kibana/x-pack/plugins/inference/common/tasks/nl_to_esql/correct_common_esql_mistakes.ts

Lines 228 to 232 in 5c298a1

    
    export function correctCommonEsqlMistakes(query: string): {
      isCorrection: boolean;
      input: string;
      output: string;
    } {

We should track the improvements we could make to it, and which other common generation mistakes we could easily fix programmatically.

This probably can't be done without real usage feedback.

Identified common mistakes

List of common mistakes that are currently identified and should ideally be addressed:

| Status | Mistake | Details | PR |
| --- | --- | --- | --- |
| 🟢 done | string literals used instead of timespan literals for `DATE_TRUNC` and `BUCKET` | comment | #202190 |
| 🟢 done | wrong wildcards used for `LIKE` (`_` instead of `?` and `%` instead of `*`) | comment | #202464 |
| --- | Waiting on feedback... ⌛ | | |


pgayvallet · 2024-11-11T11:59:46Z

One common mistake that has been reported is the model using the wrong wildcards with the `LIKE` operator: it uses the SQL wildcards (`%` for multiple characters and `_` for a single character) instead of the ES|QL ones (`*` for multiple characters and `?` for a single character).

We should probably try to auto-correct `%` to `*` and `_` to `?` within `LIKE` patterns.
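As a rough illustration of the `%` to `*` and `_` to `?` correction proposed above, here is a minimal regex-based sketch. The function name `correctLikeWildcards` is hypothetical and this is not the code that ended up in Kibana. It assumes patterns are double-quoted string literals directly following `LIKE`, and it leaves backslash-escaped characters alone.

```typescript
// Hypothetical helper for illustration only (not the Kibana implementation).
// Rewrites SQL-style wildcards inside the double-quoted pattern after LIKE.
function correctLikeWildcards(query: string): string {
  // Group 1: the LIKE keyword and trailing whitespace.
  // Group 2: the content of the double-quoted pattern (escapes allowed).
  // `\b` keeps RLIKE from matching, since "R" and "L" share no word boundary.
  const likePattern = /(\bLIKE\s+)"((?:[^"\\]|\\.)*)"/gi;

  return query.replace(likePattern, (_match: string, prefix: string, pattern: string) => {
    // Walk the pattern token by token: escaped pairs are kept untouched,
    // bare `%` becomes `*` and bare `_` becomes `?`.
    const fixed = pattern.replace(/\\.|[%_]/g, (tok: string) =>
      tok === '%' ? '*' : tok === '_' ? '?' : tok
    );
    return `${prefix}"${fixed}"`;
  });
}

// Usage: field names and non-LIKE strings are left as they are.
console.log(correctLikeWildcards('FROM logs | WHERE user_name LIKE "ba_" AND msg LIKE "a%"'));
```

One limitation of this kind of blind rewrite is that it cannot tell a deliberate literal underscore (for example in `LIKE "user_name*"`) from a SQL single-character wildcard, so it may over-correct.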

pgayvallet · 2024-11-25T18:45:10Z

Another common mistake that has been reported is the model using strings instead of timespan literals when calling date functions such as `DATE_TRUNC`.

E.g., `DATE_TRUNC("year", date)` instead of `DATE_TRUNC(1 year, date)`.

We should try to auto-correct such grammar errors.
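To make the intended rewrite concrete, here is a minimal regex-based sketch of turning a quoted interval into a timespan literal. It is only an illustration: the fix referenced in this issue (#202190) is described as AST-based, whereas this sketch uses plain string matching. The function name `correctTimespanLiterals` and the `UNITS` list are assumptions made for the example.

```typescript
// Time units recognized by this sketch (an assumption; adjust as needed).
const UNITS = ['millisecond', 'second', 'minute', 'hour', 'day', 'week', 'month', 'quarter', 'year'];

// Hypothetical helper for illustration only (not the Kibana implementation).
function correctTimespanLiterals(query: string): string {
  const unit = `(?:${UNITS.join('|')})s?`;
  // A double-quoted string holding an optional count and a unit,
  // e.g. "1 year", "month", "3 HOUR".
  const quoted = `"\\s*(\\d+)?\\s*(${unit})\\s*"`;
  // DATE_TRUNC takes the interval as its first argument.
  const dateTrunc = new RegExp(`(\\bDATE_TRUNC\\s*\\(\\s*)${quoted}`, 'gi');
  // BUCKET takes the interval as its second argument; the first argument
  // is assumed here to be a simple expression without commas or parentheses.
  const bucket = new RegExp(`(\\bBUCKET\\s*\\(\\s*[^,()]+,\\s*)${quoted}`, 'gi');

  // Drop the quotes, default a missing count to 1, and lowercase the unit.
  const fix = (_m: string, prefix: string, count: string | undefined, u: string): string =>
    `${prefix}${count ?? '1'} ${u.toLowerCase()}`;

  return query.replace(dateTrunc, fix).replace(bucket, fix);
}

// Usage
console.log(correctTimespanLiterals('FROM logs | EVAL y = DATE_TRUNC("year", date)'));
```

A string-matching approach like this is easy to write but fragile with nested expressions, which is presumably one reason an AST-based correction is preferable for anything beyond the simplest cases.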

…correct (#202190)

## Summary

Part of #198942

Fixes bad grammar regarding using string literals instead of timespan literals for `DATE_TRUNC` and `BUCKET` functions.

This PR also paves the way for additional AST-based grammar corrections

**Example**

*Input*

```esql
FROM logs
| EVAL trunc_year = DATE_TRUNC("1 year", date)
| EVAL trunc_month = DATE_TRUNC("month", date)
| STATS hires = COUNT(*) BY hour = BUCKET(hire_date, "3 HOUR")
```

*Output*

```esql
FROM logs
| EVAL trunc_year = DATE_TRUNC(1 year, date)
| EVAL trunc_month = DATE_TRUNC(1 month, date)
| STATS hires = COUNT(*) BY hour = BUCKET(hire_date, 3 hour)
```

(cherry picked from commit 742854f)

## Summary

Part of #198942

Add autocorrect for wrong `LIKE` wildcards.

The LLM can mistakenly use SQL wildcards for LIKE operators (`_` instead of `?` and `%` instead of `*`).

Examples

**generated**

```
FROM logs
| WHERE message LIKE "a%" AND TO_UPPER(level) LIKE "err%"
| WHERE foo LIKE "ba_"
```

**corrected**

```
FROM logs
| WHERE message LIKE "a*" AND TO_UPPER(level) LIKE "err*"
| WHERE foo LIKE "ba?"
```

---------

Co-authored-by: kibanamachine <[email protected]>

(cherry picked from commit 2ace6ff)

pgayvallet · 2024-12-03T07:40:11Z

- string literals used instead of timespan literals for `DATE_TRUNC` and `BUCKET` has been addressed by [NL-to-ESQL] correctCommonEsqlMistakes: add timespan literals auto-correct #202190
- wrong wildcards used for `LIKE` (`_` instead of `?` and `%` instead of `*`) has been addressed by [NL-to-ESQL] autocorrect bad LIKE wildcards #202464

We are not currently tracking any other potential improvements, but I will keep the issue open, as we will likely get feedback about further improvements later.


pgayvallet added the Team:AI Infra AppEx AI Infrastructure Team label Nov 5, 2024

pgayvallet mentioned this issue Nov 28, 2024

[NL-to-ESQL] correctCommonEsqlMistakes: add timespan literals auto-correct #202190

Merged

pgayvallet mentioned this issue Dec 2, 2024

[NL-to-ESQL] autocorrect bad LIKE wildcards #202464

Merged
