Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[NL-to-ESQL] improve correctCommonEsqlMistakes #198942

Open
pgayvallet opened this issue Nov 5, 2024 · 3 comments
Open

[NL-to-ESQL] improve correctCommonEsqlMistakes #198942

pgayvallet opened this issue Nov 5, 2024 · 3 comments
Labels
Team:AI Infra AppEx AI Infrastructure Team

Comments

@pgayvallet
Copy link
Contributor

pgayvallet commented Nov 5, 2024

We have a correctCommonEsqlMistakes function that perform some programmatic (no LLM) fixes of generated queries, to fix things like bad quotes, wrong field name escaping and so on:

export function correctCommonEsqlMistakes(query: string): {
isCorrection: boolean;
input: string;
output: string;
} {

We should track the kind of improvements we could do, and what kind of other common generation mistakes we could easily fix programmatically.

This probably can't be done without proper utilization feedback.

Identified common mistakes

List of common mistakes that are currently identified and should be ideally addressed:

Status Mistake Details PR
🟢 done string literals used instead of timespan literals for DATE_TRUNC and BUCKET comment #202190
🟢 done wrong wildcards used for LIKE (_ instead of ? and % instead of *) comment #202464
--- Waiting on feedback... ⌛
@pgayvallet pgayvallet added the Team:AI Infra AppEx AI Infrastructure Team label Nov 5, 2024
@pgayvallet
Copy link
Contributor Author

pgayvallet commented Nov 11, 2024

One common mistake that has been reported is the model using the wrong wildcard when using the LIKE function: they are using the SQL wildcards (% for multi and _ for single) instead of the ESQL ones (* for multi and ? for single).

We should probably try to auto-correct % to * and _ to ? within LIKE commands

@pgayvallet
Copy link
Contributor Author

pgayvallet commented Nov 25, 2024

Another common mistake that has been reported is the model using strings instead of timespan literal when using date functions such as DATE_TRUNC.

E.g DATE_TRUNC("year", date) instead of DATE_TRUNC(1 year, date).

We should try to autocorrect such grammar errors

pgayvallet added a commit that referenced this issue Dec 2, 2024
…correct (#202190)

## Summary

Part of #198942

Fixes bad grammar regarding using string literals instead of timespan
literals for `DATE_TRUNC` and `BUCKET` functions.

This PR also paves the way for additional AST-based grammar corrections


**Example**

*Input*
```esql
FROM logs
| EVAL trunc_year = DATE_TRUNC("1 year", date)
| EVAL trunc_month = DATE_TRUNC("month", date)
| STATS hires = COUNT(*) BY hour = BUCKET(hire_date, "3 HOUR")
```
*Output*
```esql
FROM logs
| EVAL trunc_year = DATE_TRUNC(1 year, date)
| EVAL trunc_month = DATE_TRUNC(1 month, date)
| STATS hires = COUNT(*) BY hour = BUCKET(hire_date, 3 hour)
```
kibanamachine pushed a commit to kibanamachine/kibana that referenced this issue Dec 2, 2024
…correct (elastic#202190)

## Summary

Part of elastic#198942

Fixes bad grammar regarding using string literals instead of timespan
literals for `DATE_TRUNC` and `BUCKET` functions.

This PR also paves the way for additional AST-based grammar corrections

**Example**

*Input*
```esql
FROM logs
| EVAL trunc_year = DATE_TRUNC("1 year", date)
| EVAL trunc_month = DATE_TRUNC("month", date)
| STATS hires = COUNT(*) BY hour = BUCKET(hire_date, "3 HOUR")
```
*Output*
```esql
FROM logs
| EVAL trunc_year = DATE_TRUNC(1 year, date)
| EVAL trunc_month = DATE_TRUNC(1 month, date)
| STATS hires = COUNT(*) BY hour = BUCKET(hire_date, 3 hour)
```

(cherry picked from commit 742854f)
pgayvallet added a commit that referenced this issue Dec 3, 2024
## Summary

Part of #198942

Add autocorrect for wrong `LIKE` wildcard.

The LLM can make mistake and use SQL wildcards for LIKE operators (`_`
instead of `?` and `%` instead of `*`)


Examples

**generated**
```
FROM logs | WHERE message LIKE "a%" AND TO_UPPER(level) LIKE "err%" | WHERE foo LIKE "ba_"
```
**corrected**
```
FROM logs | WHERE message LIKE "a*" AND TO_UPPER(level) LIKE "err*" | WHERE foo LIKE "ba?"
```

---------

Co-authored-by: kibanamachine <[email protected]>
kibanamachine pushed a commit to kibanamachine/kibana that referenced this issue Dec 3, 2024
## Summary

Part of elastic#198942

Add autocorrect for wrong `LIKE` wildcard.

The LLM can make mistake and use SQL wildcards for LIKE operators (`_`
instead of `?` and `%` instead of `*`)

Examples

**generated**
```
FROM logs | WHERE message LIKE "a%" AND TO_UPPER(level) LIKE "err%" | WHERE foo LIKE "ba_"
```
**corrected**
```
FROM logs | WHERE message LIKE "a*" AND TO_UPPER(level) LIKE "err*" | WHERE foo LIKE "ba?"
```

---------

Co-authored-by: kibanamachine <[email protected]>
(cherry picked from commit 2ace6ff)
@pgayvallet
Copy link
Contributor Author

We currently not tracking any more potential improvements, but I will keep the issue open, as we will likely get feedback about other potential improvements later.

hop-dev pushed a commit to hop-dev/kibana that referenced this issue Dec 5, 2024
## Summary

Part of elastic#198942

Add autocorrect for wrong `LIKE` wildcard.

The LLM can make mistake and use SQL wildcards for LIKE operators (`_`
instead of `?` and `%` instead of `*`)


Examples

**generated**
```
FROM logs | WHERE message LIKE "a%" AND TO_UPPER(level) LIKE "err%" | WHERE foo LIKE "ba_"
```
**corrected**
```
FROM logs | WHERE message LIKE "a*" AND TO_UPPER(level) LIKE "err*" | WHERE foo LIKE "ba?"
```

---------

Co-authored-by: kibanamachine <[email protected]>
CAWilson94 pushed a commit to CAWilson94/kibana that referenced this issue Dec 9, 2024
…correct (elastic#202190)

## Summary

Part of elastic#198942

Fixes bad grammar regarding using string literals instead of timespan
literals for `DATE_TRUNC` and `BUCKET` functions.

This PR also paves the way for additional AST-based grammar corrections


**Example**

*Input*
```esql
FROM logs
| EVAL trunc_year = DATE_TRUNC("1 year", date)
| EVAL trunc_month = DATE_TRUNC("month", date)
| STATS hires = COUNT(*) BY hour = BUCKET(hire_date, "3 HOUR")
```
*Output*
```esql
FROM logs
| EVAL trunc_year = DATE_TRUNC(1 year, date)
| EVAL trunc_month = DATE_TRUNC(1 month, date)
| STATS hires = COUNT(*) BY hour = BUCKET(hire_date, 3 hour)
```
CAWilson94 pushed a commit to CAWilson94/kibana that referenced this issue Dec 9, 2024
## Summary

Part of elastic#198942

Add autocorrect for wrong `LIKE` wildcard.

The LLM can make mistake and use SQL wildcards for LIKE operators (`_`
instead of `?` and `%` instead of `*`)


Examples

**generated**
```
FROM logs | WHERE message LIKE "a%" AND TO_UPPER(level) LIKE "err%" | WHERE foo LIKE "ba_"
```
**corrected**
```
FROM logs | WHERE message LIKE "a*" AND TO_UPPER(level) LIKE "err*" | WHERE foo LIKE "ba?"
```

---------

Co-authored-by: kibanamachine <[email protected]>
CAWilson94 pushed a commit to CAWilson94/kibana that referenced this issue Dec 12, 2024
…correct (elastic#202190)

## Summary

Part of elastic#198942

Fixes bad grammar regarding using string literals instead of timespan
literals for `DATE_TRUNC` and `BUCKET` functions.

This PR also paves the way for additional AST-based grammar corrections


**Example**

*Input*
```esql
FROM logs
| EVAL trunc_year = DATE_TRUNC("1 year", date)
| EVAL trunc_month = DATE_TRUNC("month", date)
| STATS hires = COUNT(*) BY hour = BUCKET(hire_date, "3 HOUR")
```
*Output*
```esql
FROM logs
| EVAL trunc_year = DATE_TRUNC(1 year, date)
| EVAL trunc_month = DATE_TRUNC(1 month, date)
| STATS hires = COUNT(*) BY hour = BUCKET(hire_date, 3 hour)
```
CAWilson94 pushed a commit to CAWilson94/kibana that referenced this issue Dec 12, 2024
## Summary

Part of elastic#198942

Add autocorrect for wrong `LIKE` wildcard.

The LLM can make mistake and use SQL wildcards for LIKE operators (`_`
instead of `?` and `%` instead of `*`)


Examples

**generated**
```
FROM logs | WHERE message LIKE "a%" AND TO_UPPER(level) LIKE "err%" | WHERE foo LIKE "ba_"
```
**corrected**
```
FROM logs | WHERE message LIKE "a*" AND TO_UPPER(level) LIKE "err*" | WHERE foo LIKE "ba?"
```

---------

Co-authored-by: kibanamachine <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Team:AI Infra AppEx AI Infrastructure Team
Projects
None yet
Development

No branches or pull requests

1 participant