[Auto Import] Improve log format recognition #196228
Pinging @elastic/security-scalability (Team:Security-Scalability)
…n/kibana into auto-import/csv-over-unstructured
Looks good.. minor comments
x-pack/plugins/integration_assistant/server/graphs/log_type_detection/detection.ts
x-pack/plugins/integration_assistant/server/graphs/log_type_detection/prompts.ts
…tection/prompts.ts Co-authored-by: Bharat Pasupula <[email protected]>
LGTM
I considered this, but it's mostly needed for CSV, and we already have the release note about the CSV format support, which kind of covers this one.
💛 Build succeeded, but was flaky
Starting backport for target branches: 8.x
Previously the LLM would often select `unstructured` format for what (to our eye) clearly are CSV samples.

We add the missing line break between the log samples (which should help format recognition in general) and change the prompt to clarify when the comma-separated list should be treated as a `csv` and when as `structured` format.

See GitHub for examples.

---------

Co-authored-by: Bharat Pasupula <[email protected]>

(cherry picked from commit bdc9ce9)
💚 All backports created successfully

Note: Successful backport PRs will be merged automatically after passing CI.

Questions? Please refer to the Backport tool documentation.
# Backport

This will backport the following commits from `main` to `8.x`:
- [[Auto Import] Improve log format recognition (#196228)](#196228)

<!--- Backport version: 9.4.3 -->

### Questions ?
Please refer to the [Backport tool documentation](https://github.com/sqren/backport)

Co-authored-by: Ilya Nikokoshev <[email protected]>
## Context

Previously the LLM would often select `unstructured` format for what (to our eye) clearly are CSV samples, e.g. for a PAN-OS integration log in which the 5 samples are squashed to one line.

## Summary

We add the missing line break between the log samples (which should help format recognition in general) and change the prompt to clarify when the comma-separated list should be treated as a `csv` and when as `structured` format.

## Testing

The result is (compare to the existing integration):

Generated integration: ai_panw_traffic_202410150239-1.0.0.zip (check the sample event)
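
The line-break part of the Summary can be illustrated with a minimal sketch. It is not the code from this PR: the function names `joinSamplesSquashed` and `joinSamples` and the sample strings are hypothetical, and the way the samples were originally concatenated is an assumption made only for illustration.

```typescript
// Hypothetical helpers for illustration; names are not taken from the Kibana source.

/** Assumed "before" behaviour: samples concatenated with no separator. */
function joinSamplesSquashed(samples: string[]): string {
  return samples.join('');
}

/** Assumed "after" behaviour: one sample per line. */
function joinSamples(samples: string[]): string {
  return samples.join('\n');
}

// Made-up comma-separated samples (not real PAN-OS logs).
const samples: string[] = [
  '2024-10-15,alice,login',
  '2024-10-15,bob,logout',
];

// Without the line break, record boundaries are lost and the text
// no longer looks like a table of CSV rows.
console.log(joinSamplesSquashed(samples));

// With the line break, each row is visible on its own line.
console.log(joinSamples(samples));
```

Keeping one sample per line preserves the row boundaries that make a comma-separated log recognizable as `csv`, whichever model reads the prompt.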