
[8.x] [Auto Import] Use larger number of samples on the backend (#196233) #196386

Merged
merged 1 commit into elastic:8.x on Oct 15, 2024

Conversation

kibanamachine
Contributor

Backport

This will backport the following commits from main to 8.x:

Questions?

Please refer to the Backport tool documentation


## Release Notes

Automatic Import now analyses a larger number of samples to generate an
integration.

## Summary

Closes elastic/security-team#9844

**Added: Backend Sampling**

We pass 100 rows (these numeric values are adjustable) to the
backend.[^1]

[^1]: As before, deterministically selected on the frontend; see
elastic#191598
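The deterministic selection might look roughly like the sketch below. This is an illustration only: `selectSamplesDeterministically` and `FRONTEND_SAMPLE_ROWS` are hypothetical names, not the actual frontend code (see elastic#191598 for that); an evenly spaced pick is one simple way to make the subset deterministic.

```typescript
// Hypothetical sketch: pick a fixed-size subset of rows such that the same
// input always yields the same samples (no randomness involved).
const FRONTEND_SAMPLE_ROWS = 100; // adjustable, per the PR description

function selectSamplesDeterministically(
  rows: string[],
  limit: number = FRONTEND_SAMPLE_ROWS
): string[] {
  if (rows.length <= limit) return rows;
  // Evenly spaced indices across the input: deterministic and spread out.
  const step = rows.length / limit;
  return Array.from({ length: limit }, (_, i) => rows[Math.floor(i * step)]);
}
```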

The Categorization chain now processes the samples in batches: after the
initial categorization it performs a number of review cycles (at most 5,
tuned so that we stay under the 2-minute limit for a single API call).

To decide when to stop processing, we maintain a list of _stable_
samples as follows:

1. The list is initially empty.
2. For each review we select a random subset of 40 samples, preferring
samples that are not yet stable.
3. After each review – when the LLM may add new processors or change
existing ones – we compare the new pipeline results with the old
pipeline results.
4. Those reviewed samples that did not change their categorization are
added to the stable list.
5. Any samples that have changed their categorization are removed from
the stable list.
6. If all samples are stable, we finish processing.
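The loop above can be sketched as follows. This is an illustration of the stopping rule, not the actual chain code: the names, the `Categorize` stand-in, and the batch-picking shortcut (a sort instead of a true random shuffle) are all assumptions.

```typescript
// Illustrative sketch of the stability loop described above.
const REVIEW_BATCH_SIZE = 40;
const MAX_REVIEW_CYCLES = 5; // tuned to stay under the 2-minute API limit

// Stand-in for running the categorization pipeline over all samples:
// maps each sample to its assigned category.
type Categorize = (samples: string[]) => Map<string, string>;

function reviewUntilStable(samples: string[], categorize: Categorize): Set<string> {
  const stable = new Set<string>(); // 1. the stable list starts empty
  let previous = categorize(samples); // initial categorization
  for (let cycle = 0; cycle < MAX_REVIEW_CYCLES; cycle++) {
    // 2. pick a review batch, preferring samples that are not yet stable
    // (the real chain picks randomly; a sort keeps this sketch short)
    const ordered = [...samples].sort(
      (a, b) => Number(stable.has(a)) - Number(stable.has(b))
    );
    const batch = new Set(ordered.slice(0, REVIEW_BATCH_SIZE));
    // 3. re-run the pipeline and compare with the previous results
    const current = categorize(samples);
    for (const s of samples) {
      if (current.get(s) !== previous.get(s)) {
        stable.delete(s); // 5. changed categorization → no longer stable
      } else if (batch.has(s)) {
        stable.add(s); // 4. reviewed and unchanged → stable
      }
    }
    previous = current;
    if (stable.size === samples.length) break; // 6. all stable → done
  }
  return stable;
}
```

With a pipeline that never changes its answers, every sample stabilizes as soon as it is reviewed; with one that changes answers every cycle, the loop still terminates after `MAX_REVIEW_CYCLES`.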

**Removed: User Notification**

Using 100 samples balances the expected pipeline complexity against the
time budget we work with. We may change this number in the future,
possibly dynamically, so the specific value is of no importance to the
user. We therefore remove the truncation notification.

**Unchanged:**

- No batching is done in the related chain: it seems to work as-is.

**Refactored:**

- We centralize the sizing constants in the
`x-pack/plugins/integration_assistant/common/constants.ts` file.
- We remove the unused state key `formattedSamples` and combine
`modelJSONInput` back into `modelInput`.
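The centralized file might have roughly this shape. The constant names and grouping here are assumptions for illustration; only the values (100 rows, batches of 40, at most 5 review cycles) come from this description, not from the actual `constants.ts`.

```typescript
// Hypothetical sketch of centralized sizing constants; names are illustrative.
export const BACKEND_SAMPLE_ROWS = 100; // rows passed from frontend to backend
export const CATEGORIZATION_REVIEW_BATCH_SIZE = 40; // samples per review cycle
export const CATEGORIZATION_MAX_REVIEW_CYCLES = 5; // keeps a single API call under 2 minutes
```

Keeping these in one `common/constants.ts` lets both the frontend and the backend chains agree on the sizing without duplicated magic numbers.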

> [!NOTE]
> I had difficulty generating new graph diagrams, so they remain
unchanged.

(cherry picked from commit fc3ce54)
@kibanamachine kibanamachine merged commit a4938bc into elastic:8.x Oct 15, 2024
26 checks passed
@elasticmachine
Contributor

💛 Build succeeded, but was flaky

Failed CI Steps

Test Failures

  • [job] [logs] FTR Configs #11 / dashboard Export import saved objects between versions should render all panels on the dashboard

Metrics [docs]

Public APIs missing comments

Total count of every public API that lacks a comment. Target amount is 0. Run `node scripts/build_api_docs --plugin [yourplugin] --stats comments` for more detailed information.

| id | before | after | diff |
| --- | --- | --- | --- |
| integrationAssistant | 55 | 56 | +1 |

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

| id | before | after | diff |
| --- | --- | --- | --- |
| integrationAssistant | 961.0KB | 960.7KB | -297.0B |

Unknown metric groups

API count

| id | before | after | diff |
| --- | --- | --- | --- |
| integrationAssistant | 66 | 71 | +5 |

cc @ilyannn
