[ENHANCEMENT] Avoid un-necessary predictions in ingest processors #2413

Closed
mingshl opened this issue May 7, 2024 · 2 comments
Labels: enhancement (New feature or request), v2.15.0

Comments

@mingshl
Collaborator

mingshl commented May 7, 2024

Is your feature request related to a problem?
When re-indexing, text_embedding and ml_inference processors run the prediction again even when the inference fields already exist in the document. text_embedding processors overwrite the inference field, while ml_inference processors do not write to it and instead either throw an exception or skip writing to the document.

What solution would you like?
In ingest processors that use ML inference, we should check whether the model output field (for example, the text_embedding field) already exists in the document before the prediction task runs. If the field already exists, skip the prediction.

That way, a re-index won't run the prediction task again for documents where the field already exists.
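
To make the proposal concrete, here is a minimal sketch of the check in Python. It is not ml-commons code: `run_inference_if_missing`, its parameters, and the `predict` callable are all hypothetical names chosen for illustration, and the document is modeled as a plain dict with a top-level output field.

```python
from typing import Any, Callable, Dict


def run_inference_if_missing(
    doc: Dict[str, Any],
    input_field: str,
    output_field: str,
    predict: Callable[[str], Any],
) -> bool:
    """Fill output_field by calling predict(), unless it is already set.

    Hypothetical helper: returns True if a prediction ran,
    False if it was skipped because the field already had a value.
    """
    # Skip the (potentially expensive) model call when the document
    # already carries a value for the model output field.
    if doc.get(output_field) is not None:
        return False
    doc[output_field] = predict(doc[input_field])
    return True
```

With a stand-in model, the first call on a document populates the field and a second call on the same document (as in a re-index) returns False without calling the model again.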

What alternatives have you considered?
Any other suggestions are welcome.

mingshl added the enhancement (New feature or request) and untriaged labels May 7, 2024
mingshl changed the title from "[ENHANCEMENT]" to "[ENHANCEMENT] Avoid un-necessary predictions in ingest processors" May 7, 2024
@IanMenendez

IanMenendez commented May 7, 2024

Maybe it would be nice to add a parameter on the ML ingest processors called overwrite: Option[Boolean]. If true, the processor overwrites the current embeddings upon reindexing; if false, it does not.

It would be nice to still have the option to overwrite, in case we change the ML model to one that outputs different embeddings.
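
A rough sketch of how such a flag might behave, written in Python for brevity. Everything here is an assumption for illustration: `apply_embedding`, the `overwrite` keyword, and its default of False are not taken from any existing processor, and the document is modeled as a plain dict.

```python
from typing import Any, Callable, Dict


def apply_embedding(
    doc: Dict[str, Any],
    input_field: str,
    output_field: str,
    predict: Callable[[str], Any],
    *,
    overwrite: bool = False,  # assumed default; the thread does not specify one
) -> bool:
    """Write predict()'s result to output_field, honoring the overwrite flag.

    Returns True if the model was called, False if the existing
    value was kept and the prediction was skipped.
    """
    if output_field in doc and not overwrite:
        # Existing embedding is kept; no model call is made.
        return False
    # Either the field is missing or the caller asked to recompute it,
    # e.g. after switching to a model that produces different embeddings.
    doc[output_field] = predict(doc[input_field])
    return True
```

Setting overwrite to True covers the model-change case: the same re-index that would otherwise skip every document recomputes the embeddings instead.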

@ylwu-amzn
Collaborator

Added in this PR #2508

@github-project-automation github-project-automation bot moved this from On-deck to Done in ml-commons projects Jun 14, 2024