Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix bug where ingestion failed for input document containing list of nested objects #1040

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

yizheliu-amazon
Copy link
Contributor

Description

Fix bug where ingestion failed for input document containing list of nested objects

Related Issues

Resolves #1024

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@heemin32
Copy link
Collaborator

Can we have IT test for this?

@heemin32 heemin32 added the backport 2.x Label will add auto workflow to backport PR to 2.x branch label Dec 24, 2024
*/
Map<String, Object> child1Level2 = buildObjMapWithSingleField(CHILD_1_TEXT_FIELD, TEXT_VALUE_1);
Map<String, Object> child1Level1 = buildObjMapWithSingleField(CHILD_FIELD_LEVEL_1, child1Level2);
Map<String, Object> child2Level2 = buildObjMapWithSingleField(CHILD_1_TEXT_FIELD, TEXT_VALUE_1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this critical for test case to have all identical values for both nested fields? In real life scenario most of the times values will be different, can we edit this method or add a new test case with 2+ different fields?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. I can add different fields so that objects in the list are not identical.

@yizheliu-amazon
Copy link
Contributor Author

yizheliu-amazon commented Dec 26, 2024

Can we have IT test for this?

Thanks for the review. I tried adding IT test for it, but found a new issue in the case of doc containing list of nested objects with multiple dots .: issue #1042 . The ingest pipeline example in issue #1042 is actually from config file of existing IT. That being said, given pipeline config of existing IT in the code, new IT test for this change will fail. Such issue is not related to this bug fix PR, but related to case where doc containing list of nested objects with multiple dots . is being ingested. Existing ITs can pass because such case is not covered.

To work around it, we can either

  1. fix current bug, then fix issue [BUG] Fail to generate embedding for ingest document with nested field defined in field map #1042; in the PR for issue [BUG] Fail to generate embedding for ingest document with nested field defined in field map #1042, I can add IT for the case of ingestion of doc with list of nested objects.
  2. create a new pipeline configuration like below for IT which is working for this PR, but this may seem unnecessary because such new pipeline is very similar to existing one. If IT for this PR can pass given existing pipeline, it can also pass for below pipeline.
{
  "description": "text embedding pipeline for hybrid",
  "processors": [
    {
      "text_embedding": {
        "model_id": "%s",
        "field_map": {
          "title": "title_knn",
          "favor_list": "favor_list_knn",
          "favorites": {
            "game": "game_knn",
            "movie": "movie_knn"
          },
          "nested_passages": "level_1_embedding"
        }
      }
    }
  ]
}

I may prefer option 1 since option 2 seems unnecessary to me.

@heemin32
Copy link
Collaborator

@yizheliu-amazon Thanks for the detail explanation. I will leave it to you to decided for the next step among the two option. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport 2.x Label will add auto workflow to backport PR to 2.x branch bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] Fail to ingest document with nested list into text_embedding processor
3 participants