
[BUG] Fail to generate embedding for ingest document with nested field defined in field map #1042

Open
yizheliu-amazon opened this issue Dec 24, 2024 · 2 comments
Labels
bug Something isn't working

Comments

@yizheliu-amazon
Contributor

yizheliu-amazon commented Dec 24, 2024

What is the bug?

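When an ingest pipeline's field_map defines a nested field such as "a.b.c": "d.e", ingesting a document in which "a.b.c" is reached through a list of objects does not generate embeddings: the document is ingested, but no embedding field is added to the objects in the list.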
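To make the expected semantics concrete, here is a minimal Python sketch (not the neural-search plugin's code, which is written in Java) of resolving a dotted field_map key such as "a.b.c" against a document. Whenever a list is met along the path, the remaining path is applied to each element, so one key can yield several values. The function name resolve_path is hypothetical.

```python
from typing import Any, List


def resolve_path(source: Any, path: str) -> List[Any]:
    """Collect every value reachable by a dotted path, fanning out over lists.

    Hypothetical helper illustrating the behavior this issue expects;
    it is not taken from the neural-search plugin.
    """
    keys = path.split(".")

    def walk(node: Any, depth: int) -> List[Any]:
        # Lists are transparent: apply the same remaining path to each element.
        if isinstance(node, list):
            results: List[Any] = []
            for item in node:
                results.extend(walk(item, depth))
            return results
        if depth == len(keys):
            return [node]
        if isinstance(node, dict) and keys[depth] in node:
            return walk(node[keys[depth]], depth + 1)
        # Missing key or non-object value: nothing to embed on this branch.
        return []

    return walk(source, 0)


doc = {
    "nested_field": [
        {"level1": {"level2": "hello"}},
        {"level1": {"level2": "world"}},
    ]
}
print(resolve_path(doc, "nested_field.level1.level2"))  # ['hello', 'world']
```

With this reading, "nested_field.level1.level2" should produce one text per list element, so two embeddings are expected for the document in this report rather than none.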

How can one reproduce the bug?

  1. Create ingest pipeline
PUT /_ingest/pipeline/nlp-ingest-pipeline-v4

{
  "description": "text embedding pipeline example",
  "processors": [
    {
      "text_embedding": {
        "model_id": "AMT_KJQBWWsqXNqdQAfI",
        "field_map": {
          "nested_field.level1.level2": "level3.level4Embedding"
        }
      }
    }
  ]
}


  2. Simulate
POST /_ingest/pipeline/nlp-ingest-pipeline-v4/_simulate

{
  "docs": [
    {
      "_index": "neural-search-index-v2",
      "_id": "1",
      "_source": {
        "nested_field.level1": [
          {
            "level2": "hello"
          },
          {
            "level2": "world"
          }
        ]
      }
    }
  ]
}
  3. Result
{
  "docs": [
    {
      "doc": {
        "_index": "neural-search-index-v2",
        "_id": "1",
        "_source": {
          "nested_field": [
            {
              "level1": {
                "level2": "hello"
              }
            },
            {
              "level1": {
                "level2": "world"
              }
            }
          ]
        },
        "_ingest": {
          "timestamp": "2025-01-02T22:45:34.884624Z"
        }
      }
    }
  ]
}
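A quick way to confirm the bug from the simulate response is sketched below in Python. It walks docs[*].doc._source.nested_field and lists the positions of objects whose level1 lacks the level3.level4Embedding key that the expected output shows. The function name missing_embeddings is hypothetical, and the field names are simply the ones used in this reproduction.

```python
from typing import Any, Dict, List, Tuple

EMBEDDING_KEY = "level3.level4Embedding"  # target name from the field_map above


def missing_embeddings(response: Dict[str, Any]) -> List[Tuple[int, int]]:
    """Return (doc index, list index) pairs whose level1 object has no embedding."""
    missing: List[Tuple[int, int]] = []
    for doc_idx, entry in enumerate(response.get("docs", [])):
        nested = entry.get("doc", {}).get("_source", {}).get("nested_field", [])
        for item_idx, item in enumerate(nested):
            if EMBEDDING_KEY not in item.get("level1", {}):
                missing.append((doc_idx, item_idx))
    return missing


# The simulate response reported above, with the _ingest block omitted.
response = {
    "docs": [
        {
            "doc": {
                "_index": "neural-search-index-v2",
                "_id": "1",
                "_source": {
                    "nested_field": [
                        {"level1": {"level2": "hello"}},
                        {"level1": {"level2": "world"}},
                    ]
                },
            }
        }
    ]
}
print(missing_embeddings(response))  # [(0, 0), (0, 1)]
```

Both list elements are reported, which matches the behavior described in this issue: neither object received an embedding.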

What is the expected behavior?

Each object in the nested_field list should contain an embedding field, like:

{
  "docs": [
    {
      "doc": {
        "_index": "neural-search-index-v2",
        "_id": "1",
        "_source": {
          "nested_field": [
            {
              "level1": {
                "level2": "hello",
                "level3.level4Embedding": [
                  0.18129997,
                  -0.056219965,
                  ...
                ]
              }
            },
            {
              "level1": {
                "level2": "world",
                "level3.level4Embedding": [
                  0.0358946,
                  0.04183194,
                  ...
                ]
              }
            }
          ]
        },
        "_ingest": {
          "timestamp": "2025-01-02T22:51:03.541304Z"
        }
      }
    }
  ]
}
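The expected output above could be produced by a hypothetical step like the Python sketch below: for every object reached through the list, it embeds the text found at the last path segment and writes the vector, under the literal target name, into that same object. fake_embed is a stand-in for the model call (it is not the ML Commons API), and apply_field_map is an illustrative name, not a function from the plugin.

```python
from typing import Any, Callable, Dict, List

Vector = List[float]


def apply_field_map(
    source: Dict[str, Any],
    source_path: str,
    target_name: str,
    embed: Callable[[str], Vector],
) -> None:
    """Write embed(text) next to every leaf of source_path, fanning out over lists.

    Mutates `source` in place. The target name is stored as a literal key,
    mirroring the expected output in this issue. Hypothetical sketch only.
    """
    *parents, leaf = source_path.split(".")

    def visit(node: Any, depth: int) -> None:
        if isinstance(node, list):
            # Apply the remaining path to each element of the list.
            for item in node:
                visit(item, depth)
            return
        if not isinstance(node, dict):
            return
        if depth == len(parents):
            # node is the object that should hold both the text and its embedding.
            text = node.get(leaf)
            if isinstance(text, str):
                node[target_name] = embed(text)
            return
        if parents[depth] in node:
            visit(node[parents[depth]], depth + 1)

    visit(source, 0)


def fake_embed(text: str) -> Vector:
    # Deterministic stand-in for a real embedding model.
    return [float(len(text)), float(ord(text[0])) if text else 0.0]


doc = {
    "nested_field": [
        {"level1": {"level2": "hello"}},
        {"level1": {"level2": "world"}},
    ]
}
apply_field_map(doc, "nested_field.level1.level2", "level3.level4Embedding", fake_embed)
print(doc["nested_field"][1]["level1"])
# {'level2': 'world', 'level3.level4Embedding': [5.0, 119.0]}
```

Writing the vector into the object that holds the source text keeps each embedding aligned with its own list element, which is the shape the expected output asks for.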

What is your host/environment?

Mac OS

Do you have any screenshots?

N/A

Do you have any additional context?

N/A

yizheliu-amazon added the bug (Something isn't working) and untriaged labels Dec 24, 2024
@heemin32
Collaborator

Could this be a duplicate of #686?

@yizheliu-amazon
Contributor Author

No. Issue #686 is about a bug where the source field is overwritten by the embedding value, whereas this issue is about the embedding value being missing for documents that contain a list of nested objects.

Actually, with the changes from PR #1040 applied in my local workspace, I can no longer reproduce #686, so it might be closed once PR #1040 is merged.

yizheliu-amazon changed the title from "[BUG] Fail to ingest document with nested field defined in field map" to "[BUG] Fail to generate embedding for ingest document with nested field defined in field map" Dec 27, 2024
heemin32 moved this from Backlog to Backlog(Hot) in Neural Search RoadMap Jan 3, 2025
heemin32 moved this from Backlog(Hot) to 2.19 in Neural Search RoadMap Jan 4, 2025
Projects
Status: 2.19
Development

No branches or pull requests

2 participants