[BUG] Cannot make use of default_model_id in neural_sparse query type #871

Closed
jdnvn opened this issue Aug 22, 2024 · 3 comments
Labels
bug Something isn't working untriaged


jdnvn commented Aug 22, 2024

What is the bug?

I receive an error when omitting the model_id parameter from a nested neural_sparse query after configuring a default_model_id for the index. This seems to happen only for nested queries; I cannot reproduce it for non-nested queries.

How can one reproduce the bug?

Create an index called my_index

PUT /my_index
{
	"settings": {
		"index": {"knn": true},
		"number_of_shards": 1,
		"number_of_replicas": 1,
		"analysis": {
			"analyzer": {
				"default": {
					"type": "standard"
				}
			}
		}
	},
	"mappings": {
		"properties": {
			"id": {"type": "keyword"},
			"chunks": {
				"type": "nested",
				"properties": {
					"chunk_id": {"type": "keyword"},
					"chunked_content": {"type": "text"},
					"chunked_content_embedding": {"type": "rank_features"}
				}
			}
		}
	}
}
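The original report wrote this body with Python literals (for example `True`), which suggests it was sent from Python. A minimal sketch, assuming a client that accepts a plain dict or a JSON string: build the body as a dict and let the standard `json` module produce valid JSON, so `True` becomes `true`.

```python
import json

# Index body from the request above, written as a Python dict.
index_body = {
    "settings": {
        "index": {"knn": True},  # serialized as JSON true
        "number_of_shards": 1,
        "number_of_replicas": 1,
        "analysis": {"analyzer": {"default": {"type": "standard"}}},
    },
    "mappings": {
        "properties": {
            "id": {"type": "keyword"},
            "chunks": {
                "type": "nested",
                "properties": {
                    "chunk_id": {"type": "keyword"},
                    "chunked_content": {"type": "text"},
                    "chunked_content_embedding": {"type": "rank_features"},
                },
            },
        }
    },
}

# json.dumps emits strict JSON: no trailing commas, lowercase booleans.
payload = json.dumps(index_body)
print(payload)
```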

Update cluster settings

PUT /_cluster/settings
{
	"persistent": {
		"plugins": {
			"ml_commons": {
				"allow_registering_model_via_url": "true",
				"only_run_on_ml_node": "false",
				"model_access_control_enabled": "true",
				"native_memory_threshold": "99"
			}
		}
	}
}
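OpenSearch also accepts settings in flattened, dot-separated form, so the nested `persistent` block above corresponds to four keys such as `plugins.ml_commons.only_run_on_ml_node`. A small sketch (the helper name `flatten_settings` is hypothetical) that derives those keys, which can be handy when comparing against the output of `GET /_cluster/settings?flat_settings=true`:

```python
def flatten_settings(settings: dict, prefix: str = "") -> dict:
    """Flatten nested settings into dotted keys, e.g. plugins.ml_commons.x."""
    flat = {}
    for key, value in settings.items():
        full_key = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Descend into nested objects, extending the dotted prefix.
            flat.update(flatten_settings(value, full_key))
        else:
            flat[full_key] = value
    return flat


cluster_settings = {
    "persistent": {
        "plugins": {
            "ml_commons": {
                "allow_registering_model_via_url": "true",
                "only_run_on_ml_node": "false",
                "model_access_control_enabled": "true",
                "native_memory_threshold": "99",
            }
        }
    }
}

flat = flatten_settings(cluster_settings["persistent"])
for key, value in flat.items():
    print(key, "=", value)
```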

Create the neural sparse model

POST /_plugins/_ml/model_groups/_register
{
	"name": "my_model_group",
	"description": "Models for search"
}

POST /_plugins/_ml/models/_register?deploy=true
{
	"name": "neural-sparse/opensearch-neural-sparse-encoding-v1",
	"version": "1.0.1",
	"model_group_id": "<model_group_id>",
	"description": "This is a neural sparse encoding model: It transfers text into sparse vector, and then extract nonzero index and value to entry and weights. It serves in both ingestion and search.",
	"model_format": "TORCH_SCRIPT",
	"function_name": "SPARSE_ENCODING",
	"model_content_size_in_bytes": 492184214,
	"model_content_hash_value": "d1ebaa26615090bdb0195a62b180afd2a8524c68c5d406a11ad787267f515ea8",
	"url": "https://artifacts.opensearch.org/models/ml-models/amazon/neural-sparse/opensearch-neural-sparse-encoding-v1/1.0.1/torch_script/neural-sparse_opensearch-neural-sparse-encoding-v1-1.0.1-torch_script.zip",
	"created_time": 1696913667239
}
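The `<model_group_id>` placeholder has to be replaced with the ID returned by the model group registration call. A minimal sketch, assuming that response has been parsed into a dict with a `model_group_id` field; the helper name `build_register_body` and the sample ID are hypothetical, and the other fields are copied from the request above.

```python
import json

MODEL_URL = (
    "https://artifacts.opensearch.org/models/ml-models/amazon/neural-sparse/"
    "opensearch-neural-sparse-encoding-v1/1.0.1/torch_script/"
    "neural-sparse_opensearch-neural-sparse-encoding-v1-1.0.1-torch_script.zip"
)


def build_register_body(group_response: dict) -> dict:
    """Build the model registration body, taking model_group_id from the
    (assumed) parsed response of POST /_plugins/_ml/model_groups/_register."""
    return {
        "name": "neural-sparse/opensearch-neural-sparse-encoding-v1",
        "version": "1.0.1",
        "model_group_id": group_response["model_group_id"],
        "description": "This is a neural sparse encoding model: It transfers text into sparse vector, and then extract nonzero index and value to entry and weights. It serves in both ingestion and search.",
        "model_format": "TORCH_SCRIPT",
        "function_name": "SPARSE_ENCODING",
        "model_content_size_in_bytes": 492184214,
        "model_content_hash_value": "d1ebaa26615090bdb0195a62b180afd2a8524c68c5d406a11ad787267f515ea8",
        "url": MODEL_URL,
        "created_time": 1696913667239,
    }


# Hypothetical response; a real cluster returns its own generated ID.
body = build_register_body({"model_group_id": "example-group-id"})
print(json.dumps(body, indent=2))
```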

Create the search pipeline with a neural query enricher

PUT /_search/pipeline/my_pipeline
{
	"request_processors": [
		{
			"neural_query_enricher": {
				"default_model_id": "<model_id>"
			}
		}
	]
}
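The pipeline body can also be generated once the model ID is known. A sketch with a hypothetical helper; besides `default_model_id`, the `neural_query_enricher` processor accepts a `neural_field_default_id` map for fields that should use a different model, shown here as an optional argument. The model IDs are placeholders.

```python
import json
from typing import Dict, Optional


def build_enricher_pipeline(default_model_id: str,
                            field_model_ids: Optional[Dict[str, str]] = None) -> dict:
    """Build a search pipeline body with a neural_query_enricher request processor."""
    enricher = {"default_model_id": default_model_id}
    if field_model_ids:
        # Optional per-field model IDs (neural_field_default_id), intended for
        # fields that should not use default_model_id.
        enricher["neural_field_default_id"] = dict(field_model_ids)
    return {"request_processors": [{"neural_query_enricher": enricher}]}


pipeline = build_enricher_pipeline("example-model-id")
print(json.dumps(pipeline))
```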

Update the index settings with the default pipeline

PUT /my_index/_settings
{
  "index.search.default_pipeline" : "my_pipeline"
}
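With `index.search.default_pipeline` set, searches that target `my_index` run `my_pipeline` without naming it. The search API also accepts a `search_pipeline` query parameter that selects a pipeline for a single request, which can help rule out pipeline resolution while debugging. A small sketch; `search_path` is a hypothetical helper.

```python
from typing import Optional
from urllib.parse import quote, urlencode


def search_path(index: str, pipeline: Optional[str] = None) -> str:
    """Build a _search path for one index, optionally naming a search pipeline."""
    path = f"/{quote(index)}/_search"
    if pipeline is not None:
        # search_pipeline selects a pipeline for this request only.
        path += "?" + urlencode({"search_pipeline": pipeline})
    return path


print(search_path("my_index"))
print(search_path("my_index", "my_pipeline"))
```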

Search the index without specifying model_id in the neural_sparse clause

POST /_search
{
	"query": {
		"function_score": {
			"query": {
				"bool": {
					"should": [
						{
							"nested": {
								"path": "chunks",
								"score_mode": "max",
								"query": {
									"bool": {
										"must": [
											{
												"match": {
													"chunks.chunked_content": {
														"query": "contract"
													}	
												}	
											},
											{
												"neural_sparse": {
													"chunks.chunked_content_embedding": {
														"query_text": "contract"
													}
												}
											}
										]
									}
								},
								"inner_hits": {
									"_source": [
										"chunks.chunk_id",
										"chunks.chunked_content"
									]
								}
							}
						}
					]
				}
			},
			"score_mode": "sum",
			"min_score": 0.0
		}
	},
	"_source": {
		"excludes": [
			"_index",
			"chunks.chunked_content_embedding",
			"chunks.chunked_content",
			"chunks.chunk_id"
		]
	},
	"size": 100,
	"explain": true
}

The request fails with:

{

	"error": {
		"root_cause": [
			{
				"type": "illegal_argument_exception",
				"reason": "query_text and model_id cannot be null"
			}
		],
		"type": "illegal_argument_exception",
		"reason": "query_text and model_id cannot be null"
	},
	"status": 400
}
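To see which compound queries sit between the root of the request and the clause that is missing its model, a short helper can list the key path to every neural_sparse clause that omits model_id. A sketch over a condensed version of the search body above; `find_unset_neural_sparse` is a hypothetical name.

```python
def find_unset_neural_sparse(node, path=()):
    """Yield the key path to each neural_sparse field clause that has no model_id."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "neural_sparse" and isinstance(value, dict):
                for field_name, params in value.items():
                    if isinstance(params, dict) and "model_id" not in params:
                        yield path + (key, field_name)
            yield from find_unset_neural_sparse(value, path + (key,))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from find_unset_neural_sparse(item, path + (index,))


# Condensed form of the failing search body (scoring options omitted).
search_body = {
    "query": {
        "function_score": {
            "query": {
                "bool": {
                    "should": [
                        {
                            "nested": {
                                "path": "chunks",
                                "query": {
                                    "bool": {
                                        "must": [
                                            {"match": {"chunks.chunked_content": {"query": "contract"}}},
                                            {"neural_sparse": {"chunks.chunked_content_embedding": {"query_text": "contract"}}},
                                        ]
                                    }
                                },
                            }
                        }
                    ]
                }
            }
        }
    }
}

paths = list(find_unset_neural_sparse(search_body))
for p in paths:
    print(" > ".join(str(part) for part in p))
```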

What is the expected behavior?

There is no error and the default_model_id configured on the search pipeline is used to embed the query.

What is your host/environment?

macOS Sonoma 14.3, Docker 4.28.0, OpenSearch 2.16.0

Do you have any screenshots?

N/A

Do you have any additional context?

This might be caused by this conditional, which does not take the default model ID into account.

yuye-aws (Member) commented

@zhichao-aws Do you know whether opensearch-project/OpenSearch#14739 can fix this?

zhichao-aws (Member) commented Aug 23, 2024

Hi @jdnvn, the root cause is the visitor logic for compound queries. We added that logic for the nested query in opensearch-project/OpenSearch#14739. However, the function_score query is also a compound query, and we haven't added the visitor logic to it yet: https://github.com/opensearch-project/OpenSearch/blob/main/server/src/main/java/org/opensearch/index/query/functionscore/FunctionScoreQueryBuilder.java
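A toy model of this explanation in Python. It loosely follows the visitor idea but is not OpenSearch's actual Java visitor API; all class and function names are hypothetical. The enricher can only fill in default_model_id for neural_sparse clauses it reaches, and it reaches them only through compound queries that pass the visitor on to their children.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class NeuralSparse:
    field_name: str
    query_text: str
    model_id: Optional[str] = None


@dataclass
class Bool:
    clauses: List[object] = field(default_factory=list)


@dataclass
class Nested:
    path: str
    query: object


@dataclass
class FunctionScore:
    query: object


def visitable_children(query) -> list:
    """Children the toy visitor can reach. FunctionScore is deliberately
    missing, mirroring the gap described in the comment above."""
    if isinstance(query, Bool):
        return query.clauses
    if isinstance(query, Nested):
        return [query.query]
    return []


def enrich(query, default_model_id: str) -> None:
    """Fill in model_id on every reachable neural_sparse clause that lacks one."""
    if isinstance(query, NeuralSparse) and query.model_id is None:
        query.model_id = default_model_id
    for child in visitable_children(query):
        enrich(child, default_model_id)


def check(clause: NeuralSparse) -> None:
    """Mimic the validation behind the error in the report."""
    if clause.query_text is None or clause.model_id is None:
        raise ValueError("query_text and model_id cannot be null")


# nested -> bool -> neural_sparse: reachable, so the default is applied.
reachable = NeuralSparse("chunks.chunked_content_embedding", "contract")
enrich(Nested("chunks", Bool([reachable])), "example-model-id")
check(reachable)

# function_score -> bool -> nested -> bool -> neural_sparse: never visited.
hidden = NeuralSparse("chunks.chunked_content_embedding", "contract")
enrich(FunctionScore(Bool([Nested("chunks", Bool([hidden]))])), "example-model-id")
try:
    check(hidden)
except ValueError as err:
    print(err)
```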

> This seems to only be for nested queries, I cannot reproduce it for non-nested queries.

Can you try this query (again without model_id)? It should also throw an exception:

{
	"query": {
		"function_score": {
			"query": {
				"neural_sparse": {
					"chunks.chunked_content_embedding": {
						"query_text": "contract"
					}
				}
			}
		}
	}
}
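Until function_score passes the visitor on to its inner query, one client-side workaround is to set model_id explicitly on every neural_sparse clause before sending the request, so the enricher's default is not needed. A sketch under that assumption; `inject_model_id` is a hypothetical helper and the model ID is a placeholder.

```python
import copy


def inject_model_id(body: dict, model_id: str) -> dict:
    """Return a deep copy of a search body in which every neural_sparse
    field clause without a model_id gets the given one."""
    result = copy.deepcopy(body)
    _fill(result, model_id)
    return result


def _fill(node, model_id: str) -> None:
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "neural_sparse" and isinstance(value, dict):
                for params in value.values():
                    if isinstance(params, dict):
                        # Leave explicitly set model IDs untouched.
                        params.setdefault("model_id", model_id)
            _fill(value, model_id)
    elif isinstance(node, list):
        for item in node:
            _fill(item, model_id)


query = {
    "query": {
        "function_score": {
            "query": {
                "neural_sparse": {
                    "chunks.chunked_content_embedding": {"query_text": "contract"}
                }
            }
        }
    }
}

patched = inject_model_id(query, "example-model-id")
print(patched)
```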

jdnvn (Author) commented Aug 25, 2024

@yuye-aws @zhichao-aws thank you both for the quick responses! This makes sense, and I did get an exception from that query. I opened an issue in OpenSearch core and a PR to fix it. I'm going to close this as a duplicate.

jdnvn closed this as completed Aug 25, 2024