PoC DO NOT MERGE - Semantic Query #11

@carlosdelest commented Nov 8, 2023

PoC for a semantic_query query builder:

GET test-semantic/_search
{
    "query": {
        "semantic_query": {
            "infer_field": {
                "query": "burger"
            }
        }
    }
}

This PR adds a new search phase that performs coordinator-level query rewriting.

A mapping from field names to model IDs is added to the coordinator rewrite context, so the query can run the actual inference while it is being rewritten.

You can follow this gist to test.
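
To make the rewrite flow concrete, here is a minimal, self-contained sketch of the idea: the context carries the field-to-model-ID mapping, and the semantic query uses it to replace itself with an executable query. All class names below are hypothetical stand-ins, not the Elasticsearch classes in this PR. The synchronous `BiFunction` inference call and the token-weight result shape are assumptions made only for illustration.

```java
import java.util.Map;
import java.util.Set;
import java.util.function.BiFunction;

// Hypothetical, simplified stand-ins for illustration; these are NOT the
// Elasticsearch classes touched by this PR.
final class RewriteContextSketch {
    private final Map<String, Set<String>> fieldToModelIds;
    // Assumed synchronous stand-in for inference: (modelId, queryText) -> token weights.
    private final BiFunction<String, String, Map<String, Float>> inference;

    RewriteContextSketch(
        Map<String, Set<String>> fieldToModelIds,
        BiFunction<String, String, Map<String, Float>> inference
    ) {
        this.fieldToModelIds = fieldToModelIds;
        this.inference = inference;
    }

    // Model IDs registered for a field, or an empty set when none are known.
    Set<String> modelsFor(String fieldName) {
        return fieldToModelIds.getOrDefault(fieldName, Set.of());
    }

    Map<String, Float> infer(String modelId, String queryText) {
        return inference.apply(modelId, queryText);
    }
}

interface QuerySketch {
    // Returns a rewritten query, or this when there is nothing to rewrite.
    QuerySketch rewrite(RewriteContextSketch context);
}

// The executable form: a field plus the weights produced by inference.
final class WeightedTokensQuerySketch implements QuerySketch {
    final String fieldName;
    final Map<String, Float> tokenWeights;

    WeightedTokensQuerySketch(String fieldName, Map<String, Float> tokenWeights) {
        this.fieldName = fieldName;
        this.tokenWeights = tokenWeights;
    }

    @Override
    public QuerySketch rewrite(RewriteContextSketch context) {
        return this; // already in its final form
    }
}

// The user-facing form: a field plus free text that still needs inference.
final class SemanticQuerySketch implements QuerySketch {
    final String fieldName;
    final String queryText;

    SemanticQuerySketch(String fieldName, String queryText) {
        this.fieldName = fieldName;
        this.queryText = queryText;
    }

    @Override
    public QuerySketch rewrite(RewriteContextSketch context) {
        Set<String> modelIds = context.modelsFor(fieldName);
        if (modelIds.isEmpty()) {
            return this; // no model known for this field; leave the query untouched
        }
        String modelId = modelIds.iterator().next();
        return new WeightedTokensQuerySketch(fieldName, context.infer(modelId, queryText));
    }
}
```

In this sketch, rewriting `new SemanticQuerySketch("infer_field", "burger")` against a context that maps `infer_field` to one model yields a `WeightedTokensQuerySketch`. The real inference call is asynchronous; the sketch collapses that into a direct function call to keep the example short.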

@@ -71,6 +71,8 @@ protected UpdateByQueryRequest buildRequest(RestRequest request, NamedWriteableR
consumers.put("script", o -> internal.setScript(Script.parse(o)));
consumers.put("max_docs", s -> setMaxDocsValidateIdentical(internal, ((Number) s).intValue()));

// TODO There surely must be a better way of doing this
request.params().put("_source_includes", "*");

Carried over the source exclusion workaround from the demo. Hacky, but it gets the job done for now.

* sort them according to the provided order. This can be useful for instance to ensure that shards that contain recent
* data are executed first when sorting by descending timestamp.
*/
final class CoordinatorQueryRewriteSearchPhase extends SearchPhase {

New search phase for rewriting queries in the coordinator node

ThreadPool threadPool,
SearchResponse.Clusters clusters
) {
if (preFilter) {
if (runCoordinatorPhase) {

The new phase is added as a pre-step that runs before the other search phases.

this.fieldNamesToInferenceModel = Map.of();
}

public CoordinatorRewriteContext(

A new constructor, used for rewriting in the coordinator search phase.

@@ -63,4 +66,9 @@ public CoordinatorRewriteContext getCoordinatorRewriteContext(Index index) {

return new CoordinatorRewriteContext(parserConfig, client, nowInMillis, timestampRange, dateFieldType);
}

@Nullable
public CoordinatorRewriteContext getCoordinatorRewriteContextForModels(Map<String, Set<String>> fieldToModelIds) {

New method for obtaining the rewrite context for the new search phase

if (source == null) {
return false;
}
return source.subSearches().stream().anyMatch(sqwb -> sqwb.getQueryBuilder() instanceof CoordinatorRewriteableQueryBuilder);

I've used a marker interface for this; we can change that later.
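
For readers unfamiliar with the pattern, here is a minimal sketch of a marker-interface check like the one in the snippet above. The type names are hypothetical stand-ins rather than the Elasticsearch types, and making the marker extend the base query interface is an assumption.

```java
import java.util.List;

// Hypothetical base type standing in for a query builder.
interface MarkerQuery {}

// Marker interface: declares no methods, it only tags queries that need
// rewriting on the coordinator before the search is sent to the shards.
interface CoordinatorRewriteableMarker extends MarkerQuery {}

// An ordinary query that needs no coordinator rewrite.
final class PlainQueryExample implements MarkerQuery {}

// A query tagged as needing a coordinator rewrite.
final class SemanticQueryExample implements CoordinatorRewriteableMarker {}

final class MarkerCheckSketch {
    // True when at least one top-level query carries the marker.
    static boolean needsCoordinatorRewrite(List<MarkerQuery> topLevelQueries) {
        if (topLevelQueries == null) {
            return false;
        }
        return topLevelQueries.stream().anyMatch(q -> q instanceof CoordinatorRewriteableMarker);
    }
}
```

Like the `instanceof` check in the snippet above, this only inspects the top-level query of each sub-search. A tagged query nested inside a compound query would not be detected.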

@@ -124,7 +125,8 @@ public static FetchSourceContext parseFromRestRequest(RestRequest request) {
if (fetchSource != null || sourceIncludes != null || sourceExcludes != null) {
return FetchSourceContext.of(fetchSource == null || fetchSource, sourceIncludes, sourceExcludes);
}
- return null;
+
+ return FetchSourceContext.of(true, null, new String[]{"*." + SemanticTextFieldMapper.SPARSE_VECTOR_SUBFIELD_NAME});

Hacky source exclusion

return inferenceResultsToQuery(fieldName, inferenceResultsSupplier.get());
}

Set<String> modelNames = coordinatorRewriteContext.inferenceModelsForFieldName(fieldName);

Retrieves model ids for the field and performs some validations.
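
As an illustration of what such a lookup might look like, here is a small sketch. The helper name and the two specific checks (no model registered for the field, more than one model registered) are assumptions; the validations actually performed in the PR are not shown in this excerpt.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of the PR: resolves the single model ID to
// use for a field, failing fast when the mapping is missing or ambiguous.
final class ModelLookupSketch {
    static String resolveSingleModel(Map<String, Set<String>> fieldToModelIds, String fieldName) {
        Set<String> modelIds = fieldToModelIds.get(fieldName);
        // Assumed check 1: the field must have an inference model.
        if (modelIds == null || modelIds.isEmpty()) {
            throw new IllegalArgumentException("Field [" + fieldName + "] has no inference model");
        }
        // Assumed check 2: the field must not map to several models.
        if (modelIds.size() > 1) {
            throw new IllegalArgumentException(
                "Field [" + fieldName + "] maps to multiple inference models: " + modelIds
            );
        }
        return modelIds.iterator().next();
    }
}
```

Failing at rewrite time on the coordinator means a bad field mapping is reported once, before any request reaches the shards.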

@carlosdelest changed the title from "Carlosdelest/semantic query" to "Semantic Query" on Nov 8, 2023
@carlosdelest changed the title from "Semantic Query" to "PoC DO NOT MERGE - Semantic Query" on Nov 13, 2023