PoC DO NOT MERGE - Store semantic_text mapping info #9
…mantic-text-mapping-info # Conflicts: # server/src/main/java/org/elasticsearch/TransportVersions.java # server/src/main/java/org/elasticsearch/node/Node.java
public SemanticTextFieldMapper build(MapperBuilderContext context) {
    String fullName = context.buildFullName(name);
    String subfieldName = fullName + "." + SPARSE_VECTOR_SUBFIELD_NAME;
    SparseVectorFieldMapper sparseVectorFieldMapper = new SparseVectorFieldMapper.Builder(subfieldName).build(context);
    return new SemanticTextFieldMapper(
        name(),
        new SemanticTextFieldType(name(), modelId.getValue(), meta.getValue()),
        modelId.getValue(),
        sparseVectorFieldMapper,
        copyTo,
        this
    );
}
I am not 100% sure about this. I am thinking there is a top level field like _inference_results maybe?
It gets really tricky to dynamically default to not including fields in the results.
How we store these things will likely be dictated by how we default to excluding them from _source in search requests, while still allowing users to request them explicitly (and allowing them to be indexed via reindex).
Take a look at MetadataFieldMapper for some inspiration.
Got it - so the idea would be, instead of nesting the subfields into the actual semantic_text field, to use a top level field that would nest all inference results. The _inference_results field would be populated by the ingestion process.
So you're suggesting we create a new MetadataFieldMapper (or similar) that handles all the information passed in _source under _inference_results and creates the appropriate Lucene fields for storing it.
I'll give it a go as a separate PoC to check, thanks for the pointers!
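To make the proposal in this thread concrete, here is a minimal sketch of the document shape being discussed: inference output grouped under a single top level _inference_results key, dropped from the returned source by default and kept only on explicit request. This is not Elasticsearch code. Plain maps stand in for _source, and the class InferenceResultsSketch and the methods attachInferenceResults and filterSource are hypothetical names made up for illustration; a real implementation would presumably live in a MetadataFieldMapper as suggested above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, not Elasticsearch code: plain maps stand in for _source.
class InferenceResultsSketch {
    // Top level key proposed in the review comment.
    static final String INFERENCE_RESULTS_FIELD = "_inference_results";

    // Returns a copy of the source with all inference output grouped under one top level key.
    static Map<String, Object> attachInferenceResults(Map<String, Object> source, Map<String, Object> resultsByField) {
        Map<String, Object> doc = new LinkedHashMap<>(source);
        doc.put(INFERENCE_RESULTS_FIELD, new LinkedHashMap<>(resultsByField));
        return doc;
    }

    // Drops the top level key unless the caller explicitly asks for it.
    static Map<String, Object> filterSource(Map<String, Object> doc, boolean includeInferenceResults) {
        Map<String, Object> filtered = new LinkedHashMap<>(doc);
        if (!includeInferenceResults) {
            filtered.remove(INFERENCE_RESULTS_FIELD);
        }
        return filtered;
    }

    public static void main(String[] args) {
        Map<String, Object> source = new LinkedHashMap<>();
        source.put("title", "some text");

        // Made-up token weights standing in for a sparse vector.
        Map<String, Object> results = new LinkedHashMap<>();
        results.put("title", Map.of("token_a", 0.5f));

        Map<String, Object> doc = attachInferenceResults(source, results);
        System.out.println(filterSource(doc, false)); // default search response
        System.out.println(filterSource(doc, true));  // explicit request, e.g. for reindex
    }
}
```

Keeping everything under one key means the default exclusion is a single removal rather than a per-field decision, which is the difficulty the review comment raises.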
semantic_text mapping information is added to the MappingLookup structure, so it can be retrieved from the Field Inference service.
Some fixes were made to both the semantic_text field type and the field inference service so that they are compatible with multiple inference fields in the same doc.
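As a rough illustration of why the mapping information needs to be retrievable, the sketch below keeps a field name to model id table and groups the inference fields present in one document by the model that should process them. It is an assumption about the general shape of such a lookup, not the PR's actual MappingLookup changes: the class SemanticTextLookupSketch, its methods, and the model ids "model-a" and "model-b" are all hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not the PR's MappingLookup code.
class SemanticTextLookupSketch {
    // Field name -> model id, as declared in the (hypothetical) index mapping.
    private final Map<String, String> modelIdByField = new LinkedHashMap<>();

    void register(String fieldName, String modelId) {
        modelIdByField.put(fieldName, modelId);
    }

    // Groups the registered inference fields present in a document by model id,
    // so several semantic_text fields in the same doc can be handled together.
    Map<String, List<String>> fieldsByModel(Map<String, Object> doc) {
        Map<String, List<String>> grouped = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : modelIdByField.entrySet()) {
            if (doc.containsKey(entry.getKey())) {
                grouped.computeIfAbsent(entry.getValue(), k -> new ArrayList<>()).add(entry.getKey());
            }
        }
        return grouped;
    }

    public static void main(String[] args) {
        SemanticTextLookupSketch lookup = new SemanticTextLookupSketch();
        lookup.register("title", "model-a");
        lookup.register("body", "model-a");
        lookup.register("summary", "model-b");

        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("title", "some text");
        doc.put("body", "some longer text");

        System.out.println(lookup.fieldsByModel(doc));
    }
}
```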
Code for testing:
Deploy ELSERv2 model:
Create an index mapping with the real model id used:
Ingest a document:
The inference process uses the model_id specified in the mapping, and produces the following doc: