Add support for Weaviate #180

Draft
wants to merge 12 commits into
base: main

Conversation

hsm207

@hsm207 hsm207 commented Jun 12, 2024

Signed-off-by: hsm207 [email protected]


def enable_ltr(self, collection):
    "Initializes LTR dependencies for a given collection"
    raise NotImplementedError("¯\\_(ツ)_/¯")
Owner

This method may go away (TBD). It's currently just a hook that tells the engine to adjust its configuration, if needed, to support LTR on the given collection. In the case of Weaviate, I assume the LTR model will be running and invoked outside the engine, so you'll either have it always running (in which case enable_ltr can be a no-op) or use enable_ltr to copy any data/models/config needed into place.
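
A minimal sketch of those two options, not the actual implementation: the class name `WeaviateEngineSketch`, the `ltr_runs_externally` flag, and the `prepared_collections` list are all hypothetical and only stand in for whatever the real engine class ends up doing.

```python
class WeaviateEngineSketch:
    """Hypothetical stand-in for the engine class in this PR."""

    def __init__(self, ltr_runs_externally=True):
        # Assumption: a flag saying whether the LTR model lives outside the engine.
        self.ltr_runs_externally = ltr_runs_externally
        # Hypothetical record of collections that had LTR assets put in place.
        self.prepared_collections = []

    def enable_ltr(self, collection):
        "Initializes LTR dependencies for a given collection"
        if self.ltr_runs_externally:
            # Option 1: the model is always running elsewhere, so this is a no-op.
            return
        # Option 2: copy data/models/config into place (simulated here by
        # recording the collection; the real copy step is engine-specific).
        self.prepared_collections.append(collection)
```

With the default flag, `WeaviateEngineSketch().enable_ltr("products")` simply returns without raising, which is enough for callers that invoke the hook unconditionally.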

@treygrainger
Owner

@hsm207 - Awesome to see you working on this. If you have any questions, don't hesitate to reach out.

After doing several other implementations, here's a checklist of the key things you'll come across:

- Dockerfile / Docker Compose configuration
- Install Spark connector (inside the aips-notebooks Dockerfile)
- Collection management: creation/deletion/healthcheck
- Collection schemas:
  - Primitive field types: text, string, keyword, boolean, integer, double
  - location coordinate field
  - dense vector field: dimensions (512, 768), vector encoding/quantization (1 bit, 32 bits), and dot_product similarity
  - tokenizers/filters: comma delimited, lower case, whitespace/punctuation, NGram, delimited payload

- Query functionality:
  - sorting, filtering, limit, query fields, return fields
  - multi-field search
  - AND/OR/NOT operators
  - minimum phrase matching
  - query time boosting
  - index time boosting
  - vector search
  - reranking by query
  - highlighting
  - debug/explain
  - spell check/autocomplete
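
For the primitive field types in the checklist, one possible starting point is a plain lookup table. This is only a sketch under assumptions: the left-hand names are the generic types listed above, the right-hand names are Weaviate property data types as I understand them (worth verifying against the Weaviate docs), and `to_weaviate_data_type` is a hypothetical helper, not part of this repo or of the Weaviate client.

```python
# Assumed mapping from the checklist's generic field types to Weaviate
# property data types. Verify the right-hand values against the Weaviate docs.
GENERIC_TO_WEAVIATE = {
    "text": "text",
    "string": "text",      # assumption: distinguished from "text" only by tokenization
    "keyword": "text",     # assumption: same data type, different tokenization
    "boolean": "boolean",
    "integer": "int",
    "double": "number",
    "location": "geoCoordinates",
}


def to_weaviate_data_type(field_type):
    """Hypothetical helper: look up the Weaviate data type for a generic type."""
    try:
        return GENERIC_TO_WEAVIATE[field_type]
    except KeyError:
        # Fail loudly so unsupported types are caught at schema-creation time.
        raise ValueError(f"Unsupported field type: {field_type!r}") from None
```

Dense vector fields and tokenizer settings are left out on purpose, since they are configured differently from scalar properties.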

A few other features, such as hybrid search (reciprocal rank fusion), are already implemented generically at the Collection level, but you can override them in WeaviateCollection to push them down into the engine, since Weaviate supports hybrid search natively.
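
For reference, reciprocal rank fusion itself is small. The sketch below is a generic illustration, not the repo's Collection-level code: the function name, the `k=60` default (a commonly used constant), and the tie-breaking by document id are assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc ids into one list of (doc_id, score).

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    where rank starts at 1 for the top result of each list.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Usage: fuse a lexical ranking with a vector ranking.
lexical_results = ["a", "b", "c"]
vector_results = ["b", "c", "d"]
fused = reciprocal_rank_fusion([lexical_results, vector_results])
print([doc_id for doc_id, _ in fused])  # ['b', 'c', 'a', 'd']
```

Documents that appear in both lists ("b" and "c") rise to the top, which is the behavior an engine-native hybrid query would be expected to reproduce.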

As mentioned in /engines/README.md, the LTR implementation is required but can be done outside the engine. Happy to chat with you about this if you need a generic implementation. The SparseLexicalSemanticSearch implementation is likewise required, but it just involves crafting some very specific Weaviate query syntax for a handful of query patterns (popularity boosting, geo radius filtering, etc.). I wouldn't worry about the EntityExtractor or the SemanticKnowledgeGraph; most engines don't have these built in, so you just treat them as external library calls.

At any rate, hope that's helpful. Let us know if you have any questions we can assist with!
