How do we request similarity search? API? What does the API look like? #28

Open
1 of 2 tasks
Tracked by #26
kikuomax opened this issue Oct 9, 2023 · 3 comments
Member

kikuomax commented Oct 9, 2023

Should the API simply receive a query text, or should it receive an embedding vector?

kikuomax added this to the Sprint 6 milestone Oct 9, 2023
kikuomax self-assigned this Oct 9, 2023
Member Author

kikuomax commented Oct 9, 2023

I think it is OK to receive a query text. OpenAI's embedding vector has 1,536 components at 4 bytes each, so it needs at least 1,536 × 4 bytes ≈ 6 KB in binary form, which is far larger than a typical query text.
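
The size estimate above can be checked with a short sketch. It assumes each vector component is stored as a 32-bit float; the function names `embeddingPayloadBytes` and `queryTextBytes` and the sample query are hypothetical and only serve the comparison.

```typescript
// Number of components in the embedding vector, as stated in the comment above.
const EMBEDDING_DIMENSIONS = 1536;

// Size in bytes of the raw binary vector, assuming 32-bit float components.
function embeddingPayloadBytes(dimensions: number): number {
  return dimensions * Float32Array.BYTES_PER_ELEMENT;
}

// Size in bytes of a query text encoded as UTF-8.
function queryTextBytes(text: string): number {
  return new TextEncoder().encode(text).length;
}

// Hypothetical sample query, only for comparison.
const sampleQuery = 'how do we request similarity search?';

const vectorBytes = embeddingPayloadBytes(EMBEDDING_DIMENSIONS); // 6144 bytes = 6 KiB
const textBytes = queryTextBytes(sampleQuery);

console.log(`embedding: ${vectorBytes} bytes, query text: ${textBytes} bytes`);
```

A text encoding of the vector such as JSON would be larger still, so sending the query text keeps the request small.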

Member Author

kikuomax commented Oct 9, 2023

To prevent abuse, we do not want to expose an API that calculates embeddings.

Member Author

kikuomax commented Oct 12, 2023

The core of the similarity search will be a Lambda function implemented with flechasdb + flechasdb-s3.

The viewer may directly invoke the Lambda function.
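
A minimal sketch of what a query-text-based call from the viewer could look like. Everything here is an assumption for illustration: the payload fields `queryText` and `limit`, the result shape, the default limit of 10, and the names `buildSearchPayload` and `searchSimilar` are hypothetical and not taken from the actual code. The Lambda invocation is abstracted as an injected function, so the sketch does not depend on any particular SDK call.

```typescript
// Hypothetical request: the caller sends a query text, never an embedding.
interface SearchSimilarRequest {
  queryText: string;
  limit?: number; // maximum number of results (assumed parameter)
}

// Hypothetical result item returned by the search function.
interface SimilarItem {
  id: string;
  distance: number;
}

// Abstraction over "invoke the Lambda function with a JSON payload".
// A real implementation would wrap an SDK client; that part is omitted here.
type InvokeSearchLambda = (payloadJson: string) => Promise<string>;

// Validates the request and serializes it to the JSON payload.
function buildSearchPayload(request: SearchSimilarRequest): string {
  const queryText = request.queryText.trim();
  if (queryText.length === 0) {
    throw new Error('queryText must not be empty');
  }
  return JSON.stringify({ queryText, limit: request.limit ?? 10 });
}

// Sends the query text and parses the JSON response.
async function searchSimilar(
  invoke: InvokeSearchLambda,
  request: SearchSimilarRequest,
): Promise<SimilarItem[]> {
  const responseJson = await invoke(buildSearchPayload(request));
  return JSON.parse(responseJson) as SimilarItem[];
}
```

Because the embedding is calculated behind the function, callers never get a general-purpose embedding endpoint, which matches the concern about abuse raised earlier in the thread.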

kikuomax moved this to In Progress in FlechasDB Alpha Oct 12, 2023
kikuomax added a commit to kikuomax/mumble that referenced this issue Oct 15, 2023
- Introduces a new submodule `utils/search` that provides the search
  features. So far, it provides a function `searchSimilarMumblings` that
  performs similarity search on mumblings.

issue codemonger-io#28
kikuomax added a commit to kikuomax/mumble that referenced this issue Oct 15, 2023
- Introduces a new CDK construct `Indexer` that will provision resources
  necessary for indexing and search of data. So far, it provides a
  Lambda function `SearchSimilarLambda` that performs similarity search
  over mumblings. The function is implemented in Rust and located in
  `lambda/indexer` as `search-similar` binary.
  `Indexer` also provisions an S3 bucket to store database files.

issue codemonger-io#28
kikuomax added a commit to kikuomax/mumble that referenced this issue Oct 15, 2023
- `CdkStack` provisions `Indexer` and links `Viewer` and `Indexer`.

issue codemonger-io#28
kikuomax added a commit that referenced this issue Oct 17, 2023
- Introduces a new submodule `utils/search` that provides the search
  features. So far, it provides a function `searchSimilarMumblings` that
  performs similarity search on mumblings.

issue #28
kikuomax added a commit that referenced this issue Oct 17, 2023
- Introduces a new CDK construct `Indexer` that will provision resources
  necessary for indexing and search of data. So far, it provides a
  Lambda function `SearchSimilarLambda` that performs similarity search
  over mumblings. The function is implemented in Rust and located in
  `lambda/indexer` as `search-similar` binary.
  `Indexer` also provisions an S3 bucket to store database files.

issue #28
kikuomax added a commit that referenced this issue Oct 17, 2023
- `CdkStack` provisions `Indexer` and links `Viewer` and `Indexer`.

issue #28