Commit

Merge pull request monarch-initiative#70 from iQuxLE/improve_batching_embed_duckdbadapter

improve batching and minimise API requests when embedding docs in `duckdb_adapter`
caufieldjh authored Aug 29, 2024
2 parents 4475f64 + 3fef41e commit b42111e
Showing 1 changed file with 2 additions and 1 deletion.
src/curate_gpt/store/duckdb_adapter.py: 2 additions & 1 deletion
@@ -387,7 +387,8 @@ def _process_objects(
texts = [tokenizer.decode(tokens) for tokens in current_batch]
short_name, _ = MODEL_MAP[openai_model]
embedding_model = llm.get_embedding_model(short_name)
embeddings = list(embedding_model.embed_multi(texts))
logger.info(f"Number of texts/docs to embed in batch: {len(texts)}")
embeddings = list(embedding_model.embed_multi(texts, len(texts)))
logger.info(f"Number of Documents in batch: {len(embeddings)}")
batch_embeddings.extend(embeddings)
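
The change passes an explicit batch size to `embed_multi`, so the whole token-limited batch is embedded in a single call rather than being split by the embedding model's default chunking, which is what minimises the number of API requests. Below is a minimal standalone sketch of the same pattern, assuming the `llm` Python library with an OpenAI embeddings plugin and that "ada-002" is a registered embedding model name (the actual adapter resolves the short name through its MODEL_MAP, and `docs` here is hypothetical input):

import llm

# Hypothetical documents; in duckdb_adapter these come from the
# tokenizer-based batching performed just above the embedding call.
docs = ["first document", "second document", "third document"]

# Resolve an embedding model by name via the llm registry.
embedding_model = llm.get_embedding_model("ada-002")

# batch_size=len(docs) asks embed_multi to send all texts in a single
# request for this batch instead of splitting it into smaller sub-batches.
embeddings = list(embedding_model.embed_multi(docs, batch_size=len(docs)))

print(f"Embedded {len(embeddings)} documents")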

