Releases: dsg-polymtl/flockmtl-duckdb
FlockMTL v0.1.0 "Schwartz's Deli"
FlockMTL v0.1.0 is a DuckDB extension that integrates language model (LLM) capabilities directly into your queries and workflows. This experimental extension enables DuckDB users to add semantic analysis (classification, filtering, completion, all w/ structured output) and embeddings using GPT models—all from within SQL commands. Following the tradition of declarativity, we introduce an administrative view of MODEL
(s) and PROMPT
(s) akin to TABLE
(s).
Key Features:
- LLM Integration: Seamlessly call OpenAI models, such as GPT-4, directly from SQL with different function signatures to obtain structured output, VARCHAR completions, boolean filters.
- Prompt Definition: Define and store custom prompts for repeated use across queries. Similarly, fix the query once with a prompt name and iterate on the prompt separately or even change it later.
- Model Definition: Define and switch between different models by giving them names and updating the models name without a change to the query.
The full documentation can be found in our repo's README file.
Next, we demonstrate how to use the DuckDB LLM Extension and using product reviews analysis as an examples using a product_reviews
table with attributes review_text
, review_id
, and customer_name
. You can also manage LLM prompts and models but we do not show those examples here.
1. Text Generation with llm_complete
llm_complete
generates text based on a given prompt and LLM model. The following examples show how this functionality can be used in real-world scenarios.
- Basic Text Generation – Product Description:
Imagine you have a list of product names, and you want to automatically generate product descriptions based on the product name.SELECT product_name, llm_complete('generate_product_description', 'default', {'product_name': product_name}, {'max_tokens': 150}) AS product_description FROM products;
2. JSON Output with llm_complete_json
llm_complete_json
is useful when you want structured output, such as detailed classification or multi-part answers.
- Example: Sentiment and Detailed Analysis:
Classify whether a review is positive or negative and provide a detailed analysis of the sentiment in JSON format.Result:SELECT review_id, llm_complete_json('detailed_sentiment_analysis', 'default', {'text': review_text}, {'max_tokens': 200}) AS analysis FROM product_reviews;
review_id analysis 112233 {"sentiment": "negative", "confidence": 0.85, "keywords": ["bad service", "slow response"]}
3. Filtering with llm_filter
Use llm_filter
to filter records based on LLM-powered classifications, such as identifying specific sentiments or themes in product reviews.
- Example: Filter Reviews Mentioning Specific Features:
Filter reviews to return only those that mention specific product features, such as "battery life" or "performance."Result:SELECT review_id, customer_name, review_text FROM product_reviews WHERE llm_filter('mentions_feature', 'default', {'text': review_text, 'feature': 'battery life'});
review_id customer_name review_text 56789 John Doe "The battery life on this phone is fantastic! It lasted me a full two days without needing a recharge."
4. Text Embedding with llm_embedding
llm_embedding
generates vector embeddings for text, which can be used for tasks like semantic similarity, clustering, or advanced search.
- Example: Find Reviews Similar to a Target Review:
Generate embeddings for each review and compare them to a target review to find similar reviews based on their semantic content.Result:WITH input_embedding AS ( SELECT llm_embedding({'text': 'Comfortable to wear, but the sound quality is just average. Expected better for the price.'}, 'text-embedding-3-small') AS embedding ), review_embeddings AS ( SELECT review_id, array_distance(input_embedding.embedding::DOUBLE[1536], llm_embedding({'text': review_text}, 'text-embedding-3-small')::DOUBLE[1536]) AS similarity FROM product_reviews, input_embedding ) SELECT review_id AS similar_review_id FROM review_embeddings ORDER BY similarity LIMIT 1;
similar_review_id 4