Releases · dsg-polymtl/flockmtl-duckdb

FlockMTL v0.1.0 is a DuckDB extension that integrates language model (LLM) capabilities directly into your queries and workflows. This experimental extension enables DuckDB users to add semantic analysis (classification, filtering, completion, all w/ structured output) and embeddings using GPT models—all from within SQL commands. Following the tradition of declarativity, we introduce an administrative view of MODEL(s) and PROMPT(s) akin to TABLE(s).

Key Features:

LLM Integration: Seamlessly call OpenAI models, such as GPT-4, directly from SQL with different function signatures to obtain structured output, VARCHAR completions, boolean filters.
Prompt Definition: Define and store custom prompts for repeated use across queries. Similarly, fix the query once with a prompt name and iterate on the prompt separately or even change it later.
Model Definition: Define and switch between different models by giving them names and updating the models name without a change to the query.

The full documentation can be found in our repo's README file.

Next, we demonstrate how to use the DuckDB LLM Extension and using product reviews analysis as an examples using a product_reviews table with attributes review_text, review_id, and customer_name. You can also manage LLM prompts and models but we do not show those examples here.

1. Text Generation with `llm_complete`

llm_complete generates text based on a given prompt and LLM model. The following examples show how this functionality can be used in real-world scenarios.

Basic Text Generation – Product Description:
Imagine you have a list of product names, and you want to automatically generate product descriptions based on the product name.

SELECT product_name, llm_complete('generate_product_description', 'default', {'product_name': product_name}, {'max_tokens': 150}) AS product_description 
FROM products;

2. JSON Output with `llm_complete_json`

llm_complete_json is useful when you want structured output, such as detailed classification or multi-part answers.

Example: Sentiment and Detailed Analysis:
Classify whether a review is positive or negative and provide a detailed analysis of the sentiment in JSON format.
```
SELECT review_id, llm_complete_json('detailed_sentiment_analysis', 'default', {'text': review_text}, {'max_tokens': 200}) AS analysis 
FROM product_reviews;
```
Result:

review_id analysis

112233 {"sentiment": "negative", "confidence": 0.85, "keywords": ["bad service", "slow response"]}

review_id	analysis
112233	{"sentiment": "negative", "confidence": 0.85, "keywords": ["bad service", "slow response"]}

3. Filtering with `llm_filter`

Use llm_filter to filter records based on LLM-powered classifications, such as identifying specific sentiments or themes in product reviews.

Example: Filter Reviews Mentioning Specific Features:
Filter reviews to return only those that mention specific product features, such as "battery life" or "performance."
```
SELECT review_id, customer_name, review_text 
FROM product_reviews 
WHERE llm_filter('mentions_feature', 'default', {'text': review_text, 'feature': 'battery life'});
```
Result:

review_id customer_name review_text

56789 John Doe "The battery life on this phone is fantastic! It lasted me a full two days without needing a recharge."

review_id	customer_name	review_text
56789	John Doe	"The battery life on this phone is fantastic! It lasted me a full two days without needing a recharge."

4. Text Embedding with `llm_embedding`

llm_embedding generates vector embeddings for text, which can be used for tasks like semantic similarity, clustering, or advanced search.

Example: Find Reviews Similar to a Target Review:
Generate embeddings for each review and compare them to a target review to find similar reviews based on their semantic content.

WITH input_embedding AS (
    SELECT llm_embedding({'text': 'Comfortable to wear, but the sound quality is just average. Expected better for the price.'},
             'text-embedding-3-small') AS embedding
),
review_embeddings AS (
    SELECT 
        review_id, 
        array_distance(input_embedding.embedding::DOUBLE[1536], 
                       llm_embedding({'text': review_text}, 'text-embedding-3-small')::DOUBLE[1536]) AS similarity
    FROM product_reviews, input_embedding
)
SELECT review_id AS similar_review_id
FROM review_embeddings
ORDER BY similarity
LIMIT 1;

Result:

similar_review_id
4

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

1. Text Generation with `llm_complete`

2. JSON Output with `llm_complete_json`

3. Filtering with `llm_filter`

4. Text Embedding with `llm_embedding`

Releases: dsg-polymtl/flockmtl-duckdb

FlockMTL v0.1.0 "Schwartz's Deli"

1. Text Generation with llm_complete

2. JSON Output with llm_complete_json

3. Filtering with llm_filter

4. Text Embedding with llm_embedding

1. Text Generation with `llm_complete`

2. JSON Output with `llm_complete_json`

3. Filtering with `llm_filter`

4. Text Embedding with `llm_embedding`