# FlockMTL Extension

FlockMTL is a DuckDB extension that integrates large language model (LLM) capabilities directly into your queries and workflows. This experimental extension enables DuckDB users to add semantic analysis (classification, filtering, completion, all with structured output) and embeddings using GPT models, all from within SQL commands. Following the tradition of declarativity, we introduce an administrative view of `MODEL`(s) and `PROMPT`(s) akin to `TABLE`(s).
While the extension will load successfully without it, the `OPENAI_API_KEY` environment variable must be set in your environment for requests to succeed. If it is not found, calling any of our functions will throw an error explaining how to obtain an API key.

## Installation

Install the extension as a Community Extension by running:

```sql
INSTALL flockmtl FROM community;
LOAD flockmtl;
```

## Documentation

Next, we demonstrate how to use the FlockMTL extension with a running example: analyzing product reviews. Specifically, we'll focus on generating text, classifying reviews as positive or negative, and working with text embeddings, using a `product_reviews` table with the attributes `review_text`, `review_id`, and `customer_name`.

### 1. Text Generation with `llm_complete`

`llm_complete` generates text based on a given prompt and LLM model. The following examples show how this functionality can be used in real-world scenarios.

- **Basic Text Generation – Product Description:**
  Imagine you have a list of product names, and you want to automatically generate product descriptions based on the product name.

  ```sql
  SELECT product_name,
         llm_complete('generate_product_description', 'default',
                      {'product_name': product_name},
                      {'max_tokens': 150}) AS product_description
  FROM products;
  ```

  Result:

  | product_name  | product_description |
  |---------------|---------------------|
  | Running Shoes | "These lightweight running shoes offer superior comfort and breathability for long-distance runners. Featuring a cushioned sole and durable upper, they ensure maximum performance on various terrains." |

- **Example: Sentiment Classification – Complex Review:**
  Classify whether a review is positive based on more than just the review text, such as review length and other metadata. This could help you get a more nuanced classification.

  ```sql
  SELECT review_id,
         llm_complete('is_positive', 'default',
                      {'text': review_text, 'length': LENGTH(review_text)},
                      {'max_tokens': 100}) AS results
  FROM product_reviews;
  ```

  Result:

  | review_id | results |
  |-----------|---------|
  | 12345     | "Positive - Great in-depth feedback, long review indicating high engagement." |

- **Example: Generate Responses to Negative Reviews:**
  Automatically generate personalized responses to negative reviews, allowing your team to respond quickly and professionally.

  ```sql
  SELECT review_id,
         llm_complete('generate_response', 'default',
                      {'review': review_text, 'sentiment': 'negative'},
                      {'max_tokens': 100}) AS response
  FROM product_reviews
  WHERE llm_filter('is_negative', 'default', {'text': review_text});
  ```

  Result:

  | review_id | response |
  |-----------|----------|
  | 54321     | "We're sorry to hear about your experience. We value your feedback and would like to offer a solution. Please contact our support team for further assistance." |

### 2. JSON Output with `llm_complete_json`

`llm_complete_json` is useful when you want structured output, such as detailed classification or multi-part answers.

- **Example: Sentiment and Detailed Analysis:**
  Classify whether a review is positive or negative and provide a detailed analysis of the sentiment in JSON format.

  ```sql
  SELECT review_id,
         llm_complete_json('detailed_sentiment_analysis', 'default',
                           {'text': review_text},
                           {'max_tokens': 200}) AS analysis
  FROM product_reviews;
  ```

  Result:

  | review_id | analysis |
  |-----------|----------|
  | 112233    | {"sentiment": "negative", "confidence": 0.85, "keywords": ["bad service", "slow response"]} |

- **Example: Review Summary in JSON Format:**
  Summarize a review and return a JSON output with sentiment, summary, and key points.

  ```sql
  SELECT review_id,
         llm_complete_json('summarize_review', 'default',
                           {'text': review_text},
                           {'max_tokens': 150}) AS summary
  FROM product_reviews;
  ```

  Result:

  | review_id | summary |
  |-----------|---------|
  | 332211    | {"summary": "The customer experienced a slow response time from support but was satisfied with the product quality overall.", "sentiment": "mixed", "key_points": ["slow response", "good product quality"]} |

### 3. Filtering with `llm_filter`

Use `llm_filter` to filter records based on LLM-powered classifications, such as identifying specific sentiments or themes in product reviews.

- **Example: Filter Reviews Mentioning Specific Features:**
  Filter reviews to return only those that mention specific product features, such as "battery life" or "performance."

  ```sql
  SELECT review_id, customer_name, review_text
  FROM product_reviews
  WHERE llm_filter('mentions_feature', 'default',
                   {'text': review_text, 'feature': 'battery life'});
  ```

  Result:

  | review_id | customer_name | review_text |
  |-----------|---------------|-------------|
  | 56789     | John Doe      | "The battery life on this phone is fantastic! It lasted me a full two days without needing a recharge." |

- **Example: Filter Reviews with High Confidence in Sentiment:**
  Filter reviews based on the confidence level of the sentiment classification, showing only those with high confidence in the result.

  ```sql
  SELECT review_id, customer_name, review_text
  FROM product_reviews
  WHERE llm_filter('is_positive', 'default',
                   {'text': review_text, 'confidence_threshold': 0.9});
  ```

  Result:

  | review_id | customer_name | review_text |
  |-----------|---------------|-------------|
  | 99887     | Jane Smith    | "Absolutely loved this product! Exceeded all my expectations." |

### 4. Text Embedding with `llm_embedding`

`llm_embedding` generates vector embeddings for text, which can be used for tasks like semantic similarity, clustering, or advanced search.

- **Example: Find Reviews Similar to a Target Review:**
  Generate embeddings for each review and compare them to a target review to find similar reviews based on their semantic content.

  ```sql
  WITH input_embedding AS (
      SELECT llm_embedding({'text': 'Comfortable to wear, but the sound quality is just average. Expected better for the price.'},
                           'text-embedding-3-small') AS embedding
  ),
  review_embeddings AS (
      SELECT review_id,
             array_distance(input_embedding.embedding::DOUBLE[1536],
                            llm_embedding({'text': review_text}, 'text-embedding-3-small')::DOUBLE[1536]) AS similarity
      FROM product_reviews, input_embedding
  )
  SELECT review_id AS similar_review_id
  FROM review_embeddings
  ORDER BY similarity
  LIMIT 1;
  ```

  Result:

  | similar_review_id |
  |-------------------|
  | 4                 |
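
The same pattern also supports a simple semantic search over an ad-hoc query string: embed the query once, then rank every review by its distance to it. The sketch below is illustrative only and reuses the `llm_embedding` and `array_distance` calls from the example above; the query text and the `LIMIT` are arbitrary choices.

```sql
-- Illustrative sketch: rank reviews by semantic closeness to a free-text query
-- (smallest distance first), reusing the same embedding model and casts as above.
WITH query_embedding AS (
    SELECT llm_embedding({'text': 'complaints about battery life'},
                         'text-embedding-3-small') AS embedding
)
SELECT r.review_id,
       r.review_text,
       array_distance(q.embedding::DOUBLE[1536],
                      llm_embedding({'text': r.review_text}, 'text-embedding-3-small')::DOUBLE[1536]) AS distance
FROM product_reviews AS r, query_embedding AS q
ORDER BY distance
LIMIT 5;
```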
You can manage LLM prompts and models dynamically in DuckDB using the following commands:

### 5. Prompt Management

- **Get All Prompts:**

  ```sql
  GET PROMPTS;
  ```

- **Get a Specific Prompt:**
  Retrieve the content of a specific prompt by name.

  ```sql
  GET PROMPT <prompt_name>;
  ```

- **Create a New Prompt:**

  ```sql
  CREATE PROMPT(<prompt_name>, <prompt_text>);
  ```

- **Update an Existing Prompt:**

  ```sql
  UPDATE PROMPT(<prompt_name>, <new_prompt_text>);
  ```

- **Delete a Prompt:**

  ```sql
  DELETE PROMPT <prompt_name>;
  ```
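
For example, the `is_positive` prompt used in the earlier queries could be registered and maintained as sketched below. The prompt wording is only an illustrative placeholder, not the exact prompt shipped with the extension, and the snippet assumes prompt names and texts are passed as string literals.

```sql
-- Illustrative placeholder text; write whatever instructions fit your use case.
CREATE PROMPT('is_positive', 'Decide whether the following product review is positive. Answer "positive" or "negative".');

-- Refine the wording later without touching the queries that reference the prompt.
UPDATE PROMPT('is_positive', 'Classify the sentiment of the following product review as positive or negative and give a one-sentence justification.');

-- Remove the prompt once it is no longer needed.
DELETE PROMPT 'is_positive';
```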

### 6. Model Management

- **Get All Models:**

  ```sql
  GET MODELS;
  ```

- **Get a Specific Model:**
  Retrieve the details of a specific model by name.

  ```sql
  GET MODEL <model_name>;
  ```

- **Create a New Model:**

  ```sql
  CREATE MODEL(<model_name>, <model_type>, <max_tokens>);
  ```

- **Update a Model:**

  ```sql
  UPDATE MODEL(<model_name>, <model_type>, <max_tokens>);
  ```

- **Delete a Model:**

  ```sql
  DELETE MODEL <model_name>;
  ```
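
For instance, you might register a model entry with a larger token budget alongside `'default'` and reference it by name in your queries. The model name, model type, and token limits below are illustrative assumptions rather than values defined by the extension; use whichever OpenAI model your account can access, and the same string-literal assumption as above applies.

```sql
-- Illustrative model entry: a name of your choosing, the underlying OpenAI model, and a max-token budget.
CREATE MODEL('long_form_writer', 'gpt-4o', 4096);

-- Raise the token budget later without changing any queries that reference the model by name.
UPDATE MODEL('long_form_writer', 'gpt-4o', 8192);

-- Remove the entry when it is no longer needed.
DELETE MODEL 'long_form_writer';
```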