
hnsqlite

hnsqlite is a text-centric integration of SQLite and hnswlib that provides a persistent collection of embeddings (strings, vectors, and metadata) with search-time filtering based on the metadata.

Classes

Collection

The Collection class combines a SQLite database with an hnswlib index. Its purpose is to provide a persistent collection of embeddings (strings, vectors, and metadata) with search-time filtering based on the metadata.
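To make the "database plus vector index" combination concrete, here is a toy sketch that uses only the Python standard library. It is not hnsqlite's implementation: the class name `TinyCollection`, its table layout, and its methods are hypothetical, and a brute-force Euclidean scan stands in for the HNSW index, so search cost grows linearly with the number of stored items.

```python
import json
import math
import sqlite3


class TinyCollection:
    """Toy persistent collection: rows live in SQLite, search is a linear scan.

    Hypothetical sketch: the schema and method names are illustrative, and a
    brute-force Euclidean scan stands in for the hnswlib HNSW index.
    """

    def __init__(self, path=":memory:"):
        # Pass a file path instead of ":memory:" to persist data across runs.
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS embedding ("
            "id INTEGER PRIMARY KEY, vector TEXT, content TEXT, metadata TEXT)"
        )

    def add_item(self, vector, text, metadata=None):
        # Store the vector and metadata as JSON text next to the raw string.
        self.db.execute(
            "INSERT INTO embedding (vector, content, metadata) VALUES (?, ?, ?)",
            (json.dumps(list(vector)), text, json.dumps(metadata or {})),
        )
        self.db.commit()

    def count(self):
        return self.db.execute("SELECT COUNT(*) FROM embedding").fetchone()[0]

    def search(self, query, k=5):
        # Score every stored vector; return the k closest as (distance, text).
        rows = self.db.execute("SELECT vector, content FROM embedding").fetchall()
        scored = [(math.dist(query, json.loads(v)), t) for v, t in rows]
        return sorted(scored)[:k]


c = TinyCollection()
c.add_item([0.0, 0.0], "origin")
c.add_item([1.0, 1.0], "diagonal")
c.add_item([5.0, 5.0], "far")
print(c.count())  # 3
print([text for _, text in c.search([0.9, 1.0], k=2)])  # ['diagonal', 'origin']
```

The real library replaces the linear scan with an approximate HNSW index, which is what keeps search fast as the collection grows; SQLite's role is the durable record of texts, vectors, and metadata.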

Embedding

Embedding is a class that represents an embedding sent to or received from the Collection API.

Attributes:

SearchResponse

SearchResponse is a subclass of Embedding used for returning search results. Each SearchResponse object holds an embedding together with its distance to the query vector.

Attributes:

Collection Methods

Database classes

The following classes are the internal SqlModel data classes used to persist the embeddings and configuration in sqlite. They are not directly accessed by the user, but will be created as tables in the sqlite database:

Usage

To use hnsqlite, you can create a new collection, add items to it, and perform search operations. Here's an example:

from hnsqlite import Collection
import numpy as np

# Create a new collection
collection = Collection(collection_name="example", dim=128)

# Add items to the collection
vectors = [np.random.rand(128) for _ in range(10)]
texts = [f"Text {i}" for i in range(10)]
collection.add_items(vectors, texts)

# Get the number of items in the collection
item_count = collection.count()
print(f"Number of items in the collection: {item_count}")

# Search for the nearest neighbors of a query vector
query_vector = np.random.rand(128)
results = collection.search(query_vector, k=5)

# Print the search results
for result in results:
    print(f"Item: {result}, Distance: {result.distance}")

Filtering

The filtering function supports MongoDB-style metadata filtering. It uses hnswlib's filtering function to accept or reject nearest-neighbor candidates according to whether each embedding's metadata matches the search-time filter criteria.
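The accept/reject mechanism can be illustrated without hnswlib. In the hypothetical `filtered_knn` helper below, a brute-force scan replaces the HNSW graph traversal, and the `accept` predicate plays the role of the per-candidate filter callback: it receives a candidate id and returns whether that candidate may appear in the results. The data and names are illustrative only.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple


def filtered_knn(
    query: Sequence[float],
    vectors: Dict[int, Sequence[float]],
    k: int,
    accept: Callable[[int], bool],
) -> List[Tuple[float, int]]:
    """Return up to k (distance, id) pairs, skipping rejected candidates.

    Brute force stands in for the HNSW search; the predicate is consulted
    for every candidate, so rejected items never take one of the k slots.
    """
    results = []
    for item_id, vec in vectors.items():
        if not accept(item_id):  # metadata failed the filter: reject candidate
            continue
        results.append((math.dist(query, vec), item_id))
    results.sort()
    return results[:k]


# Hypothetical data: ids map to vectors and to metadata.
vectors = {0: [0.0, 0.0], 1: [0.1, 0.0], 2: [3.0, 3.0]}
metadata = {
    0: {"author": "Jane Smith"},
    1: {"author": "John Doe"},
    2: {"author": "John Doe"},
}

hits = filtered_knn(
    [0.0, 0.0], vectors, k=2,
    accept=lambda i: metadata[i]["author"] == "John Doe",
)
print([i for _, i in hits])  # [1, 2]
```

In this sketch, candidates are rejected during the search rather than removed from a finished top-k list, so the query still returns up to k matching items; filtering the top-k list afterwards could return fewer.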

Supported Metadata

The embedding metadata is a dictionary that stores metadata associated with items in the collection. The keys represent the field names of the metadata, and the supported values are strings, numbers, booleans or lists of strings.

Example of a metadata dictionary:

{
    "author": "John Doe",
    "rating": 4.5,
    "tags": ["python", "database", "search"]
}
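A hypothetical helper (not part of hnsqlite) that checks a metadata dictionary against the supported value types listed above might look like this. Whether hnsqlite itself validates metadata this way is an assumption.

```python
from typing import Any, Dict


def is_supported_value(value: Any) -> bool:
    """True for a string, number, boolean, or list of strings."""
    if isinstance(value, (str, int, float, bool)):  # bool is a subclass of int
        return True
    if isinstance(value, list):
        return all(isinstance(item, str) for item in value)
    return False


def validate_metadata(metadata: Dict[str, Any]) -> None:
    """Raise ValueError naming the first field whose value is unsupported."""
    for key, value in metadata.items():
        if not is_supported_value(value):
            raise ValueError(f"unsupported metadata value for {key!r}: {value!r}")


validate_metadata({
    "author": "John Doe",
    "rating": 4.5,
    "tags": ["python", "database", "search"],
})  # passes silently
```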

Filtering Operations

The search function accepts a filter written in a MongoDB-like query syntax.

The following operations are supported:

Usage

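# Match items rated 4 or higher, tagged "python" or "search",
# and written by either John Doe or Jane Smith.
filter_dict = {
    "rating": {"$gte": 4},
    "tags": {"$in": ["python", "search"]},
    "$or": [
        {"author": {"$eq": "John Doe"}},
        {"author": {"$eq": "Jane Smith"}}
    ]
}

# Metadata of a single item to test against the filter
metadata_dict = {
    "author": "John Doe",
    "rating": 4.5,
    "tags": ["python", "database", "search"]
}

# filter_item returns True when metadata_dict satisfies filter_dict
result = filter_item(filter_dict, metadata_dict)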

The result will be True if metadata_dict satisfies the conditions defined in filter_dict. In this example, the rating is greater than or equal to 4, at least one of the tags is in the specified list, and the author is either "John Doe" or "Jane Smith", so the result is True.
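The example above calls `filter_item` without showing how it evaluates a filter. As an illustration only, here is a minimal, hypothetical evaluator, `match_filter`, that is not part of hnsqlite. It covers just the operators used in the example ($eq, $gte, $in, $or) plus $and, and it assumes MongoDB conventions: top-level clauses are ANDed, a bare value means $eq, and list-valued fields such as `tags` match element-wise. hnsqlite's own semantics may differ.

```python
from typing import Any, Dict


def _match_condition(value: Any, condition: Dict[str, Any]) -> bool:
    """Check one field value against a {"$op": operand, ...} condition."""
    for op, operand in condition.items():
        if op == "$eq":
            # List-valued fields match if they contain the operand (MongoDB-style).
            ok = operand in value if isinstance(value, list) else value == operand
        elif op == "$gte":
            ok = value is not None and value >= operand
        elif op == "$in":
            # List-valued fields match if any element is in the operand list.
            items = value if isinstance(value, list) else [value]
            ok = any(item in operand for item in items)
        else:
            raise ValueError(f"operator {op!r} is not handled in this sketch")
        if not ok:
            return False
    return True


def match_filter(filter_dict: Dict[str, Any], metadata: Dict[str, Any]) -> bool:
    """Return True if metadata satisfies every top-level clause (implicit AND)."""
    for key, cond in filter_dict.items():
        if key == "$or":
            ok = any(match_filter(sub, metadata) for sub in cond)
        elif key == "$and":
            ok = all(match_filter(sub, metadata) for sub in cond)
        elif isinstance(cond, dict):
            ok = _match_condition(metadata.get(key), cond)
        else:
            # A bare value is treated as shorthand for {"$eq": value}.
            ok = _match_condition(metadata.get(key), {"$eq": cond})
        if not ok:
            return False
    return True


filter_dict = {
    "rating": {"$gte": 4},
    "tags": {"$in": ["python", "search"]},
    "$or": [
        {"author": {"$eq": "John Doe"}},
        {"author": {"$eq": "Jane Smith"}},
    ],
}
metadata_dict = {
    "author": "John Doe",
    "rating": 4.5,
    "tags": ["python", "database", "search"],
}
print(match_filter(filter_dict, metadata_dict))  # True
```

Raising on unknown operators, rather than silently ignoring them, keeps a typo such as `"$gtee"` from turning into a filter that matches everything.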

The first Usage example above creates a new collection with 10 random embeddings, gets the number of items in the collection, and searches for the 5 nearest neighbors of a random query vector.