Awesome
sqlite-vss
[!WARNING]
sqlite-vss
is not in active development. Instead, my effort is now going towardssqlite-vec
, which is a similar vector search SQLite extension, but should be much easier to install and use thansqlite-vss
. See this blog post for more info.
sqlite-vss
(SQLite <b><u>V</u></b>ector <b><u>S</u></b>imilarity <b><u>S</u></b>earch) is a SQLite extension that brings vector search capabilities to SQLite, based on Faiss. It can be used to build semantic search engines, recommendations, or questions-and-answering tools.
See Introducing sqlite-vss: A SQLite Extension for Vector Search (February 2023) for more details and a live example!
If your company or organization finds this library useful, consider supporting my work!
Usage
.load ./vector0
.load ./vss0
select vss_version(); -- 'v0.0.1'
sqlite-vss
has a similar API to the fts5
Full-Text Search Extension. Use the vss0
module to create virtual tables that can efficiently store and query your vectors.
-- 384 == number of dimensions for this example
create virtual table vss_articles using vss0(
headline_embedding(384),
description_embedding(384),
);
sqlite-vss
is a Bring-your-own-vectors database, it is compatable with any embedding or vector data you have. Consider using OpenAI's Embeddings API, HuggingFace's Inference API, sentence-transformers
, or any of these open source model. In this example, we are using sentence-transformers/all-MiniLM-L6-v2 to generate embeddings from our text, which have 384 dimensions.
You can insert vectors into vss0
tables as JSON or raw bytes.
insert into vss_articles(rowid, headline_embedding)
select rowid, headline_embedding from articles;
To query for similar vectors ("k nearest neighbors"), use the vss_search
function in the WHERE
clause. Here we are searching for the 100 nearest neighbors to the embedding in row #123 in the articles
table.
select rowid, distance
from vss_articles
where vss_search(
headline_embedding,
(select headline_embedding from articles where rowid = 123)
)
limit 100;
You can INSERT
and DELETE
into these tables as necessary, but UPDATE
operations aren't supported yet. This can be used with triggers for automatically updated indexes. Also note that "small" INSERT
/DELETE
operations that only insert a few rows can be slow, so batch where necessary.
begin;
delete from vss_articles
where rowid between 100 and 200;
insert into vss_articles(rowid, headline_embedding, description_embedding)
values (:rowid, :headline_embedding, :description_embedding)
commit;
You can pass in custom Faiss factory strings for specific columns to control how the Faiss index is stored and queried. By default the factory string is "Flat,IDMap2"
, which can be slow to query as your database grows. Here, we add an inverted file index with 4096 centroids, a non-exhaustive option that makes large database queries much faster.
create virtual table vss_ivf_articles using vss0(
headline_embedding(384) factory="IVF4096,Flat,IDMap2",
description_embedding(384) factory="IVF4096,Flat,IDMap2"
);
This IVF will require training! You can define training data with a INSERT
command in a single transaction, with the special operation="training"
constraint.
insert into vss_ivf_articles(operation, headline_embedding, description_embedding)
select
'training',
headline_embedding,
description_embedding
from articles;
Beware! Indexes that require training can take a long time. With the News Category Dataset (386 dimension over 210k vectors) that this example is based on, the default index would take 8 seconds to build. But with the custom "IVF4096,Flat,IDMap2"
factory, it took 45 minutes to train and 4.5 minutes to insert data! This likely can be reduced with a smaller training set, but the faster queries can be helpful.
Documentation
See docs.md
for a instructions to compile sqlite-vss
yourself, as well as a full SQL API reference.
Installing
The Releases page contains pre-built binaries for Linux x86_64 and MacOS x86_64 (MacOS Big Sur 11 or higher). More pre-compiled targets will be available in the future. Additionally, sqlite-vss
is distributed on common package managers like pip
for Python and npm
for Node.js, see below for details.
Do note that on Linux machines, you'll have to install some packages to make these options work:
sudo apt-get update
sudo apt-get install -y libgomp1 libatlas-base-dev liblapack-dev
Note: The
0
in the filename (vss0.dylib
/vss0.so
) denotes the major version ofsqlite-vss
. Currentlysqlite-vss
is pre v1, so expect breaking changes in future versions.
Language | Install | More Info | |
---|---|---|---|
Python | pip install sqlite-vss | sqlite-vss with Python | |
Datasette | datasette install datasette-sqlite-vss | sqlite-vss with Datasette | |
Node.js | npm install sqlite-vss | sqlite-vss with Node.js | |
Deno | deno.land/x/sqlite_vss | sqlite-vss with Deno | |
Ruby | gem install sqlite-vss | sqlite-vss with Ruby | |
Elixir | hex.pm/packages/sqlite_vss | sqlite-vss with Elixir | |
Go | go get -u github.com/asg017/sqlite-vss/bindings/go | sqlite-vss with Go | |
Rust | cargo add sqlite-vss | sqlite-vss with Rust | |
Github Release |
With the sqlite3
CLI
For using sqlite-vss
with the official SQLite command line shell, download the vector0.dylib
/vss0.dylib
(for MacOS Big Sur 11 or higher) or vector0.so
/vss0.so
(Linux) files from a release and load it into your SQLite environment.
The vector0
extension is a required dependency, so make sure to load that before vss0
.
.load ./vector0
.load ./vss0
select vss_version();
-- v0.0.1
Python
For Python developers, install the sqlite-vss
package with:
pip install sqlite-vss
import sqlite3
import sqlite_vss
db = sqlite3.connect(':memory:')
db.enable_load_extension(True)
sqlite_vss.load(db)
version, = db.execute('select vss_version()').fetchone()
print(version)
See bindings/python
for more details.
Node.js
For Node.js developers, install the sqlite-vss
npm package with:
npm install sqlite-vss
import Database from "better-sqlite3"; // also compatible with node-sqlite3
import * as sqlite_vss from "sqlite-vss";
const db = new Database(":memory:");
sqlite_vss.load(db);
const version = db.prepare("select vss_version()").pluck().get();
console.log(version);
See npm/sqlite-vss/README.md
for more details.
Deno
For Deno developers, use the deno.land/x/sqlite_vss module:
// Requires all permissions (-A) and the --unstable flag
import { Database } from "https://deno.land/x/sqlite3@0.8.0/mod.ts";
import * as sqlite_vss from "https://deno.land/x/sqlite_vss/mod.ts";
const db = new Database(":memory:");
db.enableLoadExtension = true;
sqlite_vss.load(db);
const [version] = db.prepare("select vss_version()").value<[string]>()!;
console.log(version);
See deno/sqlite-vss/README.md
for more details.
Datasette
And for Datasette, install the datasette-sqlite-vss
plugin with:
datasette install datasette-sqlite-vss
See bindings/datasette
for more details.
Disadvantages
- The underlying Faiss indicies are capped at 1GB. Follow #1 for updates.
- Additional filtering on top of KNN searches aren't supported yet. Follow #2 for updates.
- Only CPU Faiss indicies are supported, not GPU yet. Follow #3 for updates.
- mmap'ed indices aren't supported yet, so indicies have to fit in RAM. Follow #4 for updates.
- This extension is written in C++ and doesn't have fuzzy testing yet. Follow #5 for updates.
UPDATE
statements on vss0 virtual tables are not supported, thoughINSERT
andDELETE
statements are. Follow #7 for updates.
Supporting
I (Alex 👋🏼) spent a lot of time and energy on this project and many other open source projects. If your company or organization uses this library (or you're feeling generous), then please consider sponsoring my work, sharing this project with a friend, or hiring me for contract/consulting work!
See Also
sqlite-http
, a SQLite extension for making HTTP requestssqlite-xsv
, a fast SQLite extension for querying CSVssqlite-loadable-rs
, a framework for building SQLite extensions in Rust