# clip-grams
clip-grams is a tool for creating Faiss k-NN indices from CLIP embeddings of large text files. It is primarily designed for:
- Image tagging
- Analyzing CLIP's ability to describe images with different categories of text
We make use of autofaiss to automatically estimate the search parameters.
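For a sense of what this involves, index construction ultimately boils down to an autofaiss call along these lines (a minimal sketch; the paths are illustrative, and the actual parameters are set through the `index.py` arguments described under Getting started):

```python
from autofaiss import build_index

# Build a Faiss index from a folder of .npy embedding files.
# autofaiss picks the index type and search parameters automatically.
build_index(
    embeddings="embeddings",              # folder of .npy files with CLIP text embeddings
    index_path="index/knn.index",         # where the Faiss index is written
    index_infos_path="index/infos.json",  # autofaiss records the chosen parameters here
)
```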
## Quickstart
A Colab notebook is available that illustrates the key functions on a small collection of images and text.
## Getting started
Install the requirements:

```
pip install -r requirements.txt
```
Clone CLIP into this project's repository:

```
git clone https://github.com/openai/CLIP
```
There are two main entry points. Suppose we have a folder of text files from which an index is to be built. To compute a Faiss index using autofaiss:

```
python3 index.py --text_dir=[path to text folder] --index_dir=[folder to store index] --use_line=true
```
The `--use_line` argument indicates that each line of each text file is treated as an entry. We can also pass arguments to include unigrams, bigrams, and trigrams: the `--topk_ngrams` argument sets an upper bound on the number of n-gram entries, and the `--filter` argument keeps only n-grams that occur at least that many times in the corpus. A prefix can be prepended before CLIP encoding using the `--prefix` argument, and the `--chunk_size` argument specifies approximately how many entries each npy file of CLIP embeddings will hold. Several arguments are also passed directly through to autofaiss for index construction. See `index.py` for the full list of arguments.
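Putting these options together, a typical invocation might look like the following (the paths and values are illustrative, not defaults):

```
python3 index.py \
    --text_dir=texts \
    --index_dir=index \
    --use_line=true \
    --topk_ngrams=100000 \
    --filter=5 \
    --prefix="a photo of " \
    --chunk_size=100000
```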
If you've already created an index and want to build another without re-computing the npy files, omit `--text_dir`; the script will skip straight to computing the new index.
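For example, to rebuild the index while re-using the embeddings from the earlier run:

```
python3 index.py --index_dir=index
```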
Once an index is created, we can use it to tag all images in a directory:

```
python3 tag.py --image_dir=[path to image folder] --index_dir=[folder where index lives] --knn=5
```
This will create a new file with a `.knn` extension for each image in the directory, using the same stem. The `--knn` argument sets how many top-ranked entries are returned from the index for each image. See `tag.py` for the full list of arguments.
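For instance, running the command above with `--knn=5` on a folder of two images would leave something like this (filenames illustrative):

```
images/
├── cat.jpg
├── cat.knn    # top-5 entries from the index for cat.jpg
├── dog.jpg
└── dog.knn
```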
## Acknowledgements
- Thanks to Christoph S. from EleutherAI for data processing.
- The clip-retrieval repo from rom1504, which inspired much of this code.
## TODO
- Handle more general input types
- Batch/Dataset tagging
- Multi-GPU inference (low priority for now)