pybioclip


pybioclip is a command line tool and Python package that simplifies using BioCLIP. It supports taxonomic or other label prediction on images (and thus their annotation or labeling) and generates semantic embeddings for images. No particular understanding of ML or computer vision is required to use it. It also implements a number of performance optimizations for batches of images and custom class lists, which should be particularly useful when integrating it into computational workflows.

Requirements

Installation

pip install pybioclip

If you have any issues with installation, please first upgrade pip by running pip install --upgrade pip.

Python Package Usage

Example Notebooks

Predict species classification

from bioclip import TreeOfLifeClassifier, Rank

classifier = TreeOfLifeClassifier()
predictions = classifier.predict("Ursus-arctos.jpeg", Rank.SPECIES)

for prediction in predictions:
    print(prediction["species"], "-", prediction["score"])

Output:

Ursus arctos - 0.9356034994125366
Ursus arctos syriacus - 0.05616999790072441
Ursus arctos bruinosus - 0.004126196261495352
Ursus arctus - 0.0024959812872111797
Ursus americanus - 0.0005009894957765937

Output from the predict() method showing the dictionary structure:

[{
    'kingdom': 'Animalia',
    'phylum': 'Chordata',
    'class': 'Mammalia',
    'order': 'Carnivora',
    'family': 'Ursidae',
    'genus': 'Ursus',
    'species_epithet': 'arctos',
    'species': 'Ursus arctos',
    'common_name': 'Kodiak bear',
    'score': 0.9356034994125366
}]
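Because each prediction is a plain dictionary, ordinary Python is enough to post-process the results. The sketch below uses species names and scores copied from the example output above as stand-in data, so it does not call the model. The helper name `confident_predictions` and the `min_score` threshold are illustrative assumptions, not part of pybioclip.

```python
# Stand-in data: species names and scores copied from the example output above.
predictions = [
    {"species": "Ursus arctos", "score": 0.9356034994125366},
    {"species": "Ursus arctos syriacus", "score": 0.05616999790072441},
    {"species": "Ursus arctos bruinosus", "score": 0.004126196261495352},
    {"species": "Ursus arctus", "score": 0.0024959812872111797},
    {"species": "Ursus americanus", "score": 0.0005009894957765937},
]


def confident_predictions(predictions, min_score=0.05):
    """Return predictions scoring at least min_score, highest score first.

    Hypothetical helper for illustration; not part of pybioclip.
    """
    kept = [p for p in predictions if p["score"] >= min_score]
    return sorted(kept, key=lambda p: p["score"], reverse=True)


# The single best prediction is the dictionary with the highest score.
top = max(predictions, key=lambda p: p["score"])
print(top["species"])

# Keep only predictions above the (arbitrary) threshold.
for p in confident_predictions(predictions):
    print(p["species"], round(p["score"], 3))
```

A threshold like this is a judgment call; a suitable value depends on the images and the rank being predicted.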

The output from the predict function can be converted into a pandas DataFrame like so:

import pandas as pd
from bioclip import TreeOfLifeClassifier, Rank

classifier = TreeOfLifeClassifier()
predictions = classifier.predict("Ursus-arctos.jpeg", Rank.SPECIES)
df = pd.DataFrame(predictions)

The first argument of the predict() method accepts either a single path or a list of paths.

Predict from a list of classes

from bioclip import CustomLabelsClassifier

classifier = CustomLabelsClassifier(["duck","fish","bear"])
predictions = classifier.predict("Ursus-arctos.jpeg")
for prediction in predictions:
    print(prediction["classification"], prediction["score"])

Output:

duck 1.0306726583309e-09
fish 2.932403668845507e-12
bear 1.0

Predict from a list of classes with binning

from bioclip import CustomLabelsBinningClassifier
classifier = CustomLabelsBinningClassifier(cls_to_bin={
  'dog': 'small',
  'fish': 'small',
  'bear': 'big',
})
predictions = classifier.predict("Ursus-arctos.jpeg")
for prediction in predictions:
    print(prediction["classification"], prediction["score"])

Output:

big 0.99992835521698
small 7.165559509303421e-05

PIL Images

The predict() methods used in all the examples above accept a list of paths or a list of PIL Images. When a list of PIL images is passed, the index of each image is filled in for file_name, because PIL images may not have an associated file name.

Command Line Usage

bioclip predict [-h] [--format {table,csv}] [--output OUTPUT]
                [--rank {kingdom,phylum,class,order,family,genus,species} | --cls CLS | --bins BINS]
                [--k K] [--device DEVICE] image_file [image_file ...]
bioclip embed [-h] [--device=DEVICE] [--output=OUTPUT] [IMAGE_FILE...]

Commands:
    predict            Use BioCLIP to generate predictions for image files.
    embed              Use BioCLIP to generate embeddings for image files.

Arguments:
  IMAGE_FILE           input image file

Options:
  -h --help
  --format=FORMAT      format of the output (table or csv) for predict mode [default: csv]
  --rank {kingdom,phylum,class,order,family,genus,species}
                        rank of the classification, default: species
  --cls CLS             classes to predict: either a comma separated list or a path to a text file of classes (one per line), when specified the
                        --rank and --bins arguments are not allowed.
  --bins BINS           path to CSV file with two columns with the first being classes and second being bin names, when specified the --cls
                        argument is not allowed.
  --k K                 number of top predictions to show, default: 5
  --device=DEVICE      device to use for matrix math (cpu or cuda or mps) [default: cpu]
  --output=OUTFILE     print output to file OUTFILE [default: stdout]
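When calling the command line tool from a script, it can help to assemble the arguments programmatically. The following is a minimal sketch that only builds an argument list from the flags listed above and enforces the rule that --rank, --cls, and --bins cannot be combined. The function `build_predict_command` and its parameter names are hypothetical and not part of pybioclip, and nothing is executed.

```python
import shlex


def build_predict_command(images, rank=None, cls=None, bins=None,
                          k=None, fmt=None, output=None, device=None):
    """Build an argument list for `bioclip predict`.

    Hypothetical helper, not part of pybioclip. It only assembles flags
    listed in the usage text above and does not run anything.
    """
    # --rank, --cls and --bins are mutually exclusive in the usage text.
    chosen = [name for name, value in
              (("rank", rank), ("cls", cls), ("bins", bins))
              if value is not None]
    if len(chosen) > 1:
        raise ValueError(f"options are mutually exclusive: {chosen}")
    if not images:
        raise ValueError("at least one image file is required")

    cmd = ["bioclip", "predict"]
    if fmt is not None:
        cmd += ["--format", fmt]
    if output is not None:
        cmd += ["--output", output]
    if rank is not None:
        cmd += ["--rank", rank]
    if cls is not None:
        # Accept a list of labels and join it into the comma separated form.
        cmd += ["--cls", cls if isinstance(cls, str) else ",".join(cls)]
    if bins is not None:
        cmd += ["--bins", bins]
    if k is not None:
        cmd += ["--k", str(k)]
    if device is not None:
        cmd += ["--device", device]
    return cmd + list(images)


cmd = build_predict_command(["Ursus-arctos.jpeg"], rank="genus", k=3, fmt="table")
print(shlex.join(cmd))
```

With pybioclip installed, the resulting list could be passed to `subprocess.run(cmd, check=True)`.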

Predict classification

Predict species for an image

The example images used below are Ursus-arctos.jpeg and Felis-catus.jpeg, both from bioclip-demo.

Predict species for the Ursus-arctos.jpeg file:

bioclip predict Ursus-arctos.jpeg

Output:

file_name,kingdom,phylum,class,order,family,genus,species_epithet,species,common_name,score
Ursus-arctos.jpeg,Animalia,Chordata,Mammalia,Carnivora,Ursidae,Ursus,arctos,Ursus arctos,Kodiak bear,0.9356034994125366
Ursus-arctos.jpeg,Animalia,Chordata,Mammalia,Carnivora,Ursidae,Ursus,arctos syriacus,Ursus arctos syriacus,syrian brown bear,0.05616999790072441
Ursus-arctos.jpeg,Animalia,Chordata,Mammalia,Carnivora,Ursidae,Ursus,arctos bruinosus,Ursus arctos bruinosus,,0.004126196261495352
Ursus-arctos.jpeg,Animalia,Chordata,Mammalia,Carnivora,Ursidae,Ursus,arctus,Ursus arctus,,0.0024959812872111797
Ursus-arctos.jpeg,Animalia,Chordata,Mammalia,Carnivora,Ursidae,Ursus,americanus,Ursus americanus,Louisiana black bear,0.0005009894957765937

Predict species for multiple images saving to a file

To make predictions for the files Ursus-arctos.jpeg and Felis-catus.jpeg and save the output to a file named predictions.csv:

bioclip predict --output predictions.csv Ursus-arctos.jpeg Felis-catus.jpeg

The contents of predictions.csv will look like this:

file_name,kingdom,phylum,class,order,family,genus,species_epithet,species,common_name,score
Ursus-arctos.jpeg,Animalia,Chordata,Mammalia,Carnivora,Ursidae,Ursus,arctos,Ursus arctos,Kodiak bear,0.9356034994125366
Ursus-arctos.jpeg,Animalia,Chordata,Mammalia,Carnivora,Ursidae,Ursus,arctos syriacus,Ursus arctos syriacus,syrian brown bear,0.05616999790072441
Ursus-arctos.jpeg,Animalia,Chordata,Mammalia,Carnivora,Ursidae,Ursus,arctos bruinosus,Ursus arctos bruinosus,,0.004126196261495352
Ursus-arctos.jpeg,Animalia,Chordata,Mammalia,Carnivora,Ursidae,Ursus,arctus,Ursus arctus,,0.0024959812872111797
Ursus-arctos.jpeg,Animalia,Chordata,Mammalia,Carnivora,Ursidae,Ursus,americanus,Ursus americanus,Louisiana black bear,0.0005009894957765937
Felis-catus.jpeg,Animalia,Chordata,Mammalia,Carnivora,Felidae,Felis,silvestris,Felis silvestris,European Wildcat,0.7221033573150635
Felis-catus.jpeg,Animalia,Chordata,Mammalia,Carnivora,Felidae,Felis,catus,Felis catus,Domestic Cat,0.19810837507247925
Felis-catus.jpeg,Animalia,Chordata,Mammalia,Carnivora,Felidae,Felis,margarita,Felis margarita,Sand Cat,0.02798456884920597
Felis-catus.jpeg,Animalia,Chordata,Mammalia,Carnivora,Felidae,Lynx,felis,Lynx felis,,0.021829601377248764
Felis-catus.jpeg,Animalia,Chordata,Mammalia,Carnivora,Felidae,Felis,bieti,Felis bieti,Chinese desert cat,0.010979168117046356
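Since the CSV holds several rows per image, a common follow-up step is to reduce it to the best prediction for each file. The sketch below does this with the standard library only, using the header and a subset of the rows shown above as stand-in data. The helper `top_prediction_per_file` is a hypothetical name, not part of pybioclip.

```python
import csv
import io

# Stand-in for predictions.csv: header and a subset of rows copied from above.
CSV_TEXT = """\
file_name,kingdom,phylum,class,order,family,genus,species_epithet,species,common_name,score
Ursus-arctos.jpeg,Animalia,Chordata,Mammalia,Carnivora,Ursidae,Ursus,arctos,Ursus arctos,Kodiak bear,0.9356034994125366
Ursus-arctos.jpeg,Animalia,Chordata,Mammalia,Carnivora,Ursidae,Ursus,americanus,Ursus americanus,Louisiana black bear,0.0005009894957765937
Felis-catus.jpeg,Animalia,Chordata,Mammalia,Carnivora,Felidae,Felis,silvestris,Felis silvestris,European Wildcat,0.7221033573150635
Felis-catus.jpeg,Animalia,Chordata,Mammalia,Carnivora,Felidae,Felis,catus,Felis catus,Domestic Cat,0.19810837507247925
"""


def top_prediction_per_file(csv_file):
    """Map each file_name to its highest-scoring row (hypothetical helper)."""
    best = {}
    for row in csv.DictReader(csv_file):
        score = float(row["score"])
        current = best.get(row["file_name"])
        if current is None or score > float(current["score"]):
            best[row["file_name"]] = row
    return best


best = top_prediction_per_file(io.StringIO(CSV_TEXT))
for file_name, row in best.items():
    print(file_name, row["species"], row["score"])
```

For a real file, replace the `io.StringIO` object with `open("predictions.csv", newline="")`.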

Predict top 3 genera for an image and display output as a table

bioclip predict --format table --k 3 --rank=genus Ursus-arctos.jpeg

Output:

+-------------------+----------+----------+----------+--------------+----------+--------+------------------------+
|     file_name     | kingdom  |  phylum  |  class   |    order     |  family  | genus  |         score          |
+-------------------+----------+----------+----------+--------------+----------+--------+------------------------+
| Ursus-arctos.jpeg | Animalia | Chordata | Mammalia |  Carnivora   | Ursidae  | Ursus  |   0.9994320273399353   |
| Ursus-arctos.jpeg | Animalia | Chordata | Mammalia | Artiodactyla | Cervidae | Cervus | 0.00032594642834737897 |
| Ursus-arctos.jpeg | Animalia | Chordata | Mammalia | Artiodactyla | Cervidae | Alces  | 7.803700282238424e-05  |
+-------------------+----------+----------+----------+--------------+----------+--------+------------------------+

Predict from a list of classes

Create predictions for 3 classes (cat, bird, and bear) for image Ursus-arctos.jpeg:

bioclip predict --cls cat,bird,bear Ursus-arctos.jpeg

Output:

file_name,classification,score
Ursus-arctos.jpeg,cat,4.581644930112816e-08
Ursus-arctos.jpeg,bird,3.051998476166773e-08
Ursus-arctos.jpeg,bear,0.9999998807907104

Predict from a binning CSV

To create predictions for 3 classes (cat, bird, and bear) grouped into 2 bins (one, two) for image Ursus-arctos.jpeg, first create a CSV file named bins.csv with the following contents:

cls,bin
cat,one
bird,one
bear,two

The names of the columns do not matter. The first column values will be used as the classes. The second column values will be used for bin names.
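The same class-to-bin mapping can be expressed as a dictionary, which is the form the cls_to_bin argument of CustomLabelsBinningClassifier takes in the Python example earlier. As a rough sketch, the snippet below reads a two-column CSV by position into such a dictionary. It assumes the first row is a header, and `read_bins` is a hypothetical helper, not part of pybioclip.

```python
import csv
import io

# Stand-in for bins.csv with the contents shown above.
BINS_CSV = "cls,bin\ncat,one\nbird,one\nbear,two\n"


def read_bins(csv_file):
    """Read a two-column CSV into a {class: bin} dict, skipping the header row.

    Hypothetical helper; columns are used by position, so the header
    names do not matter.
    """
    reader = csv.reader(csv_file)
    next(reader)  # skip the header row (assumed to be present)
    return {row[0]: row[1] for row in reader if row}


cls_to_bin = read_bins(io.StringIO(BINS_CSV))
print(cls_to_bin)  # {'cat': 'one', 'bird': 'one', 'bear': 'two'}
```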

Run predict command:

bioclip predict --bins bins.csv Ursus-arctos.jpeg

Output:

Ursus-arctos.jpeg,two,0.9999998807907104
Ursus-arctos.jpeg,one,7.633736487377973e-08
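The bin scores above are consistent with each bin's score being the sum of the scores of its member classes: the cat and bird scores from the earlier --cls example add up to roughly the score reported for bin one. The sketch below checks that arithmetic with the numbers shown in this document. It is an observation about the example output, not a statement of how pybioclip computes bins internally.

```python
import math
from collections import defaultdict

# Per-class scores copied from the --cls example above.
class_scores = {
    "cat": 4.581644930112816e-08,
    "bird": 3.051998476166773e-08,
    "bear": 0.9999998807907104,
}
# Class-to-bin mapping from bins.csv.
cls_to_bin = {"cat": "one", "bird": "one", "bear": "two"}
# Bin scores copied from the --bins output above.
reported = {"two": 0.9999998807907104, "one": 7.633736487377973e-08}

# Sum the class scores that fall into each bin.
bin_scores = defaultdict(float)
for cls, score in class_scores.items():
    bin_scores[cls_to_bin[cls]] += score

# Compare with a loose tolerance, since the reported values are rounded floats.
for name, score in sorted(bin_scores.items(), key=lambda kv: kv[1], reverse=True):
    close = math.isclose(score, reported[name], rel_tol=1e-4)
    print(name, score, "matches reported:", close)
```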

Create embeddings

Create embedding for an image

bioclip embed Ursus-arctos.jpeg

Output:

{
    "model": "hf-hub:imageomics/bioclip",
    "embeddings": {
        "Ursus-arctos.jpeg": [
            -0.23633578419685364,
            -0.28467196226119995,
            -0.4394485652446747,
            ...
        ]
    }
}
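One typical use of embeddings is comparing images by cosine similarity. The sketch below parses JSON shaped like the output above and compares two vectors using only the standard library. The first vector holds just the three values visible above (the real embedding is longer), and the second image name and its values are made up for illustration.

```python
import json
import math

# Toy stand-in for `bioclip embed` output. "other-image.jpeg" and its
# values are invented; the first vector is truncated to the values shown above.
output_text = """
{
    "model": "hf-hub:imageomics/bioclip",
    "embeddings": {
        "Ursus-arctos.jpeg": [-0.23633578419685364, -0.28467196226119995, -0.4394485652446747],
        "other-image.jpeg": [-0.2, -0.3, -0.4]
    }
}
"""


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


data = json.loads(output_text)
embeddings = data["embeddings"]
similarity = cosine_similarity(
    embeddings["Ursus-arctos.jpeg"], embeddings["other-image.jpeg"]
)
print(data["model"], round(similarity, 4))
```

For real output, the JSON could be read from a file written with the --output option of bioclip embed.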

View command line help

bioclip --help

Additional Documentation

See the pybioclip wiki for additional documentation.

License

pybioclip is distributed under the terms of the MIT license.

Acknowledgments

The prediction code in this repo is based on work by @samuelstevens in bioclip-demo.

Citation

Our code (this repository):

@software{Bradley_pybioclip_2024,
author = {Bradley, John and Lapp, Hilmar and Campolongo, Elizabeth G.},
doi = {10.5281/zenodo.13151194},
month = jul,
title = {{pybioclip}},
version = {1.0.0},
year = {2024}
}

BioCLIP paper:

@inproceedings{stevens2024bioclip,
  title = {{B}io{CLIP}: A Vision Foundation Model for the Tree of Life},
  author = {Samuel Stevens and Jiaman Wu and Matthew J Thompson and Elizabeth G Campolongo and Chan Hee Song and David Edward Carlyn and Li Dong and Wasila M Dahdul and Charles Stewart and Tanya Berger-Wolf and Wei-Lun Chao and Yu Su},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year = {2024}
}

Also consider citing the BioCLIP code:

@software{bioclip2023code,
  author = {Samuel Stevens and Jiaman Wu and Matthew J. Thompson and Elizabeth G. Campolongo and Chan Hee Song and David Edward Carlyn},
  doi = {10.5281/zenodo.10895871},
  title = {BioCLIP},
  version = {v1.0.0},
  year = {2024}
}