pybioclip
Command line tool and Python package that simplifies using BioCLIP, both for taxonomic or other label prediction on images (and thus for annotating or labeling them) and for generating semantic embeddings for images. No particular understanding of ML or computer vision is required to use it. It also implements several performance optimizations for batches of images or custom class lists, which should be particularly useful when integrating it into computational workflows.
Requirements
- Python compatible with PyTorch
Installation
pip install pybioclip
If you have any issues with installation, please first upgrade pip by running pip install --upgrade pip.
Python Package Usage
Example Notebooks
- Predict species for images - examples/PredictImages.ipynb <a target="_blank" href="https://colab.research.google.com/github/Imageomics/pybioclip/blob/main/examples/PredictImages.ipynb"> <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>
- Predict species for iNaturalist images - examples/iNaturalistPredict.ipynb <a target="_blank" href="https://colab.research.google.com/github/Imageomics/pybioclip/blob/main/examples/iNaturalistPredict.ipynb"> <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>
- Predict using a subset of the TreeOfLife - examples/TOL-Subsetting.ipynb <a target="_blank" href="https://colab.research.google.com/github/Imageomics/pybioclip/blob/main/examples/TOL-Subsetting.ipynb"> <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>
Predict species classification
from bioclip import TreeOfLifeClassifier, Rank
classifier = TreeOfLifeClassifier()
predictions = classifier.predict("Ursus-arctos.jpeg", Rank.SPECIES)
for prediction in predictions:
print(prediction["species"], "-", prediction["score"])
Output:
Ursus arctos - 0.9356034994125366
Ursus arctos syriacus - 0.05616999790072441
Ursus arctos bruinosus - 0.004126196261495352
Ursus arctus - 0.0024959812872111797
Ursus americanus - 0.0005009894957765937
Output from the predict() method showing the dictionary structure:
[{
'kingdom': 'Animalia',
'phylum': 'Chordata',
'class': 'Mammalia',
'order': 'Carnivora',
'family': 'Ursidae',
'genus': 'Ursus',
'species_epithet': 'arctos',
'species': 'Ursus arctos',
'common_name': 'Kodiak bear',
'score': 0.9356034994125366
}]
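As a minimal sketch of working with this dictionary structure, the helper below joins the taxonomic rank fields of one prediction into a readable lineage string. The name `lineage` is hypothetical and not part of pybioclip. It relies only on the keys shown above and skips any rank that is absent, which is assumed to be the case for predictions made at a higher rank such as genus.

```python
# Rank keys in the order they appear in the predict() output shown above.
RANK_KEYS = ["kingdom", "phylum", "class", "order", "family", "genus", "species"]


def lineage(prediction, separator=" > "):
    """Join the taxonomic ranks present in one prediction dict (hypothetical helper)."""
    return separator.join(prediction[key] for key in RANK_KEYS if prediction.get(key))


# Sample record copied from the dictionary structure shown above.
example = {
    "kingdom": "Animalia",
    "phylum": "Chordata",
    "class": "Mammalia",
    "order": "Carnivora",
    "family": "Ursidae",
    "genus": "Ursus",
    "species_epithet": "arctos",
    "species": "Ursus arctos",
    "common_name": "Kodiak bear",
    "score": 0.9356034994125366,
}

print(lineage(example))
```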
The output from the predict function can be converted into a pandas DataFrame like so:
import pandas as pd
from bioclip import TreeOfLifeClassifier, Rank
classifier = TreeOfLifeClassifier()
predictions = classifier.predict("Ursus-arctos.jpeg", Rank.SPECIES)
df = pd.DataFrame(predictions)
The first argument of the predict() method accepts either a single path or a list of paths.
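When a list of paths is passed, the predictions for all images come back together, so it can help to group them per image. The sketch below assumes each prediction dictionary carries a `file_name` key, as the command line output and the PIL Images note in this README suggest. The helper `group_by_file` and the abbreviated sample records are illustrative only and stand in for a real `classifier.predict([...], Rank.SPECIES)` result.

```python
from collections import defaultdict


def group_by_file(predictions):
    """Group prediction dicts by their 'file_name' value (hypothetical helper)."""
    grouped = defaultdict(list)
    for prediction in predictions:
        grouped[prediction["file_name"]].append(prediction)
    return dict(grouped)


# Stand-in for predictions returned for a list of two paths; the records are
# abbreviated and the presence of the 'file_name' key is an assumption.
sample = [
    {"file_name": "Ursus-arctos.jpeg", "species": "Ursus arctos", "score": 0.9356034994125366},
    {"file_name": "Ursus-arctos.jpeg", "species": "Ursus americanus", "score": 0.0005009894957765937},
    {"file_name": "Felis-catus.jpeg", "species": "Felis silvestris", "score": 0.7221033573150635},
]

# Print the highest-scoring species for each image.
for file_name, rows in group_by_file(sample).items():
    best = max(rows, key=lambda row: row["score"])
    print(file_name, "->", best["species"])
```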
Predict from a list of classes
from bioclip import CustomLabelsClassifier
classifier = CustomLabelsClassifier(["duck","fish","bear"])
predictions = classifier.predict("Ursus-arctos.jpeg")
for prediction in predictions:
print(prediction["classification"], prediction["score"])
Output:
duck 1.0306726583309e-09
fish 2.932403668845507e-12
bear 1.0
Predict from a list of classes with binning
from bioclip import CustomLabelsBinningClassifier
classifier = CustomLabelsBinningClassifier(cls_to_bin={
'dog': 'small',
'fish': 'small',
'bear': 'big',
})
predictions = classifier.predict("Ursus-arctos.jpeg")
for prediction in predictions:
print(prediction["classification"], prediction["score"])
Output:
big 0.99992835521698
small 7.165559509303421e-05
PIL Images
The predict() methods used in all the examples above accept either a list of paths or a list of PIL Images.
When a list of PIL Images is passed, the index of each image is filled in for file_name, because PIL Images may not have an associated file name.
Command Line Usage
bioclip predict [-h] [--format {table,csv}] [--output OUTPUT]
[--rank {kingdom,phylum,class,order,family,genus,species} | --cls CLS | --bins BINS]
[--k K] [--device DEVICE] image_file [image_file ...]
bioclip embed [-h] [--device=DEVICE] [--output=OUTPUT] [IMAGE_FILE...]
Commands:
predict Use BioCLIP to generate predictions for image files.
embed Use BioCLIP to generate embeddings for image files.
Arguments:
IMAGE_FILE input image file
Options:
-h --help
--format=FORMAT format of the output (table or csv) for predict mode [default: csv]
--rank {kingdom,phylum,class,order,family,genus,species}
rank of the classification, default: species (when)
--cls CLS classes to predict: either a comma separated list or a path to a text file of classes (one per line), when specified the
--rank and --bins arguments are not allowed.
--bins BINS path to CSV file with two columns with the first being classes and second being bin names, when specified the --cls
argument is not allowed.
--k K number of top predictions to show, default: 5
--device=DEVICE device to use matrix math (cpu or cuda or mps) [default: cpu]
--output=OUTFILE print output to file OUTFILE [default: stdout]
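The --cls option takes either a comma separated list or a path to a text file with one class per line. One way such a value could be interpreted is sketched below. The function `parse_cls` is hypothetical and is not taken from the pybioclip source; the rule of checking for an existing file first is an assumption made for illustration.

```python
import os
import tempfile


def parse_cls(value):
    """Return class names from a file path or a comma separated string (hypothetical)."""
    if os.path.isfile(value):
        # Treat the value as a text file with one class per line, ignoring blank lines.
        with open(value) as handle:
            return [line.strip() for line in handle if line.strip()]
    # Otherwise treat the value as a comma separated list.
    return [name.strip() for name in value.split(",") if name.strip()]


print(parse_cls("cat,bird,bear"))

# Demonstrate the file form with a temporary file so the sketch is self-contained.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "classes.txt")
    with open(path, "w") as handle:
        handle.write("cat\nbird\nbear\n")
    from_file = parse_cls(path)
print(from_file)
```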
Predict classification
Predict species for an image
The example images used below are Ursus-arctos.jpeg and Felis-catus.jpeg, both from the bioclip-demo.
Predict species for the Ursus-arctos.jpeg file:
bioclip predict Ursus-arctos.jpeg
Output:
file_name,kingdom,phylum,class,order,family,genus,species_epithet,species,common_name,score
Ursus-arctos.jpeg,Animalia,Chordata,Mammalia,Carnivora,Ursidae,Ursus,arctos,Ursus arctos,Kodiak bear,0.9356034994125366
Ursus-arctos.jpeg,Animalia,Chordata,Mammalia,Carnivora,Ursidae,Ursus,arctos syriacus,Ursus arctos syriacus,syrian brown bear,0.05616999790072441
Ursus-arctos.jpeg,Animalia,Chordata,Mammalia,Carnivora,Ursidae,Ursus,arctos bruinosus,Ursus arctos bruinosus,,0.004126196261495352
Ursus-arctos.jpeg,Animalia,Chordata,Mammalia,Carnivora,Ursidae,Ursus,arctus,Ursus arctus,,0.0024959812872111797
Ursus-arctos.jpeg,Animalia,Chordata,Mammalia,Carnivora,Ursidae,Ursus,americanus,Ursus americanus,Louisiana black bear,0.0005009894957765937
Predict species for multiple images saving to a file
To make predictions for files Ursus-arctos.jpeg and Felis-catus.jpeg, saving the output to a file named predictions.csv:
bioclip predict --output predictions.csv Ursus-arctos.jpeg Felis-catus.jpeg
The contents of predictions.csv will look like this:
file_name,kingdom,phylum,class,order,family,genus,species_epithet,species,common_name,score
Ursus-arctos.jpeg,Animalia,Chordata,Mammalia,Carnivora,Ursidae,Ursus,arctos,Ursus arctos,Kodiak bear,0.9356034994125366
Ursus-arctos.jpeg,Animalia,Chordata,Mammalia,Carnivora,Ursidae,Ursus,arctos syriacus,Ursus arctos syriacus,syrian brown bear,0.05616999790072441
Ursus-arctos.jpeg,Animalia,Chordata,Mammalia,Carnivora,Ursidae,Ursus,arctos bruinosus,Ursus arctos bruinosus,,0.004126196261495352
Ursus-arctos.jpeg,Animalia,Chordata,Mammalia,Carnivora,Ursidae,Ursus,arctus,Ursus arctus,,0.0024959812872111797
Ursus-arctos.jpeg,Animalia,Chordata,Mammalia,Carnivora,Ursidae,Ursus,americanus,Ursus americanus,Louisiana black bear,0.0005009894957765937
Felis-catus.jpeg,Animalia,Chordata,Mammalia,Carnivora,Felidae,Felis,silvestris,Felis silvestris,European Wildcat,0.7221033573150635
Felis-catus.jpeg,Animalia,Chordata,Mammalia,Carnivora,Felidae,Felis,catus,Felis catus,Domestic Cat,0.19810837507247925
Felis-catus.jpeg,Animalia,Chordata,Mammalia,Carnivora,Felidae,Felis,margarita,Felis margarita,Sand Cat,0.02798456884920597
Felis-catus.jpeg,Animalia,Chordata,Mammalia,Carnivora,Felidae,Lynx,felis,Lynx felis,,0.021829601377248764
Felis-catus.jpeg,Animalia,Chordata,Mammalia,Carnivora,Felidae,Felis,bieti,Felis bieti,Chinese desert cat,0.010979168117046356
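A CSV file in this layout can be post-processed with the standard library alone. The sketch below picks the highest-scoring species for each image. The function name `top_prediction_per_file` is hypothetical, and an in-memory `io.StringIO` holding a few rows copied from the output above stands in for `open("predictions.csv", newline="")`.

```python
import csv
import io

# Header and four rows copied from the predictions.csv contents shown above.
ROWS = [
    "file_name,kingdom,phylum,class,order,family,genus,species_epithet,species,common_name,score",
    "Ursus-arctos.jpeg,Animalia,Chordata,Mammalia,Carnivora,Ursidae,Ursus,arctos,Ursus arctos,Kodiak bear,0.9356034994125366",
    "Ursus-arctos.jpeg,Animalia,Chordata,Mammalia,Carnivora,Ursidae,Ursus,americanus,Ursus americanus,Louisiana black bear,0.0005009894957765937",
    "Felis-catus.jpeg,Animalia,Chordata,Mammalia,Carnivora,Felidae,Felis,catus,Felis catus,Domestic Cat,0.19810837507247925",
    "Felis-catus.jpeg,Animalia,Chordata,Mammalia,Carnivora,Felidae,Felis,silvestris,Felis silvestris,European Wildcat,0.7221033573150635",
]


def top_prediction_per_file(csv_file):
    """Return {file_name: (species, score)} for the best-scoring row of each file."""
    best = {}
    for row in csv.DictReader(csv_file):
        score = float(row["score"])  # csv yields strings, so convert before comparing
        current = best.get(row["file_name"])
        if current is None or score > current[1]:
            best[row["file_name"]] = (row["species"], score)
    return best


result = top_prediction_per_file(io.StringIO("\n".join(ROWS)))
for file_name, (species, score) in result.items():
    print(file_name, species, score)
```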
Predict top 3 genera for an image and display output as a table
bioclip predict --format table --k 3 --rank=genus Ursus-arctos.jpeg
Output:
+-------------------+----------+----------+----------+--------------+----------+--------+------------------------+
| file_name | kingdom | phylum | class | order | family | genus | score |
+-------------------+----------+----------+----------+--------------+----------+--------+------------------------+
| Ursus-arctos.jpeg | Animalia | Chordata | Mammalia | Carnivora | Ursidae | Ursus | 0.9994320273399353 |
| Ursus-arctos.jpeg | Animalia | Chordata | Mammalia | Artiodactyla | Cervidae | Cervus | 0.00032594642834737897 |
| Ursus-arctos.jpeg | Animalia | Chordata | Mammalia | Artiodactyla | Cervidae | Alces | 7.803700282238424e-05 |
+-------------------+----------+----------+----------+--------------+----------+--------+------------------------+
Predict from a list of classes
Create predictions for 3 classes (cat, bird, and bear) for image Ursus-arctos.jpeg:
bioclip predict --cls cat,bird,bear Ursus-arctos.jpeg
Output:
file_name,classification,score
Ursus-arctos.jpeg,cat,4.581644930112816e-08
Ursus-arctos.jpeg,bird,3.051998476166773e-08
Ursus-arctos.jpeg,bear,0.9999998807907104
Predict from a binning CSV
Create predictions for 3 classes (cat, bird, and bear) with 2 bins (one, two) for image Ursus-arctos.jpeg.
Create a CSV file named bins.csv with the following contents:
cls,bin
cat,one
bird,one
bear,two
The names of the columns do not matter. The first column values will be used as the classes. The second column values will be used for bin names.
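To reuse the same file from Python, the positional rule described above can be sketched as follows. The helper `read_bins` is hypothetical and is not necessarily how the command line tool parses the file. It assumes a header row is present, as in the example, and produces a dictionary with the same shape as the `cls_to_bin` argument of `CustomLabelsBinningClassifier`.

```python
import csv
import io


def read_bins(csv_file):
    """Map first-column class names to second-column bin names (hypothetical helper)."""
    reader = csv.reader(csv_file)
    next(reader, None)  # skip the header row; the column names are ignored
    return {row[0]: row[1] for row in reader if len(row) >= 2}


# io.StringIO stands in for open("bins.csv", newline="").
bins_csv = io.StringIO("cls,bin\ncat,one\nbird,one\nbear,two\n")
cls_to_bin = read_bins(bins_csv)
print(cls_to_bin)
```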
Run predict command:
bioclip predict --bins bins.csv Ursus-arctos.jpeg
Output:
Ursus-arctos.jpeg,two,0.9999998807907104
Ursus-arctos.jpeg,one,7.633736487377973e-08
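The bin scores shown here are consistent with summing the per-class scores from the previous example within each bin: cat plus bird is roughly 7.63e-08 and matches bin one up to floating-point precision, and bear alone gives bin two. The sketch below illustrates that aggregation. It is an inference from the outputs in this README, not a description of the library's implementation, and `bin_scores` is a hypothetical name.

```python
from collections import defaultdict


def bin_scores(class_scores, cls_to_bin):
    """Sum class scores per bin and return (bin, score) pairs, highest first."""
    totals = defaultdict(float)
    for cls, score in class_scores.items():
        totals[cls_to_bin[cls]] += score
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Per-class scores copied from the --cls cat,bird,bear example above.
class_scores = {
    "cat": 4.581644930112816e-08,
    "bird": 3.051998476166773e-08,
    "bear": 0.9999998807907104,
}
cls_to_bin = {"cat": "one", "bird": "one", "bear": "two"}

result = bin_scores(class_scores, cls_to_bin)
for bin_name, score in result:
    print(bin_name, score)
```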
Create embeddings
Create embedding for an image
bioclip embed Ursus-arctos.jpeg
Output:
{
"model": "hf-hub:imageomics/bioclip",
"embeddings": {
"Ursus-arctos.jpeg": [
-0.23633578419685364,
-0.28467196226119995,
-0.4394485652446747,
...
]
}
}
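A common use of such embeddings is comparing images by cosine similarity. The sketch below parses JSON in the layout shown above and compares two vectors. The file names `a.jpeg` and `b.jpeg` and their three-element vectors are invented for illustration; real embeddings are much longer. It also assumes that embedding several files yields one entry per file under "embeddings".

```python
import json
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy stand-in for `bioclip embed` output: same layout as above, invented values.
output_text = """
{
  "model": "hf-hub:imageomics/bioclip",
  "embeddings": {
    "a.jpeg": [1.0, 0.0, 0.0],
    "b.jpeg": [0.6, 0.8, 0.0]
  }
}
"""

embeddings = json.loads(output_text)["embeddings"]
similarity = cosine_similarity(embeddings["a.jpeg"], embeddings["b.jpeg"])
print(round(similarity, 3))
```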
View command line help
bioclip --help
Additional Documentation
See the pybioclip wiki for additional documentation:
- Using the pybioclip docker container
- Using the pybioclip apptainer/singularity container
- Using a custom model
License
pybioclip is distributed under the terms of the MIT license.
Acknowledgments
The prediction code in this repo is based on work by @samuelstevens in bioclip-demo.
Citation
Our code (this repository):
@software{Bradley_pybioclip_2024,
author = {Bradley, John and Lapp, Hilmar and Campolongo, Elizabeth G.},
doi = {10.5281/zenodo.13151194},
month = jul,
title = {{pybioclip}},
version = {1.0.0},
year = {2024}
}
BioCLIP paper:
@inproceedings{stevens2024bioclip,
title = {{B}io{CLIP}: A Vision Foundation Model for the Tree of Life},
author = {Samuel Stevens and Jiaman Wu and Matthew J Thompson and Elizabeth G Campolongo and Chan Hee Song and David Edward Carlyn and Li Dong and Wasila M Dahdul and Charles Stewart and Tanya Berger-Wolf and Wei-Lun Chao and Yu Su},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2024}
}
Also consider citing the BioCLIP code:
@software{bioclip2023code,
author = {Samuel Stevens and Jiaman Wu and Matthew J. Thompson and Elizabeth G. Campolongo and Chan Hee Song and David Edward Carlyn},
doi = {10.5281/zenodo.10895871},
title = {BioCLIP},
version = {v1.0.0},
year = {2024}
}