
<div align="center"> <img src="https://i.ibb.co/wsmD5r4/photo-2022-06-06-17-40-52.jpg" width="400px">


OML is a PyTorch-based framework to train and validate models that produce high-quality embeddings.

Trusted by

<div align="center"> <a href="https://docs.neptune.ai/integrations/community_developed/" target="_blank"><img src="https://security.neptune.ai/api/share/b707f1e8-e287-4f01-b590-39a6fa7e9faa/logo.png" width="100"/></a>ㅤㅤ <a href="https://www.newyorker.de/" target="_blank"><img src="https://upload.wikimedia.org/wikipedia/commons/thumb/d/d8/New_Yorker.svg/1280px-New_Yorker.svg.png" width="100"/></a>ㅤㅤ <a href="https://www.epoch8.co/" target="_blank"><img src="https://i.ibb.co/GdNVTyt/Screenshot-2023-07-04-at-11-19-24.png" width="100"/></a>ㅤㅤ <a href="https://www.meituan.com" target="_blank"><img src="https://upload.wikimedia.org/wikipedia/commons/6/61/Meituan_English_Logo.png" width="100"/></a>ㅤㅤ <a href="https://constructor.io/" target="_blank"><img src="https://rethink.industries/wp-content/uploads/2022/04/constructor.io-logo.png" width="100"/></a>ㅤㅤ <a href="https://edgify.ai/" target="_blank"><img src="https://edgify.ai/wp-content/uploads/2024/04/new-edgify-logo.svg" width="100" height="30"/></a>ㅤㅤ <a href="https://inspector-cloud.ru/" target="_blank"><img src="https://thumb.tildacdn.com/tild6533-6433-4137-a266-613963373637/-/resize/540x/-/format/webp/photo.png" width="150" height="30"/></a>ㅤㅤ <a href="https://yango-tech.com/" target="_blank"><img src="https://yango-backend.sborkademo.com/media/pages/home/205f66f309-1717169752/opengr4-1200x630-crop-q85.jpg" width="100" height="30"/></a>ㅤㅤ <a href="https://www.adagrad.ai/" target="_blank"><img src="https://assets-global.website-files.com/619cafd224a31d1835ece5bd/61de7f23546e9662e51605ba_Adagrad_logo_footer-2022.png" width="100" height="30"/></a>

<a href="https://www.ox.ac.uk/" target="_blank"><img src="https://i.ibb.co/zhWL6tD/21-05-2019-16-08-10-6922268.png" width="120"/></a>ㅤㅤ <a href="https://www.hse.ru/en/" target="_blank"><img src="https://www.hse.ru/data/2020/11/16/1367274044/HSE_University_blue.jpg.(230x86x123).jpg" width="100"/></a>

A number of people from Oxford and HSE universities have used OML in their theses. [1] [2] [3]

<div align="left"> <details> <summary><b>OML 3.0 has been released!</b></summary> <p>

The update focuses on several components:

Migration from OML 2.* [Python API]:

The easiest way to catch up with changes is to re-read the examples!

Migration from OML 2.* [Pipelines]:

</p> </details>

Documentation

<details> <summary>FAQ</summary> <details> <summary>Why do I need OML?</summary> <p>

You may think "If I need image embeddings I can simply train a vanilla classifier and take its penultimate layer". Well, it makes sense as a starting point. But there are several possible drawbacks:

</p> </details> <details> <summary>What is the difference between Open Metric Learning and PyTorch Metric Learning?</summary> <p>

PML is a popular library for Metric Learning, and it includes a rich collection of losses, miners, distances, and reducers; that is why we provide straightforward examples of using them with OML. Initially, we tried to use PML, but in the end we built our own library, which is more pipeline- and recipe-oriented. Here is how OML differs from PML:

We believe that having Pipelines, laconic examples, and a Zoo of pretrained models keeps the entry threshold really low.

</p> </details> <details> <summary>What is Metric Learning?</summary> <p>

The Metric Learning problem (also known as the extreme classification problem) is a situation in which we have thousands of entity ids but only a few samples for each entity. Often we assume that at test time (or in production) we will deal with unseen entities, which makes it impossible to apply the vanilla classification pipeline directly. In many cases, the obtained embeddings are used for search or matching.
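
To make the "search over embeddings" idea concrete, here is a minimal sketch in plain Python. The toy 2D vectors, the ids, and the `search` helper are made up for illustration and are not part of OML; in practice the vectors would come from a trained extractor.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equally sized vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def search(query, gallery, top_k=2):
    """Return the top_k (gallery_id, distance) pairs, nearest first."""
    scored = [(gid, euclidean(query, emb)) for gid, emb in gallery.items()]
    scored.sort(key=lambda pair: pair[1])
    return scored[:top_k]


# Toy embeddings (hypothetical values). New entities can be added to the
# gallery at any time without retraining, unlike classes in a classifier.
gallery = {
    "shoe_a": [0.0, 1.0],
    "shoe_b": [0.1, 0.9],
    "bag_a": [1.0, 0.0],
}
query = [0.0, 0.8]

results = search(query, gallery, top_k=2)
print(results)
```

Because matching relies only on distances, an entity unseen during training can still be found as long as the gallery holds at least one of its samples.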

Here are a few examples of such tasks from the computer vision sphere:

</p> </details> <details> <summary>Glossary (Naming convention) </summary> <p> </p> </details> <details> <summary>How good may be a model trained with OML? </summary> <p>

It may be comparable with the SotA methods as of 2022, for example, Hyp-ViT. (A few words about this approach: it is a ViT architecture trained with contrastive loss, with the embeddings projected into a hyperbolic space. As the authors claim, such a space is able to describe the nested structure of real-world data. So, the paper requires some heavy math to adapt the usual operations to hyperbolic space.)

We trained the same architecture with triplet loss, fixing the rest of the parameters: training and test transformations, image size, and optimizer. See configs in Models Zoo. The trick was in heuristics in our miner and sampler:

Here are CMC@1 scores for two popular benchmarks. SOP dataset: Hyp-ViT 85.9, ours 86.6. DeepFashion dataset: Hyp-ViT 92.5, ours 92.1. Thus, by using simple heuristics and avoiding heavy math, we are able to perform at the SotA level.
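
For readers new to triplet loss, the sketch below shows the basic computation on a toy batch, with every valid (anchor, positive, negative) triple enumerated. It is a plain-Python illustration under simplifying assumptions, not OML's `TripletLossWithMiner`, and it does not reproduce the miner and sampler heuristics mentioned above. The helper names and the toy values are hypothetical.

```python
import math
from itertools import permutations


def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def all_triplets(labels):
    """Yield every (anchor, positive, negative) index triple in the batch."""
    n = len(labels)
    for a, p in permutations(range(n), 2):
        if labels[a] != labels[p]:
            continue
        for neg in range(n):
            if labels[neg] != labels[a]:
                yield a, p, neg


def triplet_loss(embeddings, labels, margin):
    """Mean hinge loss over all triplets and the share of non-zero (active) ones."""
    losses = []
    for a, p, neg in all_triplets(labels):
        pos_d = dist(embeddings[a], embeddings[p])
        neg_d = dist(embeddings[a], embeddings[neg])
        losses.append(max(0.0, margin + pos_d - neg_d))
    active = sum(1 for value in losses if value > 0) / len(losses)
    return sum(losses) / len(losses), active


# Toy batch: 2 labels x 2 instances, embeddings placed on a line.
embeddings = [[0.0, 0.0], [1.0, 0.0], [3.0, 0.0], [5.0, 0.0]]
labels = [0, 0, 1, 1]

loss, active_share = triplet_loss(embeddings, labels, margin=0.5)
print(loss, active_share)
```

Only one of the eight triplets violates the margin here, which is why mining matters in practice: most triplets in a batch contribute nothing to the gradient.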

</p> </details> <details> <summary>What about Self-Supervised Learning?</summary> <p>

Recent research in SSL has definitely obtained great results. The problem is that these approaches require an enormous amount of compute to train a model. In our framework, however, we consider the most common case, in which the average user has no more than a few GPUs.

At the same time, it would be unwise to ignore success in this sphere, so we still exploit it in two ways:

</p> </details> <details> <summary>Do I need to know other frameworks to use OML?</summary> <p>

No, you don't. OML is framework-agnostic. Although we use PyTorch Lightning as a loop runner for the experiments, we also keep the possibility to run everything on pure PyTorch. Thus, only a tiny part of OML is Lightning-specific, and we keep this logic separate from the other code (see oml.lightning). Even when you use Lightning, you don't need to know it, since we provide ready-to-use Pipelines.

The possibility of using pure PyTorch and the modular structure of the code leave room for using OML with your favourite framework after implementing the necessary wrappers.

</p> </details> <details> <summary>Can I use OML without any knowledge of Data Science?</summary> <p>

Yes. To run the experiment with Pipelines you only need to write a converter to our format (it means preparing the .csv table with a few predefined columns). That's it!
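
A converter of this kind can be very small. The sketch below is only an illustration: the column names (`label`, `path`, `split`, `is_query`, `is_gallery`) are an assumption about the expected table layout and should be checked against the dataset format documentation, and the file paths and records are hypothetical.

```python
import csv
import io

# Assumed column names; verify them against the OML dataset format docs.
COLUMNS = ["label", "path", "split", "is_query", "is_gallery"]

# Hypothetical source annotations: (image path, entity id, split).
records = [
    ("images/cat_1.jpg", 0, "train"),
    ("images/cat_2.jpg", 0, "train"),
    ("images/dog_1.jpg", 1, "validation"),
    ("images/dog_2.jpg", 1, "validation"),
]


def convert(records):
    """Turn raw annotations into rows of the assumed table layout."""
    rows = []
    for path, label, split in records:
        is_val = split == "validation"
        rows.append({
            "label": label,
            "path": path,
            "split": split,
            # Sketch choice: every validation item is both a query and a
            # gallery item; train rows leave these flags empty.
            "is_query": True if is_val else "",
            "is_gallery": True if is_val else "",
        })
    return rows


buffer = io.StringIO()  # stands in for open("df.csv", "w", newline="")
writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
writer.writeheader()
writer.writerows(convert(records))
csv_text = buffer.getvalue()
print(csv_text)
```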

We may already have a suitable pre-trained model for your domain in our Models Zoo. In this case, you don't even need to train it.

</p> </details> <details> <summary>Can I export models to ONNX?</summary> <p>

Currently, we don't support exporting models to ONNX directly. However, you can use the built-in PyTorch capabilities to achieve this. For more information, please refer to this issue.

</p> </details> </details>

DOCUMENTATION

TUTORIAL TO START WITH: English | Russian | Chinese

<details> <summary>MORE</summary> </details>

Installation

pip install -U open-metric-learning  # minimum dependencies
pip install -U open-metric-learning[nlp]
pip install -U open-metric-learning[audio]
<details><summary>DockerHub</summary>
docker pull omlteam/oml:gpu
docker pull omlteam/oml:cpu
</details>

OML features

<div style="overflow-x: auto;"> <table style="width: 100%; border-collapse: collapse; border-spacing: 0; margin: 0; padding: 0;"> <tr> </tr> <tr> <td style="text-align: left;"> <a href="https://open-metric-learning.readthedocs.io/en/latest/contents/losses.html"> <b>Losses</b></a> | <a href="https://open-metric-learning.readthedocs.io/en/latest/contents/miners.html"> <b>Miners</b></a>
miner = AllTripletsMiner()
miner = NHardTripletsMiner()
miner = MinerWithBank()
...
criterion = TripletLossWithMiner(0.1, miner)
criterion = ArcFaceLoss()
criterion = SurrogatePrecision()
</td> <td style="text-align: left;"> <a href="https://open-metric-learning.readthedocs.io/en/latest/contents/samplers.html"> <b>Samplers</b></a>
labels = train.get_labels()
l2c = train.get_label2category()


sampler = BalanceSampler(labels)
sampler = CategoryBalanceSampler(labels, l2c)
sampler = DistinctCategoryBalanceSampler(labels, l2c)
</td> </tr> <tr> </tr> <tr> <td style="text-align: left;"> <a href="https://github.com/OML-Team/open-metric-learning/tree/main/pipelines/"><b>Configs support</b></a>
max_epochs: 10
sampler:
  name: balance
  args:
    n_labels: 2
    n_instances: 2
</td> <td style="text-align: left;"> <a href="https://github.com/OML-Team/open-metric-learning?tab=readme-ov-file#zoo"><b>Pre-trained models</b></a>
model_hf = AutoModel.from_pretrained("roberta-base")
tokenizer = AutoTokenizer.from_pretrained("roberta-base")
extractor_txt = HFWrapper(model_hf)

extractor_img = ViTExtractor.from_pretrained("vits16_dino")
transforms, _ = get_transforms_for_pretrained("vits16_dino")
</td> </tr> <tr> </tr> <tr> <td style="text-align: left;"> <a href="https://open-metric-learning.readthedocs.io/en/latest/postprocessing/algo_examples.html"><b>Post-processing</b></a>
emb = inference(extractor, dataset)
rr = RetrievalResults.from_embeddings(emb, dataset)

postprocessor = AdaptiveThresholding()
rr_upd = postprocessor.process(rr, dataset)
</td> <td style="text-align: left;"> <a href="https://open-metric-learning.readthedocs.io/en/latest/postprocessing/siamese_examples.html"><b>Post-processing by NN</b></a> | <a href="https://github.com/OML-Team/open-metric-learning/tree/main/pipelines/postprocessing/pairwise_postprocessing"><b>Paper</b></a>
embeddings = inference(extractor, dataset)
rr = RetrievalResults.from_embeddings(embeddings, dataset)

postprocessor = PairwiseReranker(ConcatSiamese(), top_n=3)
rr_upd = postprocessor.process(rr, dataset)
</td> </tr> <tr> </tr> <tr> <td style="text-align: left;"> <a href="https://open-metric-learning.readthedocs.io/en/latest/oml/logging.html#"><b>Logging</b></a><br>
logger = TensorBoardPipelineLogger()
logger = NeptunePipelineLogger()
logger = WandBPipelineLogger()
logger = MLFlowPipelineLogger()
logger = ClearMLPipelineLogger()
</td> <td style="text-align: left;"> <a href="https://open-metric-learning.readthedocs.io/en/latest/feature_extraction/python_examples.html#usage-with-pytorch-metric-learning"><b>PML</b></a><br>
from pytorch_metric_learning import losses

criterion = losses.TripletMarginLoss(0.2, "all")
pred = ViTExtractor()(data)
criterion(pred, gts)
</td> </tr> <tr> </tr> <tr> <td style="text-align: left;"><a href="https://open-metric-learning.readthedocs.io/en/latest/feature_extraction/python_examples.html#handling-categories"><b>Categories support</b></a>
# train
loader = DataLoader(CategoryBalanceSampler())

# validation
rr = RetrievalResults.from_embeddings()
m.calc_retrieval_metrics_rr(rr, query_categories)
</td> <td style="text-align: left;"><a href="https://open-metric-learning.readthedocs.io/en/latest/contents/metrics.html"><b>Misc metrics</b></a>
embeddings = inference(model, dataset)
rr = RetrievalResults.from_embeddings(embeddings, dataset)

m.calc_retrieval_metrics_rr(rr, precision_top_k=(5,))
m.calc_fnmr_at_fmr_rr(rr, fmr_vals=(0.1,))
m.calc_topological_metrics(embeddings, pcf_variance=(0.5,))
</td> </tr> <tr> </tr> <tr> <td style="text-align: left;"> <a href="https://open-metric-learning.readthedocs.io/en/latest/feature_extraction/python_examples.html#usage-with-pytorch-lightning"><b>Lightning</b></a><br>
import pytorch_lightning as pl

model = ViTExtractor.from_pretrained("vits16_dino")
clb = MetricValCallback(EmbeddingMetrics(dataset))
module = ExtractorModule(model, criterion, optimizer)

trainer = pl.Trainer(max_epochs=3, callbacks=[clb])
trainer.fit(module, train_loader, val_loader)
</td> <td style="text-align: left;"> <a href="https://open-metric-learning.readthedocs.io/en/latest/feature_extraction/python_examples.html#usage-with-pytorch-lightning"><b>Lightning DDP</b></a><br>
clb = MetricValCallback(EmbeddingMetrics(val))
module = ExtractorModuleDDP(
    model, criterion, optimizer, train, val
)

ddp = {"devices": 2, "strategy": DDPStrategy()}
trainer = pl.Trainer(max_epochs=3, callbacks=[clb], **ddp)
trainer.fit(module)
</td> </tr> </table> </div>
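
The `n_labels` and `n_instances` options of the balanced sampler shown above control how each batch is composed. The following plain-Python sketch illustrates the idea only; it is not OML's `BalanceSampler`, and the `balanced_batches` helper and its details (dropping the incomplete tail, sampling with replacement for small labels) are assumptions made for this example.

```python
import random
from collections import defaultdict


def balanced_batches(labels, n_labels, n_instances, seed=0):
    """Yield batches of indices holding n_labels labels with n_instances items each."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    pool = list(by_label)
    rng.shuffle(pool)
    # Walk over the labels in chunks of n_labels; the incomplete tail is dropped.
    for start in range(0, len(pool) - n_labels + 1, n_labels):
        batch = []
        for label in pool[start:start + n_labels]:
            ids = by_label[label]
            if len(ids) >= n_instances:
                batch.extend(rng.sample(ids, n_instances))
            else:
                # Too few items for this label: sample with replacement.
                batch.extend(rng.choices(ids, k=n_instances))
        yield batch


labels = [0, 0, 0, 1, 1, 2, 2, 2, 3, 3]
batches = list(balanced_batches(labels, n_labels=2, n_instances=2))
print(batches)
```

Composing batches this way guarantees that every batch contains both positive pairs and negatives, which a triplet-based loss needs in order to form any triplets at all.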

Examples

Here is an example of how to train, validate and post-process the model on a tiny dataset of images or texts. See more details on dataset format.

<div style="overflow-x: auto;"> <table style="width: 100%; border-collapse: collapse; border-spacing: 0; margin: 0; padding: 0;"> <tr> </tr> <tr> <td style="text-align: left; padding: 0;"><b>IMAGES</b></td> <td style="text-align: left; padding: 0;"><b>TEXTS</b></td> </tr> <tr> </tr> <tr> <td>
from torch.optim import Adam
from torch.utils.data import DataLoader

from oml import datasets as d
from oml.inference import inference
from oml.losses import TripletLossWithMiner
from oml.metrics import calc_retrieval_metrics_rr
from oml.miners import AllTripletsMiner
from oml.models import ViTExtractor
from oml.registry import get_transforms_for_pretrained
from oml.retrieval import RetrievalResults, AdaptiveThresholding
from oml.samplers import BalanceSampler
from oml.utils import get_mock_images_dataset

model = ViTExtractor.from_pretrained("vits16_dino").to("cpu").train()
transform, _ = get_transforms_for_pretrained("vits16_dino")

df_train, df_val = get_mock_images_dataset(global_paths=True)
train = d.ImageLabeledDataset(df_train, transform=transform)
val = d.ImageQueryGalleryLabeledDataset(df_val, transform=transform)

optimizer = Adam(model.parameters(), lr=1e-4)
criterion = TripletLossWithMiner(0.1, AllTripletsMiner(), need_logs=True)
sampler = BalanceSampler(train.get_labels(), n_labels=2, n_instances=2)


def training():
    for batch in DataLoader(train, batch_sampler=sampler):
        embeddings = model(batch["input_tensors"])
        loss = criterion(embeddings, batch["labels"])
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
        print(criterion.last_logs)


def validation():
    embeddings = inference(model, val, batch_size=4, num_workers=0)
    rr = RetrievalResults.from_embeddings(embeddings, val, n_items=3)
    rr = AdaptiveThresholding(n_std=2).process(rr)
    rr.visualize(query_ids=[2, 1], dataset=val, show=True)
    print(calc_retrieval_metrics_rr(rr, map_top_k=(3,), cmc_top_k=(1,)))


training()
validation()
</td> <td>
from torch.optim import Adam
from torch.utils.data import DataLoader
from transformers import AutoModel, AutoTokenizer

from oml import datasets as d
from oml.inference import inference
from oml.losses import TripletLossWithMiner
from oml.metrics import calc_retrieval_metrics_rr
from oml.miners import AllTripletsMiner
from oml.models import HFWrapper
from oml.retrieval import RetrievalResults, AdaptiveThresholding
from oml.samplers import BalanceSampler
from oml.utils import get_mock_texts_dataset

model = HFWrapper(AutoModel.from_pretrained("bert-base-uncased"), 768).to("cpu").train()
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

df_train, df_val = get_mock_texts_dataset()
train = d.TextLabeledDataset(df_train, tokenizer=tokenizer)
val = d.TextQueryGalleryLabeledDataset(df_val, tokenizer=tokenizer)

optimizer = Adam(model.parameters(), lr=1e-4)
criterion = TripletLossWithMiner(0.1, AllTripletsMiner(), need_logs=True)
sampler = BalanceSampler(train.get_labels(), n_labels=2, n_instances=2)


def training():
    for batch in DataLoader(train, batch_sampler=sampler):
        embeddings = model(batch["input_tensors"])
        loss = criterion(embeddings, batch["labels"])
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
        print(criterion.last_logs)


def validation():
    embeddings = inference(model, val, batch_size=4, num_workers=0)
    rr = RetrievalResults.from_embeddings(embeddings, val, n_items=3)
    rr = AdaptiveThresholding(n_std=2).process(rr)
    rr.visualize(query_ids=[2, 1], dataset=val, show=True)
    print(calc_retrieval_metrics_rr(rr, map_top_k=(3,), cmc_top_k=(1,)))


training()
validation()
</td> </tr> <tr> </tr> <tr> <td> <details style="padding-bottom: 10px"> <summary>Output</summary>
{'active_tri': 0.125, 'pos_dist': 82.5, 'neg_dist': 100.5}  # batch 1
{'active_tri': 0.0, 'pos_dist': 36.3, 'neg_dist': 56.9}     # batch 2

{'cmc': {1: 0.75}, 'precision': {5: 0.75}, 'map': {3: 0.8}}

<img src="https://i.ibb.co/MVxBf80/retrieval-img.png" height="200px"> </details>


</td> <td> <details style="padding-bottom: 10px"> <summary>Output</summary>
{'active_tri': 0.0, 'pos_dist': 8.5, 'neg_dist': 11.0}  # batch 1
{'active_tri': 0.25, 'pos_dist': 8.9, 'neg_dist': 9.8}  # batch 2

{'cmc': {1: 0.8}, 'precision': {5: 0.7}, 'map': {3: 0.9}}

<img src="https://i.ibb.co/HqfXdYd/text-retrieval.png" height="200px"> </details>


</td> </tr> </table> </div> <br>
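
The `cmc` and `precision` values printed in the outputs above are retrieval metrics computed per query and then averaged. Below is a minimal sketch of the textbook definitions in plain Python. It is an illustration only: `calc_retrieval_metrics_rr` may normalise differently (for example, when a query has fewer relevant gallery items than `k`), and the relevance flags are made up.

```python
def cmc_at_k(is_relevant, k):
    """1.0 if at least one of the top-k retrieved items is relevant, else 0.0."""
    return float(any(is_relevant[:k]))


def precision_at_k(is_relevant, k):
    """Share of relevant items among the top-k retrieved items."""
    top = is_relevant[:k]
    return sum(top) / len(top)


def mean(values):
    values = list(values)
    return sum(values) / len(values)


# One row per query: relevance flags of the retrieved items, nearest first.
retrieved = [
    [True, False, True],
    [False, True, False],
    [True, True, True],
    [False, False, False],
]

cmc1 = mean(cmc_at_k(row, 1) for row in retrieved)
cmc3 = mean(cmc_at_k(row, 3) for row in retrieved)
precision3 = mean(precision_at_k(row, 3) for row in retrieved)
print(cmc1, cmc3, precision3)
```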

Extra illustrations, explanations and tips for the code above.

Retrieval by trained model

Here is an inference-time example (in other words, retrieval on a test set). The code below works for both texts and images.

<details> <summary><b>See example</b></summary> <p>
from oml.datasets import ImageQueryGalleryDataset
from oml.inference import inference
from oml.models import ViTExtractor
from oml.registry import get_transforms_for_pretrained
from oml.utils import get_mock_images_dataset
from oml.retrieval import RetrievalResults, AdaptiveThresholding

_, df_test = get_mock_images_dataset(global_paths=True)
del df_test["label"]  # we don't need gt labels for doing predictions

extractor = ViTExtractor.from_pretrained("vits16_dino").to("cpu")
transform, _ = get_transforms_for_pretrained("vits16_dino")

dataset = ImageQueryGalleryDataset(df_test, transform=transform)
embeddings = inference(extractor, dataset, batch_size=4, num_workers=0)

rr = RetrievalResults.from_embeddings(embeddings, dataset, n_items=5)
rr = AdaptiveThresholding(n_std=3.5).process(rr)
rr.visualize(query_ids=[0, 1], dataset=dataset, show=True)

# you get the ids of retrieved items and the corresponding distances
print(rr)
</details>

Retrieval by trained model: streaming & txt2im

Here is an example where queries and galleries are processed separately.

<details> <summary><b>See example</b></summary> <p>
import pandas as pd

from oml.datasets import ImageBaseDataset
from oml.inference import inference
from oml.models import ViTExtractor
from oml.registry import get_transforms_for_pretrained
from oml.retrieval import RetrievalResults, ConstantThresholding
from oml.utils import get_mock_images_dataset

extractor = ViTExtractor.from_pretrained("vits16_dino").to("cpu")
transform, _ = get_transforms_for_pretrained("vits16_dino")

paths = pd.concat(get_mock_images_dataset(global_paths=True))["path"]
galleries, queries1, queries2 = paths[:20], paths[20:22], paths[22:24]

# gallery is huge and fixed, so we only process it once
dataset_gallery = ImageBaseDataset(galleries, transform=transform)
embeddings_gallery = inference(extractor, dataset_gallery, batch_size=4, num_workers=0)

# queries come "online" in stream
for queries in [queries1, queries2]:
    dataset_query = ImageBaseDataset(queries, transform=transform)
    embeddings_query = inference(extractor, dataset_query, batch_size=4, num_workers=0)

    # for the operation below we are going to provide integrations with vector search DB like QDrant or Faiss
    rr = RetrievalResults.from_embeddings_qg(
        embeddings_query=embeddings_query, embeddings_gallery=embeddings_gallery,
        dataset_query=dataset_query, dataset_gallery=dataset_gallery
    )
    rr = ConstantThresholding(th=80).process(rr)
    rr.visualize_qg([0, 1], dataset_query=dataset_query, dataset_gallery=dataset_gallery, show=True)
    print(rr)
</details>
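
The streaming pattern from the example above (embed the gallery once, then match each incoming batch of queries against it and cut off distant items) can be sketched without any dependencies. This is an illustration under assumptions: the `retrieve` helper is hypothetical, and the fixed cutoff only mimics the spirit of `ConstantThresholding(th=80)` rather than reproducing its exact behaviour.

```python
import math


def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def retrieve(queries, gallery, threshold):
    """For each query, return (gallery_id, distance) pairs within the threshold, nearest first."""
    results = []
    for query in queries:
        scored = sorted((dist(query, emb), gid) for gid, emb in enumerate(gallery))
        results.append([(gid, d) for d, gid in scored if d <= threshold])
    return results


# Gallery embeddings are computed once and reused for every batch of queries.
gallery = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]

# Queries arrive in a stream of small batches (toy values).
stream = [
    [[0.0, 1.0]],
    [[3.0, 3.0], [100.0, 100.0]],
]

all_results = [retrieve(batch, gallery, threshold=5.0) for batch in stream]
for batch_results in all_results:
    print(batch_results)
```

A query that is far from every gallery item, like the last one here, ends up with an empty result list, which is the purpose of a distance threshold.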

Pipelines

Pipelines provide a way to run metric learning experiments by changing only the config file. All you need is to prepare your dataset in the required format.

See Pipelines folder for more details:

Zoo

How to use text models?

Here is a lightweight integration with HuggingFace Transformers models. You can replace it with other arbitrary models inherited from IExtractor.

Note, we don't have our own text models zoo at the moment.

<details style="padding-bottom: 15px"> <summary><b>See example</b></summary> <p>
# Requires the NLP extras: pip install open-metric-learning[nlp]
from transformers import AutoModel, AutoTokenizer

from oml.models import HFWrapper

model = AutoModel.from_pretrained('bert-base-uncased').eval()
tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
extractor = HFWrapper(model=model, feat_dim=768)

inp = tokenizer(text="Hello world", return_tensors="pt", add_special_tokens=True)
embeddings = extractor(inp)
</p> </details>

How to use image models?

You can use an image model from our Zoo or any other model, provided it inherits from IExtractor.

<details style="padding-bottom: 15px"> <summary><b>See example</b></summary> <p>
from oml.const import CKPT_SAVE_ROOT as CKPT_DIR, MOCK_DATASET_PATH as DATA_DIR
from oml.models import ViTExtractor
from oml.registry import get_transforms_for_pretrained

model = ViTExtractor.from_pretrained("vits16_dino").eval()
transforms, im_reader = get_transforms_for_pretrained("vits16_dino")

img = im_reader(DATA_DIR / "images" / "circle_1.jpg")  # put path to your image here
img_tensor = transforms(img)
# img_tensor = transforms(image=img)["image"]  # for transforms from Albumentations

features = model(img_tensor.unsqueeze(0))

# Check other available models:
print(list(ViTExtractor.pretrained_models.keys()))

# Load checkpoint saved on a disk:
model_ = ViTExtractor(weights=CKPT_DIR / "vits16_dino.ckpt", arch="vits16", normalise_features=False)
</p> </details>

Image models zoo

Models trained by us. The metrics below are for 224 x 224 images:

| model | cmc1 | dataset | weights | experiment |
|---|---|---|---|---|
| `ViTExtractor.from_pretrained("vits16_inshop")` | 0.921 | DeepFashion Inshop | link | link |
| `ViTExtractor.from_pretrained("vits16_sop")` | 0.866 | Stanford Online Products | link | link |
| `ViTExtractor.from_pretrained("vits16_cars")` | 0.907 | CARS 196 | link | link |
| `ViTExtractor.from_pretrained("vits16_cub")` | 0.837 | CUB 200 2011 | link | link |

Models trained by other researchers. Note that some metrics on particular benchmarks are so high because those benchmarks were part of the training dataset (for example, unicom). The metrics below are for 224 x 224 images:

| model | Stanford Online Products | DeepFashion InShop | CUB 200 2011 | CARS 196 |
|---|---|---|---|---|
| `ViTUnicomExtractor.from_pretrained("vitb16_unicom")` | 0.700 | 0.734 | 0.847 | 0.916 |
| `ViTUnicomExtractor.from_pretrained("vitb32_unicom")` | 0.690 | 0.722 | 0.796 | 0.893 |
| `ViTUnicomExtractor.from_pretrained("vitl14_unicom")` | 0.726 | 0.790 | 0.868 | 0.922 |
| `ViTUnicomExtractor.from_pretrained("vitl14_336px_unicom")` | 0.745 | 0.810 | 0.875 | 0.924 |
| `ViTCLIPExtractor.from_pretrained("sber_vitb32_224")` | 0.547 | 0.514 | 0.448 | 0.618 |
| `ViTCLIPExtractor.from_pretrained("sber_vitb16_224")` | 0.565 | 0.565 | 0.524 | 0.648 |
| `ViTCLIPExtractor.from_pretrained("sber_vitl14_224")` | 0.512 | 0.555 | 0.606 | 0.707 |
| `ViTCLIPExtractor.from_pretrained("openai_vitb32_224")` | 0.612 | 0.491 | 0.560 | 0.693 |
| `ViTCLIPExtractor.from_pretrained("openai_vitb16_224")` | 0.648 | 0.606 | 0.665 | 0.767 |
| `ViTCLIPExtractor.from_pretrained("openai_vitl14_224")` | 0.670 | 0.675 | 0.745 | 0.844 |
| `ViTExtractor.from_pretrained("vits16_dino")` | 0.648 | 0.509 | 0.627 | 0.265 |
| `ViTExtractor.from_pretrained("vits8_dino")` | 0.651 | 0.524 | 0.661 | 0.315 |
| `ViTExtractor.from_pretrained("vitb16_dino")` | 0.658 | 0.514 | 0.541 | 0.288 |
| `ViTExtractor.from_pretrained("vitb8_dino")` | 0.689 | 0.599 | 0.506 | 0.313 |
| `ViTExtractor.from_pretrained("vits14_dinov2")` | 0.566 | 0.334 | 0.797 | 0.503 |
| `ViTExtractor.from_pretrained("vits14_reg_dinov2")` | 0.566 | 0.332 | 0.795 | 0.740 |
| `ViTExtractor.from_pretrained("vitb14_dinov2")` | 0.565 | 0.342 | 0.842 | 0.644 |
| `ViTExtractor.from_pretrained("vitb14_reg_dinov2")` | 0.557 | 0.324 | 0.833 | 0.828 |
| `ViTExtractor.from_pretrained("vitl14_dinov2")` | 0.576 | 0.352 | 0.844 | 0.692 |
| `ViTExtractor.from_pretrained("vitl14_reg_dinov2")` | 0.571 | 0.340 | 0.840 | 0.871 |
| `ResnetExtractor.from_pretrained("resnet50_moco_v2")` | 0.493 | 0.267 | 0.264 | 0.149 |
| `ResnetExtractor.from_pretrained("resnet50_imagenet1k_v1")` | 0.515 | 0.284 | 0.455 | 0.247 |

The metrics may differ from the ones reported in papers, because the version of the train/val split and the use of bounding boxes may differ.

Contributing guide

We welcome new contributors! Please, see our:

Acknowledgments

<a href="https://github.com/catalyst-team/catalyst" target="_blank"><img src="https://raw.githubusercontent.com/catalyst-team/catalyst-pics/master/pics/catalyst_logo.png" width="100"/></a>

The project was started in 2020 as a module for the Catalyst library. I want to thank the people who worked with me on that module: Julia Shenshina, Nikita Balagansky, Sergey Kolesnikov and others.

I would like to thank the people who continued working on this pipeline after it became a separate project: Julia Shenshina, Misha Kindulov, Aron Dik, Aleksei Tarasov and Verkhovtsev Leonid.

<a href="https://www.newyorker.de/" target="_blank"><img src="https://upload.wikimedia.org/wikipedia/commons/thumb/d/d8/New_Yorker.svg/1280px-New_Yorker.svg.png" width="100"/></a>

I also want to thank NewYorker, since part of the functionality was developed (and used) by its computer vision team, led by me.