
<h1 align="center"> CosyPose: Consistent multi-view multi-object 6D pose estimation </h1> <div align="center"> <h3> <a href="http://ylabbe.github.io">Yann Labbé</a>, <a href="https://jcarpent.github.io/">Justin Carpentier</a>, <a href="http://imagine.enpc.fr/~aubrym/">Mathieu Aubry</a>, <a href="http://www.di.ens.fr/~josef/">Josef Sivic</a> <br> <br> ECCV: European Conference on Computer Vision, 2020 <br> <br> <a href="https://arxiv.org/abs/2008.08465">[Paper]</a> <a href="https://www.di.ens.fr/willow/research/cosypose/">[Project page]</a> <a href="https://youtu.be/4QYyEvnrC_o">[Video (1 min)]</a> <a href="https://youtu.be/MNH_Ez7bcP0">[Video (10 min)]</a> <a href="https://docs.google.com/presentation/d/1APHpaKKnkIvmquNJUVqERiMN4gEQ10Jt4IY7wTfIVgE/edit?usp=sharing">[Slides]</a> <br> <br> Winner of the <a href="https://bop.felk.cvut.cz/challenges/bop-challenge-2020/">BOP Challenge 2020 </a> at ECCV'20 <a href="https://docs.google.com/presentation/d/1jZDu4mw-uNcwzr5jMFlqEddZsb7SjQozXVG3dT6-1M0/edit?usp=sharing">[slides]</a> <a href="https://arxiv.org/abs/2009.07378"> [BOP challenge paper] </a> </h3> </div>

Important Note

<body> <h1 align="center"> <p style="color:red;font-size:30px">This repository is no longer being maintained. <p style="color:red;font-size:30px">If you are looking to use CosyPose and encounter any issues, please see: <p style="color:red;font-size:30px"> <a href="https://github.com/Simple-Robotics/cosypose">https://github.com/Simple-Robotics/cosypose</a> </h1> </body>

Citation

If you use this code in your research, please cite the paper:

@inproceedings{labbe2020,
  title={CosyPose: Consistent multi-view multi-object 6D pose estimation},
  author={Y. {Labbe} and J. {Carpentier} and M. {Aubry} and J. {Sivic}},
  booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
  year={2020}}

News


Table of contents

Overview

This repository contains the code for the full CosyPose approach, including:

Single-view single-object 6D pose estimator

Given an RGB image and a 2D bounding box of an object with a known 3D model, the 6D pose estimator predicts the full 6D pose of the object with respect to the camera. Our method is inspired by DeepIM, with several simplifications and technical improvements. It is fully implemented in PyTorch and achieves state-of-the-art single-view results on YCB-Video and T-LESS. We provide the pre-trained models used in our experiments on both datasets, as well as the training code used to produce them, which can be parallelized on multiple GPUs and multiple nodes.
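
For intuition, the refiner follows a render-and-compare scheme: the object is rendered at the current pose estimate, a network compares the rendering with the observed image and predicts a pose correction, and the loop repeats for a few iterations. Below is a minimal sketch of that loop; render_fn and refiner_net are hypothetical placeholders, not the actual API of this repository.

import torch

def refine_pose(render_fn, refiner_net, rgb, K, TCO_init, n_iterations=4):
    # Illustrative DeepIM-style render-and-compare loop; `render_fn` and
    # `refiner_net` are hypothetical callables, not this repository's API.
    TCO = TCO_init.clone()  # (4, 4) object-to-camera pose estimate
    for _ in range(n_iterations):
        rendered = render_fn(TCO, K)       # render the object at the current estimate
        dTCO = refiner_net(rgb, rendered)  # network predicts a (4, 4) pose correction
        TCO = dTCO @ TCO                   # compose the correction with the estimate
    return TCO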

Synthetic data generation

The single-view 6D pose estimation models are trained on a mix of synthetic and real images. We provide the code for generating the additional synthetic images.

Multi-view multi-object scene reconstruction


Single-view object-level reconstruction of a scene often fails because of detection mistakes, pose estimation errors and occlusions, which makes it impractical for real applications. Our multi-view approach, CosyPose, addresses these single-view limitations and improves 6D pose accuracy by leveraging information from multiple cameras with unknown positions. We provide the full code, including robust object-level multi-view matching and global scene refinement. The method is agnostic to the 6D pose estimator used and can therefore be combined with many other existing single-view object pose estimation methods to solve problems on other datasets or in real scenarios. We provide a utility for running CosyPose given a set of input 6D object candidates in each image.
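
For intuition, the object-level matching stage can be thought of as follows: each pair of same-label candidates across two views gives a hypothesis for the relative camera pose, and the hypothesis supported by the most candidate pairs is kept. The sketch below illustrates this idea in a heavily simplified two-view form; it is not the RANSAC-based implementation of the repository.

import numpy as np

def relative_pose_hypotheses(cands_a, cands_b, thresh=0.02):
    # `cands_a` / `cands_b`: lists of (label, T_cam_obj) candidates in two views,
    # with T_cam_obj a (4, 4) pose. Each pair of same-label candidates gives a
    # hypothesis for the relative camera pose T_a_b; we keep the hypothesis
    # supported by the most candidate pairs (inliers).
    best_score, best_T_a_b = -1, None
    pairs = [(Ta, Tb) for la, Ta in cands_a for lb, Tb in cands_b if la == lb]
    for Ta, Tb in pairs:
        T_a_b = Ta @ np.linalg.inv(Tb)  # camera b expressed in camera a
        # Count candidate pairs whose poses agree with this relative camera pose.
        score = sum(
            np.linalg.norm((T_a_b @ Tb2)[:3, 3] - Ta2[:3, 3]) < thresh
            for Ta2, Tb2 in pairs
        )
        if score > best_score:
            best_score, best_T_a_b = score, T_a_b
    return best_T_a_b, best_score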

BOP challenge 2020: single-view 2D detection + 6D pose estimation models

We used our {coarse+refinement} single-view 6D pose estimation method in the BOP challenge 2020. In addition, we trained a MaskRCNN detector (torchvision's implementation) on each of the 7 core datasets (LM-O, T-LESS, TUD-L, IC-BIN, ITODD, HB, YCB-V). We provide 2D detectors and 6D pose estimation models for these datasets. All training (including the 2D detector), inference and evaluation code is available in this repository and can easily be used with another dataset in the BOP format.

Installation

git clone --recurse-submodules https://github.com/ylabbe/cosypose.git
cd cosypose
conda env create -n cosypose --file environment.yaml
conda activate cosypose
git lfs pull
python setup.py install

The installation may take some time as several packages must be downloaded and installed/compiled. If you plan to change the code, run python setup.py develop.

Notes:

Downloading and preparing data

<details> <summary>Click for details...</summary>

All data used (datasets, models, results, ...) are stored in a directory local_data at the root of the repository. Create it with mkdir local_data or use a symlink if you want the data to be stored at a different place. We provide the utility cosypose/scripts/download.py for downloading required data and models. All of the files can also be downloaded manually.

BOP Datasets

For both T-LESS and YCB-Video, we use the datasets in the BOP format. If you already have them on your disk, place them in local_data/bop_datasets. Alternatively, you can download them using:

python -m cosypose.scripts.download --bop_dataset=ycbv
python -m cosypose.scripts.download --bop_dataset=tless

Additional files containing information about the datasets, used to fairly compare with prior works on both datasets, can be downloaded using:

python -m cosypose.scripts.download --bop_extra_files=ycbv
python -m cosypose.scripts.download --bop_extra_files=tless

We use pybullet for rendering images, which requires object models to be provided in the URDF format. We provide converted URDF files; they can be downloaded using:

python -m cosypose.scripts.download --urdf_models=ycbv
python -m cosypose.scripts.download --urdf_models=tless.cad

In the BOP format, the YCB objects 002_master_chef_can and 040_large_marker are considered symmetric, but not by previous works such as PoseCNN, PVNet and DeepIM. To ensure a fair comparison (i.e. using ADD instead of ADD-S in the ADD-(S) metric for these objects), these objects must not be considered symmetric in the evaluation. To keep the model format uniform, we generate a set of YCB object models, models_bop-compat_eval, that can be used to fairly compare our approach against previous works. You can download them directly:

python -m cosypose.scripts.download --ycbv_compat_models
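
As a reminder, ADD and ADD-S differ only in how point correspondences are established; below is a minimal numpy sketch of both metrics, where pts is an (N, 3) array of model points (illustrative, not the evaluation code of this repository).

import numpy as np

def add_metric(pts, R_gt, t_gt, R_pred, t_pred):
    # ADD: mean distance between corresponding model points under the two poses.
    gt = pts @ R_gt.T + t_gt
    pred = pts @ R_pred.T + t_pred
    return np.linalg.norm(gt - pred, axis=1).mean()

def adds_metric(pts, R_gt, t_gt, R_pred, t_pred):
    # ADD-S: mean distance to the closest predicted point, which makes the
    # metric invariant to object symmetries.
    gt = pts @ R_gt.T + t_gt
    pred = pts @ R_pred.T + t_pred
    dists = np.linalg.norm(gt[:, None, :] - pred[None, :, :], axis=-1)
    return dists.min(axis=1).mean()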

Notes:

python -m cosypose.scripts.convert_models_to_urdf --models=ycbv
python -m cosypose.scripts.convert_models_to_urdf --models=tless.cad
python -m cosypose.scripts.make_ycbv_compat_models

Pre-trained models

The pre-trained models of the single-view pose estimator can be downloaded using:

# YCB-V Single-view refiner
python -m cosypose.scripts.download --model=ycbv-refiner-finetune--251020

# YCB-V Single-view refiner trained on synthetic data only 
# Only download this if you are interested in retraining the above model 
python -m cosypose.scripts.download --model=ycbv-refiner-syntonly--596719

# T-LESS coarse and refiner models 
python -m cosypose.scripts.download --model=tless-coarse--10219
python -m cosypose.scripts.download --model=tless-refiner--585928

2D detections

To ensure a fair comparison with prior works on both datasets, we use the same detections as DeepIM (from PoseCNN) on YCB-Video and the same as Pix2Pose (from a RetinaNet model) on T-LESS. Download the saved 2D detections for both datasets using:

python -m cosypose.scripts.download --detections=ycbv_posecnn

# SiSo detections: the single highest-scoring detection per class per image, on all images (filtering rule sketched after these commands)
# Available for each image of the T-LESS dataset (primesense sensor)
# These are the same detections as used in Pix2pose's experiments
python -m cosypose.scripts.download --detections=tless_pix2pose_retinanet_siso_top1

# ViVo detections: All detections for a subset of 1000 images of T-LESS.
# Used in our multi-view experiments.
python -m cosypose.scripts.download --detections=tless_pix2pose_retinanet_vivo_all
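
The SiSo detections above keep only the top-scoring detection per class per image. A minimal sketch of that filtering rule with pandas; the flat DataFrame layout is hypothetical, the saved detections in this repository use their own format.

import pandas as pd

detections = pd.DataFrame([
    {"image_id": 1, "category": "obj_01", "score": 0.9, "bbox": [10, 10, 50, 80]},
    {"image_id": 1, "category": "obj_01", "score": 0.7, "bbox": [12, 14, 48, 77]},
    {"image_id": 1, "category": "obj_05", "score": 0.8, "bbox": [60, 20, 90, 70]},
])

# Keep the single highest-scoring detection per class per image.
siso = (detections.sort_values("score", ascending=False)
                  .groupby(["image_id", "category"], as_index=False)
                  .first())
print(siso)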

If you are interested in re-training a detector, please see the BOP 2020 section.

Notes:

</details>

Note on GPU parallelization

<details> <summary>Click for details...</summary>

Training and evaluation code can be parallelized across multiple GPUs and multiple machines using vanilla torch.distributed. This is done by simply starting multiple processes with the same arguments and assigning each process to a specific GPU via CUDA_VISIBLE_DEVICES. To run the processes on a local machine or on a SLURM cluster, we use our own utility job-runner, but other similar tools such as dask-jobqueue or submitit could be used. We provide instructions for single-node multi-GPU training, and for multi-node multi-GPU training on a SLURM cluster.
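
A minimal sketch of what each of these processes does with vanilla torch.distributed, assuming MASTER_ADDR, MASTER_PORT, RANK and WORLD_SIZE are set in the environment (this is not the exact setup code of the repository):

import os
import torch
import torch.distributed as dist

# Each process is pinned to one GPU via CUDA_VISIBLE_DEVICES and joins the
# same process group; MASTER_ADDR/MASTER_PORT must be exported beforehand.
rank = int(os.environ.get("RANK", 0))
world_size = int(os.environ.get("WORLD_SIZE", 1))
dist.init_process_group(backend="nccl", rank=rank, world_size=world_size)
torch.cuda.set_device(0)  # the only GPU visible to this process

x = torch.ones(1, device="cuda")
dist.all_reduce(x)  # sums the tensor across all processes
print(f"rank {rank}/{world_size}: {x.item()}")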

Single gpu on a single node

# CUDA ID of GPU you want to use
export CUDA_VISIBLE_DEVICES=0
python -m cosypose.scripts.example_multigpu

where scripts.example_multigpu can be replaced by scripts.run_pose_training or scripts.run_cosypose_eval (see below for usage of training/evaluation scripts).

Configuration of job-runner for multi-gpu usage

Change the path to the code directory and the anaconda location, and specify a temporary directory for storing job logs by modifying `job-runner-config.yaml`. If you have access to a SLURM cluster, specify the name of the queue, its specifications (number of GPUs/CPUs per node) and the flags you typically use in a slurm script. Once you are done, run:

runjob-config job-runner-config.yaml

Multi-gpu on a single node

# CUDA IDS of GPUs you want to use
export CUDA_VISIBLE_DEVICES=0,1
runjob --ngpus=2 --queue=local python -m cosypose.scripts.example_multigpu

The logs of the first process will be printed. You can check the logs of the other processes in the job directory.

On a SLURM cluster

runjob --ngpus=8 --queue=gpu_p1  python -m cosypose.scripts.example_multigpu
</details>

Reproducing single-view results

<details> <summary>Click for details...</summary>

YCB-Video

python -m cosypose.scripts.run_cosypose_eval --config ycbv

This will run the inference and evaluation on YCB-Video. We use our own implementation of the evaluation and have checked that it matches the results of the original MATLAB implementation for the AUC of ADD-S and AUC of ADD(-S) metrics. For example, you can see that the PoseCNN results are similar to the ones reported in the PoseCNN/DeepIM paper:

PoseCNN/AUC of ADD(-S): 0.613
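
For reference, the AUC metrics integrate the accuracy-vs-threshold curve for thresholds between 0 and 10 cm; a simplified numpy sketch of this convention (not the evaluation code of this repository):

import numpy as np

def auc_of_add(errors_m, max_threshold=0.1):
    # Area under the accuracy-vs-threshold curve, normalized to [0, 1],
    # following the PoseCNN/DeepIM convention (thresholds from 0 to 10 cm).
    errors_m = np.asarray(errors_m)
    thresholds = np.linspace(0.0, max_threshold, 1000)
    accuracies = [(errors_m < t).mean() for t in thresholds]
    return np.trapz(accuracies, thresholds) / max_threshold

print(auc_of_add([0.005, 0.02, 0.25]))  # a pose error beyond 10 cm lowers the AUC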

The YCB-Video results and metrics can be downloaded directly:

python -m cosypose.scripts.download --result_id=ycbv-n_views=1--5154971130

T-LESS

python -m cosypose.scripts.run_cosypose_eval --config tless-siso

This will run inference on the entire T-LESS dataset and print some metrics, but not e_vsd<0.3, which is not supported by our code. The results can also be downloaded:

python -m cosypose.scripts.download --result_id=tless-siso-n_views=1--684390594

To measure e_vsd<0.3, we use the BOP Toolkit. You can run it using:

python -m cosypose.scripts.run_bop_eval --result_id=tless-siso-n_views=1--684390594 --method=pix2pose_detections/refiner/iteration=4

This will create a local_data/bop_predictions_csv/cosyposeXXXX-eccv2020_tless-test-primesense.csv file in the BOP format and run the evaluation. Intermediate metrics and final scores are saved in local_data/bop_eval_outputs/cosyposeXXXX-eccv2020_tless-test-primesense/, where XXXX corresponds to a random number generated by the script.

The T-LESS SiSo results can also be downloaded directly:

python -m cosypose.scripts.download --bop_result_id=cosypose847205-eccv2020_tless-test-primesense

You can check the results match those from the paper:

cat local_data/bop_eval_outputs/cosypose847205-eccv2020_tless-test-primesense/error\=vsd_ntop\=1_delta\=15.000_tau\=20.000/scores_th\=0.300_min-visib\=0.100.json

{
  "gt_count": 69545,
  "mean_obj_recall": 0.6378486071644157,
  "mean_scene_recall": 0.6444110450903551,
  ...
  "recall": 0.632720209307857,
  ...
  "targets_count": 50452,
  "tp_count": 31922
}

Following other works, we reported mean_obj_recall in the paper.
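
For instance, the value reported in the paper can be read back from that file:

import json
from pathlib import Path

scores_path = Path(
    "local_data/bop_eval_outputs/cosypose847205-eccv2020_tless-test-primesense"
    "/error=vsd_ntop=1_delta=15.000_tau=20.000/scores_th=0.300_min-visib=0.100.json"
)
scores = json.loads(scores_path.read_text())
print(scores["mean_obj_recall"])  # value reported in the paper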

Single-view visualization

You can visualize the single-view predictions using this notebook as an example.

</details>

Training the single-view 6D pose estimation models

<details> <summary>Click for details...</summary>

Downloading synthetic images

The pose estimation models are trained on a mix of real images provided with the T-LESS/YCB-Video datasets and a set of images that we generated. For each dataset, we generated 1 million synthetic images. You can download these large datasets:

# 106 GB
python -m cosypose.scripts.download --synt_dataset=tless-1M

# 113 GB
python -m cosypose.scripts.download --synt_dataset=ycbv-1M

We provide below the instructions to generate these datasets locally if you are interested in using our synthetic data generation code.

Synthetic data generation

Textures for domain randomization

The synthetic training images are generated with some domain randomization. This includes adding textures to the background (and, for T-LESS, to the objects). We use a set of textures extracted from ShapeNet objects. Download the texture dataset:

python -m cosypose.scripts.download --texture_dataset

Recording a synthetic dataset

The synthetic images are generated using multiple processes managed by dask. The synthetic training images can be generated using the following commands for both datasets:

export CUDA_VISIBLE_DEVICES=0
python -m cosypose.scripts.run_dataset_recording --config tless --local
python -m cosypose.scripts.run_dataset_recording --config ycbv --local

Make sure that enough space is available on your disk: we generate 1 million images, which is around 120 GB for each dataset. Note that we use a high number of synthetic images, but it may be possible to use fewer. Please see the script scripts/run_dataset_recording.py for additional parameters. It is also possible to use dask-jobqueue to generate the images on a cluster, but we do not provide a simple configuration script for this at the moment. If you are interested in generating data using multiple machines on a cluster, you will have to modify dask-jobqueue's Cluster definition here.
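
As a starting point, the kind of dask-jobqueue Cluster that would replace the local one could look like the sketch below; the queue name, resources and walltime are assumptions to adapt to your cluster.

from dask.distributed import Client
from dask_jobqueue import SLURMCluster

# Hypothetical SLURM configuration; adapt the queue, resources and walltime
# to your cluster before plugging it into the recording script.
cluster = SLURMCluster(
    queue="gpu_p1",
    cores=10,
    memory="40GB",
    walltime="02:00:00",
)
cluster.scale(16)         # ask for 16 dask workers
client = Client(cluster)  # the client used to submit the rendering tasks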

Visualizing images of the dataset

You can visualize the images of the generated dataset using this notebook. You can check that the ground truth provided by a dataset is correct using this notebook.

Background images for data augmentation

We apply data augmentation to the training images. Data augmentation includes pasting random images from the Pascal VOC dataset onto the background of the scenes. You can download Pascal VOC using the following commands:

cd local_data
wget http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar
tar -xvf VOCtrainval_11-May-2012.tar

(If the website is down, which happens periodically, you can alternatively download these files from a mirror at https://pjreddie.com/media/files/VOCtrainval_11-May-2012.tar)
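
A minimal sketch of this background-pasting augmentation (simplified, not the repository's augmentation code), assuming an (H, W, 3) uint8 image and a boolean object mask:

import numpy as np
from PIL import Image

def paste_random_background(rgb, mask, voc_image_path):
    # Replace background pixels (mask == False) with a resized Pascal VOC image.
    h, w = mask.shape
    background = np.asarray(Image.open(voc_image_path).convert("RGB").resize((w, h)))
    return np.where(mask[..., None], rgb, background).astype(np.uint8)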

Training script

Once you have generated the synthetic images and downloaded Pascal VOC, you can run the training script. On YCB-Video, we train a refiner model on synthetic data only and fine-tune it on the synthetic + real images. On T-LESS, we train a coarse and a refiner model on the synthetic + provided real images of isolated objects, directly from scratch. In our experiments, all models are trained using the same procedure on 32 GPUs.

runjob --ngpus=32 python -m cosypose.scripts.run_pose_training --config ycbv-refiner-syntonly
runjob --ngpus=32 python -m cosypose.scripts.run_pose_training --config ycbv-refiner-finetune
runjob --ngpus=32 python -m cosypose.scripts.run_pose_training --config tless-coarse
runjob --ngpus=32 python -m cosypose.scripts.run_pose_training --config tless-refiner

You can visualize the logs of the provided models in this notebook.

Logs

You can add the run_id of each model that you are training to visualize the training metrics.

Notes:

</details>

Reproducing multi-view results

<details> <summary>Click for details...</summary>

The following scripts will run the full CosyPose pipeline (single-view predictions + multi-view scene reconstruction), compute the metrics reported in the paper and save the results to a directory in local_data/results/.

export CUDA_VISIBLE_DEVICES=0
python -m cosypose.scripts.run_cosypose_eval --config tless-vivo --nviews=4
python -m cosypose.scripts.run_cosypose_eval --config tless-vivo --nviews=8
python -m cosypose.scripts.run_cosypose_eval --config ycbv --nviews=5

Note that the inference and evaluation can be sped up using runjob if you have access to multiple GPUs. The mAP@ADD-S<0.1d and AUC of ADD-S metrics are computed using our own code since they are not supported by the BOP toolkit. We refer to the appendix of the main paper for more details on these metrics.

The results can be also downloaded directly:

# YCB-Video 5 views
python -m cosypose.scripts.download --result_id=ycbv-n_views=5--8073381555 

# T-LESS ViVo 4 views
python -m cosypose.scripts.download --result_id=tless-vivo-n_views=4--2731943061

# T-LESS ViVo 8 views
python -m cosypose.scripts.download --result_id=tless-vivo-n_views=8--2322743008

On T-LESS ViVo, the e_vsd<0.3 and ADD-S<0.1d metrics are computed using the BOP toolkit, for example for computing the multi-view results for ViVo 8 views:

python -m cosypose.scripts.run_bop_eval  --results  tless-vivo-n_views=8--2322743008 --method pix2pose_detections/ba_output+all_cand --vivo

The ba_output+all_cand predictions correspond to the output of CosyPose concatenated with all the single-view candidates, as explained in the experiment section of the paper. The single-view candidates have strictly lower scores than the multi-view predictions, which means that single-view estimates are used for evaluation only when there is no multi-view prediction for an object, typically because a camera could not be registered with respect to the scene due to too few inlier candidates.

We also provide the BOP evaluation results that we computed and reported in the paper:

# T-LESS ViVo 1 view
python -m cosypose.scripts.download --bop_results=cosypose68486-eccv2020_tless-test-primesense

# T-LESS ViVo 4 views
python -m cosypose.scripts.download --bop_results=cosypose615294-eccv2020_tless-test-primesense

# T-LESS ViVo 8 views
python -m cosypose.scripts.download --bop_result_id=cosypose114533-eccv2020_tless-test-primesense

Multi-view visualization

You can use this notebook to visualize the multi-view results on YCB-Video and T-LESS and generate the 3D visualization GIFs.

plots_cosypose

GIF

</details>

Running CosyPose in a custom scenario

<details> <summary>Click for details...</summary>

Stages 2 and 3 of CosyPose are agnostic to the 6D pose estimator used and can therefore be combined with many other existing single-view object pose estimation methods to solve problems on other datasets or in real applications. We provide a utility for running CosyPose given a set of input 6D object candidates in each image.

If you want to combine CosyPose with your own pose estimator, you will need to provide the following:


Use these commands to create a custom scenario with T-LESS objects and run CosyPose on it:

cd local_data
mkdir -p custom_scenarios/example
ln -s $(pwd)/bop_datasets/tless/models custom_scenarios/example

export CUDA_VISIBLE_DEVICES=0
python -m cosypose.scripts.download --example_scenario
python -m cosypose.scripts.run_custom_scenario --scenario=example

This will generate the following files:


You can use this as an example to check the different formats in which the information should be provided.

Notes:

</details>

BOP20 models and results

<details> <summary>Click for details...</summary>

We provide the training code that we used to train single-view single-object pose estimation models on the 7 core datasets (LM-O, T-LESS, TUD-L, IC-BIN, ITODD, HB, YCB-V), as well as pre-trained detector and pose estimation models. Note that these models are different from the ones used in the paper. The differences between these models and the ones used in the paper are the following:

Even though the challenge is focused on single-view pose estimation, we also reported multi-view results on YCB-Video, T-LESS and HB for 4 and 8 views.

Downloading BOP datasets

python -m cosypose.scripts.download --bop_dataset=DATASET --pbr_training_images
python -m cosypose.scripts.download --urdf_models=DATASET

for DATASET={hb,icbin,itodd,lm,lmo,tless,tudl,ycbv}. If you are not interested in training the models, you can remove the flag --pbr_training_images and you can omit lm.

Pre-trained models

You can download all the models that we trained for the challenge using our downloading script:

python -m cosypose.scripts.download --model=model_id

where model_id is given by the table below:

| Dataset | Model type | Training images | model_id |
|---------|------------|-----------------|----------|
| hb | detector | PBR | detector-bop-hb-pbr--497808 |
| hb | coarse | PBR | coarse-bop-hb-pbr--7075 |
| hb | refiner | PBR | refiner-bop-hb-pbr--247731 |
| icbin | detector | PBR | detector-bop-icbin-pbr--947409 |
| icbin | coarse | PBR | coarse-bop-icbin-pbr--915044 |
| icbin | refiner | PBR | refiner-bop-icbin-pbr--841882 |
| lmo | detector | PBR | detector-bop-lmo-pbr--517542 |
| lmo | coarse | PBR | coarse-bop-lmo-pbr--707448 |
| lmo | refiner | PBR | refiner-bop-lmo-pbr--325214 |
| itodd | detector | PBR | detector-bop-itodd-pbr--509908 |
| itodd | coarse | PBR | coarse-bop-itodd-pbr--681884 |
| itodd | refiner | PBR | refiner-bop-itodd-pbr--834427 |
| tless | detector | PBR | detector-bop-tless-pbr--873074 |
| tless | coarse | PBR | coarse-bop-tless-pbr--506801 |
| tless | refiner | PBR | refiner-bop-tless-pbr--233420 |
| tless | detector | SYNT+REAL | detector-bop-tless-synt+real--452847 |
| tless | coarse | SYNT+REAL | coarse-bop-tless-synt+real--160982 |
| tless | refiner | SYNT+REAL | refiner-bop-tless-synt+real--881314 |
| tudl | detector | PBR | detector-bop-tudl-pbr--728047 |
| tudl | coarse | PBR | coarse-bop-tudl-pbr--373484 |
| tudl | refiner | PBR | refiner-bop-tudl-pbr--487212 |
| tudl | detector | SYNT+REAL | detector-bop-tudl-synt+real--298779 |
| tudl | coarse | SYNT+REAL | coarse-bop-tudl-synt+real--610074 |
| tudl | refiner | SYNT+REAL | refiner-bop-tudl-synt+real--423239 |
| ycbv | detector | PBR | detector-bop-ycbv-pbr--970850 |
| ycbv | coarse | PBR | coarse-bop-ycbv-pbr--724183 |
| ycbv | refiner | PBR | refiner-bop-ycbv-pbr--604090 |
| ycbv | detector | SYNT+REAL | detector-bop-ycbv-synt+real--292971 |
| ycbv | coarse | SYNT+REAL | coarse-bop-ycbv-synt+real--822463 |
| ycbv | refiner | SYNT+REAL | refiner-bop-ycbv-synt+real--631598 |

The detectors are Mask R-CNN models with a ResNet-50 FPN backbone. PBR corresponds to training only on the provided synthetic images. SYNT+REAL corresponds to training on all synthetic and real images when real images are available (only for tless, tudl and ycbv). SYNT+REAL models are pre-trained from PBR.
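
For reference, the corresponding torchvision architecture can be instantiated as follows; num_classes is background plus one class per object (e.g. 1 + 30 for T-LESS), and the actual training is handled by scripts.run_detector_training described in the Training details section.

import torchvision

num_classes = 1 + 30  # background + T-LESS objects (adapt per dataset)
model = torchvision.models.detection.maskrcnn_resnet50_fpn(num_classes=num_classes)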

If you want to use all the models for a complete evaluation:

python -m cosypose.scripts.download --all_bop20_models

Running inference

The following commands will reproduce the results that we reported on the leaderboard for all the datasets:

# CosyPose-ECCV20-PBR-1VIEW	
python -m cosypose.scripts.run_bop_inference --config bop-pbr

# CosyPose-ECCV20-SYNT+REAL-1VIEW
python -m cosypose.scripts.run_bop_inference --config bop-synt+real

# CosyPose-ECCV20-SYNT+REAL-1VIEW-ICP	
python -m cosypose.scripts.run_bop_inference --config bop-synt+real --icp

# CosyPose-ECCV20-SYNT+REAL-4VIEWS	
python -m cosypose.scripts.run_bop_inference --config bop-synt+real --nviews=4

# CosyPose-ECCV20-SYNT+REAL-8VIEWS	
python -m cosypose.scripts.run_bop_inference --config bop-synt+real --nviews=8

The inference script is compatible with runjob.

Inference results on all datasets can be downloaded directly:

python -m cosypose.scripts.download --result_id=result_id

where result_id is given by the table below

| BOP20 method name | result_id |
|-------------------|-----------|
| CosyPose-ECCV20-PBR-1VIEW | bop-pbr--223026 |
| CosyPose-ECCV20-SYNT+REAL-1VIEW | bop-synt+real--815712 |
| CosyPose-ECCV20-SYNT+REAL-1VIEW-ICP | bop-synt+real-icp--121351 |
| CosyPose-ECCV20-SYNT+REAL-4VIEWS | bop-synt+real-nviews=4--419066 |
| CosyPose-ECCV20-SYNT+REAL-8VIEWS | bop-synt+real-nviews=8--763684 |

If you want to download everything:

python -m cosypose.scripts.download --all_bop20_results

Notes:


Running evaluation

You can run the evaluation locally on the publicly available test sets:

python -m cosypose.scripts.run_bop20_eval_multi --result_id=result_id --method=method

where method is maskrcnn_detections/refiner/iteration=4 for single-view, maskrcnn_detections/icp when ICP is run, and maskrcnn_detections/multiview for multi-view (n_views > 1).

If you are only interested in generating the BOP predictions file suitable for submission to the website, you can run:

python -m cosypose.scripts.run_bop20_eval_multi --result_id=result_id --method=method --convert_only

Training details

Detection

We use torchvision's MaskRCNN implementation for the detection. The models were trained using:

runjob --ngpus=32 python -m cosypose.scripts.run_detector_training --config bop-DATASET-TRAINING_IMAGES

where DATASET={lmo,tless,tudl,icbin,itodd,hb,ycbv} and TRAINING_IMAGES={pbr,synt+real} (synt+real only for datasets where real images are available: tless, tudl and ycbv).

Pose estimation

runjob --ngpus=32 python -m cosypose.scripts.run_pose_training --config bop-DATASET-TRAINING_IMAGES-MODEL_TYPE

where MODEL_TYPE={coarse,refiner}.

</details>