CIRCO Dataset (ICCV 2023)
🔥🔥 [2024/05/07] Following the extended version of our paper, the evaluation server now also provides results divided by semantic category.
This is the official repository of the Composed Image Retrieval on Common Objects in context (CIRCO) dataset.
For more details please see our ICCV 2023 paper "Zero-Shot Composed Image Retrieval with Textual Inversion" and its extended version "iSEARLE: Improving Textual Inversion for Zero-Shot Composed Image Retrieval".
You are currently viewing the dataset repository. If you are looking for more information about our method SEARLE, please see the corresponding repository.
Table of Contents
- Overview
- Download
- Test Evaluation Server
- Utility Scripts
- Authors
- Citation
- Acknowledgements
- LICENSE
Overview
CIRCO (Composed Image Retrieval on Common Objects in context) is an open-domain benchmark dataset for Composed Image Retrieval (CIR) based on real-world images from the COCO 2017 unlabeled set. It is the first CIR dataset with multiple ground truths per query, and it aims to address the problem of false negatives in existing datasets. CIRCO comprises a total of 1020 queries, randomly divided into 220 for the validation set and 800 for the test set, with an average of 4.53 ground truths per query. We evaluate performance on CIRCO using mAP@K.
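For reference, below is a minimal sketch of how mAP@K can be computed for queries with multiple ground truths. It follows the standard definition of average precision truncated at rank K; the `evaluation.py` script in this repository (see Utility Scripts) is the authoritative implementation, so treat this only as an illustration.

```python
# Sketch of mAP@K with multiple ground truths per query.
# Standard definition (AP truncated at rank K, normalized by
# min(K, #GT)); src/evaluation.py is authoritative.

def average_precision_at_k(retrieved_ids, gt_img_ids, k=50):
    """AP@K for a single query."""
    gt_set = set(gt_img_ids)
    hits, precision_sum = 0, 0.0
    for rank, img_id in enumerate(retrieved_ids[:k], start=1):
        if img_id in gt_set:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / min(k, len(gt_set))

def map_at_k(all_retrieved, all_gt, k=50):
    """mAP@K averaged over all queries."""
    aps = [average_precision_at_k(r, g, k) for r, g in zip(all_retrieved, all_gt)]
    return sum(aps) / len(aps)
```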
Download
CIRCO is structured similarly to CIRR and FashionIQ, two popular datasets for CIR.
Start by cloning the repository:
git clone https://github.com/miccunifi/CIRCO.git
Annotations
The annotations are provided in the `annotations` folder. For each split, a JSON file contains the list of the corresponding annotations. Each annotation comprises the following fields:
- `reference_img_id`: the id of the reference image;
- `target_img_id`: the id of the target image (the one we used to write the relative caption);
- `relative_caption`: the relative caption of the target image;
- `shared_concept`: the shared concept between the reference and target images (useful to clarify ambiguities);
- `gt_img_ids`: the list of ground truth images;
- `id`: the id of the query;
- `semantic_aspects`: the list of semantic aspects that characterize the query.

An example annotation:
{
"reference_img_id": 85932,
"target_img_id": 9761,
"relative_caption": "is held by a little girl on a chair",
"shared_concept": "a teddy bear",
"gt_img_ids": [
9761,
489535,
57541,
375057,
119881
],
"id": 13,
"semantic_aspects": [
"spatial_relations_background",
"direct_addressing"
]
}
Note that:
- `target_img_id`, `gt_img_ids` and `semantic_aspects` are not available for the test set;
- `target_img_id` always corresponds to the first element of `gt_img_ids`.
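As a quick sanity check after downloading, the validation annotations can be inspected with a few lines of Python. This is a minimal sketch; the path assumes the layout shown in the Data Structure section below.

```python
import json

# Minimal sketch: inspect the validation annotations.
# Adjust the path to match your setup.
with open("CIRCO/annotations/val.json") as f:
    annotations = json.load(f)  # list of annotation dicts

query = annotations[0]
print(query["relative_caption"])
print("ground truths:", query["gt_img_ids"])

# The target image is always the first ground truth:
assert query["target_img_id"] == query["gt_img_ids"][0]
```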
Images
CIRCO is based on images taken from the COCO 2017 unlabeled set. Please see the COCO website to download both the images and the corresponding annotations.
Tip: sometimes when clicking on the download link from the COCO website, the download does not start. In this case, copy the download link and paste it into a new browser tab.
Create a folder named `COCO2017_unlabeled` in the `CIRCO` folder:
cd CIRCO
mkdir COCO2017_unlabeled
After downloading the images and the annotations, unzip and move them to the `COCO2017_unlabeled` folder.
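If the browser download keeps stalling, the archives can also be fetched and extracted programmatically. The following is a minimal Python sketch; the URLs are the standard COCO download links, so double-check them against the COCO website before relying on them.

```python
import urllib.request
import zipfile
from pathlib import Path

# Standard COCO download links; verify them against
# https://cocodataset.org/#download before relying on this sketch.
URLS = [
    "http://images.cocodataset.org/zips/unlabeled2017.zip",
    "http://images.cocodataset.org/annotations/image_info_unlabeled2017.zip",
]

dest = Path("CIRCO/COCO2017_unlabeled")
dest.mkdir(parents=True, exist_ok=True)

for url in URLS:
    archive = dest / url.rsplit("/", 1)[-1]
    if not archive.exists():
        print(f"downloading {url} ...")
        urllib.request.urlretrieve(url, archive)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest)  # yields unlabeled2017/ and annotations/
```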
Data Structure
After the download, the data structure should be as follows:
CIRCO
└── annotations
    | test.json
    | val.json
└── COCO2017_unlabeled
    └── annotations
        | image_info_unlabeled2017.json
    └── unlabeled2017
        | 000000243611.jpg
        | 000000535009.jpg
        | 000000097553.jpg
        | ...
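As a quick check that everything is in place, a small sketch like the following can be used to verify the layout (the paths simply mirror the tree above):

```python
from pathlib import Path

# Quick sanity check that the expected files are in place.
root = Path("CIRCO")
expected = [
    root / "annotations" / "val.json",
    root / "annotations" / "test.json",
    root / "COCO2017_unlabeled" / "annotations" / "image_info_unlabeled2017.json",
    root / "COCO2017_unlabeled" / "unlabeled2017",
]
for path in expected:
    print("ok     " if path.exists() else "MISSING", path)
```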
Test Evaluation Server
We do not release the ground truth labels for the CIRCO test split. Instead, we host an evaluation server so that researchers can evaluate their models on the test split. The server is hosted independently, so please email us if the site is unreachable.
Once you have submitted your predictions, you will receive an email with the results.
Submission Format
The evaluation server accepts a JSON file where the keys are the query ids and the values are the lists of the top 50 retrieved images.
Note that:
- the submission file must contain all the queries in the test set;
- to limit the size of the submission file, you must submit only the top 50 retrieved images for each query.
The submission file should be formatted as the following example:
{
"0": [
9761,
489535,
57541,
375057,
119881,
...
],
"1": [
9761,
489535,
57541,
375057,
119881,
...
],
...
"799": [
9761,
489535,
57541,
375057,
119881,
...
]
}
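As an illustration, a file in this format can be produced from per-query rankings with a few lines of Python. This is a minimal sketch; `rankings` is a hypothetical variable standing in for your model's output.

```python
import json

# `rankings` is a hypothetical stand-in for your model's output:
# it maps each test query id to its ranked list of COCO image ids.
rankings = {0: [9761, 489535, 57541], 1: [119881, 375057]}  # ...

submission = {
    str(query_id): retrieved[:50]  # string keys, top 50 images only
    for query_id, retrieved in rankings.items()
}

with open("submission_test.json", "w") as f:
    json.dump(submission, f)
```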
Under the `submission_examples/` directory, we provide two examples of submission files: one for the validation set and one for the test set.
Since we release the GT labels for the validation set, you do not need to submit a file to evaluate your model on it; we provide the validation submission file only so that you can test the evaluation server code through the `evaluation.py` script. See here for an example of how to generate a submission file.
Utility Scripts
Under the `src/` directory, we provide some utility scripts to help you use the dataset:
- `dataset.py`: contains the `CIRCODataset` class, a PyTorch `Dataset` that can be used to load the dataset (a usage sketch follows below);
- `evaluation.py`: contains the code that runs on the server to evaluate the submitted predictions.
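As an illustration, loading the dataset through `CIRCODataset` might look like the sketch below. Note that the constructor arguments shown are assumptions based on common CIR dataset interfaces, not the documented signature; check `src/dataset.py` for the actual one.

```python
from src.dataset import CIRCODataset

# Hypothetical usage: the argument names below are assumptions based
# on common CIR dataset interfaces, NOT the documented signature.
# Check src/dataset.py for the actual constructor.
dataset = CIRCODataset(
    dataset_path="CIRCO",  # root containing annotations/ and COCO2017_unlabeled/
    split="val",           # "val" or "test"
    preprocess=None,       # optional image transform
)
print(len(dataset), "queries")
```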
Authors
Alberto Baldrati*, Lorenzo Agnolucci*, Marco Bertini, Alberto Del Bimbo

* Equal contribution. Author ordering was determined by coin flip.
Citation
@misc{baldrati2023zeroshot,
title={Zero-Shot Composed Image Retrieval with Textual Inversion},
author={Alberto Baldrati and Lorenzo Agnolucci and Marco Bertini and Alberto Del Bimbo},
year={2023},
eprint={2303.15247},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
Acknowledgements
This work was partially supported by the European Commission under European Horizon 2020 Programme, grant number 101004545 - ReInHerit.
LICENSE
<a rel="license" href="http://creativecommons.org/licenses/by-nc/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-nc/4.0/88x31.png" /></a><br />All material is made available under Creative Commons BY-NC 4.0. You can use, redistribute, and adapt the material for non-commercial purposes, as long as you give appropriate credit by citing our paper and indicate any changes that you've made.