CoMix: Comics Dataset Framework for Comics Understanding

The repo is under development. The codebase is called comix. Please contribute to the project by reporting issues, suggesting improvements or adding new datasets. We are currently working on refactoring the code to support:

Introduction

The purpose of this project is to replicate (on the validation set) the benchmarks presented in:

In particular, one of the main limitations when working with Comics/Manga datasets is that the images cannot be shared. To overcome this problem, we have created this framework, which allows you to use our (validation) annotations and download the images from the original sources without breaking the licenses.

The comix framework uses the following datasets:

Installation

The project is written in Python 3.8. To create a conda environment, consider using:

conda create --name myenv python=3.8
conda activate myenv

and to install the dependencies, run the following command:

pip install -e .

The above command installs the comix package in editable mode, so you can modify the code and see the changes immediately. When benchmarking the detection and captioning models, we will create separate conda environments to avoid dependency conflicts.
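
As a quick, optional sanity check (assuming the package is importable as comix, as described above), you can verify that the editable install resolves to your local clone rather than a copy in site-packages:

# Optional check: with an editable install, the package path should point
# into your cloned repository, not into site-packages.
import comix
print(comix.__file__)  # expected to be a path inside the cloned repo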

Procedures

In general, this project is divided into the following steps:

Model performance and evaluation

In the benchmarks folder, we provide multiple scripts to benchmark the models on the datasets across various tasks. The detection scripts produce a COCO-format JSON file, which the comix/evaluators/detection.py script uses to evaluate model performance. The captioning scripts produce multiple .txt files, which can be post-processed into captions.csv and objects.csv files, used by the comix/evaluators/captioning.py script to evaluate model performance.
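
The exact command-line interface of the evaluator scripts is documented separately, but since the detection output follows the standard COCO results format, here is a minimal sketch of how such a file is typically scored with pycocotools (file names gt.json and predictions.json are hypothetical, and the actual comix/evaluators/detection.py may work differently internally):

# Minimal COCO-style scoring sketch; file names are placeholders and this is
# not necessarily what comix/evaluators/detection.py does internally.
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

coco_gt = COCO("gt.json")                      # ground-truth annotations (COCO format)
coco_dt = coco_gt.loadRes("predictions.json")  # detections produced by a benchmark script

evaluator = COCOeval(coco_gt, coco_dt, iouType="bbox")
evaluator.evaluate()    # match detections to ground truth per image and category
evaluator.accumulate()  # build precision/recall curves
evaluator.summarize()   # print AP/AR summary (e.g. mAP@[.50:.95])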

Documentation

The documentation is available in the /docs folder.

In particular:

docs/
├── README.md                   # Project overview, installation, quick start
├── datasets/                   # Dataset documentation
│   └── README.md               # Unified dataset info
└── tasks/                      # Task-specific documentation
    ├── detection/              # Detection task
    │   ├── README.md           # Overview of detection pipeline
    │   ├── generation.md       # Detection models
    │   └── evaluation.md       # Metrics and evaluation
    └── captioning/             # Captioning task
        ├── README.md           # Overview of captioning pipeline
        ├── generation.md       # VLM caption generation details
        ├── postprocessing.md   # LLaMA post-processing
        └── evaluation.md       # Metrics and evaluation

Here are the most important documents: