M3Dsynth: A dataset of medical 3D images with AI-generated local manipulations

This is the official repository of the paper: M3Dsynth: A dataset of medical 3D images with AI-generated local manipulations

Giada Zingarini, Davide Cozzolino, Riccardo Corvi, Giovanni Poggi, and Luisa Verdoliva

<p align="center"> <img src="./docs/images.png" alt="preview" width="80%" /> </p>

Overview

The ability to detect manipulated visual content is becoming increasingly important in many application fields, given the rapid advances in image synthesis methods. Of particular concern is the possibility of modifying the content of medical images, altering the resulting diagnoses. Despite its relevance, this issue has received limited attention from the research community. One reason is the lack of large and curated datasets to use for development and benchmarking purposes. Here, we investigate this issue and propose M3Dsynth, a large dataset of manipulated Computed Tomography (CT) lung images. We create manipulated images by injecting or removing lung cancer nodules in real CT scans, using three different methods based on Generative Adversarial Networks (GAN) or Diffusion Models (DM), for a total of 8,577 manipulated samples. Experiments show that these images easily fool automated diagnostic tools. We also tested several state-of-the-art forensic detectors and demonstrated that, once trained on the proposed dataset, they are able to accurately detect and localize manipulated synthetic content, including when training and test sets are not aligned, showing good generalization ability.

Requirements

tqdm pillow numpy pydicom matplotlib

Dataset

The M3Dsynth dataset relies on the Computed Tomography (CT) lung scans of the LIDC-IDRI dataset [A]. To download the real CT scans in DICOM format, see the official web page of the LIDC-IDRI dataset here. The manipulated CT scans can be downloaded here or by using the following script:

bash ./get_M3Dsynth.sh PATH_LIDC_IDRI OUTPUT_DIR_PATH
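Once the real CT scans are available in DICOM format, a series can be assembled into a 3D volume with pydicom and numpy, both listed in the requirements. The following is a minimal sketch, not the loader used by the authors. It assumes that each series sits in a single directory of `.dcm` files and that every slice carries the standard `ImagePositionPatient`, `RescaleSlope`, and `RescaleIntercept` attributes. The function names `stack_slices` and `load_ct_series` are hypothetical.

```python
from pathlib import Path

import numpy as np


def stack_slices(slices):
    """Sort slices by z position and stack them into a (depth, height, width) volume.

    Each slice is expected to expose `ImagePositionPatient` and `pixel_array`,
    as a pydicom dataset of a CT image does. Stored values are rescaled with
    RescaleSlope and RescaleIntercept (defaulting to 1 and 0 when absent).
    """
    ordered = sorted(slices, key=lambda s: float(s.ImagePositionPatient[2]))
    return np.stack([
        s.pixel_array.astype(np.float32) * float(getattr(s, "RescaleSlope", 1.0))
        + float(getattr(s, "RescaleIntercept", 0.0))
        for s in ordered
    ])


def load_ct_series(series_dir):
    """Read every .dcm file in series_dir (assumed layout) and return the volume."""
    import pydicom  # imported here so stack_slices stays usable without pydicom

    files = sorted(Path(series_dir).glob("*.dcm"))
    slices = [pydicom.dcmread(f) for f in files]
    return stack_slices(slices)
```

Calling `load_ct_series` on the path of one series directory returns a `float32` array whose first axis runs along the scan. Sorting by `ImagePositionPatient` rather than by file name is a deliberate choice, since file names are not guaranteed to follow the anatomical order of the slices.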

[A] Armato SG 3rd, et al. "The Lung Image Database Consortium (LIDC) and Image Database Resource Initiative (IDRI): A completed reference database of lung nodules on CT scans." Medical Physics, 2011. doi.org/10.1118/1.3528204

License

The license of the dataset can be found in the LICENSE.md file.

Bibtex

@inproceedings{zingarini2024m3dsynth,
  title={M3Dsynth: A dataset of medical 3D images with AI-generated local manipulations},
  author={Zingarini, Giada and Cozzolino, Davide and Corvi, Riccardo and Poggi, Giovanni and Verdoliva, Luisa},
  booktitle={ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  pages={13176--13180},
  year={2024},
  organization={IEEE}
}