Awesome Comics Understanding
This repository contains a curated list of research papers and resources focusing on Comics Understanding.
One missing piece in Vision and Language: A Survey on Comics Understanding
Authors: Emanuele Vivoli, Andrey Barsky, Mohamed Ali Souibgui, Artemis Llabrés, Marco Bertini, Dimosthenis Karatzas
Latest News
- This repo is a work in progress; contributions are welcome (see How to Contribute)
14 September 2024
Our survey paper is now available on arXiv!
Table of Contents
Overview of Vision-Language Tasks of the Layers of Comics Understanding. The ranking is based on input and output modalities and dimensions, as illustrated in the paper.
<p align="center"> <img src="imgs/locu.png" style="max-width:1000px"> </p>
Layers of Comics Understanding
Every survey worthy of the name includes illustrative visuals to aid understanding. We follow this approach by providing an example image for each task in the Layers of Comics Understanding.
Check the example image for each layer's tasks below.
Layer 1: Tagging and Augmentation
-
Tagging
<p align="left"> <img src="imgs-tasks/1.tagging.png" style="max-width:500px"> </p>
-
<details>
<summary>Image classification</summary>

| Year | Conference / Journal | Title | Authors | Links |
| --- | --- | --- | --- | --- |
| 2023 | TIP | Panel-Page-Aware Comic Genre Understanding | Xu, Chenshu et al. | Paper |
| 2019 | ICDAR Workshop | Analysis Based on Distributed Representations of Various Parts Images in Four-Scene Comics Story Dataset | Terauchi, Akira et al. | Paper |
| 2018 | TPAMI | Learning Consensus Representation for Weak Style Classification | Jiang, Shuhui et al. | Paper |
| 2018 | ICDAR | Comic Story Analysis Based on Genre Classification | Daiku, Yuki et al. | Paper |
| 2017 | ICDAR | Histogram of Exclamation Marks and Its Application for Comics Analysis | Hiroe, Sotaro et al. | Paper |
| 2014 | ACM Multimedia | Line-Based Drawing Style Description for Manga Classification | Chu, Wei-Ta et al. | Paper |
-
<details>
<summary>Emotion classification</summary>

| Year | Conference / Journal | Title | Authors | Links |
| --- | --- | --- | --- | --- |
| 2023 | MMM | Manga Text Detection with Manga-Specific Data Augmentation and Its Applications on Emotion Analysis | Yang, Yi-Ting et al. | Paper |
| 2021 | ICDAR | Competition on Multimodal Emotion Recognition on Comics Scenes | Nguyen, Nhu-Van et al. | Paper, Code |
| 2016 | MANPU (ICPR) | Manga Content Analysis Using Physiological Signals | Sanches, Charles Lima et al. | Paper |
| 2015 | IIAI-AAI | Relation Analysis between Speech Balloon Shapes and Their Serif Descriptions in Comic | Tanaka, Hideki et al. | Paper |
-
<details>
<summary>Action Detection</summary>

| Year | Conference / Journal | Title | Authors | Links |
| --- | --- | --- | --- | --- |
| 2024 | Arxiv | MangaUB: A Manga Understanding Benchmark for Large Multimodal Models | Ikuta, Hikaru et al. | Paper |
| 2024 | MANPU (ICDAR) | ComicBERT: A Transformer Model and Pre-training Strategy for Contextual Understanding in Comics | Soykan, Gurkan et al. | Paper, Code |
| 2024 | ICDAR | Multimodal Transformer for Comics Text-Cloze | Vivoli, Emanuele et al. | Paper |
| 2020 | Arxiv | A Comprehensive Study of Deep Video Action Recognition | Zhu, Yi et al. | Paper, Code |
| 2017 | CVPR | The Amazing Mysteries of the Gutter: Drawing Inferences Between Panels in Comic Book Narratives | Iyyer, Mohit et al. | Paper |
-
<details>
<summary>Page Stream Segmentation</summary>

| Year | Conference / Journal | Title | Authors | Links |
| --- | --- | --- | --- | --- |
| 2022 | ICPR | Semantic Parsing of Interpage Relations | Demirtaş, Mehmet Arif et al. | Paper |
| 2018 | LREC | Page Stream Segmentation with Convolutional Neural Nets Combining Textual and Visual Features | Wiedemann, Gregor et al. | Paper |
| 2013 | ICDAR | Document Classification and Page Stream Segmentation for Digital Mailroom Applications | Gordo, Albert et al. | Paper |
-
Augmentation ( Image-2-Image )
<p align="left"> <img src="imgs-tasks/2.augmentation.png" style="max-width:500px"> </p>
-
<details>
<summary>Image Super-Resolution</summary>

| Year | Conference / Journal | Title | Authors | Links |
| --- | --- | --- | --- | --- |
| 2023 | MTA | Automatic Dewarping of Camera-Captured Comic Document Images | Garai, Arpan et al. | Paper |
-
<details>
<summary>Style Transfer</summary>

| Year | Conference / Journal | Title | Authors | Links |
| --- | --- | --- | --- | --- |
| 2023 | Arxiv | Inkn'hue: Enhancing Manga Colorization from Multiple Priors with Alignment Multi-Encoder VAE | Jiramahapokee, Tawin | Paper, Code |
| 2023 | IEEE Access | Robust Manga Page Colorization via Coloring Latent Space | Golyadkin, Maksim et al. | Paper |
| 2023 | TVCG | Shading-Guided Manga Screening from Reference | Wu, Huisi et al. | Paper |
| 2022 | Arxiv | DASS-Detector: Domain-Adaptive Self-Supervised Pre-Training for Face & Body Detection in Drawings | Topal, Barış Batuhan et al. | Paper, Code |
| 2021 | CVPR | Generating Manga from Illustrations via Mimicking Manga Creation Workflow | Zhang, LM et al. | Paper, Code |
| 2021 | CVPR | Unbiased Mean Teacher for Cross-domain Object Detection | Deng, Jinhong et al. | Paper, Code |
| 2021 | CVPR | Encoding in Style: A StyleGAN Encoder for Image-to-Image Translation | Richardson, Elad et al. | Paper, Code |
| 2021 | AAAI | MangaGAN: Unpaired Photo-to-Manga Translation Based on The Methodology of Manga Drawing | Su, Hao et al. | Paper |
| 2019 | ISM | Synthesis of Screentone Patterns of Manga Characters | Tsubota, K. et al. | Paper, Code |
| 2018 | SciVis | Color Interpolation for Non-Euclidean Color Spaces | Zeyen, Max et al. | Paper |
| 2017 | ACM-SIGGRAPH Asia | Comicolorization: Semi-automatic Manga Colorization | Furusawa, Chie et al. | Paper, Code |
| 2017 | ICDAR | CGAN-Based Manga Colorization Using a Single Training Image | Hensman, Paulina et al. | Paper, Code |
| 2017 | CVPR | Image-to-Image Translation with Conditional Adversarial Networks | Isola, Phillip et al. | Paper |
| 2017 | ACM-TG | Deep Extraction of Manga Structural Lines | Li, Chengze et al. | Paper, Code |
-
<details>
<summary>Vectorization</summary>

| Year | Conference / Journal | Title | Authors | Links |
| --- | --- | --- | --- | --- |
| 2023 | TCSVT | MARVEL: Raster Gray-level Manga Vectorization via Primitive-wise Deep Reinforcement Learning | H. Su et al. | Paper, Code |
| 2022 | CVPR | Towards Layer-wise Image Vectorization | Ma, Xu et al. | Paper, Code |
| 2017 | ACM-TG | Deep Extraction of Manga Structural Lines | Li, Chengze et al. | Paper, Code |
| 2017 | TVCG | Manga Vectorization and Manipulation with Procedural Simple Screentone | Yao, Chih-Yuan et al. | Paper |
| 2011 | ACM-SIGGRAPH | Depixelizing Pixel Art | Kopf, Johannes et al. | Paper |
| 2003 | N/A | Potrace: A Polygon-Based Tracing Algorithm | Selinger, Peter | Paper, Code |
-
<details>
<summary>Depth Estimation</summary>

| Year | Conference / Journal | Title | Authors | Links |
| --- | --- | --- | --- | --- |
| 2023 | CVPR Workshop | Dense Multitask Learning to Reconfigure Comics | Bhattacharjee, Deblina et al. | Paper |
| 2022 | WACV | Estimating Image Depth in the Comics Domain | Bhattacharjee, Deblina et al. | Paper, Code |
| 2022 | CVPR | MulT: An End-to-End Multitask Learning Transformer | Bhattacharjee, Deblina et al. | Paper, Code |
Layer 2: Grounding, Analysis and Segmentation
-
Grounding
<p align="left"> <img src="imgs-tasks/3.grounding.png" style="max-width:500px"> </p>
-
<details>
<summary>Object detection</summary>

| Year | Conference / Journal | Title | Authors | Links |
| --- | --- | --- | --- | --- |
| 2024 | MANPU (ICDAR) | Comics Datasets Framework: Mix of Comics datasets for detection benchmarking | Vivoli, Emanuele et al. | Paper |
| 2024 | MANPU (ICDAR) | A Comprehensive Gold Standard and Benchmark for Comics Text Detection and Recognition | Gurkan Soykan et al. | Paper, Code |
| 2023 | MMM | Manga Text Detection with Manga-Specific Data Augmentation and Its Applications on Emotion Analysis | Yang, Yi-Ting et al. | Paper |
| 2023 | CSNT | CPD: Faster RCNN-based DragonBall Comic Panel Detection | Sharma, Rishabh et al. | Paper |
| 2022 | IJDAR | BCBId: First Bangla Comic Dataset and Its Applications | Dutta, Arpita et al. | Paper |
| 2022 | ECCV | COO: Comic Onomatopoeia Dataset for Recognizing Arbitrary or Truncated Texts | Baek, Jeonghun et al. | Paper, Code |
| 2019 | ICDAR Workshop | What Do We Expect from Comic Panel Extraction? | Nguyen Nhu, Van et al. | Paper |
| 2019 | ICDAR Workshop | CNN Based Extraction of Panels/Characters from Bengali Comic Book Page Images | Dutta, Arpita et al. | Paper |
| 2018 | VCIP | Text Detection in Manga by Deep Region Proposal, Classification, and Regression | Chu, Wei-Ta et al. | Paper |
| 2018 | IWAIT | A Study on Object Detection Method from Manga Images Using CNN | Yanagisawa, Hideaki et al. | Paper |
| 2017 | ICDAR | A Faster R-CNN Based Method for Comic Characters Face Detection | Qin, Xiaoran et al. | Paper |
| 2016 | IJCG | Text-Aware Balloon Extraction from Manga | Liu, Xueting et al. | Paper |
| 2016 | IJCNN | Line-Wise Text Identification in Comic Books: A Support Vector Machine-Based Approach | Pal, Srikanta et al. | Paper |
| 2016 | ICIP | Text Detection in Manga by Combining Connected-Component-Based and Region-Based Classifications | Aramaki, Yuji et al. | Paper |
| 2015 | ICIAP | Panel Tracking for the Extraction and the Classification of Speech Balloons | Jomaa, Hadi S. et al. | Paper |
| 2012 | DAS | Panel and Speech Balloon Extraction from Comic Books | Ho, Anh Khoi Ngo et al. | Paper |
| 2011 | IJI | Method for Real Time Text Extraction of Digital Manga Comic | Arai, Kohei et al. | Paper |
| 2011 | ICDAR | Recognizing Text Elements for SVG Comic Compression and Its Novel Applications | Su, Chung-Yuan et al. | Paper |
| 2010 | ICIT | Method for Automatic E-Comic Scene Frame Extraction for Reading Comic on Mobile Devices | Arai, Kohei et al. | Paper |
| 2009 | IJHCI | Enhancing the Accessibility for All of Digital Comic Books | Ponsard, Christophe | Paper |
-
<details>
<summary>Character Re-Identification</summary>

| Year | Conference / Journal | Title | Authors | Links |
| --- | --- | --- | --- | --- |
| 2024 | Arxiv | Tails Tell Tales: Chapter-Wide Manga Transcriptions with Character Names | Ragav Sachdeva et al. | Paper, Code |
| 2024 | NeurIPS | CoMix: A Comprehensive Benchmark for Multi-Task Comic Understanding | Emanuele Vivoli et al. | Paper, Code |
| 2024 | CVPR | The Manga Whisperer: Automatically Generating Transcriptions for Comics | Sachdeva, Ragav et al. | Paper, Code |
| 2023 | IET Image Processing | Toward Cross-Domain Object Detection in Artwork Images Using Improved YoloV5 and XGBoosting | Ahmad, Tasweer et al. | Paper |
| 2023 | Arxiv | Identity-Aware Semi-Supervised Learning for Comic Character Re-Identification | Soykan, Gürkan et al. | Paper |
| 2023 | ACM-MM Asia | Occlusion-Aware Manga Character Re-Identification with Self-Paced Contrastive Learning | Zhang, Ci-Yin et al. | Paper |
| 2022 | Arxiv | Unsupervised Manga Character Re-Identification via Face-Body and Spatial-Temporal Associated Clustering | Zhang, Z et al. | Paper |
| 2022 | ICIR | CAST: Character Labeling in Animation Using Self-supervision by Tracking | Nir, Oron et al. | Paper |
| 2020 | ICPR | Dual Loss for Manga Character Recognition with Imbalanced Training Data | Li, Yonggang et al. | Paper |
| 2020 | ICML | A Simple Framework for Contrastive Learning of Visual Representations | Chen, Ting et al. | Paper |
| 2015 | ACPR | Similarity Learning Based on Pool-Based Active Learning for Manga Character Retrieval | Iwata, Motoi et al. | Paper |
| 2014 | DAS | A Study to Achieve Manga Character Retrieval Method for Manga Images | Iwata, M. et al. | Paper |
| 2012 | CVPR | Color Attributes for Object Detection | Khan, Fahad Shahbaz et al. | Paper |
| 2012 | ECCV | PHOG Analysis of Self-Similarity in Aesthetic Images | Redies, Christoph et al. | Paper |
| 2011 | ICDAR | Similar Manga Retrieval Using Visual Vocabulary Based on Regions of Interest | Sun, Weihan et al. | Paper |
-
<details>
<summary>Sentence-based Grounding</summary>

| Year | Conference / Journal | Title | Authors | Links |
| --- | --- | --- | --- | --- |
| 2024 | AAAI | GroundVLP: Harnessing Zero-shot Visual Grounding from Vision-Language Pre-training and Open-Vocabulary Object Detection | Shen, Haozhan et al. | Paper, Code |
| 2024 | ECCV | Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection | Liu, Shilong et al. | Paper, Code |
| 2020 | CVPR Workshop | Exploring Phrase Grounding without Training: Contextualisation and Extension to Text-Based Image Retrieval | Parcalabescu, Letitia et al. | Paper |
| 2019 | AAAI | Zero-Shot Object Detection with Textual Descriptions | Li, Zhihui et al. | Paper |
-
Analysis
<p align="left"> <img src="imgs-tasks/4.analysis.png" style="max-width:500px"> </p>
-
<details>
<summary>Text-Character association</summary>

| Year | Conference / Journal | Title | Authors | Links |
| --- | --- | --- | --- | --- |
| 2024 | Arxiv | Tails Tell Tales: Chapter-Wide Manga Transcriptions with Character Names | Ragav Sachdeva et al. | Paper, Code |
| 2024 | NeurIPS | CoMix: A Comprehensive Benchmark for Multi-Task Comic Understanding | Emanuele Vivoli et al. | Paper, Code |
| 2024 | MANPU (ICDAR) | Spatially Augmented Speech Bubble to Character Association via Comic Multi-task Learning | Soykan, Gurkan et al. | Paper, Code |
| 2024 | CVPR | The Manga Whisperer: Automatically Generating Transcriptions for Comics | Sachdeva, Ragav et al. | Paper, Code |
| 2023 | arXiv | Manga109Dialog: A Large-scale Dialogue Dataset for Comics Speaker Detection | Li, Yingxuan et al. | Paper |
| 2022 | IIAI-AAI | Algorithms for Estimation of Comic Speakers Considering Reading Order of Frames and Texts | Omori, Yuga et al. | Paper |
| 2019 | IJDAR | Comic MTL: Optimized Multi-Task Learning for Comic Book Image Analysis | Nguyen, Nhu-Van et al. | Paper |
| 2015 | ICDAR | Speech Balloon and Speaker Association for Comics and Manga Understanding | Rigaud, Christophe et al. | Paper |
-
<details>
<summary>Panel Sorting</summary>

| Year | Conference / Journal | Title | Authors | Links |
| --- | --- | --- | --- | --- |
| 2017 | ICDAR | Story Pattern Analysis Based on Scene Order Information in Four-Scene Comics | Ueno, Miki et al. | Paper |
-
<details>
<summary>Dialog transcription</summary>

| Year | Conference / Journal | Title | Authors | Links |
| --- | --- | --- | --- | --- |
| 2024 | Arxiv | Tails Tell Tales: Chapter-Wide Manga Transcriptions with Character Names | Ragav Sachdeva et al. | Paper, Code |
| 2024 | NeurIPS | CoMix: A Comprehensive Benchmark for Multi-Task Comic Understanding | Emanuele Vivoli et al. | Paper, Code |
| 2024 | CVPR | The Manga Whisperer: Automatically Generating Transcriptions for Comics | Sachdeva, Ragav et al. | Paper, Code |
| 2023 | arXiv | Manga109Dialog: A Large-scale Dialogue Dataset for Comics Speaker Detection | Li, Yingxuan et al. | Paper |
-
<details>
<summary>Translation</summary>

| Year | Conference / Journal | Title | Authors | Links |
| --- | --- | --- | --- | --- |
| 2024 | ArXiv | Context-Informed Machine Translation of Manga using Multimodal Large Language Models | Lippmann, Philip et al. | Paper, Code |
| 2024 | ArXiv | Large Language Models as Manga Translators: A Case Study | Zhishen Yang et al. | Paper |
| 2024 | ArXiv | Generating Visual Stories with Grounded and Coreferent Characters | Danyang Liu et al. | Paper |
| 2024 | ICKECS | The Future of Graphic Novel Translation: Fully Automated Systems | Sandeep Singh et al. | Paper |
| 2024 | ACM Multimedia | Zero-Shot Character Identification and Speaker Prediction in Comics via Iterative Multimodal Fusion | Yingxuan Li et al. | Paper |
| 2023 | ArXiv | Multi-Teacher Knowledge Distillation For Text Image Machine Translation | Cong Ma et al. | Paper |
| 2020 | ArXiv | Towards Fully Automated Manga Translation | Ryota Hinami et al. | Paper |
| 2014 | inTRAlinea | Visual adaptation in translated comics | Federico, Zanettin | Paper |
-
Segmentation
<p align="left"> <img src="imgs-tasks/5.segmentation.png" style="max-width:500px"> </p>
-
<details>
<summary>Instance Segmentation</summary>

| Year | Conference / Journal | Title | Authors | Links |
| --- | --- | --- | --- | --- |
| 2024 | AI4VA (ECCV) | Unlocking Comics: The AI4VA Dataset for Visual Understanding | Grönquist, Peter et al. | Paper, Code |
| 2024 | ICDAR | Investigating Neural Networks and Transformer Models for Enhanced Comic Decoding | Kouletou, Eleanna et al. | Paper |
| 2022 | DataverseNL | The Visual Language Research Corpus (VLRC) Project | Cohn, Neil | Paper |
Layer 3: Retrieval and Modification
-
Retrieval
<p align="left"> <img src="imgs-tasks/6.retrieval.png" style="max-width:500px"> </p>
-
<details>
<summary>Image-Text Retrieval</summary>

| Year | Conference / Journal | Title | Authors | Links |
| --- | --- | --- | --- | --- |
| 2014 | DAS | A Study to Achieve Manga Character Retrieval Method for Manga Images | Iwata, M. et al. | Paper |
| 2011 | ICDAR | Similar Manga Retrieval Using Visual Vocabulary Based on Regions of Interest | Sun, Weihan et al. | Paper |
| 2011 | CAVW | Comic Character Animation Using Bayesian Estimation | Chou, Yun-Feng et al. | Paper |
| 2010 | ICGC | Searching Digital Political Cartoons | Wu, Yejun | Paper |
-
<details>
<summary>Text-Image Retrieval</summary>

| Year | Conference / Journal | Title | Authors | Links |
| --- | --- | --- | --- | --- |
| 2014 | ICIP | Sketch2Manga: Sketch-based Manga Retrieval | Matsui, Yusuke et al. | Paper |
-
<details>
<summary>Composed Image Retrieval</summary>

| Year | Conference / Journal | Title | Authors | Links |
| --- | --- | --- | --- | --- |
| 2023 | Arxiv | MaRU: A Manga Retrieval and Understanding System Connecting Vision and Language | Shen, Conghao Tom et al. | Paper |
| 2022 | DICTA | ComicLib: A New Large-Scale Comic Dataset for Sketch Understanding | Wei, Xin et al. | Paper |
| 2021 | ICDAR | Manga-MMTL: Multimodal Multitask Transfer Learning for Manga Character Analysis | Nguyen, Nhu-Van et al. | Paper |
| 2017 | ICDAR | Sketch-Based Manga Retrieval Using Deep Features | Narita, Rei et al. | Paper |
| 2017 | Arxiv | A Neural Representation of Sketch Drawings | Ha, David et al. | Paper |
| 2017 | Arxiv | Style Transfer for Anime Sketches with Enhanced Residual U-net and Auxiliary Classifier GAN | Zhang, Lvmin et al. | Paper |
| 2015 | MM-TA | Sketch-Based Manga Retrieval Using Manga109 Dataset | Matsui, Yusuke et al. | Paper |
-
<details>
<summary>Personalized Image Retrieval</summary>

| Year | Conference / Journal | Title | Authors | Links |
| --- | --- | --- | --- | --- |
| 2022 | BMVC | Personalised CLIP or: How to Find Your Vacation Videos | Korbar, Bruno et al. | Paper |
-
Modification
<p align="left"> <img src="imgs-tasks/7.modification.png" style="max-width:500px"> </p>
-
<details>
<summary>Image Inpainting and Editing</summary>

| Year | Conference / Journal | Title | Authors | Links |
| --- | --- | --- | --- | --- |
| 2022 | ACM-UIST | CodeToon: Story Ideation, Auto Comic Generation, and Structure Mapping for Code-Driven Storytelling | Suh, Sangho et al. | Paper |
| 2022 | TVCG | Interactive Data Comics | Wang, Zezhong et al. | Paper |
Layer 4: Understanding
-
Understanding
<p align="left"> <img src="imgs-tasks/8.understanding.png" style="max-width:500px"> </p>
-
<details>
<summary>Visual Entailment</summary>

| Year | Conference / Journal | Title | Authors | Links |
| --- | --- | --- | --- | --- |
-
<details>
<summary>Visual-Question Answer</summary>

| Year | Conference / Journal | Title | Authors | Links |
| --- | --- | --- | --- | --- |
| 2022 | WACV | Challenges in Procedural Multimodal Machine Comprehension: A Novel Way To Benchmark | Sahu, Pritish et al. | Paper |
| 2021 | Arxiv | Towards Solving Multimodal Comprehension | Sahu, Pritish et al. | Paper |
| 2020 | MDPI-AS | A Survey on Machine Reading Comprehension - Tasks, Evaluation Metrics and Benchmark Datasets | Zeng, Changchang et al. | Paper |
| 2017 | IIWAS | ComicQA: Contextual Navigation Aid by Hyper-Comic Representation | Sumi, Yasuyuki et al. | Paper |
| 2016 | MANPU (ICPR) | Designing a Question-Answering System for Comic Contents | Moriyama, Yukihiro et al. | Paper |
-
<details>
<summary>Visual-Dialog</summary>

| Year | Conference / Journal | Title | Authors | Links |
| --- | --- | --- | --- | --- |
-
<details>
<summary>Visual Reasoning</summary>

| Year | Conference / Journal | Title | Authors | Links |
| --- | --- | --- | --- | --- |
Layer 5: Generation and Synthesis
-
Generation
<p align="left"> <img src="imgs-tasks/9.generation.png" style="max-width:500px"> </p>
-
<details>
<summary>Comics generation from other media</summary>

| Year | Conference / Journal | Title | Authors | Links |
| --- | --- | --- | --- | --- |
| 2023 | SIGCSE | Developing Comic-based Learning Toolkits for Teaching Computing to Elementary School Learners | Castro, Francico et al. | Paper |
| 2022 | THMS | Augmenting Conversations With Comic-Style Word Balloons | Zhang, H. et al. | Paper |
| 2022 | LACLO | Comics as a Pedagogical Tool for Teaching | Lima, Antonio Alexandre et al. | Paper |
| 2021 | TVCG | ChartStory: Automated Partitioning, Layout, and Captioning of Charts into Comic-Style Narratives | Zhao, Jian et al. | Paper |
| 2021 | SIGCSE | Using Comics to Introduce and Reinforce Programming Concepts in CS1 | Suh, Sangho et al. | Paper |
| 2021 | MM-CCA | Automatic Comic Generation with Stylistic Multi-page Layouts and Emotion-driven Text Balloon Generation | Yang, Xin et al. | Paper |
| 2018 | ACM | Comixify: Transform Video into a Comics | Pesko, Maciej et al. | Paper |
| 2015 | TOMM | Content-Aware Video2Comics With Manga-Style Layout | Jing, Guangmei et al. | Paper |
| 2012 | TOMM | Movie2Comics: Towards a Lively Video Content Presentation | Wang, Meng et al. | Paper |
| 2012 | ACM-TG | Automatic Stylistic Manga Layout | Cao, Ying et al. | Paper |
| 2012 | TOMM | Scalable Comic-like Video Summaries and Layout Disturbance | Herranz, Luis et al. | Paper |
| 2011 | ACM-MM | Automatic Preview Generation of Comic Episodes for Digitized Comic Search | Hoashi, Keiichiro et al. | Paper |
| 2011 | ISPACS | Automatic Comic Strip Generation Using Extracted Keyframes from Cartoon Animation | Tanapichet, Pakpoom et al. | Paper |
| 2011 | ICMLC | Caricaturation for Human Face Pictures | Chang, I-Cheng et al. | Paper |
| 2010 | SICE | Comic Live Chat Communication Tool Based on Concept of Downgrading | Matsuda, Misaki et al. | Paper |
| 2010 | CAIDCD | Research and Development of the Generation in Japanese Manga Based on Frontal Face Image | Xuexiong, Deng et al. | Paper |
-
<details>
<summary>Comics to Scene graph</summary>

| Year | Conference / Journal | Title | Authors | Links |
| --- | --- | --- | --- | --- |
-
<details>
<summary>Image-2-Text Generation</summary>

| Year | Conference / Journal | Title | Authors | Links |
| --- | --- | --- | --- | --- |
| 2024 | NeurIPS | Cracking the Code of Juxtaposition: Can AI Models Understand the Humorous Contradictions | Hu, Zhe et al. | Paper |
| 2024 | AI4VA (ECCV) | ComiCap: A VLMs pipeline for dense captioning of Comic Panels | Vivoli, Emanuele et al. | Paper |
| 2024 | ICDAR | Multimodal Transformer for Comics Text-Cloze | Vivoli, Emanuele et al. | Paper |
| 2024 | MANPU (ICDAR) | Toward Accessible Comics for Blind and Low Vision Readers | Rigaud, Christophe et al. | Paper |
| 2024 | CVPR | The Manga Whisperer: Automatically Generating Transcriptions for Comics | Sachdeva, Ragav et al. | Paper, Code |
| 2023 | Arxiv | Comics for Everyone: Generating Accessible Text Descriptions for Comic Strips | Ramaprasad, Reshma et al. | Paper |
| 2023 | ACL | Multimodal Persona Based Generation of Comic Dialogs | Agrawal, Harsh et al. | Paper |
| 2023 | Arxiv | M2C: Towards Automatic Multimodal Manga Complement | Guo, Hongcheng et al. | Paper |
-
<details>
<summary>Text-2-Image Generation</summary>

| Year | Conference / Journal | Title | Authors | Links |
| --- | --- | --- | --- | --- |
| 2023 | ICCV | Diffusion in Style | Everaert, Martin Nicolas et al. | Paper, Code |
| 2023 | MDPI-AS | A Study on Generating Webtoons Using Multilingual Text-to-Image Models | Yu, Kyungho et al. | Paper |
| 2023 | Arxiv | Generating Coherent Comic with Rich Story Using ChatGPT and Stable Diffusion | Jin, Ze et al. | Paper |
| 2022 | ISM | Conditional GAN for Small Datasets | Hiruta, Komei et al. | Paper |
| 2021 | NAACL | Improving Generation and Evaluation of Visual Stories via Semantic Consistency | Maharana, Adyasha et al. | Paper, Code |
| 2021 | CoRR | Integrating Visuospatial, Linguistic and Commonsense Structure into Story Visualization | Maharana, Adyasha et al. | Paper, Code |
| 2021 | ICCC | A Deep Learning Pipeline for the Synthesis of Graphic Novels | Melistas, Thomas et al. | Paper |
| 2021 | Arxiv | ComicGAN: Text-to-Comic Generative Adversarial Network | Proven-Bessel, Ben et al. | Paper, Code |
| 2019 | CVPR | StoryGAN: A Sequential Conditional GAN for Story Visualization | Li, Yitong et al. | Paper, Code |
| 2018 | CVPR | Cross-Domain Weakly-Supervised Object Detection through Progressive Domain Adaptation | Inoue, Naoto et al. | Paper, Code |
| 2017 | Arxiv | Towards the Automatic Anime Characters Creation with Generative Adversarial Networks | Jin, Yanghua et al. | Paper, Code |
-
<details>
<summary>Scene-graph Generation for captioning</summary>

| Year | Conference / Journal | Title | Authors | Links |
| --- | --- | --- | --- | --- |
-
<details>
<summary>Sound generation</summary>

| Year | Conference / Journal | Title | Authors | Links |
| --- | --- | --- | --- | --- |
| 2023 | ACM-TAC | AccessComics2: Understanding the User Experience of an Accessible Comic Book Reader for Blind People with Textual Sound Effects | Lee, Yun Jung et al. | Paper |
| 2019 | ACM-TG | Comic-Guided Speech Synthesis | Wang, Yujia et al. | Paper |
-
Synthesis
<p align="left"> <img src="imgs-tasks/10.synthesis.png" style="max-width:500px"> </p>
-
<details>
<summary>3D Generation from Images</summary>

| Year | Conference / Journal | Title | Authors | Links |
| --- | --- | --- | --- | --- |
| 2023 | ECCV | AnimeCeleb: Large-Scale Animation CelebHeads Dataset for Head Reenactment | Kim, Kangyeol et al. | Paper, Code |
| 2023 | IJCAI | Collaborative Neural Rendering Using Anime Character Sheets | Lin, Zuzeng et al. | Paper, Code |
| 2023 | CVPR | PAniC-3D: Stylized Single-view 3D Reconstruction from Portraits of Anime Characters | Chen, Shuhong et al. | Paper |
| 2023 | Arxiv | Sketch-A-Shape: Zero-Shot Sketch-to-3D Shape Generation | Sanghi, Aditya et al. | Paper |
| 2021 | N/A | Talking Head Anime from a Single Image 2: More Expressive | Khungurn, Pramook et al. | Code |
| 2020 | ICLR | U-GAT-IT: Unsupervised Generative Attentional Networks with Adaptive Layer-Instance Normalization for Image-to-Image Translation | Kim, Junho et al. | Paper, Code |
| 2017 | 3DV | 3D Shape Reconstruction from Sketches via Multi-view Convolutional Networks | Lun, Zhaoliang et al. | Paper |
-
<details>
<summary>Video generation</summary>

| Year | Conference / Journal | Title | Authors | Links |
| --- | --- | --- | --- | --- |
| 2023 | Arxiv | DreamVideo: High-Fidelity Image-to-Video Generation with Image Retention and Text Guidance | Wang, Cong et al. | Paper, Code |
| 2023 | Arxiv | Photorealistic Video Generation with Diffusion Models | Gupta, Agrim et al. | Paper |
| 2023 | Arxiv | Motion-Conditioned Image Animation for Video Editing | Yan, Wilson et al. | Paper, Code |
| 2021 | ICDAR | C2VNet: A Deep Learning Framework Towards Comic Strip to Audio-Visual Scene Synthesis | Gupta, Vaibhavi et al. | Paper, Code |
| 2016 | TOMM | Dynamic Manga: Animating Still Manga via Camera Movement | Cao, Ying et al. | Paper |
-
<details>
<summary>Narrative-based complex scene generation</summary>

| Year | Conference / Journal | Title | Authors | Links |
| --- | --- | --- | --- | --- |
| 2024 | WACV | Synthesizing Coherent Story with Auto-Regressive Latent Diffusion Models | Pan, Xichen et al. | Paper, Code |
| 2023 | CVPR | Make-A-Story: Visual Memory Conditioned Consistent Story Generation | Rahman, Tanzila et al. | Paper |
| 2023 | NeurIPS Workshop | Personalized Comic Story Generation | Peng, Wenxuan et al. | Paper |
| 2022 | ECCV | StoryDALL-E: Adapting Pretrained Text-to-Image Transformers for Story Continuation | Maharana, Adyasha et al. | Paper, Code |
| 2022 | EMNLP | Character-Centric Story Visualization via Visual Planning and Token Alignment | Chen, Hong et al. | Paper, Code |
| 2021 | NAACL | Improving Generation and Evaluation of Visual Stories via Semantic Consistency | Maharana, Adyasha et al. | Paper, Code |
| 2018 | CoRR | StoryGAN: A Sequential Conditional GAN for Story Visualization | Yitong Li et al. | Paper |
Datasets & Benchmarks
-
<details>
<summary>Datasets</summary>
Overview of Comic/Manga Datasets and Tasks
This table gives an overview of comic/manga datasets and tasks, including their availability, publication year, source, and properties such as languages, number of comic/manga books, and number of pages. Rows are repeated for each supported task. Accessibility is indicated with ⚠️ for datasets that no longer exist, ❌ for datasets that exist but are not accessible, and ✅ for datasets that exist and are accessible. The [proj] link points to the project website and [data] to the dataset website. For CoMix, mix means that it inherits from a mixture of four datasets.

| Task | Name | Year | Access. | Language | Origin | # books | # pages |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Image Classification | Sequencity [proj] | 2017 | ⚠️ | EN, JP | - | - | 140000 |
| | BAM! [proj] | 2017 | ⚠️ | - | - | - | 2500000 |
| | Manga109 [proj][data] | 2018 | β | JP | 1970-2010 | 109 | 21142 |
| | EmoRecCom [proj][data] | 2021 | β | EN | 1938-1954 | - | - |
| Object Detection | Fahad18 [proj] | 2012 | β | - | - | - | 586 |
| | eBDtheque [proj][data] | 2013 | β | EN, FR, JP | 1905-2012 | 25 | 100 |
| | sun70 [proj] | 2013 | β | FR | - | 6 | 60 |
| | COMICS [proj][data] | 2017 | β | EN | 1938-1954 | 3948 | 198657 |
| | BAM! [proj] | 2017 | ⚠️ | - | - | - | 2500000 |
| | JC2463 [proj] | 2017 | β | JP | - | 14 | 2463 |
| | AEC912 [proj] | 2017 | β | EN, FR | - | - | 912 |
| | GCN [proj][data] | 2017 | β | EN, JP | 1978-2013 | 253 | 38000 |
| | Sequencity612 [proj] | 2017 | ⚠️ | EN, JP | - | - | 612 |
| | SSGCI [proj][data] | 2016 | β | EN, FR, JP | 1905-2012 | - | 500 |
| | Comics3w [proj] | 2017 | β | JP, EN | - | 103 | 29845 |
| | comics2k [proj][data] | 2018 | ⚠️ | - | - | - | - |
| | DCM772 [proj][data] | 2018 | β | EN | 1938-1954 | 27 | 772 |
| | Manga109 [proj][data] | 2018 | β | JP | 1970-2010 | 109 | 21142 |
| | BCBId [proj][data] | 2022 | β | BN | - | 64 | 3327 |
| | COO [proj][data] | 2022 | β | JP | 1970-2010 | 109 | 10602 |
| | COMICS-Text+ [proj][data] | 2022 | β | EN | 1938-1954 | 3948 | 198657 |
| | PopManga [proj][data] | 2024 | β | EN | 1990-2020 | 25 | 1925 |
| | CoMix [proj][data] | 2024 | β | EN, FR | 1938-2023 | 100 | 3800 |
| Re-Identification | Fahad18 [proj] | 2012 | β | - | - | - | 586 |
| | Ho42 | 2013 | β | - | - | - | 42 |
| | Manga109 [proj][data] | 2018 | β | JP | 1970-2010 | 109 | 21142 |
| | PopManga [proj][data] | 2024 | β | EN | 1990-2020 | 25 | 1925 |
| | CoMix [proj][data] | 2024 | β | EN, FR | 1938-2023 | 100 | 3800 |
| Linking | eBDtheque [proj][data] | 2013 | β | EN, FR, JP | 1905-2012 | 25 | 100 |
| | sun70 | 2013 | β | FR | - | 6 | 60 |
| | GCN [proj][data] | 2017 | β | EN, JP | 1978-2013 | 253 | 38000 |
| | Manga109 [proj][data] | 2018 | β | JP | 1970-2010 | 109 | 21142 |
| | PopManga [proj][data] | 2024 | β | EN | 1990-2020 | 25 | 1925 |
| | CoMix [proj][data] | 2024 | β | EN, FR | 1938-2023 | 100 | 3800 |
| Segmentation | Sequencity4k [proj] | 2020 | ⚠️ | EN, FR, JP | - | - | 4479 |
| Dialog Generation | PopManga [proj][data] | 2024 | β | EN | 1990-2020 | 25 | 1925 |
| | CoMix [proj][data] | 2024 | β | EN, FR | 1938-2023 | 100 | 3800 |
| Unknown | VLRC [proj][data] | 2023 | β | JP, FR, EN, 6+ | 1940-present | 376 | 7773 |
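
Because rows are repeated once per supported task, the same dataset can appear several times in the table above. As a minimal sketch (not part of the survey's tooling), the rows can be regrouped by dataset name to list the tasks each dataset supports. The `rows` tuples copy a few entries from the table; the `tasks_by_dataset` helper is a hypothetical name introduced only for illustration.

```python
from collections import defaultdict

# (task, dataset name, year, number of pages), copied from a few table rows.
rows = [
    ("Image Classification", "Manga109", 2018, 21142),
    ("Object Detection", "Manga109", 2018, 21142),
    ("Object Detection", "PopManga", 2024, 1925),
    ("Re-Identification", "Manga109", 2018, 21142),
    ("Re-Identification", "PopManga", 2024, 1925),
    ("Linking", "Manga109", 2018, 21142),
    ("Linking", "PopManga", 2024, 1925),
    ("Dialog Generation", "PopManga", 2024, 1925),
]


def tasks_by_dataset(rows):
    """Collapse the per-task repetition: map each dataset to its tasks."""
    grouped = defaultdict(list)
    for task, name, _year, _pages in rows:
        grouped[name].append(task)
    return dict(grouped)


if __name__ == "__main__":
    for name, tasks in tasks_by_dataset(rows).items():
        print(f"{name}: {', '.join(tasks)}")
```

Grouping by name keeps the task order of the table, so each dataset's tasks are listed in the order the table introduces them.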
Venues
-
<details>
<summary>Journals</summary>
- TPAMI: IEEE Transactions on Pattern Analysis and Machine Intelligence
- TIP: IEEE Transactions on Image Processing
- TOMM: IEEE Transactions on Multimedia
- TVCG: IEEE Transactions on Visualization and Computer Graphics
- TCSVT: IEEE Transactions on Circuits and Systems for Video Technology
- THMS: IEEE Transactions on Human-Machine Systems
- ACM-TG: Transactions on Graphics
- ACM-TAC: Transactions on Accessible Computing
- IJHCI: International Journal on Human-Computer Interaction
- IJI: The International Journal on the Image
- IJCG: The Visual Computer: International Journal of Computer Graphics
- IJDAR: International Journal on Document Analysis and Recognition
- MM-CCA: Transactions on Multimedia Computing, Communications and Applications
-
<details>
<summary>Conferences</summary>
- NeurIPS: Neural Information Processing Systems
- ICML: International Conference on Machine Learning
- CVPR: IEEE/CVF Conference on Computer Vision and Pattern Recognition
- ICCV: IEEE/CVF International Conference on Computer Vision
- ECCV: European Conference on Computer Vision
- WACV: IEEE/CVF Winter Conference on Applications of Computer Vision
- SciVis: IEEE Scientific Visualization Conference
- ICIP: IEEE International Conference on Image Processing
- VCIP: IEEE International Conference on Visual Communications and Image Processing
- CSNT: IEEE International Conference on Communication Systems and Network Technologies
- CAIDCD: IEEE International Conference on Computer-Aided Industrial Design and Conceptual Design
- ACM: Association for Computing Machinery
- ICDAR: IAPR International Conference on Document Analysis and Recognition
- ICPR: International Conference on Pattern Recognition
- ICIR: International Conference on Intelligent Reality
- IIAI-AAI: International Congress on Advanced Applied Informatics
- MMM: Multimedia Modeling
- LREC: International Conference on Language Resources and Evaluation
- MTA: Multimedia Tools and Applications
- ICIT: International Conference on Information Technology
- ICIAP: International Conference on Image Analysis and Processing
- IJCNN: International Joint Conference on Neural Networks
- ACPR: IAPR Asian Conference on Pattern Recognition
- ICGC: IEEE International Conference on Granular Computing
- CAVW: Computer Animation and Virtual Worlds
- MM-TA: Multimedia Tools and Applications
- DICTA: International Conference on Digital Image Computing: Techniques and Applications
- UIST: ACM Symposium on User Interface Software and Technology
- EMNLP: Conference on Empirical Methods in Natural Language Processing
- IIWAS: International Conference on Information Integration and Web-based Applications and Services
- MDPI-AS: MDPI Applied Sciences
- ICMLC: International Conference on Machine Learning and Cybernetics
- LACLO: Latin American Conference on Learning Technologies
- ACL: Association for Computational Linguistics
- ICCC: International Conference on Computational Creativity
- 3DV: International Conference on 3D Vision
-
<details>
<summary>Workshops</summary>
- MANPU: IAPR International Workshop on Comics Analysis, Processing and Understanding
- DAS: IAPR International Workshop on Document Analysis Systems
- IWAIT: International Workshop on Advanced Image Technology
- ISPACS: Symposium on Intelligent Signal Processing and Communication Systems
- SIGCSE: ACM Technical Symposium on Computer Science Education
- ISM: IEEE International Symposium on Multimedia
Links
Tools & Repositories
- CoMix - Framework for managing and benchmarking comics datasets
- ImageTrans - Image translation for manga
- Manga Image Translator - Manga image translation
- Comic Translate - Comic translation
How to Contribute
You can contribute in two ways:
- The easiest is to open an Issue (see issue #1 for an example) to discuss missing papers, wrong associations or links, or misspelled venues.
- The second is to open a pull request with the changes, following these steps:
  1. Fork this repository and clone it locally.
  2. Create a new branch for your changes: `git checkout -b feature-name`.
  3. Make your changes and commit them: `git commit -m 'Description of the changes'`.
  4. Push to your fork: `git push origin feature-name`.
  5. Open a pull request on the original repository with a description of your changes.
This project is in constant development, and we welcome contributions that add the latest research papers in the field or report issues.
Star History
Acknowledgements
Many thanks to my co-authors for taking the time to help me with the various refactorings of the survey. Thanks to Beppe Folder for his Awesome Human Visual Attention repo, which inspired the ✨style✨ of this repository.