Awesome
Detect-LAIM-generated-Multimedia-Survey
This repository collects resources and papers for the survey "Detecting Multimedia Generated by Large AI Models: A Survey".
<figure> <img src="assets/timeline.png" alt="timeline"> <figcaption>A cat-and-mouse game between generating and detecting multimedia (<span style="color:textcolor;background-color: #97D077; padding: 2px 4px;">text</span>, <span style="color: imagecolor; background-color: #FF9999; padding: 2px 4px;">image</span>, <span style="color: videocolor; background-color: #FF8000; padding: 2px 4px;">video</span>, <span style="color: audiocolor; background-color: #CDA2BE; padding: 2px 4px;">audio</span>, and <span style="color: mmcolor; background-color: #FFCE9F; padding: 2px 4px;">multimodal</span>) using LAIMs, showcasing only representative works. Q1 covers Jan-Mar, Q2 Apr-Jun, Q3 Jul-Sep, and Q4 Oct-Dec.</figcaption> </figure>

Please let us know by e-mail if you find a mistake or if we have missed your wonderful work: lin1785@purdue.edu, hu968@purdue.edu, gupt1031@purdue.edu
If you find our survey useful for your research, please cite the following paper:
@article{lin2024detecting,
title={Detecting Multimedia Generated by Large AI Models: A Survey},
author={Lin, Li and Gupta, Neeraj and Zhang, Yue and Ren, Hainan and Liu, Chun-Hao and Ding, Feng and Wang, Xin and Li, Xin and Verdoliva, Luisa and Hu, Shu},
journal={arXiv preprint arXiv:2402.00045},
year={2024}
}
💻 Contents
- Related Work
- Generation
- Detection
- Detection Tools
📈 Related Work
- A Survey on Detection of LLMs-Generated Content Paper GitHub
- A Survey on LLM-generated Text Detection: Necessity, Methods, and Future Directions Paper GitHub
- Towards possibilities & impossibilities of AI-generated text detection: A survey Paper
- Machine-generated text: A comprehensive survey of threat models and detection methods Paper
- The Age of Synthetic Realities: Challenges and Opportunities Paper
- GenAI against humanity: Nefarious applications of generative artificial intelligence and large language models Paper
Generation
<figure> <img src="assets/generation.png" alt="Generation Processes"> <figcaption>Illustrations of different types of multimedia generation processes based on LAIMs.</figcaption> </figure>

Public Datasets for Detection
Please read the I2O (Input-to-Output) column with these abbreviations:
- T2T: Text-to-Text
- V2T: Video-to-Text
- T2I: Text-to-Image
- I2I: Image-to-Image
- T2A: Text-to-Audio
- I.A2V: (Image conditioned with Audio)-to-Video
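
Some I2O cells in the table combine two codes with a slash (for example `T2I/I2I`). As a rough sketch, a lookup like the following expands a cell into readable names. The `expand_i2o` helper and the `I2O_NAMES` dictionary are illustrative names and are not part of this repository.

```python
# Illustrative lookup; the entries mirror the abbreviation list above.
I2O_NAMES = {
    "T2T": "Text-to-Text",
    "V2T": "Video-to-Text",
    "T2I": "Text-to-Image",
    "I2I": "Image-to-Image",
    "T2A": "Text-to-Audio",
    "I.A2V": "(Image conditioned with Audio)-to-Video",
}


def expand_i2o(cell: str) -> list[str]:
    """Expand an I2O cell such as 'T2I/I2I' into full names.

    Unknown codes are returned unchanged rather than guessed.
    """
    codes = [code.strip() for code in cell.split("/") if code.strip()]
    return [I2O_NAMES.get(code, code) for code in codes]


print(expand_i2o("T2I/I2I"))
```
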
Modality | Dataset | Content | Link | I2O | #Real | #Generated | Source of Real Media | Generation Method | Year |
---|---|---|---|---|---|---|---|---|---|
Text | Student Essays | Essays | Link | T2T | 1,000 | 6,000 | IvyPanda | ChatGPT | 2023 |
Text | Creative Writing | Essays | Link | T2T | 1,000 | 6,000 | Reddit WritingPrompts | ChatGPT | 2023 |
Text | News Articles | Essays | Link | T2T | 1,000 | 6,000 | Reuters 50-50 | ChatGPT | 2023 |
Text | Paraphrase | Essays | Link | T2T | 98,280 | 163,710 | Arxiv, Wikipedia, Theses | GPT-3, T5 | 2022 |
Text | Authorship Attribution | Essays | Link | T2T | 1,064 | 8,512 | News Media | Various GPT, CTRL, GROVER, etc. | 2020 |
Text | OUTFOX | Essays | Link | T2T | 15,400 | 15,400 | Feedback Prize | ChatGPT, GPT-3.5, T5 | 2023 |
Text | MULTITuDE | News | Link | T2T | 7,992 | 66,089 | MassiveSumm | GPT-3, GPT-4, ChatGPT | 2023 |
Text | TuringBench | News | Link | T2T | 8,854 | 159,758 | News Media | Various GPT, CTRL, GROVER, etc. | 2021 |
Text | GPT-3.5 unmixed | News | Link | T2T | 5,454 | 5,454 | News Media | GPT-3.5 | 2023 |
Text | GPT-3.5 mixed | News | Link | T2T | 5,032 | 5,032 | News Media | GPT-3.5 | 2023 |
Text | GPABenchmark | Writing | Link | T2T | 150,000 | 450,000 | Arxiv | GPT-3.5 | 2023 |
Text | HPPT | Abstracts | Link | T2T | 6,050 | 6,050 | ACL Anthology | ChatGPT | 2023 |
Text | TweepFake | Tweets | Link | T2T | 12,786 | 12,786 | GitHub, Twitter | GPT-2, RNN, LSTM | 2021 |
Text | SynSciPass | Passages | Link | T2T | 99,989 | 10,485 | Scientific papers | GPT-2, BLOOM | 2022 |
Text | Deepfake-TextDetect | General | Link | T2T | 154,078 | 294,381 | Various sources including Reddit, ELI5, Yelp, etc. | Various including GPT, GLM, LLaMA, T5, OPT | 2022 |
Text | HC-Var | General | Link | T2T | 90,096 | 45,000 | Various including XSum, IMDb, Yelp, Reddit, etc. | ChatGPT | 2023 |
Text | HC3 | General | Link | T2T | 26,903 | 58,546 | Various including FiQA, Wiki, ELI5, etc. | ChatGPT | 2023 |
Text | M4 | General | Link | T2T | 32,798 | 89,683 | Various including Wikipedia, WikiHow, Arxiv, etc. | Various including ChatGPT, GPT-3.5, LLaMA, T5, Dolly-v2, etc. | 2023 |
Text | MixSet | General | Link | T2T | 300 | 3,600 | Enron Email, Steam Reviews, BBC News, ArXiv-10, TED Talk, Blog | LLaMA2, GPT-4 | 2024 |
Text | InternVid | Captions | Link | V2T | 7,000,000 | 234,000,000 | YouTube | ViCLIP | 2023 |
Image | DFF | Face | Link | T2I/I2I | 30,000 | 90,000 | IMDB-WIKI | SDMs, InsightFace | 2023 |
Image | DiFF | Face | Link | T2I/I2I | 2,500 | 500,000 | CelebA, Prompts | 16 DMs | 2024 |
Image | GANDiffFace | Face | Link | T2I/I2I | - | 73,293 | FFHQ | StyleGAN3, DreamBooth | 2023 |
Image | RealFaces | Face | Link | T2I | 258 | 25,800 | Prompts | SDMs | 2023 |
Image | DCFace | Face | Link | I2I | - | 1,200,000 | FFHQ, CASIA-WebFace | DDPM | 2023 |
Image | IDiff-Face | Face | Link | I2I | - | 500,000 | FFHQ | DDPM | 2023 |
Image | OverheadImg | Overhead | Link | T2I/I2I | 6,475 | 6,675 | MapBox, Google Maps | GLIDE, DDPM | 2023 |
Image | Synthbuster | General | Link | T2I | - | 9,000 | Raise-1k | DALL·E 2&3, Firefly, Midjourney, SDMs | 2023 |
Image | GenImage | General | Link | T2I/I2I | 1,331,167 | 1,350,000 | ImageNet | Various methods including SDMs, Midjourney, BigGAN, VQDM | 2023 |
Image | CIFAKE | General | Link | T2I | 60,000 | 60,000 | CIFAR-10 | SD-V1.4 | 2023 |
Image | AutoSplice | General | Link | T2I | 2,273 | 3,621 | Visual News | DALL·E-2 | 2023 |
Image | DiffusionDB | General | Link | T2I | 3,300,000 | 16,000,000 | DiscordChatExporter | SD | 2023 |
Image | ArtiFact | General | Link | T2I/I2I | 964,989 | 1,531,749 | Various sources including AFHQ, CelebAHQ, COCO, etc. | Various methods including SDMs, VQDM, DDPM, LDM, etc. | 2023 |
Image | HiFi-IFDL | General | Link | T2I/I2I | ~600,000 | 1,300,000 | Various sources including FFHQ, AFHQ, CelebAHQ, etc. | Various methods including DDPM, DDIM, GLIDE, LDM, etc. | 2023 |
Image | DiffusionForensics | General | Link | T2I/I2I | 232,000 | 232,000 | LSUN, ImageNet | Various methods including LDM, DDPM, iDDPM, VQDM, ADM, PNDM | 2023 |
Image | CocoGlide | General | Link | T2I | 512 | 512 | COCO | GLIDE | 2023 |
Image | Western Blot | General | Link | I2I | ~14,000 | ~24,000 | Western Blot | DDPM, Pix2pix, CycleGAN | 2022 |
Image | M3Dsynth | General | Link | I2I | 1,018 | 8,577 | LIDC-IDRI | DDPM, Pix2pix, CycleGAN | 2023 |
Image | LSUNDB | General | Link | T2I/I2I | 250,000 | 250,000 | LSUN | Various methods including DDPM, PNDM, LDM, ADM, ProjectedGAN, StyleGAN, DiffusionGAN | 2023 |
Image | UniversalFake | General | Link | T2I | 8,000 | 8,000 | LAION-400M | LDM, GLIDE | 2023 |
Image | REGM | General | Link | T2I/I2I | - | 116,000 | CelebA, LSUN | 116 publicly available GMs | 2023 |
Image | DMimage | General | Link | T2I | 200,000 | 200,000 | COCO, LSUN | LDM | 2022 |
Image | AIGCD | General | Link | T2I/I2I | 360,000 | 508,500 | Various sources including LSUN, ImageNet, CelebA, COCO, FFHQ | Various methods including SDMs, GANs, Midjourney, VQDM, ADM, DALL·E-2, GLIDE, WFIR, Wukong | 2023 |
Image | DIF | General | Link | T2I/I2I | 84,300 | 84,300 | LAION-5B | Various methods including SDMs, DALL·E-2, Midjourney, GLIDE, GANs | 2023 |
Image | Fake2M | General | Link | T2I/I2I | - | 2,300,000 | CC3M | SD-V1.5, IF, StyleGAN3 | 2023 |
Video | Diffused-head | Face | Link | I.A2V | - | 820 | CREMA | Diffused Heads (built on DDPM) | 2023 |
Audio | LibriSeVoc | Speech | Link | T2A | 13,201 | 79,206 | LibriTTS | Various methods including DiffWave, WaveNet, WaveRNN, Mel-GAN, WaveGrad | 2023 |
Multi-modal | $DGM^4$ | News | Link | T2T/I2I | 77,426 | 152,574 | Visual News | Various methods including B-GST, StyleCLIP, HFGI, InfoSwap, SimSwap | 2023 |
Multi-modal | COCOFake | General | Link | T2T/T2I | 113,287 | 566,435 | COCO | SDMs | 2023 |
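
Many rows above are imbalanced between real and generated samples, which matters when choosing a training split or an evaluation metric. A minimal sketch, assuming the count cells are copied as strings exactly as shown (with commas, a leading `~` for approximate values, or `-` when no count is listed); `parse_count` and `generated_share` are hypothetical helpers, not part of this repository.

```python
from typing import Optional


def parse_count(cell: str) -> Optional[int]:
    """Parse a count cell like '1,000', '~600,000' or '-' (missing)."""
    text = cell.strip().lstrip("~").replace(",", "")
    if text in ("", "-"):
        return None
    return int(text)


def generated_share(real: str, generated: str) -> Optional[float]:
    """Fraction of samples that are generated, or None if a count is missing."""
    n_real, n_gen = parse_count(real), parse_count(generated)
    if n_real is None or n_gen is None or n_real + n_gen == 0:
        return None
    return n_gen / (n_real + n_gen)


# Values taken from the HC3 and DCFace rows of the table.
print(generated_share("26,903", "58,546"))
print(generated_share("-", "1,200,000"))
```
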
:mag_right: Detection :fire:
<p align="center">:page_facing_up: Text </p>
Pure Detection
<figure> <img src="assets/text_pure.png" alt="text_pure"> <figcaption style="text-align: center;">Illustrations of pure detection methodologies for LAIM-generated text.</figcaption> </figure>

♣️ Easy Explainable Methods
▶️ Watermarking
- Distillation-Resistant Watermarking for Model Protection in NLP Paper
- Three bricks to consolidate watermarks for large language models Paper GitHub
- Robust multi-bit natural language watermarking through invariant features Paper
- Undetectable Watermarks for Language Models Paper
- Robust distortion-free watermarks for language models Paper
- Provable robust watermarking for AI-generated text Paper GitHub
- A Private Watermark for Large Language Models Paper
▶️ Artifacts
- Unraveling the mystery of artifacts in machine generated text Paper
▶️ Stylometry/Coherence
- Stylometric detection of AI-generated text in Twitter timelines Paper
- CoCo: Coherence-Enhanced Machine-Generated Text Detection Under Data Limitation With Contrastive Learning Paper
♣️ Hard Explainable Methods
▶️ Perplexity
- HowkGPT: Investigating the Detection of ChatGPT-generated University Student Homework through Context-Aware Perplexity Analysis Paper
- GPTZero Tool
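
Perplexity-based detectors score a text by how predictable it is under a language model; lower perplexity is often read as a hint of machine generation. A minimal sketch of the quantity itself, assuming per-token natural-log probabilities have already been obtained from some scoring model. The numbers below are made up, and this is not claimed to be the specific procedure of HowkGPT or GPTZero.

```python
import math


def perplexity(token_logprobs: list[float]) -> float:
    """Perplexity = exp of the negative mean token log-probability."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


# Hypothetical log-probabilities for two short texts.
predictable = [-0.2, -0.1, -0.3, -0.2]
surprising = [-2.5, -3.1, -1.9, -2.8]
print(perplexity(predictable) < perplexity(surprising))  # True
```
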
▶️ Log Probabilities Curvature
- DetectGPT: Zero-shot machine-generated text detection using probability curvature Paper GitHub
▶️ Efficient Perturbations
- Efficient Detection of LLM-generated Texts with a Bayesian Surrogate Model Paper
- Fast-DetectGPT: Efficient Zero-Shot Detection of Machine-Generated Text via Conditional Probability Curvature Paper
- DetectLLM: Leveraging Log Rank Information for Zero-Shot Detection of Machine-Generated Text Paper GitHub
▶️ Positive Unlabeled
- Multiscale Positive-Unlabeled Detection of AI-Generated Texts Paper GitHub
Beyond Detection
<figure> <img src="assets/text_beyond.png" alt="text_beyond"> <figcaption style="text-align: center;">Illustrations of beyond detection methodologies for LAIM-generated text.</figcaption> </figure>

♣️ Attribution
▶️ Deep-learning Based
- TURINGBENCH: A Benchmark Environment for Turing Test in the Age of Neural Text Generation Paper Turingbench
- Whodunit? Learning to Contrast for Authorship Attribution Paper
- Through the looking glass: Learning to attribute synthetic text generated by language models Paper
- TopRoBERTa: Topology-Aware Authorship Attribution of Deepfake Texts Paper
▶️ Stylometric/Statistical
- Authorship attribution for neural text generation Paper GitHub
- GPT-who: An information density-based machine-generated text detector Paper
▶️ Perplexity
- LLMDet: A Third Party Large Language Models Generated Text Detection Tool Paper GitHub
▶️ Style Representation
- Few-Shot Detection of Machine-Generated Text using Style Representations Paper
▶️ Origin Tracing
- Origin Tracing and Detecting of LLMs Paper
♣️ Generalization
▶️ Structured Search
- Ghostbuster: Detecting Text Ghostwritten by Large Language Models Paper
▶️ Contrastive Learning
- ConDA: Contrastive domain adaptation for AI-generated text detection Paper GitHub
♣️ Interpretability
▶️ N-gram Overlaps
- DNA-GPT: Divergent N-Gram Analysis for Training-Free Detection of GPT-Generated Text Paper GitHub
▶️ P-values
- A Watermark for Large Language Models Paper GitHub
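
A watermark detector can report a p-value instead of a bare label, which is what makes its decision interpretable. As a sketch of the general idea (not necessarily the exact procedure of the paper listed above): if a fraction `gamma` of the vocabulary is marked "green" at each step, unwatermarked text should land on green tokens about `gamma` of the time, and a one-sided z-test measures how far the observed count exceeds that. The counts and the `gamma` value below are hypothetical.

```python
import math


def watermark_p_value(green_count: int, total: int, gamma: float = 0.5) -> float:
    """One-sided p-value for seeing >= green_count green tokens by chance.

    Uses the normal approximation to a Binomial(total, gamma) null.
    """
    if total <= 0 or not 0.0 < gamma < 1.0:
        raise ValueError("total must be positive and gamma must be in (0, 1)")
    z = (green_count - gamma * total) / math.sqrt(total * gamma * (1.0 - gamma))
    # Upper-tail probability of a standard normal via the complementary error function.
    return 0.5 * math.erfc(z / math.sqrt(2.0))


print(watermark_p_value(green_count=140, total=200))  # far below 0.05
print(watermark_p_value(green_count=100, total=200))  # 0.5
```
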
▶️ Shapley Additive Explanations
- ChatGPT or human? Detect and explain. Explaining decisions of machine learning model for detecting short ChatGPT-generated text Paper
- Check Me If You Can: Detecting ChatGPT-Generated Academic Writing using CheckGPT Paper
▶️ Polish Ratio
- Is ChatGPT involved in texts? Measure the polish ratio to detect ChatGPT-generated text Paper
♣️ Robustness
▶️ Adversarial Data Augmentation
- Is ChatGPT involved in texts? Measure the polish ratio to detect ChatGPT-generated text Paper
- Red Teaming Language Model Detectors with Language Models Paper
- MGTBench: Benchmarking Machine-Generated Text Detection Paper GitHub
▶️ Adversarial Learning
- RADAR: Robust AI-text detection via adversarial learning Paper Project Page
- OUTFOX: LLM-generated essay detection through in-context learning with adversarially generated examples Paper
▶️ Stylistic/Consistency
- J-Guard: Journalism guided adversarially robust detection of AI-generated news Paper
- Intrinsic Dimension Estimation for Robust Detection of AI-Generated Texts Paper
♣️ Empirical Study
▶️ Generalization/Robustness
- ChatLog: Recording and Analyzing ChatGPT Across Time Paper GitHub
- On the Zero-Shot Generalization of Machine-Generated Text Detectors Paper
- On the Generalization of Training-based ChatGPT Detection Methods Paper
- Supervised Machine-Generated Text Detectors: Family and Scale Matters Paper GitHub
- Deepfake Text Detection in the Wild Paper GitHub
▶️ Human Evaluation
- How close is ChatGPT to human experts? Comparison corpus, evaluation, and detection Paper GitHub
- Can LLM-Generated Misinformation Be Detected? Paper GitHub
▶️ Attribution
- From Text to Source: Results in Detecting Large Language Model-Generated Content Paper
▶️ Paraphrase Detection
- How large language models are transforming machine-paraphrased plagiarism Paper
- Paraphrase Detection: Human vs. Machine Content Paper
▶️ Sample Complexity
- On the Possibilities of AI-Generated Text Detection Paper
<p align="center"> 📸 Image </p>
Pure Detection
<figure> <img src="assets/image_pure.png" alt="image_pure"> <figcaption style="text-align: center;">Illustrations of pure detection methodologies for LAIM-generated images.</figcaption> </figure>

♣️ Physical/Physiological based Methods
- Qualitative Failures of Image Generation Models and Their Application in Detecting Deepfakes Paper
- Perspective (in)consistency of paint by text Paper
- Lighting (in)consistency of paint by text Paper
♣️ Diffuser Fingerprints based Methods
- Deep Image Fingerprint: Accurate And Low Budget Synthetic Image Detector Paper
- DIRE for Diffusion-Generated Image Detection Paper GitHub
- Exposing the Fake: Effective Diffusion-Generated Images Detection Paper
♣️ Spatial-based Methods
- Rich and Poor Texture Contrast: A Simple yet Effective Approach for AI-generated Image Detection Paper Project Page
- Unmasking The Artist: Discriminating Human-Drawn And AI-Generated Human Face Art Through Facial Feature Analysis Paper
- Detecting images generated by deep diffusion models using their local intrinsic dimensionality Paper
♣️ Frequency-based Methods
- Wavelet-packets for deepfake image analysis and detection Paper GitHub
- AUSOME: authenticating social media images using frequency analysis Paper
- AI-Generated Image Detection using a Cross-Attention Enhanced Dual-Stream Network Paper
- Synthbuster: Towards Detection of Diffusion Model Generated Images Paper
Beyond Detection
<figure> <img src="assets/image_beyond.png" alt="image_beyond"> <figcaption style="text-align: center;">Illustrations of beyond detection methodologies for LAIM-generated images.</figcaption> </figure>

♣️ Attribution and Model Parsing
▶️ Attribution
- Level up the deepfake detection: a method to effectively discriminate images generated by GAN architectures and diffusion models Paper
▶️ Model Parsing
- Reverse engineering of generative models: Inferring model hyperparameters from generated images Paper
♣️ Generalization
- Online Detection of AI-Generated Images Paper
- Towards universal fake image detectors that generalize across generative models Paper GitHub
- Raising the Bar of AI-generated Image Detection with CLIP Paper
- Transcending Forgery Specificity with Latent Space Augmentation for Generalizable Deepfake Detection Paper
- FingerprintNet: Synthesized fingerprints for generated image detection Paper
- Detecting Deepfakes Without Seeing Any Paper GitHub
- Improving Synthetically Generated Image Detection in Cross-Concept Settings Paper
- Diffusion Noise Feature: Accurate and Fast Generated Image Detection Paper
♣️ Interpretability
- Interpretable-through-prototypes deepfake detection for diffusion models Paper GitHub
♣️ Localization
▶️ Fully-supervised
- Hierarchical fine-grained image forgery detection and localization Paper GitHub
- Perceptual Artifacts Localization for Image Synthesis Tasks Paper GitHub
- TruFor: Leveraging all-round clues for trustworthy image forgery detection and localization Paper GitHub
▶️ Weakly-supervised
- Weakly-supervised deepfake localization in diffusion-generated images Paper
♣️ Robustness
▶️ Spatial-based
- GLFF: Global and Local Feature Fusion for AI-synthesized Image Detection Paper
- Exposing fake images generated by text-to-image diffusion models Paper
- Local Statistics for Generative Image Detection Paper
▶️ Frequency-based
- D4: Detection of Adversarial Diffusion Deepfakes Using Disjoint Ensembles Paper
♣️ Empirical Study
- On the detection of synthetic images generated by diffusion models Paper GitHub
- Intriguing properties of synthetic images: from generative adversarial networks to diffusion models Paper
- Towards the detection of diffusion model deepfakes Paper
- Unveiling the Impact of Image Transformations on Deepfake Detection: An Experimental Analysis Paper
- On the use of Stable Diffusion for creating realistic faces: from generation to detection Paper
- Finding AI-Generated Faces in the Wild Paper
- Forensic analysis of synthetically generated western blot images Paper
- Beyond Human Forgeries: An Investigation into Detecting Diffusion-Generated Handwriting Paper
- Organic or Diffused: Can We Distinguish Human Art from AI-generated Images? Paper
<p align="center">🎞️ Video</p>
<p align="center"> <img src="assets/video_detection.png" alt="Video Detection" width="400">
<span>Illustration of detection methodology in generalization task for LAIM-generated video. </span>
</p>

Beyond Detection
♣️ Generalization
- Revisiting Generalizability in Deepfake Detection: Improving Metrics and Stabilizing Transfer Paper
<p align="center">🎵 Audio</p>
Pure Detection
<p align="center"> <img src="assets/audio.png" alt="Audio Detection"><span>The artifacts introduced by DM-based neural vocoders (WaveGrad and DiffWave) to a voice signal. The differences in mel-spectrograms between real and generated ones are illustrated in the third and fifth columns.</span>
</p>

♣️ Vocoder-based
- AI-Synthesized Voice Detection Using Neural Vocoder Artifacts Paper GitHub
<p align="center">🍯 Multimodal</p>
Pure Detection
<p align="center"> <img src="assets/multimodal_pure.png" alt="Multimodal Detection" ><span>Illustrations of pure detection methodologies for LAIM-generated multimodal media.</span>
</p>

♣️ Text-assisted
- Parents and Children: Distinguishing Multimodal DeepFakes from Natural Images Paper
♣️ Text-image Inconsistency
- Detecting Cross-Modal Inconsistency to Defend Against Neural Fake News Paper GitHub
- Exposing Text-Image Inconsistency Using Diffusion Models Paper
Beyond Detection
<p align="center"> <img src="assets/multimodal_beyond.png" alt="Multimodal Detection"><span>Illustrations of beyond detection methodologies for LAIM-generated multimodal media.</span>
</p>

♣️ Attribution
- DE-FAKE: Detection and attribution of fake images generated by text-to-image generation models Paper
♣️ Generalization
▶️ Prompt Tuning
- AntifakePrompt: Prompt-Tuned Vision-Language Models are Fake Image Detectors Paper GitHub
▶️ Contrastive Learning
- Generalizable Synthetic Image Detection via Language-guided Contrastive Learning Paper GitHub
♣️ Interpretability
- Combating Misinformation in the Era of Generative AI Models Paper
♣️ Localization
▶️ Spatial-based
- Detecting and grounding multi-modal media manipulation Paper
- Exploiting Modality-Specific Features For Multi-Modal Manipulation Detection And Grounding Paper
▶️ Frequency-based
- Unified Frequency-Assisted Transformer Framework for Detecting and Grounding Multi-Modal Manipulation Paper
♣️ Empirical Study
- Detecting Images Generated by Diffusers Paper GitHub
- CLIPping the Deception: Adapting Vision-Language Models for Universal Deepfake Detection Paper
- VERITE: a Robust benchmark for multimodal misinformation detection accounting for unimodal bias Paper GitHub
- Can ChatGPT Detect DeepFakes? A Study of Using Multimodal Large Language Models for Media Forensics Paper
Detection Tools
Modality | Tool | Company | Link |
---|---|---|---|
Text | AI Content Detector | Copyleaks | Link |
Text | AI Content Detector, ChatGPT detector | ZeroGPT | Link |
Text | AI Content Detector | Winston AI | Link |
Text | AI Content Detector | Crossplag | Link |
Text | Giant Language model Test Room | GLTR | Link |
Text | The AI Detector | Content at Scale | Link |
Text | AI Checker | Originality ai | Link |
Text | Advanced AI Detector and Humanizer | Undetectable ai | Link |
Text | AI Content Detector | Writer | Link |
Text | AI Content Detector | Conch | Link |
Text | Illuminarty Text | Illuminarty | Link |
Text | AI-Generated Text Detector | Is it AI | Link |
Text | AI Detector Efficacy Research Tool | Originality ai | Link |
Image | AI or Not image | AI or Not | Link |
Image | AI-Generated Image Detector | Is it AI | Link |
Image | Illuminarty Image | Illuminarty | Link |
Image | SynthID | Google DeepMind | Link |
Image | Advanced AI Image Detector | Content at Scale | Link |
Image | AI Image Detector | Huggingface | Link |
Audio | AI or Not audio | AI or Not | Link |