Awesome

Detect-LAIM-generated-Multimedia-Survey

This repository collects resources and papers for the survey "Detecting Multimedia Generated by Large AI Models: A Survey".

<figure> <img src="assets/timeline.png" alt="timeline"> <figcaption>A cat-and-mouse game between generating and detecting multimedia (text, image, video, audio, and multimodal) using LAIMs, showcasing only representative works. Q1 represents from Jan to Mar, Q2: Apr-Jun, Q3: Jul-Sep, Q4: Oct-Dec.</figcaption> </figure>

Please let us know by e-mail if you find a mistake or if we have missed your work: lin1785@purdue.edu, hu968@purdue.edu, gupt1031@purdue.edu

If you find our survey useful for your research, please cite the following paper:

@article{lin2024detecting,
  title={Detecting Multimedia Generated by Large AI Models: A Survey},
  author={Lin, Li and Gupta, Neeraj and Zhang, Yue and Ren, Hainan and Liu, Chun-Hao and Ding, Feng and Wang, Xin and Li, Xin and Verdoliva, Luisa and Hu, Shu},
  journal={arXiv preprint arXiv:2402.00045},
  year={2024}
}

💻 Contents

📈 Related Work

- A Survey on Detection of LLMs-Generated Content Paper GitHub

- A Survey on LLM-generated Text Detection: Necessity, Methods, and Future Directions Paper GitHub

- Towards possibilities & impossibilities of ai-generated text detection: A survey Paper

- Machine-generated text: A comprehensive survey of threat models and detection methods Paper

- The Age of Synthetic Realities: Challenges and Opportunities Paper

- GenAI against humanity: Nefarious applications of generative artificial intelligence and large language models Paper

Generation

<figure> <img src="assets/generation.png" alt="Generation Processes"> <figcaption>Illustrations of different types of multimedia generation process based on LAIMs.</figcaption> </figure>

Public Datasets for Detection

Please read the I2O (Input-to-Output) column with these abbreviations:

T2T: Text-to-Text
V2T: Video-to-Text
T2I: Text-to-Image
I2I: Image-to-Image
T2A: Text-to-Audio
I.A2V: (Image conditioned with Audio)-to-Video
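
The legend above can be applied mechanically when reading the table. The following is a minimal sketch, not part of the survey: the names `I2O_LEGEND` and `expand_i2o` are hypothetical, and the assumption that composite cells such as `T2I/I2I` separate alternatives with `/` comes from how the table rows are written. Codes that are not in the legend are returned unchanged rather than guessed.

```python
from typing import List

# Legend copied from the abbreviation list above.
I2O_LEGEND = {
    "T2T": "Text-to-Text",
    "V2T": "Video-to-Text",
    "T2I": "Text-to-Image",
    "I2I": "Image-to-Image",
    "T2A": "Text-to-Audio",
    "I.A2V": "(Image conditioned with Audio)-to-Video",
}


def expand_i2o(cell: str) -> List[str]:
    """Expand one I2O table cell, e.g. 'T2I/I2I', into readable names."""
    # Composite cells list alternatives separated by '/'.
    # A code missing from the legend is kept as-is (illustrative fallback).
    codes = [code.strip() for code in cell.split("/")]
    return [I2O_LEGEND.get(code, code) for code in codes]


if __name__ == "__main__":
    print(expand_i2o("T2I/I2I"))
    print(expand_i2o("I.A2V"))
```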

Modality	Dataset	Content	Link	I2O	#Real	#Generated	Source of Real Media	Generation Method	Year
Text	Student Essays	Essays	Link	T2T	1,000	6,000	IvyPanda	ChatGPT	2023
Text	Creative Writing	Essays	Link	T2T	1,000	6,000	Reddit WritingPrompts	ChatGPT	2023
Text	News Articles	Essays	Link	T2T	1,000	6,000	Reuters 50-50	ChatGPT	2023
Text	Paraphrase	Essays	Link	T2T	98,280	163,710	Arxiv, Wikipedia, Theses	GPT-3, T5	2022
Text	Authorship Attribution	Essays	Link	T2T	1,064	8,512	News Media	Various GPT, CTRL, GROVER, etc.	2020
Text	OUTFOX	Essays	Link	T2T	15,400	15,400	Feedback Prize	ChatGPT, GPT-3.5, T5	2023
Text	MULTITuDE	News	Link	T2T	7,992	66,089	MassiveSumm	GPT-3, GPT-4, ChatGPT	2023
Text	TuringBench	News	Link	T2T	8,854	159,758	News Media	Various GPT, CTRL, GROVER, etc.	2021
Text	GPT-3.5 unmixed	News	Link	T2T	5,454	5,454	News Media	GPT-3.5	2023
Text	GPT-3.5 mixed	News	Link	T2T	5,032	5,032	News Media	GPT-3.5	2023
Text	GPABenchmark	Writing	Link	T2T	150,000	450,000	Arxiv	GPT-3.5	2023
Text	HPPT	Abstracts	Link	T2T	6,050	6,050	ACL Anthology	ChatGPT	2023
Text	TweepFake	Tweets	Link	T2T	12,786	12,786	GitHub, Twitter	GPT-2, RNN, LSTM	2021
Text	SynSciPass	Passages	Link	T2T	99,989	10,485	Scientific papers	GPT-2, BLOOM	2022
Text	Deepfake-TextDetect	General	Link	T2T	154,078	294,381	Various sources including Reddit, ELI5, Yelp, etc.	Various including GPT, GLM, LLaMA, T5, OPT	2022
Text	HC-Var	General	Link	T2T	90,096	45,000	Various including XSum, IMDb, Yelp, Reddit, etc.	ChatGPT	2023
Text	HC3	General	Link	T2T	26,903	58,546	Various including FiQA, Wiki, ELI5, etc.	ChatGPT	2023
Text	M4	General	Link	T2T	32,798	89,683	Various including Wikipedia, WikiHow, Arxiv, etc.	Various including ChatGPT, GPT-3.5, LLaMA, T5, Dolly-v2, etc.	2023
Text	MixSet	General	Link	T2T	300	3,600	Enron Email, Steam Reviews, BBC News, ArXiv-10, TED Talk, Blog	LLaMA2, GPT-4	2024
Text	InternVid	Captions	Link	V2T	7,000,000	234,000,000	YouTube	ViCLIP	2023
Image	DFF	Face	Link	T2I/I2I	30,000	90,000	IMDB-WIKI	SDMs, InsightFace	2023
Image	DiFF	Face	Link	T2I/I2I	2,500	500,000	CelebA, Prompts	16 DMs	2024
Image	GANDiffFace	Face	Link	T/I2I	-	73,293	FFHQ	StyleGAN3, DreamBooth	2023
Image	RealFaces	Face	Link	T2I	258	25,800	Prompts	SDMs	2023
Image	DCFace	Face	Link	I2I	-	1,200,000	FFHQ, CASIA-WebFace	DDPM	2023
Image	IDiff-Face	Face	Link	I2I	-	500,000	FFHQ	DDPM	2023
Image	OverheadImg	Overhead	Link	T2I/I2I	6,475	6,675	MapBox, Google Maps	GLIDE, DDPM	2023
Image	Synthbuster	General	Link	T2I	-	9,000	Raise-1k	DALL·E 2&3, Firefly, Midjourney, SDMs	2023
Image	GenImage	General	Link	T2I/I2I	1,331,167	1,350,000	ImageNet	Various methods including SDMs, Midjourney, BigGAN, VQDM	2023
Image	CIFAKE	General	Link	T2I	60,000	60,000	CIFAR-10	SD-V1.4	2023
Image	AutoSplice	General	Link	T2I	2,273	3,621	Visual News	DALL·E-2	2023
Image	DiffusionDB	General	Link	T2I	3,300,000	16,000,000	DiscordChatExporter	SD	2023
Image	ArtiFact	General	Link	T2I/I2I	964,989	1,531,749	Various sources including AFHQ, CelebAHQ, COCO, etc.	Various methods including SDMs, VQDM, DDPM, LDM, etc.	2023
Image	HiFi-IFDL	General	Link	T2I/I2I	~600,000	1,300,000	Various sources including FFHQ, AFHQ, CelebAHQ, etc.	Various methods including DDPM, DDIM, GLIDE, LDM, etc.	2023
Image	DiffusionForensics	General	Link	T2I/I2I	232,000	232,000	LSUN, ImageNet	Various methods including LDM, DDPM, iDDPM, VQDM, ADM, PNDM	2023
Image	CocoGlide	General	Link	T2I	512	512	COCO	GLIDE	2023
Image	Western Blot	General	Link	I2I	~14,000	~24,000	Western Blot	DDPM, Pix2pix, CycleGAN	2022
Image	M3Dsynth	General	Link	I2I	1,018	8,577	LIDC-IDRI	DDPM, Pix2pix, CycleGAN	2023
Image	LSUNDB	General	Link	T2I/I2I	250,000	250,000	LSUN	Various methods including DDPM, PNDM, LDM, ADM, ProjectedGAN, StyleGAN, DiffusionGAN	2023
Image	UniversalFake	General	Link	T2I	8,000	8,000	LAION-400M	LDM, GLIDE	2023
Image	REGM	General	Link	T2I/I2I	-	116,000	CelebA, LSUN	116 publicly available GMs	2023
Image	DMimage	General	Link	T2I	200,000	200,000	COCO, LSUN	LDM	2022
Image	AIGCD	General	Link	T2I/I2I	360,000	508,500	Various sources including LSUN, ImageNet, CelebA, COCO, FFHQ	Various methods including SDMs, GANs, Midjourney, VQDM, ADM, DALL·E-2, GLIDE, WFIR, Wukong	2023
Image	DIF	General	Link	T2I/I2I	84,300	84,300	LAION-5B	Various methods including SDMs, DALL·E-2, Midjourney, GLIDE, GANs	2023
Image	Fake2M	General	Link	T2I/I2I	-	2,300,000	CC3M	SD-V1.5, IF, StyleGAN3	2023
Video	Diffused-head	Face	Link	I.A2V	-	820	CREMA	Diffused Heads (built on DDPM)	2023
Audio	LibriSeVoc	Speech	Link	T2A	13,201	79,206	LibriTTS	Various methods including DiffWave, WaveNet, WaveRNN, Mel-GAN, WaveGrad	2023
Multi-modal	$DGM^4$	News	Link	T2T/I2I	77,426	152,574	Visual News	Various methods including B-GST, StyleCLIP, HFGI, InfoSwap, SimSwap	2023
Multi-modal	COCOFake	General	Link	T2T/T2I	113,287	566,435	COCO	SDMs	2023
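
To compare how balanced the datasets are, the #Real and #Generated columns can be turned into a generated-to-real ratio. The sketch below is only an illustration under stated assumptions: it assumes the rows are available as tab-separated text, the helper names `parse_count` and `generated_to_real` are hypothetical, and the three sample rows are copied from the table above. A `-` in #Real (no real media) yields `None` instead of a ratio.

```python
import csv
import io
from typing import Dict, Optional

# Header plus three rows copied from the table above, tab-separated.
ROWS = (
    "Modality\tDataset\tContent\tLink\tI2O\t#Real\t#Generated\t"
    "Source of Real Media\tGeneration Method\tYear\n"
    "Text\tHC3\tGeneral\tLink\tT2T\t26,903\t58,546\t"
    "Various including FiQA, Wiki, ELI5, etc.\tChatGPT\t2023\n"
    "Image\tCIFAKE\tGeneral\tLink\tT2I\t60,000\t60,000\tCIFAR-10\tSD-V1.4\t2023\n"
    "Image\tDCFace\tFace\tLink\tI2I\t-\t1,200,000\tFFHQ, CASIA-WebFace\tDDPM\t2023\n"
)


def parse_count(cell: str) -> Optional[int]:
    """Turn '26,903' or '~600,000' into an int; '-' becomes None."""
    cleaned = cell.replace(",", "").strip().lstrip("~")
    return int(cleaned) if cleaned.isdigit() else None


def generated_to_real(rows_text: str) -> Dict[str, Optional[float]]:
    """Map each dataset name to #Generated / #Real, or None if a count is missing."""
    reader = csv.DictReader(io.StringIO(rows_text), delimiter="\t")
    ratios: Dict[str, Optional[float]] = {}
    for row in reader:
        real = parse_count(row["#Real"])
        generated = parse_count(row["#Generated"])
        if real and generated is not None:
            ratios[row["Dataset"]] = generated / real
        else:
            ratios[row["Dataset"]] = None
    return ratios


if __name__ == "__main__":
    for name, ratio in generated_to_real(ROWS).items():
        print(name, ratio)
```

Counts prefixed with `~` in the table are approximate, so any ratio derived from them is approximate as well.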

:mag_right: Detection :fire:

:page_facing_up: Text

Pure Detection

<figure> <img src="assets/text_pure.png" alt="text_pure"> <figcaption style="text-align: center;">Illustrations of pure detection methodologies for LAIM-generated text.</figcaption> </figure>

♣️ Easy Explainable Methods

▶️ Watermarking

- Distillation-Resistant Watermarking for Model Protection in NLP Paper

- Three bricks to consolidate watermarks for large language models Paper GitHub

- Robust multi-bit natural language watermarking through invariant features Paper

- Undetectable Watermarks for Language Models Paper

- Robust distortion-free watermarks for language models Paper

- Provable robust watermarking for ai-generated text Paper GitHub

- A Private Watermark for Large Language Models Paper

▶️ Artifacts

- Unraveling the mystery of artifacts in machine generated text Paper

▶️ Stylometry/Coherence

- Stylometric detection of ai-generated text in twitter timelines Paper

- CoCo: Coherence-Enhanced Machine-Generated Text Detection Under Data Limitation With Contrastive Learning Paper

♣️ Hard Explainable Methods

▶️ Perplexity

- HowkGPT: Investigating the Detection of ChatGPT-generated University Student Homework through Context-Aware Perplexity Analysis Paper

- GPTZero Tool

▶️ Log Probabilities Curvature

- Detectgpt: Zero-shot machine-generated text detection using probability curvature Paper GitHub

▶️ Efficient Perturbations

- Efficient Detection of LLM-generated Texts with a Bayesian Surrogate Model Paper

- Fast-DetectGPT: Efficient Zero-Shot Detection of Machine-Generated Text via Conditional Probability Curvature Paper

- DetectLLM: Leveraging Log Rank Information for Zero-Shot Detection of Machine-Generated Text Paper GitHub

▶️ Positive Unlabeled

- Multiscale Positive-Unlabeled Detection of AI-Generated Texts Paper GitHub

Beyond Detection

<figure> <img src="assets/text_beyond.png" alt="text_beyond"> <figcaption style="text-align: center;">Illustrations of beyond detection methodologies for LAIM-generated text. </figcaption> </figure>

♣️ Attribution

▶️ Deep-learning Based

- TURINGBENCH: A Benchmark Environment for Turing Test in the Age of Neural Text Generation Paper Turingbench

- Whodunit? Learning to Contrast for Authorship Attribution Paper

- Through the looking glass: Learning to attribute synthetic text generated by language models Paper

- TopRoBERTa: Topology-Aware Authorship Attribution of Deepfake Texts Paper

▶️ Stylometric/Statistical

- Authorship attribution for neural text generation Paper GitHub

- Gpt-who: An information density-based machine-generated text detector Paper

▶️ Perplexity

- LLMDet: A Third Party Large Language Models Generated Text Detection Tool Paper GitHub

▶️ Style Representation

- Few-Shot Detection of Machine-Generated Text using Style Representations Paper

▶️ Origin Tracing

- Origin Tracing and Detecting of LLMs Paper

♣️ Generalization

▶️ Structured Search

- Ghostbuster: Detecting Text Ghostwritten by Large Language Models Paper

▶️ Contrastive Learning

- Conda: Contrastive domain adaptation for ai-generated text detection Paper GitHub

♣️ Interpretability

▶️ N-gram Overlaps

- DNA-GPT: Divergent N-Gram Analysis for Training-Free Detection of GPT-Generated Text Paper GitHub

▶️ P-values

- A Watermark for Large Language Models Paper GitHub

▶️ Shapley Additive Explanations

- Chatgpt or human? detect and explain. explaining decisions of machine learning model for detecting short chatgpt-generated text Paper

- Check Me If You Can: Detecting ChatGPT-Generated Academic Writing using CheckGPT Paper

▶️ Polish Ratio

- Is chatgpt involved in texts? measure the polish ratio to detect chatgpt-generated text Paper

♣️ Robustness

▶️ Adversarial Data Augmentation

- Is chatgpt involved in texts? measure the polish ratio to detect chatgpt-generated text Paper

- Red Teaming Language Model Detectors with Language Models Paper

- MGTBench: Benchmarking Machine-Generated Text Detection Paper GitHub

▶️ Adversarial Learning

- Radar: Robust ai-text detection via adversarial learning Paper Project Page

- Outfox: Llm-generated essay detection through in-context learning with adversarially generated examples Paper

▶️ Stylistic/Consistency

- J-guard: Journalism guided adversarially robust detection of ai-generated news Paper

- Intrinsic Dimension Estimation for Robust Detection of AI-Generated Texts Paper

♣️ Empirical Study

▶️ Generalization/Robustness

- ChatLog: Recording and Analyzing ChatGPT Across Time Paper GitHub

- On the Zero-Shot Generalization of Machine-Generated Text Detectors Paper

- On the Generalization of Training-based ChatGPT Detection Methods Paper

- Supervised Machine-Generated Text Detectors: Family and Scale Matters Paper GitHub

- Deepfake Text Detection in the Wild Paper GitHub

▶️ Human Evaluation

- How close is chatgpt to human experts? comparison corpus, evaluation, and detection Paper GitHub

- Can LLM-Generated Misinformation Be Detected? Paper GitHub

▶️ Attribution

- From Text to Source: Results in Detecting Large Language Model-Generated Content Paper

▶️ Paraphrase Detection

- How large language models are transforming machine-paraphrased plagiarism Paper

- Paraphrase Detection: Human vs. Machine Content Paper

▶️ Sample Complexity

- On the Possibilities of AI-Generated Text Detection Paper

📸 Image

Pure Detection

<figure> <img src="assets/image_pure.png" alt="image_pure"> <figcaption style="text-align: center;">Illustrations of pure detection methodologies for LAIM-generated image.</figcaption> </figure>

♣️ Physical/Physiological based Methods

- Qualitative Failures of Image Generation Models and Their Application in Detecting Deepfakes Paper

- Perspective (in) consistency of paint by text Paper

- Lighting (in) consistency of paint by text Paper

♣️ Diffuser Fingerprints based Methods

- Deep Image Fingerprint: Accurate And Low Budget Synthetic Image Detector Paper

- DIRE for Diffusion-Generated Image Detection Paper GitHub

- Exposing the Fake: Effective Diffusion-Generated Images Detection Paper

♣️ Spatial-based Methods

- Rich and Poor Texture Contrast: A Simple yet Effective Approach for AI-generated Image Detection Paper Project Page

- Unmasking The Artist: Discriminating Human-Drawn And AI-Generated Human Face Art Through Facial Feature Analysis Paper

- Detecting images generated by deep diffusion models using their local intrinsic dimensionality Paper

♣️ Frequency-based Methods

- Wavelet-packets for deepfake image analysis and detection Paper GitHub

- AUSOME: authenticating social media images using frequency analysis Paper

- AI-Generated Image Detection using a Cross-Attention Enhanced Dual-Stream Network Paper

- Synthbuster: Towards Detection of Diffusion Model Generated Images Paper

Beyond Detection

<figure> <img src="assets/image_beyond.png" alt="image_beyond"> <figcaption style="text-align: center;">Illustrations of beyond detection methodologies for LAIM-generated image.</figcaption> </figure>

♣️ Attribution and Model Parsing

▶️ Attribution

- Level up the deepfake detection: a method to effectively discriminate images generated by gan architectures and diffusion models Paper

▶️ Model Parsing

- Reverse engineering of generative models: Inferring model hyperparameters from generated images Paper

♣️ Generalization

- Online Detection of AI-Generated Images Paper

- Towards universal fake image detectors that generalize across generative models Paper GitHub

- Raising the Bar of AI-generated Image Detection with CLIP Paper

- Transcending Forgery Specificity with Latent Space Augmentation for Generalizable Deepfake Detection Paper

- Fingerprintnet: Synthesized fingerprints for generated image detection Paper

- Detecting Deepfakes Without Seeing Any Paper GitHub

- Improving Synthetically Generated Image Detection in Cross-Concept Settings Paper

- Diffusion Noise Feature: Accurate and Fast Generated Image Detection Paper

♣️ Interpretability

- Interpretable-through-prototypes deepfake detection for diffusion models Paper GitHub

♣️ Localization

▶️ Fully-supervised

- Hierarchical fine-grained image forgery detection and localization Paper GitHub

- Perceptual Artifacts Localization for Image Synthesis Tasks Paper GitHub

- TruFor: Leveraging all-round clues for trustworthy image forgery detection and localization Paper GitHub

▶️ Weakly-supervised

- Weakly-supervised deepfake localization in diffusion-generated images Paper

♣️ Robustness

▶️ Spatial-based

- GLFF: Global and Local Feature Fusion for AI-synthesized Image Detection Paper

- Exposing fake images generated by text-to-image diffusion models Paper

- Local Statistics for Generative Image Detection Paper

▶️ Frequency-based

- D4: Detection of Adversarial Diffusion Deepfakes Using Disjoint Ensembles Paper

♣️ Empirical Study

- On the detection of synthetic images generated by diffusion models Paper GitHub

- Intriguing properties of synthetic images: from generative adversarial networks to diffusion models Paper

- Towards the detection of diffusion model deepfakes Paper

- Unveiling the Impact of Image Transformations on Deepfake Detection: An Experimental Analysis Paper

- On the use of Stable Diffusion for creating realistic faces: from generation to detection Paper

- Finding AI-Generated Faces in the Wild Paper

- Forensic analysis of synthetically generated western blot images Paper

- Beyond Human Forgeries: An Investigation into Detecting Diffusion-Generated Handwriting Paper

- Organic or Diffused: Can We Distinguish Human Art from AI-generated Images? Paper

🎞️ Video

Illustration of detection methodology in generalization task for LAIM-generated video.

Beyond Detection

♣️ Generalization

- Revisiting Generalizability in Deepfake Detection: Improving Metrics and Stabilizing Transfer Paper

🎵 Audio

Pure Detection

The artifacts introduced by DM-based neural vocoders (WaveGrad and DiffWave) to a voice signal. The differences in mel-spectrograms between real and generated ones are illustrated in the third and fifth columns.

♣️ Vocoder-based

- AI-Synthesized Voice Detection Using Neural Vocoder Artifacts Paper GitHub

🍯 Multimodal

Pure Detection

Illustrations of pure detection methodologies for LAIM-generated multimodal media.

♣️ Text-assisted

- Parents and Children: Distinguishing Multimodal DeepFakes from Natural Images Paper

♣️ Text-image Inconsistency

- Detecting Cross-Modal Inconsistency to Defend Against Neural Fake News Paper GitHub

- Exposing Text-Image Inconsistency Using Diffusion Models Paper

Beyond Detection

Illustrations of beyond detection methodologies for LAIM-generated multimodal media.

♣️ Attribution

- De-fake: Detection and attribution of fake images generated by text-to-image generation models Paper

♣️ Generalization

▶️ Prompt Tuning

- AntifakePrompt: Prompt-Tuned Vision-Language Models are Fake Image Detectors Paper GitHub

▶️ Contrastive Learning

- Generalizable Synthetic Image Detection via Language-guided Contrastive Learning Paper GitHub

♣️ Interpretability

- Combating Misinformation in the Era of Generative AI Models Paper

♣️ Localization

▶️ Spatial-based

- Detecting and grounding multi-modal media manipulation Paper

- Exploiting Modality-Specific Features For Multi-Modal Manipulation Detection And Grounding Paper

▶️ Frequency-based

- Unified Frequency-Assisted Transformer Framework for Detecting and Grounding Multi-Modal Manipulation Paper

♣️ Empirical Study

- Detecting Images Generated by Diffusers Paper GitHub

- CLIPping the Deception: Adapting Vision-Language Models for Universal Deepfake Detection Paper

- VERITE: a Robust benchmark for multimodal misinformation detection accounting for unimodal bias Paper GitHub

- Can ChatGPT Detect DeepFakes? A Study of Using Multimodal Large Language Models for Media Forensics Paper

Detection Tools

Modality	Tool	Company	Link
Text	AI Content Detector	Copyleaks	Link
Text	AI Content Detector, ChatGPT detector	ZeroGPT	Link
Text	AI Content Detector	Winston AI	Link
Text	AI Content Detector	Crossplag	Link
Text	Giant Language model Test Room	GLTR	Link
Text	The AI Detector	Content at Scale	Link
Text	AI Checker	Originality ai	Link
Text	Advanced AI Detector and Humanizer	Undetectable ai	Link
Text	AI Content Detector	Writer	Link
Text	AI Content Detector	Conch	Link
Text	Illuminarty Text	Illuminarty	Link
Text	AI-Generated Text Detector	Is it AI	Link
Text	AI Detector Efficacy Research Tool	Originality ai	Link
Image	AI or Not image	AI or Not	Link
Image	AI-Generated Image Detector	Is it AI	Link
Image	Illuminarty Image	Illuminarty	Link
Image	SynthID	Google	Link
Image	Advanced AI Image Detector	Content at Scale	Link
Image	AI Image Detector	Huggingface	Link
Audio	AI or Not audio	AI or Not	Link
