IDForge: Identity-Driven Multimedia Forgery Detection via Reference Assistance

<a href='https://arxiv.org/abs/2401.11764'><img src='https://img.shields.io/badge/Paper-Arxiv-red'></a> <a href='https://request.idforge.cfd/'><img src='https://img.shields.io/badge/Request-IDForge-blue'></a>

Dataset Overview

Recent advancements in "deepfake" techniques have paved the way for generating various media forgeries. In response to the potential hazards of these forgeries, many researchers are exploring detection methods, which increases the demand for high-quality media forgery datasets. Existing datasets, however, have several limitations. Firstly, most datasets focus on manipulating the visual modality and usually lack diversity, as only a few forgery approaches are considered. Secondly, the media is often inadequate in clarity and naturalness, and the datasets themselves are limited in size. Thirdly, real-world forgeries are commonly motivated by identity, yet the identity information of the individuals portrayed in existing datasets remains under-explored. For detection, identity information could be an essential clue to boost performance. Moreover, official media concerning relevant identities on the Internet can serve as prior knowledge, aiding both the audience and forgery detectors in determining the true identity. Therefore, we propose an identity-driven multimedia forgery dataset, IDForge, which contains 249,138 video shots sourced from 324 wild videos of 54 celebrities collected from the Internet. The fake video shots involve 9 types of manipulation across visual, audio, and textual modalities. Additionally, IDForge provides an extra 214,438 real video shots as a reference set for the 54 celebrities. Correspondingly, we propose the Reference-assisted Multimodal Forgery Detection Network (R-MFDN), which targets the detection of deepfake videos. Through extensive experiments on the proposed dataset, we demonstrate the effectiveness of R-MFDN on the multimedia detection task.

🫱 Access IDForge

Video Generation Training and Other Fair Uses

For those interested in utilizing our dataset for video generation training or similar fair use purposes, you can download the pristine portion along with the reference set of IDForge v1 here:

Pristine part (80,000 real videos featuring people speaking):

Reference set (214,438 real videos featuring people speaking):

Forgery Detection

If you’re working on forgery detection, our full dataset is available upon request. To access it, please submit a request through the following link: 👉 https://request.idforge.cfd/ 👈

To prevent the malicious use of forged videos, we carefully review all requests for access to the full dataset.

Requests are processed within 1-3 days.

If you encounter any issues or do not receive a response within the expected timeframe, please contact idforge_access@outlook.com 🙋‍♂️.

Types of forgery

The following table categorizes different types of forgery based on various manipulation techniques, with $1$ indicating the presence of a technique and $0$ indicating its absence.

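| Forgery Type in Provided Data | Fake | Lip synchronization | Stand-in Use | Face-swapping | Voice cloning (TorToiSe) | Voice conversion (RVC) | Audio shuffling | LLM generation | Text shuffling |
|---|---|---|---|---|---|---|---|---|---|
| face_audiomismatch_textmismatch | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 1 |
| face_rvc_textmismatch | 1 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 1 |
| face_tts | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 |
| face_tts_textgen | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 0 |
| lip_audiomismatch_textmismatch | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 |
| lip_rvc_textmismatch | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 |
| lip_tts_textgen | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 |
| pristine | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| rvc_textmismatch | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 |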
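
For programmatic use, the table above can be encoded as a small lookup. The following is a minimal sketch, not part of the released tooling: the column keys (`fake`, `lip_sync`, and so on) and the helper names `techniques` and `types_using` are hypothetical, chosen here for illustration. Only the forgery type names and the 0/1 values come from the table.

```python
from typing import Dict, List

# Column order follows the table; the short key names are assumptions.
COLUMNS: List[str] = [
    "fake",
    "lip_sync",
    "stand_in",
    "face_swap",
    "voice_cloning_tortoise",
    "voice_conversion_rvc",
    "audio_shuffling",
    "llm_generation",
    "text_shuffling",
]

# One 9-character flag string per table row (1 = technique present).
ROWS: Dict[str, str] = {
    "face_audiomismatch_textmismatch": "111100101",
    "face_rvc_textmismatch": "111101001",
    "face_tts": "111110000",
    "face_tts_textgen": "111110010",
    "lip_audiomismatch_textmismatch": "110000101",
    "lip_rvc_textmismatch": "110001001",
    "lip_tts_textgen": "110010010",
    "pristine": "000000000",
    "rvc_textmismatch": "100001001",
}


def techniques(forgery_type: str) -> Dict[str, bool]:
    """Return a {technique: present?} mapping for one forgery type."""
    bits = ROWS[forgery_type]
    return {name: bit == "1" for name, bit in zip(COLUMNS, bits)}


def types_using(technique: str) -> List[str]:
    """List the forgery types in which the given technique is present."""
    idx = COLUMNS.index(technique)
    return [ftype for ftype, bits in ROWS.items() if bits[idx] == "1"]
```

For example, `types_using("voice_conversion_rvc")` returns the three types whose names contain `rvc`, and `techniques("pristine")` maps every technique to `False`.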

Dataset Structure

We provide two versions of the IDForge dataset.

IDForge v1

There are two parts in IDForge v1:

Main Folder Structure (reference set is similar):

./main
|-- face_audiomismatch_textmismatch
|-- face_rvc_textmismatch
|-- face_tts
|-- face_tts_textgen
|-- lip_audiomismatch_textmismatch
|-- lip_rvc_textmismatch
|-- lip_tts_textgen
|-- pristine
|-- rvc_textmismatch
|-- tts_textgen
`-- tts_textmismatch ##### Forgery Type
    |-- id01
    |-- id02
    |-- ...
    `-- id53 ##### Person
        |-- id53_00
        |-- id53_04
        |-- id53_05
        `-- id53_08 ##### Different Source Video
            |-- id53_scene_0083_0_simswap.mp4
            |-- id53_scene_0090_0_infoswap.mp4
            |-- id53_scene_0090_0_roop.mp4
            |-- id53_scene_0090_1_simswap.mp4
            `-- id53_scene_0092_0_roop.mp4 ##### Video File
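
Because the folder layout above encodes the forgery type, person, and source video in the path, a label can be recovered from the path alone. The snippet below is a minimal sketch under that assumption. `ShotInfo` and `parse_shot_path` are hypothetical names, and treating every folder other than `pristine` as fake is an assumption made for this example.

```python
from pathlib import PurePosixPath
from typing import NamedTuple


class ShotInfo(NamedTuple):
    forgery_type: str   # e.g. "face_tts" or "pristine"
    person: str         # e.g. "id53"
    source_video: str   # e.g. "id53_08"
    filename: str       # e.g. "id53_scene_0092_0_roop.mp4"
    is_fake: bool       # assumption: everything outside "pristine" is fake


def parse_shot_path(path: str) -> ShotInfo:
    """Split <forgery_type>/<person>/<source_video>/<file> from a shot path."""
    parts = PurePosixPath(path).parts
    if len(parts) < 4:
        raise ValueError(f"expected at least 4 path components, got: {path}")
    # Only the last four components matter, so any root prefix is accepted.
    forgery_type, person, source_video, filename = parts[-4:]
    return ShotInfo(forgery_type, person, source_video, filename,
                    is_fake=(forgery_type != "pristine"))


info = parse_shot_path(
    "./main/tts_textmismatch/id53/id53_08/id53_scene_0092_0_roop.mp4"
)
print(info.forgery_type, info.person, info.is_fake)
```

Taking only the last four components keeps the helper independent of where the dataset is unpacked.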

IDForge v2

There are 4 parts in IDForge v2:

Each part is compressed into a series of split zip files.

You need to run the following commands to extract data from the split zip files:

(if needed) apt-get update && apt-get install p7zip-full

7z x FILENAME.zip
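
If you prefer to drive the extraction from Python, the same `7z x` call can be wrapped with `subprocess`. This is a sketch only: the helper names `build_extract_command` and `extract` are hypothetical, and the `-o` output-directory switch is an addition that the command above does not use. It assumes `7z` is already installed and on `PATH`.

```python
import subprocess
from pathlib import Path
from typing import List


def build_extract_command(archive: Path, out_dir: Path) -> List[str]:
    """Build the `7z x` command; `-o<dir>` (no space) sets the output folder."""
    return ["7z", "x", str(archive), f"-o{out_dir}"]


def extract(archive: Path, out_dir: Path) -> None:
    """Extract the archive into out_dir, raising if 7z reports an error."""
    out_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(build_extract_command(archive, out_dir), check=True)


# Example (FILENAME.zip is the placeholder used above):
# extract(Path("FILENAME.zip"), Path("train"))
```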

Train Folder Structure (others are similar):

.
|-- face_audiomismatch_textmismatch
|-- face_rvc_textmismatch
|-- face_tts
|-- face_tts_textgen
|-- lip_audiomismatch_textmismatch
|-- lip_rvc_textmismatch
|-- lip_tts_textgen
|-- pristine
|-- rvc_textmismatch
|-- tts_textgen
`-- tts_textmismatch ##### Forgery Type
    |-- id01
    |-- id02
    |-- ...
    `-- id53 ##### Person
        |-- id53_00
        |-- id53_04
        |-- id53_05
        `-- id53_08 ##### Different Source Video
            |-- id53_scene_0083_0_simswap.mp4
            |-- id53_scene_0090_0_infoswap.mp4
            |-- id53_scene_0090_0_roop.mp4
            |-- id53_scene_0090_1_simswap.mp4
            `-- id53_scene_0092_0_roop.mp4 ##### Video File
                |-- frames_ndarray.npy ##### Extracted Frame (ndarray format)
                |-- id53_scene_0092_0_roop.mp3  ##### Extracted Audio by FFmpeg
                `-- id53_scene_0092_0_roop.txt ##### Extracted Transcript by Whisper
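
Going by the tree above, each v2 sample is a folder named after the video that holds the extracted frames, audio, and transcript. The sketch below shows one way to locate and load those files. `sample_files` and `load_sample` are hypothetical helpers, NumPy is assumed to be installed, and the shape and dtype of the frame array are not documented here, so the code makes no assumption about them.

```python
from pathlib import Path
from typing import Dict


def sample_files(sample_dir: Path) -> Dict[str, Path]:
    """Map a sample folder (e.g. '.../id53_scene_0092_0_roop.mp4') to its files."""
    stem = sample_dir.name
    if stem.endswith(".mp4"):
        stem = stem[: -len(".mp4")]  # audio and transcript drop the .mp4 suffix
    return {
        "frames": sample_dir / "frames_ndarray.npy",
        "audio": sample_dir / f"{stem}.mp3",
        "transcript": sample_dir / f"{stem}.txt",
    }


def load_sample(sample_dir: Path):
    """Load the frames and transcript; return the audio as a path for later decoding."""
    import numpy as np  # imported lazily so sample_files works without NumPy

    files = sample_files(sample_dir)
    frames = np.load(files["frames"])  # array layout is not specified here
    transcript = files["transcript"].read_text(encoding="utf-8")
    return frames, files["audio"], transcript
```

The audio is returned as a path rather than decoded, since the choice of audio library is left to the user.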

TODO