FakeShield: Explainable Image Forgery Detection and Localization via Multi-modal Large Language Models

Peking University
South China University of Technology

Figure 1: Illustration of the conventional IFDL framework and our explainable IFDL framework.

Abstract

The rapid development of generative AI is a double-edged sword: it facilitates content creation while making image manipulation both easier to perform and harder to detect. Although current image forgery detection and localization (IFDL) methods are generally effective, they face two main challenges: (1) their black-box nature, with unknown detection principles, and (2) limited generalization across diverse tampering methods (e.g., Photoshop, DeepFake, AIGC-Editing). To address these issues, we propose the explainable IFDL task and design FakeShield, a multi-modal framework capable of evaluating image authenticity, generating tampered-region masks, and providing a judgment basis grounded in pixel-level and image-level tampering clues. Additionally, we leverage GPT-4o to enhance existing IFDL datasets, creating the Multi-Modal Tamper Description dataSet (MMTD-Set) for training FakeShield's tampering-analysis capabilities. We further incorporate a Domain Tag-guided Explainable Forgery Detection Module (DTE-FDM) and a Multi-modal Forgery Localization Module (MFLM) to handle the interpretation of various tampering types and to achieve forgery localization guided by detailed textual descriptions. Extensive experiments demonstrate that FakeShield effectively detects and localizes various tampering techniques, offering an explainable solution superior to previous IFDL methods.

MMTD-Set: Multi-Modal Tamper Description dataSet

Figure 2: Illustration of the construction process of our MMTD-Set.

We categorize tampered images into three data domains according to the tampering method: PhotoShop, DeepFake, and AIGC-Editing. Building on existing IFDL datasets, we use GPT-4o to generate analyses and descriptions of tampered images, and we construct "image-mask-description" triplets to support multimodal training of the model. In addition, we design tampering-type-specific description prompts that guide GPT to focus on the relevant pixel artifacts and semantic errors for each domain.
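The construction step above can be sketched as follows. This is a minimal illustration, not the authors' actual pipeline: the domain-specific prompt texts, the field names, and the `describe_fn` hook (standing in for a GPT-4o call) are all assumptions made for the sketch.

```python
# Illustrative domain-specific prompt templates (hypothetical wording).
DOMAIN_PROMPTS = {
    "photoshop": "Focus on pixel artifacts: splicing edges, lighting and shadow inconsistencies.",
    "deepfake": "Focus on facial semantics: blending boundaries, asymmetric eyes, unnatural skin.",
    "aigc-editing": "Focus on generative artifacts: oversmoothed textures, implausible geometry.",
}

def build_triplet(image_path, mask_path, domain, describe_fn):
    """Pair a tampered image and its ground-truth mask with a generated description."""
    prompt = DOMAIN_PROMPTS[domain]
    # describe_fn stands in for a GPT-4o call that sees the image, the mask,
    # and the domain-specific prompt, and returns a tampering analysis.
    description = describe_fn(image_path, mask_path, prompt)
    return {"image": image_path, "mask": mask_path,
            "domain": domain, "description": description}

# Usage with a stub in place of the GPT-4o call:
stub = lambda img, msk, prompt: f"[{prompt.split(':')[0]}] analysis of {img}"
triplet = build_triplet("ps_001.jpg", "ps_001_mask.png", "photoshop", stub)
```

Each resulting record is one "image-mask-description" triplet; iterating this over an existing IFDL dataset yields MMTD-Set-style training data.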


Figure 3: MMTD-Set data samples.

FakeShield: Explainable Image Forgery Detection and Localization Framework

Figure 4: The pipeline of FakeShield.

The framework comprises two key modules: the Domain Tag-guided Explainable Forgery Detection Module (DTE-FDM) and the Multi-modal Forgery Localization Module (MFLM).

  • Domain Tag-guided Explainable Forgery Detection Module. The DTE-FDM performs image forgery detection and analyzes the detection results, using data-domain tags to bridge the domain conflicts between different types of forgery data and guiding the multimodal large language model to generate detection results together with their judgment basis.
  • Multi-modal Forgery Localization Module. The MFLM then uses the DTE-FDM's textual description of the tampered region as a prompt for the visual segmentation model, guiding it to pinpoint the tampered area.
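The two-stage flow described by these modules can be sketched schematically. All three hooks (`tag_generator`, `detect_mllm`, `segment_model`) are stand-ins for the actual trained components, and the control flow is an assumption drawn from the description above, not the real implementation.

```python
def fakeshield_pipeline(image, tag_generator, detect_mllm, segment_model):
    # Stage 1 (DTE-FDM): a data-domain tag lets one MLLM handle Photoshop,
    # DeepFake, and AIGC-Editing inputs without cross-domain conflict.
    domain_tag = tag_generator(image)
    verdict, explanation = detect_mllm(image, domain_tag)
    if verdict == "authentic":
        return verdict, explanation, None
    # Stage 2 (MFLM): the textual description of the tampered region
    # prompts the segmentation model to localize it.
    mask = segment_model(image, prompt=explanation)
    return verdict, explanation, mask

# Usage with stand-in components:
tag = lambda img: "photoshop"
detect = lambda img, t: ("tampered", f"{t}: splicing artifacts near the left edge")
seg = lambda img, prompt: [[1, 0], [0, 0]]  # toy binary mask
verdict, explanation, mask = fakeshield_pipeline("img.jpg", tag, detect, seg)
```

The key design choice this sketch reflects is that localization is conditioned on the explanation text rather than run independently, so the mask and the written judgment basis stay consistent.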

Performance

Comparison of detection performance with advanced IFDL methods

Table 1: Detection performance comparison between our FakeShield and other competitive methods.

We demonstrate FakeShield's superior detection accuracy and F1 scores over other methods on the Photoshop, DeepFake, and AIGC-Editing datasets. By leveraging a domain-tag guidance strategy, FakeShield effectively handles diverse tampering types and improves cross-domain generalization. For example, it outperforms the next-best method on the IMD2020 dataset with an ACC gain of 0.08 and an F1 improvement of 0.05. This domain-tagging approach resolves data conflicts across tampering types, achieving high detection performance on traditional IFDL benchmarks as well as on more recent AIGC and DeepFake cases.
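For reference, the image-level ACC and F1 reported in Table 1 follow the standard definitions; a minimal computation, assuming "tampered" is the positive class (label 1), looks like this:

```python
def detection_metrics(y_true, y_pred):
    """Image-level accuracy and F1 score (tampered = positive class, label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    acc = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return acc, f1

# Example: 4 images, one false negative and one false positive.
acc, f1 = detection_metrics([1, 1, 0, 0], [1, 0, 0, 1])  # acc = 0.5, f1 = 0.5
```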

Comparison of explaining performance with advanced MLLMs methods

Table 2: Comparative results of the pre-trained M-LLMs and FakeShield in tampering explanation capabilities on the MMTD-Set.

We evaluate FakeShield's explanation capabilities by comparing its tampered-area descriptions to those generated by pre-trained M-LLMs on the Photoshop, DeepFake, and AIGC-Editing datasets. Using cosine semantic similarity (CSS) as the metric, FakeShield consistently achieves the highest scores, reflecting its ability to generate accurate, detailed explanations of tampered regions. For instance, on the DSO dataset, FakeShield achieves a CSS of 0.8873, significantly outperforming the next-best model, which scores 0.6484. This illustrates FakeShield's effectiveness in producing interpretable descriptions closely aligned with the ground truth, even in complex tampering scenarios.
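The CSS metric is cosine similarity computed between sentence embeddings of the generated description and the ground-truth description. A minimal sketch of the similarity step (which sentence encoder produces the embedding vectors is left open, as the page does not specify it here):

```python
import math

def cosine_semantic_similarity(u, v):
    """Cosine similarity between two sentence-embedding vectors; 1.0 = identical direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Toy vectors: parallel embeddings score 1.0, orthogonal ones 0.0.
same = cosine_semantic_similarity([1.0, 0.0], [1.0, 0.0])      # 1.0
orthogonal = cosine_semantic_similarity([1.0, 0.0], [0.0, 1.0])  # 0.0
```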


Figure 5: Responses of mainstream pre-trained M-LLMs and FakeShield to tampered images.

Comparison of localization performance with advanced IFDL methods

Table 3: Comparative results of tamper localization capabilities between competing IFDL methods and FakeShield.

We assess FakeShield's ability to accurately localize tampered regions by comparing it with other competitive IFDL methods on datasets such as Photoshop and AIGC-Editing. FakeShield consistently achieves the highest IoU and F1 scores across most test sets. For example, on the IMD2020 dataset, FakeShield surpasses the next best method, OSN, with an IoU improvement of 0.12 and an F1 increase of 0.1. Visual comparisons also show that FakeShield produces cleaner and more precise segmentations of tampered areas, accurately capturing boundaries where other methods, like PSCC-Net, tend to produce blurred and overly broad predictions.
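The pixel-level IoU and F1 used for localization are computed between the predicted mask and the ground-truth mask; a minimal sketch over flattened binary masks (1 = tampered pixel):

```python
def localization_metrics(pred_mask, gt_mask):
    """Pixel-level IoU and F1 for flattened binary masks (1 = tampered)."""
    tp = sum(1 for p, g in zip(pred_mask, gt_mask) if p == 1 and g == 1)
    fp = sum(1 for p, g in zip(pred_mask, gt_mask) if p == 1 and g == 0)
    fn = sum(1 for p, g in zip(pred_mask, gt_mask) if p == 0 and g == 1)
    union = tp + fp + fn
    iou = tp / union if union else 1.0  # both masks empty -> perfect agreement
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 1.0
    return iou, f1

# Example: 4 pixels with one false positive and one false negative.
iou, f1 = localization_metrics([1, 1, 0, 0], [1, 0, 1, 0])
```

Note that for the same error pattern F1 is always at least as large as IoU (F1 = 2·IoU / (1 + IoU)), which is why the two scores move together in Table 3.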

Figure 6: Testing results on MMTD-Set for FakeShield are compared with PSCC-Net, OSN, CAT-Net, and MVSS-Net.

Dialogue Examples of FakeShield


Figure 7: Dialogue examples of FakeShield.

BibTeX

@article{xu2024fakeshield,
        title={FakeShield: Explainable Image Forgery Detection and Localization via Multi-modal Large Language Models},
        author={Xu, Zhipei and Zhang, Xuanyu and Li, Runyi and Tang, Zecheng and Huang, Qing and Zhang, Jian},
        journal={arXiv preprint arXiv:2410.02761},
        year={2024}
}

Acknowledgement

This website is modified from LLaVA and FFAA, both of which are outstanding works! We thank the LLaVA team for giving us access to their models and open-source projects.