Enhanced DeepFake Image Detection via Swin-B Transformer with Visual Attention Analysis

Abstract

Deepfakes, synthetic media created using advanced machine learning techniques, pose significant societal challenges by spreading misinformation and undermining trust in media. As deepfake technologies grow more sophisticated, distinguishing genuine from synthetic media has become increasingly difficult. This paper presents a robust deepfake image detection framework based on the Swin-B Transformer, a pre-trained model fine-tuned for this task. By constructing a hybrid dataset that combines real images from the FFHQ dataset with synthetically generated fake images from a publicly available Kaggle dataset, we simulate real-world media scenarios. Our model achieves an accuracy of 97.47% on the test set, demonstrating strong generalization to both real and synthetic visual data. Using Grad-CAM, we visualize the spatial regions of the image that the model attends to during classification, providing insight into its decision-making process. This work contributes to enhancing content authenticity, curbing fake news, and strengthening digital trust and safety.

Published
2026-03-24
How to Cite
Suri, V., & GVSNRV, P. (2026). Enhanced DeepFake Image Detection via Swin-B Transformer with Visual Attention Analysis. ITEGAM-JETIA, 12(58), 304-314. https://doi.org/10.5935/jetia.v12i58.3045