Enhanced DeepFake Image Detection via Swin-B Transformer with Visual Attention Analysis
Abstract
Deepfakes, synthetic media created with advanced machine learning techniques, pose significant societal challenges by spreading misinformation and undermining trust in media. As deepfake generation grows more sophisticated, distinguishing genuine from synthetic media becomes increasingly difficult. This paper presents a robust deepfake image detection framework based on the Swin-B Transformer, a pre-trained model fine-tuned for this task. By assembling a hybrid dataset that combines real images from the FFHQ dataset with synthetically generated fake images from a publicly available Kaggle dataset, we simulate realistic media scenarios. The model achieves 97.47% accuracy on the test set, demonstrating strong generalization across real and synthetic visual data. Using Grad-CAM, we visualize the image regions the model attends to during classification, providing insight into its decision-making process. This work contributes to strengthening content authenticity, curbing fake news, and preserving digital trust and safety.
Copyright (c) 2026 ITEGAM-JETIA

This work is licensed under a Creative Commons Attribution 4.0 International License.
