Enhanced DeepFake Image Detection via Swin-B Transformer with Visual Attention Analysis
Abstract
Deepfakes, synthetic media created with advanced machine learning techniques, pose significant societal challenges by spreading misinformation and undermining trust in media. As deepfake generation grows more sophisticated, distinguishing genuine from synthetic media becomes increasingly difficult. This paper presents a robust deepfake image detection framework based on the Swin-B Transformer, a pre-trained model fine-tuned for this task. By assembling a hybrid dataset that combines real images from the FFHQ dataset with synthetically generated fake images from a publicly available Kaggle dataset, we simulate realistic media scenarios. The model achieves 97.47% accuracy on the test set, demonstrating strong generalization across real and synthetic visual data. Using Grad-CAM, we visualize the image regions the model attends to during classification, providing insight into its decision-making process. This work contributes to strengthening content authenticity, curbing fake news, and preserving digital trust and safety.
Copyright (c) 2026 ITEGAM-JETIA

This work is licensed under a Creative Commons Attribution 4.0 International License.
