Kent Academic Repository

Real-time, high-fidelity face identity swapping with a vision foundation model

Yu, Jongmin, Oh, Hyeontaek, Sun, Zhongtian, Lee, Younkwan, Yang, Jinhong (2025) Real-time, high-fidelity face identity swapping with a vision foundation model. IEEE Access, 13 . pp. 157160-157174. E-ISSN 2169-3536. (doi:10.1109/ACCESS.2025.3606518) (KAR id:111837)

Abstract

Many recent face-swapping methods based on generative adversarial networks (GANs) or autoencoders achieve strong performance under constrained conditions but degrade significantly in high-resolution or extreme-pose scenarios. Moreover, most existing models generate outputs at limited resolutions (128×128), which fall short of modern visual standards. Diffusion-based approaches have shown promise in handling such challenges, but are computationally intensive and unsuitable for real-time applications. In this work, we propose FaceChanger, a real-time face identity swap framework designed to enhance robustness across various poses that produces outputs at 256×256 (double the linear resolution of typical 128×128 baselines). While maintaining compatibility with conventional GAN- and autoencoder-based pipelines, FaceChanger uniquely incorporates a vision foundation model (VFM) to extract richer semantic features, which can enhance identity preservation, attribute control, and robustness to variations. Specifically, we employ the Contrastive Language-Image Pre-training (CLIP) model to obtain these features, which guide identity preservation and attribute control through newly designed VFM-based visual and textual semantic contrastive losses. Extensive evaluations on benchmarks such as the FaceForensics++ (FF++) dataset, the Multiple Pose, Illumination, and Expression (MPIE) dataset, and the large-pose Flickr face (LPFF) dataset demonstrate that FaceChanger matches or exceeds state-of-the-art performance under standard conditions and significantly outperforms it in high-resolution, pose-intensive scenarios.
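The abstract's semantic contrastive losses pair CLIP embeddings of the swapped face with embeddings of the target identity. The paper's exact loss formulation is not given in the abstract; the sketch below shows one common InfoNCE-style form such a contrastive loss could take, with all function names and the temperature value being illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    # Unit-normalize embeddings so the dot product is cosine similarity.
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def semantic_contrastive_loss(swap_emb, id_emb, temperature=0.07):
    """Illustrative InfoNCE-style contrastive loss (an assumption, not the
    paper's exact loss): the i-th swapped-face embedding should be more
    similar to the i-th identity embedding than to any other in the batch.
    swap_emb, id_emb: (N, D) arrays of CLIP-style feature vectors."""
    swap_emb = l2_normalize(swap_emb)
    id_emb = l2_normalize(id_emb)
    logits = swap_emb @ id_emb.T / temperature   # (N, N) similarity matrix
    # Row-wise cross-entropy with the diagonal entries as positives.
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))
```

Under this form, the loss drops toward zero as each swapped-face embedding aligns with its matching identity embedding, while mismatched pairings in the batch act as negatives; a textual variant would simply substitute CLIP text embeddings for `id_emb`.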

Item Type: Article
DOI/Identification number: 10.1109/ACCESS.2025.3606518
Uncontrolled keywords: face identity swap; face swap; vision foundation model; contrastive learning
Subjects: Q Science
Institutional Unit: Schools > School of Computing
Depositing User: Zhongtian Sun
Date Deposited: 03 Nov 2025 11:23 UTC
Last Modified: 05 Nov 2025 03:44 UTC
Resource URI: https://kar.kent.ac.uk/id/eprint/111837
