Cross-Domain Multitask Model for Head Detection and Facial Attribute Estimation

Mirzaee Bafti, Saber, Chatzidimitriadis, Sotirios, Sirlantzis, Konstantinos (2022) Cross-Domain Multitask Model for Head Detection and Facial Attribute Estimation. IEEE Access, 10 . pp. 54703-54712. ISSN 2169-3536. (doi:10.1109/ACCESS.2022.3176621) (KAR id:95203)

PDF (Publisher pdf)
Language: English
DOI for this version: 10.1109/ACCESS.2022.3176621
This work is licensed under a Creative Commons Attribution 4.0 International License.
Official URL: http://dx.doi.org/10.1109/ACCESS.2022.3176621

Abstract

Extracting specific attributes of a face within an image, such as emotion, age, or head pose, has numerous applications. As one of the most widely used kinds of vision-based attribute extraction model, HPE (Head Pose Estimation) models have been extensively explored. Despite the success of these models, the pre-processing step of cropping the region of interest from the image before it is fed into the network remains a challenge. Moreover, a significant portion of the existing models are problem-specific, developed solely for HPE. In response to the wide application of HPE models and the limitations of existing techniques, we developed a multi-purpose, multi-task model that performs face detection and pose estimation (along both the yaw and pitch axes) in parallel. The model is based on the Mask R-CNN object detection model: it computes a collection of mid-level shared features, which feed independent neural networks for face detection and for pose estimation. We evaluated the proposed model on two publicly available datasets, Prima and BIWI, and obtained MAEs (Mean Absolute Errors) of 8.0 ± 8.6 and 8.2 ± 8.1 for yaw and pitch estimation on Prima, and 6.2 ± 4.7 and 6.6 ± 4.9 on BIWI. The generalization capability of the model and its cross-domain effectiveness were assessed on the publicly available UTKFace dataset for face detection and age estimation, resulting in an MAE of 5.3 ± 3.2. On each domain it was tested on, the proposed model compares favorably with state-of-the-art models, judged against their published results. We provide the source code of our model for public use at: https://github.com/kahroba2000/MTL_MRCNN.
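
For readers unfamiliar with the "MAE ± spread" figures quoted in the abstract, the following is a minimal sketch of how such a pair of numbers can be computed. It assumes the value after "±" is the (population) standard deviation of the per-sample absolute errors; the abstract does not state this, so treat it as an assumption. The function name `mae_with_std` and the sample angles are hypothetical and are not taken from the paper or its repository.

```python
import math


def mae_with_std(predictions, targets):
    """Return (mean absolute error, std of the absolute errors).

    Assumption: the spread is the population standard deviation of the
    per-sample absolute errors. The paper may define it differently.
    """
    if not predictions or len(predictions) != len(targets):
        raise ValueError("inputs must be non-empty and of equal length")
    abs_errors = [abs(p - t) for p, t in zip(predictions, targets)]
    mean = sum(abs_errors) / len(abs_errors)
    variance = sum((e - mean) ** 2 for e in abs_errors) / len(abs_errors)
    return mean, math.sqrt(variance)


# Made-up yaw angles, for illustration only (not data from the paper).
predicted_yaw = [10.0, -5.0, 32.0, 0.0]
true_yaw = [15.0, -5.0, 30.0, 9.0]

mae, std = mae_with_std(predicted_yaw, true_yaw)
print(f"yaw MAE: {mae:.1f} ± {std:.1f}")
```

Under this reading, a spread comparable to the mean (as in 8.0 ± 8.6) indicates that the absolute errors vary widely across samples rather than clustering near the average.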

Item Type:	Article
DOI/Identification number:	10.1109/ACCESS.2022.3176621
Uncontrolled keywords:	Head tracking, head pose estimation, multi-task learning, age detection, object detection, mask R-CNN
Subjects:	T Technology > TK Electrical engineering. Electronics. Nuclear engineering
Institutional Unit:	Schools > School of Engineering, Mathematics and Physics > Engineering
Former Institutional Unit:	Divisions > Division of Computing, Engineering and Mathematical Sciences > School of Engineering and Digital Arts
Depositing User:	Saber Mirzaee-Bafti
Date Deposited:	27 May 2022 23:09 UTC
Last Modified:	29 Apr 2026 08:55 UTC
Resource URI:	https://kar.kent.ac.uk/id/eprint/95203 (The current URI for this page, for reference purposes)

University of Kent Author Information

Mirzaee Bafti, Saber.

Creator's ORCID:	https://orcid.org/0000-0001-8357-4373

Chatzidimitriadis, Sotirios.

Creator's ORCID:	https://orcid.org/0000-0002-2422-7221

Sirlantzis, Konstantinos.

Creator's ORCID:	https://orcid.org/0000-0002-0847-8880
