
SPL-Net: Spatial-Semantic Patch Learning Network for Facial Attribute Recognition with Limited Labeled Data

Yan, Yan; Shu, Ying; Chen, Si; Xue, Jing-Hao; Shen, Chunhua; Wang, Hanzi (2023) SPL-Net: Spatial-Semantic Patch Learning Network for Facial Attribute Recognition with Limited Labeled Data. International Journal of Computer Vision, 131, pp. 2097-2121. 10.1007/s11263-023-01787-w. Green open access

Abstract

Existing deep learning-based facial attribute recognition (FAR) methods rely heavily on large-scale labeled training data. Unfortunately, in many real-world applications, only limited labeled data are available, resulting in the performance deterioration of these methods. To address this issue, we propose a novel spatial-semantic patch learning network (SPL-Net), consisting of a multi-branch shared subnetwork (MSS), three auxiliary task subnetworks (ATS), and an FAR subnetwork, for attribute classification with limited labeled data. Considering the diversity of facial attributes, MSS includes a task-shared branch and four region branches, each of which contains cascaded dual cross attention modules to extract region-specific features. SPL-Net involves a two-stage learning procedure. In the first stage, MSS and ATS are jointly trained to perform three auxiliary tasks (i.e., a patch rotation task (PRT), a patch segmentation task (PST), and a patch classification task (PCT)), which exploit the spatial-semantic relationship on large-scale unlabeled facial data from various perspectives. Specifically, PRT encodes the spatial information of facial images based on self-supervised learning. PST and PCT respectively capture the pixel-level and image-level semantic information of facial images by leveraging a facial parsing model. Thus, a well-pretrained MSS is obtained. In the second stage, based on the pre-trained MSS, an FAR model is easily fine-tuned to predict facial attributes by requiring only a small amount of labeled data. Experimental results on challenging facial attribute datasets (including CelebA, LFWA, and MAAD) show the superiority of SPL-Net over several state-of-the-art methods in the case of limited labeled data.
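
As a rough illustration of the kind of self-supervised signal a patch rotation task can provide, the sketch below builds one training sample: it splits a square image into a grid of patches, rotates each patch by a random multiple of 90 degrees, and returns the rotation indices as labels. The abstract does not specify the patch grid, the set of rotation angles, or how labels are encoded, so the 2 x 2 grid, the four quarter-turn classes, and the function names `rotate_quarter_turns` and `make_patch_rotation_sample` are all assumptions made for illustration, not the authors' implementation.

```python
import random


def rotate_quarter_turns(patch, k):
    """Rotate a square patch (list of rows) counterclockwise by k * 90 degrees."""
    for _ in range(k % 4):
        # Transpose, then reverse the row order: one counterclockwise quarter turn.
        patch = [list(row) for row in zip(*patch)][::-1]
    return [list(row) for row in patch]


def make_patch_rotation_sample(image, grid=2, seed=None):
    """Build a hypothetical patch-rotation pretext sample.

    image: square list-of-lists of pixel values.
    grid:  number of patches per side (assumed; not taken from the paper).
    Returns (rotated_image, labels), where labels[i][j] in {0, 1, 2, 3}
    is the number of quarter turns applied to patch (i, j).
    """
    rng = random.Random(seed)
    size = len(image)
    if any(len(row) != size for row in image) or size % grid != 0:
        raise ValueError("expected a square image whose side is divisible by grid")
    p = size // grid  # patch side length

    rotated = [list(row) for row in image]
    labels = [[rng.randrange(4) for _ in range(grid)] for _ in range(grid)]
    for gi in range(grid):
        for gj in range(grid):
            # Cut out patch (gi, gj), rotate it, and paste it back in place.
            patch = [row[gj * p:(gj + 1) * p] for row in image[gi * p:(gi + 1) * p]]
            turned = rotate_quarter_turns(patch, labels[gi][gj])
            for r in range(p):
                rotated[gi * p + r][gj * p:(gj + 1) * p] = turned[r]
    return rotated, labels


if __name__ == "__main__":
    toy_image = [[r * 4 + c for c in range(4)] for r in range(4)]
    rotated_image, rotation_labels = make_patch_rotation_sample(toy_image, grid=2, seed=0)
    for row in rotated_image:
        print(row)
    print("rotation labels:", rotation_labels)
```

In a setup like this, a network would be trained to predict `labels` from the rotated image, which requires no manual annotation. The segmentation and classification tasks described in the abstract additionally depend on a facial parsing model and are not sketched here.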

Type: Article
Title: SPL-Net: Spatial-Semantic Patch Learning Network for Facial Attribute Recognition with Limited Labeled Data
Open access status: An open access version is available from UCL Discovery
DOI: 10.1007/s11263-023-01787-w
Publisher version: https://doi.org/10.1007/s11263-023-01787-w
Language: English
Additional information: This version is the author accepted manuscript. For information on re-use, please refer to the publisher’s terms and conditions.
Keywords: Facial attribute recognition, Limited labeled data, Multi-task learning, Multi-label learning, Self-supervised learning, Semantic segmentation
UCL classification: UCL
UCL > Provost and Vice Provost Offices > UCL BEAMS
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Maths and Physical Sciences
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Maths and Physical Sciences > Dept of Statistical Science
URI: https://discovery-pp.ucl.ac.uk/id/eprint/10169853