Research on a Transformer-Based Facial Landmark Detection Network.

Authors :: 陈凯
 林珊玲
 林坚普
 林志贤
 缪志辉
 郭太良
Source :: Application Research of Computers / Jisuanji Yingyong Yanjiu. Jun2023, Vol. 40 Issue 6, p1870-1881. 12p.
Publication Year :: 2023
Abstract: Existing facial landmark detection models cannot model the relations between distant landmarks. To address this shortcoming, this paper proposes MCTN, a parallel multi-branch architecture that combines convolution with Transformer for facial landmark detection. MCTN uses a dynamic attention mechanism to model long-range relations between facial landmarks, and its multi-branch parallel design provides benefits such as weight sharing and global information fusion. The paper also proposes a novel Transformer structure, Deformer, which lets MCTN concentrate attention weights on sparse, meaningful locations more quickly and thereby solves the slow convergence of Transformer. MCTN achieves normalized average errors of 4.33%, 3.12% and 3.15% on the WFLW, 300W and COFW datasets, respectively. These results show that, by combining a Transformer-CNN multi-branch parallel structure with the Deformer structure, MCTN substantially outperforms other facial landmark localization algorithms based on convolutional networks. [ABSTRACT FROM AUTHOR]
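The abstract reports normalized average error without saying how it is computed. The sketch below shows one common way to compute such a metric: take the mean Euclidean distance between predicted and ground-truth landmarks, divide it by a per-image reference distance, and average over images. The function name `normalized_mean_error`, the array shapes, and the choice of reference distance (for example an inter-ocular distance) are assumptions made for illustration, not details taken from the paper.

```python
import numpy as np


def normalized_mean_error(pred, gt, norm_dist):
    """Hypothetical helper: mean landmark error normalized per image.

    pred, gt  : arrays of shape (N, K, 2), N images with K (x, y) landmarks.
    norm_dist : array of shape (N,), the per-image reference distance
                (assumed here; the abstract does not specify the normalizer).
    Returns a fraction; multiply by 100 to express it as a percentage.
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    norm_dist = np.asarray(norm_dist, dtype=float)

    # Euclidean distance for every landmark: shape (N, K).
    per_point = np.linalg.norm(pred - gt, axis=-1)
    # Average over landmarks, then normalize each image by its reference distance.
    per_image = per_point.mean(axis=1) / norm_dist
    # Average over images.
    return float(per_image.mean())


# One image, two landmarks: errors of 5 px and 0 px, reference distance 10 px.
pred = [[[0.0, 0.0], [10.0, 0.0]]]
gt = [[[3.0, 4.0], [10.0, 0.0]]]
print(normalized_mean_error(pred, gt, [10.0]))
```

Under this definition, a reported value such as 4.33% would correspond to a return value of 0.0433.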