Gated Region-Refine pose transformer for human pose estimation.

Authors :: Wang, Tianfeng; Zhang, Xiaoxu
Source :: Neurocomputing. Apr 2023, Vol. 530, pp. 37-47. 11 pp.
Publication Year :: 2023
Abstract :: Using a transformer for global fusion is a novel and efficient method for pose estimation. Although the computational complexity of modeling dense attention can be significantly reduced by pruning possible human tokens, the accuracy of pose estimation still suffers from heavily overlapping candidate regions and severe background noise. Moreover, the undifferentiated fusion of features from different views leads to a sizeable loss of effective information. To address these challenges, we propose a Gated Region-Refine Pose Transformer (GRRPT) for human pose estimation. GRRPT obtains the general area of the human body from the coarse-grained tokens and then embeds it into the fine-grained ones to extract more details of the joints. Experimental results on COCO demonstrate that the Multi-Resolution Attention mechanism learns more refined candidate regions and improves accuracy. Furthermore, we design a Fusion Gate module consisting of two gates that select valid information from the auxiliary views pixel by pixel, which significantly alleviates information redundancy. Finally, we evaluate the effectiveness of our method on Human3.6M and on our own dataset, FDU-Motion, and achieve state-of-the-art performance. [ABSTRACT FROM AUTHOR]