Back to Search Start Over

CLFormer: a unified transformer-based framework for weakly supervised crowd counting and localization.

Authors :
Deng, Mingfang
Zhao, Huailin
Gao, Ming
Source :
Visual Computer. Feb2024, Vol. 40 Issue 2, p1053-1067. 15p.
Publication Year :
2024

Abstract

Recent progress in crowd counting and localization methods mainly relies on expensive point-level annotations and convolutional neural networks with limited receptive filed, which hinders their applications in complex real-world scenes. To this end, we present CLFormer, a Transformer-based weakly supervised crowd counting and localization framework. The model extracts global information from the input image using a Transformer and then passes the extracted features to both a regression branch for crowd counting and a localization branch for localization. Initial proposals are produced by the localization branch and filtered via score maps generated from the extracted features, and their centers are used as pseudo-point-level annotations. Through staggered training of the two branches, the quality of pseudo-point-level annotations is improved, and the final localization maps are generated. Experiments on four benchmark datasets (i.e., ShanghaiTech, UCF-QNRF, JHU-CROWD++, and NWPU-Crowd) demonstrate that CLFormer obtains better counting performance than weakly supervised and fully supervised counting networks and comparable localization performance to fully supervised localization networks. [ABSTRACT FROM AUTHOR]

Details

Language :
English
ISSN :
01782789
Volume :
40
Issue :
2
Database :
Academic Search Index
Journal :
Visual Computer
Publication Type :
Academic Journal
Accession number :
174971129
Full Text :
https://doi.org/10.1007/s00371-023-02831-z