CLFormer: a unified transformer-based framework for weakly supervised crowd counting and localization.
- Source :
- Visual Computer. Feb 2024, Vol. 40, Issue 2, p1053-1067. 15p.
- Publication Year :
- 2024
Abstract
- Recent progress in crowd counting and localization methods mainly relies on expensive point-level annotations and convolutional neural networks with limited receptive fields, which hinders their application in complex real-world scenes. To this end, we present CLFormer, a Transformer-based weakly supervised crowd counting and localization framework. The model extracts global information from the input image using a Transformer and then passes the extracted features to both a regression branch for crowd counting and a localization branch for localization. Initial proposals are produced by the localization branch and filtered via score maps generated from the extracted features, and their centers are used as pseudo-point-level annotations. Through staggered training of the two branches, the quality of pseudo-point-level annotations is improved, and the final localization maps are generated. Experiments on four benchmark datasets (i.e., ShanghaiTech, UCF-QNRF, JHU-CROWD++, and NWPU-Crowd) demonstrate that CLFormer obtains better counting performance than weakly supervised and fully supervised counting networks and comparable localization performance to fully supervised localization networks. [ABSTRACT FROM AUTHOR]
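The abstract says that initial proposals from the localization branch are filtered with score maps and that the centers of the surviving proposals serve as pseudo-point-level annotations. The following is a minimal sketch of that filtering step only, under assumptions the abstract does not state: proposals are axis-aligned boxes `(x1, y1, x2, y2)`, a proposal's score is the mean score-map value inside its box, and a fixed threshold of 0.5 decides what is kept. The helper names `box_score` and `pseudo_points` are hypothetical, and this is not CLFormer's actual implementation.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2); x2, y2 treated as inclusive pixel indices
Point = Tuple[float, float]              # (x, y)


def box_score(box: Box, score_map: List[List[float]]) -> float:
    """Hypothetical scoring rule: mean score-map value over the cells the box covers."""
    h, w = len(score_map), len(score_map[0])
    x1, y1, x2, y2 = box
    # Clamp the box to the map bounds and convert it to integer row/column ranges.
    c0, c1 = max(0, int(x1)), min(w, int(x2) + 1)
    r0, r1 = max(0, int(y1)), min(h, int(y2) + 1)
    cells = [score_map[r][c] for r in range(r0, r1) for c in range(c0, c1)]
    # A box lying entirely outside the map gets the lowest possible score.
    return sum(cells) / len(cells) if cells else 0.0


def pseudo_points(proposals: List[Box], score_map: List[List[float]],
                  threshold: float = 0.5) -> List[Point]:
    """Keep proposals whose score reaches the (assumed) threshold; return their centers."""
    points = []
    for box in proposals:
        if box_score(box, score_map) >= threshold:
            x1, y1, x2, y2 = box
            points.append(((x1 + x2) / 2, (y1 + y2) / 2))
    return points


if __name__ == "__main__":
    # Toy 4x4 score map: high scores in the top-left corner, low elsewhere.
    score_map = [
        [0.9, 0.8, 0.1, 0.0],
        [0.7, 0.9, 0.1, 0.0],
        [0.1, 0.1, 0.0, 0.0],
        [0.0, 0.0, 0.0, 0.2],
    ]
    proposals = [(0, 0, 1, 1), (2, 2, 3, 3)]
    print(pseudo_points(proposals, score_map))  # prints [(0.5, 0.5)]
```

Only the first proposal survives (mean score 0.825 versus 0.05), so its center becomes the single pseudo-point. According to the abstract, CLFormer derives its score maps from the Transformer features and improves the pseudo-points through staggered training of the two branches; the sketch covers a single filtering pass and omits both.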
- Subjects :
- *COUNTING
- *CONVOLUTIONAL neural networks
- *LOCALIZATION (Mathematics)
- *CROWDS
Details
- Language :
- English
- ISSN :
- 0178-2789
- Volume :
- 40
- Issue :
- 2
- Database :
- Academic Search Index
- Journal :
- Visual Computer
- Publication Type :
- Academic Journal
- Accession number :
- 174971129
- Full Text :
- https://doi.org/10.1007/s00371-023-02831-z