DTCC: Multi-level dilated convolution with transformer for weakly-supervised crowd counting
- Source :
- Computational Visual Media, Vol 9, Iss 4, Pp 859-873 (2023)
- Publication Year :
- 2023
- Publisher :
- SpringerOpen, 2023.
Abstract
- Crowd counting provides an important foundation for public security and urban management. Because crowd images contain small targets and large variations in density, crowd counting is a challenging task. Mainstream methods usually apply convolutional neural networks (CNNs) to regress a density map, which requires annotations of individual persons as well as counts. Weakly-supervised methods avoid this detailed labeling and require only the count of each image as annotation, but existing methods fail to achieve satisfactory performance because they usually ignore a global perspective field and multi-level information. We propose a weakly-supervised method, DTCC, which effectively combines multi-level dilated convolution and transformer methods to realize end-to-end crowd counting. Its main components are a recursive Swin transformer and a multi-level dilated convolution regression head. The recursive Swin transformer combines a pyramid visual transformer with a fine-tuned recursive pyramid structure to capture deep multi-level crowd features, including global features. The multi-level dilated convolution regression head comprises multi-level dilated convolution and a linear regression head for the feature extraction module; it captures low- and high-level features simultaneously to enlarge the receptive field. In addition, two regression-head fusion mechanisms provide dynamic fusion and mean fusion of the predicted counts. Experiments on four well-known crowd counting benchmark datasets (UCF_CC_50, ShanghaiTech, UCF_QNRF, and JHU-Crowd++) show that DTCC outperforms other weakly-supervised methods and achieves results comparable to those of fully-supervised methods.
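The abstract gives no implementation details, so the following is only a minimal sketch under stated assumptions, not the authors' code. It uses plain Python with 1-D signals (instead of 2-D feature maps) to illustrate two ideas the abstract names: how stacking dilated convolutions enlarges the receptive field, and two ways of fusing the counts predicted by two regression heads. The helper names, the example dilation rates, and the softmax-weighted form of "dynamic fusion" are hypothetical choices made for illustration.

```python
import math
from typing import List, Sequence


def dilated_conv1d(signal: Sequence[float], kernel: Sequence[float], dilation: int) -> List[float]:
    """'Valid' 1-D dilated convolution (cross-correlation, as CNN layers compute it).

    Kernel taps are spaced `dilation` samples apart, so the kernel covers a wider
    window without adding parameters.
    """
    span = (len(kernel) - 1) * dilation + 1  # effective extent of the dilated kernel
    out = []
    for start in range(len(signal) - span + 1):
        out.append(sum(w * signal[start + i * dilation] for i, w in enumerate(kernel)))
    return out


def receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """Receptive field (in samples) of stride-1 convolutions stacked with the given dilations."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def mean_fusion(count_a: float, count_b: float) -> float:
    """Mean fusion: average the counts predicted by two regression heads."""
    return (count_a + count_b) / 2.0


def dynamic_fusion(count_a: float, count_b: float, logit_a: float, logit_b: float) -> float:
    """Hypothetical dynamic fusion: weight each head's count by a softmax over two scores.

    In a trained model the scores would be predicted from the image; here they are
    plain inputs (an assumption, not the paper's stated mechanism).
    """
    m = max(logit_a, logit_b)  # subtract the max for numerical stability
    ea, eb = math.exp(logit_a - m), math.exp(logit_b - m)
    weight_a = ea / (ea + eb)
    return weight_a * count_a + (1.0 - weight_a) * count_b
```

For example, three stacked 3-wide kernels with dilations 1, 2, and 3 give `receptive_field(3, [1, 2, 3]) == 13`, versus 7 for three undilated layers; with equal scores, `dynamic_fusion` reduces to `mean_fusion`.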
Details
- Language :
- English
- ISSN :
- 2096-0433 and 2096-0662
- Volume :
- 9
- Issue :
- 4
- Database :
- Directory of Open Access Journals
- Journal :
- Computational Visual Media
- Publication Type :
- Academic Journal
- Accession number :
- edsdoj.2a82b860fcf948dda83e55ffb312c061
- Document Type :
- article
- Full Text :
- https://doi.org/10.1007/s41095-022-0313-5