1. Towards dense people detection with deep learning and depth images
- Author
-
Carlos A. Luna, Cristina Losada-Gutierrez, Daniel Pizarro, David Casillas-Perez, Javier Macias-Guarasa, Roberto Martin-Lopez, and David Fuentes-Jimenez
- Subjects
Fine-tuning ,Artificial neural network ,Computer science ,business.industry ,Deep learning ,Synthetic data ,Image (mathematics) ,Range (mathematics) ,Artificial Intelligence ,Control and Systems Engineering ,Position (vector) ,Identity (object-oriented programming) ,Computer vision ,Artificial intelligence ,Electrical and Electronic Engineering ,business - Abstract
This paper describes a novel DNN-based system, named PD3net , that detects multiple people from a single depth image, in real time. The proposed neural network processes a depth image and outputs a likelihood map in image coordinates, where each detection corresponds to a Gaussian-shaped local distribution, centered at each person’s head. This likelihood map encodes both the number of detected people as well as their position in the image, from which the 3D position can be computed. The proposed DNN includes spatially separated convolutions to increase performance, and runs in real-time with low budget GPUs. We use synthetic data for initially training the network, followed by fine tuning with a small amount of real data. This allows adapting the network to different scenarios without needing large and manually labeled image datasets. Due to that, the people detection system presented in this paper has numerous potential applications in different fields, such as capacity control, automatic video-surveillance, people or groups behavior analysis, healthcare or monitoring and assistance of elderly people in ambient assisted living environments. In addition, the use of depth information does not allow recognizing the identity of people in the scene, thus enabling their detection while preserving their privacy. The proposed DNN has been experimentally evaluated and compared with other state-of-the-art approaches, including both classical and DNN-based solutions, under a wide range of experimental conditions. The achieved results allows concluding that the proposed architecture and the training strategy are effective, and the network generalize to work with scenes different from those used during training. We also demonstrate that our proposal outperforms existing methods and can accurately detect people in scenes with significant occlusions.
- Published
- 2021
- Full Text
- View/download PDF