4 results for "Lei, Weimin"
Search Results
2. An Adaptive Multi-Scale Network Based on Depth Information for Crowd Counting †.
- Author
- Zhang, Peng, Lei, Weimin, Zhao, Xinlei, Dong, Lijia, and Lin, Zhaonan
- Subjects
- *CONVOLUTIONAL neural networks, *DIGITAL cameras, *COMPUTER vision, *VIDEO surveillance, *PUBLIC spaces, *FEATURE extraction
- Abstract
Crowd counting, as a basic computer vision task, plays an important role in many fields such as video surveillance, accident prediction, public security, and intelligent transportation. At present, crowd counting faces several challenges. First, due to the diversity of crowd distributions and increasing population density, large-scale crowds gather in public places, sports stadiums, and stations, causing very severe occlusion. Second, when annotating large-scale datasets, positioning errors can easily affect training results. In addition, the size of human head targets in dense images is inconsistent, making it difficult for a single network to identify both near and far targets simultaneously. Existing crowd counting methods mainly rely on density-map regression. However, this framework does not distinguish the features of distant and nearby targets and cannot adapt to scale changes, so detection performance in sparsely populated areas is poor. To address these problems, we propose an adaptive multi-scale far-and-near-distance network based on the convolutional neural network (CNN) framework for counting dense crowds, achieving a good balance between accuracy, inference speed, and performance. First, at the feature level, to enable the model to distinguish near and far features, we stack convolutional layers to deepen the network, allocate different receptive fields according to the distance between the target and the camera, and fuse features among nearby targets to strengthen pedestrian feature extraction for near targets. Second, depth information is used to distinguish distant and near targets of different scales, and the original image is cut into four patches to perform pixel-level adaptive modeling of the crowd.
In addition, we adopt the density-normalized average precision (nAP) metric to analyze the spatial-localization accuracy of our method. This paper validates the effectiveness of NF-Net on three challenging benchmarks: the ShanghaiTech Part A and B, UCF_CC_50, and UCF-QNRF datasets. Compared with the state of the art, it delivers significantly better performance across various scenarios. On the UCF-QNRF dataset, we further verify that our method effectively handles interference from complex backgrounds. [ABSTRACT FROM AUTHOR]
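The depth-guided patching described in the abstract — cutting the image into four patches and treating near and far regions differently — can be sketched roughly as follows. This is an illustrative sketch only, not the authors' code: the quadrant split, the mean-depth cutoff, and the function name are all assumptions.

```python
import numpy as np

def split_by_depth(image, depth, threshold=None):
    """Illustrative sketch: partition an image into four quadrant patches
    and tag each patch as 'near' or 'far' from its mean depth value."""
    if threshold is None:
        threshold = depth.mean()  # global mean depth as the near/far cutoff
    h, w = image.shape[:2]
    patches = []
    for r0, r1 in ((0, h // 2), (h // 2, h)):
        for c0, c1 in ((0, w // 2), (w // 2, w)):
            patch = image[r0:r1, c0:c1]
            mean_d = depth[r0:r1, c0:c1].mean()
            scale = "near" if mean_d < threshold else "far"
            patches.append((patch, scale))
    return patches
```

Each tagged patch could then be routed to a branch whose receptive field matches its scale, which is the kind of adaptive, distance-dependent processing the abstract describes.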
- Published
- 2023
- Full Text
- View/download PDF
3. Spatio-Temporal Video Denoising Based on Attention Mechanism.
- Author
- Ji, Kai, Lei, Weimin, and Zhang, Wei
- Subjects
- *IMAGE denoising, *CONVOLUTIONAL neural networks, *PATTERN recognition systems, *CAMCORDERS, *SIGNAL-to-noise ratio, *RANDOM noise theory
- Abstract
The demand for high-quality video captured by cameras keeps growing with the rapid development of pattern recognition and artificial intelligence, and video denoising is the key technology for obtaining clean videos. However, research on video denoising is still far from sufficient. In this paper, we propose a video denoising method based on a convolutional neural network architecture to reduce noise from the sensor system. We improve the noise-estimation loss function by imposing an adaptive penalty on under-estimation of the noise level, which makes our method perform robustly. Furthermore, we use multi-level features to guide spatial denoising, taking multilayer semantic information of the image as a perceptual loss. Instead of relying on optical flow to characterize inter-frame information, we use a U-Net-like structure to handle motion implicitly; this is less computationally expensive and avoids distortions caused by inaccurate flow and object occlusion. To locate temporal features and suppress useless information, an attention mechanism is introduced into the skip connections of the U-Net-like structure. Experimental results demonstrate that the proposed algorithm produces more convincing results in both peak signal-to-noise ratio (PSNR) and structural similarity index measure (SSIM) than the selected approaches when processing Gaussian noise, synthetic real noise, and real noise. [ABSTRACT FROM AUTHOR]
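The adaptive penalty on under-estimated noise levels mentioned in the abstract can be illustrated with an asymmetric weighting of the squared error. This is a sketch under the assumption of an asymmetric-loss formulation of the kind used in blind-denoising work; the hyper-parameter `alpha` and the exact weighting are hypothetical, not taken from the paper.

```python
import numpy as np

def asymmetric_noise_loss(sigma_est, sigma_true, alpha=0.3):
    """Squared error on the estimated noise level, with under-estimation
    (sigma_est < sigma_true) weighted more heavily than over-estimation.
    With alpha < 0.5, the penalty weight is (1 - alpha) when the network
    under-estimates and alpha when it over-estimates."""
    diff = sigma_est - sigma_true
    weight = np.where(diff < 0, 1.0 - alpha, alpha)
    return float(np.mean(weight * diff ** 2))
```

The asymmetry nudges the estimator toward over-estimating the noise level, which typically degrades denoising output more gracefully than under-estimation does.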
- Published
- 2023
- Full Text
- View/download PDF
4. A framework for detecting fighting behavior based on key points of human skeletal posture.
- Author
- Zhang, Peng, Zhao, Xinlei, Dong, Lijia, Lei, Weimin, Zhang, Wei, and Lin, Zhaonan
- Subjects
- ARTIFICIAL neural networks, VIOLENCE, HUMAN skeleton, VIDEO surveillance, CRIMINAL behavior, SOCIAL stability
- Abstract
Detecting fights in videos and images from public surveillance places is an important task for limiting violent criminal behavior. Real-time detection of violent behavior can effectively ensure the personal safety of pedestrians and further maintain public social stability. Therefore, in this paper, we aim to detect violent behavior in videos in real time. We propose a novel neural network framework based on human pose key points, called Real-Time Pose Net (RTPNet). It uses a pose extractor (YOLO-Pose) to extract human skeleton features and classifies video-level violent behavior with a 2D-CNN model (ACTION-Net), exploiting appearance features and inter-frame correlation to accurately detect fighting behavior. We also propose a new image dataset called VIMD (Violence Image Dataset), which includes images of fighting behavior collected online and captured independently. After training on this dataset, the network can effectively identify skeletal features in videos and locate fighting movements. The dataset is available on GitHub (https://github.com/ChinaZhangPeng/Violence-Image-Dataset). We also conducted experiments on four datasets: Hockey-Fight, RWF-2000, Surveillance Camera Fight, and AVD. The results show that RTPNet outperforms previous state-of-the-art methods, achieving accuracies of 99.4% on the Hockey-Fight dataset, 93.3% on RWF-2000, 93.4% on Surveillance Camera Fight, and 99.3% on AVD, while running at up to 33 fps, so state-of-the-art results come with faster speed. In addition, RTPNet also detects violent behavior well against complex backgrounds.
• YOLO-Pose extracts human skeletal features with pose key points as input, and RGB information is combined with human skeletal-point information to recognize and classify fighting behavior.
• The Weight Selection Module (WSM) uses parameters generated from skeletal data as weight coefficients to allocate skeletal information and RGB information.
• The Weight Distribution Module (WDM) further corrects errors in the weight coefficients, while the Keyframe Weight Allocation (KWA) connects skeleton information and RGB features to generate the final detection result. [ABSTRACT FROM AUTHOR]
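The weight-selection idea in the highlights — deriving weight coefficients from skeletal data to allocate skeletal and RGB information — might be sketched as a gated blend of the two feature streams. The sigmoid gating, the projection vector, and all names here are illustrative assumptions, not the paper's actual module.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def fuse_skeleton_rgb(skel_feat, rgb_feat, w_proj):
    """Illustrative weight-selection fusion: derive a scalar gate from the
    skeleton features, then blend the skeleton and RGB streams with it."""
    g = sigmoid(skel_feat @ w_proj)      # skeleton-derived weight coefficient
    return g * skel_feat + (1.0 - g) * rgb_feat
```

In a fuller design, a second stage could refine the coefficient `g` per keyframe, in the spirit of the WDM/KWA corrections the highlights describe.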
- Published
- 2024
- Full Text
- View/download PDF
Discovery Service for Jio Institute Digital Library