11 results on '"Yao, Jian"'
Search Results
2. Automatic rape flower cluster counting method based on low-cost labelling and UAV-RGB images.
- Author
Li, Jie, Wang, Enguo, Qiao, Jiangwei, Li, Yi, Li, Li, Yao, Jian, and Liao, Guisheng
- Subjects
DEEP learning, RAPE, STANDARD deviations
- Abstract
Background: The flowering period is a critical time for the growth of rape plants. Counting rape flower clusters can help farmers predict the yield of the corresponding rape fields. However, in-field counting is a time-consuming and labor-intensive task. To address this, we explored a deep learning counting method based on an unmanned aerial vehicle (UAV). The proposed method formulates in-field counting of rape flower clusters as a density estimation problem, unlike object detection methods that count bounding boxes. The crucial step of density map estimation using deep learning is to train a deep neural network that maps an input image to the corresponding annotated density map. Results: We explored a rape flower cluster counting network series: RapeNet and RapeNet+. A rectangular-box-labeled rape flower cluster dataset (RFRB) and a centroid-labeled rape flower cluster dataset (RFCP) were used for network model training. To verify the performance of the RapeNet series, the paper compares the counting results against manually annotated ground-truth values. The average accuracy (Acc), relative root mean square error (rrMSE) and R² reach 0.9062, 12.03 and 0.9635 on the RFRB dataset, and 0.9538, 5.61 and 0.9826 on the RFCP dataset, respectively. Image resolution has little influence on the proposed model. In addition, the visualization results offer some interpretability. Conclusions: Extensive experimental results demonstrate that the RapeNet series outperforms other state-of-the-art counting approaches. The proposed method provides important technical support for crop counting statistics of rape flower clusters in the field. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
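The density-estimation formulation described in this abstract is commonly implemented by rendering each point annotation as a normalized Gaussian, so the map integrates to the object count. A minimal sketch of that general idea (an illustration, not the RapeNet implementation; the `(row, col)` annotation format and `sigma` are assumptions):

```python
import numpy as np

def make_density_map(points, shape, sigma=2.0):
    """Render (row, col) point annotations as a density map whose integral
    equals the object count; a network is then trained to regress this map."""
    h, w = shape
    density = np.zeros((h, w))
    ys, xs = np.mgrid[0:h, 0:w]
    for r, c in points:
        kernel = np.exp(-((ys - r) ** 2 + (xs - c) ** 2) / (2.0 * sigma ** 2))
        density += kernel / kernel.sum()   # each annotation contributes exactly 1
    return density
```

The predicted count is then simply the sum of the network's output map.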
3. Research and Implementation of Fast-LPRNet Algorithm for License Plate Recognition.
- Author
Wang, Zhichao, Jiang, Yu, Liu, Jiaxin, Gong, Siyu, Yao, Jian, and Jiang, Feng
- Subjects
DEEP learning, RESEARCH implementation, ALGORITHMS, URBAN parks
- Abstract
License plate recognition is an important part of intelligent traffic management systems, and applying deep learning to license plate recognition can effectively improve recognition speed and accuracy. To address the problems of traditional license plate recognition algorithms, such as low accuracy, slow speed, and recognition rates that are easily affected by the environment, a Convolutional Neural Network (CNN)-based license plate recognition algorithm, Fast-LPRNet, is proposed. This algorithm uses a segmentation-free recognition method, removes the fully connected layer, and reduces the number of parameters. The algorithm, which has strong generalization ability, scalability, and robustness, performs license plate recognition on FPGA hardware. By increasing the network depth on the basis of the Fast-LPRNet structure, the Chinese City Parking Dataset (CCPD) can be recognized with an accuracy beyond 90%. The experimental results show that the license plate recognition algorithm has high recognition accuracy, strong generalization ability, and good robustness. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
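Segmentation-free plate recognizers in the LPRNet family typically emit per-timestep class probabilities that are collapsed by a CTC-style greedy decoder. A hedged sketch of that decoding step (illustrative of the general technique only, not a detail taken from this paper; the blank index and charset layout are assumptions):

```python
import numpy as np

BLANK = 0  # assumed index of the CTC blank symbol

def ctc_greedy_decode(logits, charset):
    """Best-path CTC decoding: take the argmax per time step, collapse
    consecutive repeats, then drop blanks."""
    out, prev = [], None
    for idx in logits.argmax(axis=1):
        if idx != prev and idx != BLANK:
            out.append(charset[idx - 1])   # charset excludes the blank
        prev = idx
    return "".join(out)
```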
4. A Denoising Autoencoder-Based Bearing Fault Diagnosis System for Time-Domain Vibration Signals.
- Author
Gu, Yi, Cao, Jiawei, Song, Xin, and Yao, Jian
- Subjects
FAULT diagnosis, MONITORING of machinery, ROTATING machinery, DEEP learning, ROLLER bearings, NOISE control
- Abstract
The condition monitoring of rotating machinery is a constant focus of intelligent fault diagnosis. Traditional methods depend excessively on prior knowledge to manually extract features, have limited capacity to learn the complex nonlinear relations in fault signals, and must cope with collected signals that are mixed with environmental noise while rotating machines operate. In view of these limitations, this article proposes a novel deep learning-based approach for detecting bearing faults. To effectively detect, locate, and identify faults in rolling bearings, a stacked denoising autoencoder is utilized to extract features from the original vibration signals, and these features are then provided as input to a backpropagation (BP) network classifier whose outputs represent the different fault categories. Experimental results obtained on rolling bearing datasets show that this method can effectively diagnose bearing faults from original time-domain signals. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
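The feature-extraction step this abstract describes, corrupting the input and encoding it before classification, can be sketched as a single denoising-autoencoder layer with tied weights (an illustration of the general technique, not the paper's architecture; the layer sizes, Gaussian noise model, and `tanh` activation are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def corrupt(x, noise_std=0.1):
    """Additive Gaussian corruption applied to the input during DAE training."""
    return x + rng.normal(0.0, noise_std, size=x.shape)

def dae_layer(x_noisy, W, b_enc, b_dec):
    """One denoising-autoencoder layer with tied weights: the hidden code `h`
    is the learned feature fed to the downstream BP classifier; `x_hat` is the
    reconstruction whose error drives training."""
    h = np.tanh(x_noisy @ W + b_enc)
    x_hat = h @ W.T + b_dec
    return h, x_hat
```

Stacking several such layers and training them to reconstruct clean signals from corrupted ones yields the noise-robust features described above.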
5. Residual-Network-Leveraged Vehicle-Thrown-Waste Identification in Real-Time Traffic Surveillance Videos.
- Author
Qian, Pengjiang, Yuan, Kai, Yao, Jian, Fan, Chao, Zhang, Hua, Liu, Yuan, and Lu, Xianling
- Abstract
We attempt to intelligently identify violations of throwing waste from vehicles (TWV) in real-time traffic surveillance videos. In addition to polluting the environment, waste thrown from passing vehicles easily injures the sanitation workers responsible for cleaning roads. However, manual inspection is still the most common way to recognize such uncivilized behavior in videos, and it is highly time- and labor-consuming. To address these challenges, we design a novel 20-layer residual network (Nov-ResNet-20) for training the vehicle-thrown-waste identification model (VTWIM). Then, incorporating Nov-ResNet-20, Selective Search, and Non-Maximum Suppression (NMS), we propose the deep-residual-network-leveraged vehicle-thrown-waste identification method (DRN-VTWI). Our method first splits one video frame into several regions matching suspected objects, marked with location boxes, via Selective Search. Then, using the VTWIM trained with Nov-ResNet-20, our method identifies the regions containing TWV. Last, our method removes the redundant location boxes for each recognized vehicle-thrown waste and keeps only the best one. The significance of our work is four-fold: 1) Nov-ResNet-20 has a moderate depth: 6 convolutional layers, 7 residual layers, and 20 weight layers in total. Owing to the joint contribution of residual connections, batch normalization, dropout, and the cross-entropy loss, it can identify TWV using a small quantity of manually annotated training samples. 2) Selective Search diversely marks all possible suspected objects in video frames, whereas NMS keeps the best location box for each recognized vehicle-thrown waste, removing all redundancies. In this way, DRN-VTWI finds as many potential TWV violations as possible and optimally annotates vehicle-thrown waste in frames. 3) Combining the power of Nov-ResNet-20, Selective Search, and NMS, DRN-VTWI effectively solves the challenging task of intelligently identifying vehicle-thrown waste in real-time traffic surveillance. Experimental studies conducted on real-time traffic surveillance videos demonstrate the effectiveness and superiority of our approach. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
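The NMS step used in DRN-VTWI to keep only the best location box per recognized object is a standard greedy IoU-based procedure; a minimal sketch (not the authors' implementation; boxes are assumed to be `(x1, y1, x2, y2)` and the IoU threshold is an assumption):

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy NMS: repeatedly keep the highest-scoring box and suppress any
    remaining box whose IoU with it exceeds the threshold."""
    x1, y1, x2, y2 = boxes.T
    areas = (x2 - x1) * (y2 - y1)
    order = scores.argsort()[::-1]        # indices by descending score
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        # intersection of the kept box with all remaining boxes
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        order = order[1:][iou <= iou_thresh]   # drop heavily overlapping boxes
    return keep
```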
6. A Novel Octree-Based 3-D Fully Convolutional Neural Network for Point Cloud Classification in Road Environment.
- Author
Xiang, Binbin, Tu, Jingmin, Yao, Jian, and Li, Li
- Subjects
ARTIFICIAL neural networks, POINT cloud, AUTOMATIC classification, CLASSIFICATION, DEEP learning, RADARSAT satellites, VEGETATION classification
- Abstract
The automatic classification of 3-D point clouds is widely recognized as a challenging task in complex road environments. Specifically, each point is automatically assigned a unique category label, and the labels are then used as clues for semantic analysis and scene recognition. Instead of heuristically extracting handcrafted features to classify all points, as traditional methods do, we put forward an end-to-end octree-based fully convolutional network (FCN) to classify 3-D point clouds in an urban road environment. This paper makes four contributions. The first is that the integration of OctNet and FCN greatly decreases computing time and memory demands compared with a dense 3-D convolutional neural network (CNN). The second is that the octree-based network is strengthened by modifying the cross-entropy loss function to address the problem of an unbalanced category distribution. The third is that an Inception-ResNet block is integrated into our network, which enables our 3-D CNN to effectively learn to classify scenes containing objects at multiple scales and improves classification accuracy. The last is that an open-source dataset (the HuangshiRoad dataset) with ten different classes is introduced for 3-D point cloud classification. Three representative datasets [Semantic3D, WHU_MLS (blocks I and II), and HuangshiRoad] with different covered areas and numbers of points and classes are selected to evaluate the proposed method. The experimental results show that the overall classification accuracy is appreciable: 89.4% for Semantic3D, 82.9% for WHU_MLS block I, 91.4% for WHU_MLS block II, and 94% for HuangshiRoad. Our deep learning approach can efficiently classify dense 3-D point clouds in an urban road environment measured by a mobile laser scanning (MLS) system or static LiDAR. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
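The second contribution above, modifying the cross-entropy loss to handle an unbalanced category distribution, is commonly realized by weighting each class by its inverse frequency. A hedged sketch of that general idea (not the paper's exact loss; the weighting scheme is an assumption):

```python
import numpy as np

def inverse_frequency_weights(labels, num_classes):
    """Heavier weights for rarer classes: w_c = N / (C * n_c)."""
    counts = np.bincount(labels, minlength=num_classes).astype(float)
    return counts.sum() / (num_classes * np.maximum(counts, 1.0))

def weighted_cross_entropy(probs, labels, class_weights):
    """Class-weighted cross-entropy over per-point softmax probabilities."""
    n = labels.shape[0]
    w = class_weights[labels]
    nll = -np.log(probs[np.arange(n), labels] + 1e-12)
    return (w * nll).sum() / w.sum()
```

With such weights, errors on rare classes (e.g. sparse vegetation points) contribute more to the loss than errors on dominant classes like road surface.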
7. Meta-Calib: A generic, robust and accurate camera calibration framework with ArUco-encoded meta-board.
- Author
Zhou, Pengwei, Yin, Hongche, Xu, Guozheng, Li, Li, Yao, Jian, Li, Jian, Liu, Changfeng, and Shi, Zuoqin
- Subjects
CAMERA calibration, ORTHOGRAPHIC projection, COMPUTER vision, HAZARD Analysis & Critical Control Point (Food safety system), CAMERAS, DEEP learning, POSE estimation (Computer vision), AUGMENTED reality, EYE tracking
- Abstract
The rapid development of augmented reality (AR), 3D reconstruction, simultaneous localization and mapping (SLAM), and autonomous driving requires off-the-shelf camera calibration solutions that are adaptable to cameras of different configurations in different complex scenarios. To this end, we propose a generic, robust, and accurate camera calibration framework, called Meta-Calib, that uses one or more novel ArUco-encoded meta-boards and is dedicated to estimating accurate camera intrinsic parameters and the extrinsic transformations of different multi-camera configurations. The ArUco calibration board has been redesigned, as the so-called meta-board, to facilitate learning-based robust detection and to yield higher-precision control point coordinates. It completely replaces the widely used chessboard and its corner-extraction scheme, greatly alleviating the impact of image distortion on control points, especially those located in the boundary area of a fish-eye image. A robust two-stage deep learning detection strategy reliably localizes the ArUco-encoded inner coding region of the meta-board and then identifies the two categories of circular shapes representing "0" and "1" encoded in the ArUco pattern for decoding and orientation determination. The center points of the circular shapes on the meta-board, imaged under a distorted perspective view, can be approximated by elliptical fitting on the contour edges. The deviation between the fitted center points and the ground truth can be greatly suppressed when the refined sub-pixel contour edges extracted from the original image are projected into an orthographic view based on the camera intrinsic parameters, the distortion coefficients, and prior information about the meta-board. Based on this observation, we propose a systematic iterative refinement approach to achieve high-precision intrinsic calibration of a camera. This process alternately improves the estimates of the camera intrinsic parameters and refits the center control points of the circular shapes on the meta-boards. The progressive nature of our approach permits reliable calibration of large-distortion camera models in the presence of noisy measurements and ensures good convergence. In addition, we propose a graph-based multi-camera extrinsic calibration method that uses the corrected control points to reliably estimate the relative poses of both the meta-boards and the cameras in a multi-camera system. The proposed method is not constrained by the number of cameras and meta-boards used, which makes our strategy accessible even to users who are not computer vision experts. Furthermore, we have derived the mathematical form for computing the covariance of the extrinsic transformation, which makes it possible to evaluate the uncertainty of the calibration results. Extensive experiments on a large number of real and synthetic datasets, including perspective, fish-eye, and multiple overlapping cameras, demonstrate the effectiveness and robustness of the developed Meta-Calib calibration framework. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
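The elliptical fitting of circular control points mentioned above can be illustrated with a direct least-squares conic fit, whose center follows from the conic coefficients. A minimal sketch of the general technique (not the Meta-Calib implementation; it omits the sub-pixel edge refinement and orthographic reprojection the abstract describes):

```python
import numpy as np

def fit_ellipse_center(xs, ys):
    """Fit the conic a*x^2 + b*x*y + c*y^2 + d*x + e*y + f = 0 to contour
    points by least squares (smallest singular vector of the design matrix),
    then recover the center from grad = 0:
    [[2a, b], [b, 2c]] @ [xc, yc] = [-d, -e]."""
    A = np.column_stack([xs**2, xs * ys, ys**2, xs, ys, np.ones_like(xs)])
    _, _, vt = np.linalg.svd(A)
    a, b, c, d, e, _ = vt[-1]              # conic coefficients, up to scale
    return np.linalg.solve(np.array([[2*a, b], [b, 2*c]]), -np.array([d, e]))
```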
8. DM-SLAM: A Feature-Based SLAM System for Rigid Dynamic Scenes.
- Author
Cheng, Junhao, Wang, Zhi, Zhou, Hongyan, Li, Li, and Yao, Jian
- Subjects
OPTICAL flow, DYNAMICAL systems, VISUAL environment, EYE, SLAM (Robotics)
- Abstract
Most Simultaneous Localization and Mapping (SLAM) methods assume that environments are static. Such a strong assumption limits the application of most visual SLAM systems, because dynamic objects cause many wrong data associations during the SLAM process. To address this problem, this paper proposes DM-SLAM, a novel visual SLAM method that follows the pipeline of feature-based methods. DM-SLAM combines an instance segmentation network with optical flow information to improve localization accuracy in dynamic environments, and it supports monocular, stereo, and RGB-D sensors. It consists of four modules: semantic segmentation, ego-motion estimation, dynamic point detection, and a feature-based SLAM framework. The semantic segmentation module obtains pixel-wise segmentation results for potentially dynamic objects, and the ego-motion estimation module calculates the initial pose. In the third module, two different strategies are presented to detect dynamic feature points, one for the RGB-D/stereo case and one for the monocular case. In the first case, feature points with depth information are reprojected into the current frame, and the reprojection offset vectors are used to distinguish dynamic points. In the other case, the epipolar constraint is used to accomplish this task. The remaining static feature points are then fed into the fourth module. Experimental results on the public TUM and KITTI datasets demonstrate that DM-SLAM outperforms standard visual SLAM baselines in terms of accuracy in highly dynamic environments. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
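The RGB-D/stereo strategy in the third module, reprojecting depth-endowed feature points into the current frame and thresholding the offset, can be sketched as follows (illustrative only; the intrinsic matrix, pixel threshold, and point-matching convention are assumptions, not details from the paper):

```python
import numpy as np

def dynamic_mask(pts_prev, depths, K, R, t, pts_curr, thresh=3.0):
    """Reproject previous-frame pixels (with known depth) into the current frame
    using the estimated ego-motion (R, t); points whose reprojection offset from
    the matched current-frame observation is large are flagged as dynamic."""
    n = pts_prev.shape[0]
    pix_h = np.hstack([pts_prev, np.ones((n, 1))])     # homogeneous pixels
    rays = (np.linalg.inv(K) @ pix_h.T).T              # back-projected rays
    P_prev = rays * depths[:, None]                    # 3-D points, prev camera frame
    P_curr = (R @ P_prev.T).T + t                      # apply ego-motion
    proj = (K @ P_curr.T).T
    proj = proj[:, :2] / proj[:, 2:3]                  # perspective divide
    offsets = np.linalg.norm(proj - pts_curr, axis=1)  # reprojection offset vectors
    return offsets > thresh                            # True = likely dynamic
```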
9. Multi-Oriented and Scale-Invariant License Plate Detection Based on Convolutional Neural Networks.
- Author
Han, Jing, Yao, Jian, Zhao, Jiao, Tu, Jingmin, and Liu, Yahui
- Subjects
ARTIFICIAL neural networks, DEEP learning, PARALLELOGRAMS, SCALE invariance (Statistical physics), MATHEMATICAL symmetry
- Abstract
License plate detection (LPD) is the first and key step in license plate recognition. State-of-the-art object detection algorithms based on deep learning provide a promising route to LPD. However, two main challenges remain. First, existing methods often enclose objects with horizontal rectangles, which are not always suitable because license plates in images are multi-oriented due to rotation and perspective distortion. Second, the scale of license plates often varies, which makes multi-scale detection difficult. To address these problems, we propose a novel method for multi-oriented and scale-invariant license plate detection (MOSI-LPD) based on convolutional neural networks. Our MOSI-LPD tightly encloses multi-oriented license plates with bounding parallelograms, regardless of the license plate scale. To obtain bounding parallelograms, we first parameterize the edge points of license plates by relative positions. Next, we design mapping functions between oriented regions and horizontal proposals. Then, we enforce symmetry constraints in the loss function and train the model with a multi-task loss. Finally, we map region proposals to three edge points of a nearby license plate and infer the fourth point to form a bounding parallelogram. To achieve scale invariance, we first design anchor boxes based on the inherent shapes of license plates. Next, we search different layers to generate region proposals at multiple scales. Finally, we up-sample the last layer and combine proposal features extracted from different layers to recognize true license plates. Experimental results demonstrate that the proposed method outperforms existing approaches in detecting license plates with different orientations and multiple scales. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
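Inferring the fourth point of a bounding parallelogram from three predicted edge points, as described above, follows directly from the parallelogram's symmetry (a one-line illustration; the corner ordering is an assumption):

```python
import numpy as np

def fourth_corner(p1, p2, p3):
    """Given three consecutive parallelogram corners p1 -> p2 -> p3, the fourth
    follows from the diagonal symmetry p1 + p3 = p2 + p4."""
    return np.asarray(p1) + np.asarray(p3) - np.asarray(p2)
```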
10. Point2Roof: End-to-end 3D building roof modeling from airborne LiDAR point clouds.
- Author
Li, Li, Song, Nan, Sun, Fei, Liu, Xinyi, Wang, Ruisheng, Yao, Jian, and Cao, Shaosheng
- Subjects
POINT cloud, DEEP learning, LIDAR, BUILDING repair, COMPUTER vision, SOURCE code
- Abstract
Three-dimensional (3D) building roof reconstruction from airborne LiDAR point clouds is an important task in photogrammetry and computer vision. To automatically reconstruct 3D building models at Level of Detail 2 (LoD-2) from airborne LiDAR point clouds, data-driven approaches usually proceed in two steps: geometric primitive extraction and roof structure inference. These traditional approaches are not end-to-end: errors accumulated across the stages cannot be avoided, and the final 3D roof models may not be optimal. In addition, the quality of the 3D roof models largely depends on the accuracy of the geometric primitives (planes, lines, etc.). To solve these problems, we present Point2Roof, a deep learning-based approach that directly reconstructs building roofs from airborne LiDAR point clouds. In our method, we start by extracting deep features for each input point using PointNet++. We then identify a set of candidate corner points from the input point cloud using the extracted deep features and regress an offset for each candidate corner point to refine its location. After that, these candidates are clustered into a set of initial vertices, whose locations are further refined to obtain the final accurate vertices. Finally, we propose a Paired Point Attention (PPA) module to predict the true model edges from an exhaustive set of candidate edges between the vertices. Unlike traditional roof modeling approaches, the proposed Point2Roof is end-to-end. Because of the lack of a building reconstruction dataset, we construct a large-scale synthetic dataset to verify the effectiveness and robustness of Point2Roof. The experimental results on this synthetic benchmark demonstrate that Point2Roof significantly outperforms traditional roof modeling approaches. The experiments also show that a network trained on the synthetic dataset can be applied to real point clouds after fine-tuning on a small real dataset. The large-scale synthetic dataset, the small real dataset and the source code of our approach are publicly available at https://github.com/Li-Li-Whu/Point2Roof. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
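The step that clusters candidate corner points into initial vertices can be illustrated with a simple greedy radius-based grouping (a sketch of the general idea, not the Point2Roof implementation; the merge radius is an assumption):

```python
import numpy as np

def cluster_corners(candidates, radius=0.5):
    """Greedily merge candidate corner points lying within `radius` of a seed
    point into one cluster, returning each cluster's mean as an initial vertex."""
    candidates = np.asarray(candidates, dtype=float)
    used = np.zeros(len(candidates), dtype=bool)
    vertices = []
    for i in range(len(candidates)):
        if used[i]:
            continue
        d = np.linalg.norm(candidates - candidates[i], axis=1)
        members = (d < radius) & ~used     # unassigned neighbors of the seed
        used |= members
        vertices.append(candidates[members].mean(axis=0))
    return np.array(vertices)
```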
11. A defocus and similarity attention-based cascaded network for multi-focus and misaligned image fusion.
- Author
Chen, Peiming, Jiang, Jiaqin, Li, Li, and Yao, Jian
- Subjects
IMAGE fusion, OPTICAL flow, IMAGE stabilization, CONVOLUTIONAL neural networks, DEEP learning
- Abstract
Multi-focus image fusion uses multiple images focused at different depths to generate a clear image covering the whole scene. Existing multi-focus image fusion methods do not consider the successive variation of defocus in close-range photography or camera shake during sequence shooting, and most methods cannot process multiple images simultaneously. We propose an end-to-end deep learning network that generates an all-in-focus image from multi-focus and misaligned images to solve these problems. Specifically, taking multiple multi-focus and misaligned source images as input, our Defocus and Similarity Attention Fusion Network (DSAF-Net) first generates the corresponding defocus maps through the Defocus-Net, and warps the source images and defocus maps to a unified camera view according to the optical flow estimated by the OpticalFlow-Net. Finally, our model applies a coarse-to-fine correspondence matching scheme to obtain similarity weight maps, which are combined with the warped source images and defocus maps to fuse a clear image. For training and testing DSAF-Net, a multi-focus and misaligned cultural heritage photography dataset (WHU-MFM) is constructed using the Blender Cycles renderer. Experimental results demonstrate that our method outperforms state-of-the-art methods both qualitatively and quantitatively. DSAF-Net is available at https://github.com/PeimingCHEN/DSAF-Net, and WHU-MFM is available at https://github.com/PeimingCHEN/WHU-MFM-Dataset. • Proposed an end-to-end neural network based on defocus and similarity attention. • It can fuse several multi-focus and misaligned images simultaneously into a clear image. • Designed an attentional aggregation module for reducing pixel collisions and holes. • Built a first-of-its-kind multi-focus and misaligned close-range photography dataset. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
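The final fusion step, combining warped source images with per-pixel weight maps, reduces to a normalized weighted sum. A minimal sketch of that general idea (not the DSAF-Net implementation; single-channel images and non-negative weights are assumptions):

```python
import numpy as np

def fuse(images, weight_maps):
    """Blend N warped source images (N, H, W) with per-pixel weight maps of the
    same shape, normalizing the weights so they sum to 1 at every pixel."""
    images = np.asarray(images, dtype=float)
    w = np.asarray(weight_maps, dtype=float)
    w = w / (w.sum(axis=0, keepdims=True) + 1e-12)   # normalize across sources
    return (w * images).sum(axis=0)                  # per-pixel weighted sum
```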