925 results for "lightweight network"
Search Results
2. LGCANet: lightweight hand pose estimation network based on HRNet.
- Author
- Pan, Xiaoying, Li, Shoukun, Wang, Hao, Wang, Beibei, and Wang, Haoyi
- Subjects
- *FEATURE extraction, *DEEP learning, *COMPUTER vision, *APPLICATION software, *VIRTUAL reality, *COMPUTATIONAL complexity, *AUTONOMOUS vehicles
- Abstract
Hand pose estimation is a fundamental task in computer vision with applications in virtual reality, gesture recognition, autonomous driving, and virtual surgery. Keypoint detection often relies on deep learning methods and high-resolution feature map representations to achieve accurate detection. The HRNet framework serves as the basis, but it presents challenges in terms of extensive parameter count and demanding computational complexity due to high-resolution representations. To mitigate these challenges, we propose a lightweight keypoint detection network called LGCANet (Lightweight Ghost-Coordinate Attention Network). This network primarily consists of a lightweight feature extraction head for initial feature extraction and multiple lightweight foundational network modules called GCAblocks. GCAblocks introduce linear transformations to generate redundant feature maps while concurrently considering inter-channel relationships and long-range positional information using a coordinate attention mechanism. Validation on the RHD dataset and the COCO-WholeBody-Hand dataset shows that LGCANet reduces the number of parameters by 65.9% and GFLOPs by 72.6% while preserving accuracy and improving detection speed. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
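The GCAblocks above follow the Ghost-module recipe: a full convolution produces only part of the output channels, and cheap linear transforms (typically depthwise convolutions) generate the rest. A back-of-envelope parameter count, with hypothetical layer sizes, shows where such savings come from:

```python
def conv_params(c_in, c_out, k):
    """Weight count of a standard k x k convolution (no bias)."""
    return c_in * c_out * k * k

def ghost_params(c_in, c_out, k, dw_k=3, ratio=2):
    """Ghost module: a primary conv makes c_out/ratio 'intrinsic' maps,
    then cheap depthwise convs generate the remaining maps."""
    intrinsic = c_out // ratio
    primary = conv_params(c_in, intrinsic, k)
    cheap = intrinsic * dw_k * dw_k * (ratio - 1)  # depthwise: one filter per map
    return primary + cheap

std = conv_params(64, 128, 3)     # 73,728 weights
ghost = ghost_params(64, 128, 3)  # 37,440 weights
print(f"standard: {std}, ghost: {ghost}, saving: {1 - ghost / std:.1%}")
```

The roughly 49% saving at this single hypothetical layer illustrates how stacking such blocks yields the network-level reductions the abstract reports.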
3. A Highly Efficient and Lightweight Detection Method for Steel Surface Defect.
- Author
- Xu, Changyu, Li, Jie, and Li, Xianguo
- Subjects
- *SURFACE defects, *LIGHTWEIGHT steel, *STEEL, *FEATURE extraction
- Abstract
The detection of steel surface defects is of great significance to steel production. To better meet the requirements of accuracy, real-time performance, and model lightness, this paper proposes a highly efficient and lightweight steel surface defect detection method based on YOLOv5n. Firstly, ODMobileNetV2, composed of MobileNetV2 and ODConv, is used as the backbone to improve the defect feature extraction capability. Secondly, GSConv is utilized in the neck to achieve deep information fusion through channel concatenation and shuffling, enhancing the ability of feature fusion. Finally, this paper proposes a spatial-channel reconstruction block (SCRB) designed to suppress redundant features and improve the representation ability of defect features through feature separation and reconstruction. Experimental results show that this method achieves 84.1% mAP and 109 FPS on the NEU-DET dataset, and 72.9% mAP and 110.1 FPS on the GC10-DET dataset, enabling accurate and efficient detection. Furthermore, the number of parameters is only 5.04M, giving the model a significant lightweight advantage. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
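GSConv's "channel concatenation and shuffling" mixes the outputs of its standard and depthwise branches; the shuffle itself is a grouped transpose over the channel index. A minimal sketch on a plain list (the branch labels and group count are illustrative, not from the paper):

```python
def channel_shuffle(channels, groups):
    """Interleave channels across groups: view as (groups, n), then transpose."""
    n = len(channels) // groups
    assert n * groups == len(channels), "channel count must divide evenly"
    return [channels[g * n + i] for i in range(n) for g in range(groups)]

# 6 channels: first 3 from the standard-conv branch, last 3 from depthwise
print(channel_shuffle(["s0", "s1", "s2", "d0", "d1", "d2"], groups=2))
# -> ['s0', 'd0', 's1', 'd1', 's2', 'd2']
```

After the shuffle, every pair of adjacent channels mixes information from both branches, which is what makes the cheap depthwise path useful downstream.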
4. Fast-Activated Minimal Gated Unit: Lightweight Processing and Feature Recognition for Multiple Mechanical Impact Signals.
- Author
- Wang, Wenrui, Han, Dong, Duan, Xinyi, Yong, Yaxin, Wu, Zhengqing, Ma, Xiang, Zhang, He, and Dai, Keren
- Subjects
- *MULTIBODY systems, *WAVELET transforms, *DYNAMICAL systems, *COMPUTATIONAL complexity, *SIGNALS & signaling
- Abstract
Multiple dynamic impact signals are widely used in a variety of engineering scenarios and are difficult to identify accurately and quickly due to the signal adhesion phenomenon caused by nonlinear interference. To address this problem, an intelligent algorithm combining wavelet transforms with lightweight neural networks is proposed. First, the features of multiple impact signals are analyzed by establishing a transfer model for multiple impacts in multibody dynamical systems, and interference is suppressed using wavelet transformation. Second, a lightweight neural network, i.e., the fast-activated minimal gated unit (FMGU), is designed for multiple impact signals, which can reduce computational complexity and improve real-time performance. Third, the experimental results show that the proposed method maintains excellent feature recognition results compared to gated recurrent unit (GRU) and long short-term memory (LSTM) networks under all test datasets with varying impact speeds, while its metrics for computational complexity are 50% lower than those of the GRU and LSTM. Therefore, the proposed method is of great practical value for weak hardware application platforms that require the accurate identification of multiple dynamic impact signals in real time. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
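A minimal gated unit keeps a single forget gate where the GRU has two gates and the LSTM three, which is where most of the complexity reduction comes from. A scalar-state sketch of the plain MGU recurrence (the weights are illustrative, and the paper's fast-activation variant is not reproduced):

```python
import math

def mgu_step(h_prev, x, wf, uf, bf, wh, uh, bh):
    """One minimal-gated-unit step with scalar state:
    f      = sigmoid(wf*x + uf*h_prev + bf)      # single forget gate
    h_cand = tanh(wh*x + uh*(f*h_prev) + bh)     # candidate state
    h      = (1 - f)*h_prev + f*h_cand
    """
    f = 1.0 / (1.0 + math.exp(-(wf * x + uf * h_prev + bf)))
    h_cand = math.tanh(wh * x + uh * (f * h_prev) + bh)
    return (1.0 - f) * h_prev + f * h_cand

h = 0.0
for x in [0.5, -0.2, 1.0]:  # toy impact-signal samples
    h = mgu_step(h, x, 1.0, 0.5, 0.0, 1.0, 0.5, 0.0)
print(round(h, 4))
```

Because the tanh bounds the candidate and the gate interpolates convexly, the state stays in (-1, 1); the single gate roughly halves the per-step weight count relative to a GRU of the same width.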
5. LTSCD-YOLO: A Lightweight Algorithm for Detecting Typical Satellite Components Based on Improved YOLOv8.
- Author
- Tang, Zixuan, Zhang, Wei, Li, Junlin, Liu, Ran, Xu, Yansong, Chen, Siyu, Fang, Zhiyue, and Zhao, Fuchenglong
- Subjects
- *SPACE environment, *EXTRATERRESTRIAL resources, *ALGORITHMS, *GENERALIZATION, *NECK
- Abstract
Typical satellite component detection is an application-valuable and challenging research field. Currently, there are many algorithms for detecting typical satellite components, but due to the limited storage space and computational resources in the space environment, these algorithms generally have the problem of excessive parameter count and computational load, which hinders their effective application in space environments. Furthermore, the scale of datasets used by these algorithms is not large enough to train the algorithm models well. To address the above issues, this paper first applies YOLOv8 to the detection of typical satellite components and proposes a Lightweight Typical Satellite Components Detection algorithm based on improved YOLOv8 (LTSCD-YOLO). Firstly, it adopts the lightweight network EfficientNet-B0 as the backbone network to reduce the model's parameter count and computational load; secondly, it uses a Cross-Scale Feature-Fusion Module (CCFM) at the Neck to enhance the model's adaptability to scale changes; then, it integrates Partial Convolution (PConv) into the C2f (Faster Implementation of CSP Bottleneck with two convolutions) module and Re-parameterized Convolution (RepConv) into the detection head to further achieve model lightweighting; finally, the Focal-Efficient Intersection over Union (Focal-EIoU) is used as the loss function to enhance the model's detection accuracy and detection speed. Additionally, a larger-scale Typical Satellite Components Dataset (TSC-Dataset) is also constructed. Our experimental results show that LTSCD-YOLO can maintain high detection accuracy with minimal parameter count and computational load. Compared to YOLOv8s, LTSCD-YOLO improved the mean average precision (mAP50) by 1.50% on the TSC-Dataset, reaching 94.5%. Meanwhile, the model's parameter count decreased by 78.46%, the computational load decreased by 65.97%, and the detection speed increased by 17.66%. This algorithm achieves a balance between accuracy and lightness, and its generalization ability has been validated on real images, making it effectively applicable to detection tasks of typical satellite components in space environments. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
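The EIoU term underlying Focal-EIoU extends the IoU loss with penalties for center distance and for width/height mismatch, each normalized by the smallest enclosing box. A sketch of plain EIoU for axis-aligned (x1, y1, x2, y2) boxes, as an illustration of the loss family rather than the paper's exact formulation:

```python
def eiou_loss(a, b):
    """EIoU = 1 - IoU + d^2/(cw^2 + ch^2) + dw^2/cw^2 + dh^2/ch^2."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    iou = inter / (area_a + area_b - inter)
    # smallest enclosing box
    cw = max(a[2], b[2]) - min(a[0], b[0])
    ch = max(a[3], b[3]) - min(a[1], b[1])
    # squared center distance, normalized by squared enclosing diagonal
    d2 = ((a[0] + a[2]) / 2 - (b[0] + b[2]) / 2) ** 2 \
       + ((a[1] + a[3]) / 2 - (b[1] + b[3]) / 2) ** 2
    wa, ha = a[2] - a[0], a[3] - a[1]
    wb, hb = b[2] - b[0], b[3] - b[1]
    return (1 - iou) + d2 / (cw**2 + ch**2) \
         + (wa - wb) ** 2 / cw**2 + (ha - hb) ** 2 / ch**2

print(eiou_loss((0, 0, 2, 2), (0, 0, 2, 2)))              # identical boxes -> 0.0
print(round(eiou_loss((0, 0, 2, 2), (1, 1, 3, 3)), 3))    # shifted box -> 0.968
```

The focal weighting then scales this loss by a power of the IoU so that well-overlapping boxes dominate the gradient; that reweighting is omitted here.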
6. YOLOv5s-BiPCNeXt, a Lightweight Model for Detecting Disease in Eggplant Leaves.
- Author
- Xie, Zhedong, Li, Chao, Yang, Zhuang, Zhang, Zhen, Jiang, Jiazhuo, and Guo, Hongyu
- Subjects
- INTERPOLATION algorithms, PLANT diseases, PLANT identification, COMPUTATIONAL complexity, DISEASE mapping, EGGPLANT
- Abstract
Ensuring the healthy growth of eggplants requires the precise detection of leaf diseases, which can significantly boost yield and economic income. Improving the efficiency of plant disease identification in natural scenes is currently a crucial issue. This study aims to provide an efficient detection method suitable for disease detection in natural scenes. A lightweight detection model, YOLOv5s-BiPCNeXt, is proposed. This model utilizes the MobileNeXt backbone to reduce network parameters and computational complexity and includes a lightweight C3-BiPC neck module. Additionally, a multi-scale cross-spatial attention mechanism (EMA) is integrated into the neck network, and the nearest neighbor interpolation algorithm is replaced with the content-aware feature recombination operator (CARAFE), enhancing the model's ability to perceive multidimensional information and extract multiscale disease features and improving the spatial resolution of the disease feature map. These improvements enhance the detection accuracy for eggplant leaves, effectively reducing missed and incorrect detections caused by complex backgrounds and improving the detection and localization of small lesions at the early stages of brown spot and powdery mildew diseases. Experimental results show that the YOLOv5s-BiPCNeXt model achieves an average precision (AP) of 94.9% for brown spot disease, 95.0% for powdery mildew, and 99.5% for healthy leaves. Deployed on a Jetson Orin Nano edge detection device, the model attains an average recognition speed of 26 FPS (Frames Per Second), meeting real-time requirements. Compared to other algorithms, YOLOv5s-BiPCNeXt demonstrates superior overall performance, accurately detecting plant diseases under natural conditions and offering valuable technical support for the prevention and treatment of eggplant leaf diseases. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
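For context on the CARAFE substitution above: the nearest-neighbor interpolation it replaces is pure index replication, sketched below on a nested list. CARAFE itself predicts content-aware reassembly kernels per output location and is omitted here.

```python
def nearest_upsample(img, scale):
    """Nearest-neighbor upsampling of a 2-D grid by an integer scale factor."""
    return [[img[r // scale][c // scale]
             for c in range(len(img[0]) * scale)]
            for r in range(len(img) * scale)]

out = nearest_upsample([[1, 2], [3, 4]], 2)
print(out)  # [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]
```

Because every output pixel just copies its nearest source pixel, this operator ignores image content entirely, which is exactly the weakness a learned reassembly operator targets.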
7. RS-Xception: A Lightweight Network for Facial Expression Recognition.
- Author
- Liao, Liefa, Wu, Shouluan, Song, Chao, and Fu, Jianglong
- Subjects
- EMOTION recognition, FACIAL expression, ARTIFICIAL intelligence, SENTIMENT analysis, ACTING education
- Abstract
Facial expression recognition (FER) utilizes artificial intelligence for the detection and analysis of human faces, with significant applications across various scenarios. Our objective is to deploy the facial emotion recognition network on mobile devices and extend its application to diverse areas, including classroom effect monitoring, human–computer interaction, specialized training for athletes (such as in figure skating and rhythmic gymnastics), and actor emotion training. Recent studies have employed advanced deep learning models to address this task, though these models often encounter challenges like subpar performance and an excessive number of parameters that do not align with the requirements of FER for embedded devices. To tackle this issue, we have devised a lightweight network structure named RS-Xception, which is straightforward yet highly effective. Drawing on the strengths of ResNet and SENet, this network integrates elements from the Xception architecture. Our models have been trained on the FER2013 dataset and demonstrate superior efficiency compared to conventional network models. Furthermore, we have assessed the model's performance on the CK+, FER2013, and Bigfer2013 datasets, achieving accuracy rates of 97.13%, 69.02%, and 72.06%, respectively. Evaluation on the complex RAF-DB dataset yielded an accuracy rate of 82.98%. The incorporation of transfer learning notably enhanced the model's accuracy, with a performance of 75.38% on the Bigfer2013 dataset, underscoring its significance in our research. In conclusion, our proposed model proves to be a viable solution for precise sentiment detection and estimation. In the future, our lightweight model may be deployed on embedded devices for research purposes. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
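The SENet ingredient in RS-Xception rescales channels by gates computed from their global averages. A minimal sketch in which the two fully connected layers of a real SE block are collapsed into an identity mapping before the sigmoid (a simplification for illustration, not the paper's design):

```python
import math

def se_reweight(channels):
    """Squeeze-and-excitation sketch: squeeze each channel to its mean,
    gate it through a sigmoid (the FC bottleneck of a real SE block is
    collapsed to an identity here), then rescale the channel."""
    gates = [1.0 / (1.0 + math.exp(-sum(ch) / len(ch))) for ch in channels]
    return [[v * g for v in ch] for ch, g in zip(channels, gates)], gates

# one strongly activated channel, one weakly activated channel
scaled, gates = se_reweight([[2.0, 2.0], [-2.0, -2.0]])
print([round(g, 3) for g in gates])  # strong channel kept, weak one suppressed
```

The effect is a learned, per-channel attention: informative channels pass nearly unchanged while uninformative ones are damped.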
8. Lightweight Road Damage Detection Network Based on YOLOv5.
- Author
- Jingwei Zhao, Ye Tao, Zhixian Zhang, Chao Huang, and Wenhua Cui
- Abstract
The field of computer vision has experienced rapid progress owing to deep learning. The importance of road damage detection in ensuring traffic safety and reducing road maintenance costs is becoming increasingly evident. For detecting road damage, the YOLOv5 algorithm provides a reliable and effective method. However, YOLOv5 still requires a significant amount of computation. This paper proposes a lightweight network for detecting road damage that improves upon the YOLOv5 model in four ways. The algorithm accurately identifies and classifies different types of road damage, while simultaneously reducing the number of parameters and required computations. First, lightweight processing of the model is achieved. The Ghost module and Ghost Bottleneck are employed to construct the novel GBS module and C3Ghost, which replace the existing CBS and C3 modules. Second, the CIoU loss function is transformed into SIoU to improve the precision of target box regression. Furthermore, the original upsampling module is replaced by CARAFE to improve the model's semantic adaptability and receptive field. Finally, the CBAM attention mechanism is employed to concentrate on crucial feature information. The experimental findings show that, in comparison to the baseline model, the upgraded model has 41.8% fewer parameters. Additionally, there has been a 43.8% reduction in floating-point computation and an improvement of 0.2% in detection accuracy. [ABSTRACT FROM AUTHOR]
- Published
- 2024
9. Image super‐resolution via dynamic network.
- Author
- Tian, Chunwei, Zhang, Xuanyu, Zhang, Qi, Yang, Mingming, and Ju, Zhaojie
- Subjects
- HIGH resolution imaging, CONVOLUTIONAL neural networks, DIGITAL technology
- Abstract
Convolutional neural networks depend on deep network architectures to extract accurate information for image super‐resolution. However, the information obtained by these convolutional neural networks cannot completely express predicted high‐quality images for complex scenes. A dynamic network for image super‐resolution (DSRNet) is presented, which contains a residual enhancement block, wide enhancement block, feature refinement block and construction block. The residual enhancement block is composed of a residual enhanced architecture to facilitate hierarchical features for image super‐resolution. To enhance robustness of the obtained super‐resolution model for complex scenes, a wide enhancement block achieves a dynamic architecture to learn more robust information to enhance applicability of an obtained super‐resolution model for varying scenes. To prevent interference of components in a wide enhancement block, a refinement block utilises a stacked architecture to accurately learn obtained features. Also, a residual learning operation is embedded in the refinement block to mitigate the long‐term dependency problem. Finally, a construction block is responsible for reconstructing high‐quality images. The designed heterogeneous architecture can not only facilitate richer structural information, but is also lightweight, which is suitable for mobile digital devices. Experimental results show that our method is more competitive in terms of performance, recovery time and complexity for image super‐resolution. The code of DSRNet can be obtained at https://github.com/hellloxiaotian/DSRNet. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
10. A Lightweight YOLOv8 Model for Apple Leaf Disease Detection.
- Author
- Gao, Lijun, Zhao, Xing, Yue, Xishen, Yue, Yawei, Wang, Xiaoqiang, Wu, Huanhuan, and Zhang, Xuedong
- Subjects
- MOBILE apps, APPLE growing, PLANT diseases, COMPUTATIONAL complexity, ALGORITHMS
- Abstract
China holds the top position globally in apple production and consumption. Detecting diseases during the planting process is crucial for increasing yields and promoting the rapid development of the apple industry. This study proposes a lightweight algorithm for apple leaf disease detection in natural environments, which is conducive to application on mobile and embedded devices. Our approach modifies the YOLOv8n framework to improve accuracy and efficiency. Key improvements include replacing conventional Conv layers with GhostConv and parts of the C2f structure with C3Ghost, reducing the model's parameter count, and enhancing performance. Additionally, we integrate a Global attention mechanism (GAM) to improve lesion detection by more accurately identifying affected areas. An improved Bi-Directional Feature Pyramid Network (BiFPN) is also incorporated for better feature fusion, enabling more effective detection of small lesions in complex environments. Experimental results show a 32.9% reduction in computational complexity and a 39.7% reduction in model size to 3.8 M, with performance metrics improving by 3.4% to a mAP@0.5 of 86.9%. Comparisons with popular models like YOLOv7-Tiny, YOLOv6, YOLOv5s, and YOLOv3-Tiny demonstrate that our YOLOv8n–GGi model offers superior detection accuracy, the smallest size, and the best overall performance for identifying critical apple diseases. It can serve as a guide for implementing real-time crop disease detection on mobile and embedded devices. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
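The BiFPN fusion mentioned above combines same-scale inputs with learnable non-negative weights normalized by their sum ("fast normalized fusion"). A sketch with illustrative scalar features and weights:

```python
def fast_normalized_fusion(features, weights, eps=1e-4):
    """BiFPN fast fusion: out = sum_i(w_i * f_i) / (sum_j(w_j) + eps),
    with weights clamped non-negative (ReLU) before normalizing."""
    w = [max(0.0, wi) for wi in weights]
    denom = sum(w) + eps
    return sum(wi * fi for wi, fi in zip(w, features)) / denom

# two feature inputs at the same scale, the first trusted twice as much
print(round(fast_normalized_fusion([1.0, 4.0], [2.0, 1.0]), 3))  # ~2.0
```

The epsilon keeps the division stable when all weights are driven to zero, and the ReLU-plus-normalization avoids the cost of a softmax while keeping each output a convex-like combination of its inputs.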
11. A Lightweight CER-YOLOv5s Algorithm for Detection of Construction Vehicles at Power Transmission Lines.
- Author
- Yu, Pingping, Yan, Yuting, Tang, Xinliang, Shang, Yan, and Su, He
- Subjects
- ELECTRIC lines, FEATURE extraction, PYRAMIDS, ALGORITHMS
- Abstract
In the context of power-line scenarios characterized by complex backgrounds and diverse scales and shapes of targets, and addressing issues such as large model parameter sizes, insufficient feature extraction, and the susceptibility to missing small targets in engineering-vehicle detection tasks, a lightweight detection algorithm termed CER-YOLOv5s is proposed. Firstly, the C3 module is restructured by embedding a lightweight Ghost bottleneck structure and a convolutional attention module, enhancing the model's ability to extract key features while reducing computational costs. Secondly, an E-BiFPN feature pyramid network is proposed, utilizing channel attention mechanisms to effectively suppress background noise and enhance the model's focus on important regions. Bidirectional connections are introduced to optimize the feature fusion paths, improving the efficiency of multi-scale feature fusion. At the same time, in the feature fusion part, an ERM (enhanced receptive module) is added to expand the receptive field of shallow feature maps through multiple convolution repetitions, enhancing the global information perception capability in relation to small targets. Lastly, a Soft-DIoU-NMS suppression algorithm is proposed to improve the candidate box selection mechanism, addressing the issue of suboptimal detection of occluded targets. The experimental results indicated that compared with the baseline YOLOv5s algorithm, the improved algorithm reduced parameters and computations by 27.8% and 31.9%, respectively. The mean average precision (mAP) increased by 2.9%, reaching 98.3%. This improvement surpasses recent mainstream algorithms and suggests stronger robustness across various scenarios. The algorithm meets the lightweight requirements for embedded devices in power-line scenarios. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
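Soft-NMS variants such as the Soft-DIoU-NMS above decay the scores of overlapping candidates instead of deleting them, which is what preserves occluded targets. A sketch of Gaussian-decay Soft-NMS using plain IoU as the overlap term (the paper's DIoU term is not reproduced):

```python
import math

def iou(a, b):
    """IoU of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2]-a[0])*(a[3]-a[1]) + (b[2]-b[0])*(b[3]-b[1]) - inter
    return inter / union if union else 0.0

def soft_nms(boxes, scores, sigma=0.5, thresh=0.001):
    """Gaussian Soft-NMS: decay each remaining score by exp(-iou^2 / sigma)."""
    scores = list(scores)
    remaining = list(range(len(boxes)))
    keep = []
    while remaining:
        i = max(remaining, key=lambda k: scores[k])  # highest current score
        remaining.remove(i)
        if scores[i] < thresh:
            break
        keep.append(i)
        for j in remaining:
            scores[j] *= math.exp(-iou(boxes[i], boxes[j]) ** 2 / sigma)
    return keep, scores

boxes = [(0, 0, 2, 2), (0.1, 0, 2.1, 2), (5, 5, 7, 7)]
keep, dec = soft_nms(boxes, [0.9, 0.8, 0.7])
print(keep)  # the heavily overlapping box is down-weighted, not discarded
```

With hard NMS the second box would simply be suppressed; here it survives with a reduced score, so a partially occluded object behind the top detection can still be reported.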
12. An Efficient and Accurate UAV Detection Method Based on YOLOv5s.
- Author
- Feng, Yunsong, Wang, Tong, Jiang, Qiangfu, Zhang, Chi, Sun, Shaohang, and Qian, Wangjiahe
- Subjects
- FEATURE extraction, DRONE aircraft, ALGORITHMS, NECK
- Abstract
Due to the limited computational resources of portable devices, target detection models for drone detection face challenges in real-time deployment. To enhance the detection efficiency of low, slow, and small unmanned aerial vehicles (UAVs), this study introduces an efficient drone detection model based on YOLOv5s (EDU-YOLO), incorporating lightweight feature extraction and balanced feature fusion modules. The model employs the ShuffleNetV2 network and coordinate attention mechanisms to construct a lightweight backbone network, significantly reducing the number of model parameters. It also utilizes a bidirectional feature pyramid network and ghost convolutions to build a balanced neck network, enriching the model's representational capacity. Additionally, a new loss function, EIoU, replaces CIoU to improve the model's positioning accuracy and accelerate network convergence. Experimental results indicate that, compared to the YOLOv5s algorithm, our model experiences only a minimal mAP decrease of 1.1%, while reducing GFLOPs from 16.0 to 2.2 and increasing FPS from 153 to 188. This provides a substantial foundation for networked optoelectronic detection of UAVs and similar slow-moving aerial targets, expanding the defensive perimeter and enabling earlier warnings. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
13. TinyCount: an efficient crowd counting network for intelligent surveillance.
- Author
- Lee, Hyeonbeen and Lee, Jangho
- Abstract
Crowd counting, the task of estimating the total number of people in an image, is essential for intelligent surveillance. Integrating a well-trained crowd counting network into edge devices, such as intelligent CCTV systems, enables its application across various domains, including the prevention of crowd collapses and urban planning. For a model to be embedded in edge devices, it requires robust performance, reduced parameter count, and faster response times. This study proposes a lightweight and powerful model called TinyCount, which has only 60k parameters. The proposed TinyCount is a fully convolutional network consisting of a feature extraction module (FEM) for robust and rapid feature extraction, a scale perception module (SPM) for scale variation perception and an upsampling module (UM) that adjusts the feature map to the same size as the original image. TinyCount demonstrated competitive performance across three representative crowd counting datasets, despite utilizing approximately 3.33 to 271 times fewer parameters than other crowd counting approaches. The proposed model achieved relatively fast inference times by leveraging the MobileNetV2 architecture with dilated and transposed convolutions. The application of an SE block and findings from existing studies further proved its effectiveness. Finally, we evaluated the proposed TinyCount on multiple edge devices, including the Raspberry Pi 4, NVIDIA Jetson Nano, and NVIDIA Jetson AGX Xavier, to demonstrate its potential for practical applications. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
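TinyCount's dilated convolutions enlarge the receptive field without adding parameters: a kernel of size k with dilation d spans d*(k-1)+1 input positions. A 1-D pure-Python sketch (correlation form, "valid" padding):

```python
def dilated_conv1d(signal, kernel, dilation):
    """'Valid' 1-D convolution (correlation form) with dilated taps."""
    span = dilation * (len(kernel) - 1) + 1  # effective receptive field
    return [sum(kernel[t] * signal[i + t * dilation] for t in range(len(kernel)))
            for i in range(len(signal) - span + 1)]

sig = [1, 2, 3, 4, 5, 6]
print(dilated_conv1d(sig, [1, 1, 1], dilation=1))  # [6, 9, 12, 15]
print(dilated_conv1d(sig, [1, 1, 1], dilation=2))  # [9, 12]
```

Both calls use the same three weights, but the dilated version aggregates over a span of five samples instead of three, which is exactly the parameter-free receptive-field growth the abstract relies on.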
14. Lightweight improved residual network for efficient inverse tone mapping.
- Author
- Xue, Liqi, Xu, Tianyi, Song, Yongbao, Liu, Yan, Zhang, Lei, Zhen, Xiantong, and Xu, Jun
- Subjects
- IMAGE reconstruction, IMAGE reconstruction algorithms, HIGH dynamic range imaging, EVERYDAY life
- Abstract
Display devices like HDR10 televisions are increasingly prevalent in our daily life for visualizing high dynamic range (HDR) images, but the majority of media images on the internet remain in 8-bit standard dynamic range (SDR) format. Therefore, converting SDR images to HDR ones by inverse tone mapping (ITM) is crucial to unlock the full potential of abundant media images. However, existing ITM methods are usually developed with complex network architectures requiring huge computational costs. In this paper, we propose a lightweight Improved Residual Network (IRNet) by enhancing the power of the popular residual block for efficient ITM. Specifically, we propose a new Improved Residual Block (IRB) to extract and fuse multi-layer features for fine-grained HDR image reconstruction. Experiments on three benchmark datasets demonstrate that our IRNet achieves state-of-the-art performance on both the ITM and joint SR-ITM tasks. The code, models and data will be publicly available at https://github.com/ThisisVikki/ITM-baseline. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
15. LSRN-AED: lightweight super-resolution network based on asymmetric encoder–decoder.
- Author
- Huang, Shuying, Li, Wei, Yang, Yong, Wan, Weiguo, and Lai, Houzeng
- Subjects
- *ARTIFICIAL neural networks, *FEATURE extraction, *TRANSFORMER models, *HIGH resolution imaging
- Abstract
Due to limited memory and computing resources, the application of deep neural networks on embedded and mobile devices is still a great challenge. To tackle this problem, this paper proposes a lightweight super-resolution network based on asymmetric encoder–decoder (LSRN-AED), which achieves better performance while reducing model parameters and computation. On the basis of rethinking the roles of encoder and decoder, an asymmetric encoder–decoder (AED) composed of complex encoders and simple decoders is designed to achieve feature extraction and reconstruction. Here, the decoder only adopts one inverted residual block, which can reduce the computational cost of the model and the redundancy of mapping features. For the encoder, inspired by the Transformer structure, an epiphany encoder is designed to realize feature extraction and representation. In the encoder, a multi-way epiphany attention module (MEAM) is constructed, in which inverted residual blocks are used to replace traditional residual blocks to extract features and reduce model complexity. To realize the selection and enhancement of spatial features, an epiphany attention block (EAB) is designed by exploiting depth-wise convolutions which can learn the significant spatial information of the feature maps. Experimental results demonstrate that the proposed LSRN-AED can achieve better performance at lower parameter cost and outperform some existing state-of-the-art lightweight models. For example, compared to the advanced SMSR method, the proposed LSRN-AED has better evaluation metrics while reducing the number of parameters by 45%, 44%, and 44%, and FLOPs by 44%, 42%, and 41% on the ×2/×3/×4 SR tasks, respectively. The code will be published on GitHub after our paper is accepted for publication. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
16. Deep Learning-Based Dynamic Region of Interest Autofocus Method for Grayscale Image.
- Author
- Wang, Yao, Wu, Chuan, Gao, Yunlong, and Liu, Huiying
- Subjects
- *DEEP learning, *GRAYSCALE model, *COST effectiveness
- Abstract
In the field of autofocus for optical systems, although passive focusing methods are widely used due to their cost-effectiveness, fixed focusing windows and evaluation functions in certain scenarios can still lead to focusing failures. Additionally, the lack of datasets limits the extensive research of deep learning methods. In this work, we propose a neural network autofocus method with the capability of dynamically selecting the region of interest (ROI). Our main work is as follows: first, we construct a dataset for automatic focusing of grayscale images; second, we transform the autofocus issue into an ordinal regression problem and propose two focusing strategies: full-stack search and single-frame prediction; and third, we construct a MobileViT network with a linear self-attention mechanism to achieve automatic focusing on dynamic regions of interest. The effectiveness of the proposed focusing method is verified through experiments, and the results show that the focusing MAE of the full-stack search can be as low as 0.094, with a focusing time of 27.8 ms, and the focusing MAE of the single-frame prediction can be as low as 0.142, with a focusing time of 27.5 ms. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
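Casting autofocus as ordinal regression typically means predicting, for each focus level t, whether the true in-focus position lies beyond t. The target encoding and its decoding can be sketched with plain lists (this thresholded-probability scheme is one common formulation, not necessarily the paper's exact one):

```python
def ordinal_encode(label, num_levels):
    """Encode ordinal label k as binary targets [k > 0, k > 1, ...]."""
    return [1 if label > t else 0 for t in range(num_levels - 1)]

def ordinal_decode(probs, cut=0.5):
    """Predicted level = number of thresholds passed with probability > cut."""
    return sum(1 for p in probs if p > cut)

print(ordinal_encode(3, 6))                       # [1, 1, 1, 0, 0]
print(ordinal_decode([0.9, 0.8, 0.7, 0.3, 0.1]))  # 3
```

Unlike plain classification, a prediction one focus step away disagrees with the target on only one binary threshold, so the loss respects the ordering of focus positions.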
17. ELA-Net: An Efficient Lightweight Attention Network for Skin Lesion Segmentation.
- Author
- Nie, Tianyu, Zhao, Yishi, and Yao, Shihong
- Subjects
- *SKIN imaging, *FEATURE extraction, *IMAGE segmentation, *DATA mining, *MEDICAL equipment, *IMAGE processing
- Abstract
In clinical conditions limited by equipment, attaining lightweight skin lesion segmentation is pivotal as it facilitates the integration of the model into diverse medical devices, thereby enhancing operational efficiency. However, the lightweight design of the model may face accuracy degradation, especially when dealing with complex images such as skin lesion images with irregular regions, blurred boundaries, and oversized boundaries. To address these challenges, we propose an efficient lightweight attention network (ELANet) for the skin lesion segmentation task. In ELANet, two different attention mechanisms of the bilateral residual module (BRM) can achieve complementary information, which enhances the sensitivity to features in spatial and channel dimensions, respectively, and then multiple BRMs are stacked for efficient feature extraction of the input information. In addition, the network acquires global information and improves segmentation accuracy by putting feature maps of different scales through multi-scale attention fusion (MAF) operations. Finally, we evaluate the performance of ELANet on three publicly available datasets, ISIC2016, ISIC2017, and ISIC2018, and the experimental results show that our algorithm can achieve mIoU scores of 89.87%, 81.85%, and 82.87% on the three datasets with only 0.459 M parameters, which is an excellent balance between accuracy and lightness and is superior to many existing segmentation methods. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
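The mIoU figure reported for ELANet is the average of per-class IoU computed from pixel counts. A sketch from per-class true-positive/false-positive/false-negative counts (the counts below are made up for illustration):

```python
def miou(confusion):
    """Mean IoU from per-class (tp, fp, fn) pixel counts:
    IoU_c = tp / (tp + fp + fn), averaged over classes that appear."""
    ious = [tp / (tp + fp + fn) for tp, fp, fn in confusion if tp + fp + fn > 0]
    return sum(ious) / len(ious)

# two classes: lesion and background
print(round(miou([(80, 10, 10), (900, 10, 10)]), 4))
```

Because each class contributes equally regardless of its pixel share, a small lesion class weighs as much as the large background class, which is why mIoU is stricter than plain pixel accuracy for segmentation.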
18. Enhanced Hybrid Vision Transformer with Multi-Scale Feature Integration and Patch Dropping for Facial Expression Recognition.
- Author
- Li, Nianfeng, Huang, Yongyuan, Wang, Zhenyan, Fan, Ziyao, Li, Xinyuan, and Xiao, Zhiguo
- Subjects
- *TRANSFORMER models, *FACIAL expression, *CONVOLUTIONAL neural networks, *FEATURE extraction
- Abstract
Convolutional neural networks (CNNs) have made significant progress in the field of facial expression recognition (FER). However, due to challenges such as occlusion, lighting variations, and changes in head pose, facial expression recognition in real-world environments remains highly challenging. At the same time, methods solely based on CNN heavily rely on local spatial features, lack global information, and struggle to balance the relationship between computational complexity and recognition accuracy. Consequently, the CNN-based models still fall short in their ability to address FER adequately. To address these issues, we propose a lightweight facial expression recognition method based on a hybrid vision transformer. This method captures multi-scale facial features through an improved attention module, achieving richer feature integration, enhancing the network's perception of key facial expression regions, and improving feature extraction capabilities. Additionally, to further enhance the model's performance, we have designed the patch dropping (PD) module. This module aims to emulate the attention allocation mechanism of the human visual system for local features, guiding the network to focus on the most discriminative features, reducing the influence of irrelevant features, and intuitively lowering computational costs. Extensive experiments demonstrate that our approach significantly outperforms other methods, achieving an accuracy of 86.51% on RAF-DB and nearly 70% on FER2013, with a model size of only 3.64 MB. These results demonstrate that our method provides a new perspective for the field of facial expression recognition. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
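The patch-dropping module's selection step reduces to keeping the top fraction of patches by attention score while preserving their order; a sketch with illustrative patch labels and scores:

```python
def drop_patches(patches, scores, keep_ratio):
    """Keep the top keep_ratio fraction of patches by score, preserving order."""
    k = max(1, int(len(patches) * keep_ratio))
    top = sorted(sorted(range(len(patches)), key=lambda i: -scores[i])[:k])
    return [patches[i] for i in top]

patches = ["p0", "p1", "p2", "p3"]
print(drop_patches(patches, [0.1, 0.9, 0.2, 0.8], keep_ratio=0.5))  # ['p1', 'p3']
```

Dropping low-score patches before the later transformer layers shrinks the token sequence, so self-attention cost falls roughly with the square of the kept fraction while the most discriminative regions are retained.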
19. Task-Sensitive Efficient Feature Extraction Network for Oriented Object Detection in Remote Sensing Images.
- Author
- Liu, Zhe, He, Guiqing, Dong, Liheng, Jing, Donglin, and Zhang, Haixi
- Subjects
- *OBJECT recognition (Computer vision), *REMOTE sensing, *FEATURE extraction, *CONVOLUTIONAL neural networks, *REMOTE-sensing images
- Abstract
The widespread application of convolutional neural networks (CNNs) has led to significant advancements in object detection. However, challenges remain in achieving efficient and precise extraction of critical features when applying typical CNN-based methods to remote sensing detection tasks: (1) The convolutional kernels sliding horizontally in the backbone are misaligned with the features of arbitrarily oriented objects. Additionally, the detector shares the features extracted from the backbone, but the classification task requires orientation-invariant features while the regression task requires orientation-sensitive features. The inconsistency in feature requirements makes it difficult for the detector to extract the critical features required for each task. (2) The use of deeper convolutional structures can improve the detection accuracy, but it also results in substantial convolutional computations and feature redundancy, leading to inefficient feature extraction. To address these issues, we propose a Task-Sensitive Efficient Feature Extraction Network (TFE-Net). Specifically, we propose a special mixed fast convolution module for constructing an efficient network architecture that employs cheap transform operations to replace some of the convolution operations, generating more features with fewer parameters and computation resources. Next, we introduce the task-sensitive detection module, which first aligns the convolutional features with the targets using adaptive dynamic convolution based on the orientation of the targets. The task-sensitive feature decoupling mechanism is further designed to extract orientation-sensitive features and orientation-invariant features from the aligned features and feed them into the regression and classification branches, respectively, which provide the critical features needed for different tasks, thus improving the detection performance comprehensively. In addition, in order to make the training process more stable, we propose a balanced loss function to balance the gradients generated by different samples. Extensive experiments demonstrate that our proposed TFE-Net can achieve superior performance and obtain an effective balance between detection speed and accuracy on DOTA, UCAS-AOD, and HRSC2016. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
20. Recognition of coal and gangue under low illumination based on SG-YOLO model.
- Author
-
Shang, Deyong, Yang, Zhiyuan, and Lv, Zhibin
- Subjects
- *
COAL , *RECOGNITION (Psychology) , *LIGHTING , *FEATURE extraction , *LEARNING ability - Abstract
For the low illumination and dust in the coal and gangue identification site environment, which lead to poor recognition, an improved lightweight low-illumination gangue recognition algorithm based on the YOLOv5s model is proposed: the SG-YOLO algorithm. The original backbone network is replaced by GhostNet, a lightweight network, to optimize the feature extraction structure, reduce the model parameters, and decrease the computational cost of the model; the SimAM attention mechanism module is introduced in the head part of the model to enhance the learning ability of coal and gangue features. Experiments show that compared with the YOLOv5s model, the improved model has a mAP of 97.0% on the gangue dataset, an improvement of 1.4%. The size of the model is compressed to 55% of the original. The number of parameters is reduced by 47.6%, and the computational effort is reduced by 49.4%. Meanwhile, the recognition accuracy of the improved SG-YOLO model for coal and gangue under low illumination is 96.5% and 98.5% respectively, which effectively improves the recognition accuracy of coal and gangue in low-illumination environments. [ABSTRACT FROM AUTHOR]
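The SimAM attention mechanism adopted above is parameter-free: each activation is re-weighted by how much it stands out from its channel statistics. A minimal sketch of the per-channel weighting (pure Python, not the authors' code; `lam` is SimAM's regularization constant):

```python
import math

def simam_channel(chan, lam=1e-4):
    """Parameter-free SimAM attention for one channel, given as a flat
    list of activations. Each activation is re-weighted by a sigmoid of
    its inverse 'energy', i.e. how much it deviates from the channel
    mean (the closed-form solution from the SimAM paper)."""
    n = len(chan) - 1
    mu = sum(chan) / len(chan)                     # channel mean
    d = [(v - mu) ** 2 for v in chan]              # squared deviations
    var = sum(d) / n                               # channel variance
    e_inv = [di / (4 * (var + lam)) + 0.5 for di in d]
    return [v / (1 + math.exp(-e)) for v, e in zip(chan, e_inv)]
```

Because the weight is a sigmoid of a non-negative energy, every activation is scaled by a factor in (0.62, 1), so salient (high-deviation) activations are suppressed least.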
- Published
- 2024
- Full Text
- View/download PDF
21. FireYOLO-Lite: Lightweight Forest Fire Detection Network with Wide-Field Multi-Scale Attention Mechanism.
- Author
-
Sheng, Sha, Liang, Zhengyin, Xu, Wenxing, Wang, Yong, and Su, Jiangdan
- Subjects
FEATURE extraction ,FOREST fires ,DEEP learning ,ALGORITHMS ,DETECTORS - Abstract
A lightweight forest fire detection model based on YOLOv8 is proposed in this paper in response to the problems existing in traditional sensors for forest fire detection. The performance of traditional sensors is easily constrained by hardware computing power, and their adaptability in different environments needs improvement. To balance the accuracy and speed of fire detection, the GhostNetV2 lightweight network is adopted to replace the backbone network for feature extraction of YOLOv8. The Ghost module is utilized to replace traditional convolution operations, conducting feature extraction independently in different dimensional channels, significantly reducing the complexity of the model while maintaining excellent performance. Additionally, an improved CPDCA channel priority attention mechanism is proposed, which extracts spatial features through dilated convolution, thereby reducing computational overhead and enabling the model to focus more on fire targets, achieving more accurate detection. In response to the problem of small targets in fire detection, the Inner IoU loss function is introduced. By adjusting the size of the auxiliary bounding boxes, this function effectively enhances the convergence effect of small target detection, further reducing missed detections, and improving overall detection accuracy. Experimental results indicate that, compared with traditional methods, the algorithm proposed in this paper significantly improves the average precision and FPS of fire detection while maintaining a smaller model size. Through experimental analysis, compared with YOLOv3-tiny, the average precision increased by 5.9% and the frame rate reached 285.3 FPS when the model size was only 4.9 M; compared with Shufflenet, the average precision increased by 2.9%, and the inference speed tripled. 
Additionally, the algorithm effectively addresses false positives, such as cloud and reflective light, further enhancing the detection of small targets and reducing missed detections. [ABSTRACT FROM AUTHOR]
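The Inner IoU loss mentioned above scores overlap on auxiliary boxes shrunk about each box's center, which sharpens gradients for small targets. A hedged sketch of that auxiliary-box computation (illustrative only; the paper's exact ratio schedule may differ):

```python
def inner_iou(box1, box2, ratio=0.75):
    """Inner-IoU sketch: shrink both boxes about their centres by
    `ratio`, then compute the IoU of the auxiliary (inner) boxes.
    Boxes are (x1, y1, x2, y2); a ratio < 1 yields larger relative
    overlap changes for small targets, aiding convergence."""
    def shrink(b):
        cx, cy = (b[0] + b[2]) / 2, (b[1] + b[3]) / 2
        hw, hh = (b[2] - b[0]) * ratio / 2, (b[3] - b[1]) * ratio / 2
        return (cx - hw, cy - hh, cx + hw, cy + hh)
    a, b = shrink(box1), shrink(box2)
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0
```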
- Published
- 2024
- Full Text
- View/download PDF
22. A Forest Fire Smoke Monitoring System Based on a Lightweight Neural Network for Edge Devices.
- Author
-
Huang, Jingwen, Yang, Huizhou, Liu, Yunfei, and Liu, Han
- Subjects
COMPUTER vision ,ARTIFICIAL intelligence ,FOREST fires ,RUNNING speed ,ARTIFICIAL vision ,DEEP learning - Abstract
Forest resources are one of the indispensable resources of the earth, which are the basis for the survival and development of human society. With the swift advancements in computer vision and artificial intelligence technology, the utilization of deep learning for smoke detection has achieved remarkable results. However, the existing deep learning models have poor performance in forest scenes and are difficult to deploy because of numerous parameters. Hence, we introduce an optimized forest fire smoke monitoring system for embedded edge devices based on a lightweight deep learning model. The model makes full use of the multi-scale variable attention mechanism of Transformer architecture to strengthen the ability of image feature extraction. Considering the needs of application scenarios, we propose an improved lightweight network model LCNet for feature extraction, which can reduce the parameters and enhance detecting ability. In order to improve running speed, a simple semi-supervised label knowledge distillation scheme is used to enhance the overall detection capability. Finally, we design and implement a forest fire smoke detection system on an embedded device, including the Jetson NX hardware platform, high-definition camera, and detection software system. The lightweight model is transplanted to the embedded edge device to achieve rapid forest fire smoke detection. Also, an asynchronous processing framework is designed to make the system highly available and robust. The improved model reduces three-fourths of the parameters and increases speed by 3.4 times with similar accuracy to the original model. This demonstrates that our system meets the precision demand and detects smoke in time. [ABSTRACT FROM AUTHOR]
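The knowledge-distillation step described above trains the small detector to match the larger model's softened output distribution. A minimal sketch of the standard soft-label distillation term (Hinton-style; the paper's semi-supervised variant adds pseudo-labels on unlabeled data, not shown here):

```python
import math

def softmax(logits, t=1.0):
    """Temperature-softened softmax; larger t flattens the distribution."""
    exps = [math.exp(z / t) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def distill_loss(student_logits, teacher_logits, t=4.0):
    """KL(teacher || student) on temperature-softened distributions,
    scaled by t**2 so gradient magnitudes stay comparable across
    temperatures."""
    p = softmax(teacher_logits, t)
    q = softmax(student_logits, t)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    return (t ** 2) * kl
```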
- Published
- 2024
- Full Text
- View/download PDF
23. Multi-Feature Fusion Recognition and Localization Method for Unmanned Harvesting of Aquatic Vegetables.
- Author
-
Guan, Xianping, Shi, Longyuan, Yang, Weiguang, Ge, Hongrui, Wei, Xinhua, and Ding, Yuhan
- Subjects
OBJECT recognition (Computer vision) ,RECOGNITION (Psychology) ,FEATURE extraction ,IMAGE recognition (Computer vision) ,DEEP learning - Abstract
The vision-based recognition and localization system plays a crucial role in the unmanned harvesting of aquatic vegetables. After field investigation, factors such as illumination, shading, and computational cost have become the main difficulties restricting the identification and positioning of Brasenia schreberi. Therefore, this paper proposes a new lightweight detection method, YOLO-GS, which integrates feature information from both RGB and depth images for recognition and localization tasks. YOLO-GS employs the Ghost convolution module as a replacement for traditional convolution and innovatively introduces the C3-GS, a cross-stage module, to effectively reduce parameters and computational costs. With the redesigned detection head structure, its feature extraction capability in complex environments has been significantly enhanced. Moreover, the model utilizes Focal EIoU as the regression loss function to mitigate the adverse effects of low-quality samples on gradients. We have developed a data set of Brasenia schreberi that covers various complex scenarios, comprising a total of 1500 images. The YOLO-GS model, trained on this dataset, achieves an average accuracy of 95.7%. The model size is 7.95 MB, with 3.75 M parameters and a 9.5 GFLOPS computational cost. Compared to the original YOLOv5s model, YOLO-GS improves recognition accuracy by 2.8%, reduces the model size and parameter number by 43.6% and 46.5%, and offers a 39.9% reduction in computational requirements. Furthermore, the positioning errors of picking points are less than 5.01 mm in the X direction, 3.65 mm in the Y direction, and 1.79 mm in the Z direction. As a result, YOLO-GS not only excels with high recognition accuracy but also exhibits low computational demands, enabling precise target identification and localization in complex environments so as to meet the requirements of real-time harvesting tasks. [ABSTRACT FROM AUTHOR]
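The Focal EIoU regression loss used above combines an EIoU term (IoU plus centre-distance and width/height penalties) with a focal factor that down-weights low-quality samples. A hedged single-pair sketch (illustrative; the published formulation should be consulted for exact constants):

```python
def focal_eiou(pred, gt, gamma=0.5, eps=1e-9):
    """Focal-EIoU sketch for axis-aligned boxes (x1, y1, x2, y2).
    EIoU = 1 - IoU + centre-distance term + width/height terms,
    each normalised by the smallest enclosing box; the focal factor
    IoU**gamma suppresses gradients from low-overlap samples."""
    px1, py1, px2, py2 = pred
    gx1, gy1, gx2, gy2 = gt
    inter = max(0.0, min(px2, gx2) - max(px1, gx1)) * \
            max(0.0, min(py2, gy2) - max(py1, gy1))
    union = (px2 - px1) * (py2 - py1) + (gx2 - gx1) * (gy2 - gy1) - inter
    iou = inter / (union + eps)
    cw = max(px2, gx2) - min(px1, gx1)          # enclosing box width
    ch = max(py2, gy2) - min(py1, gy1)          # enclosing box height
    d2 = ((px1 + px2) / 2 - (gx1 + gx2) / 2) ** 2 + \
         ((py1 + py2) / 2 - (gy1 + gy2) / 2) ** 2
    eiou = 1 - iou + d2 / (cw ** 2 + ch ** 2 + eps) \
        + ((px2 - px1) - (gx2 - gx1)) ** 2 / (cw ** 2 + eps) \
        + ((py2 - py1) - (gy2 - gy1)) ** 2 / (ch ** 2 + eps)
    return (iou ** gamma) * eiou
```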
- Published
- 2024
- Full Text
- View/download PDF
24. UAV forest fire detection based on lightweight YOLOv5 model.
- Author
-
Zhou, Mengdong, Wu, Lei, Liu, Shuai, and Li, Jianjun
- Subjects
FOREST fires ,FOREST fire prevention & control ,WILDFIRE prevention ,DRONE aircraft - Abstract
In recent years, the frequent occurrence of forest fires has caused serious impact on the environment and economy. Fire detection has become a hot research direction. Despite the remarkable achievements, the unmanned aerial vehicle (UAV) still has some problems such as insufficient precision and excessive parameters. In order to improve the application ability of UAV in forest fire prevention and control, a lightweight target detection model based on YOLOv5 is proposed. The model is based on the overall structure of YOLOv5, MobileNetV3 is used as the backbone network, and semi-supervised knowledge distillation (SSLD) is used for training to improve the convergence speed and accuracy of the model. The final model size was reduced by 94.1% from 107.6 MB to 6.3 MB. mAP0.5 increased by 0.8% and mAP0.95 increased by 2.6%. The improved lightweight YOLOv5 model has fewer parameters and less computation, which confirms that MobileNetV3 has an excellent effect on the compression of model memory, and the semi-supervised knowledge distillation method is beneficial to improve the accuracy of the model. In the future, the accuracy of the model and the detection rate of the covered flame should be further improved. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
25. A Lightweight Network with Dual Encoder and Cross Feature Fusion for Cement Pavement Crack Detection.
- Author
-
Zhong Qu, Guoqing Mu, and Bin Yuan
- Subjects
CRACKING of pavements ,DENTAL cements ,DEEP learning ,CONVOLUTIONAL neural networks ,FEATURE extraction ,RECOMMENDER systems ,INFORMATION filtering - Abstract
Automatic crack detection of cement pavement chiefly benefits from the rapid development of deep learning, with convolutional neural networks (CNN) playing an important role in this field. However, as the performance of crack detection in cement pavement improves, the depth and width of the network structure are significantly increased, which necessitates more computing power and storage space. This limitation hampers the practical implementation of crack detection models on various platforms, particularly portable devices like small mobile devices. To solve these problems, we propose a dual-encoder-based network architecture that focuses on extracting more comprehensive fracture feature information and combines cross-fusion modules and coordinated attention mechanisms for more efficient feature fusion. Firstly, we use small channel convolution to construct a shallow feature extraction module (SFEM) to extract low-level feature information of cracks in cement pavement images, in order to obtain more information about cracks in the shallow features of images. In addition, we construct a large kernel atrous convolution (LKAC) module to enhance crack information, which incorporates a coordinate attention mechanism for non-crack information filtering, and large kernel atrous convolutions with different kernels, using different receptive fields to extract more detailed edge and context information. Finally, the three-stage feature map outputs from the shallow feature extraction module are cross-fused with the two-stage feature map outputs from the large kernel atrous convolution module, and the shallow feature and detailed edge feature are fully fused to obtain the final crack prediction map. We evaluate our method on three public crack datasets: DeepCrack, CFD, and Crack500.
Experimental results on the DeepCrack dataset demonstrate the effectiveness of our proposed method compared to state-of-the-art crack detection methods, which achieves Precision (P) 87.2%, Recall (R) 87.7%, and F-score (F1) 87.4%. Thanks to our lightweight crack detection model, the parameter count of the model in real-world detection scenarios has been significantly reduced to less than 2M. This advancement also facilitates technical support for portable scene detection. [ABSTRACT FROM AUTHOR]
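The atrous (dilated) convolutions used above enlarge the receptive field without adding weights by spacing the kernel taps apart. A minimal 1-D sketch of the operation (valid padding, stride 1; not taken from the paper's code):

```python
def atrous_conv1d(x, kernel, dilation=2):
    """Atrous (dilated) convolution sketch: kernel taps are spaced
    `dilation` samples apart, so a k-tap kernel covers
    (k - 1) * dilation + 1 input samples."""
    k = len(kernel)
    span = (k - 1) * dilation + 1            # effective kernel extent
    out = []
    for i in range(len(x) - span + 1):
        out.append(sum(kernel[j] * x[i + j * dilation] for j in range(k)))
    return out
```

With dilation 1 this reduces to an ordinary convolution; increasing the dilation widens the receptive field at the same parameter count, which is exactly why LKAC-style modules use several dilation rates in parallel.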
- Published
- 2024
- Full Text
- View/download PDF
26. Deep Dynamic Weights for Underwater Image Restoration.
- Author
-
Awan, Hafiz Shakeel Ahmad and Mahmood, Muhammad Tariq
- Subjects
CONVOLUTIONAL neural networks ,STANDARD deviations ,IMAGE intensifiers ,ATTENUATION of light ,DEEP learning - Abstract
Underwater imaging presents unique challenges, notably color distortions and reduced contrast due to light attenuation and scattering. Most underwater image enhancement methods first use linear transformations for color compensation and then enhance the image. We observed that linear transformation for color compensation is not suitable for certain images. For such images, non-linear mapping is a better choice. This paper introduces a unique underwater image restoration approach leveraging a streamlined convolutional neural network (CNN) for dynamic weight learning for linear and non-linear mapping. In the first phase, a classifier is applied that classifies the input images as Type I or Type II. In the second phase, we use the Deep Line Model (DLM) for Type-I images and the Deep Curve Model (DCM) for Type-II images. For mapping an input image to an output image, the DLM creatively combines color compensation and contrast adjustment in a single step and uses deep lines for transformation, whereas the DCM employs higher-order curves. Both models utilize lightweight neural networks that learn per-pixel dynamic weights based on the input image's characteristics. Comprehensive evaluations on benchmark datasets using metrics like peak signal-to-noise ratio (PSNR) and root mean square error (RMSE) affirm our method's effectiveness in accurately restoring underwater images, outperforming existing techniques. [ABSTRACT FROM AUTHOR]
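The two mappings above can be sketched per pixel: a "deep line" folds colour compensation and contrast adjustment into one affine step, while the curve model applies a higher-order transformation. The exact functional forms below are illustrative assumptions (the quadratic iterate is a common enhancement curve, not necessarily the paper's DCM):

```python
def line_map(pixel, w, b):
    """Type-I (Deep Line Model) sketch: y = w * x + b per pixel, with
    slope w acting as contrast adjustment and offset b as colour
    compensation; w and b would be predicted per pixel by the CNN."""
    return min(1.0, max(0.0, w * pixel + b))

def curve_map(pixel, alpha, iterations=4):
    """Type-II (Deep Curve Model) sketch: an iterated quadratic curve
    x <- x + alpha * x * (1 - x), a typical higher-order mapping that
    preserves the [0, 1] range for alpha in [-1, 1]."""
    x = pixel
    for _ in range(iterations):
        x = x + alpha * x * (1 - x)
    return x
```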
- Published
- 2024
- Full Text
- View/download PDF
27. A lightweight algorithm for small traffic sign detection based on improved YOLOv5s.
- Author
-
Cai, Kunhui, Yang, Jingmin, Ren, Jinghui, and Zhang, Wenjie
- Abstract
With the rise of deep learning technology, significant progress has been made in object detection. Traffic sign detection is a research hotspot for object detection tasks. However, due to the small size of traffic signs, there is room for further improvement in the comprehensive performance of the existing technology. In this paper, we propose a lightweight network based on YOLOv5s to achieve real-time localization and classification of small traffic signs. First, we improve the bottleneck transformers with 3 convolution (Bot3) module to enhance the backbone network's ability to extract features from small targets, improving the accuracy while reducing the number of parameters and giga floating-point operations per second (GFLOPs). Second, we introduce ghost convolution (GhostConv) to obtain redundant feature maps with cheap operations to further improve the model's efficiency. Finally, we use soft non-maximum suppression (Soft-NMS) in the detection phase to improve the model accuracy again without additional computational overhead for training. According to the tests on the Tsinghua-Tencent 100K (TT100K) dataset, the proposed method outperforms the original YOLOv5s in small traffic sign detection, with an increase of 8.7% in mAP50, a reduction of 22.5% in parameter count, and a 17.2% reduction in computational complexity. [ABSTRACT FROM AUTHOR]
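Soft-NMS, as used above, decays the scores of overlapping boxes instead of deleting them, which helps retain closely spaced small signs. A minimal Gaussian Soft-NMS sketch (standard algorithm, not the authors' implementation):

```python
import math

def iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    ua = (a[2]-a[0])*(a[3]-a[1]) + (b[2]-b[0])*(b[3]-b[1]) - inter
    return inter / ua if ua > 0 else 0.0

def soft_nms(dets, sigma=0.5, score_thr=0.001):
    """Gaussian Soft-NMS: repeatedly take the highest-scoring box,
    then decay each remaining box's score by exp(-iou**2 / sigma)
    instead of hard-suppressing it. `dets` is a list of
    (x1, y1, x2, y2, score); returns kept detections in pick order."""
    dets = [list(d) for d in dets]
    keep = []
    while dets:
        best = max(range(len(dets)), key=lambda i: dets[i][4])
        cur = dets.pop(best)
        keep.append(tuple(cur))
        for d in dets:
            d[4] *= math.exp(-iou(cur[:4], d[:4]) ** 2 / sigma)
        dets = [d for d in dets if d[4] >= score_thr]
    return keep
```

An overlapping box survives with a reduced score rather than vanishing, so a slightly occluded second sign can still be reported.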
- Published
- 2024
- Full Text
- View/download PDF
28. Ultrafast‐and‐Ultralight ConvNet‐Based Intelligent Monitoring System for Diagnosing Early‐Stage Mpox Anytime and Anywhere.
- Author
-
Yue, Yubiao, Shi, Xiaoqiang, Qin, Li, Zhang, Xinyue, Xu, Jialong, Zheng, Zipei, Li, Zhenzhang, and Li, Yang
- Abstract
Due to the absence of more efficient diagnostic tools, the spread of mpox continues to be unchecked. Although related studies have demonstrated the high efficiency of deep learning models in diagnosing mpox, key aspects such as model inference speed and parameter size have always been overlooked. Herein, an ultrafast and ultralight network named Fast‐MpoxNet is proposed. Fast‐MpoxNet, with only 0.27 m parameters, can process input images at 68 frames per second (FPS) on the CPU. To detect subtle image differences and optimize model parameters better, Fast‐MpoxNet incorporates an attention‐based feature fusion module and a multiple auxiliary losses enhancement strategy. Experimental results indicate that Fast‐MpoxNet, utilizing transfer learning and data augmentation, produces 98.40% classification accuracy for four classes on the mpox dataset. Furthermore, its Recall for early‐stage mpox is 93.65%. Most importantly, an application system named Mpox‐AISM V2 is developed, suitable for both personal computers and smartphones. Mpox‐AISM V2 can rapidly and accurately diagnose mpox and can be easily deployed in various scenarios to offer the public real‐time mpox diagnosis services. This work has the potential to mitigate future mpox outbreaks and pave the way for developing real‐time diagnostic tools in the healthcare field. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
29. Efficient and Lightweight Neural Network for Hard Hat Detection.
- Author
-
He, Chenxi, Tan, Shengbo, Zhao, Jing, Ergu, Daji, Liu, Fangyao, Ma, Bo, and Li, Jianjun
- Subjects
HELMETS ,SAFETY hats ,COMPUTER vision ,VIDEO surveillance ,COMPUTER engineering ,FEATURE extraction - Abstract
Electric power operation, as one of the key fields in the world, faces particularly prominent safety issues. Ensuring the safety of operators has become the most fundamental requirement in power operation. However, there are some safety hazards in power construction. These hazards are mainly due to weak safety awareness among staff and the failure to standardize the wearing of safety helmets. In order to effectively address this situation, technical means such as video surveillance technology and computer vision technology can be utilized to monitor whether staff are wearing helmets and provide timely feedback. Such measures will greatly enhance the safety level of power operation. This paper proposes an improved lightweight helmet detection algorithm named YOLO-M3C. The algorithm first replaces the YOLOv5s backbone network with MobileNetV3, successfully reducing the model size from 13.7 MB to 10.2 MB, thereby increasing the model's detection speed from 42.0 frames per second to 55.6 frames per second. Then, the CA attention mechanism is introduced into the backbone network to enhance the feature extraction capability of the model. Finally, in order to further improve the detection recall rate and accuracy of the model, a knowledge distillation of the model was carried out. The experimental results show that, compared with the original YOLOv5s algorithm, the average accuracy of the improved YOLO-M3C algorithm is improved by 0.123, and the recall rate is the same. These results verify that the algorithm YOLO-M3C has excellent performance in target detection and recognition, which can improve accuracy and confidence, while reducing false detection and missing detection, and effectively meet the needs of helmet-wearing detection. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
30. LEO navigation observables extraction using CLOCFC network
- Author
-
Zhisen Wang, Hu Lu, and Zhiang Bian
- Subjects
Signals of opportunity ,Low earth orbit satellite communication ,Instantaneous Doppler positioning ,Lightweight network ,CFC network ,Medicine ,Science - Abstract
Abstract To mitigate the reliance of aviation users on the Global Navigation Satellite System (GNSS) in an increasingly interference-prone environment, utilizing signals of opportunity (SOPs) from Low-Earth Orbit (LEO) satellites for navigation and positioning is an alternative approach. However, LEO satellite SOPs are not intended for navigation. Therefore, it is necessary to design methods to extract navigation observables from these signals. In this paper, we propose a lightweight deep learning model with a two-branch structure called CLOCFC, designed to extract navigation observables. Furthermore, we have established a low Earth orbit satellite signal dataset by using ORBCOMM constellation signals as the input to the model and Doppler frequency as the label for the model. The results show that CLOCFC, as a lightweight model, demonstrates a significantly faster convergence rate and higher accuracy in navigation observables extraction compared to other models (ResNet, Swin Transformer, and Clo Transformer). In CLOCFC, we introduce the CFC module, a kind of Liquid Neural Network, to enhance the information acquisition capability through the spatiotemporal information in the data sequence. Finally, we have also conducted extensive experiments with the Doppler shift extraction of LEO satellites as an example, under various noise and resolution conditions, demonstrating the superiority of CLOCFC.
- Published
- 2024
- Full Text
- View/download PDF
31. Image super‐resolution via dynamic network
- Author
-
Chunwei Tian, Xuanyu Zhang, Qi Zhang, Mingming Yang, and Zhaojie Ju
- Subjects
CNN ,dynamic network ,image super‐resolution ,lightweight network ,Computational linguistics. Natural language processing ,P98-98.5 ,Computer software ,QA76.75-76.765 - Abstract
Abstract Convolutional neural networks depend on deep network architectures to extract accurate information for image super‐resolution. However, the information obtained by these convolutional neural networks cannot completely express predicted high‐quality images for complex scenes. A dynamic network for image super‐resolution (DSRNet) is presented, which contains a residual enhancement block, wide enhancement block, feature refinement block and construction block. The residual enhancement block is composed of a residual enhanced architecture to facilitate hierarchical features for image super‐resolution. To enhance the robustness of the obtained super‐resolution model for complex scenes, a wide enhancement block achieves a dynamic architecture to learn more robust information to enhance the applicability of an obtained super‐resolution model for varying scenes. To prevent interference of components in a wide enhancement block, a refinement block utilises a stacked architecture to accurately learn obtained features. Also, a residual learning operation is embedded in the refinement block to prevent the long‐term dependency problem. Finally, a construction block is responsible for reconstructing high‐quality images. The designed heterogeneous architecture can not only facilitate richer structural information, but is also lightweight, which makes it suitable for mobile digital devices. Experimental results show that our method is more competitive in terms of performance, recovery time of image super‐resolution, and complexity. The code of DSRNet can be obtained at https://github.com/hellloxiaotian/DSRNet.
- Published
- 2024
- Full Text
- View/download PDF
32. PSFHSP-Net: an efficient lightweight network for identifying pubic symphysis-fetal head standard plane from intrapartum ultrasound images.
- Author
-
Qiu, Ruiyu, Zhou, Mengqiang, Bai, Jieyun, Lu, Yaosheng, and Wang, Huijin
- Abstract
The accurate selection of the ultrasound plane for the fetal head and pubic symphysis is critical for precisely measuring the angle of progression. The traditional method depends heavily on sonographers manually selecting the imaging plane. This process is not only time-intensive and laborious but also prone to variability based on the clinicians' expertise. Consequently, there is a significant need for an automated method driven by artificial intelligence. To enhance the efficiency and accuracy of identifying the pubic symphysis-fetal head standard plane (PSFHSP), we proposed a streamlined neural network, PSFHSP-Net, based on a modified version of ResNet-18. This network comprises a single convolutional layer and three residual blocks designed to mitigate noise interference and bolster feature extraction capabilities. The model's adaptability was further refined by expanding the shared feature layer into task-specific layers. We assessed its performance against both traditional heavyweight and other lightweight models by evaluating metrics such as F1-score, accuracy (ACC), recall, precision, area under the ROC curve (AUC), model parameter count, and frames per second (FPS). The PSFHSP-Net recorded an ACC of 0.8995, an F1-score of 0.9075, a recall of 0.9191, and a precision of 0.9022. This model surpassed other heavyweight and lightweight models in these metrics. Notably, it featured the smallest model size (1.48 MB) and the highest processing speed (65.7909 FPS), meeting the real-time processing criterion of over 24 images per second. While the AUC of our model was 0.930, slightly lower than that of ResNet34 (0.935), it showed a marked improvement over ResNet-18 in testing, with increases in ACC and F1-score of 0.0435 and 0.0306, respectively. However, precision saw a slight decrease from 0.9184 to 0.9022, a reduction of 0.0162. 
Despite these trade-offs, the compression of the model significantly reduced its size from 42.64 to 1.48 MB and increased its inference speed by 4.4753 to 65.7909 FPS. The results confirm that the PSFHSP-Net is capable of swiftly and effectively identifying the PSFHSP, thereby facilitating accurate measurements of the angle of progression. This development represents a significant advancement in automating fetal imaging analysis, promising enhanced consistency and reduced operator dependency in clinical settings. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
33. Real-time detection of mature table grapes using ESP-YOLO network on embedded platforms.
- Author
-
Chen, Jiaoliao, Chen, Huan, Xu, Fang, Lin, Mengnan, Zhang, Dan, and Zhang, Libin
- Subjects
- *
TABLE grapes , *OBJECT recognition (Computer vision) , *FEATURE extraction , *DEEP learning , *GRAPES - Abstract
The real-time and high-precision detection methods on embedded platforms are critical for harvesting robots to accurately locate the position of the table grapes. A novel detection method (ESP-YOLO) for the table grapes in the trellis structured orchards is proposed to improve the detection accuracy and efficiency based on You Only Look Once (YOLO), Efficient Layer Shuffle Aggregation Networks (ELSAN), Squeeze-and-Excitation (SE), Partial Convolution (PConv) and Soft Non-maximum suppression (Soft_NMS). According to cross-group information interchange, the channel shuffle operation is presented to modify transition layers instead of the CSPDarkNet53 (C3) in backbone networks for the table grape feature extraction. The PConv is utilised in the neck network to extract the part channel's features for the inference speed and spatial features. SE is inserted in backbone networks to adjust the channel weight for channel-wise features of grape images. Then, Soft_NMS is modified to enhance the segmentation capability for densely clustered grapes. The algorithm is conducted on embedded platforms to detect table grapes in complex scenarios, including the overlap of multi-grape adhesion and the occlusion of stems and leaves. ELSAN block boosts inference speed by 46% while maintaining accuracy. The mAP@0.5:0.95 of ESP-YOLO surpasses that of other advanced methods by 3.7%–16.8%. ESP-YOLO can be a useful tool for harvesting robots to detect table grapes accurately and quickly in various complex scenarios. • An improved YOLOv5s method was proposed for mature table grapes detection. • ELSAN was proposed to develop a lightweight model with high detection accuracy. • ESP-YOLO allowed accurate detection of overlap and distant shot table grapes. • ESP-YOLO achieved mAP of 98.3% with high detection speed on embedded platforms. [ABSTRACT FROM AUTHOR]
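The channel shuffle operation used in ESP-YOLO's transition layers enables cross-group information interchange after grouped convolutions. A minimal sketch of the reshape-transpose-flatten reordering (ShuffleNet-style; indices here stand in for per-channel feature maps):

```python
def channel_shuffle(x, groups):
    """Channel shuffle sketch: view the C channels as a (groups,
    C // groups) grid, transpose it, and flatten, so the next grouped
    convolution sees channels drawn from every group. `x` is a list
    of C per-channel feature maps (or channel indices)."""
    c = len(x)
    per = c // groups
    # new order interleaves one channel from each group in turn
    return [x[g * per + i] for i in range(per) for g in range(groups)]
```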
- Published
- 2024
- Full Text
- View/download PDF
34. Mobile-Deeplab: a lightweight pixel segmentation-based method for fabric defect detection.
- Author
-
Bai, Zichen and Jing, Junfeng
- Subjects
CONVOLUTIONAL neural networks ,COMPUTER vision ,PRODUCT quality ,SCARCITY ,TEXTILES - Abstract
Fabric defect detection has always been a key issue, and its efficiency is positively correlated with productivity. From manual visual methods to machine vision and deep learning-based techniques, a variety of methods have been studied to improve production efficiency and product quality. Although deep learning-based methods have proven to be powerful tools for segmentation, there are still many pressing issues that need to be addressed in practical applications. First, the scarcity of defective samples compared to normal samples can cause data imbalance and thus affect accuracy. Second, high real-time performance is also required in the actual detection process. To overcome these problems, we propose a high real-time convolutional neural network, named Mobile-Deeplab, to implement end-to-end defect segmentation. In addition, we propose a loss function to account for the fabric image sample imbalance problem. We evaluated the performance of the model with two public structured datasets and three self-constructed structured datasets. The experimental results show that the segmentation method has better segmentation accuracy than other segmentation models, which verifies the segmentation effect of the method. In addition, 87.11 frames per second on a 256 × 256 size image meet industrial real-time requirements. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
35. Development and optimization of object detection technology in pavement engineering: A literature review
- Author
-
Hui Yao, Yaning Fan, Yanhao Liu, Dandan Cao, Ning Chen, Tiancheng Luo, Jingyu Yang, Xueyi Hu, Jie Ji, and Zhanping You
- Subjects
Pavement engineering ,Object detection ,Lightweight network ,Attention mechanism ,Convolutional neural network ,Highway engineering. Roads and pavements ,TE1-450 ,Engineering (General). Civil engineering (General) ,TA1-2040 - Abstract
Due to the rapid advancement of the transportation industry and the continual increase in pavement infrastructure, it is difficult to keep up with the huge road maintenance task by relying only on the traditional manual detection method. Intelligent pavement detection technology with deep learning techniques is available for the research and industry areas by the gradual development of computer vision technology. Due to the different characteristics of pavement distress and the uncertainty of the external environment, this kind of object detection technology for distress classification and location still faces great challenges. This paper discusses the development of object detection technology and analyzes classical convolutional neural network (CNN) architecture. In addition to the one-stage and two-stage object detection frameworks, object detection without anchor frames is introduced, which is divided according to whether the anchor box is used or not. This paper also introduces attention mechanisms based on convolutional neural networks and emphasizes the performance of these mechanisms to further enhance the accuracy of object recognition. Lightweight network architecture is introduced for mobile and industrial deployment. Since stereo cameras and sensors are rapidly developed, a detailed summary of three-dimensional object detection algorithms is also provided. While reviewing the history of the development of object detection, the scope of this review is not only limited to the area of pavement crack detection but also guidance for researchers in related fields is shared.
- Published
- 2024
- Full Text
- View/download PDF
36. A new lightweight network for efficient UAV object detection
- Author
-
Wei Hua, Qili Chen, and Wenbai Chen
- Subjects
Lightweight network ,Object detection ,Convolutional neural network ,Unmanned aerial vehicle (UAV) ,Medicine ,Science - Abstract
Abstract Optimizing the structure of deep neural networks is essential in many applications, especially in the object detection tasks of unmanned aerial vehicles, where the constraints of the onboard platform demand a more efficient network. Nevertheless, existing lightweight detection networks perform excessive redundant computation and may incur a certain loss of accuracy. To address these issues, this paper proposes a new lightweight network structure named Cross-Stage Partially Deformable Network (CSPDNet). Its first component is a Deformable Separable Convolution Block (DSCBlock), which separates feature channels, greatly reducing the computational load of convolution, and applies adaptive sampling to the separated feature map. Subsequently, to establish information interaction between feature layers, a channel weighting module is proposed. This module calculates weights for the separated feature map, facilitating information exchange across channels and resolutions; it also compensates for the effect of point-wise (1 × 1) convolutions and filters out the more important feature information. Furthermore, a new CSPDBlock, composed primarily of DSCBlocks, establishes multidimensional feature correlations for each separated feature layer, improving the ability to capture critical feature information and reconstruct gradient paths, thereby preserving detection accuracy. The proposed design strikes a balance between model parameter size and detection accuracy. Experimental results on object detection datasets demonstrate that the designed network, using fewer parameters, achieves detection performance competitive with the existing lightweight networks YOLOv5n, YOLOv6n, YOLOv8n, NanoDet, and PP-PicoDet.
The optimization effect of the designed CSPDBlock is further validated on the VisDrone dataset by incorporating it into the advanced detection algorithms YOLOv5m, PPYOLOEm, YOLOv7, RTMDetm, and YOLOv8m: the designed modules reduce parameters by 10-20% while almost maintaining detection accuracy.
- Published
- 2024
- Full Text
- View/download PDF
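The cost saving that DSCBlock's channel separation builds on can be illustrated with the standard depthwise-separable parameter count. This is a generic sketch: the deformable sampling and the channel weighting module described in the abstract are not modeled, and the layer sizes are illustrative.

```python
def conv_params(c_in, c_out, k):
    """Parameters of a standard k x k convolution (bias omitted)."""
    return c_in * c_out * k * k

def depthwise_separable_params(c_in, c_out, k):
    """Depthwise k x k conv (one filter per channel) + 1x1 pointwise projection."""
    return c_in * k * k + c_in * c_out

# Illustrative detection-backbone layer: 128 -> 256 channels, 3x3 kernel.
standard = conv_params(128, 256, 3)                   # 294912
separable = depthwise_separable_params(128, 256, 3)   # 33920
print(f"reduction: {1 - separable / standard:.1%}")   # reduction: 88.5%
```

The pointwise term dominates at large channel counts, which is why further savings come from separating the channels themselves, as CSPDNet's channel weighting module does.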
37. Lightweight strip steel defect detection algorithm based on improved YOLOv7
- Author
-
Jianbo Lu, MiaoMiao Yu, and Junyu Liu
- Subjects
Deep learning ,YOLOv7 ,Lightweight network ,Strip surface defect detection ,D-SimSPPF ,Medicine ,Science - Abstract
Abstract The precise identification of surface imperfections in steel strips is crucial for ensuring steel product quality. To address the substantial model size and computational complexity of current algorithms for detecting surface defects in steel strips, this paper introduces SS-YOLO (YOLOv7 for Steel Strip), an enhanced lightweight YOLOv7 model. The method replaces the CBS module in the backbone network with a lightweight MobileNetv3 network, reducing the model size and accelerating inference. The D-SimSPPF module, which integrates depthwise separable convolution and a parameter-free attention mechanism, was designed to replace the original SPPCSPC module in the YOLOv7 network, expanding the receptive field and reducing the number of network parameters. The parameter-free attention mechanism SimAM is incorporated into both the neck network and the prediction output section, strengthening the model's ability to extract essential features of strip surface defects and improving detection accuracy. Experimental results on the NEU-DET dataset show that SS-YOLO achieves 97% mAP50, a 4.5% improvement over YOLOv7, along with a 79.3% reduction in FLOPs (G) and a 20.7% decrease in parameters. SS-YOLO thus demonstrates an effective balance between detection accuracy and speed while maintaining a lightweight profile.
- Published
- 2024
- Full Text
- View/download PDF
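SimAM, the parameter-free attention that SS-YOLO places in its neck and prediction head, scores every position with a closed-form energy function instead of learned weights. A minimal NumPy sketch for a single (C, H, W) feature map follows; `lam` is SimAM's usual regularization constant and the shapes are illustrative, not the paper's configuration.

```python
import numpy as np

def simam(x, lam=1e-4):
    """SimAM parameter-free attention over a (C, H, W) feature map.

    Each position's weight comes from a closed-form energy function,
    so no learnable parameters are introduced.
    """
    c, h, w = x.shape
    n = h * w - 1
    mu = x.mean(axis=(1, 2), keepdims=True)
    t = (x - mu) ** 2
    v = t.sum(axis=(1, 2), keepdims=True) / n   # per-channel variance estimate
    e_inv = t / (4 * (v + lam)) + 0.5           # inverse energy per position
    return x / (1 + np.exp(-e_inv))             # x * sigmoid(e_inv)

feat = np.random.randn(8, 16, 16)
out = simam(feat)
print(out.shape)  # (8, 16, 16)
```

Because the sigmoid gate lies in (0, 1), the module only re-weights activations; it never amplifies them, which keeps it cheap and stable as a drop-in block.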
38. A ResNet mini architecture for brain age prediction
- Author
-
Xuan Zhang, Si-Yuan Duan, Si-Qi Wang, Yao-Wen Chen, Shi-Xin Lai, Ji-Sheng Zou, Yan Cheng, Ji-Tian Guan, Ren-Hua Wu, and Xiao-Lei Zhang
- Subjects
Brain age prediction ,MRI ,Deep learning ,Lightweight network ,ResNet ,Medicine ,Science - Abstract
Abstract The brain presents age-related structural and functional changes throughout human life, with different extents between subjects and groups. Brain age prediction can be used to evaluate the development and aging of the human brain, as well as to provide valuable information for neurodevelopment and disease diagnosis. Many contributions have been made for this purpose, resorting to different machine learning methods. To solve this task while reducing memory resource consumption, we develop a mini architecture of only 10 layers by modifying the deep residual neural network (ResNet), named the ResNet mini architecture. To support the ResNet mini architecture in brain age prediction, the brain age dataset (OpenNeuro #ds000228), which consists of 155 study participants (three classes), and the Alzheimer MRI preprocessed dataset, which consists of 6400 images (four classes), are employed. We compared the performance of the ResNet mini architecture with other popular networks on the two considered datasets. Experimental results show that the proposed architecture exhibits generality and robustness with high accuracy and fewer parameters.
- Published
- 2024
- Full Text
- View/download PDF
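The residual connection that the ResNet mini architecture inherits from the original ResNet can be sketched in plain NumPy. The weight shapes below are illustrative, not the paper's 10-layer configuration.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    """Basic ResNet block: two weight layers plus an identity shortcut.

    y = relu(x + W2 @ relu(W1 @ x)); shapes are kept equal so the
    skip connection needs no projection.
    """
    return relu(x + w2 @ relu(w1 @ x))

rng = np.random.default_rng(0)
x = rng.standard_normal(64)
w1 = rng.standard_normal((64, 64)) * 0.01
w2 = rng.standard_normal((64, 64)) * 0.01
y = residual_block(x, w1, w2)
# With small weights the block stays close to the identity mapping,
# which is what makes very deep (or, here, deliberately shallow)
# residual stacks easy to optimize.
print(np.abs(y - relu(x)).max() < 0.1)  # True
```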
39. LBCapsNet: a lightweight balanced capsule framework for image classification of porcelain fragments
- Author
-
Ruoxue Li, Guohua Geng, Xizhi Wang, Yulin Qin, Yangyang Liu, Pengbo Zhou, and Haibo Zhang
- Subjects
Image classification ,Porcelain fragments ,Capsule network ,Lightweight network ,Cultural heritage digitization ,Fine Arts ,Analytical chemistry ,QD71-142 - Abstract
Abstract The image classification of porcelain fragments is of great significance for the digital preservation of cultural heritage. However, common issues are encountered in processing images of porcelain fragments, including low computation speed, accuracy degraded by the uneven distribution of sample categories, and model instability. This study proposes a novel capsule network model, referred to as LBCapsNet, suited to extracting features from images of porcelain artifact fragments. A bottleneck-like channel transformation module, denoted ChannelTrans, which resides between the convolutional layer and the PrimaryCaps layer, was first designed to reduce computational complexity and increase processing speed on intricate porcelain images. The MF-R loss function was then proposed by incorporating focal loss into the original loss function, addressing the imbalanced distribution of ceramic shard samples and reducing classification errors, which leads to faster convergence with a smoother trend. Finally, an adaptive dynamic routing mechanism with a dynamic learning rate is designed to enhance the overall stability of the classification process. Experimental results on public datasets such as MNIST, Fashion-MNIST, CIFAR10, FMD, and DTD, as well as a porcelain fragments dataset, demonstrate that LBCapsNet achieves high classification accuracy with faster and more stable computation than existing methods. Furthermore, LBCapsNet's ability to process special textures can provide technical support for the digital preservation and restoration of cultural heritage.
- Published
- 2024
- Full Text
- View/download PDF
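The focal-loss term folded into the MF-R loss down-weights easy, well-classified samples so that under-represented fragment categories contribute more to the gradient. A binary NumPy sketch of plain focal loss follows; the paper's combination with the capsule margin loss and its exact weighting are not reproduced.

```python
import numpy as np

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss: -alpha_t * (1 - p_t)^gamma * log(p_t).

    gamma down-weights easy examples; gamma=0 with alpha=1 recovers
    plain binary cross-entropy.
    """
    p = np.clip(p, 1e-7, 1 - 1e-7)
    pt = np.where(y == 1, p, 1 - p)
    at = np.where(y == 1, alpha, 1 - alpha)
    return float((-at * (1 - pt) ** gamma * np.log(pt)).mean())

easy = focal_loss(np.array([0.99]), np.array([1]))  # confident and correct
hard = focal_loss(np.array([0.20]), np.array([1]))  # badly misclassified
print(hard > easy)  # True: the hard sample dominates the loss
```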
40. LUFFD-YOLO: A Lightweight Model for UAV Remote Sensing Forest Fire Detection Based on Attention Mechanism and Multi-Level Feature Fusion.
- Author
-
Han, Yuhang, Duan, Bingchen, Guan, Renxiang, Yang, Guang, and Zhen, Zhen
- Subjects
- *
FOREST fires , *REMOTE sensing , *REMOTE-sensing images , *FOREST fire prevention & control , *VIDEO surveillance , *WILDFIRES , *DRONE aircraft - Abstract
The timely and precise detection of forest fires is critical for halting the spread of wildfires and minimizing ecological and economic damage. However, the large variation in target size and the complexity of the background in UAV remote sensing images increase the difficulty of real-time forest fire detection. To address this challenge, this study proposes a lightweight YOLO model for UAV remote sensing forest fire detection (LUFFD-YOLO) based on attention mechanisms and multi-level feature fusion: (1) GhostNetV2 was employed to enhance the conventional convolution in YOLOv8n, decreasing the number of parameters in the model; (2) a plug-and-play enhanced small-object forest fire detection C2f (ESDC2f) structure was proposed to improve the detection of small forest fires; (3) an innovative hierarchical feature-integrated C2f (HFIC2f) structure was proposed to improve the model's ability to extract information from complex backgrounds and to fuse features. LUFFD-YOLO surpasses YOLOv8n with a 5.1% improvement in mAP and a 13% reduction in parameter count, and it generalizes well across different datasets, indicating a good balance between high accuracy and model efficiency. This work provides significant technical support for real-time forest fire detection using UAV remote-sensing images. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
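The GhostNetV2 substitution in LUFFD-YOLO rests on the ghost-module idea: produce half the output channels with an ordinary convolution and the other half with cheap depthwise transforms of those intrinsic features. A slow but explicit NumPy sketch follows (1x1 primary conv, 3x3 cheap transform); GhostNetV2's DFC attention is omitted and all shapes are illustrative.

```python
import numpy as np

def ghost_module(x, w_primary, w_cheap):
    """Ghost module sketch.

    Half the output channels come from an ordinary 1x1 convolution
    (the intrinsic features); the other half from a cheap 3x3
    depthwise transform of those intrinsic maps.

    x: (C_in, H, W); w_primary: (C_mid, C_in); w_cheap: (C_mid, 3, 3).
    """
    c_in, h, w = x.shape
    # Primary 1x1 conv: a matrix multiply over the channel axis.
    primary = np.tensordot(w_primary, x, axes=([1], [0]))   # (C_mid, H, W)
    # Cheap depthwise 3x3 conv (zero padding) on each intrinsic map.
    padded = np.pad(primary, ((0, 0), (1, 1), (1, 1)))
    ghost = np.zeros_like(primary)
    for c in range(primary.shape[0]):
        for i in range(h):
            for j in range(w):
                ghost[c, i, j] = np.sum(padded[c, i:i+3, j:j+3] * w_cheap[c])
    return np.concatenate([primary, ghost], axis=0)         # (2*C_mid, H, W)

x = np.random.randn(4, 8, 8)
out = ghost_module(x, np.random.randn(8, 4), np.random.randn(8, 3, 3))
print(out.shape)  # (16, 8, 8)
```

The cheap branch costs only C_mid * 9 parameters versus C_mid * C_in for a full convolution of the same width, which is where the parameter reduction reported in the abstract comes from.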
41. LMA-EEGNet: A Lightweight Multi-Attention Network for Neonatal Seizure Detection Using EEG signals.
- Author
-
Zhou, Weicheng, Zheng, Wei, Feng, Youbing, and Li, Xiaolong
- Subjects
FEATURE extraction ,ELECTROENCEPHALOGRAPHY ,EPILEPSY ,SEIZURES (Medicine) ,DEEP learning ,BRAIN damage ,COMPUTATIONAL complexity - Abstract
Neonatal epilepsy is an early postnatal brain disorder, and automatic seizure detection is crucial for timely diagnosis and treatment to reduce potential brain damage. This work proposes a novel Lightweight Multi-Attention Network, LMA-EEGNet, for diagnosing neonatal epileptic seizures from multi-channel EEG signals. It employs dilated depthwise separable convolution (DDS Conv) for feature extraction and pointwise convolution followed by global average pooling for classification, substantially reducing the model size, number of parameters, and computational complexity, all of which are crucial for real-time detection and clinical diagnosis of neonatal epileptic seizures. LMA-EEGNet integrates temporal and spectral features through distinct branches: the temporal branch uses DDS Conv enhanced by a channel attention mechanism, while the spectral branch uses similar convolutions alongside a spatial attention mechanism to highlight key frequency components. Outputs from both branches are merged and processed through a pointwise convolution layer and a global average pooling layer for efficient neonatal seizure detection. Experimental results show that our model, with only 2471 parameters and a size of 23 KB, achieves an accuracy of 95.71% and an AUC of 0.9862, demonstrating its potential for practical deployment. This study provides an effective deep learning solution for the early detection of neonatal epileptic seizures, improving diagnostic accuracy and timeliness. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
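The pointwise-convolution-plus-global-average-pooling head that closes LMA-EEGNet can be sketched in NumPy. For a 1-D signal a pointwise convolution is just a channel-mixing matrix multiply; the names, shapes, and two-class output below are illustrative, not the paper's exact configuration.

```python
import numpy as np

def classification_head(feat, w_point, w_fc):
    """Pointwise conv + global average pooling + linear classifier sketch.

    feat: (C, T) per-channel temporal features; w_point: (C2, C);
    w_fc: (n_classes, C2). Returns softmax class probabilities.
    """
    mixed = w_point @ feat        # pointwise (1x1) conv == channel mixing
    pooled = mixed.mean(axis=1)   # global average pooling over time
    logits = w_fc @ pooled
    e = np.exp(logits - logits.max())
    return e / e.sum()

probs = classification_head(np.random.randn(16, 128),
                            np.random.randn(8, 16),
                            np.random.randn(2, 8))
print(probs.shape)  # (2,)
```

Replacing a dense layer over the full (C, T) map with pooling first is what keeps the parameter count in the low thousands, consistent with the 2471-parameter figure in the abstract.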
42. Lightweight PCB defect detection algorithm based on MSD-YOLO.
- Author
-
Zhou, Guoao, Yu, Lijuan, Su, Yixin, Xu, Bingrong, and Zhou, Guoyuan
- Subjects
- *
MATHEMATICAL decoupling , *ALGORITHMS , *FEATURE extraction , *PROBLEM solving - Abstract
To address the low accuracy and slow detection speed of existing target detection algorithms for PCB defect detection, and the excessive model parameters that prevent deployment on mobile terminals, a PCB defect detection algorithm based on MSD-YOLOv5 is proposed. Firstly, to preserve both detection accuracy and speed while reducing the model's size, we combine the lightweight MobileNet-v3 network with the CSPDarknet53 network. Further, an attention mechanism is introduced to highlight the important feature channels and weaken the less useful ones, improving the feature extraction ability of the network. Finally, the coupled detection head is replaced with a decoupled detection head, so that defect location information and category information on the PCB are extracted and learned separately; this resolves the high coupling between the different information feature distributions and enhances the generalization ability of the model. We conduct experiments with this algorithm on Peking University's publicly available PCB defect dataset. The results show that the proposed method reduces the parameters of the YOLOv5 model by 46% and improves detection accuracy by 3.34%. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
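The decoupled head that MSD-YOLO adopts gives classification and box regression separate projections instead of one shared (coupled) output layer. A minimal NumPy sketch for a single feature location follows; the branch shapes and six-class output are illustrative.

```python
import numpy as np

def decoupled_head(feat, w_cls, w_reg):
    """Decoupled detection head sketch.

    Classification and box regression use separate branches so the two
    feature distributions are learned independently rather than through
    one coupled projection.

    feat: (C,) per-location feature; w_cls: (n_classes, C); w_reg: (4, C).
    """
    cls_logits = w_cls @ feat    # defect category scores
    box = w_reg @ feat           # (x, y, w, h) offsets
    return cls_logits, box

feat = np.random.randn(32)
cls_logits, box = decoupled_head(feat,
                                 np.random.randn(6, 32),
                                 np.random.randn(4, 32))
print(cls_logits.shape, box.shape)  # (6,) (4,)
```

In practice each branch is a small convolution stack rather than a single matrix, but the split itself is the point: gradients from localization no longer interfere with those from classification.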
43. Lightweight multi-scale network with attention for accurate and efficient crowd counting.
- Author
-
Xi, Mengyuan and Yan, Hua
- Subjects
- *
COMPUTER vision , *FEATURE extraction , *CROWDS , *COUNTING - Abstract
Crowd counting is a significant task in computer vision that aims to estimate the total number of people appearing in images or videos. It remains very challenging due to huge scale variation and uneven density distributions in dense scenes. Moreover, although many works have been presented to tackle these issues, such methods often have a large number of parameters and high computational complexity, which limits their wide application on edge devices. In this work, we propose a lightweight method for accurate and efficient crowd counting, called the lightweight multi-scale network with attention. It is composed of four parts: a lightweight extractor, a multi-scale features extraction module (MFEM), an attention-based fusion module (ABFM), and an efficient density map regressor. We design the MFEM and ABFM carefully to obtain rich scale representations, which significantly benefits counting accuracy. Moreover, a normalized union loss function is proposed to balance the contribution of samples with diverse density distributions. Extensive experiments on six mainstream crowd datasets demonstrate that the proposed method outperforms other state-of-the-art methods with a small model size and low computational cost. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
44. Research on Facial Expression Recognition Algorithm Based on Lightweight Transformer.
- Author
-
Jiang, Bin, Li, Nanxing, Cui, Xiaomei, Liu, Weihua, Yu, Zeqi, and Xie, Yongheng
- Subjects
- *
FACIAL expression , *ALGORITHMS - Abstract
To avoid network overfitting and improve facial expression recognition on partially occluded facial images, an improved facial expression recognition algorithm based on MobileViT is proposed. Firstly, to obtain richer and more useful features, deep convolution operations are added to the inverted residual blocks of the network, improving the facial expression recognition rate. Then, in the dimension-reduction stage, the activation function significantly improves the convergence speed of the model, quickly reducing the loss error during training while preserving effective facial expression features as much as possible and reducing overfitting. Experimental results on RaFD, FER2013, and FER2013Plus show that this method has significant advantages over mainstream networks, achieving the highest recognition rate. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
45. A lightweight contour detection network inspired by biology.
- Author
-
Lin, Chuan, Zhang, Zhenguang, Peng, Jiansheng, Li, Fuzhang, Pan, Yongcai, and Zhang, Yuwei
- Subjects
COMPUTER vision ,PARALLEL processing ,CONVOLUTIONAL neural networks ,BIOLOGY ,FEATURE extraction ,DATA mining - Abstract
In recent years, the field of bionics has attracted the attention of numerous scholars, and some models that incorporate biological vision have achieved excellent performance in computer vision and image processing tasks. In this paper, we propose a new bio-inspired lightweight contour detection network (BLCDNet) that combines the parallel processing mechanisms of biological visual information with convolutional neural networks. The backbone of BLCDNet simulates the parallel pathways of the ganglion cell–lateral geniculate nucleus and the primary visual cortex (V1) area, realizing parallel processing and step-by-step extraction of the input and effectively extracting local and detailed features in images, which improves the overall performance of the model. In addition, we design a depth feature extraction module in the decoding network, combining depthwise separable convolution and residual connections, to integrate the output of the backbone and further improve performance. We conducted extensive experiments on the BSDS500 and NYUD datasets, and the results show that BLCDNet achieves the best performance compared with traditional methods and previous biologically inspired contour detection methods. BLCDNet also outperforms some VGG-based contour detection methods without pre-training and with fewer parameters, remaining competitive among them. This research also provides a new idea for combining biological vision with convolutional neural networks. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
46. Dehaze-UNet: A Lightweight Network Based on UNet for Single-Image Dehazing.
- Author
-
Zhou, Hao, Chen, Zekai, Li, Qiao, and Tao, Tao
- Subjects
HAZE ,PRIOR learning ,ATMOSPHERIC models ,DEEP learning - Abstract
Numerous existing learning-based image dehazing methods improve performance by increasing network depth or width, enlarging the convolution kernel, or adopting Transformer structures. However, this inevitably introduces many parameters and increases computational overhead. We therefore propose a lightweight dehazing framework, Dehaze-UNet, which combines excellent dehazing performance with very low computational overhead, making it suitable for terminal deployment. To let Dehaze-UNet aggregate the features of haze, we design a LAYER module that aggregates the haze features of different hazy images mainly through its batch normalization layer, so that Dehaze-UNet pays more attention to haze. Furthermore, we revisit the use of the physical model in the network: an ASMFUN module operates on the network's feature maps, allowing the network to better understand the generation and removal of haze and to learn prior knowledge that improves generalization to real hazy scenes. Extensive experimental results indicate that the lightweight Dehaze-UNet outperforms state-of-the-art methods, especially on hazy images of real scenes. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
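The physical model that ASMFUN revisits is the atmospheric scattering model, I = J*t + A*(1 - t), where J is scene radiance, t the transmission map, and A the global atmospheric light. A NumPy sketch of the model and its inversion follows; the clamp `t0` is the common numerical-stability trick, and how Dehaze-UNet applies this to feature maps rather than pixels is not reproduced here.

```python
import numpy as np

def haze(J, t, A):
    """Atmospheric scattering model: I = J*t + A*(1 - t)."""
    return J * t + A * (1 - t)

def dehaze(I, t, A, t0=0.1):
    """Invert the model; t is clamped to avoid amplifying noise."""
    return (I - A) / np.maximum(t, t0) + A

J = np.random.rand(3, 8, 8)      # clean scene radiance
t = np.full((1, 8, 8), 0.6)      # transmission map (broadcast over channels)
A = 0.9                          # global atmospheric light
I = haze(J, t, A)
print(np.allclose(dehaze(I, t, A), J))  # True: exact round trip
```

With known t and A the inversion is exact; the hard part, which the learned network handles, is estimating them from a single hazy image.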
47. AMEA-YOLO: a lightweight remote sensing vehicle detection algorithm based on attention mechanism and efficient architecture.
- Author
-
Wang, Shou-Bin, Gao, Zi-Meng, Jin, Deng-Hui, Gong, Shu-Ming, Peng, Gui-Li, and Yang, Zi-Jian
- Subjects
- *
REMOTE sensing , *OBJECT recognition (Computer vision) , *ALGORITHMS , *INTELLIGENT transportation systems - Abstract
Because object detection algorithms are computationally demanding, high-resolution remote sensing vehicle detection must contend with numerous small objects, highly complex backgrounds, and the challenge of balancing model accuracy against parameter count. The attention mechanism and efficient architecture lightweight-YOLO (AMEA-YOLO) is proposed in this paper. A lightweight network is designed as the backbone of AMEA-YOLO, maintaining model accuracy while keeping the model light. FasterNet is employed to accelerate model training. The enhanced deep second-order channel attention module (EnhancedSOCA) is utilized to improve high-resolution image representation. In addition, a lightweight module is devised to further reduce the model's weight, and the HardSwish activation function improves model accuracy. The experimental results indicate that the AMEA-YOLO algorithm ensures both a lightweight model and accurate performance. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
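The HardSwish activation adopted in AMEA-YOLO replaces Swish's sigmoid gate with a cheap piecewise-linear ReLU6 gate, which is friendlier to low-power hardware. A one-function NumPy sketch:

```python
import numpy as np

def hard_swish(x):
    """HardSwish: x * ReLU6(x + 3) / 6.

    Zero for x <= -3, identity for x >= 3, smooth-ish in between.
    """
    return x * np.clip(x + 3.0, 0.0, 6.0) / 6.0

x = np.array([-4.0, -3.0, 0.0, 1.0, 4.0])
print(hard_swish(x))  # zero for x <= -3, identity for x >= 3
```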
48. A ResNet mini architecture for brain age prediction.
- Author
-
Zhang, Xuan, Duan, Si-Yuan, Wang, Si-Qi, Chen, Yao-Wen, Lai, Shi-Xin, Zou, Ji-Sheng, Cheng, Yan, Guan, Ji-Tian, Wu, Ren-Hua, and Zhang, Xiao-Lei
- Subjects
- *
AGE - Abstract
The brain presents age-related structural and functional changes throughout human life, with different extents between subjects and groups. Brain age prediction can be used to evaluate the development and aging of the human brain, as well as to provide valuable information for neurodevelopment and disease diagnosis. Many contributions have been made for this purpose, resorting to different machine learning methods. To solve this task while reducing memory resource consumption, we develop a mini architecture of only 10 layers by modifying the deep residual neural network (ResNet), named the ResNet mini architecture. To support the ResNet mini architecture in brain age prediction, the brain age dataset (OpenNeuro #ds000228), which consists of 155 study participants (three classes), and the Alzheimer MRI preprocessed dataset, which consists of 6400 images (four classes), are employed. We compared the performance of the ResNet mini architecture with other popular networks on the two considered datasets. Experimental results show that the proposed architecture exhibits generality and robustness with high accuracy and fewer parameters. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
49. Polyp segmentation network based on lightweight model and reverse attention mechanisms.
- Author
-
Long, Jianwu, Yang, Chengxin, Song, Xinlei, Zeng, Ziqin, and Ren, Yan
- Subjects
- *
INTESTINAL polyps , *COLON polyps , *COMPUTER-aided diagnosis , *POLYPS , *GASTROINTESTINAL cancer - Abstract
Colorectal cancer is a common gastrointestinal malignancy, so early screening and segmentation of colorectal polyps are of great clinical significance. Colonoscopy is the most effective method for detecting polyps, but some polyps may be missed during examination; computer-aided diagnosis is therefore particularly important for colorectal polyp segmentation. To improve the detection rate of intestinal polyps under colonoscopy, a polyp segmentation network (MobileRaNet) based on a lightweight model and a reverse attention (RA) mechanism is proposed to accurately segment polyps in colonoscopy images. First, a coordinate attention module is used to improve MobileNetV3, making it the backbone network (CaNet). Second, part of the high-level feature output of the backbone is passed into the parallel axial receptive field module (PA_RFB) to extract a global dependency representation without losing detail. Third, a global map generated from this combined feature serves as the initial guide area for the subsequent components. Finally, the RA module mines the target region and boundary cues to improve segmentation accuracy. To verify the effectiveness and lightweight performance of the algorithm, five challenging datasets are used, including CVC-ColonDB, CVC-300, and Kvasir, and the method is compared with seven typical models such as PraNet and TransUnet on six indexes including MeanDice, MeanIoU, and MAE, as well as on FLOPs, parameters, and FPS. The experimental results show that MobileRaNet improves performance on the five datasets to varying degrees; in particular, its MeanDice and MeanIoU on the Kvasir dataset reach 91.2% and 85.6%, increases of 1.4% and 1.6% over PraNet, while FLOPs and parameters decrease by 83.3% and 76.7%, respectively, compared with PraNet. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
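The reverse attention mechanism at the heart of MobileRaNet weights features by the complement of the coarse prediction, steering refinement toward the uncertain boundary region rather than the already-confident interior. A minimal NumPy sketch (single map, illustrative shapes):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def reverse_attention(feat, coarse_map):
    """Reverse attention sketch.

    Weight features by 1 - sigmoid(coarse prediction): high where the
    coarse map is NOT confident, so later layers focus on missed
    regions and boundaries.

    feat: (C, H, W); coarse_map: (H, W) foreground logits.
    """
    w = 1.0 - sigmoid(coarse_map)
    return feat * w[None, :, :]

feat = np.ones((4, 6, 6))
coarse = np.full((6, 6), 10.0)          # very confident foreground
out = reverse_attention(feat, coarse)
print(out.max() < 1e-3)                 # True: confident regions suppressed
```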
50. EARN: toward efficient and robust JPEG compression artifact reduction.
- Author
-
Teng, Ge, Jiang, Rongxin, Liu, Xuesong, Zhou, Fan, and Chen, Yaowu
- Subjects
- *
JPEG (Image coding standard) , *IMAGE compression , *DEEP learning , *ARCHITECTURAL design - Abstract
JPEG is one of the most widely used lossy image compression algorithms, but artifacts are generated during compression. Various artifact reduction methods have been proposed, and many of them, especially deep learning-based approaches, show promising performance. However, one major drawback that limits their deployment and application is their cumbersome and complicated models. To remedy this, we propose a simple and efficient network named the Efficient Artifact Reduction Network. To achieve efficiency, we treat enlarging the receptive field and preserving pixel-wise information as the central concerns. On the one hand, choosing a proper down-sampling ratio is important, as the down-sampling operation trades off these two aspects; on the other hand, we design a Large Kernel Depthwise Separable Convolution block that addresses both. For flexibility across different compression qualities, the focus of recent research, we design a Half Adaptive Instance Normalization-based approach that elegantly integrates information from the quantization matrix into the feature map: it adaptively normalizes half of the channels in the encoder to embed the compression quality information, while precise pixel-wise information is preserved through the other half of the channels. We also design a scalable architecture, inspired by prior work, that enables a post-training balance between computational cost and restoration performance. Experiments on various datasets show that our network achieves state-of-the-art restoration performance with far fewer parameters and less computational cost. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
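The Half Adaptive Instance Normalization idea, normalizing half the encoder channels with quality-conditioned statistics while passing the other half through untouched, can be sketched in NumPy. Here `gamma` and `beta` stand in for parameters predicted from the JPEG quantization matrix; the prediction network itself is not shown and the shapes are illustrative.

```python
import numpy as np

def half_adain(x, gamma, beta, eps=1e-5):
    """Half Adaptive Instance Normalization sketch.

    The first half of the channels is instance-normalized and re-scaled
    by (gamma, beta), embedding the compression-quality information; the
    second half passes through untouched to preserve pixel-wise detail.

    x: (C, H, W); gamma, beta: (C // 2,) quality-conditioned parameters.
    """
    c = x.shape[0] // 2
    top = x[:c]
    mu = top.mean(axis=(1, 2), keepdims=True)
    sd = top.std(axis=(1, 2), keepdims=True)
    normed = gamma[:, None, None] * (top - mu) / (sd + eps) + beta[:, None, None]
    return np.concatenate([normed, x[c:]], axis=0)

x = np.random.randn(8, 5, 5)
out = half_adain(x, np.ones(4), np.zeros(4))
print(np.allclose(out[4:], x[4:]))  # True: identity on the preserved half
```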