164 results
Search Results
2. Capturing Visual Narratives: Employing GRU and Attention Mechanism in a Deep Learning Framework for Automatic Image Captioning.
- Author
-
Jaiswal, Sushma, Pallthadka, Harikumar, Chinhewadi, Rajesh P., and Jaiswal, Tarun
- Subjects
NATURAL language processing ,RECURRENT neural networks ,DEEP learning ,CONVOLUTIONAL neural networks ,COMPUTER vision ,IMAGE analysis ,CRANES (Birds) - Abstract
The goal of automatic image captioning in computer vision and natural language processing is to produce precise and insightful image captions. This paper presents a novel approach to automatic image captioning that combines an attention mechanism with a deep learning model based on gated recurrent units (GRUs). The attention mechanism dynamically weights the image's features, allowing the model to focus on relevant portions of the image during caption generation and to align the words in the generated captions with the corresponding image features. The GRU-based recurrent network captures word relationships and the sequential structure of natural language in the output captions. Trained on a large collection of images and captions, the network learns to generate coherent and contextually relevant descriptions for various image types. Evaluating the proposed model with widely used metrics such as BLEU, METEOR, ROUGE, and CIDEr demonstrates its ability to generate high-quality captions. The findings show that the approach outperforms baseline techniques, highlighting the advantages of combining GRUs with an attention mechanism for image captioning. The method produces captions that are accurate and convey a deeper understanding of the visual content, making it highly applicable in real-world settings such as image interpretation, accessibility, and content suggestion. [ABSTRACT FROM AUTHOR]
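The attention step described in this abstract can be sketched in a few lines. The following is an illustrative NumPy sketch, not the authors' implementation: the additive (Bahdanau-style) scoring function and all shapes and weight matrices (`W_f`, `W_q`, `v`) are assumptions for demonstration.

```python
import numpy as np

def soft_attention(features, query, W_f, W_q, v):
    """Weight image regions by relevance to the decoder state (query)."""
    scores = np.tanh(features @ W_f + query @ W_q) @ v   # one score per region
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                             # softmax over regions
    context = weights @ features                         # weighted feature sum
    return context, weights

rng = np.random.default_rng(0)
R, D, H = 5, 8, 16                      # regions, feature dim, hidden dim
features = rng.normal(size=(R, D))      # stand-in for CNN region features
query = rng.normal(size=H)              # stand-in for the GRU hidden state
W_f = rng.normal(size=(D, H))
W_q = rng.normal(size=(H, H))
v = rng.normal(size=H)
ctx, w = soft_attention(features, query, W_f, W_q, v)
```

At each decoding step the GRU would consume `ctx` alongside the previous word embedding, so the weights `w` show which regions drove each generated word.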
- Published
- 2024
3. DEVELOPING A PROTOTYPE OF FIRE DETECTION AND AUTOMATIC EXTINGUISHER MOBILE ROBOT BASED ON CONVOLUTIONAL NEURAL NETWORK.
- Author
-
Saif, Amin, Muneer, Gamal, Abdulrahman, Yusuf, Abdulbaqi, Hareth, Abdullah, Aimen, Ali, Abdullah, and Derhim, Abduljalil
- Subjects
MOBILE robots ,CONVOLUTIONAL neural networks ,MACHINE learning ,OBJECT recognition (Computer vision) ,COMPUTER vision ,FIRE fighters - Abstract
The object of research is a prototype of a fire-detection and automatic-extinguisher mobile robot based on a convolutional neural network. Over recent decades, fires have been among the most serious disasters occurring around the world. Severe fire incidents damage buildings, infrastructure, and property, cause loss of human life, and incur substantial costs; fires are also extremely dangerous for firefighters. Fires can be caused by materials such as rubber and chemical products, by short circuits in electrical devices and faults in power circuits, and by overheating and overloading. All of these causes lead to severe consequences when there is no immediate response. Computer vision technology has come to play a significant role in human life, and artificial intelligence has improved the efficiency and behavior of robots beyond expectations. For this reason, this paper presents a deep-learning-based mobile robot that detects a fire source, determines its coordinates, and then automatically moves toward the target and extinguishes the fire. Deep learning algorithms are efficient for object detection, and a CNN model, one of the most common deep learning architectures, was used in this study for fire detection. Because the available datasets were insufficient and building a model from scratch requires substantial effort, MobileNet V2, a CNN model that supports transfer learning, was adopted. After training the model and testing it on 20 % of the dataset, the classification accuracy reached 98.01 %. The motion repeatability of the robot was implemented and tested, yielding a mean error of 0.648 cm. [ABSTRACT FROM AUTHOR]
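The 80/20 evaluation protocol mentioned in the abstract can be illustrated generically. This is a minimal sketch assuming hypothetical (image, label) pairs and a fixed shuffle seed; it is not the authors' pipeline.

```python
import random

def train_test_split(samples, test_frac=0.2, seed=42):
    """Shuffle and hold out a fraction of the samples for testing."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    return shuffled[n_test:], shuffled[:n_test]   # train, test

def accuracy(predictions, labels):
    """Fraction of predictions that match the ground-truth labels."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)

# hypothetical (image_id, fire/no-fire label) pairs
data = [("img_%03d" % i, i % 2) for i in range(100)]
train, test = train_test_split(data)
labels = [y for _, y in test]
acc = accuracy(labels, labels)   # a perfect classifier would score 1.0
```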
- Published
- 2022
- Full Text
- View/download PDF
4. Scene description with context information using dense-LSTM.
- Author
-
Singh, Varsha, Agrawal, Prakhar, and Tiwary, Uma Shanker
- Subjects
NATURAL language processing ,GRAPHICAL user interfaces ,CONVOLUTIONAL neural networks ,COMPUTER vision ,NATURAL languages ,VIDEO coding ,SOURCE code - Abstract
Generating natural language descriptions for visual content is a technique for describing what an image contains. It requires knowledge of both computer vision and natural language processing, and various models with different approaches have been suggested, one of which is encoder-decoder-based description generation. Existing papers used only objects for descriptions, but the relationships between them are equally essential, requiring context information, which in turn calls for techniques like Long Short-Term Memory (LSTM). This paper proposes an encoder-decoder-based methodology to generate human-like textual descriptions. A Dense-LSTM decoder is presented for better description, paired with a modified VGG19 encoder that captures the information needed to describe the scene. The standard datasets Flickr8K and Flickr30k are used for training and testing, and the BLEU (Bilingual Evaluation Understudy) score is used to evaluate the generated text. For the proposed model, a GUI (Graphical User Interface) was developed that produces an audio rendering of the output and provides an interface for searching related visual content and for query-based search. [ABSTRACT FROM AUTHOR]
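The BLEU metric used for evaluation here can be illustrated at the unigram level. This is a minimal single-reference sketch; full BLEU combines precisions up to 4-grams and supports multiple references, and the example sentences are invented.

```python
import math
from collections import Counter

def bleu1(candidate, reference):
    """Unigram BLEU with brevity penalty, for a single reference."""
    cand, ref = candidate.split(), reference.split()
    overlap = Counter(cand) & Counter(ref)        # clipped unigram matches
    precision = sum(overlap.values()) / len(cand)
    # penalize candidates shorter than the reference
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * precision

score = bleu1("a dog runs on grass", "a dog runs on the grass")
```

All five candidate words match, so the unigram precision is 1.0 and the score is just the brevity penalty exp(1 - 6/5) ≈ 0.819.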
- Published
- 2023
- Full Text
- View/download PDF
5. An Image Matching Algorithm with a Three-Branch Spatial Transformation Attention Mechanism (三分支空间变换注意力机制的图像匹配算法).
- Author
-
黄妍妍, 盖绍彦, and 达飞鹏
- Subjects
CONVOLUTIONAL neural networks ,COMPUTER vision ,IMAGE registration - Abstract
Copyright of Systems Engineering & Electronics is the property of Journal of Systems Engineering & Electronics Editorial Department and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
- Published
- 2023
- Full Text
- View/download PDF
6. SquconvNet: Deep Sequencer Convolutional Network for Hyperspectral Image Classification.
- Author
-
Li, Bing, Wang, Qi-Wen, Liang, Jia-Hong, Zhu, En-Ze, and Zhou, Rong-Qian
- Subjects
DEEP learning ,CONVOLUTIONAL neural networks ,COMPUTER vision - Abstract
The application of the Transformer in computer vision has had the most significant influence of all deep learning developments over the past five years. In addition to the exceptional performance of convolutional neural networks (CNNs) in hyperspectral image (HSI) classification, the Transformer has begun to be applied to HSI classification; for the time being, however, it has not produced satisfactory results there. Recently, in the field of image classification, the creators of Sequencer proposed a structure that substitutes a BiLSTM2D layer for the Transformer self-attention layer and achieves satisfactory results. This paper therefore proposes a network called SquconvNet, which combines a CNN with the Sequencer block to improve hyperspectral classification. We conducted rigorous HSI classification experiments on three relevant baseline datasets to evaluate the performance of the proposed method. The experimental results show that the proposed method has clear advantages in terms of classification accuracy and stability. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
7. Objective Evaluation of Fabric Flatness Grade Based on Convolutional Neural Network.
- Author
-
Zhan, Zhu, Zhang, Wenjun, Chen, Xia, and Wang, Jun
- Abstract
As an important indicator of the appearance and intrinsic quality of textiles, fabric flatness directly affects the aesthetics and performance of textiles. In this paper, an objective evaluation system for fabric flatness based on a 3D scanner and a convolutional neural network (CNN) is constructed using the height data of the AATCC flatness templates. The 3D scanner collects the height-value data of each sample. The effects of different sub-sample cutting sizes, cutting offsets, and network model depths on the objective evaluation coincidence rate across multiple flatness levels were studied. The experimental results show that the coincidence rate of the system reaches 98.9% when the collected sample data are cut into sub-samples of 20 pixels × 20 pixels with a 12-pixel cutting offset and an 11-layer network model is selected. Finally, this scheme is used to evaluate the flatness of four real fabrics with different colors and textures. All of the samples achieve a high coincidence rate, which further verifies the adaptability and stability of the objective evaluation system constructed in this paper. [ABSTRACT FROM AUTHOR]
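The sub-sample cutting described above (20 pixel × 20 pixel windows with a 12-pixel offset) amounts to a sliding-window crop of the height map. A sketch under assumed inputs: the 68 × 68 height map below is hypothetical, not the AATCC template data.

```python
import numpy as np

def cut_subsamples(height_map, size=20, offset=12):
    """Slide a size x size window over the map with the given offset (stride)."""
    h, w = height_map.shape
    patches = []
    for y in range(0, h - size + 1, offset):
        for x in range(0, w - size + 1, offset):
            patches.append(height_map[y:y + size, x:x + size])
    return np.stack(patches)

height_map = np.zeros((68, 68))        # hypothetical 3D-scanner height data
patches = cut_subsamples(height_map)   # 5 x 5 window positions
```

Each patch would then be fed to the CNN, and the per-patch predictions aggregated into a flatness grade for the whole sample.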
- Published
- 2021
- Full Text
- View/download PDF
8. CNN-ViT Supported Weakly-Supervised Video Segment Level Anomaly Detection.
- Author
-
Sharif, Md. Haidar, Jiao, Lei, and Omlin, Christian W.
- Subjects
ANOMALY detection (Computer security) ,DEEP learning ,COMPUTER vision ,CONVOLUTIONAL neural networks ,COMPUTER engineering ,TRANSFORMER models - Abstract
Video anomaly event detection (VAED) is one of the key technologies in computer vision for smart surveillance systems. With the advent of deep learning, contemporary advances in VAED have achieved substantial success. Recently, weakly supervised VAED (WVAED) has become a popular technical route of research. WVAED methods do not depend on a supplementary self-supervised substitute task, yet they can assess anomaly scores directly. However, the performance of WVAED methods depends on pretrained feature extractors. In this paper, we first take advantage of pretrained CNN (e.g., C3D and I3D) and ViT (e.g., CLIP) feature extractors to extract discriminative representations. We then consider long-range and short-range temporal dependencies and surface video snippets of interest by leveraging our proposed temporal self-attention network (TSAN). We design a multiple instance learning (MIL)-based generalized architecture named CNN-ViT-TSAN, using CNN- and/or ViT-extracted features together with the TSAN, to specify a series of models for the WVAED problem. Experimental results on publicly available, popular crowd datasets demonstrate the effectiveness of our CNN-ViT-TSAN. [ABSTRACT FROM AUTHOR]
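The MIL-style scoring common to weakly supervised setups like this one can be sketched as aggregating the highest per-snippet anomaly scores in a video. The scores and the choice of `k` below are illustrative assumptions, not the paper's trained model.

```python
import numpy as np

def video_anomaly_score(snippet_scores, k=3):
    """MIL-style aggregation: average the k highest snippet scores.

    Under weak (video-level) labels, a video is treated as a bag of
    snippets, and the most anomalous snippets decide the bag's score.
    """
    top_k = np.sort(np.asarray(snippet_scores))[-k:]
    return float(top_k.mean())

# hypothetical per-snippet scores from a feature extractor + scorer
scores = [0.1, 0.05, 0.9, 0.2, 0.8, 0.7]
score = video_anomaly_score(scores)   # mean of 0.7, 0.8, 0.9
```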
- Published
- 2023
- Full Text
- View/download PDF
9. Application of Convolutional Neural Networks for Dentistry Occlusion Classification
- Author
-
Juneja, Mamta, Saini, Sumindar Kaur, Kaur, Harleen, and Jindal, Prashant
- Published
- 2024
- Full Text
- View/download PDF
10. Enhanced Deep Learning for Detecting Suspicious Fall Event in Video Data.
- Author
-
Agrawal, Madhuri and Agrawal, Shikha
- Subjects
DEEP learning ,CONVOLUTIONAL neural networks ,LONG-term memory ,FEATURE extraction ,VIDEO monitors ,PATIENT safety ,ONLINE monitoring systems - Abstract
Suspicious fall events are particularly significant hazards to the safety of patients and the elderly, and their detection has recently become an active research topic in real-time monitoring. This paper aims to detect suspicious fall events during video monitoring of multiple people against different moving backgrounds in an indoor environment. It proposes a deep learning method based on Long Short-Term Memory (LSTM), introducing a visual attention-guided mechanism along with a bi-directional LSTM model. This method contributes essential information on the temporal and spatial locations of suspicious fall events by learning the video frames in both forward and backward directions. The effective "You Only Look Once V4" (YOLO V4) real-time people detector finds the people in the videos, followed by a tracking module that recovers their trajectories. Convolutional Neural Network (CNN) features are extracted for each person tracked through bounding boxes. Subsequently, a visual attention-guided bi-directional LSTM model is proposed for the final suspicious fall event detection. The proposed method is demonstrated on two different datasets to illustrate its efficiency. Compared with other state-of-the-art methods, it achieves 96.9% accuracy with good performance and robustness, making it suitable for monitoring and detecting suspicious fall events. [ABSTRACT FROM AUTHOR]
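Tracking people through bounding boxes, as described above, typically relies on intersection-over-union (IoU) matching between detections in consecutive frames. This is a generic sketch of IoU with invented boxes, not the authors' tracker.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# two 10x10 boxes overlapping in a 5x5 region
overlap = iou((0, 0, 10, 10), (5, 5, 15, 15))   # 25 / 175
```

A tracker would link a detection to the existing trajectory with the highest IoU above some threshold, then hand the per-person crops to the CNN feature extractor.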
- Published
- 2023
- Full Text
- View/download PDF
11. ssFPN: Scale Sequence (S 2) Feature-Based Feature Pyramid Network for Object Detection.
- Author
-
Park, Hye-Jin, Kang, Ji-Woo, and Kim, Byung-Gyu
- Subjects
OBJECT recognition (Computer vision) ,CONVOLUTIONAL neural networks ,PYRAMIDS ,COMPUTER vision - Abstract
Object detection is a fundamental task in computer vision. Over the past several years, convolutional neural network (CNN)-based object detection models have significantly improved detection accuracy in terms of average precision (AP). Furthermore, feature pyramid networks (FPNs) are essential modules that let object detection models handle various object scales. However, the AP for small objects is lower than the AP for medium and large objects: small objects are difficult to recognize because they carry insufficient information, and that information is lost in deeper CNN layers. This paper proposes a new FPN model named ssFPN (scale-sequence (S²) feature-based feature pyramid network) to detect multi-scale objects, especially small ones. Motivated by scale-space theory, the FPN is regarded as a scale space, and a scale-sequence (S²) feature is extracted by three-dimensional convolution along the level axis of the FPN to strengthen the information on small objects. The feature is essentially scale-invariant and is built on a high-resolution pyramid feature map for small objects. Additionally, the designed S² feature can be added to most FPN-based object detection models. We also designed a feature-level super-resolution approach to show the efficiency of the S² feature, verifying that it improves classification accuracy for low-resolution images when training a feature-level super-resolution model. To demonstrate the effect of the S² feature, experiments with it built into both one-stage and two-stage object detection models were conducted on the MS COCO dataset.
For the two-stage object detection models Faster R-CNN and Mask R-CNN with the S² feature, AP improvements of up to 1.6% and 1.4%, respectively, were achieved, and the AP_S of each model improved by 1.2% and 1.1%, respectively. The one-stage object detection models of the YOLO series also improved: for YOLOv4-P5, YOLOv4-P6, YOLOR-P6, YOLOR-W6, and YOLOR-D6 with the S² feature, AP improvements of 0.9%, 0.5%, 0.5%, 0.1%, and 0.1% were observed, with AP_S gains of 1.1%, 1.1%, 0.9%, 0.4%, and 0.1%, respectively. Experiments using the feature-level super-resolution approach with the proposed S² feature were conducted on the CIFAR-100 dataset. By training the feature-level super-resolution model, we verified that ResNet-101 with the S² feature trained on low-resolution images achieved a 55.2% classification accuracy, 1.6% higher than ResNet-101 trained on high-resolution images. [ABSTRACT FROM AUTHOR]
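The core idea of treating FPN levels as a "scale sequence" can be sketched by upsampling every pyramid level to the finest resolution and stacking them along a new level axis; the 3D convolution is stood in for here by a simple mean across levels, and all shapes and values are illustrative assumptions rather than the ssFPN design.

```python
import numpy as np

def scale_sequence(pyramid):
    """Upsample each pyramid level to the finest resolution (nearest
    neighbour) and stack along a new 'level' axis."""
    target = pyramid[0].shape
    resized = []
    for fmap in pyramid:
        fy = target[0] // fmap.shape[0]
        fx = target[1] // fmap.shape[1]
        resized.append(np.repeat(np.repeat(fmap, fy, axis=0), fx, axis=1))
    return np.stack(resized)           # shape: (levels, H, W)

# hypothetical single-channel pyramid: 8x8, 4x4, 2x2 feature maps
pyramid = [np.ones((8, 8)), np.ones((4, 4)) * 2, np.ones((2, 2)) * 3]
stack = scale_sequence(pyramid)
fused = stack.mean(axis=0)             # stand-in for the 3D convolution
```

In ssFPN proper, a learned 3D convolution along the level axis (rather than a mean) would produce the S² feature that is added back to the high-resolution map.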
- Published
- 2023
- Full Text
- View/download PDF
12. Application of Artificial Intelligence in Claim Management and Fire Surveying in the context of Bangladesh.
- Author
-
Nahar, Nujhat, Naheen, Intisar Tahmid, and Hasan, Sayed Jobaer
- Subjects
FIRE management ,ARTIFICIAL intelligence ,NATURAL language processing ,COMPUTER vision ,CONVOLUTIONAL neural networks - Abstract
With the advancement of technology, the insurance business is becoming more and more automated, and in Bangladesh the insurance industry is embracing technology as well. Although the concept of artificial intelligence (AI) is still quite new to the insurance business, it has made massive strides in other financial sectors, and AI fields such as Natural Language Processing (NLP) and Computer Vision (CV) are already used in various financial organizations. Claim management and surveying are two of the most important parts of the insurance system. In this paper, we explore the possibility of using AI in these two fields of insurance in the context of Bangladesh. After introducing the relevant terminology, we discuss the steps by which AI can be applied to claim management and fire surveying, describe the network architecture of the AI models, and finally report our progress so far in implementing AI in the Bangladesh insurance business. [ABSTRACT FROM AUTHOR]
- Published
- 2020
13. Diagnosis of tomato pests and diseases based on lightweight CNN model.
- Author
-
Sun, Li, Liang, Kaibo, Wang, Yuzhi, Zeng, Wang, Niu, Xinyue, and Jin, Longhao
- Subjects
- *
CONVOLUTIONAL neural networks , *COMPUTER vision , *CROP yields - Abstract
Tomato crop yield can be negatively impacted by various diseases and pests. The application of computer vision to diagnose tomato diseases and pests presents two significant challenges. Firstly, the collected dataset of tomato pests and diseases is often imbalanced, leading to poor model performance. Secondly, mainstream models struggle to balance the relationship between training efficiency and accuracy. In this paper, we propose a novel method for diagnosing tomato pests and diseases using a lightweight network. Specifically, we develop Squeeze and SE Net (SSNet), a new convolutional neural network (CNN) based on SqueezeNet and SENet. We evaluate the performance of SSNet through comparative experiments for diagnosing tomato pests and diseases. Additionally, we examine the impact of dataset balance, data volume, and hyperparameters on model performance. Our results demonstrate that SSNet achieves model accuracies of 98.80% and 98.39% for tomato pests and diseases, respectively, with only 0.398 M parameters. This approach provides a promising and lightweight alternative to existing models. [ABSTRACT FROM AUTHOR]
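SSNet builds on SENet, whose core Squeeze-and-Excitation block can be sketched in NumPy: global-average-pool each channel, pass through a small two-layer bottleneck, and rescale the channels. The dimensions and random weights below are illustrative assumptions, not the trained SSNet.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def se_block(feature_map, w1, w2):
    """Squeeze-and-Excitation: squeeze via global average pooling, excite
    via a two-layer bottleneck, then rescale each channel."""
    squeezed = feature_map.mean(axis=(1, 2))               # (C,)
    excited = sigmoid(np.maximum(squeezed @ w1, 0) @ w2)   # (C,), in (0, 1)
    return feature_map * excited[:, None, None]

C, H, W, R = 4, 6, 6, 2            # channels, spatial dims, reduction ratio
rng = np.random.default_rng(0)
fmap = rng.normal(size=(C, H, W))
w1 = rng.normal(size=(C, C // R))
w2 = rng.normal(size=(C // R, C))
out = se_block(fmap, w1, w2)
```

Because the excitation weights lie in (0, 1), the block can only attenuate channels, letting the network emphasize the most informative ones cheaply, which is what keeps parameter counts as low as the 0.398 M reported above.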
- Published
- 2024
- Full Text
- View/download PDF
14. Towards Home-Based Diabetic Foot Ulcer Monitoring: A Systematic Review.
- Author
-
Kairys, Arturas, Pauliukiene, Renata, Raudonis, Vidas, and Ceponis, Jonas
- Subjects
DIABETIC foot ,FOOT ,COMPUTER vision ,DIABETES complications ,CONVOLUTIONAL neural networks ,FEATURE extraction ,ARTIFICIAL intelligence - Abstract
It is considered that 1 in 10 adults worldwide have diabetes. Diabetic foot ulcers are among the most common complications of diabetes, and they are associated with a high risk of lower-limb amputation and, as a result, reduced life expectancy. Timely detection and periodic ulcer monitoring can considerably decrease amputation rates. Recent research has demonstrated that computer vision can be used to identify foot ulcers and perform non-contact telemetry using ulcer and tissue area segmentation. However, the applications are limited to controlled lighting conditions, and expert knowledge is required for dataset annotation. This paper reviews the latest publications on the use of artificial intelligence for ulcer area detection and segmentation. The PRISMA methodology was used to search for and select articles, and the selected articles were reviewed to collect quantitative and qualitative data. Qualitative data were used to describe the methodologies used in individual studies, while quantitative data were used for generalization in terms of dataset preparation and feature extraction. Publicly available datasets were accounted for, and methods for preprocessing, augmentation, and feature extraction were evaluated. It was concluded that public datasets can be combined to form a bigger, more diverse dataset, and that the prospects of wider image preprocessing and the adoption of augmentation require further research. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
15. Neural Architecture Search Survey: A Computer Vision Perspective.
- Author
-
Kang, Jeon-Seong, Kang, JinKyu, Kim, Jung-Jun, Jeon, Kwang-Woo, Chung, Hyun-Joon, and Park, Byung-Hoon
- Subjects
COMPUTER vision ,NETWORK-attached storage ,DATABASE searching ,TEXT recognition ,INTERNET surveys ,ELECTRONIC information resource searching - Abstract
In recent years, deep learning (DL) has been widely studied across the globe, especially with respect to training methods and network structures, and has proven highly effective in a wide range of tasks and applications, including image, speech, and text recognition. One important aspect of this advancement is the ongoing effort to design and upgrade neural architectures, which requires the combined knowledge and know-how of experts from each relevant discipline and a series of trial-and-error steps. In this light, automated neural architecture search (NAS) methods are increasingly at the center of attention. This paper summarizes the basic concepts of NAS while providing an overview of recent studies on its applications. It is worth noting that most previous survey studies on NAS have focused on perspectives of hardware or search strategies; to the best knowledge of the present authors, this study is the first to look at NAS from a computer vision perspective. In the present study, computer vision areas were categorized by task, and recent trends found in each study on NAS were analyzed in detail. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
16. Enhanced Attention-Based Encoder-Decoder Framework for Text Recognition.
- Author
-
Prabu, S. and Abraham Sundar, K. Joseph
- Subjects
TEXT recognition ,CONVOLUTIONAL neural networks ,COMPUTER vision - Abstract
Recognizing irregular text in natural images is a challenging task in computer vision, and existing approaches still face difficulties with irregular text because of its diverse shapes. In this paper, we propose a simple yet powerful irregular text recognition framework based on an encoder-decoder architecture, divided into four main modules. First, in the image transformation module, a Thin Plate Spline (TPS) transformation warps the irregular text image into a readable text image. Second, we propose a novel Spatial Attention Module (SAM) that compels the model to concentrate on text regions and obtain enriched feature maps. Third, a deep bi-directional long short-term memory (Bi-LSTM) network turns the visual feature map produced by a Convolutional Neural Network (CNN) into a contextual feature map. Finally, we propose a Dual Step Attention Mechanism (DSAM) integrated with a Connectionist Temporal Classification (CTC)-Attention decoder that re-weights visual features and focuses on intra-sequence relationships to generate a more accurate character sequence. The effectiveness of the framework is verified through extensive experiments on benchmark datasets such as SVT, ICDAR, CUTE80, and IIIT5k, with performance analyzed using the accuracy metric. We demonstrate that the proposed method outperforms existing approaches on both regular and irregular text. Additionally, the robustness of the approach is evaluated on grocery datasets containing complex text images, such as GroZi-120, Web-Market, SKU-110K, and the Freiburg Groceries dataset, where the framework again produces superior performance. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
17. A deep learning-based car accident detection approach in video-based traffic surveillance
- Author
-
Wu, Xinyu and Li, Tingting
- Published
- 2024
- Full Text
- View/download PDF
18. Classification of Tea Leaves Based on Fluorescence Imaging and Convolutional Neural Networks.
- Author
-
Wei, Kaihua, Chen, Bojian, Li, Zejian, Chen, Dongmei, Liu, Guangyu, Lin, Hongze, and Zhang, Baihua
- Subjects
CONVOLUTIONAL neural networks ,FLUORESCENCE ,LEAF anatomy ,COMPUTER vision ,FLUORESCENT lamps - Abstract
The development of smartphones and computer vision techniques provides customers with a convenient way to identify tea species and qualities. However, a prediction model may not behave robustly under changing illumination conditions. Fluorescence imaging can induce fluorescence signals from typical components and thus may improve prediction accuracy. In this paper, a tea classification method based on fluorescence imaging and convolutional neural networks (CNNs) is proposed. Ultraviolet (UV) LEDs with a central wavelength of 370 nm were used to induce the fluorescence of tea samples so that fluorescence images could be captured. Five kinds of tea were included and pre-processed, and two CNN-based classification models, VGG16 and ResNet-34, were used for model training. Images captured under a conventional fluorescent lamp were also tested for comparison. The results show that the accuracy of the classification model based on fluorescence images is better than that of models based on white-light illumination images, and the VGG16 model performed better than ResNet-34 in our case. The classification accuracy on fluorescence images reached 97.5%, which shows that the LED-induced fluorescence imaging technique is promising for everyday use. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
19. DMA-Net: DeepLab With Multi-Scale Attention for Pavement Crack Segmentation.
- Author
-
Sun, Xinzi, Xie, Yuanchang, Jiang, Liming, Cao, Yu, and Liu, Benyuan
- Abstract
Cracks are important indicators of pavement structural and operational conditions. Early pavement crack detection and treatments can help extend pavement service life, reduce fuel consumption, and improve safety and ride quality. Pavement distress surveys have traditionally been performed manually by visually inspecting the roads, which is labor-intensive and time-consuming. Therefore, computer-vision-based automated crack detection has great practical significance in pavement maintenance and traffic safety. Traditional image processing techniques are sensitive to noise in images and are thus likely to miss detecting some cracks due to the crack texture variety, complex lighting conditions, and various similar but irrelevant objects on the road. This paper adopts and enhances DeepLabv3+, a popular deep learning framework for semantic image segmentation, for road pavement crack detection. We propose a multi-scale attention module in the decoder of DeepLabv3+ to generate an attention mask and dynamically assign weights between high-level and low-level feature maps. Compared with fixed weights across different features, the dynamic weights strategy can assign more reasonable weights to different feature maps. Ablation experiments show that the attention mask can effectively help the model better combine multi-scale features and generate more accurate pavement crack segmentation results. The proposed method achieves state-of-the-art results on three benchmarks, including Crack500, DeepCrack, and FMA (Fitchburg Municipal Airport) datasets. We further test it on pavement crack images captured by smartphones, and the results show that it provides a viable approach to road pavement crack segmentation in practice with excellent performance. [ABSTRACT FROM AUTHOR]
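The dynamic-weighting idea behind the multi-scale attention module described above can be sketched as a per-pixel sigmoid mask that blends high-level and low-level feature maps. This is an illustrative simplification with invented inputs, not the DMA-Net module itself.

```python
import numpy as np

def attention_fuse(high, low, mask_logits):
    """Blend two feature maps with a learned per-pixel attention mask.

    mask -> 1 favors the high-level map; mask -> 0 favors the low-level map.
    """
    mask = 1.0 / (1.0 + np.exp(-mask_logits))   # sigmoid: weights in (0, 1)
    return mask * high + (1.0 - mask) * low

high = np.full((4, 4), 10.0)   # hypothetical high-level (semantic) features
low = np.zeros((4, 4))         # hypothetical low-level (edge/detail) features
fused = attention_fuse(high, low, np.zeros((4, 4)))   # mask = 0.5 everywhere
```

In DMA-Net the mask logits are predicted by the network from the features themselves, which is what makes the weighting dynamic rather than fixed.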
- Published
- 2022
- Full Text
- View/download PDF
20. Cofopose: Conditional 2D Pose Estimation with Transformers.
- Author
-
Aidoo, Evans, Wang, Xun, Liu, Zhenguang, Tenagyei, Edwin Kwadwo, Owusu-Agyemang, Kwabena, Kodjiku, Seth Larweh, Ejianya, Victor Nonso, and Aggrey, Esther Stacy E. B.
- Subjects
ARTIFICIAL vision ,COMPUTER vision ,ARTIFICIAL intelligence - Abstract
Human pose estimation has long been a fundamental problem in computer vision and artificial intelligence. Prominent among the 2D human pose estimation (HPE) methods are the regression-based approaches, which have been proven to achieve excellent results. However, the ground-truth labels are usually inherently ambiguous in challenging cases such as motion blur, occlusions, and truncation, leading to poor performance measurement and lower levels of accuracy. In this paper, we propose Cofopose, which is a two-stage approach consisting of a person and keypoint detection transformers for 2D human pose estimation. Cofopose is composed of conditional cross-attention, a conditional DEtection TRansformer (conditional DETR), and an encoder-decoder in the transformer framework; this allows it to achieve person and keypoint detection. In a significant departure from other approaches, we use conditional cross-attention and fine-tune conditional DETR for our person detection, and encoder-decoders in the transformers for our keypoint detection. Cofopose was extensively evaluated using two benchmark datasets, MS COCO and MPII, achieving an improved performance with significant margins over the existing state-of-the-art frameworks. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
21. SARDO: An Automated Search-and-Rescue Drone-Based Solution for Victims Localization.
- Author
-
Albanese, Antonio, Sciancalepore, Vincenzo, and Costa-Perez, Xavier
- Subjects
COMMUNICATION infrastructure ,CELL phones ,GLOBAL Positioning System ,COMPUTER vision - Abstract
Natural disasters affect millions of people every year. Finding missing persons in the shortest possible time is of crucial importance to reduce the death toll. This task is especially challenging when victims are sparsely distributed in large and/or difficult-to-reach areas and cellular networks are down. In this paper we present SARDO, a drone-based search-and-rescue solution that leverages the high penetration rate of mobile phones in society to localize missing people. SARDO is an autonomous, all-in-one drone-based mobile network solution that requires neither infrastructure support nor mobile phone modifications. It builds on novel concepts such as pseudo-trilateration combined with machine-learning techniques to efficiently locate mobile phones in a given area. Our results, with a prototype implementation in a field trial, show that SARDO rapidly determines the location of mobile phones (~3 min/UE) in a given area with an accuracy of a few tens of meters and at a low battery consumption cost (~5%). State-of-the-art localization solutions for disaster scenarios rely either on mobile infrastructure support or exploit onboard cameras for human/computer vision, IR, or thermal-based localization. To the best of our knowledge, SARDO is the first drone-based cellular search-and-rescue solution able to accurately localize missing victims through their mobile phones. [ABSTRACT FROM AUTHOR]
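The pseudo-trilateration concept (fixing a phone's position from range estimates collected by the drone at several known waypoints) can be illustrated with a textbook least-squares multilateration sketch. This is an assumed formulation for illustration only; SARDO's actual algorithm additionally combines such estimates with machine-learning techniques.

```python
import numpy as np

def trilaterate(anchors, dists):
    """Least-squares position estimate from ranges to known anchor points.

    Linearizes ||x - a_i||^2 = d_i^2 by subtracting the last anchor's
    equation, giving a linear system A x = b solved by least squares.
    anchors: (N, D) waypoint coordinates; dists: (N,) range estimates.
    """
    anchors = np.asarray(anchors, dtype=float)
    dists = np.asarray(dists, dtype=float)
    a_ref, d_ref = anchors[-1], dists[-1]
    A = 2.0 * (anchors[:-1] - a_ref)
    b = (d_ref ** 2 - dists[:-1] ** 2
         + np.sum(anchors[:-1] ** 2, axis=1) - np.sum(a_ref ** 2))
    return np.linalg.lstsq(A, b, rcond=None)[0]
```

With noisy ranges the least-squares solve averages out the errors, which is why taking measurements from more than the minimum number of waypoints improves accuracy.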
- Published
- 2022
- Full Text
- View/download PDF
22. Underwater Image Detection for Cleaning Purposes; Techniques Used for Detection Based on Machine Learning.
- Author
-
Goxhaj, Ornelta, Yilmaz, Nilay Gul, Kouhalvandi, Lida, Shayea, Ibraheem, and Azizan, Azızul
- Subjects
MACHINE learning ,CONVOLUTIONAL neural networks ,WATER pollution ,IMAGE processing ,COMPUTER vision - Abstract
Serious problems are on the rise, especially in current times, and the world is facing many environmental threats. Water pollution is one of the main issues threatening the future. In some parts of the world, the water's surface is covered by mucilage, which is dangerous for both aquatic animals and humans. This article first defines mucilage and highlights the reasons for its production. Afterwards, to tackle water pollution, cleaning systems using image detection with the help of supervised machine learning classification algorithms are highlighted. The paper showcases the machine learning and classification techniques used, as well as the best-performing solutions among convolutional neural network and region-based convolutional neural network methods. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
23. Hardware Efficient Modified CNN Architecture for Traffic Sign Detection and Recognition.
- Author
-
Vaidya, Bhaumik and Paunwala, Chirag
- Subjects
TRAFFIC monitoring ,DEEP learning ,TRAFFIC signs & signals ,DRIVER assistance systems ,CONVOLUTIONAL neural networks ,COMPUTER vision - Abstract
Traffic sign recognition is a vital part of any driver assistance system, helping it make complex driving decisions based on the detected traffic signs. Traffic sign detection (TSD) is essential in adverse weather conditions or when the vehicle is driven on hilly roads. Traffic sign recognition is a complex computer vision problem, as the signs generally occupy only a small portion of the entire image. Much research is being conducted on this issue, but performance is not yet satisfactory. The goal of this paper is to propose a deep learning architecture which can be deployed on embedded platforms for driver assistance systems with limited memory and computing resources, without sacrificing detection accuracy. The architecture applies various architectural modifications to a well-known Convolutional Neural Network (CNN) architecture for object detection. It uses a trainable Color Transformer Network (CTN) with the existing CNN architecture to make the system invariant to illumination and lighting changes. The architecture uses a feature fusion module for detecting small traffic signs accurately. In the proposed work, receptive field calculation is used for choosing the number of convolutional layers for prediction and the right scales for default bounding boxes. The architecture is deployed on a Jetson Nano GPU embedded development board for performance evaluation at the edge, and it has been tested on the well-known German Traffic Sign Detection Benchmark (GTSDB) and the Tsinghua-Tencent 100k dataset. The architecture requires only 11 MB for storage, almost ten times less than previous architectures. It has one sixth the parameters of the best-performing architecture and 50 times fewer floating point operations per second (FLOPs). The architecture achieves a running time of 220 ms on a desktop GPU and 578 ms on the Jetson Nano, which is also better than other similar implementations. It also achieves comparable accuracy in terms of mean average precision (mAP) for both datasets. [ABSTRACT FROM AUTHOR]
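The receptive field calculation that guides the choice of prediction layers and default-box scales follows a standard recurrence over kernel sizes and strides; a minimal generic sketch (not the paper's code):

```python
def receptive_field(layers):
    """Receptive field of a stack of conv/pool layers.

    layers: sequence of (kernel_size, stride) pairs, input to output.
    Recurrence: rf += (k - 1) * jump, then jump *= stride, where jump is
    the distance between adjacent output positions in input pixels.
    """
    rf, jump = 1, 1
    for k, s in layers:
        rf += (k - 1) * jump
        jump *= s
    return rf
```

For example, two stacked 3×3 stride-1 convolutions give a 5×5 receptive field; a prediction layer is a good match for default boxes whose scale is close to (but not larger than) its receptive field.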
- Published
- 2022
- Full Text
- View/download PDF
24. VPRS-Based Regional Decision Fusion of CNN and MRF Classifications for Very Fine Resolution Remotely Sensed Images.
- Author
-
Zhang, Ce, Sargent, Isabel, Pan, Xin, Gardiner, Andy, Hare, Jonathon, and Atkinson, Peter M.
- Subjects
COMPUTER vision ,PATTERN perception ,NEURAL circuitry ,DATA ,IMAGE - Abstract
Recent advances in computer vision and pattern recognition have demonstrated the superiority of deep neural networks using spatial feature representation, such as convolutional neural networks (CNNs), for image classification. However, any classifier, regardless of its model structure (deep or shallow), involves prediction uncertainty when classifying spatially and spectrally complicated very fine spatial resolution (VFSR) imagery. We propose here to characterize the uncertainty distribution of CNN classification and integrate it into a regional decision fusion to increase classification accuracy. Specifically, a variable precision rough set (VPRS) model is proposed to quantify the uncertainty within CNN classifications of VFSR imagery and partition this uncertainty into positive regions (correct classifications) and nonpositive regions (uncertain or incorrect classifications). Those “more correct” areas were trusted by the CNN, whereas the uncertain areas were rectified by a multilayer perceptron (MLP)-based Markov random field (MLP-MRF) classifier to provide crisp and accurate boundary delineation. The proposed MRF-CNN fusion decision strategy exploited the complementary characteristics of the two classifiers based on VPRS uncertainty description and classification integration. The effectiveness of the MRF-CNN method was tested in both urban and rural areas of southern England as well as semantic labeling data sets. The MRF-CNN consistently outperformed the benchmark MLP, support vector machine, MLP-MRF, CNN, and the baseline methods. This paper provides a regional decision fusion framework within which to gain the advantages of model-based CNN, while overcoming the problem of losing effective resolution and uncertain prediction at object boundaries, which is especially pertinent for complex VFSR image classification. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
25. Wire Defect Recognition of Spring-Wire Socket Using Multitask Convolutional Neural Networks.
- Author
-
Tao, Xian, Wang, Zihao, Zhang, Zhengtao, Zhang, Dapeng, Xu, De, Gong, Xinyi, and Zhang, Lei
- Subjects
ARTIFICIAL neural networks ,ELECTRIC connectors ,COMPUTER vision ,SIGNAL convolution ,IMAGE processing - Abstract
As a critical electrical connector component in the modern industrial environment, the spring-wire socket and its manufacturing quality are closely related to equipment safety. Defects in these components are difficult to distinguish properly due to their similarity and diversity; in such cases, defect types can only be determined through cumbersome human visual inspection. To satisfy the requirements of quality control, a machine vision apparatus for component inspection is presented in this paper. After a brief description of the apparatus system design, our emphasis is put on the defect recognition algorithm. A multitask convolutional neural network (CNN) is proposed for detecting these ambiguous defects. In contrast to classical image processing methods in machine vision, the defect inspection problem is converted into object detection and classification problems. Instead of breaking it down into two separate tasks, we jointly handle both aspects in a single CNN. In addition, data augmentation methods are discussed to analyze their effects on defect recognition. Successful inspection results using the presented model are obtained on challenging real-world defect image data gathered from a spring-wire socket module inspection line in an industrial plant. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
26. Angel-Eye: A Complete Design Flow for Mapping CNN Onto Embedded FPGA.
- Author
-
Guo, Kaiyuan, Sui, Lingzhi, Qiu, Jiantao, Yu, Jincheng, Wang, Junbin, Yao, Song, Han, Song, Wang, Yu, and Yang, Huazhong
- Subjects
FIELD programmable gate arrays ,SIGNAL convolution ,ARTIFICIAL neural networks ,KERNEL (Mathematics) ,COMPUTER vision - Abstract
Convolutional neural networks (CNNs) have become a successful algorithm in the field of artificial intelligence and a strong candidate for many computer vision tasks. But the computational complexity of CNNs is much higher than that of traditional algorithms. With the help of GPU acceleration, CNN-based applications are widely deployed on servers. However, for embedded platforms, CNN-based solutions are still too complex to apply. Various dedicated hardware designs on field-programmable gate arrays (FPGAs) have been carried out to accelerate CNNs, but few of them explore the whole design flow for both fast deployment and high power efficiency. In this paper, we investigate state-of-the-art CNN models and CNN-based applications. Requirements on memory, computation, and the flexibility of the system are summarized for mapping CNNs onto embedded FPGAs. Based on these requirements, we propose Angel-Eye, a programmable and flexible CNN accelerator architecture, together with a data quantization strategy and compilation tool. The data quantization strategy helps reduce the bit-width down to 8 bits with negligible accuracy loss. The compilation tool maps a given CNN model efficiently onto hardware. Evaluated on the Zynq XC7Z045 platform, Angel-Eye is 6× faster and 5× better in power efficiency than a peer FPGA implementation on the same platform. Applications of the VGG network, pedestrian detection, and face alignment are used to evaluate our design on the Zynq XC7Z020; NVIDIA TK1 and TX1 platforms are used for comparison. Angel-Eye achieves similar performance and delivers up to 16× better energy efficiency. [ABSTRACT FROM PUBLISHER]
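The core of 8-bit data quantization can be pictured with a plain symmetric linear quantizer; this is a generic sketch, not Angel-Eye's actual per-layer fixed-point scheme, and the function names are assumptions.

```python
import numpy as np

def quantize_int8(x):
    """Symmetric linear quantization of a float tensor to signed 8-bit.

    The scale maps the largest magnitude in x to 127, so dequantization
    recovers each value up to half a quantization step.
    Assumes x contains at least one nonzero element.
    """
    scale = np.max(np.abs(x)) / 127.0
    q = np.clip(np.round(x / scale), -128, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Map int8 codes back to approximate float values."""
    return q.astype(np.float32) * scale
```

Since weights and activations are well concentrated in trained CNNs, the rounding error introduced here is what makes the "negligible accuracy loss" result plausible.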
- Published
- 2018
- Full Text
- View/download PDF
27. Image super-resolution: A comprehensive review, recent trends, challenges and applications.
- Author
-
Lepcha, Dawa Chyophel, Goyal, Bhawna, Dogra, Ayush, and Goyal, Vishal
- Subjects
- *
HIGH resolution imaging , *DEEP learning , *COMPUTER vision , *IMAGING systems , *IMAGE processing , *VISUAL perception - Abstract
• Detailed survey on recent advancements in image super resolution. • Broad classification of methods into three categories. • Benchmark algorithms are discussed with experimental analysis. Super resolution (SR) is an eminent technique in the field of computer vision and image processing to improve the visual perception of poor-quality images. The key objective of image super resolution is to address the limitations of imaging systems, mainly due to hardware problems and the requirements of clinical processing of medical imaging using post-processing operations. Numerous super resolution strategies have been put forward in the computer vision community over the years to achieve high-resolution images. In the past few years, there has been significant advancement in image super-resolution algorithms. This paper aims to provide a detailed survey of recent advancements in image super-resolution in terms of traditional, deep learning, and the latest transformer-based algorithms. An in-depth taxonomy of the broadly classified super-resolution techniques within these categories is discussed. An extensive survey has been carried out on deep learning techniques in terms of parameters, architecture, network complexity, depth, learning rate, framework, optimization, and loss function. Furthermore, we also address significant aspects such as problem definition, evaluation metrics, public benchmark datasets, loss functions, and applications. In addition, we have performed an experimental analysis and comparison of various benchmark algorithms on publicly available datasets, both qualitatively and quantitatively. Lastly, we conclude our survey by emphasizing prospective future directions and open issues that the community needs to address. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
28. TomConv: An Improved CNN Model for Diagnosis of Diseases in Tomato Plant Leaves.
- Author
-
Baser, Preeti, Saini, Jatinderkumar R., and Kotecha, Ketan
- Subjects
PLANT diseases ,CONVOLUTIONAL neural networks ,FOLIAGE plants ,DIAGNOSIS ,COMPUTER vision - Abstract
Crop disease is a significant issue in the agriculture sector, and it is currently very difficult to detect these illnesses in crop leaves. Agriculture is the foundation of the global economy, and India ranks second in the production of tomatoes worldwide. The tomato crop is affected by various diseases which lead to a reduction in product quality and quantity. Advances in computer vision and deep learning open the door to predicting diseases that appear in crops. The aim of this paper is classification among 10 different categories of tomato plant leaves using the proposed novel TomConv model, which deploys an improved Convolutional Neural Network (CNN). For this purpose, the publicly available PlantVillage dataset, comprising more than 16,000 images of tomato leaves, both diseased and healthy, was used for experimentation. The proposed model is the simplest among all the available state-of-the-art models. The tomato leaf images were preprocessed by resizing them to 150 × 150 pixels. The model comprises a four-layer CNN followed by a max pooling layer. It splits the corpus into training and validation datasets in an 80:20 ratio, is trained for 105 epochs on the tomato leaf images, and achieves an accuracy of 98.19%. The proposed model is compared with existing models on parameters such as number of classes, number of layers, and accuracy. The results are promising, as they outperform all the available state-of-the-art models. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
29. Human Detection Aided by Deeply Learned Semantic Masks.
- Author
-
Wang, Xinyu, Shen, Chunhua, Li, Hanxi, and Xu, Shugong
- Subjects
DEEP learning ,CONVOLUTIONAL neural networks ,COMPUTER vision ,VIDEO surveillance ,IMAGE converters ,MASKS - Abstract
Human detection is one of the long-standing computer vision tasks, and it has been a cornerstone for many real-world applications, such as photo album organization, video surveillance, and autonomous driving. Benefiting from deep learning technologies such as convolutional neural networks, modern object detectors have been achieving much improved accuracy in generic object detection tasks. In this paper, we aim to improve deep learning-based human detection. Our main idea is to exploit semantic context information for human detection by using deep-learnt semantic features provided by semantic segmentation masks. Segmentation masks act as an attention mechanism and force the detectors to focus on the image regions where potential object candidates are likely to appear. Meanwhile, the extra segmentation mask channel can also guide the convolutional kernels to automatically learn more discriminative features, making it easier to distinguish background from foreground. We implement our methods with two popular detection frameworks, i.e., Faster R-CNN and SSD, and experimentally analyze the effectiveness of the proposed methods. Evaluation results on the widely used MS-COCO dataset and the very recent CrowdHuman dataset are provided. Our proposed methods outperform the baseline detectors and achieve better performance on highly occluded human detection. [ABSTRACT FROM AUTHOR]
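The extra segmentation mask channel described above amounts to stacking a per-pixel person-probability map onto the RGB input before it enters the detector backbone; a minimal sketch, with the tensor layout and value ranges assumed:

```python
import numpy as np

def add_mask_channel(image, seg_mask):
    """Append a semantic segmentation mask as a fourth input channel.

    image: (H, W, 3) float array; seg_mask: (H, W) per-pixel person
    probabilities in [0, 1]. The returned (H, W, 4) array lets the mask
    act as an attention prior for the downstream detector.
    """
    assert image.shape[:2] == seg_mask.shape
    return np.concatenate([image, seg_mask[..., None]], axis=-1)
```

The only architectural change this requires in the backbone is widening the first convolution from 3 to 4 input channels.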
- Published
- 2020
- Full Text
- View/download PDF
30. Deep Learning-Based Technology in Responses to the Joint Call for Proposals on Video Compression With Capability Beyond HEVC.
- Author
-
Liu, Dong, Chen, Zhenzhong, Liu, Shan, and Wu, Feng
- Subjects
DEEP learning ,VIDEO coding ,VIDEO compression ,CONVOLUTIONAL neural networks ,COMPUTER vision ,VISUAL fields ,IMAGE processing - Abstract
Deep learning has achieved great success in the past decade, especially in the fields of computer vision and image processing. After witnessing such success, video coding experts are motivated to consider whether deep learning can also benefit video coding, and if so, they seek to discover why and how. Indeed, a number of research studies have been conducted to explore deep learning for image and video coding, which has been an active and fast-growing research area especially since the year 2015. These prior arts can be divided into two categories: new coding schemes that are built solely upon deep networks (deep schemes), and deep network-based coding tools that are embedded into traditional coding schemes (deep tools). Moreover, in the responses to the joint call for proposals on video compression with capability beyond High Efficiency Video Coding (HEVC), a number of deep tools have been proposed, and some of them are further studied for the upcoming Versatile Video Coding (VVC). In this paper, we summarize the ongoing efforts in the Joint Video Experts Team about the proposed deep tools, and we discuss several promising tools in much detail, including neural network-based intra prediction, convolutional neural network (CNN) based in-loop filtering, and CNN-based block-adaptive-resolution coding. A series of experimental results are provided to demonstrate the capability of these tools in achieving higher compression efficiency than the VVC or HEVC anchor. These results shed light on the promising direction of deep learning-based future video coding, towards which a lot of open problems call for further study. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
31. Investigations of Object Detection in Images/Videos Using Various Deep Learning Techniques and Embedded Platforms—A Comprehensive Review.
- Author
-
Murthy, Chinthakindi Balaram, Hashmi, Mohammad Farukh, Bokde, Neeraj Dhanraj, and Geem, Zong Woo
- Subjects
CONVOLUTIONAL neural networks ,DEEP learning ,COMPUTER vision ,FEATURE extraction ,APPLICATION software - Abstract
In recent years there has been remarkable progress in one computer vision application area: object detection. One of the most challenging and fundamental problems in object detection is locating a specific object among the multiple objects present in a scene. Earlier, traditional detection methods were used for detecting objects; with the introduction of convolutional neural networks from 2012 onward, deep learning-based techniques were used for feature extraction, which led to remarkable breakthroughs in this area. This paper presents a detailed survey of recent advancements and achievements in object detection using various deep learning techniques. Several topics have been included, such as Viola–Jones (VJ), histogram of oriented gradients (HOG), one-shot and two-shot detectors, benchmark datasets, evaluation metrics, speed-up techniques, and current state-of-the-art object detectors. Detailed discussions of some important applications in object detection, including pedestrian detection, crowd detection, and real-time object detection on GPU-based embedded systems, are presented. Finally, we conclude by identifying promising future directions. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
32. A High-Precision Positioning Approach for Catenary Support Components With Multiscale Difference.
- Author
-
Liu, Zhigang, Liu, Kai, Zhong, Junping, Han, Zhiwei, and Zhang, Wenxuan
- Subjects
MULTISCALE modeling ,CONVOLUTIONAL neural networks ,CATENARY ,COMPUTER vision ,ALGORITHMS ,DYNAMIC positioning systems ,ELECTRIC lines - Abstract
The catenary support components (CSCs) are the most important devices in high-speed railways for supporting the contact lines that power trains. To estimate the states of CSCs, it is necessary to locate their positions in a computer-vision-based monitoring system. Considering the application scenarios and characteristics of CSCs, an automatic and fast positioning system is designed in this paper to simultaneously position multiscale CSCs of 12 categories. In the system, an effective framework called the CSCs network (CSCNET) is presented, which cascades a coarse positioning network and a fine positioning network to reduce the multiscale differences between different CSCs. In the coarse positioning network, a new unsupervised clustering algorithm based on relative positioning information is proposed to classify the catenary images. Then, a convolutional neural network (CNN) classification network is trained to extract the structural features of catenary images and generate proposal regions with labels. In the fine positioning network, a modified CNN positioning framework is applied to obtain the accurate positions of CSCs based on the coarse positioning results. Due to the special lightweight structure with a classification network, the relative position information makes CSCNET sensitive to small-scale components. Experimental results from several high-speed railway lines in China show that the proposed system has obvious advantages in CSCs positioning. The mean average precision and frames per second of CSCNET reach 0.837 and 2.17, respectively. Compared with some popular convolutional networks [faster region-based CNN (Faster R-CNN), etc.] and a typical positioning method, the proposed system significantly improves the AP without increasing the computational time. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
33. Activity Recognition Using Temporal Optical Flow Convolutional Features and Multilayer LSTM.
- Author
-
Ullah, Amin, Muhammad, Khan, Del Ser, Javier, Baik, Sung Wook, and de Albuquerque, Victor Hugo C.
- Subjects
VIDEO surveillance ,OPTICAL flow ,ARTIFICIAL neural networks ,HUMAN activity recognition ,ELECTRONIC surveillance ,COMPUTER vision ,COMPUTER engineering - Abstract
Nowadays, digital surveillance systems are universally installed, continuously collecting enormous amounts of data and thereby requiring human monitoring for the identification of different activities and events. Smarter surveillance is the need of this era, through which normal and abnormal activities can be automatically identified using artificial intelligence and computer vision technology. In this paper, we propose a framework for activity recognition in surveillance videos captured over industrial systems. The continuous surveillance video stream is first divided into important shots, where shots are selected using the proposed convolutional neural network (CNN) based human saliency features. Next, temporal features of an activity in the sequence of frames are extracted by utilizing the convolutional layers of a FlowNet2 CNN model. Finally, a multilayer long short-term memory is presented for learning long-term sequences in the temporal optical flow features for activity recognition. Experiments (code: https://github.com/Aminullah6264/Activity%5fRec%5fML-LSTM) are conducted using different benchmark action and activity recognition datasets, and the results reveal the effectiveness of the proposed method for activity recognition in industrial settings compared with state-of-the-art methods. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
34. Remote Sensor Design for Visual Recognition With Convolutional Neural Networks.
- Author
-
Jaffe, Lucas, Zelinski, Michael, and Sakla, Wesam
- Subjects
ARTIFICIAL neural networks ,REMOTE sensing ,COMPUTER engineering ,COMPUTER performance ,DEEP learning ,COMPUTER vision - Abstract
While deep learning technologies for computer vision have developed rapidly since 2012, modeling of remote sensing systems has remained focused around human vision. In particular, remote sensing systems are usually constructed to optimize sensing cost-quality tradeoffs with respect to human image interpretability. While some recent studies have explored remote sensing system design as a function of simple computer vision algorithm performance, there has been little work relating this design to the state of the art in computer vision: deep learning with convolutional neural networks. We develop experimental systems to conduct this analysis, showing results with modern deep learning algorithms and recent overhead image data. Our results are compared to standard image quality measurements based on human visual perception, and we conclude not only that machine and human interpretability differ significantly but also that computer vision performance is largely self-consistent across a range of disparate conditions. This paper is presented as a cornerstone for a new generation of sensor design systems that focus on computer algorithm performance instead of human visual perception. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
35. FastFace: A Real-Time and Robust Face Detection Algorithm.
- Author
-
李启运, 纪庆革, and 洪赛丁
- Subjects
ARTIFICIAL neural networks ,FACE ,FEATURE extraction ,DEEP learning ,COMPUTER multitasking ,MATHEMATICAL convolutions ,FUSIFORM gyrus - Abstract
Copyright of Journal of Image & Graphics is the property of Editorial Office of Journal of Image & Graphics and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
- Published
- 2019
- Full Text
- View/download PDF
36. Fast Semantic Segmentation for Scene Perception.
- Author
-
Zhang, Xuetao, Chen, Zhenxue, Wu, Q. M. Jonathan, Cai, Lei, Lu, Dan, and Li, Xianming
- Abstract
Semantic segmentation is a challenging problem in computer vision. Many applications, such as autonomous driving and robot navigation in urban road scenes, need accurate and efficient segmentation. Most state-of-the-art methods focus on accuracy rather than efficiency. In this paper, we propose a more efficient neural network architecture, with fewer parameters, for semantic segmentation in urban road scenes. An asymmetric encoder-decoder structure based on ResNet is used in our model. In the first stage of the encoder, we use a continuous factorized block to extract low-level features. A continuous dilated block is applied in the second stage, which gives the model a larger field of view while keeping it small-scale and shallow. The downsampled features from the encoder are upsampled by the decoder to an output the same size as the input image, and the details are refined. Our model can be trained end-to-end and pixel-to-pixel from scratch, without pretraining. Our model has only 0.2 M parameters, 100× fewer than those of others such as SegNet. Experiments are conducted on five public road scene datasets (CamVid, CityScapes, Gatech, KITTI Road Detection, and KITTI Semantic Segmentation), and the results demonstrate that our model can achieve better performance. [ABSTRACT FROM AUTHOR]
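The parameter savings behind a factorized block come from splitting a k×k convolution into a k×1 followed by a 1×k convolution; a rough count with hypothetical layer sizes, bias terms ignored:

```python
def conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias ignored)."""
    return c_out * c_in * k * k

def factorized_params(c_in, c_out, k):
    """Weights when the k x k kernel is factorized into k x 1 then 1 x k."""
    return c_out * c_in * k + c_out * c_out * k

# e.g. a 3x3 layer with 64 channels in and out: 36864 vs 24576 weights
```

The saving grows with k, which is one reason factorized (asymmetric) convolutions appear in many compact segmentation networks.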
- Published
- 2019
- Full Text
- View/download PDF
37. Survey of Collective Activity Recognition Based on Deep Learning.
- Author
-
PEI Lishen and ZHAO Xuezhuan
- Subjects
DEEP learning ,HUMAN activity recognition ,COMPUTER vision ,VISUAL fields - Abstract
Collective activity recognition is an important issue in the field of computer vision, with wide applications, that urgently needs to be solved. With the development of deep neural networks, the breadth and depth of collective activity recognition and understanding are expanding. By surveying the research literature on collective activity recognition over the past ten years, the problem definition of collective activity recognition is established, and the existing problems and challenges are pointed out. Within the framework of deep neural networks, this paper describes the development of collective activity recognition algorithms from the early stage, which only classified collective activity categories, to the current stage, which focuses more on understanding the details of individual activities within group behavior. Then, based on network architectures such as CNN/3DCNN, Two-Stream Networks, RNN/LSTM, and Transformers, the core network architectures and main research ideas of the mainstream collective activity recognition methods are introduced, and the recognition performance of these algorithms on common datasets is compared. The commonly used collective activity recognition datasets, labeled with multilevel labels such as collective activity types and individual activity categories, are reviewed and compared. Through an objective and fair discussion and analysis of the advantages and disadvantages of the various algorithms, readers are encouraged to propose new solutions to, or new problems in, collective activity recognition. Finally, the future development of collective activity recognition is considered, with the hope of stimulating new research directions. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
38. Scaling-Translation-Equivariant Networks with Decomposed Convolutional Filters.
- Author
-
Wei Zhu, Qiang Qiu, Calderbank, Robert, Sapiro, Guillermo, and Xiuyuan Cheng
- Subjects
- *
CONVOLUTIONAL neural networks , *IMAGE recognition (Computer vision) , *COMPUTER vision - Abstract
Encoding the scale information explicitly into the representation learned by a convolutional neural network (CNN) is beneficial for many computer vision tasks, especially when dealing with multiscale inputs. We study, in this paper, a scaling-translation-equivariant (ST-equivariant) CNN with joint convolutions across the space and the scaling group, which is shown to be both sufficient and necessary to achieve equivariance for the regular representation of the scaling-translation group ST. To reduce the model complexity and computational burden, we decompose the convolutional filters under two pre-fixed separable bases and truncate the expansion to low-frequency components. A further benefit of the truncated filter expansion is the improved deformation robustness of the equivariant representation, a property which is theoretically analyzed and empirically verified. Numerical experiments demonstrate that the proposed scaling-translation-equivariant network with decomposed convolutional filters (ScDCFNet) achieves significantly improved performance in multiscale image classification and better interpretability than regular CNNs at a reduced model size. [ABSTRACT FROM AUTHOR]
- Published
- 2022
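The filter decomposition idea in the abstract above can be sketched in a few lines: each learned filter is expressed as a truncated expansion over a fixed separable basis, so only a small grid of low-frequency coefficients is trained. A minimal numpy sketch, assuming a DCT-II basis for illustration (the paper's actual basis choice may differ); `dct_basis` and `decomposed_filter` are illustrative names, not the authors' API:

```python
import numpy as np

def dct_basis(n, k):
    # first k 1-D DCT-II basis vectors of length n (fixed, not learned)
    return np.array([[np.cos(np.pi * (2 * x + 1) * u / (2 * n)) for x in range(n)]
                     for u in range(k)])  # shape (k, n)

def decomposed_filter(coeffs, basis):
    # build an n x n filter from k x k learned coefficients and a fixed
    # separable basis: W = B^T C B, truncated to low-frequency components
    return basis.T @ coeffs @ basis

basis = dct_basis(5, 3)           # 5-tap filters, keep 3 low frequencies
coeffs = np.random.randn(3, 3)    # only 9 learned parameters instead of 25
W = decomposed_filter(coeffs, basis)
```

Truncating to `k` of `n` basis functions cuts the per-filter parameter count from `n*n` to `k*k`, which is the source of the reduced model size the abstract reports.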
39. An Evaluation of Various Machine Learning Approaches for Detecting Leaf Diseases in Agriculture.
- Author
-
Ok-Hue Cho
- Subjects
MACHINE learning ,DATA augmentation ,COMPUTER vision ,CONVOLUTIONAL neural networks ,SUPPORT vector machines - Abstract
Background: Machine learning has shown remarkable promise in recent years in areas such as pattern detection and categorization. Disease diagnosis is crucial in agriculture, since diseases occur naturally in plants. The easiest and most effective way to identify crop disease is through image processing, computer vision, and machine learning techniques. Methods: To identify and categorize cotton leaf diseases, the study compares the effectiveness of established techniques such as Support Vector Machines (SVMs) and random forests with state-of-the-art convolutional neural network (CNN) methods and architectures such as InceptionV3, VGG16, and ResNet50, using data augmentation and transfer learning. Result: The models were trained on four distinct types of plant photos gathered manually from a government agency and a farm. It was also noted that as the quantity of training data rose, so did the performance of the resulting models. [ABSTRACT FROM AUTHOR]
- Published
- 2024
40. Ultrasound Image Analysis with Vision Transformers—Review.
- Author
-
Vafaeezadeh, Majid, Behnam, Hamid, and Gifani, Parisa
- Subjects
TRANSFORMER models ,IMAGE analysis ,ULTRASONIC imaging ,CONVOLUTIONAL neural networks ,COMPUTER vision - Abstract
Ultrasound (US) has become a widely used imaging modality in clinical practice, characterized by rapidly evolving technology, distinct advantages, and unique challenges, such as low imaging quality and high variability. There is a need to develop advanced automatic US image analysis methods to enhance its diagnostic accuracy and objectivity. Vision transformers, a recent innovation in machine learning, have demonstrated significant potential in various research fields, including general image analysis and computer vision, due to their capacity to process large datasets and learn complex patterns. Their suitability for automatic US image analysis tasks, such as classification, detection, and segmentation, has been recognized. This review provides an introduction to vision transformers and discusses their applications in specific US image analysis tasks, while also addressing the open challenges and potential future trends in their application to medical US image analysis. Vision transformers have shown promise in enhancing the accuracy and efficiency of ultrasound image analysis and are expected to play an increasingly important role in the diagnosis and treatment of medical conditions using ultrasound imaging as the technology progresses. [ABSTRACT FROM AUTHOR]
- Published
- 2024
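As a concrete anchor for how the vision transformers surveyed above consume an image, the first step is splitting it into fixed-size patches that become the token sequence. A minimal numpy sketch of patch extraction only (linear projection, position embeddings, and attention layers omitted); the 16-pixel patch size mirrors common ViT configurations but is an assumption here, and the single-channel "ultrasound" frame is a toy stand-in:

```python
import numpy as np

def patchify(image, patch=16):
    # split an H x W x C image into flattened non-overlapping patches,
    # the first step of a vision transformer before linear projection
    h, w, c = image.shape
    gh, gw = h // patch, w // patch
    patches = image[:gh * patch, :gw * patch].reshape(gh, patch, gw, patch, c)
    return patches.transpose(0, 2, 1, 3, 4).reshape(gh * gw, patch * patch * c)

img = np.random.rand(224, 224, 1)   # a toy single-channel "ultrasound" frame
tokens = patchify(img)              # 14 x 14 grid -> 196 tokens of length 256
```

Each of the 196 tokens would then be linearly projected and fed through self-attention layers, which is what lets a ViT relate distant regions of the image directly.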
41. A Simple Deep Learning Network for Classification of 3D Mobile LiDAR Point Clouds.
- Author
-
Yanjun WANG, Shaochun LI, Mengjie WANG, and Yunhao LIN
- Subjects
DEEP learning ,LIDAR ,POINT cloud ,REMOTE-sensing images ,COMPUTER vision - Abstract
Automatic and accurate classification is a fundamental problem in the analysis and modeling of LiDAR (Light Detection and Ranging) data. Recently, convolutional neural networks (ConvNets or CNNs) have achieved remarkable performance in image recognition and computer vision. While significant efforts have also been made to develop deep networks for satellite image scene classification, suitable deep learning frameworks for 3D dense mobile laser scanning (MLS) data still require further investigation. In this paper, we present a simple deep CNN for multiple-object classification based on multi-scale context representation. For pointwise classification, we first extract the neighboring points within a spatial context and transform them into a three-channel image for each point. The classification task can then be treated as image recognition using a CNN. The proposed CNN architecture adopts common convolution, max-pooling, and rectified linear unit (ReLU) layers, combined across multiple deeper network layers. After being trained and tested on approximately seven million labeled MLS points, the deep CNN model can accurately classify points into nine classes. Compared with the widely used ResNet algorithm, this model achieves better precision and recall rates with less processing time, indicating the significant potential of deep-learning-based methods for MLS data classification. [ABSTRACT FROM AUTHOR]
- Published
- 2021
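The pointwise pipeline described above, extracting each point's spatial neighborhood and rasterizing it into a three-channel image, can be sketched as below. This is a toy numpy version under assumed channel encodings (density, mean relative height, max relative height); the abstract does not specify the paper's exact per-channel features, so all names here are illustrative:

```python
import numpy as np

def point_to_context_image(points, center, grid=16, radius=1.0):
    """Rasterize a point's spatial neighborhood into a 3-channel image.
    Channels (an illustrative choice, not the paper's exact encoding):
    0: normalized point density, 1: mean relative height, 2: max relative height."""
    rel = points - center
    rel = rel[np.max(np.abs(rel[:, :2]), axis=1) < radius]  # keep nearby points
    img = np.zeros((3, grid, grid))
    counts = np.zeros((grid, grid))
    ij = ((rel[:, :2] + radius) / (2 * radius) * grid).astype(int)
    ij = np.clip(ij, 0, grid - 1)
    for (i, j), z in zip(ij, rel[:, 2]):
        counts[i, j] += 1
        img[1, i, j] += z
        img[2, i, j] = max(img[2, i, j], z)
    img[0] = counts / max(counts.max(), 1)
    img[1] = np.divide(img[1], counts, out=np.zeros_like(img[1]), where=counts > 0)
    return img

pts = np.random.rand(500, 3)
img = point_to_context_image(pts, center=np.array([0.5, 0.5, 0.0]))
```

Once every point has such an image, an off-the-shelf 2D CNN classifier can be trained on them, which is how the paper reduces 3D point classification to image recognition.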
42. A New Model for Image Segmentation Based on Deep Learning.
- Author
-
Mamdouh, Rafeek, El-Khamisy, Nashaat, Amer, Khaled, Riad, Alaa, and El-Bakry, Hazem M.
- Subjects
DEEP learning ,IMAGE segmentation ,COMPUTER vision ,CONVOLUTIONAL neural networks ,IMAGE processing ,ALGORITHMS - Abstract
Image segmentation is a central problem in computer vision (CV) and image processing (IP), used routinely in medicine and in surgery training tools. This paper describes a hybrid pipeline that combines two approaches to the segmentation problem, a convolutional neural network (CNN) together with active contours and deep multi-planar segmentation, built on Seg3D2, to convert DICOM ("Digital Imaging and Communications in Medicine") medical images into a three-dimensional model; the output of the active contour stage serves as the input to the deep learning stage. Pre-processing of the raw DICOM images yields images with edges at a size of 256 x 256 pixels, and by adjusting the threshold frames and gray-scale images of the active contour, multiple candidate outputs can be produced. For the liver 3D model, the CNN processes the three orthogonal axes (coronal = X, sagittal = Y, axial = Z), matches them against a real image of the body to determine the region of interest, and adjusts the contrast using a histogram. The study uses human liver DICOM images and is divided into two stages (medical image segmentation and retinal model optimization), with the aim of helping surgeons study a patient's condition accurately and efficiently through mixed reality technology in liver surgery [living donor liver transplantation (LDLT)]. All components are implemented with Seg3D2 and Python. [ABSTRACT FROM AUTHOR]
- Published
- 2021
43. Revolutionizing Accessibility: Smart Wheelchair Robot and Mobile Application for Mobility, Assistance, and Home Management.
- Author
-
Jayasekera, Ninura, Kulathunge, Binali, Premaratne, Hirudika, Nilam, Insaf, Rajapaksha, Samantha, and Krishara, Jenny
- Subjects
GLOBAL Positioning System ,MOBILE apps ,MOBILE robots ,PEOPLE with disabilities ,WHEELCHAIRS ,COMPUTER vision - Abstract
This research aims to advance accessibility and inclusivity for individuals with disabilities. We focus on specific daily challenges facing people with disabilities in communication, mobility, and daily task management, and introduce AssistEase, a groundbreaking smart wheelchair solution designed to empower people with disabilities by improving mobility, communication capabilities, and daily task management. AssistEase will contribute to the disabled community around the world by allowing them to manage daily tasks and communicate more easily while ensuring mobility. AssistEase offers control options such as handsfree voice control, traditional manual control, smartphone-based Bluetooth control, and innovative gesture control, designed to cater to different user preferences and needs. It uses technologies such as speech recognition, computer vision, and haptic feedback to help users navigate safely while avoiding obstacles. It integrates technologies like Flutter, TensorFlow, YOLOv8, the Global Positioning System (GPS), Bluetooth, and Apple HomeKit, along with hardware components including Arduino and Raspberry Pi. Preliminary trials have shown improvements in mobility, communication, and daily tasks for users in need. It achieves 95% precision in guiding wheelchair users while maintaining about 90% accuracy for the robotic arm and 89% for health monitoring and location tracking. It also provides a user-friendly app with 90% control accuracy. The communication device has 92% accuracy in facilitating user communication, while hand gesture control achieves 90% accuracy. To advance AssistEase smart wheelchair technology, further research and development are required to enhance its adaptability for specific disabilities. AssistEase reflects a commitment to creating a more inclusive and thriving society, focusing on innovation and inclusion for individuals of all abilities. [ABSTRACT FROM AUTHOR]
- Published
- 2024
44. Rulers2023: An Annotated Dataset of Synthetic and Real Images for Ruler Detection Using Deep Learning.
- Author
-
Matuzevičius, Dalius
- Subjects
DEEP learning ,CONVOLUTIONAL neural networks ,COMPUTER vision ,OBJECT recognition (Computer vision) ,COMPUTER systems - Abstract
This research investigates the usefulness and efficacy of synthetic ruler images for the development of a deep learning-based ruler detection algorithm. Synthetic images offer a compelling alternative to real-world images as data sources in the development and advancement of computer vision systems. This research aims to answer whether using a synthetic dataset of ruler images is sufficient for training an effective ruler detector and to what extent such a detector could benefit from including synthetic images as a data source. The article presents the procedural method for generating synthetic ruler images, describes the methodology for evaluating the synthetic dataset using trained convolutional neural network (CNN)-based ruler detectors, and shares the compiled synthetic and real ruler image datasets. It was found that the synthetic dataset yielded superior results in training the ruler detectors compared with the real image dataset. The results support the utility of synthetic datasets as a viable and advantageous approach to training deep learning models, especially when real-world data collection presents significant logistical challenges. The evidence presented here strongly supports the idea that when carefully generated and used, synthetic data can effectively replace real images in the development of CNN-based detection systems. [ABSTRACT FROM AUTHOR]
- Published
- 2023
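The core idea above, procedurally rendering labeled training images from code rather than collecting them, can be illustrated with a toy generator. A minimal numpy sketch of a ruler-like image (the paper's actual generator is far richer, with textures, poses, backgrounds, and annotations); every name and parameter here is illustrative:

```python
import numpy as np

def synth_ruler(width=320, height=64, spacing=10, tick_len=20):
    """Render a minimal synthetic ruler: a light background with evenly
    spaced dark tick marks, longer every fifth tick (a toy stand-in for
    the paper's procedural generator)."""
    img = np.full((height, width), 0.9)          # light background
    for x in range(0, width, spacing):
        length = tick_len * 2 if x % (spacing * 5) == 0 else tick_len
        img[height - length:, x] = 0.1           # dark tick column
    return img

ruler = synth_ruler()
```

Because the generator controls every pixel, the tick positions double as free, exact ground-truth labels, which is what makes synthetic data attractive when real annotation is logistically hard.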
45. Optimizing Multimodal Scene Recognition through Mutual Information-Based Feature Selection in Deep Learning Models.
- Author
-
Hammad, Mohamed, Chelloug, Samia Allaoua, Alayed, Walaa, and El-Latif, Ahmed A. Abd
- Subjects
FEATURE selection ,DEEP learning ,MULTIMODAL user interfaces ,COMPUTER vision ,ARTIFICIAL intelligence ,CONVOLUTIONAL neural networks ,ARTIFICIAL vision - Abstract
The field of scene recognition, which lies at the crossroads of computer vision and artificial intelligence, has experienced notable progress through scholarly pursuits. This article introduces a novel methodology for scene recognition that combines convolutional neural networks (CNNs) with feature selection techniques based on mutual information (MI). The main goal of our study is to address the limitations inherent in conventional unimodal methods and thereby improve the precision and dependability of scene classification. Our research focuses on formulating a comprehensive approach to scene detection, applying multimodal deep learning methodologies to a single input image. Our work distinguishes itself by the innovative amalgamation of CNN- and MI-based feature selection. This integration provides distinct advantages and enhanced capabilities compared to prevailing methodologies. To assess the effectiveness of our methodology, we performed tests on two openly accessible datasets, namely, the scene categorization dataset and the AID dataset. These studies exhibited notable levels of precision, with accuracies of 100% and 98.83% achieved on the respective datasets, surpassing the performance of other established techniques. The primary objective of our end-to-end approach is to reduce complexity and resource requirements, thereby creating a robust framework for the task of scene categorization. This work significantly advances the practical application of computer vision in various real-world scenarios, leading to a large improvement in the accuracy of scene recognition and interpretation. [ABSTRACT FROM AUTHOR]
- Published
- 2023
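The MI-based feature selection step described above, ranking feature columns by their mutual information with the class label and keeping the top k, can be sketched with a simple plug-in histogram estimator. A numpy-only sketch under assumed binning; the paper's estimator and its integration with the CNN features may differ, and all function names are illustrative:

```python
import numpy as np

def mutual_information(x, y, bins=8):
    # estimate I(X;Y) in nats from a 2-D histogram of (feature, label)
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

def select_top_k(features, labels, k):
    # rank feature columns by MI with the class label, keep the top k
    scores = np.array([mutual_information(f, labels) for f in features.T])
    return np.argsort(scores)[::-1][:k]

rng = np.random.default_rng(0)
labels = rng.integers(0, 2, 400)
informative = labels + 0.1 * rng.standard_normal(400)  # correlated with label
noise = rng.standard_normal((400, 4))                  # irrelevant features
X = np.column_stack([noise[:, :2], informative, noise[:, 2:]])
top = select_top_k(X, labels, k=1)                     # picks column 2
```

Columns carrying no information about the label score near zero (up to estimator bias), so the label-correlated column wins, which is the mechanism the paper relies on to prune CNN features before classification.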
46. White Blood Cell Classification: Convolutional Neural Network (CNN) and Vision Transformer (ViT) under Medical Microscope.
- Author
-
Abou Ali, Mohamad, Dornaika, Fadi, and Arganda-Carreras, Ignacio
- Subjects
CONVOLUTIONAL neural networks ,TRANSFORMER models ,LEUCOCYTES ,COMPUTER vision ,DEEP learning ,MULTILAYER perceptrons - Abstract
Deep learning (DL) has made significant advances in computer vision with the advent of vision transformers (ViTs). Unlike convolutional neural networks (CNNs), ViTs use self-attention to extract both local and global features from image data, and then apply residual connections to feed these features directly into a fully connected multilayer perceptron head. In hospitals, hematologists prepare peripheral blood smears (PBSs) and read them under a medical microscope to detect abnormalities in blood counts such as leukemia. However, this task is time-consuming and prone to human error. This study investigated the transfer learning process of the Google ViT and ImageNet CNNs to automate the reading of PBSs. The study used two online PBS datasets, PBC and BCCD, and transformed them into balanced datasets to investigate the influence of data amount and noise immunity on both neural networks. The PBC results showed that the Google ViT is an excellent DL solution under data scarcity. The BCCD results showed that the Google ViT is superior to ImageNet CNNs in dealing with unclean, noisy image data because it is able to extract both global and local features and use residual connections, despite the additional time and computational overhead. [ABSTRACT FROM AUTHOR]
- Published
- 2023
47. Landmark recognition with incremental angular-domain loss and multi-feature fusion.
- Author
-
毛雪宇 and 彭艳兵
- Subjects
CONVOLUTIONAL neural networks ,IMAGE recognition (Computer vision) ,COGNITIVE bias ,COMPUTER vision ,ALGORITHMS ,FEATURE extraction - Abstract
Copyright of Journal of Image & Graphics is the property of Editorial Office of Journal of Image & Graphics and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
- Published
- 2020
48. Structural correlation filters combined with a Gaussian particle filter for hierarchical visual tracking.
- Author
-
Dai, Manna, Xiao, Gao, Cheng, Shuying, Wang, Dadong, and He, Xiangjian
- Subjects
- *
CONVOLUTIONAL neural networks , *ARTIFICIAL satellite tracking , *FILTERS & filtration , *COMPUTER vision , *TRAFFIC engineering , *HUMAN-computer interaction - Abstract
• A homogeneous ensemble strategy is proposed to provide preliminary locating. • A motion detection method based on Lucas-Kanade is utilized to tackle boundary effects. • A simple weighting strategy is designed to measure the reliability of the weak classifiers. • The GPF with CNN features is adopted to execute re-detection and scale estimation of a target. Visual tracking is a key problem for many computer vision applications such as human-computer interaction, intelligent medical diagnosis, navigation, and traffic control management. Most existing tracking methods are based on correlation filters. However, boundary effects, scale estimation, and template updating have not been fully resolved. This paper presents a new hierarchical tracking method combining structural correlation filters with a Gaussian Particle Filter (GPF), named KCF-GPF. Weak KCF classifiers are constructed via a Lucas-Kanade (LK) method, and the preliminary target location is computed as a weighted sum of these classifiers. Specifically, a simple weighting strategy is implemented to estimate the reliability of each weak classifier. On the basis of the preliminary target location, the GPF, using features from a Convolutional Neural Network (CNN), is employed to predict the location and scale of the target. Extensive experiments on the OTB-2013 and OTB-2015 databases demonstrate that the proposed algorithm performs favourably against state-of-the-art trackers. [ABSTRACT FROM AUTHOR]
- Published
- 2020
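The preliminary-location step described above, a reliability-weighted sum of weak classifier response maps whose peak gives the target position, can be sketched as below. This is a toy numpy illustration of weighted response fusion only, not the paper's exact KCF formulation or weighting scheme:

```python
import numpy as np

def fuse_response_maps(responses, weights):
    """Weighted sum of weak-classifier response maps; the peak of the
    fused map gives the preliminary target location."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()               # reliability-normalized
    fused = np.tensordot(weights, np.asarray(responses), axes=1)
    return np.unravel_index(np.argmax(fused), fused.shape)

# three toy response maps: two reliable classifiers agree at (10, 12),
# one unreliable classifier votes elsewhere but carries little weight
r = np.zeros((3, 32, 32))
r[0, 10, 12] = 1.0
r[1, 10, 12] = 0.9
r[2, 3, 3] = 1.0
loc = fuse_response_maps(r, weights=[0.45, 0.45, 0.10])  # -> (10, 12)
```

In the full method this coarse location would then seed the Gaussian particle filter over CNN features for re-detection and scale estimation.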
49. Research progress in image semantic segmentation with deep convolutional neural networks.
- Author
-
青晨, 禹晶, 肖创柏, and 段娟
- Subjects
CONVOLUTIONAL neural networks ,MARKOV random fields ,DEPENDENCE (Statistics) ,COMPUTER vision ,VISUAL learning ,DEEP learning ,SUPERVISED learning ,PARSING (Computer grammar) - Abstract
Copyright of Journal of Image & Graphics is the property of Editorial Office of Journal of Image & Graphics and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
- Published
- 2020
50. Object detection in optical remote sensing images: A survey and a new benchmark.
- Author
-
Li, Ke, Wan, Gang, Cheng, Gong, Meng, Liqiu, and Han, Junwei
- Subjects
- *
OPTICAL remote sensing , *ARTIFICIAL neural networks , *COMPUTER vision , *DEEP learning - Abstract
Substantial efforts have been devoted recently to presenting various methods for object detection in optical remote sensing images. However, the current survey coverage of datasets and deep learning based methods for object detection in optical remote sensing images is not adequate. Moreover, most existing datasets have shortcomings: for example, their numbers of images and object categories are small in scale, and their image diversity and variations are insufficient. These limitations greatly affect the development of deep learning based object detection methods. In this paper, we provide a comprehensive review of recent deep learning based object detection progress in both the computer vision and earth observation communities. Then, we propose a large-scale, publicly available benchmark for object DetectIon in Optical Remote sensing images, which we name DIOR. The dataset contains 23,463 images and 192,472 instances, covering 20 object classes. The proposed DIOR dataset (1) is large-scale in the number of object categories, object instances, and total images; (2) has a large range of object size variations, not only in terms of spatial resolution but also in inter- and intra-class size variability across objects; (3) holds big variations, as the images are obtained under different imaging conditions, weathers, seasons, and image qualities; and (4) has high inter-class similarity and intra-class diversity. The proposed benchmark can help researchers develop and validate their data-driven methods. Finally, we evaluate several state-of-the-art approaches on our DIOR dataset to establish a baseline for future research. [ABSTRACT FROM AUTHOR]
- Published
- 2020