Descriptor: "Computer vision" / Journal: Journal of Electronic Imaging - Searchworks@Jio Institute Digital Library Search Results

1. Beyond pixels: text-guided deep insights into graphic design image aesthetics.

Author: Shi, Guangyu, Li, Luming, and Song, Mengke
Subjects: *PEARSON correlation (Statistics), *COMPUTER vision, *GRAPHIC design, *COMPUTATIONAL complexity, *AESTHETICS, *DEEP learning
Abstract: The rapid development of computer vision and deep learning has significantly advanced image aesthetic assessment, yet traditional methods, which primarily rely on low-level visual features such as color and texture, often struggle with the complexity of graphic design images. These images are characterized by diverse design elements, including color, typography, and layout, as well as various styles such as minimalism, retro, and modernism, presenting substantial challenges to conventional assessment techniques. To overcome these limitations, we propose an innovative multimodal learning approach that integrates image content with textual descriptions to comprehensively analyze the aesthetic qualities of graphic design images. The core innovation of our method lies in the utilization of two distinct textual description methodologies: holistic descriptions, which capture the main theme of the design, and detailed descriptions, which focus on specific aspects such as composition, color, detail, and atmosphere. This dual approach allows for a more nuanced and complete assessment of aesthetic value. To effectively merge these descriptions with visual content, we introduce a feature similarity blending mechanism that aligns and integrates features from both modalities, enhancing the representation of aesthetic attributes. In addition, we employ a score bagging technique to aggregate scores from multiple fused features, ensuring robustness and reliability in the assessments. Our method is implemented within a multi-task learning framework, enabling simultaneous prediction across multiple rating dimensions. Experimental results demonstrate that, compared with the state-of-the-art TAHF method, our approach achieves notable improvements in Spearman's rank correlation coefficient (by 1.7%, 3.4%, and 2.6% on the HDDI, BAID, and TAD66K datasets, respectively), along with consistent gains in Pearson's linear correlation coefficient and accuracy. Moreover, our method achieves these performance improvements with fewer parameters and lower computational complexity, highlighting its efficiency and effectiveness in graphic design image aesthetic assessment. [ABSTRACT FROM AUTHOR]
Published: 2024
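
The two agreement measures named in this abstract, Spearman's rank correlation and Pearson's linear correlation, can be sketched as follows. This is a minimal standard-library illustration, not the authors' evaluation code; the `human` and `model` score lists are made-up values used only to show the calls.

```python
import math


def pearson(x, y):
    """Pearson's linear correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical aesthetic scores (illustrative values only).
human = [3.1, 4.5, 2.0, 4.9, 3.8]
model = [2.9, 4.0, 2.5, 4.8, 4.1]
print(round(spearman(human, model), 3), round(pearson(human, model), 3))
```

Spearman's coefficient depends only on the ordering of the scores, so it rewards a model that ranks designs correctly even when its scale differs from the human ratings; Pearson's coefficient additionally penalizes departures from a linear relationship.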

2. Image quality and object detection performance.

Author: Bergstrom, Austin C. and Messinger, David W.
Subjects: *OBJECT recognition (Computer vision), *IMAGE recognition (Computer vision), *COMPUTER vision, *FUNCTIONAL equations, *COMPUTER performance
Abstract: Significant bodies of research have explored the topics of computer vision and image quality, but research on the intersection of these two disciplines remains limited. In addition, evidence suggests that image quality as determined by the human visual system may differ from image quality as determined by the performance of computer vision algorithms. Furthermore, most research on the relationships between image quality and computer vision performance has focused on single-label image classification and has not considered tasks such as semantic segmentation or object detection. We consider the relationship between three primary image quality factors—resolution, blur, and noise—and the performance of deep-learning-based object detection models. To do so, we examine the impacts of these image quality variables on the mean average precision (mAP) of object detection models, evaluating the performance of models trained on only high-quality images as well as models fine-tuned on lower-quality images. In addition, we map our primary image quality variables to the terms used in the general image quality equation—namely ground sample distance, relative edge response, and signal-to-noise ratio—and assess the suitability of the general image quality equation functional form for modeling object detector performance in the presence of significant image distortions. [ABSTRACT FROM AUTHOR]
Published: 2024

3. Str-L Pose: integrating point and structured line for relative pose estimation in dual graph.

Author: Zhang, Zherong, Lin, Chunyu, Huang, Shujuan, Yang, Shangrong, and Zhao, Yao
Subjects: *GRAPH neural networks, *COMPUTER vision, *APPLICATION software, *AUTONOMOUS vehicles, *TECHNICAL institutes
Abstract: Relative pose estimation is crucial for various computer vision applications, including robotics and autonomous driving. Current methods primarily depend on selecting and matching feature points, which are prone to incorrect matches and therefore lead to poor performance. Consequently, relying solely on point-matching relationships for pose estimation is a major challenge. To overcome these limitations, we propose a geometric correspondence graph neural network that integrates point features with additional structured line segments. This integration of matched points and line segments further exploits geometric constraints and enhances model performance across different environments. We employ a dual-graph module and a feature-weighted fusion module to aggregate geometric and visual features effectively, facilitating complex scene understanding. We demonstrate our approach through extensive experiments on the DeMoN and Karlsruhe Institute of Technology and Toyota Technological Institute (KITTI) Odometry datasets. The results show that our method is competitive with state-of-the-art techniques. [ABSTRACT FROM AUTHOR]
Published: 2024

4. Ladder Bottom-up Convolutional Bidirectional Variational Autoencoder for image translation of dotted Arabic expiration dates.

Author: Zidane, Ahmed and Soliman, Ghada
Subjects: *OPTICAL character recognition, *RECURRENT neural networks, *IMAGE reconstruction, *OPTICAL images, *EXPIRATION
Abstract: We propose a Ladder Bottom-up Convolutional Bidirectional Variational Autoencoder (LCBVAE) architecture for the encoder and decoder, trained on image translation of dotted Arabic expiration dates by reconstructing them into filled-in expiration dates. We employed a customized and adapted version of the convolutional recurrent neural network (CRNN) model to meet our specific requirements and enhance its performance in our context, and then we trained the custom CRNN model with the filled-in images from the year 2019 to 2027 to extract the expiration dates and assess the performance of LCBVAE on expiration date recognition. The LCBVAE + CRNN pipeline can then be integrated into an automated sorting system for extracting the expiry dates and sorting the products accordingly during the manufacturing stage. In addition, it can replace the manual entry of expiration dates at merchants, which can be time-consuming and inefficient. Due to the lack of available dotted Arabic expiration date images, we created an Arabic dot-matrix TrueType font for the generation of the synthetic images. We trained the model on 60,000 images of unrealistic synthetic dates and tested it on 3000 images of realistic synthetic dates from the year 2019 to 2027, represented as yyyy/mm/dd. We demonstrated the significance of the latent bottleneck layer by improving the generalization when the size is increased up to 1024 in downstream transfer learning tasks for image translation. The proposed approach achieved an accuracy of 97% on the image translation using the LCBVAE architecture, which can be generalized to any downstream learning task for image translation and reconstruction. [ABSTRACT FROM AUTHOR]
Published: 2024

5. Multi-head attention with reinforcement learning for supervised video summarization.

Author: Kadam, Bhakti Deepak and Deshpande, Ashwini Mangesh
Subjects: *CONVOLUTIONAL neural networks, *VIDEO summarization, *COMPUTER vision, *REINFORCEMENT learning, *STREAMING video & television
Abstract: With the substantial surge in available internet video data, the intricate task of video summarization has consistently attracted the computer vision research community to summarize the videos meaningfully. Many recent summarization techniques leverage bidirectional long short-term memory for its proficiency in modeling temporal dependencies. However, its effectiveness is limited to short-duration video clips, typically up to 90 to 100 frames. To address this constraint, the proposed approach incorporates global and local multi-head attention, effectively capturing temporal dependencies at both global and local levels. This enhancement enables parallel computation, thereby improving overall performance for longer videos. This work considers video summarization as a supervised learning task and introduces a deep summarization architecture called multi-head attention with reinforcement learning (MHA-RL). The architecture comprises a pretrained convolutional neural network for extracting features from video frames, along with global and local multi-head attention mechanisms for predicting frame importance scores. Additionally, the network integrates an RL-based regressor network to consider the diversity and representativeness of the generated video summary. Extensive experimentation is conducted on benchmark datasets, such as TVSum and SumMe. The proposed method exhibits improved performance compared to the majority of state-of-the-art summarization techniques, as indicated by both qualitative and quantitative results. [ABSTRACT FROM AUTHOR]
Published: 2024

6. Light field salient object detection network based on feature enhancement and mutual attention.

Author: Zhu, Xi, Xia, Huai, Wang, Xucheng, and Zheng, Zhenrong
Subjects: *OBJECT recognition (Computer vision), *FEATURE extraction, *COMPUTER vision, *CONVOLUTIONAL neural networks, *GENERALIZATION
Abstract: Light field salient object detection (SOD) is an essential research topic in computer vision, but robust saliency detection in complex scenes is still very challenging. We propose a new method for accurate and robust light field SOD via convolutional neural networks containing feature enhancement modules. First, the light field dataset is extended by geometric transformations such as stretching, cropping, flipping, and rotating. Next, two feature enhancement modules are designed to extract features from RGB images and depth maps, respectively. The obtained feature maps are fed into a two-stream network to train the light field SOD. We propose a mutual attention approach in this process, extracting and fusing features from RGB images and depth maps. Therefore, our network can generate an accurate saliency map from the input light field images after training. The obtained saliency map can provide reliable a priori information for tasks such as semantic segmentation, target recognition, and visual tracking. Experimental results show that the proposed method achieves excellent detection performance in public benchmark datasets and outperforms the state-of-the-art methods. We also verify the generalization and stability of the method in real-world experiments. [ABSTRACT FROM AUTHOR]
Published: 2024

7. Fusion 3D object tracking method based on region and point cloud registration.

Author: Jin, Yixin, Zhang, Jiawei, Liu, Yinhua, Mo, Wei, and Chen, Hua
Subjects: *COST functions, *POINT cloud, *VISUAL fields, *COMPUTER vision, *SINGLE-degree-of-freedom systems, *OBJECT tracking (Computer vision), *TRACKING algorithms
Abstract: Tracking rigid objects in three-dimensional (3D) space and estimating their 6DoF pose are essential tasks in the field of computer vision. In recent years, region-based 3D tracking methods have emerged as the optimal solution for tracking weakly textured objects in intricate scenes. However, tracking robustness in situations such as partial occlusion and similarly colored backgrounds is relatively poor. To address this issue, an improved region-based tracking method is proposed for achieving accurate 3D object tracking in the presence of partial occlusion and similarly colored backgrounds. First, a regional cost function based on the correspondence line is adopted, and a step function is proposed to alleviate the misclassification of sampling points in scenes. Afterward, in order to reduce the influence of similarly colored backgrounds and partial occlusion on the tracking performance, a weight function that fuses color and distance information of the object contour is proposed. Finally, the transformation matrix of the inter-frame motion obtained by the above region-based tracking method is used to initialize the model point cloud, and an improved point cloud registration method is adopted to achieve accurate registration between the model point cloud and the object point cloud to further realize accurate object tracking. Experiments are conducted on the region-based object tracking (RBOT) dataset and in real scenes. The results demonstrate that the proposed method outperforms the state-of-the-art region-based 3D object tracking method. On the RBOT dataset, the average tracking success rate is improved by 0.5% across five image sequences. In addition, in real scenes with similarly colored backgrounds and partial occlusion, the average tracking accuracy is improved by 0.28 and 0.26 mm, respectively. [ABSTRACT FROM AUTHOR]
Published: 2024

8. Joint merging and pruning: adaptive selection of better token compression strategy.

Author: Peng, Wei, Zeng, Liancheng, Zhang, Lizhuo, and Shen, Yue
Subjects: *TRANSFORMER models, *COMPUTER vision, *ARTIFICIAL intelligence, *COMPUTATIONAL complexity, *PREDICTION models
Abstract: Vision transformer (ViT) is widely used to handle artificial intelligence tasks, making significant advances in a variety of computer vision tasks. However, due to the quadratic interaction between tokens, the ViT model is inefficient, which greatly limits its application in real scenarios. In recent years, researchers have noticed that not all tokens contribute equally to the final prediction of the model, so token compression methods have been proposed, which are mainly divided into token pruning and token merging. Yet, we believe that neither pruning alone to reduce non-critical tokens nor merging alone to reduce similar tokens is an optimal strategy for token compression. To overcome this challenge, this work proposes a token compression framework, joint merging and pruning (JMP), which adaptively selects a better token compression strategy based on the similarity between critical tokens and non-critical tokens in each sample. JMP effectively reduces computational complexity while maintaining model performance and does not require the introduction of additional trainable parameters, achieving a good balance between efficiency and performance. Taking DeiT-S as an example, JMP reduces floating point operations by 35% and increases throughput by more than 45% while only decreasing accuracy by 0.2% on ImageNet. [ABSTRACT FROM AUTHOR]
Published: 2024
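
The idea of choosing between pruning and merging per sample can be illustrated with a toy sketch. Everything below is an assumption made for illustration, not the JMP algorithm from the paper: the importance `scores`, the `keep` budget, the `sim_threshold` value, the cosine-similarity test, and the merge-by-averaging rule are all hypothetical choices.

```python
import math


def cosine(u, v):
    """Cosine similarity of two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def compress_tokens(tokens, scores, keep, sim_threshold=0.5):
    """Keep the `keep` highest-scoring tokens; prune or merge the rest.

    Returns (strategy, tokens). Assumed rule: if non-critical tokens are, on
    average, similar to their nearest critical token, merge them into it;
    otherwise drop them.
    """
    order = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    critical = sorted(order[:keep])
    rest = sorted(order[keep:])
    if not rest:
        return "none", [list(tokens[i]) for i in critical]

    # Nearest critical token (position in `critical`) and similarity for each
    # non-critical token.
    nearest = []
    for r in rest:
        sims = [cosine(tokens[r], tokens[c]) for c in critical]
        k = max(range(len(critical)), key=lambda j: sims[j])
        nearest.append((k, sims[k]))

    mean_sim = sum(s for _, s in nearest) / len(nearest)
    if mean_sim < sim_threshold:
        return "prune", [list(tokens[i]) for i in critical]

    # Merge: average each critical token with the tokens assigned to it.
    sums = [list(tokens[c]) for c in critical]
    counts = [1] * len(critical)
    for r, (k, _) in zip(rest, nearest):
        sums[k] = [a + b for a, b in zip(sums[k], tokens[r])]
        counts[k] += 1
    return "merge", [[a / n for a in s] for s, n in zip(sums, counts)]


# Two critical tokens plus two near-duplicates of them: merging is selected.
strategy, kept = compress_tokens(
    [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]],
    scores=[0.9, 0.8, 0.1, 0.2],
    keep=2,
)
print(strategy, kept)
```

Neither branch adds trainable parameters, which matches the property the abstract highlights; how the paper actually scores tokens and measures similarity is not stated in the abstract.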

9. Three-dimensional human pose estimation based on contact pressure.

Author: Yin, Ning, Wang, Ke, Wang, Nian, Tang, Jun, and Bao, Wenxia
Subjects: *COMPUTER vision, *EUCLIDEAN distance, *SURFACE pressure, *RESEARCH personnel, *ANGLES, *DEEP learning
Abstract: Various daily behaviors usually exert pressure on the contact surface, such as lying, walking, and sitting. Obviously, the pressure data from the contact surface contain some important biological information for an individual. Recently, a computer vision task, i.e., pose estimation from contact pressure (PECP), has received more and more attention from researchers. Although several deep learning-based methods have been put forward in this field, they cannot achieve accurate prediction using the limited pressure information. To address this issue, we present a multi-task-based PECP model. Specifically, the autoencoder is introduced into our model for reconstructing input pressure data (i.e., the additional task), which can help our model generate high-quality features for the pressure data. Moreover, both the mean squared error and the spectral angle distance are adopted to construct the final loss function, whose aim is to eliminate the Euclidean distance and angle differences between the prediction and ground truth. Extensive experiments on the public dataset show that our method outperforms existing methods significantly in pose prediction from contact pressure. [ABSTRACT FROM AUTHOR]
Published: 2024
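
A loss that combines mean squared error with the spectral angle distance, as the abstract describes, might look like the sketch below. The additive form and the `angle_weight` parameter are assumptions; the abstract does not say how the two terms are weighted, and the flat-list representation of a pose is used only to keep the example dependency-free.

```python
import math


def mse(pred, target):
    """Mean squared error: penalizes Euclidean distance between the vectors."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def spectral_angle(pred, target, eps=1e-12):
    """Angle in radians between two vectors; 0 when they point the same way."""
    dot = sum(p * t for p, t in zip(pred, target))
    norm = math.sqrt(sum(p * p for p in pred)) * math.sqrt(sum(t * t for t in target))
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cos = max(-1.0, min(1.0, dot / (norm + eps)))
    return math.acos(cos)


def combined_loss(pred, target, angle_weight=1.0):
    """MSE plus a weighted angle term (angle_weight is a hypothetical knob)."""
    return mse(pred, target) + angle_weight * spectral_angle(pred, target)


# A prediction with the right direction but the wrong scale has an angle term
# close to zero, so its loss is dominated by the MSE term.
print(combined_loss([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
```

The two terms are complementary: MSE is sensitive to magnitude errors, while the angle term is scale-invariant and only responds to differences in direction.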

10. Temporal residual neural radiance fields for monocular video dynamic human body reconstruction.

Author: Du, Tianle, Wang, Jie, Xie, Xiaolong, Li, Wei, Su, Pengxiang, and Liu, Jie
Subjects: *VIDEO signals, *VISUAL fields, *COMPUTER graphics, *HUMAN body, *SIGNAL-to-noise ratio
Abstract: In the field of computer vision and graphics, high-quality reconstruction of the human body in static scenes has been achieved in recent years by a single multilayer perceptron (MLP) in a number of approaches. However, MLPs have capacity limitations, requiring substantial training time and computational resources for dynamic scene reconstruction, and the quality of reconstruction is significantly constrained. We propose a method for effectively processing complex spatiotemporal signals in dynamic scene human three-dimensional (3D) modeling. The proposed method uses temporal residual neural radiance fields to achieve novel view rendering and new pose synthesis of human bodies. To address the problem of representing temporal signals in video sequences, we construct a temporal residual field that is not related to the MLP architecture. Second, to improve the reconstruction efficiency, we propose an integrated approach that reduces the trainable parameters and accelerates rendering, thereby enhancing the network's feature representation capability. Finally, we design a multi-dimensional loss function to accurately measure the loss between predicted and actual spatial pixel values. The experimental results show that our proposed approach improves the peak signal-to-noise ratio and structural similarity index accuracy metrics compared with the latest representative methods. It maintains a similar accuracy to Anim-NeRF and Neural Body while achieving a nearly 780-fold increase in time efficiency. [ABSTRACT FROM AUTHOR]
Published: 2024
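
For reference, the peak signal-to-noise ratio mentioned in this abstract is conventionally computed as 10 log10(MAX^2 / MSE). A minimal sketch over flat lists of pixel values follows; the `max_value` default of 1.0 assumes intensities normalized to [0, 1] and is not taken from the paper.

```python
import math


def psnr(pred, target, max_value=1.0):
    """Peak signal-to-noise ratio in decibels; higher means a closer match."""
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# 8-bit example: every pixel is off by 10 gray levels.
print(round(psnr([255, 0], [245, 10], max_value=255), 2))
```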

11. Adaptive sparse attention module based on reciprocal nearest neighbors.

Author: Sun, Zhonggui, Zhang, Can, and Zhang, Mingzhu
Subjects: *COMPUTER vision, *RECOMMENDER systems, *INFORMATION filtering, *SPINE, *RECIPROCITY (Psychology)
Abstract: The attention mechanism has become a crucial technique in deep feature representation for computer vision tasks. Using a similarity matrix, it enhances the current feature point with global context from the feature map of the network. However, the indiscriminate utilization of all information can easily introduce irrelevant content, inevitably hampering performance. In response to this challenge, sparsification, a common information filtering strategy, has been applied in many related studies. Regrettably, their filtering processes often lack reliability and adaptability. To address this issue, we first define an adaptive-reciprocal nearest neighbors (A-RNN) relationship. In identifying neighbors, it gains flexibility through learning adaptive thresholds. In addition, by introducing a reciprocity mechanism, the reliability of neighbors is ensured. Then, we use A-RNN to rectify the similarity matrix in the conventional attention module. In the specific implementation, to distinctly consider non-local and local information, we introduce two blocks: the non-local sparse constraint block and the local sparse constraint block. The former utilizes A-RNN to sparsify non-local information, whereas the latter uses adaptive thresholds to sparsify local information. As a result, an adaptive sparse attention (ASA) module is achieved, inheriting the advantages of flexibility and reliability from A-RNN. To validate the proposed ASA module, we use it to replace the attention module in NLNet and conduct experiments on semantic segmentation benchmarks including Cityscapes, ADE20K, and PASCAL VOC 2012. With the same backbone (ResNet101), our ASA module outperforms the conventional attention module and some of its state-of-the-art variants. [ABSTRACT FROM AUTHOR]
Published: 2024
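
The reciprocity idea in this abstract (a pair is kept only if each point counts the other as a neighbor) can be sketched on a small similarity matrix. This is an illustrative reading, not the A-RNN definition from the paper: the per-row `thresholds` are fixed numbers here, whereas the abstract says they are learned, and the masked softmax is an assumed way of rectifying the attention weights.

```python
import math


def reciprocal_mask(sim, thresholds):
    """mask[i][j] is True only if j passes i's threshold AND i passes j's."""
    n = len(sim)
    neighbor = [[sim[i][j] >= thresholds[i] for j in range(n)] for i in range(n)]
    return [[neighbor[i][j] and neighbor[j][i] for j in range(n)] for i in range(n)]


def sparse_attention_weights(sim, thresholds):
    """Row-wise softmax restricted to reciprocal neighbors."""
    mask = reciprocal_mask(sim, thresholds)
    weights = []
    for i, row in enumerate(sim):
        exps = [math.exp(s) if mask[i][j] else 0.0 for j, s in enumerate(row)]
        total = sum(exps)
        weights.append([e / total if total else 0.0 for e in exps])
    return weights


# Point 2 accepts point 1 (0.6 >= 0.5), but point 1 does not accept point 2
# (0.6 < 0.7), so the pair is dropped in both directions.
sim = [
    [1.0, 0.8, 0.2],
    [0.8, 1.0, 0.6],
    [0.2, 0.6, 1.0],
]
thresholds = [0.5, 0.7, 0.5]
mask = reciprocal_mask(sim, thresholds)
weights = sparse_attention_weights(sim, thresholds)
print(mask)
```

A one-sided threshold would have kept the link from point 2 to point 1; requiring agreement from both sides is what filters it out.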

12. Stega4NeRF: cover selection steganography for neural radiance fields.

Author: Dong, Weina, Liu, Jia, Chen, Lifeng, Sun, Wenquan, and Pan, Xiaozhong
Subjects: *COMPUTER vision, *RADIANCE, *RADIATION, *CLASSIFICATION, *VIDEOS
Abstract: The implicit neural representation of visual data (such as images, videos, and 3D models) has become a current hotspot in computer vision research. This work proposes a cover selection steganography scheme for neural radiance fields (NeRFs). The message sender first trains a NeRF model, selecting any viewpoint in 3D space as the viewpoint key Kv, to generate a unique secret viewpoint image. Subsequently, a message extractor is trained using overfitting to establish a one-to-one mapping between the secret viewpoint image and the secret message. To address the issue of securely transmitting the message extractor in traditional steganography, the message extractor is concealed within a hybrid model performing standard classification tasks. The receiver possesses a shared extractor key Ke, which is used to recover the message extractor from the hybrid model. Then the secret viewpoint image is obtained by NeRF through the viewpoint key Kv, and the secret message is extracted by inputting it into the message extractor. Experimental results demonstrate that the trained message extractor achieves high-speed steganography with a large capacity and attains a 100% message embedding. Additionally, the vast viewpoint key space of NeRF ensures the concealment of the scheme. [ABSTRACT FROM AUTHOR]
Published: 2024

13. Reconstructing images with attention generative adversarial network against adversarial attacks.

Author: Shen, Xiong, Lu, Yiqin, Cheng, Zhe, Mao, Zhongshu, Yang, Zhang, and Qin, Jiancheng
Subjects: *GENERATIVE adversarial networks, *COMPUTER vision, *VISUAL fields, *PROBLEM solving, *ALGORITHMS, *DEEP learning
Abstract: Deep learning is widely used in the field of computer vision, but the emergence of adversarial examples threatens its application. How to effectively detect adversarial examples and correct their labels has become an open problem in this field. Generative adversarial networks (GANs) can effectively learn features from images. Based on GANs, this work proposes a defense method called "Reconstructing images with GAN" (RIG). Adversarial examples generated by attack algorithms are reconstructed by the trained generator of RIG to eliminate the perturbations that mislead classification models, so that the models can restore the correct labels when classifying the reconstructed images. In addition, an attention mechanism (AM) is introduced to improve the defensive performance of RIG; the resulting method is called reconstructing images with attention GAN (RIAG). Experiments show that RIG and RIAG can effectively eliminate the perturbations of the adversarial examples. The results also show that RIAG has a better defensive performance than RIG in eliminating the perturbations of adversarial examples, which indicates that the introduction of AM can effectively improve the defense effect of RIG. [ABSTRACT FROM AUTHOR]
Published: 2024

14. Monocular 3D object detection for distant objects.

Author: Li, Jiahao and Han, Xiaohong
Subjects: *OBJECT recognition (Computer vision), *MONOCULAR vision, *STEREOSCOPIC cameras, *COMPUTER vision, *AUTONOMOUS vehicles
Abstract: Autonomous driving represents the future of transportation, and the precise detection of three-dimensional (3D) objects is a fundamental requirement for achieving autonomous driving capabilities. Presently, 3D object detection primarily relies on sensors, such as monocular cameras, stereo cameras, and LiDAR technology. In comparison to stereo cameras and LiDAR, monocular 3D object detection offers the advantages of a wider field of view and reduced cost. However, the existing monocular 3D object detection techniques exhibit limitations in terms of accuracy, particularly when detecting distant objects. To tackle this challenge, we introduce an innovative approach for monocular 3D object detection, specifically tailored for distant objects. The proposed method classifies objects into distant and nearby categories based on the initial depth estimation, employing distinct feature enhancement and refinement modules for each category. Subsequently, it extracts 3D features and, ultimately, derives precise 3D detection bounding boxes. Experimental results using the KITTI dataset demonstrate that this approach substantially enhances the detection accuracy of distant objects while preserving the detection efficacy for nearby objects. [ABSTRACT FROM AUTHOR]
Published: 2024

15. Robust classification with noisy labels using Venn–Abers predictors.

Author: Lemghari, Ichraq, Le Hégarat-Mascle, Sylvie, Aldea, Emanuel, and Vandoni, Jennifer
Subjects: *IMAGE recognition (Computer vision), *COMPUTER vision, *CLASSIFICATION, *NOISE, *ANNOTATIONS
Abstract: The advent of deep learning methods has led to impressive advances in computer vision tasks over the past decades, largely due to their ability to extract non-linear features that are well adapted to the task at hand. For supervised approaches, data labeling is essential to achieve a high level of performance; however, this task can be so fastidious or even troublesome in difficult contexts (e.g., specific defect detection, unconventional data annotations, etc.) that experts can sometimes erroneously provide the wrong ground truth label. Considering classification problems, this paper addresses the issue of handling noisy labels in datasets. Specifically, we first detect the noisy samples of a dataset using set-valued labels and then improve their classification using Venn–Abers predictors. The obtained results reach more than 0.99 and 0.90 accuracy for noisified versions of two widely used image classification datasets, digit MNIST and CIFAR-10 respectively with a 40% two-class pair-flip noise ratio and 0.87 accuracy for CIFAR-10 with 10-class uniform 40% noise ratio. [ABSTRACT FROM AUTHOR]
Published: 2024

16. Flexible machine/deep learning microservice architecture for industrial vision-based quality control on a low-cost device.

Author: Toigo, Stefano, Kasi, Brendon, Fornasier, Daniele, and Cenedese, Angelo
Subjects: *COMPUTER vision, *MACHINE learning, *INDUSTRIAL architecture, *QUALITY control, *PHOTOELECTRIC cells, *DEEP learning
Abstract: This paper aims to delineate a comprehensive method that integrates machine vision and deep learning for quality control within an industrial setting. The proposed innovative approach leverages a microservice architecture that ensures adaptability and flexibility to different scenarios while focusing on the employment of affordable, compact hardware, and it achieves exceptionally high accuracy in performing the quality control task and keeping a minimal computation time. Consequently, the developed system operates entirely on a portable smart camera, eliminating the need for additional sensors such as photocells and external computation, which simplifies the setup and commissioning phases and reduces the overall impact on the production line. By leveraging the integration of the embedded system with the machinery, this approach offers real-time monitoring and analysis capabilities, facilitating the swift detection of defects and deviations from desired standards. Moreover, the low-cost nature of the solution makes it accessible to a wider range of manufacturing enterprises, democratizing quality processes in Industry 5.0. The system was successfully implemented and is fully operational in a real industrial environment, and the experimental results obtained from this implementation are presented in this work. [ABSTRACT FROM AUTHOR]
Published: 2024

17. Generic neural architecture search toolkit for efficient and real-world deployment of visual inspection convolutional neural networks in industry.

Author: Pižurica, Nikola, Pavlović, Kosta, Kovačević, Slavko, Jovančević, Igor, and de Prado, Miguel
Subjects: *CONVOLUTIONAL neural networks, *INSPECTION & review, *COMPUTER vision, *INDUSTRIAL efficiency, *MANUFACTURING processes
Abstract: Visual inspection plays a pivotal role in numerous industrial production processes, and the pursuit of automation has surged with the rise of deep learning and convolutional neural networks (CNNs). Therein, the deployment of visual inspection CNNs on resource-constrained edge devices stands as a critical problem as these devices are the most affordable and well-suited for many industrial applications, e.g., production chains. Nonetheless, it faces challenges in meeting the computational demands of deep CNN models. Consequently, optimizing these models for efficient operation in such settings is imperative. Visual inspection tasks are often highly specialized, differing significantly from general computer vision tasks. As a result, state-of-the-art CNNs can be excessively large for achieving high accuracy on these specific datasets. To address this challenge, this paper introduces a novel approach utilizing neural architecture search (NAS) and hyperparameter optimization. We present the generic toolkit for NAS (GT-NAS), an open-source toolkit available for public use on GitLab (https://gitlab.com/pmf5/open-source/generic-toolkit-for-neural-architecture-search). We showcase the results of applying our methodology to two established state-of-the-art CNN models designed for surface defect detection, a problem that encompasses binary classification and segmentation of images. Our approach yields significantly smaller models relative to baselines, but with accuracy in line with the current state-of-the-art results, demonstrating the potential for enhanced efficiency in industrial visual inspection systems. In one experimental setting (optimizing the Mixed Supervision model on the KolektorSDD2 dataset), GT-NAS produced an architecture that is 6.2 times faster than the baseline while sacrificing only 0.25% of its average precision for binary classification. In another batch of experiments (optimizing the TriNet model on the SensumSODF dataset), GT-NAS also achieved remarkable results. It found a TriNet architecture five times smaller than the baseline, at a small cost of a 0.25% drop in the ROC-AUC classification score on the capsule subset of the SensumSODF dataset. Furthermore, on the softgel subset of the same dataset, GT-NAS produced a model that was 2.7 times smaller than the baseline, yet 0.19% more precise. [ABSTRACT FROM AUTHOR]
Published: 2024

18. Posture recognition method of duty personnel based on human posture key points and convolutional neural network.

Author: Deng, Xiang-Yu, Sheng, Ying, Pei, Hao-Yuan, and Fan, You-Min
Subjects: *CONVOLUTIONAL neural networks, *FEATURE extraction, *INDUSTRIAL efficiency, *POSTURE, *OBJECT recognition (Computer vision)
Abstract: To guarantee the safety and efficiency of industrial production and prevent accidents or losses caused by personnel negligence, this work proposes a personnel on-duty status recognition method. The method combines a human pose estimation algorithm and a target detection algorithm, which can automatically discriminate six states of personnel on duty. First, the original image is processed using a high-resolution network (HRNet) to generate human pose keypoint maps. Then SE-VGG16 is constructed by combining the squeeze-excitation network and VGG16 for feature extraction of human pose keypoint maps. Finally, a lightweight convolutional neural network is designed for primary classification, and you only look once version 5 is used to reclassify behaviors with similar action features. The experimental results show that the method has an average recognition accuracy of 98.27% with good robustness and generalization ability for six kinds of personnel on-duty status in multiple environments. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

19. Enhancement of dark areas on the surface of scrap metals based on RGB-NIR image fusion.

Author: Ma, Tingtian, Ye, Wenhua, Li, Xinying, and He, Huanmin
Subjects: *IMAGE fusion, *SCRAP metals, *METALLIC surfaces, *COMPUTER vision, *IMAGE denoising
Abstract: The application of machine vision in object identification and classification has significantly enhanced recognition efficiency. Nevertheless, for non-ferrous scrap metals with poor surface smoothness, the unevenness of reflected light results in the generation of dark regions in the images, obscuring a considerable amount of detailed information and reducing the recognition accuracy. Addressing these challenges, we propose a method for enhancing the details of dark regions based on the RGB-NIR image fusion theory, integrating detailed information from NIR images into RGB images. First, a robust deep residual denoising network is constructed to estimate and remove noise in images. Subsequently, to address the difficulty of extracting structural features in dark regions, a multi-scale spatial deep structure feature extraction module based on channel attention blocks is developed. This module effectively extracts the structural features of RGB and NIR image pairs, with the target image serving as the supervisory signal. Finally, guided by the theory of structural inconsistency, multi-scale feature maps are fused. The image fusion network adopts an encoder-decoder architecture embedded with residual channel attention blocks. The experimental results indicate that the approach proposed in this study demonstrates notable efficacy in image denoising and detail enhancement. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

20. Enhancing the transferability of adversarial examples on vision transformers.

Author: Guan, Yujiao, Yang, Haoyu, Qu, Xiaotong, and Wang, Xiaodong
Subjects: *TRANSFORMER models, *ARTIFICIAL neural networks, *IMAGE recognition (Computer vision)
Abstract: The advancement of adversarial attack techniques, particularly against neural network architectures, is a crucial area of research in machine learning. Notably, the emergence of vision transformers (ViTs) as a dominant force in computer vision tasks has opened avenues for exploring their vulnerabilities. In this context, we introduce dual gradient optimization for adversarial transferability (DGO-AT), a comprehensive strategy designed to enhance the transferability of adversarial examples in ViTs. DGO-AT incorporates two innovative components: attention gradient smoothing (AGS) and multi-layer perceptron gradient random dropout (GRD-MLP). AGS targets the attention layers of ViTs to smooth gradients and reduce noise, focusing on global features for improved transferability. GRD-MLP, on the other hand, introduces stochasticity into MLP gradient updates, broadening the adversarial examples' applicability. The synergy of these strategies in DGO-AT addresses the unique structural aspects of ViTs, leading to more effective and transferable adversarial attacks. Our comprehensive evaluations of a variety of ViT and CNN models, using the ImageNet dataset, demonstrate that DGO-AT significantly enhances the effectiveness and transferability of attacks, thereby contributing to the ongoing discourse on the adversarial robustness of advanced neural network models. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

21. LET: a local enhancement transformer for low-light image enhancement.

Author: Pan, Lei, Tian, Jun, Zheng, Yuan, Fu, Qiang, and Zhao, Zhiqing
Subjects: *TRANSFORMER models, *IMAGE intensifiers, *FEATURE extraction, *COMPUTER vision, *DIGITAL images, *CUBES
Abstract: Digital images captured under insufficient lighting conditions may suffer from issues such as low contrast and poor visual quality. However, transformers treat images as one-dimensional sequential data, lacking the modeling of local visual structures, thus resulting in a shortage of feature extraction for degraded low-light images. In addition, transformer-based methods require longer training schedules to achieve better performance. We introduce a novel method, named local enhancement transformer (LET). By incorporating convolutions into transformer blocks, we improve our model's capability to extract features from degraded low-light images. Furthermore, we propose a multi-level enhancement block to adaptively fuse features with learnable correlations among different levels. With the support of these two designs, LET can extract more useful features while requiring less training time. Experimental evaluations conducted on LOL and MIT-5K datasets prove that LET is superior to the state-of-the-art. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

22. Assisting RGB and depth salient object detection with nonconvolutional encoder: an improvement approach.

Author: Zhang, Shuo, Song, Mengke, and Li, Luming
Subjects: *CONVOLUTIONAL neural networks, *COMPUTER vision, *TRANSFORMER models, *VIDEO coding, *MULTIMODAL user interfaces
Abstract: RGB-D salient object detection is a challenging task in computer vision, and deep architectures have been widely adopted in the previous studies. However, current convolutional neural network (CNN)-based models struggle with capturing global long-distance features efficiently, whereas transformer-based methods are computationally intensive. To address these limitations, we propose a nonconvolutional feature encoder. This encoder captures long-distance dependencies while reducing computation costs, making it a potential alternative to CNNs and transformers. Additionally, we introduce a spatial info enhancing mechanism to overcome weakened local information while capturing long-range dependencies. This mechanism balances local and global information at different expansion rates by exploring multiscale feature fusion in the feature maps. Furthermore, we introduce a spatial info sensing module to enhance the compatibility of multimodal features in long-range dependencies and extract informative cues from depth features. Through comprehensive experiments on four widely used datasets, we demonstrate that our proposed involution encoder significantly outperforms previous state-of-the-art RGB-D salient object detection methods based on CNNs in four key metrics. Compared to transformer-based methods, our approach balances speed and efficiency favorably. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

23. Double-attention mechanism-based segmentation grasping detection network.

Author: Li, Qinghua, Wang, Xuyang, Zhang, Kun, Yang, Yiran, and Feng, Chao
Subjects: *ARTIFICIAL neural networks, *PREHENSION (Physiology), *COMPUTER vision, *THUMB
Abstract: In practical scenarios, detecting and grasping objects accurately can be very challenging due to the uncertainty of their positions and orientations, as well as environmental interference. Especially when the target object is occluded by other objects, traditional machine vision methods have difficulty in accurately recognizing it. To address this problem, we propose the double-attention mechanism-based segmentation grasping detection network (DAM-SGNET). DAM-SGNET is a technique used for detecting and grasping objects accurately in cluttered environments. It utilizes a deep neural network that incorporates two attention mechanisms to predict the optimal grasping posture for RGB images at the pixel level without relying on depth images. The method begins by reannotating datasets, such as the Cornell dataset, cluttered scenes objects dataset, and VMRD dataset, with a new labeling method proposed by previous researchers. These datasets are then used to train an occlusion detection model. DAM-SGNET uses a residual network (SERESNET) with channel attention mechanisms to extract features from the images, and an adaptive decoder including a feature pyramid deformation network and an efficient channel attention module to enhance robustness in cluttered, unstructured open environments. DAM-SGNET ultimately achieves grasp detection accuracy of 99.43%, 99.24%, and 85.38% for the official Cornell grasp dataset, the cluttered scenes grasping dataset, and the VMRD grasping dataset, respectively. Real-world experiments demonstrate the efficacy of DAM-SGNET in self-built robotic arm platforms, achieving a single-target grasping success rate of 99.6%, and an average grasping success rate of 96.46% for cluttered stacked objects. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

24. EBiDA-FPN: enhanced bi-directional attention feature pyramid network for object detection.

Author: Yang, Xiaobao, He, Yulong, Wu, Junsheng, Wang, Wentao, Sun, Wei, Ma, Sugang, and Hou, Zhiqiang
Subjects: *OBJECT recognition (Computer vision), *CONVOLUTIONAL neural networks, *PYRAMIDS, *COMPUTER vision
Abstract: As a fundamental task in computer vision, object detection has long been a challenging visual task. However, current object detection models lack attention to salient features when fusing the lateral connections and top-down information flows in feature pyramid networks (FPNs). To address this, we propose a method for object detection based on an enhanced bi-directional attention feature pyramid network, which aims to enhance the feature representation capability of lateral connections and top-down links in FPN. This method adopts the triplet module to give attention to salient features in the original multi-scale information in spatial and channel dimensions, establishing an enhanced triplet attention. In addition, it introduces improved top and down attention to fuse contextual information using the correlation of features between adjacent scales. Furthermore, adaptively spatial feature fusion and self-attention are introduced to expand the receptive field and improve the detection performance of deep levels. Extensive experiments conducted on the PASCAL VOC, MS COCO, KITTI, and CrowdHuman datasets demonstrate that our method achieves performance gains of 1.8%, 0.8%, 0.5%, and 0.2%, respectively. These results indicate that our method has significant effects and is competitive compared with advanced detectors. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

25. Pola4All: survey of polarimetric applications and an open-source toolkit to analyze polarization.

Author: Rodriguez, Joaquin, Lew-Yan-Voon, Lew-Fock-Chong, Martins, Renato, and Morel, Olivier
Subjects: *POSE estimation (Computer vision), *OPTICAL polarization, *COMPUTER vision, *IMAGE processing, *UNDERWATER navigation, *MICROGRIDS
Abstract: Polarization information of the light can provide rich cues for computer vision and scene understanding tasks, such as the type of material, pose, and shape of the objects. With the advent of new and cheap polarimetric sensors, this imaging modality is becoming accessible to a wider public for solving problems, such as pose estimation, 3D reconstruction, underwater navigation, and depth estimation. However, we observe several limitations regarding the usage of this sensorial modality, as well as a lack of standards and publicly available tools to analyze polarization images. Furthermore, although polarization camera manufacturers usually provide acquisition tools to interface with their cameras, they rarely include processing algorithms that make use of the polarization information. In this work, we review recent advances in applications that involve polarization imaging, including a comprehensive survey of recent advances on polarization for vision and robotics perception tasks. We also introduce a complete software toolkit that provides common standards to communicate with and process information from most of the existing micro-grid polarization cameras on the market. The toolkit also implements several image processing algorithms for this modality, and it is publicly available on GitHub. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

26. Zernike moment invariants for hand vein pattern description from raw biometric data.

Author: Castro-Ortega, Raúl, Toxqui-Quitl, Carina, Padilla-Vivanco, Alfonso, Solís-Villarreal, Jose Francisco, and Orozco-Guillén, Eber Enrique
Subjects: *BIOMETRIC identification, *VEINS, *AFFINE transformations, *IMAGING systems, *IMAGE recognition (Computer vision)
Abstract: We propose an invariant description method based on Zernike moments to classify hand vein patterns from raw infrared (IR) images. Orthogonal moments provide linearly independent descriptors and are invariant to affine transformations, such as translation, rotation, and scaling. A mathematical expression is given to derive a set of moment invariants. The obtained features have all the properties of moment invariants with the additional feature of image contrast invariance. For dorsal hand vein pattern acquisition, an IR imaging system is implemented. Also, a public database is used for a palm vein recognition task. A correct rate classification (CRC) above 99.9% is achieved using a set of rotation, scale, and intensity Zernike moment invariants. Additionally, multilayer perceptron and K-nearest neighbors are used as classifiers having as input data the Zernike normalized moments. A discriminative feature evaluation of the image moments allows the reduction of the number of descriptors while maintaining a high classification rate of 99%. The efficiency of the moment descriptors is evaluated in terms of accuracy and reduced computational cost by (a) avoiding the necessity of a preprocessing stage and (b) reducing the feature vector dimension. Experimental results show that Zernike moment invariants are able to achieve hand vein recognition without image preprocessing or image normalization with respect to change of size, rotation, and intensity. [ABSTRACT FROM AUTHOR]
Published: 2019
Full Text: View/download PDF

27. Block-based automatic road defect recognition approach.

Author: Chen, Junde, Haq, Anwar Ul, and Zhang, Defu
Subjects: *ARTIFICIAL neural networks, *RANDOM walks, *IMAGE segmentation, *IMAGE analysis, *ROAD maintenance, *PATTERN recognition systems, *COMPUTER vision
Abstract: Efficient maintenance of road networks is an important factor for the smooth flow of traffic and is highly dependent on the timely assessment of road conditions. Road defects such as cracks and potholes cause significant safety and economic problems. Automation of these activities is vital for efficient and cost-effective maintenance of road networks. Image analysis techniques have widely been used and proven to be effective in fields such as medical image processing, face recognition, pattern recognition, and computer vision. We propose a two-stage modular approach for road defect recognition. The first module deals with image segmentation using a variant of random walk algorithm, which employs boost C-means clustering for selection of initial seeds. The second module conducts block-based image classification using an enhanced artificial neural network, which utilizes genetic algorithm for weight optimization. Experimental results show that the proposed approach is effective for the automatic recognition of pavement defects. The accuracy is not less than 73%, even when multiple defect categories are considered. [ABSTRACT FROM AUTHOR]
Published: 2019
Full Text: View/download PDF

28. Real-time object detection algorithm based on improved YOLOv3.

Author: Zhang, Xiuling, Dong, Xiaopeng, Wei, Qijun, and Zhou, Kaixuan
Subjects: *K-means clustering, *COMPUTER vision, *ALGORITHMS, *FEATURE extraction
Abstract: Object detection is a challenging computer vision problem with numerous practical applications. Due to low accuracy and slow detection speed in object detection, we propose a real-time object detection algorithm based on YOLOv3. First, to solve the problem that features are likely to be lost in the feature extraction process of YOLOv3, a DB-Darknet-53 feature extraction network embedded in inception structure is designed, which effectively reduces the loss of features. Second, the detection network of YOLOv3 and the reuse of deep features in multiscale detection network are improved. Finally, the numbers and sizes of anchor boxes are selected by K-means clustering analysis, and the detection model is obtained by means of multiscale training. The improved algorithm has a mean average precision of 0.835 on the PASCAL VOC data set and a detection speed of 35.8 frames/s, which is better than YOLOv3. [ABSTRACT FROM AUTHOR]
Published: 2019
Full Text: View/download PDF

29. Road detection using cycle-consistent adversarial networks.

Author: Wang, Yucheng, Zhang, Juan, Jiang, Hao, and Fang, Zhijun
Subjects: *COMPUTER vision, *DEEP learning, *ROADS
Abstract: Today, road detection is still a challenging task in intelligent driving. With the continuous improvement of computer vision, many methods of deep learning are used for road detection because they can achieve image features at a deeper level and discover road areas from raw RGB data. However, the method to detect the road areas accurately needs to be improved. We present a method that can extract the road area features and complete road detection tasks. Our method mainly includes the following points: (1) to introduce the cycle-consistent adversarial network to extract the road area features in a picture and complete image to image conversion and (2) to complete road detection by adding a new model and to improve the accuracy of detection. The results of our method are evaluated by uploading them to the Karlsruhe Institute of Technology and Toyota Technological Institute road detection benchmark under the name “road detection cycle-consistent adversarial networks.” Our method achieves an overall max F-measure of 88.63% and precision of 91.35%. In addition to high precision, our method also has a good robustness. Meanwhile, the accuracy for narrow road areas needs to be optimized in the future. [ABSTRACT FROM AUTHOR]
Published: 2019
Full Text: View/download PDF

30. Pose-invariant three-dimensional face reconstruction.

Author: Jiang, Lei, Wu, Xiao-Jun, and Kittler, Josef
Subjects: *IMAGE reconstruction algorithms, *POSE estimation (Computer vision), *IMAGE reconstruction, *VISUAL fields, *COMPUTER vision
Abstract: Three-dimensional (3-D) face reconstruction is an important task in the field of computer vision. Although 3-D face reconstruction has been developing rapidly in recent years, large pose face reconstruction is still a challenge. That is because much of the information about a face in a large pose will be unknowable. In order to address this issue, we propose a 3-D face reconstruction algorithm (PIFR) based on 3-D morphable model. A model alignment formulation is developed in which the original image and a normalized frontal image are combined to define a weighted loss in a landmark fitting process, with the intuition that the original image provides more expression and pose information, whereas the normalized image provides more identity information. Our method addresses the difficulty that traditional methods have in reconstructing a face from a single image in a large pose, works on arbitrary pose and expressions, and greatly improves the accuracy of reconstruction. Experiments on the challenging AFW, LFPW, and AFLW database show that our algorithm significantly improves the accuracy of 3-D face reconstruction even under extreme poses (±90° yaw angles). [ABSTRACT FROM AUTHOR]
Published: 2019
Full Text: View/download PDF

31. Intelligent terminal face spoofing detection algorithm based on deep belief network.

Author: Li, Yuancheng, Wang, Yuanyuan, Hao, Shuhua, and Zhao, Xiaoyu
Subjects: *HUMAN facial recognition software, *DEEP learning, *OPTICAL flow, *COMPUTER vision, *COMPUTER performance, *ALGORITHMS
Abstract: In recent years, face recognition has rapidly developed in the field of smartphones and control systems. It has been used to unlock the telephone and face-payment applications. With such rapid development, more and more demands are placed on the security of face recognition. However, face biometric-based recognition technologies are still vulnerable to spoofing attacks. Thus, developing robust and reliable antispoofing attack detection is critical to guarantee the security of facial analysis-based authentication. As deep learning techniques have achieved satisfactory performances in computer vision, they can also be employed for face spoofing detection. We present a multichannel linear local binary pattern optimization algorithm, which combines with Lucas–Kanade optical flow algorithm. The extracted facial features are fused and sent to a deep belief network classifier for classification and learning, and finally tested on the MSU-mobile facial spoofing database. Compared with existing deep learning-based detection methods, our face spoofing detection algorithm has better scalability and robustness. To evaluate the performance of the proposed algorithm, the experiments are conducted on three crossed standard spoofing databases and excellent performance is also achieved. [ABSTRACT FROM AUTHOR]
Published: 2019
Full Text: View/download PDF

32. Joint feature fusion and optimization via deep discriminative model for mobile palmprint verification.

Author: Izadpanahkakhk, Mahdieh, Razavi, Seyyed Mohammad, Taghipour-Gorjikolaie, Mehran, Zahiri, Seyyed Hamid, and Uncini, Aurelio
Subjects: *COMPUTER vision, *ERROR rates, *BIG data
Abstract: With recent advances in pattern recognition and computer vision, mobile palmprint authentication has become an emerging field to provide better facilities and ubiquitous computing for scientific and commercial communities. To effectively streamline this issue, researchers focus on improving authentication performance by designing deep convolutional neural networks. Despite the high potential of the state-of-the-art methods, the challenges of preprocessing computation cost, lack of training samples for big data application, and discriminative feature optimization remain to be carefully addressed. A deep mobile palmprint verification framework focusing on discriminative feature representation is proposed. To this end, an automatic feature mapping is learned from two well-known deep architectures via an effective weighted loss function. Thereafter, a convolution-based feature fusion block is followed by a surrogate model in the feature-matching phase for palmprint verification. From a practical point of view, our framework is cost-effective and can represent discriminative features with high performance. We demonstrate the effectiveness of our framework and mobile database for palmprint verification task beating the state-of-the-art on standard benchmarks. Moreover, experimental results show that our model outperforms previous ones, especially for the few-shot learning application, achieving equal error rates of 0.0281% and 0.0197% for IIT Delhi Touchless Palmprint Database and Hong Kong PolyU Palmprint databases, respectively. It is notable that all codes are open-source and may be accessed online. [ABSTRACT FROM AUTHOR]
Published: 2019
Full Text: View/download PDF

33. Development and analysis of a real-time system for automated detection of improvised explosive device indicators from ground vehicles.

Author: de Wouw, Dennis W. J. M. van, Haar, Frank B. ter, Dubbelman, Gijs, and de With, Peter H. N.
Subjects: *IMPROVISED explosive devices, *GRAPHICAL user interfaces, *SYSTEM analysis, *STEREOSCOPIC cameras, *REAL-time computing, *IN-vehicle computing
Abstract: We propose a real-time change detection system to be used as a vehicle-mounted early-warning system for indicators of improvised explosive devices. Within the context of military route clearance, the system automatically detects suspicious changes in the environment with respect to a previous patrol. For this purpose, historical images of the live scene are retrieved from a database and registered to the live image through 2.5-D view synthesis, using the three-dimensional (3-D) scene geometry acquired from a stereo camera. Changes are then found using local-area statistics in the CIE-Lab color space. A set of spatiotemporal filters is used to reject irrelevant alarms, resulting in a limited set of confident changes to be presented to the operator through an interactive graphical user interface. Next to the algorithmic contributions, we elaborate on the real-time design, featuring graphical processing units for the most time-consuming processing tasks, a pipelined architecture to increase the system throughput, and we split the system into a live and offline processing chain. This way, real-time change detection at 3.5 fps is achieved on images of 1920 × 1440 pixels. Finally, an extensive system validation featuring realistic experiments shows promising detection capabilities and robustness to, e.g., lateral displacements of up to 6 m. © 2019 SPIE and IS&T [DOI: 10.1117/1.JEI.28.4.043009] [ABSTRACT FROM AUTHOR]
Published: 2019
Full Text: View/download PDF

34. Disparity estimation using multilevel and global information.

Author: Zhang, Yaru, Lin, Hongbin, Wu, Chao, and Liu, Bin
Subjects: *STEREO vision (Computer science), *PATTERN recognition systems, *COMPUTER vision
Abstract: Deep convolutional neural networks have shown prominent performance in stereo matching. However, current network architectures lack performance in exploiting context and global information to find corresponding points in ill-posed regions. A stereo matching network without postprocessing is proposed to solve this problem. This network combines the improved multilevel feature pyramid pooling module with the light two-dimensional (2-D) convolution subnetwork to efficiently utilize multilevel information and global information. In the multilevel feature pyramid pooling module, the base image feature is extracted by cascading three small convolution filters. Features of a stereo image pair are calculated by hierarchically fusing and pooling feature information of the same scale after using the residual network. Multilevel semantic information is fully utilized to improve the robustness of image feature representation in multilevel feature pyramid pooling module. In the light 2-D convolution subnetwork, low-level structural information is obtained from the target image by three concatenated convolution layers with small convolution filters. Low-level information is used to rectify matching cost with global view to improve matching accuracy. The experimental results on the Scene Flow dataset, the MPI Sintel dataset, and the Middlebury dataset show that the performance obtained by the proposed network can be improved in the ill-posed regions. Matching accuracy is competitive compared to other results obtained by end-to-end networks without postprocessing. [ABSTRACT FROM AUTHOR]
Published: 2019
Full Text: View/download PDF

35. Object codetection based on a higher-order conditional random field.

Author: Jiang, Linfeng, Zhong, Weilin, Ji, Jinsheng, and Xiong, Huilin
Subjects: *CONDITIONAL random fields, *OBJECT recognition (Computer vision), *OPTICAL engineering, *OBJECT tracking (Computer vision)
Abstract: In recent years, codetecting objects through the use of contextual information across multiple images has attracted considerable attention. We introduce an object codetection method that exploits contextual information among multiple images through a higher-order conditional random field (CRF). First, we obtain object candidates from each image of a test set by using a pretrained detector. Second, we feed the object candidates into a higher-order CRF that captures the appearance similarity using pairwise potentials and object category cooccurrence constraints using higher-order potentials. Finally, we jointly predict the category labels of all object candidates through the mean field inference in the CRF. Experimental results on the Caltech Pedestrian, PASCAL VOC 2007, PASCAL VOC 2012, and COCO datasets demonstrate the effectiveness of the proposed method compared to the baseline method. [ABSTRACT FROM AUTHOR]
Published: 2019
Full Text: View/download PDF

36. Layered approach for improving the quality of free-viewpoint depth-image-based rendering images.

Author: Smirnov, Sergey, Battisti, Federica, and Gotchev, Atanas
Subjects: *COLOR, *COMPUTER vision, *OPTICAL engineering
Abstract: In free-viewpoint rendering systems, one of the most challenging goals is the creation of virtual views based on available color texture (RGB) and depth data. Conventional depth-image-based rendering (DIBR) approaches have assumed that the virtual camera can only be displaced horizontally, thus leading to fairly simple disocclusion artifacts. However, in free-viewpoint DIBR, the virtual camera can be positioned in an arbitrary way and the respective disocclusion artifacts can exhibit complicated anisotropic appearances. Consequently, conventional approaches for compensating disocclusion holes usually fail in such arbitrary camera motion. We present a disocclusion compensation technique based on texture inpainting. We propose a layered representation of both the color and depth images in local foreground, background, and undefined segments (a trimap). This representation allows for employing an efficient alpha-matting approach for reconstructing the underlying opacity layer followed by a background compensation and layered rendering. The performance of the proposed method is evaluated with respect to the state-of-the-art through objective and subjective tests. The achieved results, especially for large camera displacements, outperform the state-of-the-art. Those results assess the effectiveness of the proposed method and highlight the need for new quality metrics able to address the impairments of this type of content. [ABSTRACT FROM AUTHOR]
Published: 2019
Full Text: View/download PDF

37. Modeling realistic optical aberrations to reuse existing drive scene recordings for autonomous driving validation.

Author: Lehmann, Matthias, Wittpahl, Christian, Zakour, Hatem Ben, and Braun, Alexander
Subjects: *OPTICAL aberrations, *COMPUTER software reusability, *OPTICAL transfer function, *DRIVER assistance systems, *TECHNOLOGY, *FOURIER transform optics
Abstract: Training autonomous vehicles requires lots of driving sequences in all situations. Collecting and labeling these drive scenes is a very time-consuming and expensive process. Currently, it is not possible to reuse these drive scenes with different optical properties, because there exists no numerically efficient model for the transfer function of the optical system. We present a numerical model for the point spread function (PSF) of an optical system that can efficiently model both experimental measurements and lens design simulations of the PSF. The numerical basis for this model is a nonlinear regression of the PSF with an artificial neural network. The novelty lies in the portability and the parameterization of this model. We present a lens measurement series, yielding a numerical function for the PSF that depends only on the parameters defocus, field, and azimuth. By convolving existing images and videos with this PSF, we generate images as if seen through the measured lens. The methodology applies to any optical scenario, but we focus on the context of autonomous driving, where the quality of the detection algorithms depends directly on the optical quality of the used camera system. With this model, it is possible to reuse existing recordings, with the potential to avoid millions of test drive miles. The parameterization of the optical model allows for a method to validate the functional and safety limits of camera-based advanced driver assistance systems based on the real, measured lens actually used in the product. [ABSTRACT FROM AUTHOR]
Published: 2019
Full Text: View/download PDF
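
The convolution step mentioned in the abstract above (blurring an existing image with a PSF) can be illustrated with a minimal sketch. It assumes a single fixed PSF kernel for the whole image, whereas the abstract describes a PSF that varies with defocus, field, and azimuth. The function name `convolve2d`, the edge-replication border handling, and the normalization by the kernel sum are illustrative choices, not taken from the paper.

```python
def convolve2d(image, psf):
    """Convolve a grayscale image (list of rows) with a small PSF kernel.

    Assumptions (illustrative only): one PSF for the whole image,
    border pixels are replicated, and the PSF is normalized to sum to 1.
    """
    h, w = len(image), len(image[0])
    kh, kw = len(psf), len(psf[0])
    cy, cx = kh // 2, kw // 2
    total = sum(sum(row) for row in psf)
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    # True convolution flips the kernel; clamp indices at the border.
                    yy = min(max(y + cy - i, 0), h - 1)
                    xx = min(max(x + cx - j, 0), w - 1)
                    acc += psf[i][j] * image[yy][xx]
            out[y][x] = acc / total
    return out


if __name__ == "__main__":
    sharp = [[0.0, 0.0, 0.0], [0.0, 9.0, 0.0], [0.0, 0.0, 0.0]]
    box_psf = [[1.0] * 3 for _ in range(3)]  # hypothetical uniform blur kernel
    for row in convolve2d(sharp, box_psf):
        print(row)
```

A spatially varying model would select a different kernel per pixel or per image region; that selection is outside the scope of this sketch.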

38. AMEMD-FSL: fuse attention mechanism and earth mover's distance metric network to deep learning for few-shot image recognition.

Author: Liang, Yong, Chen, Zetao, Cui, Qi, Li, Xinhai, Lin, Daoqian, and Tan, Junwen
Subjects: *DEEP learning, *IMAGE recognition (Computer vision), *COMPUTER vision, *STORAGE & moving industry, *PROBLEM solving, *BIG data
Abstract: In computer vision, image recognition is one of the classic tasks. Currently, with the foundation of big data and advanced hardware, deep learning has achieved high accuracy. However, deep learning often fails to perform well when faced with a small number of samples. Therefore, few-shot learning has become a key technology to solve this problem. The learning paradigm of few-shot learning is different from that of deep learning. It aims to learn a universal representation from multiple training categories, used for recognition in new categories. Each few-shot learning training instance consists of a group of images and an unlabeled sample. The goal is to enable the model to perform well in recognizing new categories. To achieve this, the model needs to extract representative and highly generalizable features that enable the correct recognition of new category samples. To address the problem of small sample space being unable to describe enough dataset's semantic features, we propose the attention mechanism and earth mover's distance for few-shot learning (AMEMD-FSL) method. First, we fuse the attention mechanism (AM) to deep learning to help the model extract more semantically rich features. Then we use the earth mover's distance (EMD) metric method to calculate the distance between samples, enabling better classification. Finally, we combine the deep-learning residual network and AMEMD to perform few-shot learning. We validate our algorithm on the Caltech-UCSD Birds-200-2011 dataset and the few-shot public dataset mini-ImageNet, which comes from the DeepMind team. The experimental results demonstrate that we have proposed an end-to-end and effective method in the field of few-shot image classification. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF
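
The earth mover's distance named in the abstract above can be illustrated in its simplest setting. This is a minimal sketch, assuming two histograms over the same equally spaced one-dimensional bins, where the distance reduces to the summed absolute difference of the cumulative distributions. It is not the metric network of the paper, which compares learned image features; the function name `emd_1d` is hypothetical.

```python
def emd_1d(p, q):
    """Earth mover's distance between two 1D histograms on the same bins.

    Assumptions (illustrative only): unit spacing between adjacent bins,
    non-negative weights, and both histograms are normalized to total mass 1.
    """
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    sp, sq = float(sum(p)), float(sum(q))
    carried = 0.0  # mass that still has to be moved to the next bin
    cost = 0.0
    for a, b in zip(p, q):
        carried += a / sp - b / sq
        cost += abs(carried)
    return cost


if __name__ == "__main__":
    # All mass must travel two bins to the right.
    print(emd_1d([1, 0, 0], [0, 0, 1]))
```

In a few-shot setting, such a distance would be computed between a query sample and each support sample, and the query would be assigned to the nearest class.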

39. Aerial tracking of camouflaged people in woodlands.

Author: Liu, Yang, Wang, Cong-Qing, Xu, Bin, and Zhou, Yong-Jun
Subjects: *ARTIFICIAL satellite tracking, *COMPUTER vision, *FORESTS & forestry, *DYNAMIC programming, *DRONE aircraft
Abstract: With the remarkable advances of unmanned aerial vehicles (UAVs) and machine vision, aerial tracking has attracted wide attention from scholars. Previous tracking methods were mostly implemented in clean and well-lit environments, making it challenging to track camouflaged people rapidly and accurately in woodlands. We develop a framework for camouflaged people aerial tracking (CPAT) based on transformer. Specifically, a camouflaged people discovery strategy is proposed to rapidly generate training samples from the unlabeled videos captured by the UAV. Dynamic programming is also employed to filter noises to generate smooth candidate frames. To exploit multilevel feature information, a transformer fusion framework is designed to integrate shallow spatial information and in-depth semantic features. For reducing computing consumption, the spatial attention reduction mechanism is embedded in the multihead attention for fast tracking. Further, we build a dataset for evaluating the effect of camouflaged people tracking called Cam235, which consists of 85 manually labeled test sequences and more than 100k frames of the unlabeled training set. Exhaustive experiments on Cam235-test and popular tracking datasets prove that the CPAT is superior to other trackers for practical application. Under the most challenging condition of camouflaged people tracking, the CPAT achieves the precision of 67.9%, surpassing the state-of-the-art trackers by large margins. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

40. Abnormal event detection based on masked reconstruction and dual-channel adversarial prediction.

Author: Tan, Lunzheng, Weng, Yu, Xia, Limin, and Xiao, Jiusheng
Subjects: *ARTIFICIAL neural networks, *IMAGE reconstruction, *COMPUTER vision, *DEEP learning, *FORECASTING, *VIDEO coding
Abstract: Abnormal event detection in computer vision addresses the task of identifying events that deviate from expected behavior in video scenes. Issues, such as occlusion in crowded scenes, the powerful generalization capabilities of deep neural networks, and the heavy reliance on contextual information, make this task particularly challenging. To address these issues, we propose a cascaded form of abnormal detection framework that combines the paradigms of reconstruction and prediction in this paper. First, stochastic masking techniques are employed for image reconstruction to alleviate the overgeneralization of neural networks under abnormal conditions. Second, an innovative motion characterization of frame-difference streak streams is introduced to better characterize the motion of video frames in crowded scenes. Finally, a dual-channel autoencoder-based prediction network is introduced to jointly learn appearance and motion features. This network captures contextual information to better generate predictive features. Meanwhile, adversarial learning is introduced for abnormal inference to improve the detection performance. Experimental results on several benchmark datasets validate the effectiveness of our approach. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

41. Image inpainting based on double joint predictive filtering and Wasserstein generative adversarial networks.

Author: Liu, Yuanchen and Pan, Zhongliang
Subjects: *GENERATIVE adversarial networks, *INPAINTING, *FILTERS & filtration, *COMPUTER vision, *PERCEPTUAL learning, *SIGNAL-to-noise ratio
Abstract: Image inpainting is promising but challenging in computer vision tasks; it aims to fill in missing regions of corrupted images with semantically sensible content. By utilizing generative adversarial networks (GAN), state-of-the-art methods have achieved great improvements, but the ordinary GAN generally suffers from difficulties in training and unstable gradients, leading to unsatisfactory inpainting results. Image-level predictive filtering is a widely used restoration method that adaptively predicts the weights of pixels around a target pixel and then linearly combines these pixels to generate the image, but it cannot fill larger missing regions. Thus, we extend image-level predictive filtering to the deep feature level through an encoder–decoder network and embed adaptive channel attention and spatial attention modules in the encoder network. We use Wasserstein GAN instead of normal GAN due to its superior properties and then combine it with image-level predictive filtering and deep feature-level predictive filtering, which ultimately leads to a significant improvement in image inpainting. We validate our method on two public datasets: CelebA-HQ and Places2. Our method demonstrates good performance across four metrics: peak signal-to-noise ratio, L1, structural similarity index measure, and learned perceptual image patch similarity. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

42. CrackF-Net: a pixel-level segmentation network for pavement crack detection.

Author: Luan, Shen, Gao, Xingen, Wang, Chen, Zhang, Hongyi, Chao, Fei, Lin, Juqiang, Huang, Junqi, Jiang, Huali, and Lin, Feng
Subjects: *CRACKING of pavements, *CONVOLUTIONAL neural networks, *COMPUTER vision, *IMAGE segmentation, *ADAPTIVE filters, *FEATURE selection
Abstract: Detecting pavement cracks from images is a complex computer vision task due to their varying shapes, backgrounds, and sizes. We propose CrackF-Net, an end-to-end convolutional neural network for automatic crack detection in road images. We construct the CrackF-Net network using an encoder–decoder architecture to extract image features in convolutional blocks with residuals and fuse the multiscale convolutional features produced by the decoder. Convolutional blocks with residuals are used to capture the strong semantic features of cracks, and an adaptive filter fusion module is proposed to help the network select filter fusion features on the channels. CrackF-Net fuses the multiscale features in the decoder to improve crack detection performance. The proposed CrackF-Net is compared to other advanced crack detection methods using three public datasets. The experimental results show that CrackF-Net achieves state-of-the-art performance, obtaining F-measures of 0.866, 0.737, and 0.852 on the three datasets. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

43. (Retracted) Deep belief network-based image processing for local directional segmentation in brain tumor detection.

Author: Doshi, Ruchi, Hiran, Kamal Kant, Doppala, Bhanu Prakash, and Vyas, Ajay Kumar
Subjects: *BRAIN tumors, *COMPUTER vision, *IMAGE processing, *FUZZY algorithms, *DATABASES
Abstract: The Editor-in-Chief and the publisher have retracted this article, which was submitted as part of a guest-edited special section. An investigation uncovered evidence of systematic manipulation of the publication process, including compromised peer review. The Editor and publisher no longer have confidence in the results and conclusions of the article. RD, KKH, BPD, and AKV either did not respond directly or could not be reached. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

44. Deformable element-wise dynamic convolution.

Author: Kim, Wonjik, Tanaka, Masayuki, Sasaki, Yoko, and Okutomi, Masatoshi
Subjects: *IMAGE recognition (Computer vision), *COMPUTER vision, *COMPUTER performance, *NETWORK performance, *MATHEMATICAL convolutions, *ADAPTIVE control systems
Abstract: The shape and values of a typical static convolution kernel remain fixed once the network is trained. Recently, dynamic convolutions were proposed to change the kernel's values depending on the input during the test phase. We aim to extend the concept of dynamic convolutions by introducing an element-wise dynamic convolution approach. This method enables adaptive changes in kernel values for each output data element. Furthermore, a deformable element-wise dynamic convolution is proposed to enable simultaneous changes in kernel shape and value. The proposed deformable dynamic convolution is compatible with the static convolution in terms of input–output relationships. The capability of existing network architectures can be enhanced by replacing the static convolution with the suggested deformable dynamic convolution. Extensive experiments demonstrate that the proposed deformable dynamic convolution can improve the network performance in various computer vision tasks, including image classification and semantic segmentation. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

45. Low-illumination image contrast enhancement using adaptive gamma correction and deep learning model for person identification and verification.

Author: Tommandru, Suresh and Sandanam, Domnic
Subjects: *DEEP learning, *HUMAN facial recognition software, *IMAGE intensifiers, *COMPUTER vision, *VISUAL fields
Abstract: Person verification based on face detection and face recognition is a very important research area in the field of computer vision as it provides authentication before permitting access to resources ensuring safety and security. It is a challenging task to identify and verify a person in low-illumination images, this is because the facial features of a person in a low-illumination image are not clear as the image is of poorer quality than that of an image taken with good illumination. The existing hand-crafted feature-based approaches and deep learning models for low-illumination image contrast enhancement are typically unsatisfactory in the applications of person verification either due to over-enhancing of the image or restricting the contrast of the image while dealing with light illumination. To achieve more accurate face detection and face recognition in low light images, a new approach based on adaptive gamma correction and deep learning model is proposed in this research paper. In this work, two methods: feature-based adaptive gamma correction (FAGC) and deep learning-based adaptive gamma correction (DLAGC) are proposed for contrast enhancement. The proposed approach uses the new adaptive gamma correction-based methods (FAGC, DLAGC) for the image contrast enhancement and applies deep learning models to detect and recognize the face in the enhanced image. The enhancement of the brightness difference between objects and their backgrounds achieved by the proposed adaptive gamma correction-based methods enables the deep learning model to extract the quality semantic information, which improves the accuracy of person verification. The proposed approach is evaluated on Extended Yale Face (EYF) dataset, which is a low-illumination image dataset. The proposed framework with FAGC and DLAGC for person verification achieves an improvement of 24% and 30%, respectively, on EYF dataset and 2.5% and 10%, respectively, on Specs on Faces dataset when compared to the existing techniques. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF
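
Gamma correction, the operation underlying the abstract above, maps each normalized intensity v to v raised to the power gamma, and a gamma below 1 brightens dark images. The following is a minimal sketch of one common way to pick gamma adaptively, assuming the goal is to map the image's mean intensity to 0.5. This rule is an illustrative heuristic and is not the FAGC or DLAGC method of the paper; the function name `adaptive_gamma` is hypothetical.

```python
import math


def adaptive_gamma(pixels):
    """Brighten or darken an image so that its mean intensity value maps to 0.5.

    pixels: flat list of intensities in [0, 1].
    Returns (gamma, corrected_pixels).
    Assumption (illustrative only): gamma = log(0.5) / log(mean).
    """
    mean = sum(pixels) / len(pixels)
    # Keep the mean strictly inside (0, 1) so both logarithms are defined and nonzero.
    mean = min(max(mean, 1e-6), 1.0 - 1e-6)
    gamma = math.log(0.5) / math.log(mean)
    return gamma, [p ** gamma for p in pixels]


if __name__ == "__main__":
    dark = [0.05, 0.10, 0.20, 0.25]  # hypothetical low-illumination pixels
    gamma, bright = adaptive_gamma(dark)
    print(gamma, bright)
```

Only the original mean value is mapped exactly to 0.5; the mean of the corrected image generally differs slightly because the power function is nonlinear.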

46. Circle detection algorithm based on neighborhood density clustering.

Author: Li, Ziliang, Wang, Tao, Zhang, Jinzhu, Bai, Jianxin, Shi, Wei, and Huang, Qingxue
Subjects: *CIRCLE, *HOUGH transforms, *PATTERN recognition systems, *COMPUTER vision, *NEIGHBORHOODS, *ALGORITHMS, *ARTIFICIAL intelligence
Abstract: Circle detection in images is one of the key technologies in machine vision, pattern recognition, and artificial intelligence. However, conventional circle detection methods are sensitive to complex scenes, noise, and occlusion in images. Solving the impact of these situations is still the focus of circle detection algorithm research. Therefore, a circle detection algorithm based on neighborhood density clustering (NDC) is proposed. The proposed algorithm calculates circle parameters of connected regions after extracting corners, corroding the corners, and marking the connected regions. Then, NDC of the circle parameters is executed to classify arcs belonging to the same circle into one category to acquire a virtual connected region. And the circle parameters of the virtual connected region are clustered using NDC again to obtain the circle parameter dataset of the circle to be detected. The precise circle parameters are further estimated by calculating the centroid of each category. To prevent false positives, candidate circles are verified through a ratio rule. Extensive experiments using both synthetic and real images were performed. The results compared with those of representative state-of-the-art methods demonstrate that the proposed algorithm can be applied to a variety of complex scenes and has several advantages: good anti-occlusion effect, more robustness against noise, high accuracy, and better performance. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF
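
Two steps named in the abstract above, computing circle parameters and taking the centroid of a cluster of parameters, can be illustrated with a minimal sketch. It assumes circle parameters are obtained from three points on an arc via the circumcircle formula, which the abstract does not specify, and it omits the neighborhood density clustering itself. The function names `circle_from_points` and `centroid` are hypothetical.

```python
import math


def circle_from_points(p1, p2, p3):
    """Return (cx, cy, r) of the circle through three non-collinear points."""
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    d = 2.0 * (x1 * (y2 - y3) + x2 * (y3 - y1) + x3 * (y1 - y2))
    if abs(d) < 1e-12:
        raise ValueError("points are collinear")
    s1, s2, s3 = x1 * x1 + y1 * y1, x2 * x2 + y2 * y2, x3 * x3 + y3 * y3
    cx = (s1 * (y2 - y3) + s2 * (y3 - y1) + s3 * (y1 - y2)) / d
    cy = (s1 * (x3 - x2) + s2 * (x1 - x3) + s3 * (x2 - x1)) / d
    return cx, cy, math.hypot(x1 - cx, y1 - cy)


def centroid(params):
    """Average a non-empty cluster of (cx, cy, r) estimates into one circle."""
    n = len(params)
    return tuple(sum(p[k] for p in params) / n for k in range(3))


if __name__ == "__main__":
    # Two arcs of the same unit circle, each sampled at three points.
    a = circle_from_points((1, 0), (0, 1), (-1, 0))
    b = circle_from_points((0, -1), (1, 0), (0, 1))
    print(centroid([a, b]))
```

Averaging is only meaningful once the estimates have been grouped so that each cluster corresponds to a single circle.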

47. Research on plant seeds recognition based on fine-grained image classification.

Author: Yuan, Min, Dong, Yongkang, Lu, Fuxiang, Zhan, Kun, Zhu, Liye, Shen, Jiacheng, Ren, Dingbang, Hu, Xiaowen, and Lv, Ningning
Subjects: *IMAGE recognition (Computer vision), *CONVOLUTIONAL neural networks, *ARTIFICIAL intelligence, *PLANT classification, *TRANSFORMER models, *DEEP learning, *COMPUTER vision
Abstract: Seed phenomics is a comprehensive assessment of complex seed traits, and seed classification is an indispensable step. Plant seed recognition is of great significance in agricultural production, ecological environment, and biodiversity. However, some traditional artificial plant seed classification methods are expensive, time consuming, and laborious. Therefore, there is a need that cannot be ignored for a method to improve the situation. Artificial intelligence is making a huge impact on various fields through its perception, reasoning, and learning capabilities. A challenge in pratacultural research, the rapid auto-identification of plant seeds, might be better resolved by the integration of computer vision. For the lack of a public seed dataset for the training of models, we established a dataset called LZUPSD, which includes images of 88 different species of seeds. We explored methods to achieve fine-grained seed classification using convolutional neural networks and tried to apply a transformer to it. The method has the highest accuracy of more than 95%. The method is able to identify plant seeds automatically with high speed, low cost, and high accuracy. It results in a more efficient plant seed recognition method. At the same time, we have established a platform where users can upload pictures to obtain seed information. In addition, our dataset will be released to the public in the next phase in order to share with interested researchers. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

48. Video spatio-temporal generative adversarial network for local action generation.

Author: Liu, Xuejun, Guo, Jiacheng, Cui, Zhongji, Liu, Ling, Yan, Yong, and Sha, Yun
Subjects: *GENERATIVE adversarial networks, *RECURRENT neural networks, *COMPUTER vision, *CONVOLUTIONAL neural networks, *TIME-varying networks
Abstract: Generating action videos in future scenes based on static images can allow computer vision systems to be better applied to video understanding and intelligent decision-making. However, current models pay more attention to the motion trend of the generated objects, and the processing effect on local details is not ideal. The local features of the generated video will have the problem of blurred frames and incoherent motion. This paper proposes a two-stage model, video spatio-temporal generative adversarial network (VSTGAN), which consists of two GAN networks, namely a temporal network (T-net) and a spatial network (S-net). The model fully combines the advantages of CNNs, recurrent neural networks (RNNs), and GANs to decompose the complex spatiotemporal generation problem into temporal and spatial dimensions. Therefore, VSTGAN can focus on local features from the above dimensions respectively. In the temporal dimension, we propose an RNN unit, the convolutional attention unit (ConvAU), which uses the convolutional attention module to dynamically generate weights to update the hidden state. Thus, T-net uses the ConvAU to generate local dynamics. In the spatial dimension, S-net uses CNNs and attention modules to perform resolution reconstruction of the generated local dynamics for video generation. We build two small-sample datasets and validate our approach on these two new datasets and the KTH public dataset. The results show that our approach can effectively generate local details in future action videos and that the model performance on small-sample datasets is competitive with the state-of-the-art in video generation. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

49. Efficient data-centric pest images identification method based on Mahalanobis entropy for intelligent agriculture.

Author: Zhu, Longtao, Li, Zhengjian, and Hu, Siyuan
Subjects: *DEEP learning, *SUPERVISED learning, *PESTS, *PEST control, *AGRICULTURAL pests, *ENTROPY, *COMPUTER vision
Abstract: The frequent outbreak of crop pests is one of the main factors affecting crop yield. The accurate identification of pests is significant for effective pest control, which is the basis for the safe growth of crops and is also an important guarantee for high crop quality and crop yield. Computer vision based on supervised deep learning enables intelligent identification of pests, whose success is still inseparable from a large amount of labeled data, causing a large number of resource consumption due to data labeling. Therefore, an in-depth study on maximizing the value of data is essential when lacking labeled data. A new data evaluation method based on Mahalanobis distance and entropy is proposed to address the problem of lacking labeled data in intelligent pest identification. This method enables filtering high-value data, thus achieving effective pest identification performance with a small data volume. The experiment is conducted on a dataset we collected called PD-20, which shows the proposed method achieves baseline accuracy of 100% using only about 60% of the original data. Moreover, the proposed method can save at least 10% of the data volume compared with three comparison methods. To facilitate the deep integration of smart agriculture with artificial intelligence (AI), we designed an interactive framework of active learning for pest identification based on the proposed method, which lays the foundation for the application of AI in direction of agriculture. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

50. Leaf counting in the presence of occlusion in Arabidopsis thaliana plant using convolutional neural networks.

Author: Štaka, Zorana and Mišić, Marko
Subjects: *CONVOLUTIONAL neural networks, *COLOR of plants, *COMPUTER vision, *COUNTING, *AGRICULTURE
Abstract: Plants are crucial in providing sufficient food for the increasing global population. To be able to provide an appropriate amount of food, the maximization of agricultural output is needed while input needs to be minimized. For these purposes, plant phenotyping techniques, i.e., measuring and analyzing the physical and biochemical characteristics of plants, can be employed. One of the most essential indicators of the general health and development of the plant is the color, shape, and number of leaves. To analyze plant images and capture essential plant traits, various algorithms have been developed. However, one of the important challenges in developing these algorithms is the occlusion or overlapping of leaves and biomass. We present a solution for leaf counting in the presence of occlusion in the plant Arabidopsis thaliana that includes four different convolutional neural network architectures. Datasets from the Computer Vision Problems in Plant Phenotyping (CVPPP) 2017 challenge and Photon System Instruments were used. The results are discussed in detail and compared with the existing solutions. Results showed that our solutions for leaf counting are superior to the previous winners of the CVPPP challenges. [ABSTRACT FROM AUTHOR]
Published: 2023
Full Text: View/download PDF

2,194 results on '"Computer vision"'
