Descriptor: "Computer vision" - Searchworks@Jio Institute Digital Library Search Results

Your search for the keyword "Computer vision" returned 734,684 results.

351. Modified Double Inertial Extragradient-like Approaches for Convex Bilevel Optimization Problems with VIP and CFPP Constraints.

Author: Zeng, Yue, Ceng, Lu-Chuan, Zheng, Liu-Fang, and Wang, Xie
Subjects: *BILEVEL programming, *FRACTIONAL programming, *COMPUTER vision, *NONEXPANSIVE mappings, *CONSTRAINT programming
Abstract: Convex bilevel optimization problems (CBOPs) have a vital impact on decision-making in hierarchical settings, where image restoration plays a key role in signal processing and computer vision. In this paper, a modified double inertial extragradient-like approach with a line search procedure is introduced to tackle the CBOP with CFPP and VIP constraints, where CFPP and VIP denote a common fixed point problem and a variational inequality problem, respectively. The strong convergence of the proposed algorithm is analyzed under certain mild assumptions; the analysis consists of two parts that, to a certain extent, share a mutually symmetric structure. As an application, the proposed algorithm is used to treat the image restoration problem, i.e., the LASSO problem with fractional programming and fixed-point constraints. The illustrative example highlights the specific advantages and potential impact of the proposed algorithm over existing algorithms in the literature, particularly in the domain of image restoration. [ABSTRACT FROM AUTHOR]
Published: 2024

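The abstract builds on the extragradient idea for variational inequality problems (VIPs). As a rough illustration, here is a minimal Python sketch of the classic Korpelevich extragradient step (one predictor projection, one corrector projection). It is not the authors' modified double inertial method with line search; the box constraint set, the toy operator `rotation_field`, and the step size are assumptions chosen for the example.

```python
from typing import Callable, List

Vector = List[float]


def project_box(x: Vector, lo: float, hi: float) -> Vector:
    """Euclidean projection onto the box [lo, hi]^n: clip each coordinate."""
    return [min(max(xi, lo), hi) for xi in x]


def extragradient(F: Callable[[Vector], Vector], x0: Vector, step: float,
                  iters: int, lo: float = -1.0, hi: float = 1.0) -> Vector:
    """Classic (Korpelevich) extragradient method for the VIP: find x* in the
    box C with <F(x*), x - x*> >= 0 for every x in C. For a monotone,
    L-Lipschitz F, a fixed step 0 < step < 1/L is sufficient."""
    x = list(x0)
    for _ in range(iters):
        fx = F(x)
        # Predictor: projected step using F at the current iterate.
        y = project_box([xi - step * gi for xi, gi in zip(x, fx)], lo, hi)
        fy = F(y)
        # Corrector: projected step from x, but using F at the predicted point y.
        x = project_box([xi - step * gi for xi, gi in zip(x, fy)], lo, hi)
    return x


def rotation_field(x: Vector) -> Vector:
    """Hypothetical toy operator F(x) = M x with M = [[0, 1], [-1, 0]]
    (monotone, L = 1). A single unprojected step x - step * F(x) increases
    ||x|| here, which is why the extra (corrector) evaluation matters."""
    return [x[1], -x[0]]


if __name__ == "__main__":
    # The unique solution on [-1, 1]^2 is the origin.
    print(extragradient(rotation_field, [1.0, 1.0], step=0.5, iters=500))
```

The toy operator is monotone but not strongly monotone, a setting where the corrector evaluation of `F` is what drives convergence. Inertial and line-search variants such as the one in the paper add extrapolation (momentum-like) terms and adaptive step sizes on top of this basic two-step scheme.
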
352. Synthetic Image Generation Using Deep Learning: A Systematic Literature Review.

Author: Zulfiqar, Aisha, Muhammad Daudpota, Sher, Shariq Imran, Ali, Kastrati, Zenun, Ullah, Mohib, and Sadhwani, Suraksha
Subjects: *ARTIFICIAL neural networks, *GENERATIVE adversarial networks, *COMPUTER vision, *TRANSFORMER models, *IMAGE processing
Abstract: The advent of deep neural networks and improved computational power have brought a revolutionary transformation to the fields of computer vision and image processing. Within computer vision, there has been significant interest in synthetic image generation, a creative side of AI. Many researchers have introduced innovative deep neural network-based architectures that generate synthetic images from different input modes, such as text and scene graph layouts. Computer-generated images have been found to contribute substantially to the training of various machine learning and deep learning models. Nonetheless, we observed an immediate need for a comprehensive and systematic literature review that summarizes and critically evaluates the approaches of current primary studies toward image generation. To address this, we carried out a systematic literature review of synthetic image generation approaches published from 2018 to February 2023. We also reviewed the datasets, image generation approaches, and performance metrics of existing methods, and conducted a brief experimental comparison of DCGAN (deep convolutional generative adversarial network) and cGAN (conditional generative adversarial network) for image generation. Additionally, we identified applications of image generation models, with a critical evaluation of the primary studies on the subject. Finally, we present future research directions to further contribute to the field of image generation using deep neural networks. [ABSTRACT FROM AUTHOR]
Published: 2024

353. Lung Sounds Anomaly Detection with Respiratory Cycle Segmentation.

Author: Constantinescu, Constantin, Brad, Remus, and Bărglăzan, Adrian
Subjects: *MACHINE learning, *COMPUTER vision, *LUNGS, *DIAGNOSTIC imaging, *SOUNDS, *DEEP learning
Abstract: Employing machine learning algorithms in the medical field has proven successful for some time now. Computer vision techniques have mostly been applied to medical images, while medical sound data have been somewhat overlooked. Electronic stethoscopes now make it possible to process both heartbeats and lung sounds. While some products are available for detecting anomalies in heartbeats, addressing lung-related anomalies presents a more intricate challenge. Applying a deep learning approach is hindered by insufficient data: although some datasets exist, their size and diversity are too limited for comprehensive analysis. This paper introduces a novel technique for detecting anomalies in lung sounds: first, by combining two datasets; second, by automatically segmenting each sound into respiratory cycles; and third, by employing gammatone frequency cepstral coefficients (GFCCs) as sound features. [ABSTRACT FROM AUTHOR]
Published: 2024

354. In the Pipeline: Recursion's Approach to AI and Machine Learning.

Author: Philippidis, Alex
Subjects: *GENERATIVE artificial intelligence, *ARTIFICIAL intelligence, *NEUROFIBROMATOSIS 2, *ADENOMATOUS polyposis coli, *PROTEIN kinase C, *COMPUTER vision
Abstract: The article from GEN Biotechnology discusses Recursion's innovative approach to AI and machine learning in drug development. Recursion aims to revolutionize the drug discovery process by utilizing its Recursion Operating System, which integrates hardware, software, datasets, and proprietary tools to explore biology's complex search space. The company plans to announce clinical data from several pipeline candidates, including REC-994 for cerebral cavernous malformation, and is set to merge with Exscientia to further enhance its capabilities. Recursion's CEO, Chris Gibson, emphasizes the importance of AI in improving the efficiency and success rate of drug development, aiming to challenge the industry's low success rate through innovative approaches. [Extracted from the article]
Published: 2024

355. Sentiment Analysis in the AI-Based Social Networks.

Author: Kohneh Shahri, Kamelya Dehghani, Afshar Kazemi, Mohammad Ali, and Pourebrahimi, Ali Reza
Subjects: *NATURAL language processing, *MACHINE learning, *COMPUTER vision, *SENTIMENT analysis, *ARTIFICIAL intelligence
Abstract: Recent developments in emerging technologies have enabled new ways for users to interact with social networks. One current form of interaction is understanding people's real feelings in the moment: based on people's reactions and attitudes, feelings are analyzed from cues such as facial features and type of speech, or from the content people share, such as video, photographs, voice, and text. In this research, sentiment analysis was studied and evaluated using AI, machine learning, and deep learning techniques, including motion detection, body language recognition, image processing, sound and text processing, computer vision, natural language processing, and various network techniques. By providing a new conceptual model design, the paper gives a more detailed account of sentiment analysis in social networks that incorporates AI techniques with high speed and accuracy. [ABSTRACT FROM AUTHOR]
Published: 2024

356. Development Status and Prospects of Fish Cutting Technology (鱼肉切割技术的发展现状与展望).

Author: 肖哲非, 马田田, and 沈建
Subjects: *LABOR market, *COMPUTER vision, *MULTISENSOR data fusion, *ARTIFICIAL intelligence, *FISHERY processing, *DEEP learning
Abstract: Cutting is a pivotal step in the initial processing of fish products, encompassing beheading, trimming, slicing, and dicing. Traditional metal-knife cutting methods are marred by inefficiency, imprecision, and a propensity for bacterial growth, failing to meet market demands for precision and quality. Additionally, workers' reluctance to perform manual labor in wet conditions exacerbates labor shortages and inefficiency. Adopting innovative cutting technologies, complemented by intelligent controls, is thus imperative. This study reviews the advancements and applications of waterjets and ultrasonic knives as sustainable cutting methods in the fish and food industries. It evaluates their respective merits and demerits, noting that waterjets excel at cutting hard-textured fish, while ultrasonic knives are adept at handling the viscous, elastic, and adhesive properties of fish. The study further explores the integration of intelligent technologies in fish cutting, such as machine vision for precise cutting paths, simulation technology for adjusting process parameters, and multi-sensor data fusion for decision-making, which could potentially replace human labor. It also addresses the current challenges and future directions for these technologies, highlighting the potential of artificial intelligence, machine learning, and deep learning to enhance the autonomy and robustness of fish-cutting equipment. By reducing operational and maintenance costs and integrating advanced technologies, the study envisions a future in which fish cutting is more automated and intelligent, efficiently producing high-quality products to satisfy escalating market demands. This research is a valuable reference for industry professionals and researchers aiming to innovate in fish product processing, thereby enhancing the automation and intelligence of fish-cutting processes. [ABSTRACT FROM AUTHOR]
Published: 2024

357. Marine predators optimization with deep learning model for video‐based facial expression recognition.

Author: Prasad, Mal Hari and Swarnalatha, P.
Subjects: *CONVOLUTIONAL neural networks, *COMPUTER vision, *FACIAL expression, *EMOTION recognition, *EMOTIONS
Abstract: Video-based facial expression recognition (VFER) aims to classify an input video into different kinds of emotions. It remains challenging because of the gap between visual features and emotions, the difficulty of handling subtle muscle movements, and restricted datasets. One effective solution is to exploit efficient features that characterize facial expressions to carry out FER. VFER is useful in several areas, such as driverless vehicles, venue management, urban safety management, and contactless attendance tracking. Recent advances in computer vision and deep learning (DL) enable the design of automated VFER models. In this context, this study develops a new Marine Predators Optimization with Deep Learning Model for Video-based Facial Expression Recognition (MPODL-VFER) technique, which mainly aims to classify different kinds of facial emotions in video. To accomplish this, MPODL-VFER extracts features using a densely connected convolutional network (DenseNet) model and employs the MPO technique to tune the DenseNet hyperparameters. Finally, an Elman neural network (ENN) model performs emotion recognition. To validate the recognition performance of the MPODL-VFER approach, a comparative study was conducted on a benchmark dataset, and the results show that MPODL-VFER significantly outperforms other approaches. [ABSTRACT FROM AUTHOR]
Published: 2024

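The final stage of the MPODL-VFER pipeline is an Elman neural network (ENN). A minimal pure-Python sketch of an Elman forward pass, assuming per-frame feature vectors as input (for example, features from a CNN such as DenseNet) and hypothetical toy weights; the MPO hyperparameter search and any training are not shown:

```python
import math
from typing import List

Vector = List[float]
Matrix = List[Vector]


def matvec(W: Matrix, v: Vector) -> Vector:
    """Dense matrix-vector product for small nested-list matrices."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def elman_forward(xs: List[Vector], W_in: Matrix, W_ctx: Matrix, b: Vector,
                  W_out: Matrix, b_out: Vector) -> List[Vector]:
    """Forward pass of a simple Elman network over a sequence of feature vectors.

    Hidden state: h_t = tanh(W_in x_t + W_ctx h_{t-1} + b), starting from h_0 = 0.
    The context units hold a copy of h_{t-1}, which gives the network memory.
    Output:       y_t = W_out h_t + b_out (e.g. one score per emotion class).
    """
    h = [0.0] * len(b)
    outputs = []
    for x in xs:
        pre = [u + c + bi for u, c, bi in zip(matvec(W_in, x), matvec(W_ctx, h), b)]
        h = [math.tanh(p) for p in pre]
        outputs.append([o + bo for o, bo in zip(matvec(W_out, h), b_out)])
    return outputs


if __name__ == "__main__":
    # Hypothetical 1-D toy weights: the second output is non-zero even though
    # its input is zero, because the context unit remembers the first frame.
    ys = elman_forward([[1.0], [0.0]], W_in=[[1.0]], W_ctx=[[0.5]], b=[0.0],
                       W_out=[[2.0]], b_out=[0.0])
    print(ys)
```
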
358. Real-time flash flood detection employing the YOLOv8 model.

Author: Quang, Nguyen Hong, Lee, Hanna, Kim, Namhoon, and Kim, Gihong
Subjects: *COMPUTER vision, *APPLICATION software, *COMPUTER simulation, *MODEL railroads, *MODEL validation, *DEEP learning
Abstract: Flash floods (FF) threaten human lives and property worldwide, and because of the unprecedented effects of climate change, the resulting losses are predicted to increase. Since flash floods are difficult to avoid or prevent, real-time detection could be an appropriate solution for reducing damage and improving management. Computer vision applications based on deep learning and AI have advanced considerably. Although AI models have been developed for many fields, their use in the geosciences remains limited by the large amounts of training data and the substantial computational infrastructure they require. Hence, this work trains the latest YOLOv8 model and applies it to real-time flash flood detection for regions of Korea and possibly other nations. To overcome the shortage of training data, we built small on-site flash flood models and photographed and filmed them. More than 1,500 FF photos were used for model training and validation, yielding a mean average precision above 60% at all training depths (25, 50, 75, and 100 epochs). Despite some false positives and missed detections on the Korean FF test dataset, the best YOLOv8 model generated bounding boxes (BB) with high confidence values for most FF events. Furthermore, the model's robustness is highlighted by its ability to smoothly detect the precise positions of FF areas with high confidence values (best 0.86) when applied to input footage and webcam streams. We strongly encourage establishing a real-time FF warning system to reduce the negative effects of flash floods. Although YOLO is effective and fast, like other deep learning models it requires large amounts of input data to ensure higher accuracy and confidence. Future work might explore this aspect, particularly data acquired in low-light conditions, to improve detection at night. [ABSTRACT FROM AUTHOR]
Published: 2024

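YOLO detectors such as YOLOv8 typically describe each box by a normalized center x, center y, width, and height, together with a class and a confidence score. The sketch below is a hypothetical post-processing helper, not part of the Ultralytics API: it converts such boxes to pixel corners for drawing and keeps only detections at or above a chosen confidence threshold. The detection tuple layout and the 0.5 threshold are assumptions.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]
# Hypothetical detection record: (class_id, confidence, cx, cy, w, h), with the
# box in a normalized center format (values divided by image width/height).
Detection = Tuple[int, float, float, float, float, float]


def yolo_to_xyxy(cx: float, cy: float, w: float, h: float,
                 img_w: int, img_h: int) -> Box:
    """Convert a normalized center-format box to pixel corners (x1, y1, x2, y2)."""
    return ((cx - w / 2) * img_w, (cy - h / 2) * img_h,
            (cx + w / 2) * img_w, (cy + h / 2) * img_h)


def confident_boxes(dets: List[Detection], img_w: int, img_h: int,
                    conf_thr: float) -> List[Tuple[int, float, Box]]:
    """Keep detections at or above a confidence threshold, converted to pixels."""
    return [(cls, conf, yolo_to_xyxy(cx, cy, w, h, img_w, img_h))
            for cls, conf, cx, cy, w, h in dets if conf >= conf_thr]


if __name__ == "__main__":
    # Hypothetical detections for one 640x480 webcam frame.
    frame_dets = [(0, 0.9, 0.5, 0.5, 0.25, 0.5), (0, 0.1, 0.1, 0.1, 0.05, 0.05)]
    print(confident_boxes(frame_dets, img_w=640, img_h=480, conf_thr=0.5))
```
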
359. Artificial Intelligence Applications for Imaging Metabolic Bone Diseases.

Author: Isaac, Amanda, Akdogan, Asli Irmak, Dalili, Danoob, Saber, Nuran, Drobny, David, Guglielmi, Giuseppe, Modat, Marc, and Bazzocchi, Alberto
Subjects: *NATURAL language processing, *METABOLIC bone disorders, *MACHINE learning, *ARTIFICIAL intelligence, *COMPUTER vision, *OSTEOMALACIA
Abstract: Artificial intelligence (AI) has significantly impacted the field of medical imaging, particularly in diagnosing and managing metabolic bone diseases (MBDs) such as osteoporosis and osteopenia, Paget's disease, osteomalacia, and rickets, as well as rare conditions such as osteitis fibrosa cystica and osteogenesis imperfecta. This article provides an in-depth analysis of AI techniques used in imaging these conditions, recent advancements, and their clinical applications. It also explores ethical considerations and future perspectives. Through comprehensive examination and case studies, we highlight the transformative potential of AI in enhancing diagnostic accuracy, improving patient outcomes, and contributing to personalized medicine. By integrating AI with existing imaging techniques, we can significantly enhance the capabilities of medical imaging in diagnosing, monitoring, and treating MBDs. We also provide a comprehensive overview of the current state, challenges, and future prospects of AI applications in this crucial area of health care. [ABSTRACT FROM AUTHOR]
Published: 2024

360. An Explainable AI-Based Modified YOLOv8 Model for Efficient Fire Detection.

Author: Hasan, Md. Waliul, Shanto, Shahria, Nayeema, Jannatun, Rahman, Rashik, Helaly, Tanjina, Rahman, Ziaur, and Mehedi, Sk. Tanzir
Subjects: *ARTIFICIAL intelligence, *COMPUTER vision, *FEATURE extraction, *DEEP learning, *PROPERTY damage, *FIRE detectors
Abstract: Early fire detection is key to saving lives and limiting property damage. Advanced technology can detect fires in high-risk zones with minimal human presence before they escalate beyond control. This study proposes a more advanced model structure based on the YOLOv8 architecture to enhance early fire recognition. Although YOLOv8 excels at real-time object detection, it can be better adjusted to the nuances of fire detection. We achieved this by incorporating an additional context-to-flow layer, enabling the YOLOv8 model to capture both local and global contextual information more effectively. The context-to-flow layer enhances the model's ability to recognize complex patterns such as smoke and flames, leading to more effective feature extraction. This extra layer helps the model detect fire and smoke by improving its focus on fine-grained details and minor variations, which is crucial in challenging environments with low visibility, dynamic fire behavior, and complex backgrounds. Compared with the default YOLOv8 model, our proposed model achieved 2.9% higher precision, 4.7% higher recall, and a 4% higher F1-score. This study found that the architecture modification increases information flow and improves fire detection at all fire sizes, from tiny sparks to massive flames. We also included explainable AI strategies to explain the model's decision-making, adding transparency and improving trust in its predictions. Ultimately, this enhanced system demonstrates remarkable efficacy and accuracy, enabling further improvements in autonomous fire detection systems. [ABSTRACT FROM AUTHOR]
Published: 2024

361. Full-Body Pose Estimation of Humanoid Robots Using Head-Worn Cameras for Digital Human-Augmented Robotic Telepresence.

Author: Cho, Youngdae, Son, Wooram, Bak, Jaewan, Lee, Yisoo, Lim, Hwasup, and Cha, Youngwoon
Subjects: *DIGITAL cameras, *COMPUTER vision, *TELECOMMUTING, *AUGMENTED reality, *TELEPRESENCE, *POSE estimation (Computer vision)
Abstract: We envision a telepresence system that enhances remote work by facilitating both physical and immersive visual interactions between individuals. However, during robot teleoperation, communication often lacks realism, as users see the robot's body rather than the remote individual. To address this, we propose a method for overlaying a digital human model onto a humanoid robot using XR visualization, enabling an immersive 3D telepresence experience. Our approach employs a learning-based method to estimate the 2D poses of the humanoid robot from head-worn stereo views, leveraging a newly collected dataset of full-body poses for humanoid robots. The stereo 2D poses and sparse inertial measurements from the remote operator are optimized to compute 3D poses over time. The digital human is localized from the perspective of a continuously moving observer, utilizing the estimated 3D pose of the humanoid robot. Our moving camera-based pose estimation method does not rely on any markers or external knowledge of the robot's status, effectively overcoming challenges such as marker occlusion, calibration issues, and dependencies on headset tracking errors. We demonstrate the system in a remote physical training scenario, achieving real-time performance at 40 fps, which enables simultaneous immersive and physical interactions. Experimental results show that our learning-based 3D pose estimation method, which operates without prior knowledge of the robot, significantly outperforms alternative approaches requiring the robot's global pose, particularly during rapid headset movements, achieving markerless digital human augmentation from head-worn views. [ABSTRACT FROM AUTHOR]
Published: 2024

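The pipeline above lifts 2D poses from head-worn stereo views to 3D. The paper does this with a learned 2D pose estimator plus an optimization that also uses sparse inertial measurements over time; the sketch below shows only the underlying geometry for a single keypoint, assuming an ideal rectified stereo pair with known focal length, baseline, and principal point (all hypothetical values here).

```python
from typing import Tuple


def triangulate_rectified(x_left: float, x_right: float, y: float,
                          f: float, baseline: float,
                          cx: float, cy: float) -> Tuple[float, float, float]:
    """Lift one keypoint seen in a rectified stereo pair to 3D (left-camera frame).

    Assumes both images are rectified (corresponding points share the same row y),
    f is the focal length in pixels, baseline is the camera separation (output
    coordinates use the same unit), and (cx, cy) is the principal point in pixels.
    """
    disparity = x_left - x_right
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point in front of the rig")
    z = f * baseline / disparity  # depth from similar triangles
    x = (x_left - cx) * z / f     # back-project through the pinhole model
    y3d = (y - cy) * z / f
    return x, y3d, z


if __name__ == "__main__":
    # Hypothetical camera: f = 500 px, baseline = 0.1 m, principal point (320, 240).
    print(triangulate_rectified(420.0, 400.0, 240.0, f=500.0, baseline=0.1,
                                cx=320.0, cy=240.0))
```

Depth error grows as disparity shrinks, which is one reason a method like the one above fuses several cues over time rather than relying on a single stereo frame.
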
362. Optimizing Convolutional Neural Network Architectures.

Author: Balderas, Luis, Lastra, Miguel, and Benítez, José M.
Subjects: *MACHINE learning, *CONVOLUTIONAL neural networks, *NATURAL language processing, *COMPUTER vision, *ARTIFICIAL intelligence
Abstract: Convolutional neural networks (CNNs) are commonly employed for demanding applications such as speech recognition, natural language processing, and computer vision. As CNN architectures become more complex, their computational demands grow, leading to substantial energy consumption and complicating their use on devices with limited resources (e.g., edge devices). Furthermore, a new line of research seeking more sustainable approaches to Artificial Intelligence development and research is increasingly drawing attention: Green AI. Motivated by an interest in optimizing Machine Learning models, in this paper we propose Optimizing Convolutional Neural Network Architectures (OCNNA), a novel pruning-based CNN optimization and construction method designed to establish the importance of convolutional layers. The proposal was evaluated through a thorough empirical study including the best-known datasets (CIFAR-10, CIFAR-100, and ImageNet) and CNN architectures (VGG-16, ResNet-50, DenseNet-40, and MobileNet), using accuracy drop and remaining parameters ratio as objective metrics to compare the performance of OCNNA with other state-of-the-art approaches. Our method was compared with more than 20 convolutional neural network simplification algorithms, obtaining outstanding results. As a result, OCNNA is a competitive CNN construction method that could ease the deployment of neural networks on IoT or resource-limited devices. [ABSTRACT FROM AUTHOR]
Published: 2024

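OCNNA's own layer-importance measure is not described in the abstract, so the sketch below swaps in a common, simpler criterion: ranking a layer's convolutional filters by the L1 norm of their weights and keeping the strongest ones. It also computes a per-layer version of the "remaining parameters ratio" metric mentioned above. Filters are represented as flattened lists of weights so the example needs no dependencies; the function names are hypothetical.

```python
from typing import List

Filter = List[float]  # flattened weights of one convolutional filter


def l1_filter_selection(filters: List[Filter], keep_ratio: float) -> List[int]:
    """Indices (in original order) of the filters with the largest L1 norms."""
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    n_keep = max(1, round(len(filters) * keep_ratio))  # nearest integer, at least one
    scores = [sum(abs(w) for w in flt) for flt in filters]
    ranked = sorted(range(len(filters)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:n_keep])


def remaining_params_ratio(filters: List[Filter], kept: List[int]) -> float:
    """Fraction of this layer's weights that survive pruning."""
    total = sum(len(flt) for flt in filters)
    return sum(len(filters[i]) for i in kept) / total


if __name__ == "__main__":
    layer = [[0.1, -0.1], [1.0, -2.0], [0.5, 0.5], [-0.01, 0.0]]  # toy weights
    kept = l1_filter_selection(layer, keep_ratio=0.5)
    print(kept, remaining_params_ratio(layer, kept))
```

In a real network, removing an output filter also removes the matching input channel of the next layer, and the pruned model is usually fine-tuned before measuring the accuracy drop; neither step is modeled here.
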
363. Programming and Setting Up the Object Detection Algorithm YOLO to Determine Feeding Activities of Beef Cattle: A Comparison between YOLOv8m and YOLOv10m.

Author: Guarnido-Lopez, Pablo, Ramirez-Agudelo, John-Fredy, Denimal, Emmanuel, and Benaouda, Mohammed
Subjects: *OBJECT recognition (Computer vision), *COMPUTER vision, *AGRICULTURE, *ALGORITHMS, *LIVESTOCK farms, *CATTLE feeding & feeds
Abstract: Simple Summary: This study addresses the challenge of accurately monitoring the feeding behavior of cattle, which is crucial for their health and productivity. The aim was to compare two versions of a computer vision algorithm, YOLO (v8 vs. v10), which identifies objects in images, to evaluate how well they can recognize the feeding activities of beef cattle. By recording videos of bulls on a farm and analyzing them using YOLO algorithms, we found that both versions were effective at detecting these behaviors, but the latest version was slightly better and faster at learning. This new version also showed a reduced tendency to repeat errors. The conclusion is that the latest version of YOLO is more efficient and reliable for real-world use on farms. This advancement is valuable to society as it helps farmers better monitor and manage cattle feeding, leading to healthier animals and more efficient farming practices. This study highlights the importance of monitoring cattle feeding behavior using the YOLO algorithm for object detection. Videos of six Charolais bulls were recorded on a French farm, and three feeding behaviors (biting, chewing, visiting) were identified and labeled using Roboflow. YOLOv8 and YOLOv10 were compared for their performance in detecting these behaviors. YOLOv10 outperformed YOLOv8 with slightly higher precision, recall, mAP50, and mAP50-95 scores. Although both algorithms demonstrated similar overall accuracy (around 90%), YOLOv8 reached optimal training faster and exhibited less overfitting. Confusion matrices indicated similar patterns of prediction errors for both versions, but YOLOv10 showed better consistency. This study concludes that while both YOLOv8 and YOLOv10 are effective in detecting cattle feeding behaviors, YOLOv10 exhibited superior average performance, learning rate, and speed, making it more suitable for practical field applications. [ABSTRACT FROM AUTHOR]
Published: 2024

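The comparison above relies on mAP50 and mAP50-95. The sketch below computes all-point interpolated average precision (AP) for one class at one IoU threshold, assuming detections have already been sorted by confidence and matched to ground truth; mAP50-95 then averages AP over the IoU thresholds 0.50 to 0.95 in steps of 0.05 and over classes. Evaluation toolkits differ in interpolation details (COCO, for example, samples 101 recall points), so values from this sketch may not match a given library exactly.

```python
from typing import List, Sequence

# IoU thresholds behind the "50-95" in mAP50-95: 0.50, 0.55, ..., 0.95.
IOU_THRESHOLDS = [round(0.50 + 0.05 * k, 2) for k in range(10)]


def average_precision(is_tp: Sequence[bool], n_gt: int) -> float:
    """All-point interpolated AP for one class at one IoU threshold.

    is_tp: for each detection, sorted by descending confidence, whether it was
           matched to a not-yet-matched ground-truth box at that IoU threshold.
    n_gt:  number of ground-truth boxes for the class.
    """
    if n_gt <= 0:
        raise ValueError("n_gt must be positive")
    tp = fp = 0
    precisions: List[float] = []
    recalls: List[float] = []
    for hit in is_tp:
        if hit:
            tp += 1
        else:
            fp += 1
        precisions.append(tp / (tp + fp))
        recalls.append(tp / n_gt)
    # Precision envelope: make precision non-increasing as recall grows.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Area under the envelope: precision times each recall increment.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def ap50_95(ap_per_threshold: Sequence[float]) -> float:
    """Average of one class's AP over the ten IoU thresholds; averaging this
    over classes gives mAP50-95 (mAP50 uses only the 0.50 threshold)."""
    return sum(ap_per_threshold) / len(ap_per_threshold)


if __name__ == "__main__":
    # Hypothetical: 3 detections (hit, miss, hit) for a class with 2 ground-truth boxes.
    print(average_precision([True, False, True], n_gt=2))
```
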
364. Behavioral Coding of Captive African Elephants (Loxodonta africana): Utilizing DeepLabCut and Create ML for Nocturnal Activity Tracking.

Author: Lund, Silje Marquardsen, Nielsen, Jonas, Gammelgård, Frej, Nielsen, Maria Gytkjær, Jensen, Trine Hammer, and Pertoldi, Cino
Subjects: *MACHINE learning, *OBJECT recognition (Computer vision), *AFRICAN elephant, *BEHAVIORAL assessment, *CLOSED-circuit television
Abstract: Simple Summary: This paper presents a way to automate computer vision processes applied to behavior recognition on closed-circuit television (CCTV) footage of two captive African elephants. Object detection models built with Create ML and DeepLabCut were checked for accuracy and subsequently used to analyze seven days' worth of nighttime footage to assess the general behavioral patterns of the elephants, showcasing the possibility of using automated tools for behavioral analysis. This study investigates the possibility of using machine learning models created in DeepLabCut and Create ML to automate aspects of behavioral coding and aid in behavioral analysis. Two models with different capabilities and complexities were constructed and compared to a manually observed control period. The accuracy of the models was assessed by comparison with manual scoring before they were applied to seven nights of footage of the nocturnal behavior of two African elephants (Loxodonta africana). The resulting data were used to draw conclusions regarding behavioral differences between the two elephants and between individually observed nights, demonstrating that such models can aid researchers in behavioral analysis. The models were capable of tracking simple behaviors with high accuracy but had certain limitations in detecting complex behaviors, such as the stereotypic swaying behavior, and displayed confusion when distinguishing between visually similar behaviors. Further expansion of such models may be desired to create a more capable aid with the possibility of automating behavioral coding. [ABSTRACT FROM AUTHOR]
Published: 2024

365. Integrating Automated Labeling Framework for Enhancing Deep Learning Models to Count Corn Plants Using UAS Imagery.

Author: Katari, Sushma, Venkatesh, Sandeep, Stewart, Christopher, and Khanal, Sami
Subjects: *TRANSFORMER models, *CROP management, *CROPS, *COMPUTER vision, CORN development
Abstract: Plant counting is a critical aspect of crop management, providing farmers with valuable insights into seed germination success and within-field variation in crop population density, both of which are key indicators of crop yield and quality. Recent advancements in Unmanned Aerial System (UAS) technology, coupled with deep learning techniques, have facilitated the development of automated plant counting methods. Various computer vision models based on UAS images are available for detecting and classifying crop plants. However, their accuracy relies largely on the availability of substantial manually labeled training datasets. The objective of this study was to develop a robust corn counting model by developing and integrating an automatic image annotation framework. This study used high-spatial-resolution images collected with a DJI Mavic Pro 2 at the V2–V4 growth stage of corn plants from a field in Wooster, Ohio. The automated image annotation process involved extracting corn rows and applying image enhancement techniques to automatically annotate images as either corn or non-corn, resulting in 80% accuracy in identifying corn plants. The accuracy of corn stand identification was further improved by training four deep learning (DL) models, InceptionV3, VGG16, VGG19, and Vision Transformer (ViT), with annotated images across various datasets. Notably, VGG16 outperformed the other three models, achieving an F1 score of 0.955. When the corn counts were compared to ground truth data across five test regions, VGG16 achieved an R² of 0.94 and an RMSE of 9.95. The integration of an automated image annotation process into the training of the DL models provided notable benefits in terms of model scaling and consistency. The developed framework can efficiently manage large-scale data generation, streamlining the rapid development and deployment of corn counting DL models. [ABSTRACT FROM AUTHOR]
Published: 2024

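The corn-counting results above are summarized by R² and RMSE against ground-truth counts. A minimal sketch, assuming R² is the coefficient of determination of the predictions (the abstract does not say whether it instead reports the squared correlation of a fitted regression line) and using hypothetical counts:

```python
import math
from typing import Sequence, Tuple


def r2_rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> Tuple[float, float]:
    """Coefficient of determination R^2 = 1 - SS_res / SS_tot, and
    RMSE = sqrt(SS_res / n), for predicted vs. ground-truth counts."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(y_true)
    mean = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all ground-truth values are equal")
    return 1 - ss_res / ss_tot, math.sqrt(ss_res / n)


if __name__ == "__main__":
    # Hypothetical plant counts for four test plots.
    print(r2_rmse([10, 20, 30, 40], [12, 18, 33, 37]))
```

RMSE is in the same unit as the counts (plants per region), while R² is unitless, so the two figures answer different questions about the same predictions.
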
366. Deep Recyclable Trash Sorting Using Integrated Parallel Attention.

Author: Lin, Hualing, Zhang, Xue, Yu, Junchen, Xiang, Ji, and Shen, Hui-Liang
Subjects: *CONVOLUTIONAL neural networks, *IMAGE recognition (Computer vision), *POLLUTION, *COMPUTER vision, *ENVIRONMENTAL protection
Abstract: Sorting recyclable trash is critical to reducing energy consumption and mitigating environmental pollution. Currently, trash sorting relies heavily on manpower. Computer vision technology enables automated trash sorting. However, existing trash image classification datasets contain a large number of images without backgrounds. Moreover, the models are vulnerable to background interference when categorizing images with complex backgrounds. In this work, we provide a recyclable trash dataset that supports model training and design a model specifically for trash sorting. First, we introduce the TrashIVL dataset, an image dataset for recyclable trash sorting encompassing five classes (TrashIVL-5). All images are collected from public trash datasets, and the original images were captured by RGB imaging sensors, containing trash items with real-life backgrounds. To achieve refined recycling and improve sorting efficiency, the TrashIVL dataset can be further categorized into 12 classes (TrashIVL-12). Second, we propose the integrated parallel attention module (IPAM). Considering the susceptibility of sensor-based systems to background interference in real-world trash sorting scenarios, our IPAM is specifically designed to focus on the essential features of trash images from both channel and spatial perspectives. It can be inserted into convolutional neural networks (CNNs) as a plug-and-play module. We have constructed a recyclable trash sorting network building upon the IPAM, which achieves an accuracy of 97.42% on TrashIVL-5 and 94.08% on TrashIVL-12. Our work is an effective application of computer vision to recyclable trash sorting and makes a positive contribution to environmental protection and sustainable development. [ABSTRACT FROM AUTHOR]
Published: 2024

367. Attention-Guided Sample-Based Feature Enhancement Network for Crowded Pedestrian Detection Using Vision Sensors.

Author: Tang, Shuyuan, Zhou, Yiqing, Li, Jintao, Liu, Chang, and Shi, Jinglin
Subjects: *CONVOLUTIONAL neural networks, *IMAGE sensors, *COMPLEX variables, *PEDESTRIANS, *COMPUTER engineering
Abstract: Occlusion presents a major obstacle in the development of pedestrian detection technologies utilizing computer vision. This challenge includes both inter-class occlusion caused by environmental objects obscuring pedestrians, and intra-class occlusion resulting from interactions between pedestrians. In complex and variable urban settings, these compounded occlusion patterns critically limit the efficacy of both one-stage and two-stage pedestrian detectors, leading to suboptimal detection performance. To address this, we introduce a novel architecture termed the Attention-Guided Feature Enhancement Network (AGFEN), designed within the deep convolutional neural network framework. AGFEN improves the semantic information of high-level features by mapping it onto low-level feature details through sampling, creating an effect comparable to mask modulation. This technique enhances both channel-level and spatial-level features concurrently without incurring additional annotation costs. Furthermore, we transition from a traditional one-to-one correspondence between proposals and predictions to a one-to-multiple paradigm, facilitating non-maximum suppression using the prediction set as the fundamental unit. Additionally, we integrate these methodologies by aggregating local features between regions of interest (RoI) through the reuse of classification weights, effectively mitigating false positives. Our experimental evaluations on three widely used datasets demonstrate that AGFEN achieves a 2.38% improvement over the baseline detector on the CrowdHuman dataset, underscoring its effectiveness and potential for advancing pedestrian detection technologies. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF
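Entry 367 contrasts its set-based suppression with conventional non-maximum suppression (NMS). For reference, the following is a minimal Python sketch of ordinary greedy, per-box NMS. It is not AGFEN's one-to-many variant, whose details the abstract does not give; the (x1, y1, x2, y2) box format, the function names, and the 0.5 IoU threshold are assumptions chosen for illustration.

```python
def iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_threshold=0.5):
    """Conventional per-box greedy NMS; returns kept indices, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Discard every remaining box that overlaps the kept box too strongly.
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_threshold]
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(greedy_nms(boxes, scores))
```

In a crowd, two adjacent pedestrians with boxes (0, 0, 10, 20) and (3, 0, 13, 20) overlap with an IoU of about 0.54, so greedy NMS at a 0.5 threshold would discard the lower-scoring box even if it is a real person. This is the kind of intra-class occlusion failure that the abstract's set-based suppression is meant to address.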

368. Geo-Sensing-Based Analysis of Urban Heat Island in the Metropolitan Area of Merida, Mexico.

Author: Sánchez-Sánchez, Francisco A., Vega-De-Lille, Marisela, Castillo-Atoche, Alejandro A., López-Maldonado, José T., Cruz-Fernandez, Mayra, Camacho-Pérez, Enrique, and Rodríguez-Reséndiz, Juvenal
Subjects: *LAND surface temperature, *URBAN heat islands, *REMOTE-sensing images, *COMPUTER vision, *REMOTE sensing
Abstract: Urban Heat Islands are a major environmental and public health concern, causing temperature increases in urban areas. This study used satellite imagery and machine learning to analyze the spatial and temporal patterns of land surface temperature distribution in the Metropolitan Area of Merida (MAM), Mexico, from 2001 to 2021. The results show that land surface temperature increased in the MAM over the study period, while the urban footprint expanded. The study also found a high correlation (r > 0.8) between changes in land surface temperature and land cover classes (urbanization/deforestation). If the current urbanization trend continues, the difference between the land surface temperature of the MAM and that of its surroundings is expected to reach 3.12 °C ± 1.11 °C by 2030. Hence, the findings of this study suggest that the Urban Heat Island effect is a growing problem in the MAM and highlight the importance of satellite imagery and machine learning for monitoring it and developing mitigation strategies. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

369. Applications of Blockchain and Smart Contracts to Address Challenges of Cooperative, Connected, and Automated Mobility.

Author: Kontos, Christos, Panagiotakopoulos, Theodor, and Kameas, Achilles
Subjects: *COMPUTER vision, *DISTRIBUTED computing, *INTERNET of things, *CITIES & towns, *BLOCKCHAINS, *INTELLIGENT transportation systems, *COMPARATIVE studies
Abstract: Population growth and environmental burden have turned the efforts of cities worldwide toward smarter and greener mobility. Cooperative, Connected, and Automated Mobility (CCAM) is a concept with the power and potential to help achieve these goals by building on technological fields such as the Internet of Things, computer vision, and distributed computing. However, its implementation is hindered by various challenges, covering technical parameters such as performance and reliability alongside other issues such as safety, accountability, and trust. To overcome these issues, new distributed and decentralized approaches such as blockchain and smart contracts are needed. This paper compiles a comprehensive inventory of CCAM challenges, including technical, social, and ethical challenges. It then describes the most prominent methodologies that use blockchain and smart contracts to address them. A comparative analysis of the findings follows, to draw useful conclusions and discuss future directions in CCAM and relevant blockchain applications. The paper contributes to intelligent transportation systems research by offering an integrated view of the difficulties in realizing CCAM and providing insights into the most popular blockchain and smart contract technologies that tackle them. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

370. Monocular 3D Multi-Person Pose Estimation for On-Site Joint Flexion Assessment: A Case of Extreme Knee Flexion Detection.

Author: Yan, Guihai, Yan, Haofeng, Yao, Zhidong, Lin, Zhongliang, Wang, Gang, Liu, Changyong, and Yang, Xincong
Subjects: *BUILDING sites, *COMPUTER vision, *SUPERVISORS, *DEEP learning, *LABOR productivity, *POSE estimation (Computer vision)
Abstract: Work-related musculoskeletal disorders (WMSDs) represent a significant health challenge for workers in construction environments, often arising from prolonged exposure to ergonomic risks associated with manual labor, awkward postures, and repetitive motions. These conditions not only diminish worker productivity but also incur substantial economic costs for employers and healthcare systems alike. Thus, there is an urgent need for effective tools to assess and mitigate these ergonomic risks. This study proposes a novel monocular 3D multi-person pose estimation method designed to enhance ergonomic risk assessments in construction environments. Leveraging advanced computer vision and deep learning techniques, this approach accurately captures and analyzes the spatial dynamics of workers' postures, with a focus on detecting extreme knee flexion, a critical indicator of WMSDs. A pilot study conducted on an actual construction site demonstrated the method's feasibility and effectiveness: its detection of extreme flexion incidents closely aligned with supervisory observations and worker self-reports. Because it requires only a single camera, the proposed approach is broadly applicable, and it enhances ergonomic analysis through 3D pose estimation and group pose recognition for timely interventions. Future efforts will focus on improving robustness and integration with health monitoring to reduce WMSDs and promote worker health. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

371. Light‐adaptive Mimicking Retina with In Situ Image Memorization via Resistive Switching Photomemristor Arrays.

Author: Hu, Zijun, Hu, Ying, Su, Li, and Fang, Xiaosheng
Subjects: *VISUAL memory, *COMPUTER vision, *ROBOT vision, *RF values (Chromatography), *MEMORIZATION
Abstract: Artificial visual memory systems have attracted particular interest since the development of machine vision and bionic robots. Conventional system architectures require the complex integration of two functional modules, a photo‐sensor and a memory device, which greatly limits operating efficiency and adds energy consumption. Meanwhile, other, simply configured optoelectronic memory devices generally struggle to adapt to complex light environments. Here, a resistive switching (RS) perovskite‐based photomemristor is presented that mimics retina function, achieving both light perception and in situ storage. In the dark, it exhibits impressive memory performance with a high ON/OFF ratio of 10⁴, a long retention time of over 10⁴ s, and a low operating voltage of 0.38 V. Under illumination, it shows self‐powered, broadband photo‐detecting characteristics with a responsivity of 70 mA W⁻¹ and a detectivity of 7.5 × 10¹⁰ Jones. More importantly, benefiting from the material's dual‐phase configuration, highly stable photo‐adjusted RS windows are achieved. Its light‐adaptive memory application in dynamic environments is further demonstrated using the mimicking retina for a machine eye. This work can provide a strategy for enhanced RS photomemristors and their application in changing and varied scenarios. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

372. SSN: Scale Selection Network for Multi-Scale Object Detection in Remote Sensing Images.

Author: Lin, Zhili and Leng, Biao
Subjects: *OBJECT recognition (Computer vision), *REMOTE sensing, *IMAGE converters, *IMAGE processing, *MILITARY surveillance, *DEEP learning
Abstract: The rapid growth of deep learning technology has made object detection in remote sensing images an important area of computer vision, with applications in military surveillance, maritime rescue, and environmental monitoring. However, capturing remote sensing images at high altitudes causes significant scale variations, producing a heterogeneous range of object scales that pose significant challenges for detection algorithms. To handle scale variation, traditional detection algorithms compute multi-layer feature maps, but this approach introduces significant computational redundancy. Inspired by cognitive scaling mechanisms for handling multi-scale information, we propose a novel Scale Selection Network (SSN) that eliminates computational redundancy through scale attentional allocation. In particular, we have devised a lightweight Landmark Guided Scale Attention Network that predicts the potential scales in an image. The detector then only needs to focus on the selected scale features, which greatly reduces inference time. Additionally, a fast Reversible Scale Semantic Flow Preserving strategy is proposed to directly generate multi-scale feature maps for detection. Experiments demonstrate that our method accelerates image pyramid-based detectors by approximately 5.3 times on widely used remote sensing object detection benchmarks. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

373. Plant Species Classification and Biodiversity Estimation from UAV Images with Deep Learning.

Author: Conciatori, Marco, Tran, Nhung Thi Cam, Diez, Yago, Valletta, Alessandro, Segalini, Andrea, and Lopez Caceres, Maximo Larry
Subjects: *IMAGE recognition (Computer vision), *PLANT classification, *CONVOLUTIONAL neural networks, *COMPUTER vision, *SPECIES diversity
Abstract: Biodiversity is a characteristic of ecosystems that plays a crucial role in the study of their evolution, and estimating it requires determining the species of all plants. In this study, we used Unmanned Aerial Vehicles to gather RGB images of mid-to-high-altitude ecosystems in the Zao mountains (Japan). All data-collection missions took place in autumn, so the plants showed distinctive seasonal coloration. Patches of single trees and bushes were manually extracted from the collected orthomosaics. Subsequently, Deep Learning image-classification networks were used to automatically determine the species of each tree or bush and estimate biodiversity. Both Convolutional Neural Networks (CNNs) and Transformer-based models were considered (ResNet, RegNet, ConvNeXt, and SwinTransformer). To measure and estimate biodiversity, we relied on the Gini–Simpson Index, the Shannon–Wiener Index, and Species Richness. We present two separate scenarios for evaluating the readiness of the technology for practical use. The first scenario uses a subset of the data with five species and a testing set whose per-species proportions are very similar to those of the training set. The models studied reach very high performance, with over 99% Accuracy and 98% F1 Score (the harmonic mean of Precision and Recall) for image classification and biodiversity estimates under 1% error. The second scenario uses the full dataset with nine species and large variations in class balance between the training and testing datasets, which is often the case in practical use. In this case, Accuracy remained fairly high at 90.64%, but F1 Score dropped to 51.77%.
The relatively low F1 Score is partly due to a small number of misclassifications having a disproportionate impact on the final measure, but the large difference between Accuracy and F1 Score still highlights the complexity of finely evaluating the classification results of Deep Learning Networks. Even in this very challenging scenario, biodiversity estimation errors remained relatively small (6–14%) for the most detailed indices, showcasing the readiness of the technology for practical use. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF
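Entry 373 relies on three biodiversity measures and on the F1 Score. The sketch below shows how these quantities are usually computed, assuming per-plant species labels as input and the natural logarithm for the Shannon–Wiener index (some authors use log base 2). Macro-averaged F1 is included as one plausible explanation of how Accuracy can stay high while F1 drops under class imbalance; the abstract does not state which F1 averaging the authors used, and all function names are hypothetical.

```python
import math
from collections import Counter


def diversity_indices(labels):
    """Return (species richness, Gini-Simpson index, Shannon-Wiener index) for species labels."""
    counts = Counter(labels)
    n = sum(counts.values())
    proportions = [c / n for c in counts.values()]
    richness = len(counts)
    gini_simpson = 1.0 - sum(p * p for p in proportions)
    shannon_wiener = -sum(p * math.log(p) for p in proportions)  # natural log assumed
    return richness, gini_simpson, shannon_wiener


def relative_error(true_value, estimate):
    """Relative error of an estimated index against its ground-truth value."""
    return abs(estimate - true_value) / abs(true_value)


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so rare classes count as much as common ones."""
    scores = []
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        scores.append(f1_score(precision, recall))
    return sum(scores) / len(scores)
```

With eight samples of species "a" and two of species "b", a classifier that always predicts "a" scores 0.8 Accuracy but only 4/9 (about 0.44) macro-F1, because the missed minority class contributes an F1 of zero to the average.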

374. Computer vision-based cascade detection of peeling and cracking in masonry house walls.

Author: Zhang, Chun, Lin, Shuai, Yu, Jian, and Zhang, Tongbo
Subjects: *RURAL housing, *CASCADE connections, *COMPUTER vision, *MASONRY, *RURAL geography
Abstract: Self-built masonry structures are commonly used as houses in rural areas of the central plains of China. These structures pose significant safety risks due to low self-building standards and non-standard designs. The primary type of damage to the walls of masonry houses is cracking, while the decorative layers of the walls may also peel or crack. The shapes, sizes, and surface characteristics of these kinds of damage are complex and differ by type. To achieve fast identification of surface damage on masonry house walls, this paper proposes a cascade detection model that combines a multilevel cascade classifier with a parallel sub-segmentation network. The VGG16 neural network serves as the backbone of the multilevel cascade classifier, which is trained on a damage image set of the decorative wall layers of rural masonry houses. The classifier first separates damage from background and then distinguishes cracks from peeling. Next, the parallel sub-segmentation model uses EfficientNet-B7 as the encoder within a U-Net framework to perform pixel-level segmentation of peeling and cracks. Finally, the outputs of the parallel sub-segmentation networks are superimposed and fused to generate a complete image containing segmentation information for both peeling and cracks. The cascade network structure adopted in this model significantly reduces training difficulty and improves model accuracy. Compared with SegFormer and DeepLabv3+, the proposed method increases the average IoU on on-site masonry wall images by 0.29 and 0.08, respectively. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF
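Entry 374 reports average IoU for pixel-level segmentation and fuses the outputs of two parallel segmentation branches into one result. A minimal NumPy sketch of both steps follows. The label codes (0 background, 1 crack, 2 peeling), the crack-over-peeling tie-break where both masks fire, and per-image averaging of IoU are assumptions for illustration; the abstract specifies none of them.

```python
import numpy as np


def mask_iou(pred, target):
    """Pixel-level IoU between two binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    union = np.logical_or(pred, target).sum()
    if union == 0:
        return 1.0  # both masks empty: counted as perfect agreement (assumed convention)
    return float(np.logical_and(pred, target).sum() / union)


def mean_iou(mask_pairs):
    """Average per-image IoU over (prediction, ground truth) pairs."""
    return float(np.mean([mask_iou(p, t) for p, t in mask_pairs]))


def fuse_masks(crack_mask, peel_mask):
    """Merge two binary masks into one label map: 0 = background, 1 = crack, 2 = peeling.
    Where both masks fire, the crack label is written last and wins (assumed tie-break)."""
    crack = np.asarray(crack_mask, dtype=bool)
    peel = np.asarray(peel_mask, dtype=bool)
    labels = np.zeros(crack.shape, dtype=np.uint8)
    labels[peel] = 2
    labels[crack] = 1
    return labels
```

Keeping the two branches as separate binary masks until the final fusion step mirrors the parallel design described in the abstract: each branch can be trained and evaluated on its own damage type before the results are superimposed.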

375. Message-Passing Monte Carlo: Generating low-discrepancy point sets via graph neural networks.

Author: Rusch, T. Konstantin, Kirk, Nathan, Bronstein, Michael M., Lemieux, Christiane, and Rus, Daniela
Subjects: *GRAPH neural networks, *COMPUTER vision, *DEEP learning, *COMPUTER graphics, *POINT set theory
Abstract: Discrepancy is a well-known measure of the irregularity of a point set's distribution. Point sets with small discrepancy are called low-discrepancy sets and are known to fill space uniformly and efficiently. Low-discrepancy points play a central role in many problems in science and engineering, including numerical integration, computer vision, machine perception, computer graphics, machine learning, and simulation. In this work, we present a machine learning approach to generate a new class of low-discrepancy point sets named Message-Passing Monte Carlo (MPMC) points. Motivated by the geometric nature of generating low-discrepancy point sets, we leverage tools from Geometric Deep Learning and base our model on graph neural networks. We further extend our framework to higher dimensions, which flexibly allows the generation of custom-made points that emphasize uniformity in the specific dimensions that matter most for the problem at hand. Finally, we demonstrate that our proposed model achieves state-of-the-art performance, outperforming previous methods by a significant margin. In fact, MPMC points are empirically shown to be either optimal or near-optimal with respect to discrepancy in low dimensions with small numbers of points, i.e., settings in which the optimal discrepancy can be determined. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF
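Entry 375 measures uniformity by discrepancy. The abstract does not say which discrepancy the authors use; one common, cheaply computable choice is the L2 star discrepancy, which Warnock's closed-form formula evaluates directly from the coordinates of N points in [0, 1]^d: D² = 3^(−d) − (2^(1−d)/N) Σ_i Π_k (1 − x_ik²) + (1/N²) Σ_i Σ_j Π_k (1 − max(x_ik, x_jk)). The sketch below is a plain evaluation of that formula, not the MPMC model; the function name is hypothetical, and the O(N² d) memory use limits it to modest point counts.

```python
import numpy as np


def l2_star_discrepancy(points):
    """L2 star discrepancy of points in [0, 1]^d via Warnock's closed-form formula.
    Accepts an (N, d) array, or a flat sequence interpreted as N points in 1D."""
    x = np.asarray(points, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    n, d = x.shape
    term1 = 3.0 ** (-d)
    term2 = (2.0 ** (1 - d) / n) * np.prod(1.0 - x ** 2, axis=1).sum()
    # Component-wise maxima for every pair (i, j); shape (N, N, d).
    pairwise_max = np.maximum(x[:, None, :], x[None, :, :])
    term3 = np.prod(1.0 - pairwise_max, axis=2).sum() / n ** 2
    # Clamp tiny negative round-off before taking the square root.
    return float(np.sqrt(max(term1 - term2 + term3, 0.0)))


centered = [0.125, 0.375, 0.625, 0.875]
clumped = [0.1, 0.1, 0.1, 0.1]
print(l2_star_discrepancy(centered), l2_star_discrepancy(clumped))
```

In one dimension, the evenly centered points (i + 0.5)/N give D = 1/(√12 N), which is known to be optimal for this measure; clustering the same number of points raises the value sharply.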

376. Trajectory planning and automatic docking of LNG five-axis loading arm.

Author: Jiang, FaGuang, Chen, Kebing, Chen, Yang, and Tian, Cheng
Subjects: *COMPUTER vision, *CHARACTERISTIC functions, *NATURAL gas, *LIQUEFIED gases, *FLANGES
Abstract: Purpose: The conventional manual flange docking method in the LNG (Liquefied Natural Gas) loading process suffers from low positioning accuracy, limited production efficiency, and safety hazards. To address these challenges, this study analyzed the main functions and structural characteristics of the LNG five-axis loading arm. Design/methodology/approach: An automated solution for the joints of the LNG loading arm was designed. The forward kinematic model of the loading arm was established using the Denavit–Hartenberg (D-H) parameter method, and its workspace was analyzed. The Newton–Raphson iteration method was employed to solve the inverse kinematics of the loading arm, facilitating trajectory planning. The relationship between the target position and the joint variables was established to verify the stability of the arm's motion. Flange centers were identified using the Hough transform. Based on the ROS platform, combined with Gazebo and RViz, an experimental simulation of automatic docking of the LNG loading arm was conducted. Findings: The docking errors in the X, Y, and Z directions were all less than 0.8 mm, meeting the required docking accuracy. Moreover, the loading arm moved smoothly during docking, without abrupt changes, validating its capability to accomplish the automatic docking task. Originality/value: The proposed trajectory planning and automatic docking scheme can be used for rapid filling between LNG loading arms and LNG tankers to improve the efficiency of LNG transportation. By guiding the docking automatically, the proposed scheme offers an accurate and efficient way to improve safety. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF
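Entry 376 builds forward kinematics from Denavit–Hartenberg (D-H) parameters and solves inverse kinematics by Newton–Raphson iteration. The sketch below shows both ideas on a hypothetical two-link planar arm with unit link lengths, not the five-axis LNG arm, whose D-H table the abstract does not give. The standard D-H convention, the forward-difference Jacobian, and solving only for the end-effector (x, y) position are all assumptions made to keep the example small.

```python
import numpy as np


def dh_transform(theta, d, a, alpha):
    """4x4 homogeneous transform for one joint, standard D-H convention:
    Rot_z(theta) @ Trans_z(d) @ Trans_x(a) @ Rot_x(alpha)."""
    ct, st = np.cos(theta), np.sin(theta)
    ca, sa = np.cos(alpha), np.sin(alpha)
    return np.array([
        [ct, -st * ca, st * sa, a * ct],
        [st, ct * ca, -ct * sa, a * st],
        [0.0, sa, ca, d],
        [0.0, 0.0, 0.0, 1.0],
    ])


def forward_kinematics(joint_angles, dh_table):
    """Chain per-joint transforms; each dh_table row is (d, a, alpha) for a revolute joint."""
    T = np.eye(4)
    for theta, (d, a, alpha) in zip(joint_angles, dh_table):
        T = T @ dh_transform(theta, d, a, alpha)
    return T


def tip_xy(q, dh_table):
    """End-effector (x, y) position taken from the last column of the chained transform."""
    return forward_kinematics(q, dh_table)[:2, 3]


def solve_ik_newton(target_xy, dh_table, q0, tol=1e-9, max_iter=50, eps=1e-7):
    """Newton-Raphson IK on the (x, y) tip position with a forward-difference Jacobian.
    Assumes exactly two joints so that the Jacobian is square."""
    q = np.asarray(q0, dtype=float)
    target = np.asarray(target_xy, dtype=float)
    for _ in range(max_iter):
        current = tip_xy(q, dh_table)
        error = target - current
        if np.linalg.norm(error) < tol:
            break
        J = np.empty((2, q.size))
        for k in range(q.size):
            dq = np.zeros_like(q)
            dq[k] = eps
            J[:, k] = (tip_xy(q + dq, dh_table) - current) / eps
        q = q + np.linalg.solve(J, error)  # Newton step: solve J @ delta_q = error
    return q


# Hypothetical planar arm: two revolute joints, unit link lengths, no offsets or twists.
DH_TABLE = [(0.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
q = solve_ik_newton([1.0, 1.0], DH_TABLE, q0=[0.3, 1.2])
print(q, tip_xy(q, DH_TABLE))
```

Starting from q0 = (0.3, 1.2), the iteration moves the tip to (1, 1). An arm whose joint count differs from the number of target coordinates, such as a five-axis arm controlled in position only, has a non-square Jacobian, which is typically handled with a pseudo-inverse (`np.linalg.pinv`) or damped least squares instead of `np.linalg.solve`.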

377. A machine vision‐based intelligent segmentation method for dam underwater cracks using swarm optimization algorithm and deep learning.

Author: Zhu, Yantao, Niu, Xinqiang, and Tian, Jinzhang
Subjects: *DAM failures, *EARTH dams, *OPTIMIZATION algorithms, *MACHINE learning, *COMPUTER vision, *DEEP learning, *DAMS, *SUBMERSIBLES
Abstract: Ensuring the safety of water networks is a research hotspot in the current water conservancy industry, and dams are an important part of these networks. Over time, however, dams are prone to varying degrees of aging and deterioration, most commonly in the form of structural cracks. If cracks are not discovered and repaired in time, the normal operation of the dam will be affected, and catastrophic accidents such as dam failure may even occur. However, complex backgrounds and blurred images can easily lead machine vision detection models to misjudge, so efficient and accurate detection and evaluation technology is urgently needed. This paper combines a deep semantic segmentation network with a model hyperparameter optimization algorithm to propose a knowledge-coupling-driven, data-intelligent perception method for underwater cracks in dams. Taking the underwater inspection of a concrete face rockfill dam as an example, the effectiveness of the model is verified using an underwater vehicle as the carrier. Experimental results indicate that the developed method achieves an intersection over union (IoU) of 0.9301, a precision rate of 0.9678, a precision rate of 0.9472, and a recall rate of 0.9577 on the test set. These results show that the method achieves high performance in fine crack detection. In addition, the developed method achieves better segmentation performance in different complex underwater crack scenes, further illustrating its high performance. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

378. Casting-DETR: An End-to-End Network for Casting Surface Defect Detection.

Author: Pu, Quan-cheng, Zhang, Hui, Xu, Xiang-rong, Zhang, Long, Gao, Ju, Rodić, Aleksandar, Petrovic, Petar B., Wang, Hai-yan, Xu, Shan-shan, and Wang, Zhi-xiong
Subjects: *SURFACE defects, *COMPUTER vision, *DEEP learning, *TRANSFORMER models
Abstract: Using machine vision to detect casting surface defects involves small targets and demands real-time performance and ease of mobility. Directly applying current mainstream object detection networks to this task yields low accuracy and efficiency. Consequently, in this paper, we introduce Casting-DETR, an end-to-end network designed for casting surface defect detection. To assess and validate the model's performance, 554 images of casting samples with surface defects were used. Casting-DETR achieved a detection rate of 98.97% on the test set, with a single-image detection time of 91.5 ms. Furthermore, a real-time detection system built using PyQt6 was tested in four different environments. Casting-DETR maintained a single-frame detection time of approximately 90 ms, demonstrating the model's high robustness and suitability for real-time detection. The Casting-DETR network proposed in this paper is an end-to-end solution with rapid convergence, superior detection accuracy, and fast detection speed, offering a fresh perspective for similar detection tasks within the industry. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

379. Synthetic Data for Video Surveillance Applications of Computer Vision: A Review.

Author: Delussu, Rita, Putzu, Lorenzo, and Fumera, Giorgio
Subjects: *OBJECT recognition (Computer vision), *COMPUTER vision, *IMAGE analysis, *BEHAVIORAL assessment, *APPLICATION software, *VIDEO surveillance, *DEEP learning
Abstract: In recent years, there has been a growing interest in synthetic data for several computer vision applications, such as automotive, detection and tracking, surveillance, medical image analysis and robotics. Early use of synthetic data was aimed at performing controlled experiments under the analysis by synthesis approach. Currently, synthetic data are mainly used for training computer vision models, especially deep learning ones, to address well-known issues of real data, such as manual annotation effort, data imbalance and bias, and privacy-related restrictions. In this work, we survey the use of synthetic training data focusing on applications related to video surveillance, whose relevance has rapidly increased in the past few years due to their connection to security: crowd counting, object and pedestrian detection and tracking, behaviour analysis, person re-identification and face recognition. Synthetic training data are even more attractive in this kind of application, as they can address further specific issues arising, e.g., from typically unconstrained image or video acquisition conditions and cross-scene application scenarios. We categorise and discuss the existing methods for creating synthetic data, analyse the synthetic data sets proposed in the literature for each of the considered applications, and provide an overview of their effectiveness as training data. We finally discuss whether and to what extent the existing synthetic data sets mitigate the issues of real data, highlight existing open issues, and suggest future research directions in this field. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

380. Benchmarking Object Detection Robustness against Real-World Corruptions.

Author: Liu, Jiawei, Wang, Zhijie, Ma, Lei, Fang, Chunrong, Bai, Tongtong, Zhang, Xufan, Liu, Jia, and Chen, Zhenyu
Subjects: *OBJECT recognition (Computer vision), *COMPUTER vision, *DATA augmentation, *DEEP learning, *TRANSFORMER models
Abstract: With their rapid recent development, deep-learning-based object detection techniques have been applied to various real-world software systems, especially in safety-critical applications such as autonomous driving. However, few studies have systematically investigated the robustness of state-of-the-art object detection techniques against real-world image corruptions, and few robustness benchmarks for object detection methods are publicly available. To bridge this gap, we create a public benchmark comprising COCO-C and BDD100K-C, composed of sixteen real-world corruptions based on real damage to camera sensors and the image pipeline. Based on this benchmark, we further perform a systematic empirical study and evaluation of twelve representative object detectors covering three categories of architectures (i.e., two-stage, one-stage, and transformer architectures) to identify current challenges and explore future opportunities. Our key findings include: (1) the proposed real-world corruptions pose a threat to object detectors, especially corruptions involving colour changes; (2) a detector with a high mAP may still be vulnerable to real-world corruptions; (3) one-stage detectors are recommended for potential cross-scenario applications; (4) when object detection architectures suffer from real-world corruptions, existing robustness enhancement methods have limited effectiveness; and (5) two-stage and one-stage object detection architectures are more likely to miss objects than transformer-based methods under the proposed corruptions. Our results highlight the need for designing object detection methods that are robust against real-world corruptions and for more effective robustness enhancement methods for existing object detectors. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

381. Instant3D: Instant Text-to-3D Generation.

Author: Li, Ming, Zhou, Pan, Liu, Jia-Wei, Keppo, Jussi, Lin, Min, Yan, Shuicheng, and Xu, Xiangyu
Subjects: *COMPUTER vision, *RADIANCE, *ALGORITHMS, *COST
Abstract: Text-to-3D generation has attracted much attention from the computer vision community. Existing methods mainly optimize a neural field from scratch for each text prompt, incurring heavy and repetitive training costs that impede their practical deployment. In this paper, we propose a novel framework for fast text-to-3D generation, dubbed Instant3D. Once trained, Instant3D is able to create a 3D object for an unseen text prompt in less than one second with a single run of a feedforward network. We achieve this remarkable speed by devising a new network that directly constructs a 3D triplane from a text prompt. The core innovation of our Instant3D lies in our exploration of strategies to effectively inject text conditions into the network. In particular, we propose to combine three key mechanisms: cross-attention, style injection, and token-to-plane transformation, which collectively ensure precise alignment of the output with the input text. Furthermore, we propose a simple yet effective activation function, the scaled-sigmoid, to replace the original sigmoid function, which speeds up training convergence more than tenfold. Finally, to address the Janus (multi-head) problem in 3D generation, we propose an adaptive Perp-Neg algorithm that can dynamically adjust its concept negation scales according to the severity of the Janus problem during training, effectively reducing the multi-head effect. Extensive experiments on a wide variety of benchmark datasets demonstrate that the proposed algorithm performs favorably against the state-of-the-art methods both qualitatively and quantitatively, while achieving significantly better efficiency. The code, data, and models are available at https://ming1993li.github.io/Instant3DProj/. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

382. ManiCLIP: Multi-attribute Face Manipulation from Text.

Author: Wang, Hao, Lin, Guosheng, del Molino, Ana García, Wang, Anran, Feng, Jiashi, and Shen, Zhiqi
Subjects: *GENERATIVE adversarial networks, *COMPUTER vision, *ENTROPY
Abstract: In this paper, we present a novel multi-attribute face manipulation method based on textual descriptions. Previous text-based image editing methods either require test-time optimization for each individual image or are restricted to single-attribute editing. Extending these methods to multi-attribute face image editing scenarios introduces undesired, excessive attribute changes; e.g., text-relevant attributes are overly manipulated and text-irrelevant attributes are also changed. To address these challenges and achieve natural editing over multiple face attributes, we propose a new decoupling training scheme in which we use group sampling to obtain text segments from the same attribute categories instead of whole complex sentences. Further, to preserve other existing face attributes, we encourage the model to edit the latent code of each attribute separately via an entropy constraint. During inference, our model is able to edit new face images without any test-time optimization, even from complex textual prompts. We show extensive experiments and analysis to demonstrate the efficacy of our method, which generates natural manipulated faces with minimal text-irrelevant attribute editing. Code and pre-trained model are available at https://github.com/hwang1996/ManiCLIP. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

383. GridFormer: Residual Dense Transformer with Grid Structure for Image Restoration in Adverse Weather Conditions.

Author: Wang, Tao, Zhang, Kaihao, Shao, Ziqian, Luo, Wenhan, Stenger, Bjorn, Lu, Tong, Kim, Tae-Kyun, Liu, Wei, and Li, Hongdong
Subjects: *IMAGE reconstruction, *COMPUTER vision, *WEATHER, *SOURCE code, *LEARNING ability
Abstract: Image restoration in adverse weather conditions is a difficult task in computer vision. In this paper, we propose a novel transformer-based framework called GridFormer, which serves as a backbone for image restoration under adverse weather conditions. GridFormer is designed in a grid structure using a residual dense transformer block, and it introduces two core designs. First, it uses an enhanced attention mechanism in the transformer layer. The mechanism includes a sampler stage and a compact self-attention stage to improve efficiency, and a local enhancement stage to strengthen local information. Second, we introduce a residual dense transformer block (RDTB) as the final GridFormer layer. This design further improves the network's ability to learn effective features from both preceding and current local features. The GridFormer framework achieves state-of-the-art results on five diverse image restoration tasks in adverse weather conditions: image deraining, dehazing, joint deraining and dehazing, desnowing, and multi-weather restoration. The source code and pre-trained models will be released. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

384. Breast tumor segmentation using neural cellular automata and shape guided segmentation in mammography images.

Author: Ali, Mudassar, Wu, Tong, Hu, Haoji, and Mahmood, Tariq
Subjects: *IMAGE segmentation, *COMPUTER vision, *DIAGNOSTIC imaging, *CELLULAR automata, *BREAST cancer, *BREAST
Abstract: Purpose: Using computer-aided diagnosis (CAD) systems, this research seeks to enhance breast cancer segmentation by addressing data insufficiency and data complexity during model training. For computer vision models, the inherent symmetry and complexity of mammography images make segmentation difficult. The objective is to improve the precision and effectiveness of medical imaging. Methods: The study introduces a hybrid strategy combining shape-guided segmentation (SGS) and M3D-neural cellular automata (M3D-NCA), resulting in improved computational efficiency and performance. Applying SGS during the initialization phase, together with eliminating convolutional layers, enables the model to substantially reduce computation time. The research also proposes a novel loss function that combines the segmentation losses of both components for effective training. Results: The proposed technique aims to improve the accuracy and consistency of breast tumor segmentation, leading to significant improvements in medical imaging and in breast cancer detection and treatment. Conclusion: This study enhances breast cancer segmentation in medical imaging using CAD systems. The hybrid combination of SGS and M3D-NCA improves performance and computational efficiency by coping with complex data and insufficient training data. The approach also reduces computing time and improves training efficiency. The study aims to improve breast cancer detection and treatment methods in medical imaging technology. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

385. Selective feature block and joint IoU loss for object detection.

Author: Wang, Junyi, Hua, Ruzhao, Jiang, Xuezheng, Song, Kechen, Meng, Qinggang, and Saada, Mohamad
Subjects: *OBJECT recognition (Computer vision), *COMPUTER vision, *VISUAL fields, *PROBLEM solving, *DETECTORS
Abstract: Object detection is an important problem in computer vision, and feature fusion and bounding box regression are indispensable in mainstream object detection approaches. However, some detectors adopt the Feature Pyramid Network, which increases training and detection time. As for the regression loss function, some recent techniques based on the Intersection over Union (IoU) loss have negative effects on bounding box regression. To overcome these shortcomings, we propose the Selective Feature Block (SFBlock) and the Joint IoU (JIoU) loss in this article. The proposed SFBlock adaptively selects the features extracted from the backbone and fuses them into a new feature. We add a penalty term based on the intersection area between the prediction box and the target box to the Generalized IoU (GIoU) loss, to solve the problem that the GIoU loss degenerates into the IoU loss when one of the two boxes encloses the other. A large number of ablation and comparative experiments are carried out to demonstrate the effectiveness of the proposed methods on various models and datasets. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF
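
Entry 385 notes that the GIoU loss degenerates into the IoU loss when one box encloses the other. A minimal sketch of the standard IoU and GIoU definitions, assuming axis-aligned boxes given as (x1, y1, x2, y2), makes this visible: when one box contains the other, the smallest enclosing box C equals the larger box, so the GIoU penalty term (|C| - |A ∪ B|) / |C| is zero. The function name `iou_and_giou` is hypothetical, and the sketch does not implement the authors' JIoU penalty term, whose exact form the abstract does not give.

```python
from typing import Tuple

# Axis-aligned box (x1, y1, x2, y2) with x1 < x2 and y1 < y2 (assumed convention).
Box = Tuple[float, float, float, float]

def area(b: Box) -> float:
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])

def iou_and_giou(pred: Box, target: Box) -> Tuple[float, float]:
    """Return (IoU, GIoU) for two axis-aligned boxes."""
    # Intersection rectangle (zero area if the boxes do not overlap).
    ix1, iy1 = max(pred[0], target[0]), max(pred[1], target[1])
    ix2, iy2 = min(pred[2], target[2]), min(pred[3], target[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = area(pred) + area(target) - inter
    iou = inter / union
    # Smallest box C enclosing both boxes.
    cx1, cy1 = min(pred[0], target[0]), min(pred[1], target[1])
    cx2, cy2 = max(pred[2], target[2]), max(pred[3], target[3])
    c_area = (cx2 - cx1) * (cy2 - cy1)
    # GIoU subtracts the fraction of C not covered by the union.
    giou = iou - (c_area - union) / c_area
    return iou, giou

if __name__ == "__main__":
    # Two different small boxes inside the same target: identical IoU and GIoU.
    print(iou_and_giou((2, 2, 6, 6), (0, 0, 10, 10)))
    print(iou_and_giou((4, 4, 8, 8), (0, 0, 10, 10)))
```

With the loss defined as 1 - GIoU, a small prediction box placed anywhere inside the target receives the same loss, which is the degeneration the JIoU penalty term is intended to address.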

386. Building, benchmarking, and exploring perturbative maps of transcriptional and morphological data.

Author: Celik, Safiye, Hütter, Jan-Christian, Carlos, Sandra Melo, Lazar, Nathan H., Mohan, Rahul, Tillinghast, Conor, Biancalani, Tommaso, Fay, Marta M., Earnshaw, Berton A., and Haque, Imran S.
Subjects: *COMPUTER vision, *CRISPRS, *RESEARCH personnel, *RNA sequencing, *ROBOTICS
Abstract: The continued scaling of genetic perturbation technologies combined with high-dimensional assays such as cellular microscopy and RNA-sequencing has enabled genome-scale reverse-genetics experiments that go beyond single-endpoint measurements of growth or lethality. Datasets emerging from these experiments can be combined to construct perturbative "maps of biology", in which readouts from various manipulations (e.g., CRISPR-Cas9 knockout, CRISPRi knockdown, compound treatment) are placed in unified, relatable embedding spaces allowing for the generation of genome-scale sets of pairwise comparisons. These maps of biology capture known biological relationships and uncover new associations which can be used for downstream discovery tasks. Construction of these maps involves many technical choices in both experimental and computational protocols, motivating the design of benchmark procedures to evaluate map quality in a systematic, unbiased manner. Here, we (1) establish a standardized terminology for the steps involved in perturbative map building, (2) introduce key classes of benchmarks to assess the quality of such maps, (3) construct 18 maps from four genome-scale datasets employing different cell types, perturbation technologies, and data readout modalities, (4) generate benchmark metrics for the constructed maps and investigate the reasons for performance variations, and (5) demonstrate utility of these maps to discover new biology by suggesting roles for two largely uncharacterized genes. Author summary: Due to the rapid advancements in genetic perturbation, laboratory robotics, sequencing, and computer vision, more researchers are now generating datasets that capture cellular responses to genetic perturbations. These datasets can be powerful discovery tools for examining known biological relationships and revealing new associations in an unbiased manner when paired with a computational pipeline that can assemble the data into a digestible format. 
However, the challenge arises from the variety of cellular models, assay designs, terminologies, codebases, and analysis methods involved. In this work we define a unified framework for building and benchmarking perturbative maps, benchmark four different datasets assembled into 18 different maps, explore the impact of different design decisions, and demonstrate how these maps can be used to elucidate gene functions. Our goal is to facilitate comparisons across various technologies and methods by introducing a shared language for the field. The open-source codebase, capable of incorporating new methods, aims to be a resource for researchers developing laboratory or computational methodology. While we caution against definitive recommendations due to numerous variables at play, we hope to stimulate studies directly comparing methods under controlled conditions. Our framework can also help evaluate combining maps across modalities as the field progresses. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

387. Automatic classification of bird species related to power line faults using deep convolution features and ECOC‐SVM model.

Author: Qiu, Zhibin, Zhou, Zhibiao, and Wan, Zhoutao
Subjects: *CONVOLUTIONAL neural networks, *BIRD classification, *ELECTRIC lines, *COMPUTER vision, *AUTOMATIC classification
Abstract: Bird‐related outages greatly threaten the safety of overhead transmission and distribution lines, while electrocution and collisions of birds with power lines, especially endangered species, are significant environmental concerns. Automatic bird recognition can be helpful to mitigate this contradiction. This paper proposes a method for automatic classification of bird species related to power line faults combining deep convolution features with error‐correcting output codes support vector machine (ECOC‐SVM). An image dataset of about 20 high‐risk and 20 low‐risk bird species was constructed, and the feed‐forward denoising convolutional neural network was used for image preprocessing. The deep convolution features of bird images were extracted by DarkNet‐53, and taken as inputs of the ECOC‐SVM for model training and bird species classification. The gradient‐weighted class activation mapping was used for visual explanations of the model decision region. The experimental results indicate that the average accuracy of the proposed method can reach 94.39%, and its performance was better than other models using different feature extraction networks and classification algorithms. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF
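
The ECOC scheme used in entry 387 assigns each class a binary codeword; each bit is predicted by one binary classifier (an SVM in the paper), and a sample is assigned to the class whose codeword is nearest to the predicted bits. The sketch below shows only the Hamming-distance decoding step, with a hypothetical three-class codebook standing in for the paper's bird species; training the binary SVMs is not shown. Libraries such as scikit-learn provide a complete ECOC classifier in `OutputCodeClassifier`.

```python
from typing import Dict, Sequence

def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions at which two equal-length bit vectors differ."""
    return sum(x != y for x, y in zip(a, b))

def ecoc_decode(bits: Sequence[int], codebook: Dict[str, Sequence[int]]) -> str:
    """Return the label whose codeword is closest in Hamming distance to the
    bits predicted by the binary classifiers (ties go to the first label)."""
    return min(codebook, key=lambda label: hamming(bits, codebook[label]))

# Hypothetical codebook: 3 classes, 6 binary classifiers (one per column).
# Every pair of rows differs in 4 positions, so one wrong bit is corrected.
CODEBOOK: Dict[str, Sequence[int]] = {
    "class_a": (0, 0, 0, 1, 1, 1),
    "class_b": (1, 1, 0, 0, 0, 1),
    "class_c": (0, 1, 1, 0, 1, 0),
}

if __name__ == "__main__":
    # Bits for class_b with the last classifier output flipped.
    print(ecoc_decode((1, 1, 0, 0, 0, 0), CODEBOOK))  # class_b
```

Because every pair of codewords in this codebook differs in four positions, a single wrong binary output still decodes to the correct class; longer codes tolerate more errors.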

388. Near‐Infrared Imaging Highly Enhanced by Pixel‐Level Integrated Plasmonic Metasurfaces on CMOS Image Sensors.

Author: Nan, Xianghong, Zheng, Qilin, Dong, Yajin, Liu, Yongjun, Pan, Dahui, Chen, Bojun, Wang, Haiquan, He, Huifan, Gong, Yunyang, Wen, Long, and Chen, Qin
Subjects: *CMOS image sensors, *LIGHT absorption, *OPTICAL radar, *LIDAR, *COMPUTER vision, *PHOTOTHERMAL effect
Abstract: Near‐infrared (NIR) photodetection and imaging have sparked significant interest across a wide range of applications. While silicon photodiodes are commonly employed, the small light absorption coefficients of Si in the NIR severely limit performance, especially for thin active Si layers. Although various light harvesting techniques have been proposed to increase the light absorption of Si, a pixel‐level strategy for enhanced NIR imaging remains challenging in CMOS image sensors (CISs), whose pixel size is on the micron scale. In this paper, plasmonic metasurfaces are intimately integrated on top of the 2.3 µm thick Si active regions of the pixels of a backside illumination (BI)‐CIS for NIR imaging for the first time. A 200% improvement in photoresponsivity is obtained experimentally in this planar Si layer, without patterning the Si layer and potentially damaging the active region. Numerical simulation results reveal highly enhanced light intensity in the thin active Si layer due to the presence of the plasmonic metasurfaces. Significantly improved imaging brightness and signal‐to‐noise ratio of NIR imaging are demonstrated under both laser and LED illumination. This CMOS‐compatible technique is expected to hold promise for applications including machine vision, iris certification, light detection and ranging (LiDAR), and optical communication in data centers. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

389. Batch-transformer for scene text image super-resolution.

Author: Sun, Yaqi, Xie, Xiaolan, Li, Zhi, and Yang, Kai
Subjects: *COMPUTER vision, *CONVOLUTIONAL neural networks, *HIGH resolution imaging, *SIGNAL-to-noise ratio, *VISION disorders
Abstract: Recognizing low-resolution text images is challenging because such images often lose detailed information, leading to poor recognition accuracy. Moreover, traditional methods based on deep convolutional neural networks (CNNs) are not effective enough for some low-resolution text images with dense characters. In this paper, a novel CNN-based batch-transformer network for scene text image super-resolution (BT-STISR) is proposed to address this problem. To obtain the text information needed for text reconstruction, a pre-trained text prior module is employed. Then a novel two-pipeline batch-transformer-based module is proposed, leveraging self-attention and global attention mechanisms to apply the guidance of the text prior to the text reconstruction process. An experimental study on the benchmark dataset TextZoom shows that BT-STISR achieves state-of-the-art performance in terms of the structural similarity (SSIM) and peak signal-to-noise ratio (PSNR) metrics compared with several recent methods. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF
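
Entries 389 and 395 report results in peak signal-to-noise ratio (PSNR), defined as PSNR = 10 · log10(MAX² / MSE), where MAX is the largest possible pixel value and MSE is the mean squared error between the reference and restored images. A minimal sketch, assuming 8-bit images flattened into equal-length sequences (so MAX = 255); the function name `psnr` is illustrative, and SSIM is not shown.

```python
import math
from typing import Sequence

def psnr(reference: Sequence[float], estimate: Sequence[float],
         max_value: float = 255.0) -> float:
    """PSNR in decibels between two images given as flattened pixel sequences."""
    if len(reference) != len(estimate) or not reference:
        raise ValueError("images must be non-empty and the same size")
    # Mean squared error over all pixels.
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / mse)

if __name__ == "__main__":
    # MSE of 1 on 8-bit data gives 20 * log10(255), about 48.13 dB.
    print(psnr([10, 10], [11, 9]))
```

Higher PSNR means the restored image is closer to the reference; because it depends only on pixel-wise error, papers usually pair it with a structural metric such as SSIM.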

390. Non-corresponding and topology-free 3D face expression transfer.

Author: Liu, Shanghuan, Gai, Shaoyan, and Da, Feipeng
Subjects: *POINT processes, *COMPUTER vision, *POINT cloud, *COMPUTER graphics, *PROBLEM solving, *FACE perception
Abstract: Expression transfer is an important task in computer graphics and vision. Existing 3D face models constructed on registered meshes or shapes with corresponding vertices cannot transfer expression over practical data. While recent learning-based works achieved pose transfer between 3D unorganized point clouds, they cannot transfer 3D face expressions well because of weak geometry-perceiving ability and lack of ground truth expression faces for training. To solve the problems, we propose an effective framework that can transfer expressions on non-corresponding and topology-free 3D faces for the first time. The framework includes a novel autoencoder that directly processes unordered point clouds to extract identity and expression features and fuse them to generate desired target faces. Multiple geometry-perception operators are introduced to the autoencoder's encoders to obtain 3D faces' valuable geometry information without repetitive modulations in previous methods. Besides, our decoder utilizes cross-attention's powerful interactive perception capability to fuse extracted features and deform target faces in feature space. To train the autoencoder in a supervised manner, we present a submodule that generates pseudo-ground truth expression faces using pre-trained deep models and their latent operations. The experiments demonstrate the proposed method's outstanding 3D face expression transfer performances. Our code and data are available at https://github.com/SEULSH/Non-corresponding-and-Topology-free-3D-Face-Expression-Transfer. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

391. Gaussian-based adaptive frame skipping for visual object tracking.

Author: Gao, Fei, You, Shengzhe, Ge, Yisu, and Zhang, Shifeng
Subjects: *TRACKING algorithms, *COMPUTER vision, *ALGORITHMS, *OBJECT tracking (Computer vision), *VIDEO surveillance, *FORECASTING
Abstract: Visual object tracking is a basic computer vision problem that has developed greatly in recent years. Although the accuracy of object tracking algorithms has improved, most trackers struggle to meet practical efficiency requirements, especially on devices with limited computational power. To improve visual object tracking efficiency with little or no loss of accuracy, a frame skipping method is proposed for correlation filter-based trackers, which includes an adaptive tracking-skipping algorithm and Gaussian-based movement prediction. Based on the movement state of objects in the previous frames, the position of objects in the next frame is predicted, and the predicted position determines whether the tracking process should be skipped. Experiments are conducted on both practical video surveillance data and well-known public datasets to evaluate the proposed method. Experimental results show that the proposed method can almost double the tracking efficiency of correlation filter-based trackers with little or no accuracy loss. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF
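
Entry 391 does not give its prediction formula, so the following is only a hypothetical sketch of the general idea: estimate the next object position from a Gaussian-weighted average of recent frame-to-frame displacements (recent frames weighted most), and skip the correlation-filter update when the predicted motion is below a threshold. The weighting scheme, the `sigma` and `threshold` parameters, and all function names are assumptions, not the authors' algorithm.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]

def gaussian_weights(n: int, sigma: float) -> List[float]:
    """Normalized weights for n past steps; the newest step (index n-1) weighs most.
    Assumed scheme: Gaussian in the distance (in frames) from the newest step."""
    raw = [math.exp(-((n - 1 - i) ** 2) / (2 * sigma ** 2)) for i in range(n)]
    total = sum(raw)
    return [w / total for w in raw]

def predict_next(history: List[Point], sigma: float = 1.0) -> Point:
    """Predict the next position from a Gaussian-weighted mean of past displacements."""
    if len(history) < 2:
        return history[-1]  # no motion information yet
    steps = [(b[0] - a[0], b[1] - a[1]) for a, b in zip(history, history[1:])]
    w = gaussian_weights(len(steps), sigma)
    vx = sum(wi * s[0] for wi, s in zip(w, steps))
    vy = sum(wi * s[1] for wi, s in zip(w, steps))
    last = history[-1]
    return (last[0] + vx, last[1] + vy)

def should_skip(history: List[Point], threshold: float, sigma: float = 1.0) -> bool:
    """Hypothetical rule: skip the expensive tracker update if predicted motion is small."""
    px, py = predict_next(history, sigma)
    lx, ly = history[-1]
    return math.hypot(px - lx, py - ly) < threshold

if __name__ == "__main__":
    track = [(0.0, 0.0), (2.0, 0.0), (4.0, 0.0), (6.0, 0.0)]
    print(predict_next(track), should_skip(track, threshold=1.0))
```

In a tracking loop, a skipped frame would report the predicted position, and the full tracker would run again whenever the predicted motion exceeds the threshold.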

392. Underwater image enhancement via color conversion and white balance-based fusion.

Author: Xu, Hanning, Mu, Pan, Liu, Zheyuan, and Cheng, Shichao
Subjects: *CONVOLUTIONAL neural networks, *IMAGE intensifiers, *COMPUTER vision, *LIGHT absorption, *REFRACTION (Optics), *IMAGE enhancement (Imaging systems)
Abstract: The task of enhancing underwater images presents a significant challenge due to the refraction and absorption of light in water, resulting in images that often appear bluish or greenish with diminished contrast. Furthermore, the scarcity of underwater datasets makes it difficult to achieve the robust generalization needed to handle complex underwater scenarios. In this study, we introduce a generalized underwater image enhancement model with color-guided adaptive feature fusion (GU-CAFF), designed to rectify various degraded underwater images using a minimal amount of training data. GU-CAFF primarily comprises two modules: a multi-level color-feature encoder (MCE) and a white balance-based fusion (WBF) module. The MCE integrates physical models to extract features from underwater images exhibiting different color deviations, emphasizing essential features while preserving their structural information. In addition, WBF, in conjunction with a statistical model, is proposed to fuse the features extracted by the encoder and rectify the color distortion of specific pixels in degraded images. The proposed method can be trained once on our developed dataset and exhibits robust generalization capabilities on other datasets. Quantitative and qualitative comparisons are conducted with several state-of-the-art underwater image enhancement models, demonstrating the superior performance of our method in enhancing underwater images. The source code will be available at https://github.com/shiningZZ/GU-CAFF. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

393. Learning geometric invariants through neural networks.

Author: Rai, Arpit
Subjects: *CONVOLUTIONAL neural networks, *COMPUTER vision, *IMAGE recognition (Computer vision), *DATA augmentation, *ROTATIONAL motion
Abstract: Convolutional neural networks have become a fundamental model for solving various computer vision tasks. However, these operations are only invariant to translations of objects, and their performance suffers under rotation and other affine transformations. This work proposes a novel neural network that leverages geometric invariants, including curvature, higher-order differentials of curves extracted from object boundaries at multiple scales, and the relative orientations of edges. These features are invariant to affine transformation and can improve the robustness of shape recognition in neural networks. In our experiments on the smallNORB dataset, a 2-layer network operating over these geometric invariants outperforms a 3-layer convolutional network by 9.69% while being more robust to affine transformations, even when trained without any data augmentation. Notably, our network exhibits a mere 6% degradation in test accuracy when test images are rotated by 40°, in contrast to the significant drops of 51.7% and 69% observed in VGG networks and convolutional networks, respectively, under the same transformations. Additionally, our models show greater robustness than invariant feature descriptors such as the SIFT-based bag-of-words classifier and its rotation-invariant extension, the RIFT descriptor, which suffer drops of 35% and 14.1%, respectively, under similar image transformations. Our experimental results further show improved robustness against scale and shear transformations. Furthermore, the multi-scale extension of our geometric invariant network, which extracts curve differentials of higher orders, shows enhanced robustness to scaling and shearing transformations. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF
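
Entry 393 builds on geometric invariants such as boundary curvature. As a minimal illustration (not the paper's network), the discrete curvature of a polyline, computed here as the signed turning angle at each vertex divided by the mean length of the two adjacent segments, is unchanged when the curve is rotated. The sketch only demonstrates rotation (and translation) invariance; it does not reproduce the paper's multi-scale or higher-order features, and the function names are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]

def turning_curvature(points: List[Point]) -> List[float]:
    """Discrete curvature at each interior vertex of an open polyline:
    signed turning angle divided by the mean length of the adjacent segments."""
    result = []
    for (x0, y0), (x1, y1), (x2, y2) in zip(points, points[1:], points[2:]):
        ax, ay = x1 - x0, y1 - y0
        bx, by = x2 - x1, y2 - y1
        # atan2(cross, dot) is the signed angle from segment a to segment b;
        # both cross and dot products are preserved by rotations.
        angle = math.atan2(ax * by - ay * bx, ax * bx + ay * by)
        mean_len = (math.hypot(ax, ay) + math.hypot(bx, by)) / 2
        result.append(angle / mean_len)
    return result

def rotate(points: List[Point], theta: float) -> List[Point]:
    """Rotate points about the origin by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y, s * x + c * y) for x, y in points]

if __name__ == "__main__":
    curve = [(0.1 * i, math.sin(0.1 * i)) for i in range(50)]
    k1 = turning_curvature(curve)
    k2 = turning_curvature(rotate(curve, math.radians(40)))
    print(max(abs(a - b) for a, b in zip(k1, k2)))  # tiny: only rounding error
```

On a circle of radius r sampled finely, the values approach 1/r, so the same function also checks that the discretization behaves like curvature.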

394. Using scale-equivariant CNN to enhance scale robustness in feature matching.

Author: Liao, Yun, Liu, Peiyu, Wu, Xuning, Pan, Zhixuan, Zhu, Kaijun, Zhou, Hao, Liu, Junhui, and Duan, Qing
Subjects: *COMPUTER vision, *CONVOLUTIONAL neural networks, *TRANSFORMER models, *PROBLEM solving, *IMAGE registration
Abstract: Image matching is an important task in computer vision. Detector-free dense matching is an important research direction in image matching because of its high accuracy and robustness. Classical detector-free image matching methods use convolutional neural networks to extract features and then match them. Because CNNs lack scale equivariance, these methods often perform poorly when the images to be matched undergo significant scale variations. However, large scale variations are very common in practical problems. To solve this problem, we propose SeLFM, a method that combines scale equivariance with the global modeling capability of the transformer. The method has two main advantages: scale-equivariant CNNs can extract scale-equivariant features, while the transformer brings global modeling capability. Experiments show that this modification improves the performance of the matcher on image pairs with large scale variations without affecting its general matching performance. The code will be open-sourced at this link: https://github.com/LiaoYun0x0/SeLFM/tree/main [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

395. Image Motion Blur Removal Algorithm Based on Generative Adversarial Network.

Author: Kim, Jongchol, Kim, Myongchol, Kim, Insong, Han, Gyongwon, Jong, Myonghak, and Ri, Gwuangwon
Subjects: *GENERATIVE adversarial networks, *COMPUTER vision, *OBJECT recognition (Computer vision), *DEEP learning, *VISUAL fields, *IMAGE reconstruction
Abstract: The restoration of blurred images is a crucial topic in machine vision, with far-reaching implications for enhancing information acquisition quality, improving algorithmic accuracy, and enriching image texture. Efforts to mitigate blur have progressed from statistical approaches to deep learning techniques. In this paper, we propose a Generative Adversarial Network (GAN)-based image restoration method to address the limitations of existing techniques in restoring color and detail in motion-blurred images. To reduce the computational complexity of generative adversarial networks and the vanishing gradient during learning, a U-Net-based generator is used, configured to emphasize the channel and spatial characteristics of the original information through the proposed CSAR (Channel and Spatial Attention Residual) blocks rather than a simple concatenation operation. To validate the efficacy of the algorithm, comprehensive comparative experiments were conducted on the GoPro dataset. Experimental results show that the peak signal-to-noise ratio is improved compared with the SRN and MPRNet algorithms, with good image restoration ability. Object detection experiments using YOLOv3 showed that the proposed algorithm can generate deblurred images with higher information quality. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

396. Enhancing generalizability of a machine learning model for infrared thermographic defect detection by using 3d numerical modeling.

Author: Chulkov, Arsenii, Moskovchenko, Alexey, and Vavilov, Vladimir
Subjects: *MACHINE learning, *ARTIFICIAL neural networks, *THERMAL diffusivity, *COMPUTER vision, *ARTIFICIAL intelligence, *DEEP learning, *THERMAL conductivity
Abstract: This article explores the use of machine learning models in infrared thermographic nondestructive testing (NDT) for defect detection. The study found that incorporating temperature contrast data improved the efficiency of the models, achieving sensitivity rates of over 98% across all test datasets. However, the study also emphasized the importance of carefully selecting training data parameters and implementing proper data processing to avoid negative outcomes. The article provides references for further research in the field and discusses the potential of deep learning and other techniques in infrared thermography. [Extracted from the article]
Published: 2024
Full Text: View/download PDF

397. Imagining machine vision: Four visual registers from the Chinese AI industry.

Author: de Seta, Gabriele and Shchetvina, Anya
Subjects: *COMPUTER vision, *HUMAN facial recognition software, *ARTIFICIAL intelligence, *INFORMATION technology industry, *HIGH technology industries, *FACE perception
Abstract: Machine vision is one of the main applications of artificial intelligence. In China, the machine vision industry makes up more than a third of the national AI market, and technologies like face recognition, object tracking and automated driving play a central role in surveillance systems and social governance projects relying on the large-scale collection and processing of sensor data. Like other novel articulations of technology and society, machine vision is defined, developed and explained by different actors through the work of imagination. In this article, we draw on the concept of sociotechnical imaginaries to understand how Chinese companies represent machine vision. Through a qualitative multimodal analysis of the corporate websites of leading industry players, we identify a cohesive sociotechnical imaginary of machine vision, and explain how four distinct visual registers contribute to its articulation. These four registers, which we call computational abstraction, human–machine coordination, smooth everyday, and dashboard realism, allow Chinese tech companies to articulate their global ambitions and competitiveness through narrow and opaque representations of machine vision technologies. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

398. Artificial Intelligence in Facial Plastic and Reconstructive Surgery: A Systematic Review.

Author: Espinosa Reyes, Jorge Alberto, Puerta Romero, Mauricio, Cobo, Roxana, Heredia, Nicolas, Solís Ruiz, Luis Alberto, and Corredor Zuluaga, Diego Andres
Subjects: *ARTIFICIAL intelligence, *COMPUTER vision, *CONVOLUTIONAL neural networks, *PLASTIC surgery, *DATABASES
Abstract: Artificial intelligence (AI) is a rapidly evolving technology that is changing the world and medicine as we know it. A review of the PROSPERO database of systematic reviews found no article related to this topic in facial plastic and reconstructive surgery. The objective of this article was to review the literature regarding AI applications in facial plastic and reconstructive surgery. A systematic review of the literature about AI in facial plastic and reconstructive surgery was conducted using the following keywords: Artificial Intelligence, robotics, plastic surgery procedures, and surgery plastic, and the following databases: PubMed, SCOPUS, Embase, BVS, and LILACS. The inclusion criteria were articles about AI in facial plastic and reconstructive surgery. Articles written in languages other than English and Spanish were excluded. In total, 17 articles about AI in facial plastic surgery met the inclusion criteria; after eliminating duplicated papers and applying the exclusion criteria, these articles were reviewed thoroughly. The leading type of AI used in these articles was computer vision, specifically convolutional neural network models used to objectively compare the preoperative with the postoperative state in multiple interventions such as facial lifting and facial transgender surgery. In conclusion, AI is a rapidly evolving technology, and it could significantly impact the treatment of patients in facial plastic and reconstructive surgery. Legislation and regulations are developing more slowly than this technology. It is imperative to learn about this topic as soon as possible, and for all stakeholders to proactively promote discussions about ethical and regulatory dilemmas. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

399. Automatic diagnosis for adenomyosis in ultrasound images by deep neural networks.

Author: Zhao, Qinghong, Yang, Tongyu, Xu, Changyong, Hu, Jiaqi, Shuai, Yu, Zou, Hua, and Hu, Wei
Subjects: *ARTIFICIAL neural networks, *TRANSVAGINAL ultrasonography, *COMPUTER vision, *DEEP learning, *ARTIFICIAL intelligence
Abstract: • Adenomyosis is a common gynecologic disease that can be difficult to diagnose because up to a third of patients are asymptomatic. • Methods of medical image analysis based on deep learning have become a research hotspot in computer vision in recent years; however, research on adenomyosis using artificial intelligence is relatively rare. • This study presents a new noninvasive technique for automatic diagnosis of adenomyosis, using a novel end-to-end unified network framework based on transformer networks. This technique is effective and easy to adopt, with the potential to contribute to shared decision-making. To present a new noninvasive technique for automatic diagnosis of adenomyosis, using a novel end-to-end unified network framework based on transformer networks. This is a prospective descriptive study conducted at a university hospital. A total of 1654 patients were recruited to the study according to adenomyosis diagnosed by transvaginal ultrasound (TVS). For adenomyosis characteristics and ultrasound images, automatic identification of adenomyosis was performed based on deep learning methods. We called this unique technique A2DNet: Adenomyosis Auto Diagnosis Network. The A2DNet exhibits excellent performance in diagnosis of adenomyosis, achieving an accuracy of 92.33%, a precision of 96.06%, a recall of 91.71% and an F1 score of 93.80% in the test group. The confusion matrix of the experimental results shows that the A2DNet achieves a correct diagnosis rate of 92% or more for both normal and adenomyosis samples, which demonstrates the superiority of the A2DNet compared with state-of-the-art methods. The A2DNet is a safe and effective technique to aid in automatic diagnosis of adenomyosis. The technique, which is nondestructive and noninvasive, is new and unique owing to the advantages of artificial intelligence. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

400. Instance segmentation of faces and mouth-opening degrees based on improved YOLOv8 method.

Author: Fan, Yuhe, Zhang, Lixun, Zheng, Canxing, Wang, Xingyuan, Zhu, Jinghui, and Wang, Lan
Abstract: Instance segmentation of faces and mouth-opening degrees is an important technology for meal-assisting robotics in food delivery safety. However, owing to the diversity in shape, color, and posture of faces, and because the mouth has a small contour area, deforms easily, and is often occluded, real-time and accurate instance segmentation is challenging. In this paper, we propose a novel method for instance segmentation of faces and mouth-opening degrees. Specifically, in the backbone network, deformable convolution was introduced to enhance the ability to capture finer-grained spatial information, and the CloFormer module was introduced to improve the ability to capture high-frequency local and low-frequency global information. In the neck network, classical convolution and C2f modules are replaced by GSConv and VoV-GSCSP aggregation modules, respectively, to reduce model complexity and floating-point operations. Finally, in the localization loss, the CIoU loss was replaced by the WIoU loss to reduce the competitiveness of high-quality anchor frames and mask the influence of low-quality samples, which in turn improves localization accuracy and generalization ability. The resulting model is abbreviated as DCGW-YOLOv8n-seg. The DCGW-YOLOv8n-seg model was compared with the baseline YOLOv8n-seg model and several state-of-the-art instance segmentation models on the datasets. The results show that the DCGW-YOLOv8n-seg model is characterized by high accuracy, speed, robustness, and generalization ability. The effectiveness of each improvement was verified by ablation experiments. Finally, the DCGW-YOLOv8n-seg model was applied to an instance segmentation experiment for meal-assisting robotics. The results show that the DCGW-YOLOv8n-seg model better realizes instance segmentation of faces and mouth-opening degrees. 
The novel method proposed can provide a guiding theoretical basis for meal-assisting robotics in food delivery safety and can provide a reference value for computer vision and image instance segmentation. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF
