13,027 results for "Object Recognition"
Search Results
2. EventBind: Learning a Unified Representation to Bind Them All for Event-Based Open-World Understanding
- Author
-
Zhou, Jiazhou, Zheng, Xu, Lyu, Yuanhuiyi, Wang, Lin, Goos, Gerhard, Series Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Leonardis, Aleš, editor, Ricci, Elisa, editor, Roth, Stefan, editor, Russakovsky, Olga, editor, Sattler, Torsten, editor, and Varol, Gül, editor
- Published
- 2025
- Full Text
- View/download PDF
3. DIM: Dynamic Integration of Multimodal Entity Linking with Large Language Model
- Author
-
Song, Shezheng, Li, Shasha, Yu, Jie, Zhao, Shan, Li, Xiaopeng, Ma, Jun, Liu, Xiaodong, Li, Zhuo, Mao, Xiaoguang, Goos, Gerhard, Series Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Lin, Zhouchen, editor, Cheng, Ming-Ming, editor, He, Ran, editor, Ubul, Kurban, editor, Silamu, Wushouer, editor, Zha, Hongbin, editor, Zhou, Jie, editor, and Liu, Cheng-Lin, editor
- Published
- 2025
- Full Text
- View/download PDF
4. Smart Chapeau for Visually Impaired Person
- Author
-
Choudhari, Yash, Kudale, Ankush, Singh, Chandrani, Kacprzyk, Janusz, Series Editor, Gomide, Fernando, Advisory Editor, Kaynak, Okyay, Advisory Editor, Liu, Derong, Advisory Editor, Pedrycz, Witold, Advisory Editor, Polycarpou, Marios M., Advisory Editor, Rudas, Imre J., Advisory Editor, Wang, Jun, Advisory Editor, Lin, Frank, editor, Pastor, David, editor, Kesswani, Nishtha, editor, Patel, Ashok, editor, Bordoloi, Sushanta, editor, and Koley, Chaitali, editor
- Published
- 2025
- Full Text
- View/download PDF
5. Sex differences in the context dependency of episodic memory.
- Author
-
Le, Aliza, Palmer, Linda, Chavez, Jasmine, Gall, Christine, and Lynch, Gary
- Subjects
behavior, context, episodic memory, female, mouse, object recognition, sex differences, unsupervised learning - Abstract
Context contributes to multiple aspects of human episodic memory including segmentation and retrieval. The present studies tested if, in adult male and female mice, context influences the encoding of odors encountered in a single unsupervised sampling session of the type used for the routine acquisition of episodic memories. The three paradigms used differed in complexity (single vs. multiple odor cues) and period from sampling to testing. Results show that males consistently encode odors in a context-dependent manner: the mice discriminated novel from previously sampled cues when tested in the chamber of initial cue sampling but not in a distinct yet familiar chamber. This was independent of the interval between cue encounters or the latency from initial sampling to testing. In contrast, female mice acquired both single cues and the elements of multi-cue episodes, but recall of that information was dependent upon the surrounding context only when the cues were presented serially. These results extend the list of episodic memory features expressed by rodents and also introduce a striking and unexpected sex difference in context effects.
- Published
- 2024
6. The novel estrogen receptor beta agonist EGX358 and APOE genotype influence memory, vasomotor, and anxiety outcomes in an Alzheimer's mouse model.
- Author
-
Schwabe, M. R., Fleischer, A. W., Kuehn, R. K., Chaudhury, S., York, J. M., Sem, D. S., Donaldson, W. A., LaDu, M. J., and Frick, K. M.
- Abstract
Introduction: Alzheimer's disease (AD) prevalence and severity are associated with increased age, female sex, and apolipoprotein E4 (APOE4) genotype. Although estrogen therapy (ET) effectively reduces symptoms of menopause including hot flashes and anxiety, and can reduce dementia risk, it is associated with increased risks of breast and uterine cancer due to estrogen receptor alpha (ERα)-mediated increases in cancer cell proliferation. Because ERβ activation reduces this cell proliferation, selective targeting of ERβ may provide a safer method of improving memory and reducing hot flashes in menopausal women, including those with AD. APOE genotype influences the response to ET, although it is unknown whether effects of ERβ activation vary by genotype. Methods: Here, we tested the ability of long-term oral treatment with a novel highly selective ERβ agonist, EGX358, to enhance object recognition and spatial recognition memory, reduce drug-induced hot flashes, and influence anxiety-like behaviors in female mice expressing 5 familial AD mutations (5xFAD-Tg) and human APOE3 (E3FAD) or APOE3 and APOE4 (E3/4FAD). Mice were ovariectomized at 5 months of age and were then treated orally with vehicle (DMSO) or EGX358 (10 mg/kg/day) via hydrogel for 8 weeks. Spatial and object recognition memory were tested in object placement (OP) and object recognition (OR) tasks, respectively, and anxiety-like behaviors were tested in the open field (OF) and elevated plus maze (EPM). Hot flash-like symptoms (change in tail skin temperature) were measured following injection of the neurokinin receptor agonist senktide (0.5 mg/kg). Results: EGX358 enhanced object recognition memory in E3FAD and E3/4FAD mice but did not affect spatial recognition memory. EGX358 also reduced senktide-induced tail temperature elevations in E3FAD, but not E3/4FAD, females. EGX358 did not influence anxiety-like behaviors or body weight. 
Discussion: These data indicate that highly selective ERβ agonism can facilitate object recognition memory in both APOE3 homozygotes and APOE3/4 heterozygotes, but only reduce the magnitude of a drug-induced hot flash in APOE3 homozygotes, suggesting that APOE4 genotype may blunt the beneficial effects of ET on hot flashes. Collectively, these data suggest a potentially beneficial effect of selective ERβ agonism for memory and hot flashes in females with AD-like pathology, but that APOE genotype plays an important role in responsiveness. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
7. The Number of Pictograms About Side Effects on the Medication Package Influences Medication Risk Perception.
- Author
-
Laasner Vogt, Lea, Reijnen, Ester, Kühne, Swen J., and Sulser, Marc
- Subjects
- *
RISK perception, *PICTURE-writing, *RECOGNITION (Psychology) - Abstract
Introduction and aim: Pictograms can make taking medication safer. However, little is known about how pictograms on a medication package influence the subjective assessment of a medication. Methods: In this online study, 276 participants were presented with a fictitious package that contained 0 to 5 pictograms of possible side effects. Participants had to assess the probability of side effects occurring as well as the benefits and harms of the medication, both before and after consulting the package insert. Results: The number of pictograms (leveling out at 2 pictograms) influenced the assessment of the probability of side effects occurring. In addition, the assessment of this measure served as an anchor for assessing all subsequent measures (e.g., benefit). Although participants adjusted their measures after consulting the package insert, these adjustments were insufficient (as expected from a normative probability account). Discussion and conclusion: Pictograms influence medication assessment, and humans can process only a limited number of pictograms. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
8. Mobile Augmented Reality Interface for Instruction-based Disaster Preparedness Guidelines.
- Author
-
De León Aguilar, Sergio, Matsuda, Yuki, and Yasumoto, Keiichi
- Subjects
EMERGENCY management, COLLEGE curriculum, INFORMATION science, OBJECT recognition (Computer vision), TELECOMMUNICATION, EARTHQUAKE resistant design, AUGMENTED reality - Abstract
The article offers an examination of augmented reality (AR)-assisted disaster preparedness guidelines designed to improve public awareness and engagement. Topics include the use of AR to enhance disaster preparedness by incorporating object recognition for environmental hazard identification, a comparison of AR-based guidelines with traditional paper-based ones in terms of usability and task performance, and the findings from testing these interfaces across different age groups.
- Published
- 2024
- Full Text
- View/download PDF
9. Effects of 5-ion 6-beam sequential irradiation in the presence and absence of hindlimb or control hindlimb unloading on behavioral performances and plasma metabolic pathways of Fischer 344 rats.
- Author
-
Raber, Jacob, Chaudhari, Mitali, De la Torre, Alexis, Holden, Sarah, Kessler, Kat, Glaeser, Breanna, Lenarczyk, Marek, Leonard, Scott Willem, Borg, Alexander, Kwok, Andy, Patel, Chirayu, Kronenberg, Amy, Olsen, Christopher M., Willey, Jeffrey S., Morré, Jeffrey, Choi, Jaewoo, Stevens, Jan Frederik, Bobe, Gerd, Minnier, Jessica, and Baker, John
- Abstract
Introduction: Effects and interactions between different spaceflight stressors are expected to be experienced by crew on missions when exposed to microgravity and galactic cosmic rays (GCRs). One of the limitations of previous studies on simulated weightlessness using hindlimb unloading (HU) is that a control HU condition was not included. Methods: We characterized the behavioral performance of male Fischer rats 2 months after sham or total body irradiation with a simplified 5-ion 6-mixed-beam exposure representative of GCRs in the absence or presence of HU. Six months later, the plasma, hippocampus, and cortex were processed to determine whether the behavioral effects were associated with long-term alterations in the metabolic pathways. Results: In the open field without and with objects, interactions were observed for radiation × HU. In the plasma of animals that were not under the HU or control HU condition, the riboflavin metabolic pathway was affected most for sham irradiation vs. 0.75 Gy exposure. Analysis of the effects of control HU on plasma in the sham-irradiated animals showed that the alanine, aspartate, glutamate, riboflavin, and glutamine metabolisms as well as arginine biosynthesis were affected. The effects of control HU on the hippocampus in the sham-irradiated animals showed that the phenylalanine, tyrosine, and tryptophan pathway was affected the most. Analysis of effects of 0.75 Gy irradiation on the cortex of control HU animals showed that the glutamine and glutamate metabolic pathway was affected similar to the hippocampus, while the riboflavin pathway was affected in animals that were not under the control HU condition. The effects of control HU on the cortex in sham-irradiated animals showed that the riboflavin metabolic pathway was affected. Animals receiving 0.75 Gy of irradiation showed impaired glutamine and glutamate metabolic pathway, whereas animals receiving 1.5 Gy of irradiation showed impaired riboflavin metabolic pathways. 
A total of 21 plasma metabolites were correlated with the behavioral measures, indicating that plasma and brain biomarkers associated with behavioral performance are dependent on the environmental conditions experienced. Discussion: Phenylalanine, tyrosine, and tryptophan metabolism as well as phenylalanine and tryptophan as plasma metabolites are biomarkers that can be considered for spaceflight as they were revealed in both Fischer and WAG/Rij rats exposed to simGCRsim and/or HU. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
10. Vision-based robotic peg-in-hole research: integrating object recognition, positioning, and reinforcement learning.
- Author
-
Chen, Chengjun, Wang, Hao, Pan, Yong, and Li, Dongnian
- Subjects
- *
RECOGNITION (Psychology), *INSPECTION & review, *ROBOTICS, *ROBOTS, *PIXELS - Abstract
The peg-in-hole task is important in robotics, and visual inspection is a crucial method for recognition and positioning during this task. Current vision-based peg-in-hole techniques suffer from limited applicability and low positioning accuracy. Therefore, this study introduces a general vision-based approach for robotic peg-in-hole tasks that divides the process into two stages. First, during the object recognition, positioning, and approach phases, a coarse adjustment technique for the assembly pose of the robot's end effector is proposed based on object recognition. This method determines the pose of the hole to be assembled through ellipse fitting, thereby guiding the robot to approach the hole. Second, a Q-learning-based method is introduced to finely adjust the end effector's pose and position. Q-learning is applied to small-scale adjustment of the robotic peg-in-hole, with a reward function designed from the pixel area of the gap and the included angle between the central axes of the peg and hole. Finally, the feasibility and efficacy of this method are substantiated through a series of assembly experiments. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
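The abstract above describes a Q-learning stage whose reward is built from the pixel area of the peg-hole gap and the included angle between the peg and hole axes. A minimal tabular sketch of that idea follows; the state/action encodings, learning constants, and reward weights are illustrative assumptions, not taken from the paper.

```python
import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.2
ACTIONS = ["+x", "-x", "+y", "-y", "+tilt", "-tilt"]  # small end-effector moves

Q = defaultdict(float)  # Q[(state, action)] -> value estimate

def reward(gap_pixel_area, axis_angle_deg):
    # Smaller gap area and smaller peg/hole axis angle mean higher reward;
    # the 10.0 weighting is a hypothetical choice.
    return -(gap_pixel_area + 10.0 * axis_angle_deg)

def choose_action(state):
    # Epsilon-greedy exploration over the discrete adjustment actions.
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def update(state, action, r, next_state):
    # Standard one-step Q-learning backup.
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += ALPHA * (r + GAMMA * best_next - Q[(state, action)])
```

In use, each interaction step would observe the gap and angle from the vision system, call `choose_action`, execute the move, and then call `update` with the resulting reward.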
11. Swin‐fisheye: Object detection for fisheye images.
- Author
-
Zhang, Dawei, Yang, Tingting, and Zhao, Bokai
- Subjects
- *
OBJECT recognition (Computer vision), *TRANSFORMER models, *COMPUTER vision, *CAMERAS, *ALGORITHMS - Abstract
Fisheye cameras have been widely used in autonomous navigation, visual surveillance, and automatic driving. Due to severe geometric distortion, fisheye images cannot be processed effectively by conventional methods: existing object detection algorithms struggle to detect small targets and heavily distorted objects in fisheye images, and the size and scene variety of available fisheye datasets (such as WoodScape and VOC‐360) are insufficient for training robust network models. Herein, the authors propose Swin‐Fisheye, an end‐to‐end object detection algorithm based on Swin Transformer. A feature pyramid module based on deformable convolution (DFPM) is designed to obtain richer contextual information from the multi‐scale feature maps. In addition, a projection transformation algorithm (PTA) is proposed, which converts rectilinear images into fisheye images more accurately and is used to create a fisheye image dataset (COCO‐Fish). The results of extensive experiments conducted on VOC‐360, WoodScape, and COCO‐Fish demonstrate that the proposed algorithm achieves satisfactory results compared with state‐of‐the‐art methods. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
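The abstract mentions a projection transformation algorithm (PTA) that converts rectilinear images into fisheye images. The paper's exact mapping is not given; as an illustrative sketch only, the common equidistant fisheye model maps a view angle theta to image radius r = f * theta, whereas a rectilinear lens maps it to r = f * tan(theta), so one radial coordinate converts as:

```python
import numpy as np

# Assumed equidistant fisheye model; this is background, not the paper's PTA.
def rectilinear_to_fisheye_radius(r_rect: float, f: float) -> float:
    theta = np.arctan(r_rect / f)  # view angle recovered from rectilinear radius
    return f * theta               # equidistant fisheye radius for that angle

# A point at rectilinear radius f (45 degrees off-axis) lands closer to the
# image center under the fisheye mapping:
print(rectilinear_to_fisheye_radius(100.0, 100.0))  # 100 * pi/4, about 78.54
```

Applying such a mapping per pixel (with interpolation) is what turns a dataset like COCO into a synthetic fisheye dataset such as the COCO‐Fish described above.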
12. Tailhook Recognition for Carrier-Based Aircraft Based on YOLO with Bi-Level Routing Attention.
- Author
-
Lu, Aiguo, Liu, Pandi, Yang, Jie, Li, Zhe, and Wang, Ke
- Subjects
- *
OBJECT recognition (Computer vision), *LEARNING ability, *MODEL airplanes, *ROUTING algorithms - Abstract
To address the problems of missed and false detections caused by target occlusion and lighting variations, this paper proposes a recognition model based on YOLOv5 with bi-level routing attention to achieve precise real-time small object recognition, using the problem of tailhook recognition for carrier-based aircraft as a representative application. Firstly, a module called D_C3, which combines deformable convolution, was integrated into the backbone network to enhance the model's learning ability and adaptability in specific scenes. Secondly, a bi-level routing attention mechanism was employed to dynamically focus on the regions of the feature map that are more likely to contain the target, leading to more accurate target localization and classification. Additionally, the loss function was optimized to accelerate the bounding box regression process. The experimental results on the self-constructed CATHR-DET and the public VOC dataset demonstrate that the proposed method outperforms the baselines in overall performance. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
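The loss-function optimization mentioned above targets bounding-box regression; IoU-based losses of this family all build on the plain box IoU. A minimal reference computation (boxes as (x1, y1, x2, y2); this is general background, not the paper's specific loss):

```python
def box_iou(a, b):
    # Intersection rectangle, clamped to zero when the boxes do not overlap.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    iw, ih = max(0.0, ix2 - ix1), max(0.0, iy2 - iy1)
    inter = iw * ih
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

print(box_iou((0, 0, 2, 2), (1, 1, 3, 3)))  # intersection 1, union 7 -> 1/7
```

Variants such as CIoU add penalty terms (center distance, aspect ratio) on top of this quantity to speed up regression.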
13. Sequential encoding aids working memory for meaningful objects' identities but not for their colors.
- Author
-
Chung, Yong Hoon, Brady, Timothy F., and Störmer, Viola S.
- Subjects
- *
PROMPTS (Psychology), *TASK performance, *COLOR vision, *DESCRIPTIVE statistics, *MEMORY, *SHORT-term memory, *VISUAL perception, *COMPARATIVE studies, *SEMANTIC memory - Abstract
Previous studies have found that real-world objects' identities are better remembered than simple features like colored circles, and this effect is particularly pronounced when these stimuli are encoded one by one in a serial, item-based way. Recent work has also demonstrated that memory for simple features like color is improved if these colors are part of real-world objects, suggesting that meaningful objects can serve as a robust memory scaffold for their associated low-level features. However, it is unclear whether the improved color memory that arises from the colors appearing on real-world objects is affected by encoding format, in particular whether items are encoded sequentially or simultaneously. We test this using randomly colored silhouettes of recognizable versus unrecognizable scrambled objects that offer a uniquely controlled set of stimuli to test color working memory of meaningful versus non-meaningful objects. Participants were presented with four stimuli (silhouettes of objects or scrambled shapes) simultaneously or sequentially. After a short delay, they reported either which colors or which shapes they saw in a two-alternative forced-choice task. We replicated previous findings that meaningful stimuli boost working memory performance for colors (Exp. 1). We found that when participants remembered the colors (Exp. 2) there was no difference in performance across the two encoding formats. However, when participants remembered the shapes and thus identity of the objects (Exp. 3), sequential presentation resulted in better performance than simultaneous presentation. Overall, these results show that different encoding formats can flexibly impact visual working memory depending on what the memory-relevant feature is. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
14. Improvement of Bounding Box and Instance Segmentation Accuracy Using ResNet-152 FPN with Modulated Deformable ConvNets v2 Backbone-based Mask Scoring R-CNN.
- Author
-
Shanmugasundaram, Suresh and Palaniappan, Natarajan
- Subjects
- *
CONVOLUTIONAL neural networks, *OBJECT recognition (Computer vision), *NETWORK performance, *SPINE, *DEEP learning - Abstract
A challenging task is to enable a deep learning network to estimate the accuracy of its own predictions. The Intersection-over-Union (IoU) between the ground truth and the predicted instance mask determines mask quality, yet the classification score bears no fixed relationship to it. The goal here is to investigate this problem and learn the predicted instance mask's accuracy. The proposed network regresses the MaskIoU by comparing the predicted mask with the respective instance feature. The mask scoring strategy detects the mismatch between mask score and mask quality, then adjusts the parameters accordingly. A deformable convolutional network's performance is decided by its ability to adapt to an object's geometric variations. Through increased modeling power and stronger training, the reformulated Deformable ConvNets v2 (DCNv2) improves the network's ability to focus on pertinent image regions. The introduction of a modulation technique, which broadens the scope of deformation modeling, and the comprehensive integration of deformable convolution within the network enhance the modeling power. With the feature mimicking scheme of DCNv2, the network learns features that resemble the classification capability and object focus of region-based convolutional neural network (R-CNN) features, and this scheme guides training to efficiently exploit the enhanced modeling capability. The backbone of the proposed Mask Scoring R-CNN network is designed with ResNet-152 FPN and the DCNv2 network; the proposed network is also tested with ResNet-50 and ResNet-101 backbones. Instance segmentation and object detection on the COCO benchmark and the Cityscapes dataset are achieved with top accuracy and improved performance using the proposed network. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
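The MaskIoU quantity the network above regresses is the intersection-over-union between the predicted binary mask and the ground-truth instance mask. A minimal sketch of that computation (array shapes and names are illustrative):

```python
import numpy as np

def mask_iou(pred: np.ndarray, gt: np.ndarray) -> float:
    # Both inputs are binary masks of the same shape.
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return float(inter) / union if union else 1.0

pred = np.zeros((4, 4), dtype=bool); pred[0:2, 0:2] = True  # 4 px predicted
gt = np.zeros((4, 4), dtype=bool);   gt[0:2, 0:3] = True    # 6 px ground truth
print(mask_iou(pred, gt))  # intersection 4, union 6 -> 2/3
```

Because classification score alone does not track this value, mask scoring learns to predict it and uses it to re-rank instance masks.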
15. GAM-YOLOv8n: enhanced feature extraction and difficult example learning for site distribution box door status detection.
- Author
-
Zhao, Song, Cai, TaiWei, Peng, Bao, Zhang, Teng, and Zhou, XiaoBing
- Subjects
- *
OBJECT recognition (Computer vision), *BUILDING sites, *FEATURE extraction, *INFORMATION networks, *LEARNING ability - Abstract
The detection of distribution box doors on construction sites is particularly important for site safety management, but the size and posture of distribution boxes vary across scenarios, which still poses challenges. This article proposes an improved YOLOv8n method for detecting and recognizing the status of distribution box doors on construction sites. Firstly, a Global Attention Mechanism is introduced to reduce information dispersion and enhance global interaction representation, preserving the correlation between spatial and channel information to strengthen the network's feature extraction capability during detection. Secondly, to tackle class imbalance in distribution box door state detection, the Focal_EIoU detection box loss function replaces the CIoU loss function, improving the model's ability to learn from difficult samples. Lastly, the proposed method is evaluated on a dataset of distribution boxes of different shapes and sizes collected from various construction scenes. Experimental results demonstrate that the improved YOLOv8n algorithm achieves a mean average precision (mAP) of 82.1% at 66.7 frames per second, outperforming other classical object detection networks and the original network. The improved method provides an efficient and accurate solution for practical detection tasks on smart construction sites, with notable gains in feature extraction and in handling difficult samples. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
16. OV-DAR: Open-Vocabulary Object Detection and Attributes Recognition.
- Author
-
Chen, Keyan, Jiang, Xiaolong, Wang, Haochen, Yan, Cilin, Gao, Yan, Tang, Xu, Hu, Yao, and Xie, Weidi
- Subjects
- *
OBJECT recognition (Computer vision), *GENERALIZATION, *CLASSIFICATION, *FORECASTING, *ANCHORS - Abstract
In this paper, we endeavor to localize all potential objects in an image and infer their visual categories, attributes, and shapes, even in instances where certain objects have not been encompassed in the model's supervised training. This is similar to the challenge posed by open-vocabulary object detection and recognition. The proposed OV-DAR framework, in contrast to previous object detection and recognition frameworks, offers superior advantages and performance in terms of generalization, universality, and granularity expression. Specifically, OV-DAR disentangles the open-vocabulary object detection and recognition problem into two components: class-agnostic object proposal and open-vocabulary classification. It employs co-training to maintain a balance between the performance of these two components. For the former, we construct class-agnostic object proposal networks based on the anchor/query with the SAM foundation model, which demonstrates robust generalization in object proposing and masking. For the latter, we merge available object-centered category classification and attribute prediction data, take co-learning for efficient fine-tuning of CLIP, and subsequently augment the open-vocabulary capability on object-centered category/attribute prediction tasks using freely accessible online image–text pairs. To ensure the efficiency and accuracy of open-vocabulary classification, we devise a structure akin to Faster R-CNN and fully exploit the knowledge of object-centered CLIP for end-to-end multi-object open-vocabulary category and attribute prediction by knowledge distillation. We conduct comprehensive experiments on VAW, MS-COCO, LSA, and OVAD datasets. The results not only illustrate the complementarity of semantic category and attribute recognition for visual scene understanding but also underscore the generalization capability of OV-DAR in localizing, categorizing, attributing, and masking tasks and open-world scene perception. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
17. Video Analysis-Based Automatic Access Management for Nursing Rooms with Maternity Authentication.
- Author
-
조은솔, 강채은, 김서윤, 김연수, and 강지헌
- Abstract
This paper proposes a deep learning-based automatic access management system for nursing rooms using maternity authentication. The system aims to enhance the convenience and safety of pregnant women and caregivers accompanied by infants. It verifies pregnant women through a smartphone application and is designed to automatically open the nursing room door after detecting whether they are accompanied by an infant using object recognition and posture detection technologies. The deep learning-based object recognition and posture detection algorithms demonstrated high accuracy in real-world tests, ensuring safe access for both pregnant women and caregivers with infants. This system has been evaluated to significantly improve convenience and security compared to existing nursing room access management methods. Moreover, positive user feedback has confirmed the system's effectiveness and necessity. This system can be applied to public transportation and public facilities, enhancing accessibility for pregnant women and caregivers, while contributing to the spread of a culture of social consideration. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
18. Towards a Taxonomy Machine: A Training Set of 5.6 Million Arthropod Images.
- Author
-
Steinke, Dirk, Ratnasingham, Sujeevan, Agda, Jireh, Ait Boutou, Hamzah, Box, Isaiah C. H., Boyle, Mary, Chan, Dean, Feng, Corey, Lowe, Scott C., McKeown, Jaclyn T. A., McLeod, Joschka, Sanchez, Alan, Smith, Ian, Walker, Spencer, Wei, Catherine Y.-Y., and Hebert, Paul D. N.
- Abstract
The taxonomic identification of organisms from images is an active research area within the machine learning community. Current algorithms are very effective for object recognition and discrimination, but they require extensive training datasets to generate reliable assignments. This study releases 5.6 million images with representatives from 10 arthropod classes and 26 insect orders. All images were taken using a Keyence VHX-7000 Digital Microscope system with an automatic stage to permit high-resolution (4K) microphotography. Providing phenotypic data for 324,000 species derived from 48 countries, this release represents, by far, the largest dataset of standardized arthropod images. As such, this dataset is well suited for testing the efficacy of machine learning algorithms for identifying specimens into higher taxonomic categories. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
19. A Comparative Study on Detection and Recognition of Nonuniform License Plates.
- Author
-
Arshid, Mehak, Azam, Muhammad Raees, and Mahmood, Zahid
- Abstract
This paper presents a comparative study on license plate detection and recognition algorithms in unconstrained environments, which include varying illumination, nonstandard plate templates, and different English-language fonts. A prime objective of this study is to assess how well these models handle such challenges. These problems are common in developing countries like Pakistan, where diverse license plates, styles, and abrupt changes in illumination make license plate detection and recognition a challenging task. To analyze the license plate detection problem, Faster-RCNN and end-to-end (E2E) methods are implemented. For the license plate recognition task, deep neural network (DNN) and CA-CenterNet-based methods are compared. Detailed simulations were performed on the authors' own collected dataset of Pakistani license plates, which contains substantially different multi-styled license plates. Our study concludes that, for license plate detection, Faster-RCNN yields a detection accuracy of 98.35%, while the E2E method delivers 98.48%; the two detection algorithms yield a mean detection accuracy of 98.41%. For license plate recognition, the DNN-based method yields a recognition accuracy of 98.90%, while the CA-CenterNet-based method delivers a high accuracy of 98.96%. In addition, a detailed computational complexity comparison across various image resolutions revealed that E2E and CA-CenterNet are more efficient than their counterparts in the detection and recognition tasks, respectively. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
20. On-Chip Data Reduction and Object Detection for a Feature-Extractable CMOS Image Sensor †.
- Author
-
Morikaku, Yudai, Ujiie, Ryuichi, Morikawa, Daisuke, Shima, Hideki, Yoshida, Kota, and Okura, Shunsuke
- Subjects
RECOGNITION (Psychology), IMAGE recognition (Computer vision), FEATURE extraction, DATA reduction, INTERNET of things, CMOS image sensors - Abstract
In order to improve image recognition technologies in an IoT environment, we propose a data reduction scheme for a feature-extractable CMOS image sensor and present simulation results for object recognition using feature data. We evaluated the accuracy of the simulated feature data in object recognition based on YOLOX trained with a feature dataset. According to our simulation results, the obtained object recognition accuracy was 56.6% with the large-scale COCO dataset, even though the amount of data was reduced by 97.7% compared to conventional RGB color images. When the dataset was replaced with the RAISE RAW image dataset for more accurate simulation, the object recognition accuracy improved to 76.3%. Furthermore, the feature-extractable CMOS image sensor can switch its operation mode between RGB color image mode and feature data mode. When the trigger for switching from feature data mode to RGB color image mode was set to the detection of a large-sized person, the feature data achieved an accuracy of 93.5% with the COCO dataset. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
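As a quick arithmetic check of the figures quoted above, a 97.7% reduction relative to RGB means the feature stream carries 2.3% of the bytes; the 8-bit VGA frame size below is an illustrative assumption, not taken from the paper.

```python
width, height = 640, 480
rgb_bytes = width * height * 3           # 921,600 bytes per 8-bit RGB frame
feature_bytes = rgb_bytes * (1 - 0.977)  # after the 97.7% reduction
print(round(feature_bytes))  # 21197 bytes, about 2.3% of the RGB frame
```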
21. DEEP LEARNING DRIVEN REAL-TIME AIRSPACE MONITORING USING SATELLITE IMAGERY.
- Author
-
SINGH, ANIRUDH, KUMAR, SATYAM, and CHOUDHURY, DEEPJYOTI
- Subjects
AERIAL surveillance, REMOTE-sensing images, REMOTE sensing, DEEP learning, SUPPORT vector machines - Abstract
Detecting aircraft in remote sensing images poses a formidable challenge due to the diverse characteristics of aircraft (type, size, pose) and intricate backgrounds. Traditional algorithms encounter difficulties in manually extracting features from numerous candidate regions. This paper introduces an innovative aircraft detection approach that combines corner clustering with a diverse set of Deep Learning (DL) models. The proposed method involves two main stages: region proposal and classification. In the region proposal stage, initial candidate regions are generated using a mean-shift clustering algorithm applied to corners detected on binary images. Subsequently, a comprehensive set of classifiers, encompassing CNN, DenseNet, MobileNetV2, Inception v3, Random Forest (RF), ResNet50, ResNeXT, Support Vector Machine (SVM), VGG16, Xception, EfficientNet, and InceptionResNetv2, is employed for feature extraction and classification. The presented approach demonstrates superior accuracy and efficiency compared to conventional methods. By leveraging the autonomous learning capabilities of CNN and DL models on extensive datasets, the methodology generates a reduced yet high-quality set of candidate regions. Inspired by the detection methodology employed by image analysts, the approach adopts a coarse-to-fine strategy using CNN and DL models: the first CNN proposes coarse candidate regions, and the second identifies individual airplanes within these regions in finer detail. This framework results in a decreased number of candidate regions compared to the existing literature while extracting distinctive deep features. Experimental evaluations on Google Earth images validate the efficiency of the proposed method, underscoring its potential for practical applications in both civilian and military contexts. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
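The region-proposal stage above rests on mean-shift clustering of detected corner points. A minimal pure-Python sketch of flat-kernel mean shift on 2D corner coordinates (the bandwidth and the sample corners are illustrative, not taken from the paper):

```python
import math

def mean_shift_2d(points, bandwidth=10.0, iters=50, tol=1e-3):
    """Cluster 2D corner coordinates by iteratively shifting each point
    toward the mean of its neighbours within `bandwidth` (flat kernel)."""
    modes = [list(p) for p in points]
    for m in modes:
        for _ in range(iters):
            neigh = [p for p in points if math.dist(p, m) <= bandwidth]
            nx = sum(p[0] for p in neigh) / len(neigh)
            ny = sum(p[1] for p in neigh) / len(neigh)
            moved = math.dist((nx, ny), m)
            m[0], m[1] = nx, ny
            if moved < tol:
                break
    # merge converged modes that ended up close together
    clusters = []
    for m in modes:
        for c in clusters:
            if math.dist(m, c) <= bandwidth / 2:
                break
        else:
            clusters.append(tuple(m))
    return clusters

# two corner clusters -> two candidate regions
corners = [(10, 10), (12, 9), (11, 12), (80, 80), (82, 78)]
print(mean_shift_2d(corners, bandwidth=15.0))
```

Each resulting mode would then seed one candidate region to be passed to the classifier bank.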
22. FishFocusNet: An improved method based on YOLOv8 for underwater tropical fish identification.
- Author
-
Lu, Zhaoxuan, Zhu, Xiaolong, Guo, Haitao, Xie, Xingang, Chen, Xiangzi, and Quan, Xiangqian
- Subjects
- *
OBJECT recognition (Computer vision) , *RECOGNITION (Psychology) , *IDENTIFICATION of fishes , *CORAL reefs & islands , *CORALS , *MARINE biodiversity - Abstract
Accurately identifying tropical fish serves as a crucial indicator, offering an insight into the state of marine biodiversity and the condition of coral reef ecosystems. However, the current detection networks are prone to omission and misidentification due to occlusion between fish and the complex underwater environment. This paper proposes a modified approach named FishFocusNet, in which alterable kernel convolution modules, asymptotic feature pyramid network (AFPN), and Shape‐IoU are integrated into YOLOv8. To extract a more comprehensive set of fish features, AKConv modules with arbitrary kernel sizes are proposed to take the place of the conventional fixed‐shaped kernels in the backbone for downsampling. AFPN is adopted as the feature integration structure in the neck, which enhances feature fusion and adaptive spatial fusion between non‐adjacent layers. In the detector head, Shape‐IoU is employed to achieve precise localization of fish targets. The benefits of these modifications are demonstrated by ablation and comparative experiments. The experimental results show that the optimized approach obtained an mAP of 81.8% with 2.4 M parameters and 3.6 G FLOPs. Meanwhile, compared with more complicated models of similar scale, the proposed method can enhance recognition accuracy to 84.2% and significantly reduce computational costs. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
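The Shape‐IoU loss used in the detector head builds on the standard intersection-over-union between axis-aligned boxes. A minimal sketch of plain IoU (Shape‐IoU additionally weights the scale and shape of the boxes, which is omitted here):

```python
def iou(box_a, box_b):
    """Standard intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175 ≈ 0.143
```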
23. Development and Evaluation of an Electronic Travel Aid System for Improving Mobility of Individuals with Visual Impairments: A Trial Study in a University Building.
- Author
-
Kim, In-Ju and Quteineh, Heba H.
- Subjects
- *
INDOOR positioning systems , *AIDS to navigation , *MOBILE apps , *VISION disorders , *COLLEGE buildings - Abstract
This study developed a prototype for an electronic travel aids (ETAs) system named Navigation Assisting Vest (NaVest) using Bluetooth Low Energy (BLE) beacons and a smartphone application to address the challenges faced by blind or visually impaired individuals (BVIP) whilst navigating unfamiliar indoor environments. The NaVest employs ultrasonic sensors for obstacle avoidance and indoor navigation functions. Testing and evaluation of the prototype were conducted with 12 BVIP and blindfolded participants in a local university building in the United Arab Emirates. The developed prototype improved the efficiency and safety of navigation tasks, and participants were overall satisfied with the system. Future research should focus on developing ETA systems that combine several functions to help BVIP travel more independently in indoor or outdoor environments. Additionally, ETA systems with multi-language support should be considered to improve accessibility for BVIP. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
24. Method for Noise Reduction by Averaging the Filtering Results on Circular Displacements Using Wavelet Transform and Local Binary Pattern.
- Author
-
Ciotirnae, Petrica, Dumitrescu, Catalin, Chiva, Ionut Cosmin, Semenescu, Augustin, Popovici, Eduard Cristian, and Dranga, Diana
- Subjects
NOISE control, ALGORITHMS, LIGHTING, WAVELET transforms - Abstract
Algorithms for noise reduction that use the translation invariant wavelet transform indirectly are spatially selective filtering algorithms in the wavelet domain. These algorithms use the undecimated wavelet transform to accurately determine the coefficients corresponding to the contours in the images, which are processed differently from the other wavelet coefficients. The use of the undecimated wavelet transform in image noise reduction applications leads not only to an improvement in terms of Mean Square Error (MSE), but also in terms of the content quality of the processed images. Noise reduction by truncation of wavelet coefficients introduces artifacts, especially near singularities, due to pseudo-Gibbs phenomena. These artifacts, which appear locally, are troublesome in object recognition applications on images acquired under nonuniform illumination and low contrast. In this work, we propose a feature extraction method based on the undecimated wavelet transform (UWT) and local binary patterns (LBP). Experiments on images acquired from drones in adverse conditions show promising accuracy. The authors show that the displacement-invariant wavelet transform is a very effective method for compression and noise reduction in signals. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
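The LBP half of the proposed extractor assigns each pixel an 8-bit code by thresholding its 3x3 neighbourhood against the centre value. A minimal sketch (the clockwise sampling order used here is one common convention, not necessarily the paper's):

```python
def lbp_code(patch):
    """8-bit local binary pattern code of a 3x3 patch: threshold the 8
    neighbours against the centre pixel, clockwise from the top-left."""
    c = patch[1][1]
    order = [(0, 0), (0, 1), (0, 2), (1, 2),
             (2, 2), (2, 1), (2, 0), (1, 0)]
    code = 0
    for bit, (r, col) in enumerate(order):
        if patch[r][col] >= c:
            code |= 1 << bit
    return code

patch = [[6, 5, 2],
         [7, 6, 1],
         [9, 8, 7]]
print(lbp_code(patch))  # → 241
```

A histogram of these codes over an image region is the usual LBP texture descriptor.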
25. Distinct but related abilities for visual and haptic object recognition.
- Author
-
Chow, Jason K., Palmeri, Thomas J., and Gauthier, Isabel
- Subjects
- *
RECOGNITION (Psychology) , *CONFIRMATORY factor analysis , *OPTICAL information processing , *CEREBRAL cortex , *LATENT variables - Abstract
People vary in their ability to recognize objects visually. Individual differences in matching and recognizing objects visually are supported by a domain-general ability capturing common variance across different tasks (e.g., Richler et al., Psychological Review, 126, 226–251, 2019). Behavioral (e.g., Cooke et al., Neuropsychologia, 45, 484–495, 2007) and neural evidence (e.g., Amedi, Cerebral Cortex, 12, 1202–1212, 2002) suggest overlapping mechanisms in the processing of visual and haptic information in the service of object recognition, but it is unclear whether such group-average results generalize to individual differences. Psychometrically validated measures are required, which have been lacking in the haptic modality. We investigate whether object recognition ability is specific to vision or extends to haptics using psychometric measures we have developed. We use multiple visual and haptic tests with different objects and different formats to measure domain-general visual and haptic abilities and to test for relations across them. We measured object recognition abilities using two visual tests and four haptic tests (two each for two kinds of haptic exploration) in 97 participants. Partial correlation and confirmatory factor analyses converge to support the existence of a domain-general haptic object recognition ability that is moderately correlated with domain-general visual object recognition ability. Visual and haptic abilities share about 25% of their variance, supporting the existence of a multisensory domain-general ability while leaving a substantial amount of residual variance for modality-specific abilities. These results extend our understanding of the structure of object recognition abilities; while there are mechanisms that may generalize across categories, tasks, and modalities, there are still other mechanisms that are distinct between modalities. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
26. Using 3D Hand Pose Data in Recognizing Human–Object Interaction and User Identification for Extended Reality Systems.
- Author
-
Hamid, Danish, Ul Haq, Muhammad Ehatisham, Yasin, Amanullah, Murtaza, Fiza, and Azam, Muhammad Awais
- Subjects
- *
RECOGNITION (Psychology) , *JOINTS (Anatomy) , *FEATURE extraction , *DEEP learning , *VIRTUAL reality , *POSE estimation (Computer vision) - Abstract
Object detection and action/gesture recognition have become imperative in security and surveillance fields, finding extensive applications in everyday life. Advancement in such technologies will help further cybersecurity and extended reality systems through the accurate identification of users and their interactions, which plays a pivotal role in the security management of an entity and in providing an immersive experience. Essentially, it enables the identification of human–object interaction to track actions and behaviors along with user identification. Yet traditional camera-based methods struggle with this task, since occlusion, differing camera viewpoints, and background noise lead to significant appearance variation. Deep learning techniques also demand large labeled datasets and substantial computational power. In this paper, a novel approach to the recognition of human–object interactions and the identification of interacting users is proposed, based on three-dimensional hand pose data from an egocentric camera view. A multistage approach that integrates object detection with interaction recognition and user identification using the data from hand joints and vertices is proposed. Our approach uses a statistical attribute-based model for feature extraction and representation. The proposed technique is tested on the HOI4D dataset using the XGBoost classifier, achieving an average F1-score of 81% for human–object interaction and an average F1-score of 80% for user identification, hence proving to be effective. This technique is mostly targeted at extended reality systems, as proper interaction recognition and user identification are key to keeping systems secure and personalized. Its relevance extends into cybersecurity, augmented reality, virtual reality, and human–robot interactions, offering a potent solution for security enhancement along with enhanced interactivity in such systems. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
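A statistical attribute-based representation of hand-joint data could look like the following sketch, which summarises each coordinate axis of the joints with simple statistics before passing the vector to a classifier. The five attributes chosen here are illustrative; the paper's exact attribute set is not reproduced:

```python
import statistics

def joint_features(joints):
    """Per-axis statistical attributes (mean, stdev, min, max, range)
    over a list of (x, y, z) hand-joint positions -- a hypothetical
    stand-in for an attribute-based feature representation."""
    feats = []
    for axis in range(3):
        vals = [j[axis] for j in joints]
        feats += [statistics.mean(vals), statistics.pstdev(vals),
                  min(vals), max(vals), max(vals) - min(vals)]
    return feats

joints = [(0.1, 0.2, 0.3), (0.4, 0.1, 0.5), (0.2, 0.3, 0.4)]
vec = joint_features(joints)
print(len(vec))  # → 15 (5 statistics x 3 axes)
```

Such a fixed-length vector per frame (or per joint group) is the kind of input a gradient-boosted classifier like XGBoost expects.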
27. Semantic object processing is modulated by prior scene context.
- Author
-
Krugliak, Alexandra, Draschkow, Dejan, Võ, Melissa L.-H., and Clarke, Alex
- Subjects
- *
BRAIN physiology , *COMPUTER simulation , *PROMPTS (Psychology) , *RESEARCH funding , *ELECTROENCEPHALOGRAPHY , *SEMANTICS , *VISUAL perception - Abstract
Objects that are congruent with a scene are recognised more efficiently than objects that are incongruent. Further, semantic integration of incongruent objects elicits a stronger N300/N400 EEG component. Yet, the time course and mechanisms of how contextual information supports access to semantic object information is unclear. We used computational modelling and EEG to test how context influences semantic object processing. Using representational similarity analysis, we established that EEG patterns dissociated between objects in congruent or incongruent scenes from around 300 ms. By modelling the semantic processing of objects using independently normed properties, we confirm that the onset of semantic processing of both congruent and incongruent objects is similar (∼150 ms). Critically, after ∼275 ms, we discover a difference in the duration of semantic integration, lasting longer for incongruent compared to congruent objects. These results constrain our understanding of how contextual information supports access to semantic object information. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
28. Timed picture naming norms for 800 photographs of 200 objects in English.
- Author
-
van Hoef, Rens, Lynott, Dermot, and Connell, Louise
- Subjects
- *
WORD frequency , *RECOGNITION (Psychology) , *STIMULUS & response (Psychology) , *PHOTOGRAPHS , *PICTURES - Abstract
The present study presents picture-naming norms for a large set of 800 high-quality photographs of 200 natural objects and artefacts spanning a range of categories, with four unique images per object. Participants were asked to provide a single, most appropriate name for each image seen. We report recognition latencies for each image, and several normed variables for the provided names: agreement, H-statistic (i.e. level of naming uncertainty), Zipf word frequency and word length. Rather than simply focusing on a single name per image (i.e. the modal or most common name), analysis of recognition latencies showed that it is important to consider the diversity of labels that participants may ascribe to each pictured object. The norms therefore provide a list of candidate labels per image with weighted measures of word length and frequency per image that incorporate all provided names, as well as modal measures based on the most common name only. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
29. Measuring object recognition ability: Reliability, validity, and the aggregate z-score approach.
- Author
-
Smithson, Conor J. R., Chow, Jason K., Chang, Ting-Yun, and Gauthier, Isabel
- Subjects
- *
RECOGNITION (Psychology) , *STRUCTURAL equation modeling , *LATENT variables , *EXPERIMENTAL psychology , *INDIVIDUAL differences - Abstract
Measurement of domain-general object recognition ability (o) requires minimization of domain-specific variance. One approach is to model o as a latent variable explaining performance on a battery of tests which differ in task demands and stimuli; however, time and sample requirements may be prohibitive. Alternatively, an aggregate measure of o can be obtained by averaging z-scores across tests. Using data from Sunday et al., Journal of Experimental Psychology: General, 151, 676–694, (2022), we demonstrated that aggregate scores from just two such object recognition tests provide a good approximation (r =.79) of factor scores calculated from a model using a much larger set of tests. Some test combinations produced correlations of up to r =.87 with factor scores. We then revised these tests to reduce testing time, and developed an odd one out task, using a unique object category on nearly every trial, to increase task and stimuli diversity. To validate our measures, 163 participants completed the object recognition tests on two occasions, one month apart. Providing the first evidence that o is stable over time, our short aggregate o measure demonstrated good test–retest reliability (r =.77). The stability of o could not be completely accounted for by intelligence, perceptual speed, and early visual ability. Structural equation modeling suggested that our tests load significantly onto the same latent variable, and revealed that as a latent variable, o is highly stable (r =.93). Aggregation is an efficient method for estimating o, allowing investigation of individual differences in object recognition ability to be more accessible in future studies. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
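The aggregate z-score approach can be sketched as: standardise each test's scores across participants, then average each participant's z-scores into a single o estimate (toy scores, population standard deviation assumed):

```python
import statistics

def aggregate_o(scores_by_test):
    """Aggregate object-recognition estimate: z-score each test across
    participants, then average each participant's z-scores."""
    tests = list(scores_by_test.values())
    n = len(tests[0])
    zs = []
    for col in tests:
        mu, sd = statistics.mean(col), statistics.pstdev(col)
        zs.append([(x - mu) / sd for x in col])
    return [sum(z[i] for z in zs) / len(zs) for i in range(n)]

scores = {"matching": [55, 60, 65], "odd_one_out": [30, 40, 50]}
print(aggregate_o(scores))  # → [-1.2247..., 0.0, 1.2247...]
```

Averaging z-scores rather than raw scores keeps tests with different scales from dominating the aggregate.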
30. Saliency-guided meta-hallucinator for few-shot learning.
- Author
-
Zhang, Hongguang, Liu, Chun, Wang, Jiandong, Ma, Linru, Koniusz, Piotr, Torr, Philip H. S., and Yang, Lin
- Abstract
Learning novel object concepts from limited samples remains a considerable challenge in deep learning. The main directions for improving the few-shot learning models include (i) designing a stronger backbone, (ii) designing a powerful (dynamic) meta-classifier, and (iii) using a larger pre-training set obtained by generating or hallucinating additional samples from the small scale dataset. In this paper, we focus on item (iii) and present a novel meta-hallucination strategy. Presently, most image generators are based on a generative network (i.e., GAN) that generates new samples from the captured distribution of images. However, such networks require numerous annotated samples for training. In contrast, we propose a novel saliency-based end-to-end meta-hallucinator, where a saliency detector produces foregrounds and backgrounds of support images. Such images are fed into a two-stream network to hallucinate feature samples directly in the feature space by mixing foreground and background feature samples. Then, we propose several novel mixing strategies that improve the quality and diversity of hallucinated feature samples. Moreover, as not all saliency maps are meaningful or high quality, we further introduce a meta-hallucination controller that decides which foreground feature samples should participate in mixing with backgrounds. To our knowledge, we are the first to leverage saliency detection for few-shot learning. Our proposed network achieves state-of-the-art results on publicly available few-shot image classification and anomaly detection benchmarks, and outperforms competing sample mixing strategies such as the so-called Manifold Mixup. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
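The foreground/background feature mixing can be illustrated with a mixup-style convex combination of two feature vectors. This is a generic sketch only; the paper's specific mixing strategies and its meta-hallucination controller are not shown:

```python
import random

def mix_features(fg, bg, lam=None):
    """Hallucinate a new feature sample as a convex combination of a
    foreground and a background feature vector (mixup-style)."""
    if lam is None:
        lam = random.random()  # mixing coefficient in [0, 1)
    return [lam * f + (1 - lam) * b for f, b in zip(fg, bg)]

print(mix_features([1.0, 2.0], [3.0, 4.0], lam=0.5))  # → [2.0, 3.0]
```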
31. Vision signals and systems in a vehicle automation model [Sygnały i systemy wizyjne w modelu automatyzacji pojazdów].
- Author
-
BALCEREK, Julian, DĄBROWSKI, Adam, and PAWŁOWSKI, Paweł
- Subjects
IMAGE processing ,OBJECT recognition (Computer vision) ,AUTOMATION - Abstract
Copyright of Przegląd Elektrotechniczny is the property of Przeglad Elektrotechniczny and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
- Published
- 2024
- Full Text
- View/download PDF
32. Task offloading scheme in Mobile Augmented Reality using hybrid Monte Carlo tree search (HMCTS)
- Author
-
Anitha Jebamani Soundararaj and Godfrey Winster Sathianesan
- Subjects
Task offloading ,Mobile augmented reality ,Edge computing ,Monte Carlo Search ,Genetic algorithm ,Object recognition ,Engineering (General). Civil engineering (General) ,TA1-2040 - Abstract
Mobile Augmented Reality (MAR) applications enhance user experiences by providing realistic information about the current location through mobile devices. However, MAR applications are computationally intensive, leading to high energy consumption and latency issues. To address these challenges, this research presents a Hybrid Monte Carlo Tree Search (HMCTS) based task offloading scheme, combining a genetic algorithm with Monte Carlo tree search for efficient task management. The proposed method uses YoloV7 for object recognition and aims to reduce energy consumption, response time, and migration time. Experimental results demonstrate that the HMCTS approach significantly reduces energy consumption to 1290 kJ, response time to 24 ms, and migration time to 0.52 ms, outperforming existing techniques. These improvements highlight the potential of the HMCTS method for enhancing the performance of MAR applications. The proposed hybrid approach aims to improve the efficiency and effectiveness of task offloading in MAR applications: the HMCTS model dynamically offloads tasks to edge servers, optimizing scheduling time, response time, and energy consumption.
- Published
- 2024
- Full Text
- View/download PDF
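Monte Carlo tree search, the backbone of HMCTS, typically selects which child node to explore with the UCB1 rule, trading off the child's mean reward against how rarely it has been visited. A minimal sketch of that selection score (the exploration constant 1.4 is a common default, and the genetic-algorithm hybridisation of the paper is not shown):

```python
import math

def uct_score(child_value, child_visits, parent_visits, c=1.4):
    """UCB1 selection rule used by Monte Carlo tree search: exploit the
    mean reward so far, explore children with few visits."""
    if child_visits == 0:
        return float("inf")  # unvisited children are tried first
    exploit = child_value / child_visits
    explore = c * math.sqrt(math.log(parent_visits) / child_visits)
    return exploit + explore

print(uct_score(3.0, 5, 20))  # ≈ 1.684
```

In an offloading tree, each child might represent assigning the next task to a particular edge server, with reward derived from latency and energy measurements.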
33. Swin‐fisheye: Object detection for fisheye images
- Author
-
Dawei Zhang, Tingting Yang, and Bokai Zhao
- Subjects
computer vision ,object detection ,object recognition ,Photography ,TR1-1050 ,Computer software ,QA76.75-76.765 - Abstract
Abstract Fisheye cameras have been widely used in autonomous navigation, visual surveillance, and automatic driving. Due to severe geometric distortion, fisheye images cannot be processed effectively by conventional methods. Existing object detection algorithms struggle to detect small targets and heavily distorted objects in fisheye images. The size and scene diversity of available fisheye datasets (such as WoodScape and VOC‐360) are insufficient for training robust network models. Herein, the authors propose Swin‐Fisheye, an end‐to‐end object detection algorithm based on Swin Transformer. A feature pyramid module based on deformable convolution (DFPM) is designed to obtain richer contextual information from the multi‐scale feature maps. In addition, a projection transformation algorithm (PTA) is proposed, which can convert rectilinear images into fisheye images more accurately, and is then used to create a fisheye image dataset (COCO‐Fish). The results of extensive experiments conducted on VOC‐360, WoodScape, and COCO‐Fish demonstrate that the proposed algorithm achieves satisfactory results compared with state‐of‐the‐art methods.
- Published
- 2024
- Full Text
- View/download PDF
34. FishFocusNet: An improved method based on YOLOv8 for underwater tropical fish identification
- Author
-
Zhaoxuan Lu, Xiaolong Zhu, Haitao Guo, Xingang Xie, Xiangzi Chen, and Xiangqian Quan
- Subjects
convolutional neural nets ,object detection ,object recognition ,Photography ,TR1-1050 ,Computer software ,QA76.75-76.765 - Abstract
Abstract Accurately identifying tropical fish serves as a crucial indicator, offering an insight into the state of marine biodiversity and the condition of coral reef ecosystems. However, the current detection networks are prone to omission and misidentification due to occlusion between fish and the complex underwater environment. This paper proposes a modified approach named FishFocusNet, in which alterable kernel convolution modules, asymptotic feature pyramid network (AFPN), and Shape‐IoU are integrated into YOLOv8. To extract a more comprehensive set of fish features, AKConv modules with arbitrary kernel sizes are proposed to take the place of the conventional fixed‐shaped kernels in the backbone for downsampling. AFPN is adopted as the feature integration structure in the neck, which enhances feature fusion and adaptive spatial fusion between non‐adjacent layers. In the detector head, Shape‐IoU is employed to achieve precise localization of fish targets. The benefits of these modifications are demonstrated by ablation and comparative experiments. The experimental results show that the optimized approach obtained an mAP of 81.8% with 2.4 M parameters and 3.6 G FLOPs. Meanwhile, compared with more complicated models of similar scale, the proposed method can enhance recognition accuracy to 84.2% and significantly reduce computational costs.
- Published
- 2024
- Full Text
- View/download PDF
35. Development of a deep learning-based object recognition system for pre-stage separation to improve the recycling rate of major general-purpose plastics.
- Author
-
Lee, Hansol, Park, Youngjae, Kim, Kwanho, and Lee, Hoon
- Abstract
The generation of plastic waste has been increasing annually, necessitating various recycling methods. Among these, material recycling with low carbon emissions should be prioritized. This study aims to enhance the material recycling rate by developing a separation system using deep learning-based object recognition. To improve labeling efficiency, image data were acquired for each material type, and the performance of different labeling methods was evaluated. The average precision (AP) values for the single-material learning model using hybrid labeling (manual + automatic) were 0.947 for PET, 0.951 for PP, 0.892 for PE, and 0.896 for PS, demonstrating superior performance compared to other methods. An integrated learning model was also developed for composite materials, achieving AP values of 0.972 for PET, 0.976 for PP, 0.963 for PE, and 0.961 for PS. These results demonstrate the model's strong applicability for plastic waste recognition. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
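The AP values reported above are, in the usual object-detection sense, averages of precision taken at each true-positive rank in a score-sorted detection list. A minimal single-class sketch (1 marks a correct detection, 0 a false positive):

```python
def average_precision(ranked_labels):
    """AP for one class: mean of the precision values at each
    true-positive rank in a score-sorted detection list."""
    hits, precisions = 0, []
    for i, lab in enumerate(ranked_labels, start=1):
        if lab:
            hits += 1
            precisions.append(hits / i)  # precision at this recall point
    return sum(precisions) / hits if hits else 0.0

print(average_precision([1, 1, 0, 1]))  # (1/1 + 2/2 + 3/4) / 3 ≈ 0.917
```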
36. Eigenspectrum regularisation reverse neighbourhood discriminative learning
- Author
-
Ming Xie, Hengliang Tan, Jiao Du, Shuo Yang, Guofeng Yan, Wangwang Li, and Jianwei Feng
- Subjects
eigenvalues and eigenfunctions ,face recognition ,feature extraction ,image classification ,image recognition ,object recognition ,Computer applications to medicine. Medical informatics ,R858-859.7 ,Computer software ,QA76.75-76.765 - Abstract
Abstract Linear discriminant analysis is a classical method for dimensionality reduction and pattern classification. Although it has been extensively developed, it still suffers from several common problems, such as the Small Sample Size (SSS) problem and the multimodal problem. Neighbourhood linear discriminant analysis (nLDA) was recently proposed to solve the multimodal-class problem caused by violation of the assumption of independently and identically distributed samples. However, because many practical applications are small-scale, nLDA still has to face the SSS problem, which leads to instability and poor generalisation caused by the singularity of the within‐neighbourhood scatter matrix. The authors exploit eigenspectrum regularisation techniques to circumvent the singularity of the within‐neighbourhood scatter matrix of nLDA, in a method called Eigenspectrum Regularisation Reverse Neighbourhood Discriminative Learning (ERRNDL). The algorithm of nLDA is reformulated as a framework by searching for two projection matrices. Three eigenspectrum regularisation models are introduced to the framework to evaluate performance. Experiments are conducted on the University of California, Irvine machine learning repository and six image classification datasets. The proposed ERRNDL‐based methods achieve considerable performance.
- Published
- 2024
- Full Text
- View/download PDF
37. Neural network-assisted meta-router for fiber mode and polarization demultiplexing
- Author
-
Zhao Yu, Wang Huijiao, Huang Tian, Guan Zhiqiang, Li Zile, Yu Lei, Yu Shaohua, and Zheng Guoxing
- Subjects
metasurface ,deep learning ,object recognition ,space-division multiplexing ,Physics ,QC1-999 - Abstract
Advancements in computer science have propelled society into an era of data explosion, marked by a critical need for enhanced data transmission capacity, particularly in the realm of space-division multiplexing and demultiplexing devices for fiber communications. However, recently developed mode demultiplexers primarily focus on mode divisions within one dimension rather than multiple dimensions (i.e., intensity distributions and polarization states), which significantly limits their applicability in space-division multiplexing communications. In this context, we introduce a neural network-assisted meta-router to recognize intensity distributions and polarization states of optical fiber modes, achieved through a single layer of metasurface optimized via neural network techniques. Specifically, a four-mode meta-router is theoretically designed and experimentally characterized, which enables four modes, comprising two spatial modes with two polarization states, independently divided into distinct spatial regions, and successfully recognized by positions of corresponding spatial regions. Our framework provides a paradigm for fiber mode demultiplexing apparatus characterized by application compatibility, transmission capacity, and function scalability with ultra-simple design and ultra-compact device. Merging metasurfaces, neural network and mode routing, this proposed framework paves a practical pathway towards intelligent metasurface-aided optical interconnection, including applications such as fiber communication, object recognition and classification, as well as information display, processing, and encryption.
- Published
- 2024
- Full Text
- View/download PDF
38. Bioinspired Passive Tactile Sensors Enabled by Reversible Polarization of Conjugated Polymers
- Author
-
Feng He, Sitong Chen, Ruili Zhou, Hanyu Diao, Yangyang Han, and Xiaodong Wu
- Subjects
Passive tactile sensors ,Reversible polarization of conjugated polymers ,Tactile perception ,Machine learning algorithm ,Object recognition ,Technology - Abstract
Highlights: Fully organic and passive tactile sensors are developed by mimicking the sensing behavior of natural sensory cells. The controllable polarizability of conjugated polymers is adopted for the first time to construct passive tactile sensors. Machine learning-assisted surface texture detection, material property recognition, and shape/profile perception are realized with the tactile sensors.
- Published
- 2024
- Full Text
- View/download PDF
39. Flexible thin parts multi‐target positioning method of multi‐level feature fusion
- Author
-
Yaohua Deng, Xiali Liu, Kenan Yang, and Zehang Li
- Subjects
Gaussian processes ,image fusion ,image recognition ,object recognition ,Photography ,TR1-1050 ,Computer software ,QA76.75-76.765 - Abstract
Abstract In new energy battery manufacturing, machine vision is widely used in automated assembly scenarios for key parts. To improve the accuracy and real‐time performance of multi‐target positioning and recognition of flexible thin parts, this paper proposes a multi‐level feature fusion template matching algorithm based on the Gaussian pyramid. Firstly, the algorithm constructs a Gaussian pyramid through multi‐scale image construction. Secondly, considering the image features of each pyramid layer, the grey‐based Fast Normalized Matching algorithm is used to obtain coarse positioning coordinates on the upper layer, and the improved Linemod‐2D algorithm is applied to the bottom layer image to get accurate positioning coordinates. Finally, the positioning coordinates returned from each layer are fused to obtain the final positioning coordinate. The experimental results show that the proposed algorithm achieves excellent performance in nickel sheet positioning and recognition. In terms of angular error, repeat accuracy, and matching speed, it competes favourably with Halcon, VisionMaster, and SCISmart. Its positioning error closely approximates that of Halcon, effectively meeting the practical production demands for high‐speed feeding and high‐precision positioning.
- Published
- 2024
- Full Text
- View/download PDF
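Grey-based normalized matching, as used for the coarse stage above, scores each candidate window by zero-mean normalised cross-correlation against the template. A minimal sketch on flattened grey values (real implementations slide the window over the image and pick the best score):

```python
import statistics

def ncc(template, window):
    """Zero-mean normalised cross-correlation between a template and an
    equally sized image window (both given as flat lists of grey values).
    Returns a score in [-1, 1]; 1 means a perfect linear match."""
    mt, mw = statistics.mean(template), statistics.mean(window)
    num = sum((t - mt) * (w - mw) for t, w in zip(template, window))
    den = (sum((t - mt) ** 2 for t in template)
           * sum((w - mw) ** 2 for w in window)) ** 0.5
    return num / den

print(ncc([1, 2, 3, 4], [2, 4, 6, 8]))  # → 1.0 (perfect linear match)
```

Because the score is invariant to brightness offset and contrast scaling, it is robust to the illumination variation typical of assembly lines.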
40. Adaptive soft threshold transformer for radar high‐resolution range profile target recognition
- Author
-
Siyu Chen, Xiaohong Huang, and Weibo Xu
- Subjects
artificial intelligence ,neural nets ,noise ,object recognition ,radar ,radar signal processing ,Telecommunication ,TK5101-6720 - Abstract
Abstract Radar High‐Resolution Range Profile (HRRP) has great potential for target recognition because it can provide target structural information. Existing work commonly applies deep learning to extract deep features from HRRPs and achieve impressive recognition performance. However, most approaches are unable to distinguish between the target and non‐target regions in the feature extraction process and do not fully consider the impact of background noise, which is harmful to recognition, especially at low signal‐to‐noise ratios (SNR). To tackle these problems, the authors propose a radar HRRP target recognition framework termed Adaptive Soft Threshold Transformer (ASTT), which is composed of a patch embedding (PE) layer, ASTT blocks, and Discrete Wavelet Patch Merging (DWPM) layers. Given the limited semantic information of individual range cells, the PE layer integrates nearby isolated range cells into semantically explicit target structure patches. Thanks to its convolutional layer and attention mechanism, the ASTT blocks assign a weight to each patch to locate the target areas in the HRRP while capturing local features and constructing sequence correlations. Moreover, the ASTT block efficiently filters noise features in combination with a soft threshold function to further enhance the recognition performance at low SNR, where the threshold is adaptively determined. Utilising the reversibility of the discrete wavelet transform, the DWPM layer efficiently eliminates the loss of valuable information during the pooling process. Experiments based on simulated and measured datasets show that the proposed method has excellent target recognition performance, noise robustness, and small‐scale range shift robustness.
- Published
- 2024
- Full Text
- View/download PDF
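The soft threshold function at the core of the ASTT block shrinks small (presumed-noise) feature values to zero while reducing larger values by the threshold. A minimal sketch with a fixed threshold (the paper determines the threshold adaptively per feature):

```python
def soft_threshold(x, tau):
    """Soft-thresholding: shrink |x| by tau, zeroing values whose
    magnitude falls below the threshold -- the standard denoising
    nonlinearity that the ASTT block applies adaptively."""
    if x > tau:
        return x - tau
    if x < -tau:
        return x + tau
    return 0.0

print([soft_threshold(v, 1.0) for v in [-2.5, -0.4, 0.0, 0.7, 3.0]])
# → [-1.5, 0.0, 0.0, 0.0, 2.0]
```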
41. Gait‐based human recognition based on millimetre wave multiple input multiple output radar point cloud constructed using velocity‐depth‐time
- Author
-
Xianxian He, Yunhua Zhang, and Xiao Dong
- Subjects
gait analysis ,millimetre wave radar ,MIMO radar ,object recognition ,radar target recognition ,Telecommunication ,TK5101-6720 - Abstract
Abstract Gait recognition identifies individuals from subtle differences in their gait characteristics, which is more challenging than recognising human activities, where the differences between motions are comparatively large. Existing millimetre‐wave Multiple Input Multiple Output (MIMO) radar point cloud data contain time‐varying three‐dimensional spatial positions, velocity, and intensity information, and how to improve gait recognition accuracy by effectively utilising these data has become an attractive research topic in recent years. A velocity‐depth‐time (VDT) based point cloud construction method for millimetre‐wave MIMO radar is proposed for gait recognition, which not only alleviates the sparsity of mmWave point clouds but also makes the constructed point cloud exhibit the temporal structure of micro‐motions, thereby enabling the successful application of PointNet++ to mmWave‐MIMO point cloud gait recognition. New point clouds are constructed by the proposed method from public gait recognition datasets of 10 and 20 individuals, and gait recognition experiments are conducted with PointNet++. The results show that the VDT‐based point clouds are more conducive to the gait recognition task: even the classic PointNet++ model, which is not specially designed for radar point clouds, achieves high recognition accuracy. Recognition accuracy improves by 11% and 12% on the 10‐ and 20‐individual datasets, respectively, over the 84% and 80% achieved by the traditional method with the same datasets and the same PointNet++ model, and by 5% and 12% over the 90% and 80% achieved by the method of the original datasets' authors.
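As a rough illustration of the velocity-depth-time idea, per-frame detections can be stacked into a single cloud whose first three coordinates are (velocity, depth, time), so that temporal micro-motion structure becomes an explicit spatial axis. The tuple layout and field names below are assumptions for illustration, not the paper's exact construction:

```python
import numpy as np

def build_vdt_cloud(detections):
    """detections: list of (frame_index, depth_m, radial_velocity_mps, intensity).
    Returns an (N, 4) array of velocity-depth-time(-intensity) points.
    Stacking all frames into one cloud densifies the sparse per-frame mmWave
    returns while keeping micro-motion dynamics as an explicit coordinate."""
    pts = [(v, d, t, i) for (t, d, v, i) in detections]
    return np.asarray(pts, dtype=float)

# Two detections across two frames become one small 4-D cloud,
# ready for a point-set network such as PointNet++.
cloud = build_vdt_cloud([(0, 1.5, -0.2, 9.0), (1, 1.4, 0.1, 7.0)])
```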
- Published
- 2024
- Full Text
- View/download PDF
42. OmDet: Large‐scale vision‐language multi‐dataset pre‐training with multimodal detection network
- Author
-
Tiancheng Zhao, Peng Liu, and Kyusong Lee
- Subjects
computer vision ,object detection ,object recognition ,Computer applications to medicine. Medical informatics ,R858-859.7 ,Computer software ,QA76.75-76.765 - Abstract
Abstract The advancement of object detection (OD) in open‐vocabulary and open‐world scenarios is a critical challenge in computer vision. OmDet, a novel language‐aware object detection architecture and an innovative training mechanism that harnesses continual learning and multi‐dataset vision‐language pre‐training is introduced. Leveraging natural language as a universal knowledge representation, OmDet accumulates “visual vocabularies” from diverse datasets, unifying the task as a language‐conditioned detection framework. The multimodal detection network (MDN) overcomes the challenges of multi‐dataset joint training and generalizes to numerous training datasets without manual label taxonomy merging. The authors demonstrate superior performance of OmDet over strong baselines in object detection in the wild, open‐vocabulary detection, and phrase grounding, achieving state‐of‐the‐art results. Ablation studies reveal the impact of scaling the pre‐training visual vocabulary, indicating a promising direction for further expansion to larger datasets. The effectiveness of our deep fusion approach is underscored by its ability to learn jointly from multiple datasets, enhancing performance through knowledge sharing.
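The "language-conditioned detection" idea can be sketched in a few lines: class logits become similarities between detector region features and text embeddings of label names, so the label set can change at inference time without retraining a classification head. This is a generic open-vocabulary scoring sketch, not OmDet's actual fusion network:

```python
import numpy as np

def language_conditioned_scores(region_feats, label_embeds):
    """Cosine-similarity scoring between detector region features (R, D)
    and text embeddings of label names (L, D). Returns (R, L) logits.
    Swapping in a new label_embeds matrix changes the vocabulary freely."""
    r = region_feats / np.linalg.norm(region_feats, axis=1, keepdims=True)
    t = label_embeds / np.linalg.norm(label_embeds, axis=1, keepdims=True)
    return r @ t.T
```

The design choice worth noting is that the text side acts as a universal, growable "visual vocabulary": merging datasets only concatenates label embeddings rather than forcing a manual taxonomy merge.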
- Published
- 2024
- Full Text
- View/download PDF
43. Research status and prospects of UWB radar life information recognition for mine rescue
- Author
-
ZHENG Xuezhao, MA Yang, HUANG Yuan, CAI Guobin, and DING Wen
- Subjects
mine rescue ,life information recognition ,uwb radar ,clutter filtering ,life information extraction ,object recognition ,life quantification ,Mining engineering. Metallurgy ,TN1-997 - Abstract
Ultra-wideband (UWB) radar can penetrate non-magnetic media such as coal and rock to detect life information of personnel trapped after a collapse. Due to the complex mining environment, the UWB radar echo carrying vital-sign signals is prone to interference from environmental noise and clutter, making it difficult to recognize human subject information. This paper introduces the principle of the UWB radar life detection system and its application in mine rescue, and summarizes the current research status of UWB radar life information recognition from three aspects: life information extraction, dynamic and static human object recognition, and life quantification. It then points out current issues in applying UWB radar life detection technology to mine rescue. ① Research on filtering methods for non-stationary signals and environmental noise in underground collapse environments is limited. ② Methods for extracting and representing the posture, behavior, life status, and other information of moving (or micro-moving) objects need improvement; the human life information recognition model is not yet mature, and feature correlation between models is low. ③ Solutions are lacking for the 'overlapping' problem caused by multiple objects. Finally, the paper proposes research directions for UWB radar life information recognition in mine rescue: ① continuously optimize adaptive noise and clutter filtering methods for multiple types of mine disaster environments; ② construct a human life information recognition model suited to the field of mine rescue; ③ further improve multi-object quantification for persons sheltering after a mine disaster; ④ explore in depth methods for determining the optimal detection frequency band for UWB radar.
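A minimal sketch of the kind of adaptive clutter suppression this survey covers: an exponential running average along slow time estimates the static background (walls, rubble) and subtracts it, leaving respiration and motion signatures. The `alpha` value and the (slow-time, fast-time) matrix layout are assumptions, not a method from the surveyed literature:

```python
import numpy as np

def remove_static_clutter(frames, alpha=0.05):
    """frames: (slow_time, fast_time) UWB radar data matrix.
    An exponential moving average tracks the stationary background;
    subtracting it passes only time-varying (vital-sign) components."""
    clutter = np.zeros(frames.shape[1])
    out = np.empty(frames.shape, dtype=float)
    for k, f in enumerate(frames):
        clutter = (1 - alpha) * clutter + alpha * f  # update background
        out[k] = f - clutter                         # residual motion signal
    return out
```

Because the background estimate adapts slowly, truly static reflectors decay toward zero in the output while periodic chest-wall motion survives, which is the property the filtering methods discussed above all aim for.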
- Published
- 2024
- Full Text
- View/download PDF
44. The Number of Pictograms About Side Effects on the Medication Package Influences Medication Risk Perception
- Author
-
Lea Laasner Vogt, Ester Reijnen, Swen J. Kühne, and Marc Sulser
- Subjects
pharmaceutical pictograms ,risk perception ,side effects ,numerate literacy ,object recognition ,Psychology ,BF1-990 - Abstract
Abstract: Introduction and aim: Pictograms can make taking medication safer. However, little is known about how pictograms on a medication package influence the subjective assessment of a medication. Methods: In this online study, 276 participants were presented with a fictitious package bearing 0 to 5 pictograms of possible side effects. Participants assessed the probability of side effects occurring, as well as the benefits and harms of the medication, both before and after consulting the package insert. Results: The number of pictograms influenced the assessed probability of side effects occurring, with the effect leveling off at 2 pictograms. In addition, this assessment served as an anchor for all subsequent measures (e.g., benefit). Although participants adjusted their assessments after consulting the package insert, these adjustments were insufficient relative to a normative probability account. Discussion and conclusion: Pictograms influence medication assessment, and humans can process only a limited number of pictograms.
- Published
- 2024
- Full Text
- View/download PDF
45. Bioinspired Passive Tactile Sensors Enabled by Reversible Polarization of Conjugated Polymers.
- Author
-
He, Feng, Chen, Sitong, Zhou, Ruili, Diao, Hanyu, Han, Yangyang, and Wu, Xiaodong
- Subjects
- *
TACTILE sensors , *MACHINE learning , *OBJECT recognition algorithms , *MATERIALS texture , *SURFACE texture , *CONJUGATED polymers - Abstract
Highlights: Fully organic and passive tactile sensors are developed via mimicking the sensing behavior of natural sensory cells. Controllable polarizability of conjugated polymers is adopted for the first time to construct passive tactile sensors. Machine learning-assisted surface texture detection, material property recognition, as well as shape/profile perception are realized with the tactile sensors. Tactile perception plays a vital role for the human body and is also highly desired for smart prosthesis and advanced robots. Compared to active sensing devices, passive piezoelectric and triboelectric tactile sensors consume less power, but lack the capability to resolve static stimuli. Here, we address this issue by utilizing the unique polarization chemistry of conjugated polymers for the first time and propose a new type of bioinspired, passive, and bio-friendly tactile sensors for resolving both static and dynamic stimuli. Specifically, to emulate the polarization process of natural sensory cells, conjugated polymers (including poly(3,4-ethylenedioxythiophene):poly(styrenesulfonate), polyaniline, or polypyrrole) are controllably polarized into two opposite states to create artificial potential differences. The controllable and reversible polarization process of the conjugated polymers is fully in situ characterized. Then, a micro-structured ionic electrolyte is employed to imitate the natural ion channels and to encode external touch stimulations into the variation in potential difference outputs. Compared with the currently existing tactile sensing devices, the developed tactile sensors feature distinct characteristics including fully organic composition, high sensitivity (up to 773 mV N−1), ultralow power consumption (nW), as well as superior bio-friendliness. 
As demonstrations, both single point tactile perception (surface texture perception and material property perception) and two-dimensional tactile recognitions (shape or profile perception) with high accuracy are successfully realized using self-defined machine learning algorithms. This tactile sensing concept innovation based on the polarization chemistry of conjugated polymers opens up a new path to create robotic tactile sensors and prosthetic electronic skins. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
46. Skin‐Inspired Bimodal Receptors for Object Recognition and Temperature Sensing Simulation.
- Author
-
Chen, Jianfeng, Liu, Andeng, Shi, Yating, Luo, Yingjin, Li, Jianing, Ye, Meidan, and Guo, Wenxi
- Subjects
- *
OBJECT recognition (Computer vision) , *RECOGNITION (Psychology) , *MACHINE learning , *THERMAL conductivity , *REMOTE sensing , *SKIN temperature - Abstract
Skin contacts with objects with different thermal conductivity and tactile perception will produce different temperature and tactile sensations. Here, an innovative creation is presented known as the BB‐Skin, a highly realistic bionic bimodal electronic skin, meticulously designed to mirror the thermal sensitivity and tactile perception found in human skin. This technology allows for precise object recognition and offers remote temperature sensing feedback. The BB‐Skin comprises temperature sensing, heating, and tribo‐electrode modules. Through the utilization of machine learning algorithms that measure the thermal conductivity and electronegativity of materials, a bimodal bionic robot object recognition system is developed, achieving an impressive accuracy rate exceeding 98.11%. The bimodal nature of the system, based on different types of electrical signals that operate independently, significantly enhances the reliability of the device. Moreover, harnessing the inherent capabilities of the BB‐Skin, a novel remote temperature sensing and feedback system is successfully implemented. This system adeptly replicates the temperature perception when remotely touching objects and provides users with feedback through gloves embedded with heating and cooling modules. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
47. A question of perspective: Perspective as a feature in stimulus-response binding.
- Author
-
Münster, Nicolas D., Schmalbrock, Philip, and Frings, Christian
- Subjects
- *
RECOGNITION (Psychology) , *CONTROL (Psychology) , *STIMULUS & response (Psychology) , *COLOR - Abstract
In action control research, stimulus-response binding and retrieval processes are assumed core mechanisms. Stimulus and response features are integrated into event files when they occur together. This event file is retrieved if any feature is repeated. Partial mismatches between current and retrieved episodes cause conflicts, leading to performance costs, while full matches improve performance (the difference between these conditions is known as binding effects). Such binding effects have been found for different stimulus features like colour, shape, or location, using stimulus material that is often kept very simple. In our study, we manipulated the perspective on a real-world three-dimensional distractor stimulus. We found that perspective was bound to the response, independent of stimulus identity. It seems that perspective is treated as a stimulus feature in action control. This result enhances the way we discuss what a feature actually is and how features are bound in event files. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
48. Flexible thin parts multi‐target positioning method of multi‐level feature fusion.
- Author
-
Deng, Yaohua, Liu, Xiali, Yang, Kenan, and Li, Zehang
- Subjects
- *
IMAGE fusion , *COMPUTER vision , *IMAGE recognition (Computer vision) , *NICKEL-plating , *GAUSSIAN processes - Abstract
In new energy battery manufacturing, machine vision is widely used in automated assembly of key parts. To improve the accuracy and real‐time performance of multi‐target positioning and recognition for flexible thin parts, this paper proposes a multi‐level feature fusion template matching algorithm based on the Gaussian pyramid. Firstly, the algorithm constructs a Gaussian pyramid through multi‐scale image construction. Secondly, considering the image features of each pyramid layer, the grey‐based Fast Normalized Matching algorithm is used on the upper layer to obtain coarse positioning coordinates, and an improved Linemod‐2D algorithm is applied to the bottom‐layer image to obtain accurate positioning coordinates. Finally, the positioning coordinates returned from each layer are fused into the final positioning coordinate. Experimental results show that the proposed algorithm achieves excellent performance in nickel‐plate positioning and recognition. In angular error, repeat accuracy, and matching speed it competes favourably with Halcon, VisionMaster, and SCISmart, and its positioning error closely approximates that of Halcon, effectively meeting practical production demands for high‐speed feeding and high‐precision positioning. [ABSTRACT FROM AUTHOR]
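The coarse-to-fine pyramid strategy the abstract outlines can be sketched with plain normalized cross-correlation: match on a downsampled level, then refine in a small window at full resolution. This stands in for the paper's Fast Normalized Matching / improved Linemod-2D pair, which are not reproduced here, and uses simple decimation rather than a true Gaussian pyramid:

```python
import numpy as np

def ncc(image, tmpl, r, c):
    """Normalized cross-correlation of tmpl against the image patch at (r, c)."""
    patch = image[r:r + tmpl.shape[0], c:c + tmpl.shape[1]]
    a, b = patch - patch.mean(), tmpl - tmpl.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return (a * b).sum() / denom if denom else 0.0

def coarse_to_fine_match(image, tmpl, scale=2):
    """Coarse pass on a decimated level, then refine within +-scale pixels
    at full resolution -- the pyramid idea behind the paper's matcher."""
    img_c, tmp_c = image[::scale, ::scale], tmpl[::scale, ::scale]
    H = img_c.shape[0] - tmp_c.shape[0]
    W = img_c.shape[1] - tmp_c.shape[1]
    coarse = max(((r, c) for r in range(H + 1) for c in range(W + 1)),
                 key=lambda rc: ncc(img_c, tmp_c, *rc))
    r0, c0 = coarse[0] * scale, coarse[1] * scale
    cand = [(r, c) for r in range(max(0, r0 - scale), r0 + scale + 1)
                   for c in range(max(0, c0 - scale), c0 + scale + 1)
            if r + tmpl.shape[0] <= image.shape[0]
            and c + tmpl.shape[1] <= image.shape[1]]
    return max(cand, key=lambda rc: ncc(image, tmpl, *rc))
```

The speed win comes from the coarse pass scanning 1/scale² as many positions, while the fine pass touches only a small neighbourhood.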
- Published
- 2024
- Full Text
- View/download PDF
49. Object/Scene Recognition Based on a Directional Pixel Voting Descriptor.
- Author
-
Aguilar-González, Abiel, Medina Santiago, Alejandro, and Osuna-Coutiño, J. A. de Jesús
- Subjects
ARTIFICIAL intelligence ,FEATURE extraction ,CONVOLUTIONAL neural networks ,IMAGE processing ,VOTING - Abstract
Detecting objects in images is crucial for several applications, including surveillance, autonomous navigation, and augmented reality. Although AI-based approaches such as Convolutional Neural Networks (CNNs) have proven highly effective at object detection, it is difficult to generalize an AI model to scenarios where the objects to be recognized are unknown. In another trend, feature-based approaches like SIFT, SURF, and ORB can search for any object but have limitations under complex visual variations. In this work, we introduce a novel edge-based object/scene recognition method, proposing that feature edges, rather than feature points, offer high performance under complex visual variations. Our primary contribution is a directional pixel voting descriptor based on image segments. Experimental results are promising: compared to previous approaches, ours demonstrates superior performance under complex visual variations together with high processing speed. [ABSTRACT FROM AUTHOR]
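The abstract does not specify the descriptor's details, so the sketch below keeps only the general direction-voting idea: edge pixels vote into an orientation histogram weighted by gradient magnitude. The binning scheme and normalization are assumptions, not the paper's directional pixel voting descriptor:

```python
import numpy as np

def orientation_vote_descriptor(gray, bins=8):
    """Histogram of gradient directions over all pixels, weighted by
    gradient magnitude, so that strong edges dominate the votes.
    Directions are folded to [0, pi) since edges are undirected."""
    gy, gx = np.gradient(gray.astype(float))          # per-axis gradients
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)
    idx = np.minimum((ang / np.pi * bins).astype(int), bins - 1)
    hist = np.bincount(idx.ravel(), weights=mag.ravel(), minlength=bins)
    return hist / (hist.sum() + 1e-12)                # normalize for matching
```

Two such descriptors can then be compared with any histogram distance; the magnitude weighting is what makes the vote edge-based rather than point-based.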
- Published
- 2024
- Full Text
- View/download PDF
50. Position-Based ANN Impedance Control for A Three-Fingered Robot Hand.
- Author
-
Shauri, R. L. A., Sabri, M. A. A. M., and Roslan, A. B.
- Subjects
ARTIFICIAL neural networks ,OBJECT recognition (Computer vision) ,IMPEDANCE control ,ROBOT hands ,PLASTIC bottles - Abstract
A feedforward ANN was previously developed for recognition of two objects, i.e. a spongy ball and a plastic bottle, but was verified through simulation only. In this work, the feasibility of the ANN model is tested by applying it to the robot's impedance control, which takes the force exerted at the finger as input and outputs the selection rule for the impedance stiffness parameter, named Kd. From the results, the different object textures can be distinguished by the ANN, where the absolute peak values of the measured force rate during contact reached 0.15 N for the ball and a slightly higher 0.32 N for the bottle. Kd values were found to switch between 1000 and 250 based on the ANN outputs for the ball and bottle, respectively, thus affecting the dynamics of the fingertip through a modified fingertip position reference. However, it is also observed that the object was incorrectly classified at some moments when the exerted force was insufficient due to a weak grasp of the object. This shows that the nonlinear factor from hardware defects needs to be considered when refining the Kd selection rule in future studies. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
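The Kd selection rule and position-based impedance law reported in the abstract above can be sketched as follows. The ANN itself is replaced by its reported classification output, and the spring-only deflection law is an assumed simplification of position-based impedance control, not the paper's exact controller:

```python
def kd_from_ann(ann_output):
    """Stiffness selection rule from the abstract: ball -> Kd = 1000,
    bottle -> Kd = 250 (the ANN classification is assumed given here)."""
    return 1000.0 if ann_output == "ball" else 250.0

def impedance_position_ref(x_ref, force, kd):
    """Position-based impedance, spring-only sketch: the measured fingertip
    force deflects the position reference by force / Kd, so a lower Kd
    yields a more compliant fingertip."""
    return x_ref + force / kd
```

With this form, switching Kd from 1000 to 250 quadruples the deflection for the same contact force, which is how the selection rule changes fingertip dynamics through the position reference alone.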