8,308 results for "ANNOTATIONS"
Search Results
2. Unsupervised Representation Learning for Automated Segmentation of Brain Tumors on MRI Scans
- Author
-
Faujdar, Pramod Kumar, Singh, Shalakha, Nachappa, M. N., Agarwal, Ankita, Kumar, Amit, editor, Gunjan, Vinit Kumar, editor, Senatore, Sabrina, editor, and Hu, Yu-Chen, editor
- Published
- 2025
- Full Text
- View/download PDF
3. Pseudo label refining for semi-supervised temporal action localization.
- Author
-
Meng, Lingwen, Ban, Guobang, Xi, Guanghui, and Guo, Siqi
- Subjects
- *
TEMPORAL lobe , *LEARNING strategies , *DETECTORS , *ANNOTATIONS , *TEACHERS , *LOCALIZATION (Mathematics) - Abstract
The training of temporal action localization models relies heavily on a large amount of manually annotated data. Video annotation is more tedious and time-consuming than image annotation. Therefore, semi-supervised methods that combine labeled and unlabeled data for joint training have attracted increasing attention from academia and industry. This study proposes a method called pseudo-label refining (PLR) based on the teacher-student framework, which consists of three key components. First, we propose pseudo-label self-refinement, which features temporal region-of-interest pooling to improve the boundary accuracy of TAL pseudo-labels. Second, we design a module named boundary synthesis to further refine the temporal intervals in pseudo-labels through multiple inferences. Finally, an adaptive weight learning strategy is tailored for progressively learning pseudo-labels of different qualities. The method proposed in this study uses ActionFormer and BMN as detectors and achieves significant improvements on the THUMOS14 and ActivityNet v1.3 datasets. The experimental results show that the proposed method significantly improves localization accuracy compared to other advanced SSTAL methods at label rates of 10% to 60%. Further ablation experiments show the effectiveness of each module, proving that the PLR method can improve the accuracy of pseudo-labels obtained from teacher-model inference. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
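The adaptive weight learning described in the pseudo-label refining (PLR) abstract above progressively weights pseudo-labels of different qualities, but the exact scheme is not given here. The sketch below is only a generic confidence-weighted pseudo-label loss for a teacher-student setup; the confidence-to-weight mapping, threshold, and tensor names are assumptions for illustration, not the authors' formulation.

```python
import torch
import torch.nn.functional as F

def confidence_weighted_pseudo_loss(student_logits, teacher_logits, threshold=0.7):
    """Weight the per-sample pseudo-label loss by teacher confidence.

    A generic stand-in for progressively learning pseudo-labels of different
    qualities: confident teacher predictions contribute more, low-confidence
    ones are down-weighted (a soft version of hard thresholding).
    """
    with torch.no_grad():
        teacher_probs = F.softmax(teacher_logits, dim=-1)
        confidence, pseudo_labels = teacher_probs.max(dim=-1)
        # Smooth weight in [0, 1]; samples below the threshold get zero weight.
        weights = torch.clamp((confidence - threshold) / (1.0 - threshold), min=0.0)

    per_sample_loss = F.cross_entropy(student_logits, pseudo_labels, reduction="none")
    return (weights * per_sample_loss).sum() / weights.sum().clamp(min=1e-6)

# Toy usage: 8 temporal proposals, 5 action classes.
student_logits = torch.randn(8, 5, requires_grad=True)
teacher_logits = torch.randn(8, 5)
loss = confidence_weighted_pseudo_loss(student_logits, teacher_logits)
loss.backward()
print(float(loss))
```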
4. Self-Supervised Image Segmentation Using Meta-Learning and Multi-Backbone Feature Fusion.
- Author
-
Ajmal, Muhammad Shahroz, Geng, Guohua, Wang, Xiaofeng, and Ashraf, Mohsin
- Subjects
- *
IMAGE segmentation , *SPINE , *GENERALIZATION , *ANNOTATIONS - Abstract
Few-shot segmentation (FSS) aims to reduce the need for manual annotation, which is both expensive and time-consuming. While FSS enhances model generalization to new concepts with only limited test samples, it still relies on a substantial amount of labeled training data for base classes. To address these issues, we propose a multi-backbone few-shot segmentation (MBFSS) method. This self-supervised FSS technique utilizes unsupervised saliency for pseudo-labeling, allowing the model to be trained on unlabeled data. In addition, it integrates features from multiple backbones (ResNet, ResNeXt, and PVT v2) to generate a richer feature representation than a single backbone. Through extensive experimentation on PASCAL-5i and COCO-20i, our method achieves 54.3% and 25.1% on one-shot segmentation, exceeding the baseline methods by 13.5% and 4%, respectively. These improvements significantly enhance the model's performance in real-world applications with negligible labeling effort. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
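The MBFSS abstract above fuses features from several backbones (ResNet, ResNeXt, PVT v2) into a richer representation. Below is a minimal sketch of one common way to do such fusion, resizing per-backbone feature maps to a shared resolution and mixing them with a 1x1 convolution; the channel sizes and concatenation-based design are assumptions for illustration, not the paper's exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiBackboneFusion(nn.Module):
    """Concatenate feature maps from several backbones and project them."""

    def __init__(self, in_channels=(256, 256, 128), out_channels=256):
        super().__init__()
        self.project = nn.Conv2d(sum(in_channels), out_channels, kernel_size=1)

    def forward(self, feature_maps):
        # Resize every map to the spatial size of the first one, then concatenate.
        target_size = feature_maps[0].shape[-2:]
        resized = [
            F.interpolate(f, size=target_size, mode="bilinear", align_corners=False)
            for f in feature_maps
        ]
        return self.project(torch.cat(resized, dim=1))

# Toy usage with three fake backbone outputs at different resolutions.
f_resnet = torch.randn(1, 256, 32, 32)
f_resnext = torch.randn(1, 256, 16, 16)
f_pvt = torch.randn(1, 128, 8, 8)
fused = MultiBackboneFusion()([f_resnet, f_resnext, f_pvt])
print(fused.shape)  # torch.Size([1, 256, 32, 32])
```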
5. Unsupervised Object Localization in the Era of Self-Supervised ViTs: A Survey.
- Author
-
Siméoni, Oriane, Zablocki, Éloi, Gidaris, Spyros, Puy, Gilles, and Pérez, Patrick
- Subjects
- *
ENTHUSIASM , *ANNOTATIONS , *VIDEOS - Abstract
The recent enthusiasm for open-world vision systems shows the community's strong interest in performing perception tasks outside the closed-vocabulary benchmark setups that have been so popular until now. Being able to discover objects in images/videos without knowing in advance what objects populate the dataset is an exciting prospect. But how can objects be found without knowing anything about them? Recent works show that it is possible to perform class-agnostic unsupervised object localization by exploiting self-supervised pre-trained features. We propose here a survey of unsupervised object localization methods that discover objects in images without requiring any manual annotation, in the era of self-supervised ViTs. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
6. Backpropagation-Free Multi-modal On-Device Model Adaptation via Cloud-Device Collaboration.
- Author
-
Ji, Wei, Li, Li, Lv, Zheqi, Zhang, Wenqiao, Li, Mengze, Wan, Zhen, Lei, Wenqiang, and Zimmermann, Roger
- Subjects
CLOUD computing ,ARTIFICIAL intelligence ,EVERYDAY life ,ANNOTATIONS ,COST - Abstract
In our increasingly interconnected world, where intelligent devices continually amass copious personalized multi-modal data, a pressing need arises to deliver high-quality, personalized, device-aware services. However, this endeavor presents a multifaceted challenge to prevailing AI systems primarily rooted in the cloud. As these systems grapple with shifting data distributions between the cloud and devices, the traditional approach of fine-tuning-based adaptation (FTA) suffers from the following issues: the costly and time-consuming data annotation required by FTA and the looming risk of model overfitting. To surmount these challenges, we introduce a universal on-device Multi-modal Model Adaptation framework, revolutionizing on-device model adaptation by striking a balance between efficiency and effectiveness. The framework features the Fast Domain Adaptor (FDA) hosted in the cloud, providing tailored parameters for the lightweight multi-modal model on devices. To enhance adaptability across multi-modal tasks, the AnchorFrame Distribution Reasoner (ADR) minimizes communication costs. Our contributions, encapsulated in the Cloud-Device Collaboration Multi-modal Parameter Generation (CDC-MMPG) framework, represent a pioneering solution for on-Device Multi-modal Model Adaptation (DMMA). Extensive experiments validate the efficiency and effectiveness of our method, particularly in video question answering and retrieval tasks, driving forward the integration of intelligent devices into our daily lives. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
7. AR-assisted assembly method based on instance segmentation.
- Author
-
Lv, Chaofan, Liu, Bo, Wu, Dianliang, Lv, Jianhao, Li, Jianjun, and Bao, Jinsong
- Subjects
AUGMENTED reality ,POINT cloud ,ALGORITHMS ,ANNOTATIONS ,AWARENESS - Abstract
AR-assisted assembly refers to the overlaying of virtual models, annotations, and other AR instructions on a real scene to help workers perform assembly tasks. However, most AR-aided assembly processes lack scene awareness and require frequent interaction to complete the assembly guidance process. To achieve intelligent AR-assisted assembly, this paper first uses an instance segmentation method based on deep learning to process the RGB-D data of the assembly scene, segment the instances of the assembly objects, and segment the corresponding depth information according to the instance masks to reconstruct the point cloud instances of the assembly objects. Next, the Iterative Closest Point (ICP) algorithm is employed to register all recognized assembly objects to the 3D model of the assembly, allowing for pose estimation and assembly status perception of the objects in the scene. Based on this, the current assembly step can be determined, and AR instructions can be automatically triggered to reduce the user's interaction burden. Finally, the proposed AR-assisted system was evaluated through quantitative and qualitative experiments, and the experimental results showed that the proposed method effectively improved assembly efficiency and reduced the occurrence of assembly errors. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
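The AR-assembly abstract above registers segmented point-cloud instances to a 3D model with the Iterative Closest Point (ICP) algorithm for pose estimation. The sketch below is a minimal point-to-point ICP in NumPy/SciPy (rigid transform via SVD); a production system would typically use a dedicated library, and the convergence settings here are illustrative assumptions.

```python
import numpy as np
from scipy.spatial import cKDTree

def best_rigid_transform(src, dst):
    """Least-squares rotation R and translation t mapping src onto dst (Kabsch)."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:          # fix improper rotations (reflections)
        Vt[-1] *= -1
        R = Vt.T @ U.T
    return R, dst_c - R @ src_c

def icp(source, target, iterations=30, tol=1e-6):
    """Point-to-point ICP: returns the aligned source cloud and the mean error."""
    tree = cKDTree(target)
    current = source.copy()
    prev_err = np.inf
    for _ in range(iterations):
        dist, idx = tree.query(current)          # closest target point per source point
        R, t = best_rigid_transform(current, target[idx])
        current = current @ R.T + t
        err = dist.mean()
        if abs(prev_err - err) < tol:
            break
        prev_err = err
    return current, err

# Toy usage: recover a known 20-degree rotation of a noisy random cloud.
rng = np.random.default_rng(0)
target = rng.normal(size=(500, 3))
angle = np.deg2rad(20)
Rz = np.array([[np.cos(angle), -np.sin(angle), 0],
               [np.sin(angle),  np.cos(angle), 0],
               [0, 0, 1]])
source = target @ Rz.T + 0.01 * rng.normal(size=target.shape)
aligned, err = icp(source, target)
print(round(err, 4))
```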
8. Shift guided active learning.
- Author
-
Yang, Jianan, Tan, Jimin, Wang, Haobo, Chen, Gang, Wu, Sai, and Zhao, Junbo
- Subjects
COMPUTER vision ,INFORMATION resources ,ALGORITHMS ,ANNOTATIONS - Abstract
Active learning is a pivotal machine learning paradigm where the algorithm queries data iteratively from an information source and updates itself accordingly. Active learning provides an instrument to investigate data selection and has been proven effective in reducing annotation costs. In a typical active learning framework, the query step only takes information from the current learning cycle, and the information between cycles is usually ignored. It turns out that both inner-cycle and inter-cycle information provide crucial insights for learning progression. In this study, we identify the existence of distribution shifts that involve both inner-cycle and inter-cycle information. These shifts negatively impact stability and model performance. To counter their impact, we propose to integrate this information into an active learning framework with specialized models. Our framework, Shift Adaptation via Guided Enquiry (SAGE), is founded on a set of dedicated query strategies guided by the distribution shift. We show that this new framework mitigates distribution shifts and outperforms previous studies on multiple computer vision benchmarks. With extensive experiments, we conclude that SAGE improves the state-of-the-art, with a significant 3.28% absolute accuracy improvement over the previous methods in the field of active learning. This framework is also compatible with semi-supervised learning (SSL) settings, allowing state-of-the-art SSL methods to attain higher performance. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
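The SAGE abstract above describes shift-guided query strategies without detailing them. As a generic reference point only, the sketch below shows the standard entropy-based query step of an active learning cycle; the pool, budget, and uncertainty criterion are placeholders, and SAGE's shift-guided criterion is not reproduced here.

```python
import numpy as np

def entropy_query(pool_probs, budget):
    """Select the `budget` pool indices with the highest predictive entropy.

    pool_probs: (n_samples, n_classes) softmax outputs on the unlabeled pool.
    """
    entropy = -np.sum(pool_probs * np.log(pool_probs + 1e-12), axis=1)
    return np.argsort(-entropy)[:budget]

# Toy usage: 6 unlabeled samples, 3 classes, query 2 of them for annotation.
probs = np.array([
    [0.98, 0.01, 0.01],   # confident -> low entropy
    [0.34, 0.33, 0.33],   # uncertain -> high entropy
    [0.70, 0.20, 0.10],
    [0.40, 0.40, 0.20],
    [0.90, 0.05, 0.05],
    [0.50, 0.25, 0.25],
])
print(entropy_query(probs, budget=2))  # indices of the two most uncertain samples
```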
9. NG-Net: No-Grasp annotation grasp detection network for stacked scenes.
- Author
-
Shi, Min, Hou, Jingzhao, Li, Zhaoxin, and Zhu, Dengming
- Subjects
POINT cloud ,ANNOTATIONS ,LABOR supply ,ROBOTS ,ROBOTICS ,PREHENSION (Physiology) - Abstract
Achieving a high grasping success rate in a stacked environment is the core of the robot's grasping task. Most methods achieve a high grasping success rate by training the network on a dataset containing a large number of grasping annotations which requires a lot of manpower and material resources. Therefore, achieving a high grasping success rate for stacked scenes without grasping annotations is a challenging task. To address this, we propose a No-Grasp annotation grasp detection network for stacked scenes (NG-Net). Our network consists of two modules: an object selection module and a grasp generation module. Specifically, the object selection module performs instance segmentation on the raw point cloud to select the object with the highest score as the object to be grasped, and the grasp generation module uses mathematical methods to analyze the geometric features of the point cloud surface to achieve grasping pose generation without grasping annotations. Experiments show that on the modified IPA-Binpicking dataset G, NG-Net has an average grasp success rate of 97% in the stacked scene grasp experiment, 14–22% higher than PointNetGPD. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
10. MetaQ: fast, scalable and accurate metacell inference via single-cell quantization.
- Author
-
Li, Yunfan, Li, Hancong, Lin, Yijie, Zhang, Dan, Peng, Dezhong, Liu, Xiting, Xie, Jie, Hu, Peng, Chen, Lu, Luo, Han, and Peng, Xi
- Subjects
COMPUTATIONAL complexity ,MULTIOMICS ,ALGORITHMS ,ANNOTATIONS ,ANCESTORS - Abstract
To overcome the computational barriers of analyzing large-scale single-cell sequencing data, we introduce MetaQ, a metacell algorithm that scales to arbitrarily large datasets with linear runtime and constant memory usage. Inspired by cellular development, MetaQ conceptualizes each metacell as a collective ancestor of biologically similar cells. By quantizing cells into a discrete codebook, where each entry represents a metacell capable of reconstructing the original cells it quantizes, MetaQ identifies homogeneous cell subsets for efficient and accurate metacell inference. This approach reduces computational complexity from exponential to linear while maintaining or surpassing the performance of existing metacell algorithms. Extensive experiments demonstrate that MetaQ excels in downstream tasks such as cell type annotation, developmental trajectory inference, batch integration, and differential expression analysis. Thanks to its superior efficiency and effectiveness, MetaQ makes analyzing datasets with millions of cells practical, offering a powerful solution for single-cell studies in the era of high-throughput profiling. Large-scale single-cell sequencing data brings computational barriers for downstream analysis. Here, the authors propose MetaQ, a metacell algorithm that reduces cell number while preserving biological characteristics through a cell quantisation process, supporting both uni- and multi-omics data. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
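The MetaQ abstract above quantizes cells into a discrete codebook whose entries act as metacells. As a rough, hedged analogue of that idea (not MetaQ's actual learned quantization), the sketch below groups cells with k-means and averages expression within each group to form metacell profiles; the synthetic matrix and cluster count are assumptions, and scanpy/anndata handling is omitted.

```python
import numpy as np
from sklearn.cluster import KMeans

def build_metacells(expression, n_metacells=50, random_state=0):
    """Aggregate a cells-by-genes matrix into metacell profiles.

    Each metacell profile is the mean expression of the cells assigned to it,
    a simple stand-in for quantizing cells against a learned codebook.
    """
    km = KMeans(n_clusters=n_metacells, n_init=10, random_state=random_state)
    assignment = km.fit_predict(expression)
    profiles = np.vstack([
        expression[assignment == k].mean(axis=0) for k in range(n_metacells)
    ])
    return profiles, assignment

# Toy usage: 2,000 "cells" x 100 "genes" reduced to 50 metacell profiles.
rng = np.random.default_rng(0)
cells = rng.poisson(1.0, size=(2000, 100)).astype(float)
profiles, assignment = build_metacells(cells, n_metacells=50)
print(profiles.shape, np.bincount(assignment).min())
```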
11. In-Depth Collaboratively Supervised Video Instance Segmentation.
- Author
-
Deng, Yunnan, Zhang, Yinhui, and He, Zifen
- Subjects
ANNOTATIONS ,VIDEOS ,PIXELS ,COST ,SUPERVISION - Abstract
Video instance segmentation (VIS) is plagued by the high cost of pixel-level annotation and the defects of weakly supervised segmentation, leading to an urgent need for a trade-off between annotation cost and performance. We propose a novel In-Depth Collaboratively Supervised video instance segmentation (IDCS) method with efficient training. A collaborative supervised training pipeline is designed to route samples of different labeling levels and carry out multimodal training, in which instance clues obtained from mask-annotated instances guide the box-annotated training through an in-depth collaborative paradigm: (1) a trident learning method is proposed, which leverages video temporal consistency to match instances with multimodal annotation across frames for effective instance relation learning without additional network parameters; (2) spatial clues in the first frames are captured to implement multidimensional pixel affinity evaluation of box-annotated instances and augment the noise-disturbed spatial affinity map. Experiments on YouTube-VIS validate the performance of IDCS with mask-annotated instances in the first frames and bounding-box-annotated samples in the remaining frames. IDCS achieves up to 92.0% of fully supervised performance while being on average 1.4 times faster and 2.2% mAP higher than the weakly supervised baseline. The results show that IDCS can efficiently utilize multimodal data while providing advanced guidance for an effective trade-off in VIS training. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
12. OW-YOLO: An Improved YOLOv8s Lightweight Detection Method for Obstructed Walnuts.
- Author
-
Wang, Haoyu, Yun, Lijun, Yang, Chenggui, Wu, Mingjie, Wang, Yansong, and Chen, Zaiqing
- Subjects
AGRICULTURAL development ,WALNUT ,SPINE ,ANNOTATIONS ,AGRICULTURE - Abstract
Walnut detection in mountainous and hilly regions often faces significant challenges due to obstructions, which adversely affect model performance. To address this issue, we collected a dataset comprising 2379 walnut images from these regions, with detailed annotations for both obstructed and non-obstructed walnuts. Based on this dataset, we propose OW-YOLO, a lightweight object detection model specifically designed for detecting small, obstructed walnuts. The model's backbone was restructured with the integration of the DWR-DRB (Dilated Weighted Residual-Dilated Residual Block) module. To enhance efficiency and multi-scale feature fusion, we incorporated the HSFPN (High-Level Screening Feature Pyramid Network) and redesigned the detection head by replacing the original head with the more efficient LADH detection head while removing the head processing 32 × 32 feature maps. These improvements effectively reduced model complexity and significantly enhanced detection accuracy for obstructed walnuts. Experiments were conducted using the PyTorch framework on an NVIDIA GeForce RTX 4060 Ti GPU. The results demonstrate that OW-YOLO outperforms other models, achieving an mAP@0.5 (mean average precision) of 83.6%, mAP@[0.5:0.95] of 53.7%, and an F1 score of 77.9%. Additionally, the model's parameter count decreased by 49.2%, weight file size was reduced by 48.1%, and computational load dropped by 37.3%, effectively mitigating the impact of obstruction on detection accuracy. These findings provide robust support for the future development of walnut agriculture and lay a solid foundation for the broader adoption of intelligent agriculture. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
13. Weakly Supervised Nuclei Segmentation with Point-Guided Attention and Self-Supervised Pseudo-Labeling.
- Author
-
Mo, Yapeng, Chen, Lijiang, Zhang, Lingfeng, and Zhao, Qi
- Subjects
- *
K-means clustering , *MOVING average process , *CLINICAL medicine , *ANNOTATIONS , *NOISE - Abstract
Due to the labor-intensive manual annotations for nuclei segmentation, point-supervised segmentation based on nuclei coordinate supervision has gained recognition in recent years. Despite great progress, two challenges hinder the performance of weakly supervised nuclei segmentation methods: (1) The stable and effective segmentation of adjacent cell nuclei remains an unresolved challenge. (2) Existing approaches rely solely on initial pseudo-labels generated from point annotations for training, and inaccurate labels may lead the model to assimilate a considerable amount of noise information, thereby diminishing performance. To address these issues, we propose a method based on center-point prediction and pseudo-label updating for precise nuclei segmentation. First, we devise a Gaussian kernel mechanism that employs multi-scale Gaussian masks for multi-branch center-point prediction. The generated center points are utilized by the segmentation module to facilitate the effective separation of adjacent nuclei. Next, we introduce a point-guided attention mechanism that concentrates the segmentation module's attention around authentic point labels, reducing the noise impact caused by pseudo-labels. Finally, a label updating mechanism based on the exponential moving average (EMA) and k-means clustering is introduced to enhance the quality of pseudo-labels. The experimental results on three public datasets demonstrate that our approach has achieved state-of-the-art performance across multiple metrics. This method can significantly reduce annotation costs and reliance on clinical experts, facilitating large-scale dataset training and promoting the adoption of automated analysis in clinical applications. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
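The nuclei-segmentation abstract above refreshes pseudo-labels with an exponential moving average (EMA) and k-means clustering. The snippet below sketches only the EMA part, smoothing a per-pixel foreground probability map across training rounds before re-thresholding it; the decay value and thresholding rule are illustrative assumptions rather than the paper's settings.

```python
import numpy as np

def ema_update(running_prob, new_prob, decay=0.9):
    """Exponential moving average of per-pixel foreground probabilities."""
    return decay * running_prob + (1.0 - decay) * new_prob

def refresh_pseudo_mask(running_prob, threshold=0.5):
    """Binarize the smoothed probability map into an updated pseudo-label."""
    return (running_prob >= threshold).astype(np.uint8)

# Toy usage: a 4x4 probability map updated over three training rounds.
rng = np.random.default_rng(0)
running = np.full((4, 4), 0.5)
for _ in range(3):
    prediction = rng.uniform(size=(4, 4))   # stand-in for the model's new output
    running = ema_update(running, prediction)
pseudo_mask = refresh_pseudo_mask(running)
print(pseudo_mask)
```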
14. Tropy.
- Author
-
McCandless, Bret
- Subjects
- *
PHOTOGRAPHS , *IMAGE , *HISTORICAL research , *COMPUTER software , *ANNOTATIONS - Abstract
The article focuses on the use of Tropy software for managing research photographs in archival projects. Topics include the software's organizational features, its role in facilitating annotation and analysis of archival images, and its application for historical researchers managing large collections of photographs.
- Published
- 2025
- Full Text
- View/download PDF
15. The Necessity of Re-editing the Divan of Mojir al-Din Baylaqani [ضرورت بازتصحیح دیوان مجیرالدین بیلقانی].
- Author
-
Tabatabaei, Seyyed Mahdi (سیدمهدی طباطبائی)
- Subjects
- *
CANON (Literature) , *LIBRARY resources , *POETS , *POETRY (Literary form) , *ANNOTATIONS , *ACADEMIC dissertations - Abstract
Mojir al-Din Baylaqani (d. 586 AH) is a renowned poet of the 6th century and a prominent figure of the Arranian school in Persian poetry. He is renowned for his deceptively simple yet fluent verses compared to his contemporaries. The only existing edition of his Divan (collected works) was prepared by Mohammad Abadi Bavail as a doctoral dissertation in 1974 under the supervision of Mehdi Mohaghegh, Mazaher Mosaffa, and Ahmad Mahdavi Damghani and was subsequently published in Tabriz in 1979. While this pioneering edition holds historical value, subsequent reviews have revealed its deficiencies. The recent discovery of new manuscript sources that were unavailable to the previous editor presents an opportunity for a revised, more comprehensive edition. This study, undertaken through a critical re-editing of Mojir al-Din's Divan based on over 30 manuscript copies and employing a descriptive-analytical methodology using library resources, aimed to highlight the shortcomings of the existing edition. The significance of poets from Arran Region in the Persian literary canon, coupled with the necessity of a deeper scholarly engagement with Mojir al-Din's poems, underscored the importance of this research. The overall conclusion pointed to the inadequacies of the previous edition and the pressing need for its re-editing. According to the findings, the primary deficiencies of the prior edition included the erroneous attribution of other poets' works to Mojir al-Din, omission of some of his poems, inconsistencies in the transcription of certain poems, misreadings, incorrect selections, and inadequate or inaccurate annotations. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
16. Telepresence for surgical assistance and training using eXtended reality during and after pandemic periods.
- Author
-
Wisotzky, Eric L, Rosenthal, Jean-Claude, Meij, Senna, van den Dobblesteen, John, Arens, Philipp, Hilsmann, Anna, Eisert, Peter, Uecker, Florian Cornelius, and Schneider, Armin
- Subjects
- *
MEDICAL students , *MEDICAL personnel , *UNITS of measurement , *COVID-19 pandemic , *ENGINEERING students - Abstract
Existing challenges in surgical education ("See one, do one, teach one") as well as the COVID-19 pandemic make it necessary to develop new ways of surgical training. Therefore, this work describes the implementation of a scalable remote solution called "TeleSTAR" using immersive, interactive, and augmented reality elements which enhances surgical training in the operating room. The system uses a fully digital surgical microscope in the context of Ear–Nose–Throat surgery. The microscope is equipped with a modular software augmented reality interface consisting of an interactive annotation mode to mark anatomical landmarks using a touch device and an experimental intraoperative image-based stereo-spectral algorithm unit to measure anatomical details and highlight tissue characteristics. The new educational tool was evaluated and tested during the broadcast of three live XR-based three-dimensional cochlear implant surgeries. The system was able to scale to five different remote locations in parallel with low latency while offering a separate two-dimensional YouTube stream with higher latency. In total, more than 150 persons were trained, including healthcare professionals, biomedical engineers, and medical students. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
17. Exploring annotation taxonomy in grouped bar charts: A qualitative classroom study.
- Author
-
Rahman, Md Dilshadur, Quadri, Ghulam Jilani, Szafir, Danielle Albers, and Rosen, Paul
- Subjects
DATA transmission systems ,DATA analysis ,ANNOTATIONS ,VISUALIZATION ,EMPIRICAL research - Abstract
Annotations are an essential part of data analysis and communication in visualizations, which focus a reader's attention on critical visual elements (e.g. an arrow that emphasizes a downward trend in a bar chart). Annotations enhance comprehension, mental organization, memorability, user engagement, and interaction, and are crucial for data externalization and exploration, collaborative data analysis, and narrative storytelling in visualizations. However, we have identified a general lack of understanding of how people annotate visualizations to support effective communication. In this study, we evaluate how visualization students annotate grouped bar charts when answering high-level questions about the data. The resulting annotations were qualitatively coded to generate a taxonomy of how they leverage different visual elements to communicate critical information. We found that the annotations used varied significantly by the task they were supporting and that, whereas several annotation types supported many tasks, others were usable only in special cases. We also found that some tasks were so challenging that ensembles of annotations were necessary to support them sufficiently. The resulting taxonomy of approaches provides a foundation for understanding the usage of annotations in broader contexts to help visualizations achieve their desired message. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
18. Class incremental named entity recognition without forgetting: Y. Liu et al.
- Author
-
Liu, Ye, Huang, Shaobin, Wei, Chi, Tian, Sicheng, Li, Rongsheng, Yan, Naiyu, and Du, Zhijuan
- Subjects
MACHINE learning ,ANNOTATIONS - Abstract
Class Incremental Named Entity Recognition (CINER) needs to learn new entity classes without forgetting old entity classes in a setting where the data contain annotations only for the new entity classes. As is well known, the forgetting problem is the biggest challenge in Class Incremental Learning (CIL). In the CINER scenario, unlabeled old-class entities further aggravate the forgetting problem. Current CINER methods based on a single model cannot completely avoid the forgetting problem and are sensitive to the learning order of entity classes. To this end, we propose a Multi-Model (MM) framework that trains a new model for each incremental step and uses all the models for inference. In MM, each model only needs to learn the entity classes included in the corresponding step, so MM has no forgetting problem and is robust to different entity class learning orders. Furthermore, we design an error-correction training strategy and conflict-handling rules for MM to further improve performance. We evaluate MM on CoNLL-03 and OntoNotes-V5, and the experimental results show that our framework outperforms the current state-of-the-art (SOTA) methods by a large margin. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
19. Dual Semantic Reconstruction Network for Weakly Supervised Temporal Sentence Grounding.
- Author
-
Tang, Kefan, He, Lihuo, Wang, Nannan, and Gao, Xinbo
- Published
- 2025
- Full Text
- View/download PDF
20. Category-Contrastive Fine-Grained Crowd Counting and Beyond.
- Author
-
Zhang, Meijing, Chen, Mengxue, Li, Qi, Chen, Yanchen, Lin, Rui, Li, Xiaolian, He, Shengfeng, and Liu, Wenxi
- Published
- 2025
- Full Text
- View/download PDF
21. SQL-Net: Semantic Query Learning for Point-Supervised Temporal Action Localization.
- Author
-
Wang, Yu, Zhao, Shengjie, and Chen, Shiwei
- Published
- 2025
- Full Text
- View/download PDF
22. Video Instance Segmentation Without Using Mask and Identity Supervision.
- Author
-
Li, Ge, Cao, Jiale, Sun, Hanqing, Anwer, Rao Muhammad, Xie, Jin, Khan, Fahad, and Pang, Yanwei
- Published
- 2025
- Full Text
- View/download PDF
23. 3D Lidar Point Cloud Segmentation for Automated Driving.
- Author
-
Abbasi, Rashid, Bashir, Ali Kashif, Rehman, Amjad, and Ge, Yuan
- Abstract
The use of 3D point clouds (3DPCs) in deep learning (DL) has recently gained popularity due to several applications in fields such as computer vision, autonomous systems, and robotics. DL, as a dominant artificial intelligence approach, has been effectively applied to handle a variety of 3D vision challenges. However, building strong discriminative feature representations from irregular and unordered PCs is difficult. Self-driving systems commonly employ lidar to obtain precise 3D geometric data around vehicles for perception, path planning, and localization. The semantic segmentation of lidar-based PCs is a key activity that must be completed in real time. However, the majority of current convolutional neural network models for 3DPC semantic segmentation are extremely complex and cannot be processed in real time on an embedded platform. In this article, our goal is to offer a comprehensive review of current advances in DL approaches for PC feature representation, including 3DPC segmentation, and future challenges. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
24. Toward Oriented Fisheye Object Detection: Dataset and Baseline.
- Author
-
Yang, Jialin, Lin, Chunyu, Nie, Lang, Kong, Zisen, Wang, Jiapeng, and Zhao, Yao
- Subjects
OBJECT recognition (Computer vision) ,DATA augmentation ,ANNOTATIONS - Abstract
Fisheye object detection is challenging due to fisheye distortion, which inclines objects to different extents and pushes extensive irrelevant pixels into the predicted horizontal bounding box (HBB). To address the problems above, we establish a new fisheye object detection dataset (named FishOBB) with compact oriented bounding box (OBB) annotations, as well as an OBB-customized mosaic augmentation technology. To our knowledge, there are very few fisheye datasets labeled with OBBs, especially open-source forward-view datasets like ours. Besides, we provide a fisheye object detection baseline (named FDA-YOLO) with two fisheye adaption units. Concretely, we first design a distortion orientation aggregation (DOA) unit guided by polar sampling to capture distortion-aware fisheye features. On the other hand, to transfer HBB-based detection models to OBB-based counterparts, we propose an oriented anchor attention unit. It automatically weights the unbalanced positive/negative samples and facilitates convergence for multi-anchor models. Finally, we demonstrate that the two adaption units can be easily integrated into various anchor-based YOLO methods, e.g., ScaledYOLOv4 and YOLOv7, contributing to superior performance over existing state-of-the-art (SoTA) solutions on the proposed dataset. Meanwhile, our method has also achieved SoTA performance on other popular datasets like WEPDTOF. The dataset and code are released at https://github.com/lukanightfever/FishOBB. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
25. Decomposed Prototype Learning for Few-Shot Scene Graph Generation.
- Author
-
Li, Xingchen, Xiao, Jun, Chen, Guikun, Feng, Yinfu, Yang, Yi, Liu, An-An, and Chen, Long
- Subjects
POLYSEMY ,PROTOTYPES ,ANNOTATIONS - Abstract
Today's scene graph generation (SGG) models typically require abundant manual annotations to learn new predicate types. Therefore, it is difficult to apply them to real-world applications with massive uncommon predicate categories whose annotations are hard to collect. In this article, we focus on Few-Shot SGG (FSSGG), which encourages SGG models to be able to quickly transfer previous knowledge and recognize unseen predicates well with only a few examples. However, current methods for FSSGG are hindered by the high intra-class variance of predicate categories in SGG: On one hand, each predicate category commonly has multiple semantic meanings under different contexts. On the other hand, the visual appearance of relation triplets with the same predicate differs greatly under different subject–object compositions. Such great variance of inputs makes it hard to learn generalizable representation for each predicate category with current few-shot learning (FSL) methods. However, we found that this intra-class variance of predicates is highly related to the composed subjects and objects. To model the intra-class variance of predicates with subject–object context, we propose a novel Decomposed Prototype Learning (DPL) model for FSSGG. Specifically, we first construct a decomposable prototype space to capture diverse semantics and visual patterns of subjects and objects for predicates by decomposing them into multiple prototypes. Afterwards, we integrate these prototypes with different weights to generate query-adaptive predicate representation with more reliable semantics for each query sample. We conduct extensive experiments and compare with various baseline methods to show the effectiveness of our method. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
26. Semantic Segmentation of Pedestrian Groups Based on Directional-oriented Density Features with Shallow U-net Architecture.
- Author
-
Sidharta, Hanugra Aulia, Al Kindhi, Berlian, Mulyanto, Eko, and Purnomo, Mauridhi Hery
- Subjects
PEDESTRIANS ,SHOULDER ,NECK ,DENSITY ,ANNOTATIONS - Abstract
Pedestrians typically form a small group with other pedestrians when they are traveling in the same direction and toward the same destination. While members of a pedestrian group, they dynamically interact with other pedestrians both inside and outside the group. In order to understand pedestrian interaction, it is necessary to distinguish between pedestrians in a group and pedestrians not in a group. This can be achieved by performing semantic segmentation based on the JAAD dataset, which is suitable for observing pedestrian walking behavior and provides abundant annotation. In this research, we propose to perform semantic segmentation by utilizing directional-oriented density features. Density features are calculated by utilizing each joint relationship, while pedestrian direction is predicted from a dot product computed over the shoulder, neck, and hip joints. Segmentation is performed with a shallow U-network architecture that has fewer layers than the original U-network. Compared with the original U-net architecture and its derivatives, our proposed method not only outperforms them but also achieves stable performance from as early as the 6th epoch, reaching a score over 0.97 during prediction and demonstrating its impressive performance. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
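The pedestrian-group abstract above predicts walking direction from a dot product over shoulder, neck, and hip joints, but the geometry is only summarized. The sketch below shows one plausible reading: build a body-facing vector from the shoulder line and compare it against a reference direction via the normalized dot product; the joint layout and sign conventions are assumptions, not the paper's definition.

```python
import numpy as np

def facing_angle(left_shoulder, right_shoulder, reference=np.array([1.0, 0.0])):
    """Angle (degrees) between the body-facing direction and a reference axis.

    The facing direction is taken as the normal of the shoulder line, and the
    comparison uses the normalized dot product (cosine similarity).
    """
    shoulder_vec = right_shoulder - left_shoulder
    facing = np.array([shoulder_vec[1], -shoulder_vec[0]])    # rotate -90 degrees
    cos_sim = facing @ reference / (np.linalg.norm(facing) * np.linalg.norm(reference))
    return np.degrees(np.arccos(np.clip(cos_sim, -1.0, 1.0)))

# Toy usage: shoulders aligned with the y-axis -> facing along +x (angle 0).
print(round(facing_angle(np.array([0.0, 0.0]), np.array([0.0, 1.0])), 1))
```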
27. Web System with Gamification to Enhance Reading Comprehension in a Secondary-Level Educational Institution.
- Author
-
Vega-Huerta, Hugo, Chunga-Vargas, Manuel, Guerra-Grados, Luis, Lázaro-Guillermo, Juan, Benito-Pacheco, Oscar, Pantoja-Collantes, Jorge, and Gil-Calvo, Ruben
- Subjects
READING comprehension ,HIGH school students ,METACOGNITION ,GAMIFICATION ,EXPERIMENTAL groups - Abstract
According to MINEDU, high school students' grades in reading comprehension decreased in 2021 compared with previous years. To improve their reading comprehension skills, we suggest implementing gamification, a learning technique that uses games to help students understand certain topics. To achieve this, we worked with two groups. The experimental group used gamification strategies and the collaborative annotation tool, while the control group did not have these gamification tools. The results showed that the experimental group took significantly more notes when reading and answered questions more effectively than the control group, having a more immersive experience with teamwork. The 17.46% improvement in scores shows that the annotation technique increases reading comprehension skills in high school students and demonstrates that the use of the annotation tool helps improve students' reading comprehension. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
28. A Study on the Use of Multimedia Annotations and Social Tagging Applied to the Practicum Field [Estudio sobre el uso de anotaciones multimedia y etiquetado social aplicado al campo del Prácticum].
- Author
-
Latorre Medina, María José
- Subjects
TAGS (Metadata), EDUCATIONAL films, TEACHER training, RESEARCH personnel, ANNOTATIONS, DIGITAL video
- Published
- 2025
- Full Text
- View/download PDF
29. Pseudolabel guided pixels contrast for domain adaptive semantic segmentation.
- Author
-
Xiang, Jianzi, Wan, Cailu, and Cao, Zhu
- Subjects
- *
ARTIFICIAL intelligence , *IMAGE processing , *TEST methods , *PIXELS , *ANNOTATIONS - Abstract
Semantic segmentation is essential for comprehending images, but the process necessitates a substantial amount of detailed annotations at the pixel level. Acquiring such annotations can be costly in the real world. Unsupervised domain adaptation (UDA) for semantic segmentation is a technique that uses virtual data with labels to train a model and adapts it to real data without labels. Some recent works use contrastive learning, a powerful method for self-supervised learning, to help with this technique. However, these works do not take into account the diversity of features within each class when using contrastive learning, which leads to errors in class prediction. We analyze the limitations of these works and propose a novel framework called Pseudo-label Guided Pixel Contrast (PGPC), which overcomes the disadvantages of previous methods. We also investigate how to use more information from target images without adding noise from pseudo-labels. We test our method on two standard UDA benchmarks and show that it outperforms existing methods. Specifically, we achieve relative improvements of 5.1% mIoU and 4.6% mIoU on the Grand Theft Auto V (GTA5) to Cityscapes and SYNTHIA to Cityscapes tasks based on DAFormer, respectively. Furthermore, our approach can enhance the performance of other UDA approaches without increasing model complexity. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
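The PGPC abstract above contrasts pixel features under pseudo-label guidance. A compact sketch of a prototype-style pixel contrastive loss follows: pixels are pulled toward the mean feature (prototype) of their pseudo-class and pushed away from other prototypes via an InfoNCE-style objective. The temperature and the prototype construction are generic assumptions, not PGPC's exact formulation.

```python
import torch
import torch.nn.functional as F

def pixel_prototype_contrast(features, pseudo_labels, temperature=0.1):
    """InfoNCE-style loss between pixel features and per-class prototypes.

    features:      (N, D) pixel embeddings.
    pseudo_labels: (N,) integer pseudo-class per pixel.
    """
    features = F.normalize(features, dim=1)
    classes = pseudo_labels.unique()
    # Prototype = mean feature of the pixels assigned to each pseudo-class.
    prototypes = torch.stack(
        [features[pseudo_labels == c].mean(dim=0) for c in classes]
    )
    prototypes = F.normalize(prototypes, dim=1)

    logits = features @ prototypes.T / temperature          # (N, C)
    targets = torch.searchsorted(classes, pseudo_labels)    # map labels to 0..C-1
    return F.cross_entropy(logits, targets)

# Toy usage: 6 pixel embeddings of dimension 8, two pseudo-classes.
feats = torch.randn(6, 8, requires_grad=True)
labels = torch.tensor([0, 0, 1, 1, 1, 0])
loss = pixel_prototype_contrast(feats, labels)
loss.backward()
print(float(loss))
```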
30. Annotating protein functions via fusing multiple biological modalities.
- Author
-
Ma, Wenjian, Bi, Xiangpeng, Jiang, Huasen, Wei, Zhiqiang, and Zhang, Shugang
- Subjects
- *
GENE fusion , *INDIVIDUALIZED medicine , *GENE ontology , *PROTEINS , *ANNOTATIONS , *MULTIMODAL user interfaces - Abstract
Understanding the function of proteins is of great significance for revealing disease pathogenesis and discovering new targets. Benefiting from the explosive growth of the protein universe, deep learning has been applied to accelerate the protein annotation cycle from different biological modalities. However, most existing deep learning-based methods not only fail to effectively fuse different biological modalities, resulting in low-quality protein representations, but also suffer from convergence to suboptimal solutions caused by sparse label representations. To address these issues, we propose a multiprocedural approach for fusing heterogeneous biological modalities and annotating protein functions, i.e., MIF2GO (Multimodal Information Fusion to infer Gene Ontology terms), which sequentially fuses up to six biological modalities ranging over different biological levels in three steps, thus leading to powerful protein representations. Evaluation results on seven benchmark datasets show that the proposed method not only considerably outperforms state-of-the-art methods, but also demonstrates great robustness and generalizability across species. Besides, we also present biological insights into the associations between those modalities and protein functions. This research provides a robust framework for integrating multimodal biological data, offering a scalable solution for protein function annotation, ultimately facilitating advancements in precision medicine and the discovery of novel therapeutic strategies. MIF2GO leverages up to six biological modalities to enhance protein function annotation. It outperforms state-of-the-art methods, showing robustness and generalizability across species, while offering insights into modality-function associations. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
31. Streamlining segmentation of cryo-electron tomography datasets with Ais.
- Author
-
Last, Mart G. F., Abendstein, Leoni, Voortman, Lenard M., and Sharp, Thomas H.
- Subjects
- *
ELECTRONIC data processing , *TOMOGRAPHY , *INSTITUTIONAL repositories , *ANNOTATIONS , *COMPUTER software - Abstract
Segmentation is a critical data processing step in many applications of cryo-electron tomography. Downstream analyses, such as subtomogram averaging, are often based on segmentation results, and are thus critically dependent on the availability of open-source software for accurate as well as high-throughput tomogram segmentation. There is a need for more user-friendly, flexible, and comprehensive segmentation software that offers an insightful overview of all steps involved in preparing automated segmentations. Here, we present Ais: a dedicated tomogram segmentation package that is geared towards both high performance and accessibility, available on GitHub. In this report, we demonstrate two common processing steps that can be greatly accelerated with Ais: particle picking for subtomogram averaging, and generating many-feature segmentations of cellular architecture based on in situ tomography data. Featuring comprehensive annotation, segmentation, and rendering functionality, as well as an open repository for trained models at aiscryoet.org, we hope that Ais will help accelerate research and dissemination of data involving cryoET. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
32. P-CNN: Percept-CNN for semantic segmentation.
- Author
-
Hegde, Deepak and Balaji, G. N.
- Subjects
COMPUTER vision ,CONVOLUTIONAL neural networks ,VISUAL fields ,FEATURE extraction ,ANNOTATIONS - Abstract
The task of image segmentation remains a fundamental challenge in the field of computer vision. Convolutional Neural Networks (CNNs) have achieved significant success in this field, yet the conventional approach has some limitations. The process of accurate, pixel-wise image annotation is time-consuming and requires substantial human effort. These problems are addressed by the proposed method, called Percept-CNN (P-CNN), which exploits percepts, the pixels responsible for the highest activations at each level. Percepts are extracted from each layer during forward propagation and passed on to the subsequent layers, enabling the model to focus only on useful visual information. The proposed method with percept convolution can potentially eliminate the complex and time-consuming task of image annotation without affecting segmentation accuracy. Since the model focuses only on salient visual information, it reduces the extraction of redundant features that do not contribute to the final goal, making the model more robust, accurate, and efficient. The proposed model was able to perform semantic segmentation without pixel-wise annotations with an accuracy of 67% when tested on the Oxford-IIIT Pet dataset. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
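The P-CNN abstract above defines percepts as the pixels responsible for the highest activations at each level. As a rough illustration (not the paper's implementation), the sketch below keeps only the top fraction of spatial locations of a convolutional feature map and zeroes out the rest, so later layers see only the most salient responses; the keep ratio and saliency measure are assumptions.

```python
import torch

def keep_top_activations(feature_map, keep_ratio=0.1):
    """Zero out all but the strongest spatial activations of a feature map.

    feature_map: (B, C, H, W). Saliency per location is the channel-wise mean;
    only the top `keep_ratio` fraction of locations is propagated.
    """
    b, c, h, w = feature_map.shape
    saliency = feature_map.mean(dim=1).reshape(b, -1)            # (B, H*W)
    k = max(1, int(keep_ratio * h * w))
    threshold = saliency.topk(k, dim=1).values[:, -1:]           # k-th largest value
    mask = (saliency >= threshold).reshape(b, 1, h, w).float()
    return feature_map * mask

# Toy usage: keep roughly the top 10% of locations in a random feature map.
x = torch.randn(2, 16, 32, 32)
masked = keep_top_activations(x)
print(int(masked[0].abs().sum(dim=0).bool().sum()))   # about 102 of 1024 locations survive
```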
33. CNVizard—a lightweight streamlit application for an interactive analysis of copy number variants.
- Author
-
Krause, Jeremias, Classen, Carlos, Dey, Daniela, Lausberg, Eva, Kessler, Luise, Eggermann, Thomas, Kurth, Ingo, Begemann, Matthias, and Kraft, Florian
- Subjects
- *
DNA copy number variations , *GENETIC testing , *DATA visualization , *DATA analysis , *ANNOTATIONS - Abstract
Background: Methods to call, analyze and visualize copy number variations (CNVs) from massive parallel sequencing data have been widely adopted in clinical practice and genetic research. To enable a streamlined analysis of CNV data, comprehensive annotations and good visualizations are indispensable. The ability to detect single exon CNVs is another important feature for genetic testing. Nonetheless, most available open-source tools come with limitations in at least one of these areas. One additional drawback is that available tools deliver data in an unstructured and static format which requires subsequent visualization and formatting efforts. Results: Here we present CNVizard, an interactive Streamlit app allowing a comprehensive visualization of CNVkit data. Furthermore, combining CNVizard with the CNVand pipeline allows the annotation and visualization of CNV or SV VCF files from any CNV caller. Conclusion: CNVizard, in combination with CNVand, enables the comprehensive and streamlined analysis of short- and long-read sequencing data and provide an intuitive webapp-like experience enabling an interactive visualization of CNV data. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
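The CNVizard abstract above describes an interactive Streamlit app for CNV inspection. The sketch below is not CNVizard itself, only a minimal Streamlit pattern of the same flavor: upload an annotated CNV table, filter it interactively, and display the result. The column names ('log2', 'gene') are made up for the example.

```python
# Run with: streamlit run cnv_viewer_sketch.py
import pandas as pd
import streamlit as st

st.title("Minimal CNV table viewer (illustrative sketch)")

uploaded = st.file_uploader("Upload an annotated CNV table (TSV)", type=["tsv", "txt"])
if uploaded is not None:
    cnvs = pd.read_csv(uploaded, sep="\t")

    # Hypothetical columns: 'log2' (copy-ratio) and 'gene' for filtering.
    min_abs_log2 = st.slider("Minimum |log2 ratio|", 0.0, 2.0, 0.3, step=0.05)
    gene_query = st.text_input("Gene name contains", "")

    filtered = cnvs[cnvs["log2"].abs() >= min_abs_log2]
    if gene_query:
        filtered = filtered[filtered["gene"].str.contains(gene_query, case=False, na=False)]

    st.write(f"{len(filtered)} of {len(cnvs)} calls pass the filters")
    st.dataframe(filtered)
```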
34. Point Cloud Wall Projection for Realistic Road Data Augmentation.
- Author
-
Kim, Kana, Lee, Sangjun, Kakani, Vijay, Li, Xingyou, and Kim, Hakil
- Subjects
- *
OBJECT recognition (Computer vision) , *POINT cloud , *DATA augmentation , *LIDAR , *ANNOTATIONS - Abstract
Several approaches have been developed to generate synthetic object points using real LiDAR point cloud data for advanced driver-assistance system (ADAS) applications. The synthetic object points generated from a scene (both the near and distant objects) are essential for several ADAS tasks. However, generating points from distant objects using sparse LiDAR data with precision is still a challenging task. Although there are a few state-of-the-art techniques to generate points from synthetic objects using LiDAR point clouds, limitations such as the need for intense compute power still persist in most cases. This paper suggests a new framework to address these limitations in the existing literature. The proposed framework contains three major modules, namely position determination, object generation, and synthetic annotation. The proposed framework uses a spherical point-tracing method that augments 3D LiDAR distant objects using point cloud object projection with point-wall generation. Also, the pose determination module facilitates scenarios such as platooning carried out by the synthetic object points. Furthermore, the proposed framework improves the ability to describe distant points from synthetic object points using multiple LiDAR systems. The performance of the proposed framework is evaluated on various 3D detection models such as PointPillars, PV-RCNN, and Voxel R-CNN for the KITTI dataset. The results indicate an increase in mAP (mean average precision) by 1.97%, 1.3%, and 0.46% from the original dataset values of 82.23%, 86.72%, and 87.05%, respectively. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
35. Unsupervised Semantic Scene Reconstruction via Transformer-Based Quantized Vector Reconstruction and Autoregressive Completion.
- Author
-
Miao, Yubin, Xie, Shuxin, Quan, Tianrui, Wan, Junkang, and Hao, Mengxiang
- Subjects
POINT cloud ,RESEARCH personnel ,ALGORITHMS ,ANNOTATIONS ,DEEP learning ,COST - Abstract
Semantic scene reconstruction from sparse and incomplete point clouds is a vital task in understanding point scenes. This task involves assigning semantic labels to objects and reconstructing their complete shapes as meshes. In recent years, researchers have adopted a "reconstruction from recognition" approach, which first segments foreground objects from the point cloud and then completes and reconstructs them as mesh representations. This method has successfully facilitated both the semantic and geometric understanding of point scenes. However, existing approaches based on deep learning often depend on supervised training, requiring extensive annotations and incurring high training costs. To address this limitation, we introduce unsupervised algorithms for completing and reconstructing partial observations. While Transformer-based autoregressive shape completion shows great potential, there has been limited research on applying it to complete instances segmented from real-world scenes. To bridge this gap, we propose VRC (unsupervised semantic scene reconstruction via Transformer-based quantized Vector Reconstruction and autoregressive Completion), a novel framework that integrates unsupervised algorithms with Transformer-based autoregressive completion. Our approach enables the unsupervised reconstruction of real-world scenes. Comparisons with state-of-the-art methods on authoritative public datasets demonstrate that VRC achieves superior reconstruction performance with significantly reduced data costs. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
36. Corrosion Detection and Grading Method for Hydraulic Metal Structures Based on an Improved YOLOv10 Sequential Architecture.
- Author
-
Cheng, Haodong and Kang, Fei
- Subjects
HYDRAULIC structures ,DEEP learning ,METALLIC surfaces ,SURFACE structure ,ANNOTATIONS - Abstract
Herein, we present a method for detecting and determining the corrosion level of hydraulic metal structure surfaces through images while reducing the difficulty of dataset annotation. To achieve accurate detection of corrosion targets, the MobileViTv3 block is integrated into YOLOv10, resulting in the proposed YOLOv10-vit for corrosion target detection. Based on YOLOv10-vit, the YOLOv10-vit-cls classification network is introduced for corrosion level determination. This network leverages the pre-trained parameters of YOLOv10-vit to more quickly learn the features of different corrosion levels. To avoid subjective factors in the corrosion level annotation process and reduce annotation difficulty, a cascaded corrosion detection architecture combining YOLOv10-vit and YOLOv10-vit-cls is proposed. Finally, based on the proposed corrosion detection architecture, we achieve accurate corrosion detection and level determination for hydraulic metal structures. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
37. Toward trustable use of machine learning models of variant effects in the clinic.
- Author
-
Dias, Mafalda, Orenbuch, Rose, Marks, Debora S., and Frazer, Jonathan
- Subjects
- *
MACHINE learning , *GENETIC variation , *PREDICTION models , *ANNOTATIONS , *CALIBRATION - Abstract
There has been considerable progress in building models to predict the effect of missense substitutions in protein-coding genes, fueled in large part by progress in applying deep learning methods to sequence data. These models have the potential to enable clinical variant annotation on a large scale and hence increase the impact of patient sequencing in guiding diagnosis and treatment. To realize this potential, it is essential to provide reliable assessments of model performance, scope of applicability, and robustness. As a response to this need, the ClinGen Sequence Variant Interpretation Working Group, Pejaver et al., recently proposed a strategy for validation and calibration of in-silico predictions in the context of guidelines for variant annotation. While this work marks an important step forward, the strategy presented still has important limitations. We propose core principles and recommendations to overcome these limitations that can enable both more reliable and more impactful use of variant effect prediction models in the future. Machine-learning-powered predictions of the effect of genetic variants on human disease are becoming increasingly important in the clinic. In this manuscript, we lay down the core principles for their trustworthy validation and implementation and highlight four areas where current practices fall short, offering recommendations for advancing the field. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
38. Theatrical Figures (and Others) as Book Subscribers for Sterne and Derrick.
- Author
-
Walker, Robert G.
- Subjects
- *
DERRICKS , *RECOMMENDED books , *ANNOTATIONS - Abstract
The investigation of the subscriber lists of eighteenth-century books is a relatively new subgenre in the scholarly world, in part because quite often the names appearing in such lists are now obscure. But subscribers were hardly unknown when they subscribed, and a recovery of their identities can often be useful to other scholarly pursuits. This essay focusses on the subscribers to the works of Laurence Sterne and Samuel Derrick, one author well known today and one little known. A relatively large number of the subscribers in both were members of the eighteenth-century theatrical world. Furthermore, the essay looks at eighteenth-century annotations of the subscription list in Derrick's Works (1755) with an eye toward using them to provide further information about the evolving relationship between Derrick and a far more famous figure of the time, Samuel Johnson. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
39. SimLVSeg: Simplifying Left Ventricular Segmentation in 2-D+Time Echocardiograms With Self- and Weakly Supervised Learning.
- Author
-
Maani, Fadillah, Ukaye, Asim, Saadi, Nada, Saeed, Numan, and Yaqub, Mohammad
- Subjects
- *
ECHOCARDIOGRAPHY , *IMAGE segmentation , *CONFIDENCE intervals , *MEDICAL personnel , *ANNOTATIONS - Abstract
Achieving reliable automatic left ventricle (LV) segmentation from echocardiograms is challenging due to the inherent sparsity of annotations in the dataset, as clinicians typically only annotate two specific frames for diagnostic purposes. Here we aim to address this challenge by introducing simplified LV segmentation (SimLVSeg), a novel paradigm that enables video-based networks for consistent LV segmentation from sparsely annotated echocardiogram videos. SimLVSeg consists of two training stages: (i) self-supervised pre-training with temporal masking, which involves pre-training a video segmentation network by capturing the cyclic patterns of echocardiograms from largely unannotated echocardiogram frames, and (ii) weakly supervised learning tailored for LV segmentation from sparse annotations. We extensively evaluated SimLVSeg using EchoNet-Dynamic, the largest echocardiography dataset. SimLVSeg outperformed state-of-the-art solutions by achieving a 93.32% (95% confidence interval: 93.21–93.43%) dice score while being more efficient. We further conducted an out-of-distribution test to showcase SimLVSeg's generalizability under distribution shifts (CAMUS dataset). Our findings show that SimLVSeg exhibits excellent performance on LV segmentation with a relatively cheaper computational cost. This suggests that adopting video-based networks for LV segmentation is a promising research direction to achieve reliable LV segmentation. Our code is publicly available at https://github.com/BioMedIA-MBZUAI/SimLVSeg. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
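The SimLVSeg abstract above pre-trains the video network with temporal masking on largely unannotated echocardiogram frames. The sketch below shows only the masking step in that spirit: randomly hide a fraction of frames in a clip and return the mask so a reconstruction loss can be applied on the hidden frames. The mask ratio and clip shape are assumptions, not the paper's configuration.

```python
import torch

def mask_frames(clip, mask_ratio=0.5, generator=None):
    """Randomly zero out a fraction of frames in a video clip.

    clip: (T, C, H, W). Returns the masked clip and a boolean mask of the
    hidden frames, on which a reconstruction loss can be computed.
    """
    t = clip.shape[0]
    n_masked = int(round(mask_ratio * t))
    order = torch.randperm(t, generator=generator)
    hidden = torch.zeros(t, dtype=torch.bool)
    hidden[order[:n_masked]] = True

    masked_clip = clip.clone()
    masked_clip[hidden] = 0.0
    return masked_clip, hidden

# Toy usage: a 16-frame grayscale clip; compute a loss only on the hidden frames.
clip = torch.randn(16, 1, 112, 112)
masked_clip, hidden = mask_frames(clip, mask_ratio=0.5)
reconstruction = masked_clip                      # stand-in for a decoder output
loss = ((reconstruction[hidden] - clip[hidden]) ** 2).mean()
print(int(hidden.sum()), float(loss))
```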
40. People make mistakes: Obtaining accurate ground truth from continuous annotations of subjective constructs.
- Author
-
Booth, Brandon M. and Narayanan, Shrikanth S.
- Subjects
- *
RANK correlation (Statistics) , *TIME measurements , *ANNOTATIONS , *PSYCHOMETRICS - Abstract
Accurately representing changes in mental states over time is crucial for understanding their complex dynamics. However, there is little methodological research on the validity and reliability of human-produced continuous-time annotation of these states. We present a psychometric perspective on valid and reliable construct assessment, examine the robustness of interval-scale (e.g., values between zero and one) continuous-time annotation, and identify three major threats to validity and reliability in current approaches. We then propose a novel ground truth generation pipeline that combines emerging techniques for improving validity and robustness. We demonstrate its effectiveness in a case study involving crowd-sourced annotation of perceived violence in movies, where our pipeline achieves a .95 Spearman correlation in summarized ratings compared to a .15 baseline. These results suggest that highly accurate ground truth signals can be produced from continuous annotations using additional comparative annotation (e.g., a versus b) to correct structured errors, highlighting the need for a paradigm shift in robust construct measurement over time. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
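The entry above reports agreement as a Spearman correlation between summarized ratings and a reference (0.95 for the proposed pipeline versus a 0.15 baseline). The snippet below shows how such a comparison can be computed with SciPy; the rating arrays are synthetic stand-ins for the real annotation data, not values from the study.

```python
import numpy as np
from scipy.stats import spearmanr

# Synthetic stand-ins: a reference signal, a noisy naive average of annotations,
# and a corrected/fused signal that tracks the reference closely.
rng = np.random.default_rng(0)
reference = rng.normal(size=200)
naive_average = reference + rng.normal(scale=3.0, size=200)
corrected = reference + rng.normal(scale=0.2, size=200)

for name, signal in [("naive average", naive_average), ("corrected fusion", corrected)]:
    rho, _ = spearmanr(reference, signal)
    print(f"{name}: Spearman rho = {rho:.2f}")
```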
41. Comparing automated subcortical volume estimation methods; amygdala volumes estimated by FSL and FreeSurfer have poor consistency.
- Author
-
Sadil, Patrick and Lindquist, Martin A.
- Subjects
- *
AMYGDALOID body , *STATISTICAL significance , *BIOMARKERS , *MAGNETIC resonance imaging , *ANNOTATIONS - Abstract
Subcortical volumes are a promising source of biomarkers and features in biosignatures, and automated methods facilitate extracting them in large, phenotypically rich datasets. However, while extensive research has verified that the automated methods produce volumes similar to those generated by expert annotation, the consistency of the methods with each other is understudied. Using data from the UK Biobank, we compare the estimates of subcortical volumes produced by two popular software suites: FSL and FreeSurfer. Although most subcortical volumes exhibit good to excellent consistency across the methods, the tools produce diverging estimates of amygdalar volume. Through simulation, we show that this poor consistency can lead to conflicting results, where one but not the other tool suggests statistical significance, or where both tools suggest a significant relationship but in opposite directions. Considering these issues, we discuss several ways in which care should be taken when reporting on relationships involving amygdalar volume. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
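As a rough illustration of the between-tool consistency question raised in the entry above, the snippet below computes Lin's concordance correlation coefficient for two columns of volume estimates. The metric choice and the numbers are illustrative assumptions; they are not the paper's analysis or data.

```python
import numpy as np

def concordance_ccc(x, y):
    """Lin's concordance correlation coefficient between two measurement methods."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    mx, my = x.mean(), y.mean()
    cov = np.mean((x - mx) * (y - my))
    return 2.0 * cov / (x.var() + y.var() + (mx - my) ** 2)

# Hypothetical amygdala volumes (mm^3) for the same five subjects from two tools.
tool_a = np.array([1450.0, 1320.0, 1600.0, 1280.0, 1510.0])
tool_b = np.array([1720.0, 1500.0, 1690.0, 1650.0, 1800.0])
print(f"CCC = {concordance_ccc(tool_a, tool_b):.2f}")  # low values indicate poor consistency
```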
42. Multi-layered semantic annotation and the formalisation of annotation schemas for the investigation of modality in a Latin corpus.
- Author
-
Bermúdez-Sabel, Helena, Dell'Oro, Francesca, and Marongiu, Paola
- Subjects
- *
LATIN language , *ANNOTATIONS , *SEMANTICS , *MODALITY (Linguistics) , *CORPORA , *MODAL logic - Abstract
This paper stems from the project A World of Possibilities. Modal pathways over an extra-long period of time: the diachrony of modality in the Latin language (WoPoss), which takes a corpus-based approach to the study of modality in the history of the Latin language. Linguistic annotation and, in particular, the semantic annotation of modality is a keystone of the project. Besides the difficulties intrinsic to any annotation task dealing with semantics, our annotation scheme involves multiple interconnected layers of annotation, adding complexity to the task. Given the intricacies of our fine-grained semantic annotation, we needed to develop well-documented schemas not only to control the consistency of the annotation but also to enable efficient reuse of our annotated corpus. This paper presents the different elements involved in the annotation task and how the descriptions of, and relations between, the different linguistic components were formalised and documented, combining schema languages with XML documentation. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
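The WoPoss entry above emphasizes formalizing annotation schemas so that the consistency of multi-layered semantic annotation can be checked automatically. As a loose illustration (not the project's actual schema, which is documented with schema languages and XML as described in the paper), the sketch below validates a toy modal-annotation element against a minimal XSD using lxml; element names and the controlled vocabulary are invented for the example.

```python
from lxml import etree

# Toy schema: a modal passage must contain a marker and a reading whose
# type is drawn from a small controlled vocabulary. Purely illustrative.
XSD = b"""<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="modalPassage">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="marker" type="xs:string"/>
        <xs:element name="reading">
          <xs:complexType>
            <xs:attribute name="type" use="required">
              <xs:simpleType>
                <xs:restriction base="xs:string">
                  <xs:enumeration value="epistemic"/>
                  <xs:enumeration value="deontic"/>
                  <xs:enumeration value="dynamic"/>
                </xs:restriction>
              </xs:simpleType>
            </xs:attribute>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

schema = etree.XMLSchema(etree.fromstring(XSD))
annotation = etree.fromstring(
    b'<modalPassage><marker>potest</marker><reading type="dynamic"/></modalPassage>'
)
print("valid:", schema.validate(annotation))  # True; an unknown reading type would fail
```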
43. Procedure-Aware Action Quality Assessment: Datasets and Performance Evaluation.
- Author
-
Xu, Jinglin, Rao, Yongming, Zhou, Jie, and Lu, Jiwen
- Subjects
- *
SPORTS films , *DIVING , *LIMITATION of actions , *GENERALIZATION , *ANNOTATIONS - Abstract
In this paper, we investigate the problem of procedure-aware action quality assessment, which analyzes the action quality by delving into the semantic and spatial-temporal relationships among various composed steps of the action. Most existing action quality assessment methods regress on deep features of entire videos to learn diverse scores, which ignore the relationships among different fine-grained steps in actions and result in limitations in visual interpretability and generalization ability. To address these issues, we construct a fine-grained competitive sports video dataset called FineDiving with detailed semantic and temporal annotations, which helps understand the internal structures of each action. We also propose a new approach (i.e., spatial-temporal segmentation attention, STSA) that introduces procedure segmentation to parse an action into consecutive steps, learns powerful representations from these steps by constructing spatial motion attention and procedure-aware cross-attention, and designs a fine-grained contrastive regression to achieve an interpretable scoring mechanism. In addition, we build a benchmark on the FineDiving dataset to evaluate the performance of representative action quality assessment methods. Then, we expand FineDiving to FineDiving+ and construct three new benchmarks to investigate the transferable abilities between different diving competitions, between synchronized and individual dives, and between springboard and platform dives to demonstrate the generalization abilities of our STSA in unknown scenarios, scoring rules, action types, and difficulty degrees. Extensive experiments demonstrate that our approach, designed for procedure-aware action quality assessment, achieves substantial improvements. Our dataset and code are available at https://github.com/xujinglin/FineDiving. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
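The FineDiving/STSA entry above mentions a fine-grained contrastive regression that scores a query video relative to an exemplar rather than in isolation. Below is a bare-bones sketch of that relative-scoring idea; the feature dimensions, head architecture, and toy inputs are assumptions, not the authors' design.

```python
import torch
import torch.nn as nn

class RelativeScoreHead(nn.Module):
    """Predict the score difference between a query and an exemplar video feature."""
    def __init__(self, feat_dim=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * feat_dim, 256), nn.ReLU(),
            nn.Linear(256, 1),
        )

    def forward(self, query_feat, exemplar_feat, exemplar_score):
        delta = self.mlp(torch.cat([query_feat, exemplar_feat], dim=-1)).squeeze(-1)
        return exemplar_score + delta  # query score = exemplar score + predicted difference

# Toy usage with random features; in practice the features would come from
# per-step (procedure-segmented) video representations.
head = RelativeScoreHead()
q, e = torch.randn(4, 256), torch.randn(4, 256)
e_score = torch.tensor([78.0, 65.5, 90.0, 54.0])
print(head(q, e, e_score).shape)  # torch.Size([4])
```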
44. Open-Vocabulary Animal Keypoint Detection with Semantic-Feature Matching.
- Author
-
Zhang, Hao, Xu, Lumin, Lai, Shenqi, Shao, Wenqi, Zheng, Nanning, Luo, Ping, Qiao, Yu, and Zhang, Kaipeng
- Subjects
- *
LANGUAGE models , *ANIMAL species , *ANNOTATIONS , *VOCABULARY , *SPECIES - Abstract
Current image-based keypoint detection methods for animal (including human) bodies and faces are generally divided into fully supervised and few-shot class-agnostic approaches. The former typically relies on laborious and time-consuming manual annotations, posing considerable challenges in expanding keypoint detection to a broader range of keypoint categories and animal species. The latter, though less dependent on extensive manual input, still requires necessary support images with annotation for reference during testing. To realize zero-shot keypoint detection without any prior annotation, we introduce the Open-Vocabulary Keypoint Detection (OVKD) task, which is innovatively designed to use text prompts for identifying arbitrary keypoints across any species. In pursuit of this goal, we have developed a novel framework named Open-Vocabulary Keypoint Detection with Semantic-feature Matching (KDSM). This framework synergistically combines vision and language models, creating an interplay between language features and local keypoint visual features. KDSM enhances its capabilities by integrating Domain Distribution Matrix Matching (DDMM) and other special modules, such as the Vision-Keypoint Relational Awareness (VKRA) module, improving the framework's generalizability and overall performance. Our comprehensive experiments demonstrate that KDSM significantly outperforms the baseline in terms of performance and achieves remarkable success in the OVKD task. Impressively, our method, operating in a zero-shot fashion, still yields results comparable to state-of-the-art few-shot species class-agnostic keypoint detection methods. Codes and data are available at https://github.com/zhanghao5201/KDSM. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
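The OVKD entry above builds keypoint predictions by matching text embeddings of keypoint names against local visual features. The following is a minimal, hedged sketch of that matching step with random tensors standing in for real text and image encoders; it is not the KDSM pipeline.

```python
import torch
import torch.nn.functional as F

def keypoint_heatmaps(text_emb, feat_map):
    """Cosine-similarity matching of K keypoint text embeddings (K, D)
    against a (D, H, W) feature map; returns (K, H, W) heatmaps whose
    per-channel argmax is a keypoint location estimate."""
    d, h, w = feat_map.shape
    feats = F.normalize(feat_map.reshape(d, h * w), dim=0)  # (D, H*W)
    texts = F.normalize(text_emb, dim=-1)                   # (K, D)
    return (texts @ feats).reshape(-1, h, w)                # (K, H, W)

# Toy usage: 3 keypoint prompts (e.g. "left eye", "nose", "tail base") against a 32x32 map.
heatmaps = keypoint_heatmaps(torch.randn(3, 512), torch.randn(512, 32, 32))
flat_idx = heatmaps.flatten(1).argmax(dim=1)
print(list(zip((flat_idx // 32).tolist(), (flat_idx % 32).tolist())))  # (row, col) per keypoint
```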
45. Extreme R-CNN: Few-Shot Object Detection via Sample Synthesis and Knowledge Distillation.
- Author
-
Zhang, Shenyong, Wang, Wenmin, Wang, Zhibing, Li, Honglei, Li, Ruochen, and Zhang, Shixiong
- Subjects
- *
DETECTORS , *CLASSIFICATION , *ANNOTATIONS - Abstract
Traditional object detectors require extensive instance-level annotations for training. Conversely, few-shot object detectors, which are generally fine-tuned using limited data from unknown classes, tend to show biases toward base categories and are susceptible to variations within these unknown samples. To mitigate these challenges, we introduce a Two-Stage Fine-Tuning Approach (TFA) named Extreme R-CNN, designed to operate effectively with extremely limited original samples through the integration of sample synthesis and knowledge distillation. Our approach involves synthesizing new training examples via instance clipping and employing various data-augmentation techniques. We enhance the Faster R-CNN architecture by decoupling the regression and classification components of the Region of Interest (RoI), allowing synthetic samples to train the classification head independently of the object-localization process. Comprehensive evaluations on the Microsoft COCO and PASCAL VOC datasets demonstrate significant improvements over baseline methods. Specifically, on the PASCAL VOC dataset, the average precision for novel categories is enhanced by up to 15 percent, while on the more complex Microsoft COCO benchmark it is enhanced by up to 6.1 percent. Remarkably, in the 1-shot scenario, the AP50 of our model exceeds that of the baseline model in the 10-shot setting within the PASCAL VOC dataset, confirming the efficacy of our proposed method. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
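Extreme R-CNN synthesizes extra training samples by clipping annotated instances and augmenting them, so that the classification head can be trained separately from localization. A rough sketch of that clipping-plus-augmentation idea using Pillow follows; the box format, augmentations, and parameters are illustrative assumptions rather than the paper's recipe.

```python
import random
from PIL import Image, ImageOps

def synthesize_instances(image, boxes, copies_per_box=3):
    """Crop each (x1, y1, x2, y2) box and return lightly augmented copies,
    usable as extra classification-only training samples."""
    samples = []
    for (x1, y1, x2, y2) in boxes:
        crop = image.crop((x1, y1, x2, y2))
        for _ in range(copies_per_box):
            aug = crop
            if random.random() < 0.5:
                aug = ImageOps.mirror(aug)              # horizontal flip
            scale = random.uniform(0.8, 1.2)            # mild rescale
            aug = aug.resize((max(1, int(aug.width * scale)),
                              max(1, int(aug.height * scale))))
            samples.append(aug)
    return samples

# Toy usage with a blank image and one hypothetical box annotation.
img = Image.new("RGB", (640, 480))
crops = synthesize_instances(img, [(100, 120, 260, 300)])
print(len(crops), crops[0].size)
```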
46. Comparative analysis of manual and programmed annotations for crowd assessment and classification using artificial intelligence.
- Author
-
Thakur, Amrish and Arya, Shwetank
- Subjects
ARTIFICIAL intelligence ,ACQUISITION of data ,DATA integrity ,CROWD funding ,ANNOTATIONS - Abstract
Funding agencies play a pivotal role in bolstering research endeavors by allocating financial resources for data collection and analysis. However, the lack of detailed information regarding the methods employed for data gathering and analysis can obstruct the replication and utilization of the results, ultimately affecting the study's transparency and integrity. The task of manually annotating extensive datasets demands considerable labor and financial investment, especially when it entails engaging specialized individuals. In our crowd counting study, we employed the web-based annotation tool SuperAnnotate to streamline the human annotation process for a dataset comprising 3,000 images. By integrating automated annotation tools, we realized substantial time efficiencies, as demonstrated by the remarkable achievement of 858,958 annotations. This underscores the significant contribution of such technologies to the efficiency of the annotation process. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
47. Knowledge Entity Extraction From Coal Mine Accident Hazard Texts Based on a Probability Fusion Algorithm (基于概率融合算法的煤矿事故隐患文本知识实体抽取研究).
- Author
-
李靖, 李泽荃, 石福泰, and 郝强
- Subjects
COAL mining accidents ,KNOWLEDGE graphs ,TASK performance ,ANNOTATIONS ,HAZARDS ,DYNAMIC models - Abstract
Copyright of Journal of Mining Science & Technology is the property of Journal of Mining Science & Technology Editorial Office and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
- Published
- 2024
- Full Text
- View/download PDF
48. Genome-wide identification, expression profiling, and network analysis of calcium and cadmium transporters in rice (Oryza sativa L.).
- Author
-
Kothari, Shubham, Sharma, V. K., Singh, Ashutosh, Singh, Sumeet Kumar, and Kumari, Sarita
- Subjects
BIOTECHNOLOGY ,TRANSITION metals ,BIOLOGICAL systems ,RICE ,HEAVY metals - Abstract
Calcium (Ca) and cadmium (Cd) are metals that coexist in the ecosystem. Ca is indispensable for the growth and development of plants as well as animals, while Cd is regarded as a toxic heavy metal for living systems. Cd transport in biological systems often follows the pathways of Ca because of their chemical similarity. At high concentrations, cadmium displaces Ca, Mn, and Zn from their respective metalloprotein sites and binds strongly in their place; the displaced minerals are often released as oxidative ions, which are detrimental. This shared transport behaviour implies that common or similar transporters move both Ca and Cd in plants. Our study was therefore designed to identify the transporters for Ca and Cd and to characterize their similarity as a co-transport system. A profile-based search identified 44 transporter genes for Ca transport and 70 genes for Cd transport. These were categorized into different groups based on the presence of signature motifs and domains, and the identified transporters were characterized for genomic distribution, gene structure, annotation, and conserved signature motifs and domains. Further, cis-motif analysis, heat-map visualization, gene ontology analysis, and protein–protein interaction analysis were conducted for the Ca and Cd transporter genes. In silico expression analysis showed that the transporter genes Os05g0319800-1304 and Os0319800-6065 were overexpressed for Ca, and that Os07g00232800-40298 and Os07g00384500-25924 were overexpressed for Cd. These genes could serve as candidate genes for enhancing Ca concentration while reducing Cd content in rice through biotechnological approaches. Twenty-seven genes were found to be common transporters for Ca and Cd, and both active and passive transport mechanisms act as co-transporters for Ca and Cd. The common signature motifs and domains can be targeted for the characterization of co-transporters of different minerals. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
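The rice transporter entry above relies on identifying genes through signature motifs and domains. As a stand-in for a real profile-based search (e.g., HMMER against Pfam profiles), the sketch below scans protein sequences for a hypothetical signature motif with a regular expression; the motif pattern and sequences are invented for illustration only.

```python
import re

# Hypothetical signature motif in PROSITE-like notation, translated to a regex:
# G-x(2)-[ST]-x-E-F  ->  G..[ST].EF
SIGNATURE = re.compile(r"G..[ST].EF")

proteins = {
    "gene_A": "MKTLLVGAGSAEFWQR",  # contains the motif
    "gene_B": "MSDEQRKLIVNNPLA",    # does not
}

hits = {name: bool(SIGNATURE.search(seq)) for name, seq in proteins.items()}
print(hits)  # {'gene_A': True, 'gene_B': False}
```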
49. PhrasIS: Phrase Inference and Similarity benchmark.
- Author
-
Lopez-Gazpio, I, Gaviria, J, García, P, Sanjurjo-González, H, Sanz, B, Zarranz, A, Maritxalar, M, and Agirre, E
- Subjects
HEADLINES ,TERMS & phrases ,ANNOTATIONS ,MANUSCRIPTS ,VOCABULARY - Abstract
We present PhrasIS, a benchmark dataset composed of naturally occurring Phrase pairs with Inference and Similarity annotations for the evaluation of semantic representations. The described dataset fills the gap between word- and sentence-level datasets, allowing compositional models to be evaluated at a finer granularity than sentences. Contrary to other datasets, the phrase pairs are extracted from naturally occurring text in image captions and news headlines. All the text fragments have been annotated by experts following a rigorous process, also described in the manuscript, achieving high inter-annotator agreement. In this work we analyse the dataset, showing the relation between inference labels and similarity scores. With 10K phrase pairs split into development and test sets, the dataset is an excellent benchmark for testing meaning representation systems. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
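The PhrasIS entry above analyses the relation between inference labels and similarity scores. The snippet below sketches how one might summarize that relation over such a dataset; the file layout (tab-separated phrase pairs with a label and a numeric score) and the filename are assumptions, since the actual distribution format is not described in the abstract.

```python
import csv
from collections import defaultdict

def mean_similarity_by_label(path):
    """Average similarity score per inference label from a hypothetical TSV
    with columns: phrase1, phrase2, label, score."""
    totals, counts = defaultdict(float), defaultdict(int)
    with open(path, newline="", encoding="utf-8") as f:
        for phrase1, phrase2, label, score in csv.reader(f, delimiter="\t"):
            totals[label] += float(score)
            counts[label] += 1
    return {label: totals[label] / counts[label] for label in totals}

# Expected pattern: entailment-like labels cluster at higher similarity than
# unrelated pairs, which is the kind of relation the paper examines.
# print(mean_similarity_by_label("phrasis_dev.tsv"))  # hypothetical filename
```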
50. A large-scale Chinese patent dataset for information extraction.
- Author
-
Zheng, Qian, Guo, Kefu, and Xu, Lin
- Subjects
DATA mining ,PATENTS ,CLASSIFICATION ,CORPORA ,ANNOTATIONS - Abstract
Information extraction is an important foundation for automated patent analysis. Deep learning methods show promising results for information extraction, but the performance of such methods heavily depends on the available corpus. To promote research on Chinese information extraction and evaluate the performance of related systems, we present a novel dataset, named CPIE, and make it publicly available. The dataset consists of five thousand records of Chinese patent documents. The data were annotated by a tagging team using an online annotation tool. The dataset was evaluated using a state-of-the-art information extraction method that involves named entity recognition and relationship classification. The results shed light on new challenges and promote information extraction research. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
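The CPIE entry above evaluates named entity recognition and relation classification on the dataset. As a minimal illustration of the NER half of such an evaluation, the sketch below computes entity-level precision, recall, and F1 from gold and predicted (start, end, type) spans under an exact-match criterion; it is not the benchmark's official scorer, and the entity types are invented for the example.

```python
def entity_f1(gold_spans, pred_spans):
    """Entity-level precision/recall/F1 for sets of (start, end, type) spans."""
    gold, pred = set(gold_spans), set(pred_spans)
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Toy example with two gold entities and two predictions, one of them an exact match.
gold = [(0, 4, "TECH_FIELD"), (10, 15, "COMPONENT")]
pred = [(0, 4, "TECH_FIELD"), (11, 15, "COMPONENT")]
print(entity_f1(gold, pred))  # (0.5, 0.5, 0.5)
```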