70,796 results for "Li, Xiang"
Search Results
2. Rethinking Structure Learning For Graph Neural Networks
- Author
-
Zheng, Yilun, Zhang, Zhuofan, Wang, Ziming, Li, Xiang, Luan, Sitao, Peng, Xiaojiang, and Chen, Lihui
- Subjects
Computer Science - Machine Learning - Abstract
To improve the performance of Graph Neural Networks (GNNs), Graph Structure Learning (GSL) has been extensively applied to reconstruct or refine original graph structures, effectively addressing issues like heterophily, over-squashing, and noisy structures. While GSL is generally thought to improve GNN performance, it often leads to longer training times and more hyperparameter tuning. Moreover, the distinctions among current GSL methods remain ambiguous from the perspective of GNN training, and there is a lack of theoretical analysis to quantify their effectiveness. Recent studies further suggest that, under fair comparisons with the same hyperparameter tuning, GSL does not consistently outperform baseline GNNs. This motivates us to ask a critical question: is GSL really useful for GNNs? To address this question, this paper makes two key contributions. First, we propose a new GSL framework, which includes three steps: GSL base (the representation used for GSL) construction, new structure construction, and view fusion, to better understand the effectiveness of GSL in GNNs. Second, after graph convolution, we analyze the differences in mutual information (MI) between node representations derived from the original topology and those from the newly constructed topology. Surprisingly, our empirical observations and theoretical analysis show that no matter which type of graph structure construction method is used, after feeding the same GSL bases to the newly constructed graph, there is no MI gain compared to the original GSL bases. To fairly reassess the effectiveness of GSL, we conduct ablation experiments and find that it is the pretrained GSL bases that enhance GNN performance, and in most cases, GSL cannot improve GNN performance. This finding encourages us to rethink essential components of GNN design, such as self-training and structural encoding, rather than GSL.
- Published
- 2024
3. Is Graph Convolution Always Beneficial For Every Feature?
- Author
-
Zheng, Yilun, Li, Xiang, Luan, Sitao, Peng, Xiaojiang, and Chen, Lihui
- Subjects
Computer Science - Machine Learning ,Computer Science - Social and Information Networks - Abstract
Graph Neural Networks (GNNs) have demonstrated strong capabilities in processing structured data. While traditional GNNs typically treat each feature dimension equally during graph convolution, we raise an important question: Is the graph convolution operation equally beneficial for each feature? If not, the convolution operation on certain feature dimensions can lead to harmful effects, performing even worse than convolution-free models. In prior studies, to assess the impact of graph convolution on features, researchers proposed metrics based on feature homophily to measure feature consistency with the graph topology. However, these metrics have shown unsatisfactory alignment with GNN performance and have not been effectively employed to guide feature selection in GNNs. To address these limitations, we introduce a novel metric, Topological Feature Informativeness (TFI), to distinguish between GNN-favored and GNN-disfavored features, whose effectiveness is validated through both theoretical analysis and empirical observations. Based on TFI, we propose a simple yet effective Graph Feature Selection (GFS) method, which processes GNN-favored and GNN-disfavored features separately, using GNNs and non-GNN models. Compared to original GNNs, GFS significantly improves the extraction of useful topological information from each feature with comparable computational costs. Extensive experiments show that after applying GFS to 8 baseline and state-of-the-art (SOTA) GNN architectures across 10 datasets, 83.75% of the GFS-augmented cases show significant performance boosts. Furthermore, our proposed TFI metric outperforms other feature selection methods. These results validate the effectiveness of both GFS and TFI. Additionally, we demonstrate that GFS's improvements are robust to hyperparameter tuning, highlighting its potential as a universal method for enhancing various GNN architectures.
- Published
- 2024
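To make the feature-routing idea of GFS in item 3 concrete, the following is a minimal, hypothetical sketch: each feature is scored with a stand-in informativeness measure (mutual information between one-hop aggregated feature values and node labels, which is not the paper's exact TFI definition) and then routed to a GNN or a non-GNN model. The placeholder adjacency, threshold, and data shapes are invented for illustration.

```python
# Hypothetical sketch of TFI-style feature routing (not the paper's exact metric).
import numpy as np
from sklearn.feature_selection import mutual_info_classif

def split_features(X, A_norm, y, threshold=0.05):
    """Score each feature by the mutual information between its one-hop
    aggregated values and the labels, then split features into
    GNN-favored (high score) and GNN-disfavored (low score) groups."""
    X_agg = A_norm @ X                      # one-hop neighborhood aggregation
    scores = mutual_info_classif(X_agg, y)  # stand-in for the paper's TFI
    favored = scores >= threshold
    return X[:, favored], X[:, ~favored]    # route to GNN vs. non-GNN model

# Toy example: 100 nodes, 16 features, 3 classes, identity "adjacency" placeholder.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 16))
A_norm = np.eye(100)
y = rng.integers(0, 3, size=100)
X_gnn, X_mlp = split_features(X, A_norm, y)
```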
4. RPCAcc: A High-Performance and Reconfigurable PCIe-attached RPC Accelerator
- Author
-
Zhang, Jie, Huang, Hongjing, Xu, Xuzheng, Li, Xiang, Liu, Ming, and Wang, Zeke
- Subjects
Computer Science - Hardware Architecture - Abstract
The emerging microservice/serverless-based cloud programming paradigm and rising networking speeds leave the RPC stack as the predominant data center tax. Domain-specific hardware acceleration holds the potential to disentangle this overhead and save host CPU cycles. However, state-of-the-art RPC accelerators either integrate RPC logic into the CPU or use specialized low-latency interconnects, which are hardly adopted in commodity servers. To this end, we design and implement RPCAcc, a software-hardware co-designed on-NIC RPC accelerator that enables reconfigurable RPC kernel offloading. RPCAcc connects to the server through the most widely used PCIe interconnect. To grapple with the PCIe-induced challenges, RPCAcc introduces three techniques: (a) a target-aware deserializer that effectively batches cross-PCIe writes on the accelerator's on-chip memory using compacted hardware data structures; (b) a memory-affinity CPU-accelerator collaborative serializer, which trades additional host memory copies for slow cross-PCIe transfers; (c) an automatic field update technique that transparently codifies the schema based on dynamically reconfigured RPC kernels to minimize superfluous PCIe traversals. We prototype RPCAcc using the Xilinx U280 FPGA card. On HyperProtoBench, RPCAcc achieves 3.2X lower serialization time than a comparable RPC accelerator baseline and demonstrates up to 2.6X throughput improvement in the end-to-end cloud workload.
- Published
- 2024
5. Are there Black Hole Symbiotic X-ray Binaries?
- Author
-
Deng, Zhu-Ling and Li, Xiang-Dong
- Subjects
Astrophysics - High Energy Astrophysical Phenomena - Abstract
While there are over a dozen known neutron star (NS) symbiotic X-ray binaries (SyXBs) in the Galaxy, no SyXBs containing a black hole (BH) have been detected. We address this problem by incorporating binary population synthesis and the accretion properties of BHs fed by the wind from red giant companions. We investigate the impact of different supernova mechanisms, kick velocity distributions, and wind velocities on the formation of both NS and BH SyXBs. Our simulations show that the number of BH SyXBs is at most one-sixth that of NS SyXBs in the Galaxy, provided that the common envelope efficiency parameter $\alpha\sim 0.3-5$. Fewer than $\sim 10$ BH SyXBs could be detectable in X-rays, considering their low radiation efficiencies. These findings indicate a scarcity of BH SyXBs in the Galaxy., Comment: 13 pages, 5 figures, accepted for publication in ApJ
- Published
- 2024
6. Escalating LLM-based Code Translation Benchmarking into the Class-level Era
- Author
-
Xue, Pengyu, Wu, Linhao, Wang, Chengyi, Li, Xiang, Yang, Zhen, Jin, Ruikai, Zhang, Yuxiang, Li, Jia, Pei, Yifei, Shen, Zhaoyan, and Lyu, Xiran
- Subjects
Computer Science - Software Engineering - Abstract
In recent years, Large Language Models (LLMs) have significantly improved automated code translation, often achieving over 80% accuracy on existing benchmarks. However, most of these benchmarks consist of short, standalone, algorithmic samples that do not reflect practical coding tasks. To address this gap, we introduce ClassEval-T, a class-level code translation benchmark designed to assess LLM performance on real-world coding scenarios. Built upon ClassEval, a class-level Python code generation benchmark covering topics such as database operations and game design, ClassEval-T extends into Java and C++ with complete code samples and test suites, requiring 360 person-hours for manual migration. We propose three translation strategies (holistic, min-dependency, and standalone) and evaluate six recent LLMs across various families and sizes on ClassEval-T. Results reveal a significant performance drop compared to method-level benchmarks, highlighting discrepancies among LLMs and demonstrating ClassEval-T's effectiveness. We further analyze LLMs' dependency awareness in translating class samples and categorize 1,397 failure cases by the best-performing LLM for practical insights and future improvement.
- Published
- 2024
7. The Framework of NAVIS: Navigating Virtual Spaces with Immersive Scooters
- Author
-
Lin, Zhixun, He, Wei, Liu, Xinyi, Ye, Mingchen, Li, Xiang, and Kan, Ge Lin
- Subjects
Computer Science - Human-Computer Interaction - Abstract
Virtual reality (VR) environments have greatly expanded opportunities for immersive exploration, yet physically navigating these digital spaces remains a significant challenge. In this paper, we present the conceptual framework of NAVIS (Navigating Virtual Spaces with Immersive Scooters), a novel system that utilizes a scooter-based interface to enhance both navigation and interaction within virtual environments. NAVIS combines real-time physical mobility, haptic feedback, and CAVE-like (Cave Automatic Virtual Environment) technology to create a realistic sense of travel and movement, improving both spatial awareness and the overall immersive experience. By offering a more natural and physically engaging method of exploration, NAVIS addresses key limitations found in traditional VR locomotion techniques, such as teleportation or joystick control, which can detract from immersion and realism. This approach highlights the potential of combining physical movement with virtual environments to provide a more intuitive and enjoyable experience for users, opening up new possibilities for applications in gaming, education, and beyond.
- Published
- 2024
8. PhoneLM: An Efficient and Capable Small Language Model Family through Principled Pre-training
- Author
-
Yi, Rongjie, Li, Xiang, Xie, Weikai, Lu, Zhenyan, Wang, Chenghua, Zhou, Ao, Wang, Shangguang, Zhang, Xiwen, and Xu, Mengwei
- Subjects
Computer Science - Computation and Language ,Computer Science - Artificial Intelligence ,Computer Science - Machine Learning - Abstract
The interest in developing small language models (SLMs) for on-device deployment is growing fast. However, existing SLM designs hardly consider the device hardware characteristics. Instead, this work presents a simple yet effective principle for SLM design: architecture searching for (near-)optimal runtime efficiency before pre-training. Guided by this principle, we develop the PhoneLM SLM family (currently with 0.5B and 1.5B versions), which achieves the state-of-the-art capability-efficiency tradeoff among models of similar parameter size. We fully open-source the code, weights, and training datasets of PhoneLM for reproducibility and transparency, including both base and instructed versions. We also release a finetuned version of PhoneLM capable of accurate Android Intent invocation, and an end-to-end Android demo. All materials are available at https://github.com/UbiquitousLearning/PhoneLM.
- Published
- 2024
9. Magnetic order induced truly chiral phonons in a ferromagnetic Weyl semimetal
- Author
-
Che, Mengqian, Liang, Jinxuan, Cui, Yunpeng, Li, Hao, Lu, Bingru, Sang, Wenbo, Li, Xiang, Dong, Xuebin, Zhang, Shuai, Sun, Tao, Liu, Enke, Jin, Feng, Zhang, Tiantian, and Yang, Luyi
- Subjects
Condensed Matter - Materials Science - Abstract
Chiral phonons are vibrational modes in a crystal that possess a well-defined handedness or chirality, typically found in materials that lack inversion symmetry. Here we report the discovery of truly chiral phonon modes in the kagome ferromagnetic Weyl semimetal Co$_3$Sn$_2$S$_2$, a material that preserves inversion symmetry but breaks time-reversal symmetry. Using helicity-resolved magneto-Raman spectroscopy, we observe the spontaneous splitting of the doubly degenerate in-plane E$_g$ modes into two distinct chiral phonon modes of opposite helicity when the sample is zero-field cooled below the Curie temperature, without the application of an external magnetic field. As we sweep the out-of-plane magnetic field, this E$_g$ phonon splitting exhibits a well-defined hysteresis loop directly correlated with the material's magnetization. The observed spontaneous splitting reaches up to 1.27 cm$^{-1}$ at low temperatures and diminishes with increasing temperature, ultimately vanishing at the Curie temperature. Our findings highlight the role of the magnetic order in inducing chiral phonons, paving the way for novel methods to manipulate chiral phonons through magnetization and vice versa. Additionally, our work introduces new possibilities for controlling chiral Weyl fermions using chiral phonons.
- Published
- 2024
10. Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent
- Author
-
Sun, Xingwu, Chen, Yanfeng, Huang, Yiqing, Xie, Ruobing, Zhu, Jiaqi, Zhang, Kai, Li, Shuaipeng, Yang, Zhen, Han, Jonny, Shu, Xiaobo, Bu, Jiahao, Chen, Zhongzhi, Huang, Xuemeng, Lian, Fengzong, Yang, Saiyong, Yan, Jianfeng, Zeng, Yuyuan, Ren, Xiaoqin, Yu, Chao, Wu, Lulu, Mao, Yue, Xia, Jun, Yang, Tao, Zheng, Suncong, Wu, Kan, Jiao, Dian, Xue, Jinbao, Zhang, Xipeng, Wu, Decheng, Liu, Kai, Wu, Dengpeng, Xu, Guanghui, Chen, Shaohua, Chen, Shuang, Feng, Xiao, Hong, Yigeng, Zheng, Junqiang, Xu, Chengcheng, Li, Zongwei, Kuang, Xiong, Hu, Jianglu, Chen, Yiqi, Deng, Yuchi, Li, Guiyang, Liu, Ao, Zhang, Chenchen, Hu, Shihui, Zhao, Zilong, Wu, Zifan, Ding, Yao, Wang, Weichao, Liu, Han, Wang, Roberts, Fei, Hao, Yu, Peijie, Zhao, Ze, Cao, Xun, Wang, Hai, Xiang, Fusheng, Huang, Mengyuan, Xiong, Zhiyuan, Hu, Bin, Hou, Xuebin, Jiang, Lei, Ma, Jianqiang, Wu, Jiajia, Deng, Yaping, Shen, Yi, Wang, Qian, Liu, Weijie, Liu, Jie, Chen, Meng, Dong, Liang, Jia, Weiwen, Chen, Hu, Liu, Feifei, Yuan, Rui, Xu, Huilin, Yan, Zhenxiang, Cao, Tengfei, Hu, Zhichao, Feng, Xinhua, Du, Dong, Yu, Tinghao, Tao, Yangyu, Zhang, Feng, Zhu, Jianchen, Xu, Chengzhong, Li, Xirui, Zha, Chong, Ouyang, Wen, Xia, Yinben, Li, Xiang, He, Zekun, Chen, Rongpeng, Song, Jiawei, Chen, Ruibin, Jiang, Fan, Zhao, Chongqing, Wang, Bo, Gong, Hao, Gan, Rong, Hu, Winston, Kang, Zhanhui, Yang, Yong, Liu, Yuhong, Wang, Di, and Jiang, Jie
- Subjects
Computer Science - Computation and Language ,Computer Science - Artificial Intelligence - Abstract
In this paper, we introduce Hunyuan-Large, which is currently the largest open-source Transformer-based mixture-of-experts model, with a total of 389 billion parameters and 52 billion activated parameters, capable of handling up to 256K tokens. We conduct a thorough evaluation of Hunyuan-Large's superior performance across various benchmarks including language understanding and generation, logical reasoning, mathematical problem-solving, coding, long-context, and aggregated tasks, where it outperforms LLama3.1-70B and exhibits comparable performance when compared to the significantly larger LLama3.1-405B model. Key practices of Hunyuan-Large include large-scale synthetic data that is orders of magnitude larger than in previous literature, a mixed expert routing strategy, a key-value cache compression technique, and an expert-specific learning rate strategy. Additionally, we investigate the scaling laws and learning rate schedule of mixture-of-experts models, providing valuable insights and guidance for future model development and optimization. The code and checkpoints of Hunyuan-Large are released to facilitate future innovations and applications. Codes: https://github.com/Tencent/Hunyuan-Large Models: https://huggingface.co/tencent/Tencent-Hunyuan-Large, Comment: 17 pages, 4 Figures
- Published
- 2024
11. TableGPT2: A Large Multimodal Model with Tabular Data Integration
- Author
-
Su, Aofeng, Wang, Aowen, Ye, Chao, Zhou, Chen, Zhang, Ga, Chen, Gang, Zhu, Guangcheng, Wang, Haobo, Xu, Haokai, Chen, Hao, Li, Haoze, Lan, Haoxuan, Tian, Jiaming, Yuan, Jing, Zhao, Junbo, Zhou, Junlin, Shou, Kaizhe, Zha, Liangyu, Long, Lin, Li, Liyao, Wu, Pengzuo, Zhang, Qi, Huang, Qingyi, Yang, Saisai, Zhang, Tao, Ye, Wentao, Zhu, Wufang, Hu, Xiaomeng, Gu, Xijun, Sun, Xinjie, Li, Xiang, Yang, Yuhang, and Xiao, Zhiqing
- Subjects
Computer Science - Machine Learning ,Computer Science - Artificial Intelligence ,Computer Science - Databases - Abstract
The emergence of models like GPTs, Claude, LLaMA, and Qwen has reshaped AI applications, presenting vast new opportunities across industries. Yet, the integration of tabular data remains notably underdeveloped, despite its foundational role in numerous real-world domains. This gap is critical for three main reasons. First, database or data warehouse data integration is essential for advanced applications; second, the vast and largely untapped resource of tabular data offers immense potential for analysis; and third, the business intelligence domain specifically demands adaptable, precise solutions that many current LLMs may struggle to provide. In response, we introduce TableGPT2, a model rigorously pre-trained and fine-tuned with over 593.8K tables and 2.36M high-quality query-table-output tuples, a scale of table-related data unprecedented in prior research. This extensive training enables TableGPT2 to excel in table-centric tasks while maintaining strong general language and coding abilities. One of TableGPT2's key innovations is its novel table encoder, specifically designed to capture schema-level and cell-level information. This encoder strengthens the model's ability to handle ambiguous queries, missing column names, and irregular tables commonly encountered in real-world applications. Similar to visual language models, this pioneering approach integrates with the decoder to form a robust large multimodal model. We believe the results are compelling: over 23 benchmarking metrics, TableGPT2 achieves an average performance improvement of 35.20% in the 7B model and 49.32% in the 72B model over prior benchmark-neutral LLMs, with robust general-purpose capabilities intact.
- Published
- 2024
12. TransUNext: towards a more advanced U-shaped framework for automatic vessel segmentation in the fundus image
- Author
-
Li, Xiang, Liu, Mingsi, and Duan, Lixin
- Subjects
Electrical Engineering and Systems Science - Image and Video Processing ,Computer Science - Computer Vision and Pattern Recognition - Abstract
Purpose: Automatic and accurate segmentation of fundus vessel images has become an essential prerequisite for computer-aided diagnosis of ophthalmic diseases such as diabetes mellitus. The task of high-precision retinal vessel segmentation still faces difficulties due to the low contrast between the branch ends of retinal vessels and the background, the long and thin vessel span, and the variable morphology of the optic disc and optic cup in fundus vessel images. Methods: We propose TransUNext, a more advanced U-shaped hybrid Transformer-CNN architecture that integrates an Efficient Self-attention Mechanism into the encoder and decoder of U-Net to capture both local features and global dependencies with minimal computational overhead. Meanwhile, the Global Multi-Scale Fusion (GMSF) module is further introduced to upgrade skip-connections, fuse high-level semantic and low-level detailed information, and eliminate high- and low-level semantic differences. Inspired by ConvNeXt, the TransNeXt Block is designed to optimize the computational complexity of each base block in U-Net and avoid the information loss caused by the compressed dimension when information is converted between feature spaces of different dimensions. Results: We evaluated the proposed method on four public datasets: DRIVE, STARE, CHASE-DB1, and HRF. In the experimental results, the AUC (area under the ROC curve) values were 0.9867, 0.9869, 0.9910, and 0.9887, respectively, exceeding other state-of-the-art methods.
- Published
- 2024
13. ARN-LSTM: A Multi-Stream Attention-Based Model for Action Recognition with Temporal Dynamics
- Author
-
Wang, Chuanchuan, Mohmamed, Ahmad Sufril Azlan, Yang, Xiao, and Li, Xiang
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
This paper presents ARN-LSTM, a novel multi-stream action recognition model designed to address the challenge of simultaneously capturing spatial motion and temporal dynamics in action sequences. Traditional methods often focus solely on spatial or temporal features, limiting their ability to fully comprehend complex human activities. Our proposed model integrates joint, motion, and temporal information through a multi-stream fusion architecture. Specifically, it comprises a joint stream for extracting skeleton features, a temporal stream for capturing dynamic temporal features, and an ARN-LSTM block that utilizes Time-Distributed Long Short-Term Memory (TD-LSTM) layers followed by an Attention Relation Network (ARN) to model temporal relations. The outputs from these streams are fused in a fully connected layer to provide the final action prediction. Evaluations on the NTU RGB+D 60 and NTU RGB+D 120 datasets demonstrate the effectiveness of our model, achieving strong performance, particularly in group activity recognition.
- Published
- 2024
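A minimal sketch of the multi-stream fusion idea in item 13: a per-frame joint stream and an LSTM temporal stream whose outputs are fused by a fully connected classifier. This is not the ARN-LSTM architecture (the TD-LSTM layers and Attention Relation Network are omitted), and every layer size here is an assumption.

```python
# Minimal two-stream fusion sketch (illustrative only; not the ARN-LSTM architecture).
import torch
import torch.nn as nn

class TwoStreamSketch(nn.Module):
    def __init__(self, joint_dim=75, hidden=128, num_classes=60):
        super().__init__()
        self.joint_stream = nn.Sequential(           # per-frame skeleton features
            nn.Linear(joint_dim, hidden), nn.ReLU())
        self.temporal_stream = nn.LSTM(               # dynamics across frames
            input_size=joint_dim, hidden_size=hidden, batch_first=True)
        self.fuse = nn.Linear(2 * hidden, num_classes)

    def forward(self, x):                             # x: (batch, frames, joint_dim)
        joint_feat = self.joint_stream(x).mean(dim=1)         # pool over time
        _, (h_n, _) = self.temporal_stream(x)
        temporal_feat = h_n[-1]                               # last hidden state
        return self.fuse(torch.cat([joint_feat, temporal_feat], dim=-1))

logits = TwoStreamSketch()(torch.randn(4, 30, 75))    # 4 clips, 30 frames each
```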
14. Attosecond Coherent Electron Motion in a Photoionized Aromatic Molecule
- Author
-
Driver, Taran, Guo, Zhaoheng, Isele, Erik, Grell, Gilbert, Ruberti, Marco, ONeal, Jordan T., Alexander, Oliver, Beauvarlet, Sandra, Cesar, David, Duris, Joseph, Garratt, Douglas, Larsen, Kirk A., Li, Siqi, Kolorenč, Přemysl, McCracken, Gregory A., Tuthill, Daniel, Wang, Zifan, Berrah, Nora, Bostedt, Christoph, Borne, Kurtis, Cheng, Xinxin, DiMauro, Louis F., Doumy, Gilles, Franz, Paris L., Kamalov, Andrei, Li, Xiang, Lin, Ming-Fu, Obaid, Razib, Picón, Antonio, Robles, River R., Rolles, Daniel, Rudenko, Artem, Shaikh, Moniruzzaman, Slaughter, Daniel S., Sudar, Nicholas S., Thierstein, Emily, Ueda, Kiyoshi, Wang, Enliang, Wang, Anna L., Weber, Thorsten, Wolf, Thomas J. A., Young, Linda, Zhang, Zhen, Averbukh, Vitali, Gessner, Oliver, Bucksbaum, Philip H., Kling, Matthias F., Palacios, Alicia, Martín, Fernando, Marangos, Jon P., Walter, Peter, Marinelli, Agostino, and Cryan, James P.
- Subjects
Physics - Chemical Physics - Abstract
In molecular systems, the ultrafast motion of electrons initiates the process of chemical change. Tracking this electronic motion across molecules requires coupling attosecond time resolution to atomic-scale spatial sensitivity. In this work, we employ a pair of attosecond x-ray pulses from an x-ray free-electron laser to follow electron motion resulting from the sudden removal of an electron from a prototypical aromatic system, para-aminophenol. X-ray absorption enables tracking this motion with atomic-site specificity. Our measurements are compared with state-of-the-art computational modeling, reproducing the observed response across multiple timescales. Sub-femtosecond dynamics are assigned to states undergoing non-radiative decay, while few-femtosecond oscillatory motion is associated with electronic wavepacket motion in stable cation states, that will eventually couple to nuclear motion. Our work provides insight on the ultrafast charge motion preceding and initiating chemical transformations in moderately complex systems, and provides a powerful benchmark for computational models of ultrafast charge motion in matter.
- Published
- 2024
15. Multi-Channel Hypergraph Contrastive Learning for Matrix Completion
- Author
-
Li, Xiang, Shui, Changsheng, Yu, Yanwei, Huang, Chao, Zhao, Zhongying, and Dong, Junyu
- Subjects
Computer Science - Machine Learning ,Computer Science - Information Retrieval - Abstract
Rating is a typical form of explicit user feedback that visually reflects how much a user likes a related item. (Rating) matrix completion is essentially a rating prediction process, which is also a significant problem in recommender systems. Recently, graph neural networks (GNNs) have been widely used in matrix completion, capturing users' preferences over items by formulating a rating matrix as a bipartite graph. However, existing methods are susceptible to data sparsity and long-tail distributions in real-world scenarios. Moreover, the message-passing mechanism of GNNs makes it difficult to capture high-order correlations and constraints between nodes, which are essentially useful in recommendation tasks. To tackle these challenges, we propose a Multi-Channel Hypergraph Contrastive Learning framework for matrix completion, named MHCL. Specifically, MHCL adaptively learns hypergraph structures to capture high-order correlations between nodes and jointly captures local and global collaborative relationships through attention-based cross-view aggregation. Additionally, to consider the magnitude and order information of ratings, we treat different rating subgraphs as different channels, encourage alignment between adjacent ratings, and further achieve mutual enhancement between different ratings through multi-channel cross-rating contrastive learning. Extensive experiments on five public datasets demonstrate that the proposed method significantly outperforms the current state-of-the-art approaches.
- Published
- 2024
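For item 15, a generic cross-view contrastive term of the kind MHCL builds on can be sketched as an InfoNCE loss between two views of the same nodes. This is standard contrastive machinery rather than MHCL's exact multi-channel objective, and the embedding dimensions are invented.

```python
# Generic InfoNCE-style cross-view contrastive loss sketch (not MHCL's exact objective).
import torch
import torch.nn.functional as F

def cross_view_infonce(z_local, z_global, temperature=0.2):
    """Pull each node's local-channel embedding toward its own global-view
    embedding and push it away from other nodes' embeddings."""
    z1 = F.normalize(z_local, dim=-1)
    z2 = F.normalize(z_global, dim=-1)
    logits = z1 @ z2.t() / temperature            # (N, N) similarity matrix
    targets = torch.arange(z1.size(0))            # positives on the diagonal
    return F.cross_entropy(logits, targets)

loss = cross_view_infonce(torch.randn(32, 64), torch.randn(32, 64))
```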
16. MoNTA: Accelerating Mixture-of-Experts Training with Network-Traffic-Aware Parallel Optimization
- Author
-
Guo, Jingming, Liu, Yan, Meng, Yu, Tao, Zhiwei, Liu, Banglan, Chen, Gang, and Li, Xiang
- Subjects
Computer Science - Machine Learning ,Computer Science - Artificial Intelligence - Abstract
The Mixture of Experts (MoE) is an advanced model architecture in the industry that combines multiple specialized expert models from various domains into a single supermodel. This approach enables the model to scale without significantly increasing the computational costs of training and inference, while maximizing model performance. However, current distributed training frameworks do not consider the ultimate optimization of communication, especially for large base models. This paper proposes a network-traffic-aware parallel optimization method that selects the optimal parallel strategy based on the communication volume and the training cluster's inter-node and intra-node network topologies. Compared to DeepSpeed, MoNTA achieves an 8x increase in AllToAll communication performance under 8-card tensor parallelism. Compared to the baseline, training a 2x70B model using 16 A800 cards with an 8K sequence results in a 13% overall latency performance improvement. Project Page: https://github.com/EnflameTechnology/DeepSpeed.
- Published
- 2024
17. HopTrack: A Real-time Multi-Object Tracking System for Embedded Devices
- Author
-
Li, Xiang, Chen, Cheng, Lou, Yuan-yao, Abdallah, Mustafa, Kim, Kwang Taik, and Bagchi, Saurabh
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Multi-Object Tracking (MOT) poses significant challenges in computer vision. Despite its wide application in robotics, autonomous driving, and smart manufacturing, there is limited literature addressing the specific challenges of running MOT on embedded devices. State-of-the-art MOT trackers designed for high-end GPUs often experience low processing rates (<11 fps) when deployed on embedded devices. Existing MOT frameworks for embedded devices have proposed strategies such as fusing the detector model with the feature embedding model to reduce inference latency or combining different trackers to improve tracking accuracy, but they tend to compromise one for the other. This paper introduces HopTrack, a real-time multi-object tracking system tailored for embedded devices. Our system employs a novel discretized static and dynamic matching approach along with an innovative content-aware dynamic sampling technique to enhance tracking accuracy while meeting the real-time requirement. Compared with the best high-end GPU modified baseline Byte (Embed) and the best existing baseline on embedded devices, MobileNet-JDE, HopTrack achieves a processing speed of up to 39.29 fps on NVIDIA AGX Xavier with a multi-object tracking accuracy (MOTA) of up to 63.12% on the MOT16 benchmark, outperforming both counterparts by 2.15% and 4.82%, respectively. Additionally, the accuracy improvement is coupled with reductions in energy consumption (20.8%), power (5%), and memory usage (8%), which are crucial resources on embedded devices. HopTrack is also detector-agnostic, allowing plug-and-play flexibility.
- Published
- 2024
18. DeepCore: Simple Fingerprint Construction for Differentiating Homologous and Piracy Models
- Author
-
Sun, Haifeng, Zhang, Lan, and Li, Xiang-Yang
- Subjects
Computer Science - Cryptography and Security ,Computer Science - Machine Learning - Abstract
As a form of intellectual property, deep models increasingly require copyright protection. Existing work has made many attempts at model watermarking and fingerprinting, but they have ignored homologous models trained with similar structures or training datasets. We highlight the challenge of efficiently querying black-box piracy models to protect model copyrights without misidentifying homologous models. To address this challenge, we propose a novel method called DeepCore, which discovers that the classification confidence of the model is positively correlated with the distance of the predicted sample from the model decision boundary, and that piracy models behave more similarly at high-confidence classified sample points. DeepCore then constructs core points far away from the decision boundary by optimizing the predicted confidence of a few sample points and leverages behavioral discrepancies between piracy and homologous models to identify piracy models. Finally, we design different model identification methods, including two similarity-based methods and a clustering-based method, to identify piracy models using models' predictions of core points. Extensive experiments show the effectiveness of DeepCore in identifying various piracy models, achieving lower missed and false identification rates, and outperforming state-of-the-art methods., Comment: 9 pages
- Published
- 2024
19. Understanding Generalizability of Diffusion Models Requires Rethinking the Hidden Gaussian Structure
- Author
-
Li, Xiang, Dai, Yixiang, and Qu, Qing
- Subjects
Computer Science - Machine Learning ,Computer Science - Computer Vision and Pattern Recognition ,Electrical Engineering and Systems Science - Image and Video Processing ,Electrical Engineering and Systems Science - Signal Processing - Abstract
In this work, we study the generalizability of diffusion models by looking into the hidden properties of the learned score functions, which are essentially a series of deep denoisers trained on various noise levels. We observe that as diffusion models transition from memorization to generalization, their corresponding nonlinear diffusion denoisers exhibit increasing linearity. This discovery leads us to investigate the linear counterparts of the nonlinear diffusion models, which are a series of linear models trained to match the function mappings of the nonlinear diffusion denoisers. Surprisingly, these linear denoisers are approximately the optimal denoisers for a multivariate Gaussian distribution characterized by the empirical mean and covariance of the training dataset. This finding implies that diffusion models have the inductive bias towards capturing and utilizing the Gaussian structure (covariance information) of the training dataset for data generation. We empirically demonstrate that this inductive bias is a unique property of diffusion models in the generalization regime, which becomes increasingly evident when the model's capacity is relatively small compared to the training dataset size. In the case that the model is highly overparameterized, this inductive bias emerges during the initial training phases before the model fully memorizes its training data. Our study provides crucial insights into understanding the notable strong generalization phenomenon recently observed in real-world diffusion models.
- Published
- 2024
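For item 19, the optimal denoiser for a Gaussian data distribution has a standard closed form. Assuming the usual additive-noise convention (an assumption here; the paper's exact parameterization may differ), the posterior-mean denoiser is:

```latex
% Posterior-mean (MMSE) denoiser for $x_t = x_0 + \sigma_t \epsilon$,
% with $x_0 \sim \mathcal{N}(\mu, \Sigma)$ and $\epsilon \sim \mathcal{N}(0, I)$:
\[
  D_{\sigma_t}(x_t) \;=\; \mathbb{E}[x_0 \mid x_t]
  \;=\; \mu + \Sigma \left(\Sigma + \sigma_t^2 I\right)^{-1} (x_t - \mu),
\]
% where $\mu$ and $\Sigma$ are the empirical mean and covariance of the training set.
```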
20. EchoFM: Foundation Model for Generalizable Echocardiogram Analysis
- Author
-
Kim, Sekeun, Jin, Pengfei, Song, Sifan, Chen, Cheng, Li, Yiwei, Ren, Hui, Li, Xiang, Liu, Tianming, and Li, Quanzheng
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Foundation models have recently gained significant attention because of their generalizability and adaptability across multiple tasks and data distributions. Although medical foundation models have emerged, solutions for cardiac imaging, especially echocardiography videos, are still unexplored. In this paper, we introduce EchoFM, a foundation model specifically designed to represent and analyze echocardiography videos. In EchoFM, we propose a self-supervised learning framework that captures both spatial and temporal variability patterns through a spatio-temporal consistent masking strategy and periodic-driven contrastive learning. This framework can effectively capture the spatio-temporal dynamics of echocardiography and learn representative video features without any labels. We pre-train our model on an extensive dataset comprising over 290,000 echocardiography videos covering 26 scan views across different imaging modes, with up to 20 million frames of images. The pre-trained EchoFM can then be easily adapted and fine-tuned for a variety of downstream tasks, serving as a robust backbone model. Our evaluation was systematically designed around four downstream tasks that follow the echocardiography examination routine. Experiment results show that EchoFM surpasses state-of-the-art methods, including specialized echocardiography methods, self-supervised pre-training models, and general-purpose pre-trained foundation models, across all downstream tasks.
- Published
- 2024
21. Non-contact Dexterous Micromanipulation with Multiple Optoelectronic Robots
- Author
-
Jia, Yongyi, Miao, Shu, Wang, Ao, Ni, Caiding, Feng, Lin, Wang, Xiaowo, and Li, Xiang
- Subjects
Computer Science - Robotics - Abstract
Micromanipulation systems leverage automation and robotic technologies to improve the precision, repeatability, and efficiency of various tasks at the microscale. However, current approaches are typically limited to specific objects or tasks, which necessitates the use of custom tools and specialized grasping methods. This paper proposes a novel non-contact micromanipulation method based on optoelectronic technologies. The proposed method utilizes repulsive dielectrophoretic forces generated in the optoelectronic field to drive a microrobot, enabling the microrobot to push the target object in a cluttered environment without physical contact. The non-contact feature can minimize the risks of potential damage, contamination, or adhesion while largely improving the flexibility of manipulation. The feature enables the use of a general tool for indirect object manipulation, eliminating the need for specialized tools. A series of simulation studies and real-world experiments -- including non-contact trajectory tracking, obstacle avoidance, and reciprocal avoidance between multiple microrobots -- are conducted to validate the performance of the proposed method. The proposed formulation provides a general and dexterous solution for a range of objects and tasks at the micro scale., Comment: 8 pages, 10 figures
- Published
- 2024
22. Let's Be Self-generated via Step by Step: A Curriculum Learning Approach to Automated Reasoning with Large Language Models
- Author
-
Luo, Kangyang, Ding, Zichen, Weng, Zhenmin, Qiao, Lingfeng, Zhao, Meng, Li, Xiang, Yin, Di, and Shu, Jinlong
- Subjects
Computer Science - Computation and Language - Abstract
While Chain of Thought (CoT) prompting approaches have significantly consolidated the reasoning capabilities of large language models (LLMs), they still face limitations: they require extensive human effort or leave room for performance improvement. Existing endeavors have focused on bridging these gaps; however, these approaches either hinge on external data and cannot completely eliminate manual effort, or they fall short in effectively directing LLMs to generate high-quality exemplary prompts. To address these pitfalls, we propose a novel prompt approach for automatic reasoning named \textbf{LBS3}, inspired by curriculum learning, which better reflects human learning habits. Specifically, LBS3 initially steers LLMs to recall easy-to-hard proxy queries that are pertinent to the target query. Following this, it invokes a progressive strategy that utilizes exemplary prompts stemming from easy-proxy queries to direct LLMs in solving hard-proxy queries, ensuring high-quality proxy solutions. Finally, our extensive experiments on various reasoning-intensive tasks with varying open- and closed-source LLMs show that LBS3 achieves strongly competitive performance compared to the SOTA baselines.
- Published
- 2024
23. Novel Object Synthesis via Adaptive Text-Image Harmony
- Author
-
Xiong, Zeren, Zhang, Zedong, Chen, Zikun, Chen, Shuo, Li, Xiang, Sun, Gan, Yang, Jian, and Li, Jun
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
In this paper, we study an object synthesis task that combines an object text with an object image to create a new object image. However, most diffusion models struggle with this task, \textit{i.e.}, often generating an object that predominantly reflects either the text or the image due to an imbalance between their inputs. To address this issue, we propose a simple yet effective method called Adaptive Text-Image Harmony (ATIH) to generate novel and surprising objects. First, we introduce a scale factor and an injection step to balance text and image features in cross-attention and to preserve image information in self-attention during the text-image inversion diffusion process, respectively. Second, to better integrate object text and image, we design a balanced loss function with a noise parameter, ensuring both optimal editability and fidelity of the object image. Third, to adaptively adjust these parameters, we present a novel similarity score function that not only maximizes the similarities between the generated object image and the input text/image but also balances these similarities to harmonize text and image integration. Extensive experiments demonstrate the effectiveness of our approach, showcasing remarkable object creations such as colobus-glass jar. Project page: https://xzr52.github.io/ATIH/., Comment: NeurIPS2024
- Published
- 2024
24. Langevin deformation for Rényi entropy on Wasserstein space over Riemannian manifolds
- Author
-
Lei, Rong, Li, Songzi, and Li, Xiang-Dong
- Subjects
Mathematics - Probability - Abstract
We introduce the Langevin deformation for the Rényi entropy on the $L^2$-Wasserstein space over $\mathbb{R}^n$ or a Riemannian manifold, which interpolates between the porous medium equation and the Benamou-Brenier geodesic flow on the $L^2$-Wasserstein space and can be regarded as the compressible Euler equations for isentropic gas with damping. We prove the $W$-entropy-information formulae and the rigidity theorems for the Langevin deformation for the Rényi entropy on the Wasserstein space over complete Riemannian manifolds with non-negative Ricci curvature or the CD$(0, m)$-condition. Moreover, we prove the monotonicity of the Hamiltonian and the convexity of the Lagrangian along the Langevin deformation of flows. Finally, we prove the convergence of the Langevin deformation for the Rényi entropy as $c\rightarrow 0$ and $c\rightarrow \infty$, respectively. Our results are new even in the case of Euclidean spaces and compact or complete Riemannian manifolds with non-negative Ricci curvature.
- Published
- 2024
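As background for item 24 (stated in one common convention, which may differ from the paper's normalization), the Rényi-type entropy functional whose $L^2$-Wasserstein gradient flow is the porous medium equation can be written as:

```latex
% A standard convention for the R\'enyi-type entropy on Wasserstein space;
% its $L^2$-Wasserstein gradient flow is the porous medium equation
% $\partial_t \rho = \Delta \rho^{m}$ (for $m > 1$):
\[
  \mathcal{E}_m(\rho) \;=\; \frac{1}{m-1} \int_M \rho^{m} \, d\mathrm{vol}.
\]
```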
25. Swarm manipulation: An efficient and accurate technique for multi-object manipulation in virtual reality
- Author
-
Li, Xiang, Wang, Jin-Du, Dudley, John J., and Kristensson, Per Ola
- Subjects
Computer Science - Human-Computer Interaction ,Computer Science - Robotics - Abstract
The theory of swarm control shows promise for controlling multiple objects; however, scalability is hindered by cost constraints, such as hardware and infrastructure. Virtual Reality (VR) can overcome these limitations, but research on swarm interaction in VR is limited. This paper introduces a novel Swarm Manipulation interaction technique and compares it with two baseline techniques: Virtual Hand and Controller (ray-casting). We evaluated these techniques in a user study ($N$ = 12) in three tasks (selection, rotation, and resizing) across five conditions. Our results indicate that Swarm Manipulation yielded superior performance, with significantly faster speeds in most conditions across the three tasks. It notably reduced resizing size deviations but introduced a trade-off between speed and accuracy in the rotation task. Additionally, we conducted a follow-up user study ($N$ = 6) using Swarm Manipulation in two complex VR scenarios and obtained insights through semi-structured interviews, shedding light on optimized swarm control mechanisms and perceptual changes induced by this interaction paradigm. These results demonstrate the potential of the Swarm Manipulation technique to enhance usability and user experience in VR compared to conventional manipulation techniques. In future studies, we aim to understand and improve swarm interaction via internal swarm particle cooperation., Comment: 15 pages, accepted at Computers & Graphics
- Published
- 2024
26. Cross-model Control: Improving Multiple Large Language Models in One-time Training
- Author
-
Wu, Jiayi, Sun, Hao, Cai, Hengyi, Su, Lixin, Wang, Shuaiqiang, Yin, Dawei, Li, Xiang, and Gao, Ming
- Subjects
Computer Science - Computation and Language - Abstract
The number of large language models (LLMs) with varying parameter scales and vocabularies is increasing. While they deliver powerful performance, they also face a set of common optimization needs to meet specific requirements or standards, such as instruction following or avoiding the output of sensitive information from the real world. However, how to reuse the fine-tuning outcomes of one model for other models to reduce training costs remains a challenge. To bridge this gap, we introduce Cross-model Control (CMC), a method that improves multiple LLMs in one-time training with a portable tiny language model. Specifically, we have observed that the logit shift before and after fine-tuning is remarkably similar across different models. Based on this insight, we incorporate a tiny language model with a minimal number of parameters. By training alongside a frozen template LLM, the tiny model gains the capability to alter the logits output by the LLMs. To make this tiny language model applicable to models with different vocabularies, we propose a novel token mapping strategy named PM-MinED. We have conducted extensive experiments on instruction tuning and unlearning tasks, demonstrating the effectiveness of CMC. Our code is available at https://github.com/wujwyi/CMC., Comment: Accepted by NeurIPS 2024
- Published
- 2024
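For item 26, the core "tiny model shifts a frozen LLM's logits" idea can be sketched as below. The vocabulary alignment via PM-MinED and the actual training recipe are not reproduced, and both toy models and all sizes are invented stand-ins.

```python
# Illustrative sketch: a frozen LLM plus a tiny trainable model whose logits
# are added as a shift (not the paper's exact training procedure).
import torch
import torch.nn as nn

class ShiftedLM(nn.Module):
    def __init__(self, base_lm, tiny_lm):
        super().__init__()
        self.base_lm = base_lm.eval()             # frozen large model
        for p in self.base_lm.parameters():
            p.requires_grad_(False)
        self.tiny_lm = tiny_lm                    # small trainable model

    def forward(self, input_ids):
        with torch.no_grad():
            base_logits = self.base_lm(input_ids)
        delta_logits = self.tiny_lm(input_ids)    # learned logit shift
        return base_logits + delta_logits

# Toy stand-ins: both "models" are embedding + linear heads over a shared vocab.
vocab = 1000
base = nn.Sequential(nn.Embedding(vocab, 64), nn.Linear(64, vocab))
tiny = nn.Sequential(nn.Embedding(vocab, 16), nn.Linear(16, vocab))
logits = ShiftedLM(base, tiny)(torch.randint(0, vocab, (2, 8)))
```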
27. Can Large Language Models Act as Ensembler for Multi-GNNs?
- Author
-
Duan, Hanqi, Cheng, Yao, Yu, Jianxiang, and Li, Xiang
- Subjects
Computer Science - Artificial Intelligence - Abstract
Graph Neural Networks (GNNs) have emerged as powerful models for learning from graph-structured data. However, GNNs lack the inherent ability to semantically understand rich textual node attributes, limiting their effectiveness in applications. On the other hand, we empirically observe that among existing GNN models, none consistently outperforms the others across diverse datasets. In this paper, we study whether LLMs can act as an ensembler for multiple GNNs and propose the LensGNN model. The model first aligns multiple GNNs, mapping the representations of different GNNs into the same space. Then, through LoRA fine-tuning, it aligns the space between the GNN and the LLM, injecting graph tokens and textual information into the LLM. This allows LensGNN to integrate multiple GNNs and leverage the LLM's strengths, resulting in better performance. Experimental results show that LensGNN outperforms existing models. This research advances text-attributed graph ensemble learning by providing a robust, superior solution for integrating semantic and structural information. We provide our code and data here: https://anonymous.4open.science/r/EnsemGNN-E267/.
- Published
- 2024
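For item 27, the "graph tokens" step can be sketched as a learned projection from aligned GNN embeddings into the LLM's embedding space, prepended to the text embeddings. The dimensions and the single-projection design here are assumptions, not LensGNN's exact implementation.

```python
# Sketch: mapping GNN node embeddings into an LLM's embedding space as "graph tokens"
# (illustrative; dimensions and alignment details are assumptions).
import torch
import torch.nn as nn

gnn_dim, llm_dim = 256, 4096
project = nn.Linear(gnn_dim, llm_dim)             # learned per-GNN projection

gnn_node_emb = torch.randn(1, 8, gnn_dim)         # 8 node embeddings from one GNN
graph_tokens = project(gnn_node_emb)              # (1, 8, llm_dim)
text_token_emb = torch.randn(1, 32, llm_dim)      # embedded textual prompt
llm_input = torch.cat([graph_tokens, text_token_emb], dim=1)  # fed to the LLM
```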
28. Evolution of Cataclysmic Variables with Binary-Driven Mass-Loss during Nova Eruptions
- Author
-
Tang, Wen-Shi, Li, Xiang-dong, and Cui, Zhe
- Subjects
Astrophysics - High Energy Astrophysical Phenomena - Abstract
The discrepancies between observations and theoretical predictions of cataclysmic variables (CVs) suggest that there exist unknown angular momentum loss mechanism(s) besides magnetic braking and gravitational radiation. Mass loss due to nova eruptions is among the most likely candidates. While standard theory assumes that mass is lost in the form of a radiation-driven, optically thick wind (fast wind; FW), recent numerical simulations indicate that most of the mass loss is initiated and shaped by binary interaction. We explore the effect of this binary-driven mass-loss (BDML) on CV evolution, assuming a major fraction of the lost mass leaves the system from the outer Lagrangian point. Different from the traditional continuous wind picture, we consider the mass loss process to be instantaneous, because the duration of nova eruptions is much shorter than the binary evolutionary timescale. Our detailed binary evolution calculations reveal the following results. (1) BDML seems able to provide extra angular momentum loss below the period gap. The mass transfer rates at a given orbital period occupy a large range, in agreement with the observed secular mass transfer rate distribution in CVs. (2) The enhanced mass transfer rates do not lead to a runaway mass transfer process, and allow the white dwarfs to grow in mass by $\lesssim 0.1\,M_{\odot}$. (3) BDML can cause both positive and negative variations of the orbital period induced by nova eruptions, in line with observations, and can potentially explain the properties of some peculiar supersoft X-ray sources, likely CAL 87, 1E 0035.4$-$7230, and RX J0537.7$-$7034., Comment: 18 pages, 10 figures, Accepted by ApJ
- Published
- 2024
29. On the Diversity of Synthetic Data and its Impact on Training Large Language Models
- Author
-
Chen, Hao, Waheed, Abdul, Li, Xiang, Wang, Yidong, Wang, Jindong, Raj, Bhiksha, and Abdin, Marah I.
- Subjects
Computer Science - Computation and Language - Abstract
The rise of Large Language Models (LLMs) has accentuated the need for diverse, high-quality pre-training data. Synthetic data emerges as a viable solution to the challenges of data scarcity and inaccessibility. While previous literature has focused predominantly on the quality and quantity of real data, our work enables the measurement of diversity in synthetic data and explores its impact on LLM performance. We study the downstream effects of synthetic data diversity during both the pre-training and fine-tuning stages by introducing a new diversity metric, \textit{LLM cluster-agent}, designed to evaluate the diversity of synthetic datasets. Through a series of controlled experiments with models of 350M and 1.4B parameters, we demonstrate that the proposed cluster-based LLM scoring of diversity correlates positively with both pre-training and supervised fine-tuning performance. Our findings also reveal that synthetic data diversity in pre-training affects supervised fine-tuning more significantly than pre-training itself, even for smaller models. We hope this study advances our understanding of the optimal use of synthetic data in LLM training and opens new avenues for efficient data generation processes.
- Published
- 2024
30. DCDepth: Progressive Monocular Depth Estimation in Discrete Cosine Domain
- Author
-
Wang, Kun, Yan, Zhiqiang, Fan, Junkai, Zhu, Wanlu, Li, Xiang, Li, Jun, and Yang, Jian
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
In this paper, we introduce DCDepth, a novel framework for the long-standing monocular depth estimation task. Moving beyond conventional pixel-wise depth estimation in the spatial domain, our approach estimates the frequency coefficients of depth patches after transforming them into the discrete cosine domain. This unique formulation allows for the modeling of local depth correlations within each patch. Crucially, the frequency transformation segregates the depth information into various frequency components, with low-frequency components encapsulating the core scene structure and high-frequency components detailing the finer aspects. This decomposition forms the basis of our progressive strategy, which begins with the prediction of low-frequency components to establish a global scene context, followed by successive refinement of local details through the prediction of higher-frequency components. We conduct comprehensive experiments on NYU-Depth-V2, TOFDC, and KITTI datasets, and demonstrate the state-of-the-art performance of DCDepth. Code is available at https://github.com/w2kun/DCDepth., Comment: Accepted by NeurIPS-2024
- Published
- 2024
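For item 30, the discrete-cosine view of a depth patch and the coarse-to-fine reconstruction can be illustrated with SciPy. The patch size, the number of retained low-frequency coefficients, and the random patch are all placeholders rather than DCDepth's actual settings.

```python
# Sketch of the frequency-domain view of a depth patch (illustrative only).
import numpy as np
from scipy.fft import dctn, idctn

patch = np.random.rand(8, 8)                      # a hypothetical 8x8 depth patch
coeffs = dctn(patch, norm="ortho")                # discrete cosine coefficients

# "Coarse-to-fine": keep only the low-frequency corner first, refine later.
low = np.zeros_like(coeffs)
low[:2, :2] = coeffs[:2, :2]                      # global structure of the patch
coarse_depth = idctn(low, norm="ortho")           # rough reconstruction
full_depth = idctn(coeffs, norm="ortho")          # adding high frequencies restores detail
```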
31. Hierarchical Conditional Multi-Task Learning for Streamflow Modeling
- Author
-
Xu, Shaoming, Renganathan, Arvind, Khandelwal, Ankush, Ghosh, Rahul, Li, Xiang, Liu, Licheng, Tayal, Kshitij, Harrington, Peter, Jia, Xiaowei, Jin, Zhenong, Nieber, Jonh, and Kumar, Vipin
- Subjects
Computer Science - Machine Learning - Abstract
Streamflow, vital for water resource management, is governed by complex hydrological systems involving intermediate processes driven by meteorological forces. While deep learning models have achieved state-of-the-art results in streamflow prediction, their end-to-end single-task learning approach often fails to capture the causal relationships within these systems. To address this, we propose Hierarchical Conditional Multi-Task Learning (HCMTL), a hierarchical approach that jointly models soil water and snowpack processes based on their causal connections to streamflow. HCMTL utilizes task embeddings to connect network modules, enhancing flexibility and expressiveness while capturing unobserved processes beyond soil water and snowpack. It also incorporates the Conditional Mini-Batch strategy to improve long time series modeling. We compare HCMTL with five baselines on a global dataset. HCMTL's superior performance across hundreds of drainage basins over extended periods shows that integrating domain-specific causal knowledge into deep learning enhances both prediction accuracy and interpretability. This is essential for advancing our understanding of complex hydrological systems and supporting efficient water resource management to mitigate natural disasters like droughts and floods.
- Published
- 2024
32. Boosting Imperceptibility of Stable Diffusion-based Adversarial Examples Generation with Momentum
- Author
-
Haque, Nashrah, Li, Xiang, Chen, Zhehui, Wu, Yanzhao, Yu, Lei, Iyengar, Arun, and Wei, Wenqi
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Machine Learning - Abstract
We propose a novel framework, Stable Diffusion-based Momentum Integrated Adversarial Examples (SD-MIAE), for generating adversarial examples that can effectively mislead neural network classifiers while maintaining visual imperceptibility and preserving the semantic similarity to the original class label. Our method leverages the text-to-image generation capabilities of the Stable Diffusion model by manipulating token embeddings corresponding to the specified class in its latent space. These token embeddings guide the generation of adversarial images that maintain high visual fidelity. The SD-MIAE framework consists of two phases: (1) an initial adversarial optimization phase that modifies token embeddings to produce misclassified yet natural-looking images and (2) a momentum-based optimization phase that refines the adversarial perturbations. By introducing momentum, our approach stabilizes the optimization of perturbations across iterations, enhancing both the misclassification rate and visual fidelity of the generated adversarial examples. Experimental results demonstrate that SD-MIAE achieves a high misclassification rate of 79%, improving by 35% over the state-of-the-art method while preserving the imperceptibility of adversarial perturbations and the semantic similarity to the original class label, making it a practical method for robust adversarial evaluation., Comment: 10 pages, 12 figures. To be published in IEEE TPS 2024 Proceedings. Code available on GitHub: https://github.com/nashrahhaque/SD-MIAE
- Published
- 2024
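For item 32, a generic momentum-accumulated perturbation update (in the spirit of MI-FGSM, not necessarily SD-MIAE's exact rule, which operates through the diffusion model's token embeddings) looks like this; the step sizes and tensor shapes are placeholders.

```python
# Generic momentum-accumulated adversarial update (illustrative; not SD-MIAE's exact rule).
import torch

def momentum_step(delta, grad, velocity, mu=1.0, alpha=1.0 / 255):
    """Accumulate normalized gradients into a velocity term, then take a
    signed step; momentum stabilizes the perturbation across iterations."""
    velocity = mu * velocity + grad / grad.abs().sum().clamp_min(1e-12)
    delta = delta + alpha * velocity.sign()
    return delta, velocity

delta = torch.zeros(3, 224, 224)
velocity = torch.zeros_like(delta)
grad = torch.randn_like(delta)                    # stand-in for a real loss gradient
delta, velocity = momentum_step(delta, grad, velocity)
```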
33. S$^4$ST: A Strong, Self-transferable, faSt, and Simple Scale Transformation for Transferable Targeted Attack
- Author
-
Liu, Yongxiang, Peng, Bowen, Liu, Li, and Li, Xiang
- Subjects
Computer Science - Cryptography and Security ,Computer Science - Artificial Intelligence - Abstract
Transferable targeted adversarial attacks (TTAs) against deep neural networks have been proven significantly more challenging than untargeted ones, yet they remain relatively underexplored. This paper sheds new light on performing highly efficient yet transferable targeted attacks leveraging the simple gradient-based baseline. Our research underscores the critical importance of image transformations within gradient calculations, marking a shift from the prevalent emphasis on loss functions to address the gradient vanishing problem. Moreover, we have developed two effective blind estimators that facilitate the design of transformation strategies to enhance targeted transferability under black-box conditions. The adversarial examples' self-transferability to geometric transformations has been identified as strongly correlated with their black-box transferability, featuring these basic operations as potent yet overlapped proxies for facilitating targeted transferability. The surrogate self-alignment assessments further highlight simple scaling transformation's exceptional efficacy, which rivals that of most advanced methods. Building on these insights, we introduce a scaling-centered transformation strategy termed Strong, Self-transferable, faSt, and Simple Scale Transformation (S4ST) to enhance transferable targeted attacks. In experiments conducted on the ImageNet-Compatible benchmark dataset, our proposed S4ST attains a SOTA average targeted transfer success rate across various challenging black-box models, outperforming the previous leading method by over 14% while requiring only 25% of the execution time. Additionally, our approach eclipses SOTA attacks considerably and exhibits remarkable effectiveness against real-world APIs. This work marks a significant leap forward in TTAs, revealing the realistic threats they pose and providing a practical generation method for future research., Comment: 16 pages, 18 figures
- Published
- 2024
34. Retrieval Instead of Fine-tuning: A Retrieval-based Parameter Ensemble for Zero-shot Learning
- Author
-
Jin, Pengfei, Shu, Peng, Kim, Sekeun, Xiao, Qing, Song, Sifan, Chen, Cheng, Liu, Tianming, Li, Xiang, and Li, Quanzheng
- Subjects
Computer Science - Machine Learning ,Computer Science - Artificial Intelligence ,Computer Science - Computation and Language ,Computer Science - Computer Vision and Pattern Recognition - Abstract
Foundation models have become a cornerstone in deep learning, with techniques like Low-Rank Adaptation (LoRA) offering efficient fine-tuning of large models. Similarly, methods such as Retrieval-Augmented Generation (RAG), which leverage vectorized databases, have further improved model performance by grounding outputs in external information. While these approaches have demonstrated notable success, they often require extensive training or labeled data, which can limit their adaptability in resource-constrained environments. To address these challenges, we introduce Retrieval-based Parameter Ensemble (RPE), a new method that creates a vectorized database of LoRAs, enabling efficient retrieval and application of model adaptations to new tasks. RPE minimizes the need for extensive training and eliminates the requirement for labeled data, making it particularly effective for zero-shot learning. Additionally, RPE is well-suited for privacy-sensitive domains like healthcare, as it modifies model parameters without accessing raw data. When applied to tasks such as medical report generation and image segmentation, RPE not only proved effective but also surpassed supervised fine-tuning methods in certain cases, highlighting its potential to enhance both computational efficiency and privacy in deep learning applications.
- Published
- 2024
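The retrieval idea in the entry above can be illustrated with a small sketch: task embeddings are stored alongside LoRA weight dictionaries, the nearest entries are retrieved for a new task, and their parameters are merged with similarity-based weights. The names `lora_db` and `retrieve_and_ensemble` are hypothetical, and the softmax-style weighting is only one plausible choice.

```python
# Illustrative sketch of a retrieval-based parameter ensemble over stored LoRAs.
import numpy as np

def retrieve_and_ensemble(new_task_embedding, lora_db, top_k=3):
    """lora_db: list of (task_embedding, {param_name: np.ndarray}) pairs."""
    sims = []
    for emb, _ in lora_db:
        cos = np.dot(new_task_embedding, emb) / (
            np.linalg.norm(new_task_embedding) * np.linalg.norm(emb) + 1e-12)
        sims.append(cos)
    idx = np.argsort(sims)[-top_k:]                      # nearest LoRAs
    weights = np.exp([sims[i] for i in idx])
    weights = weights / weights.sum()                    # similarity-based weights
    merged = {}
    for w, i in zip(weights, idx):
        for name, value in lora_db[i][1].items():
            merged[name] = merged.get(name, 0.0) + w * value
    return merged                                        # LoRA update for the new task
```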
35. EG-SpikeFormer: Eye-Gaze Guided Transformer on Spiking Neural Networks for Medical Image Analysis
- Author
-
Pan, Yi, Jiang, Hanqi, Chen, Junhao, Li, Yiwei, Zhao, Huaqin, Zhou, Yifan, Shu, Peng, Wu, Zihao, Liu, Zhengliang, Zhu, Dajiang, Li, Xiang, Abate, Yohannes, and Liu, Tianming
- Subjects
Electrical Engineering and Systems Science - Image and Video Processing ,Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Machine Learning ,Computer Science - Neural and Evolutionary Computing - Abstract
Neuromorphic computing has emerged as a promising energy-efficient alternative to traditional artificial intelligence, predominantly utilizing spiking neural networks (SNNs) implemented on neuromorphic hardware. Significant advancements have been made in SNN-based convolutional neural networks (CNNs) and Transformer architectures. However, neuromorphic computing for the medical imaging domain remains underexplored. In this study, we introduce EG-SpikeFormer, an SNN architecture tailored for clinical tasks that incorporates eye-gaze data to guide the model's attention to the diagnostically relevant regions in medical images. Our developed approach effectively addresses shortcut learning issues commonly observed in conventional models, especially in scenarios with limited clinical data and high demands for model reliability, generalizability, and transparency. Our EG-SpikeFormer not only demonstrates superior energy efficiency and performance in medical image prediction tasks but also enhances clinical relevance through multi-modal information alignment. By incorporating eye-gaze data, the model improves interpretability and generalization, opening new directions for applying neuromorphic computing in healthcare.
- Published
- 2024
36. Octopus Inspired Optimization Algorithm: Multi-Level Structures and Parallel Computing Strategies
- Author
-
Wang, Xu, Xu, Longji, Wang, Yiquan, Dong, Yuhua, Li, Xiang, Deng, Jia, and He, Rui
- Subjects
Computer Science - Neural and Evolutionary Computing - Abstract
This paper introduces a novel bionic intelligent optimisation algorithm, the Octopus Inspired Optimization (OIO) algorithm, which is inspired by the neural structure of the octopus, especially its hierarchical and decentralised interaction properties. By simulating the sensory, decision-making, and executive abilities of octopuses, the OIO algorithm adopts a multi-level hierarchical strategy, including tentacles, suckers, individuals and groups, to achieve an effective combination of global and local search. This hierarchical design enhances the flexibility of the algorithm and significantly improves its search efficiency and adaptability. In performance evaluations, including comparisons with existing mainstream intelligent optimisation algorithms, OIO shows faster convergence and higher accuracy, especially when dealing with multimodal functions and high-dimensional optimisation problems. This advantage becomes even more pronounced as the required minimum accuracy increases, with the OIO algorithm showing an average speedup of 2.27 times that of conventional particle swarm optimisation (PSO) and 9.63 times that of differential evolution (DE) on multimodal functions. In particular, when dealing with high-dimensional optimisation problems, OIO achieves an average speedup of 10.39 times over DE, demonstrating its superior computational efficiency. In addition, the OIO algorithm shows a reduction of about $5\%$ in CPU usage compared to PSO, reflecting its more efficient use of CPU resources. These features give the OIO algorithm great potential for complex optimisation problems, and it is especially suitable for application scenarios that require fast, efficient and robust optimisation methods, such as robot path planning, supply chain management optimisation, and energy system management.
- Published
- 2024
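As a rough illustration of the multi-level search idea described above (local probes per individual plus a group-level pull toward the best solution), here is a heavily simplified sketch. It is not the published OIO algorithm; the update rules, step sizes, and the Rastrigin test function are assumptions chosen for brevity.

```python
# Greatly simplified sketch of a hierarchical (local + global) search loop.
import numpy as np

def hierarchical_search(f, dim, n_individuals=10, n_tentacles=8,
                        iters=200, step=0.5, bounds=(-5.0, 5.0)):
    rng = np.random.default_rng(0)
    lo, hi = bounds
    individuals = rng.uniform(lo, hi, size=(n_individuals, dim))
    best_x = min(individuals, key=f).copy()
    for _ in range(iters):
        for i in range(n_individuals):
            # "tentacle" level: local random probes around each individual
            probes = np.clip(individuals[i] + step * rng.normal(size=(n_tentacles, dim)), lo, hi)
            best_probe = min(probes, key=f)
            if f(best_probe) < f(individuals[i]):
                individuals[i] = best_probe
            # "group" level: drift toward the global best solution
            individuals[i] += 0.1 * (best_x - individuals[i])
        candidate = min(individuals, key=f)
        if f(candidate) < f(best_x):
            best_x = candidate.copy()
        step *= 0.99  # slowly shrink the local search radius
    return best_x, f(best_x)

# Example: minimize the Rastrigin function, a common multimodal test problem.
rastrigin = lambda x: 10 * len(x) + np.sum(x**2 - 10 * np.cos(2 * np.pi * x))
print(hierarchical_search(rastrigin, dim=5))
```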
37. PSF Calibration of DAMPE for gamma-ray Observations
- Author
-
Duan, Kai-Kai, Shen, Zhao-Qiang, Xu, Zun-Lei, Jiang, Wei, and Li, Xiang
- Subjects
Astrophysics - Instrumentation and Methods for Astrophysics ,Astrophysics - High Energy Astrophysical Phenomena - Abstract
The DArk Matter Particle Explorer (DAMPE) is dedicated to exploring critical scientific domains including the indirect detection of dark matter, cosmic ray physics, and gamma ray astronomy. This study introduces a novel method for calibrating the Point Spread Function (PSF) of DAMPE, specifically designed to enhance the accuracy of gamma-ray observations. By leveraging data from regions near pulsars and bright Active Galactic Nuclei (AGNs), we have refined the PSF calibration process, resulting in an improved angular resolution that closely matches our observational data. This advancement significantly boosts the precision of gamma-ray detection by DAMPE, thereby contributing to its mission objectives in dark matter detection and gamma ray astronomy., Comment: 7 pages, 12 figures and 1 table. Accepted for publication in Astroparticle Physics
- Published
- 2024
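A data-driven PSF check of the kind described above can be sketched as estimating the 68% containment radius from the angular separations between reconstructed photon directions and a bright point source (e.g., a pulsar or AGN), per energy bin. The function and argument names below are assumptions, not the DAMPE toolchain.

```python
# Hedged sketch: observed containment radius around a bright point source.
import numpy as np

def containment_radius(sep_deg, energy_gev, e_lo, e_hi, frac=0.68):
    """sep_deg: angular separations (deg); energy_gev: photon energies (GeV)."""
    mask = (energy_gev >= e_lo) & (energy_gev < e_hi)
    if mask.sum() == 0:
        return np.nan
    return np.percentile(sep_deg[mask], 100 * frac)

# e.g. compare the observed 68% radius in 2-5 GeV with the Monte Carlo PSF prediction:
# r68_obs = containment_radius(sep_deg, energy_gev, 2.0, 5.0)
```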
38. An X-ray Shell Reveals the Supernova Explosion for Galactic Microquasar SS 433
- Author
-
Chi, Yi-Heng, Huang, Jiahui, Zhou, Ping, Feng, Hua, Li, Xiang-Dong, Markoff, Sera B., Safi-Harb, Samar, and Olivera-Nieto, Laura
- Subjects
Astrophysics - High Energy Astrophysical Phenomena ,Astrophysics - Solar and Stellar Astrophysics - Abstract
How black holes are formed remains an open and fundamental question in astrophysics. Despite theoretical predictions, observational evidence is lacking on whether black hole formation involves a supernova explosion. Here we report the discovery of an X-ray shell north of the Galactic microquasar SS 433, which harbors a stellar-mass black hole, spatially associated with radio continuum and polarization emission and an HI cloud. Its spectrum can be reproduced by a 1-keV under-ionized plasma, from which the shell is inferred to have been created by a supernova explosion 20-30 kyr ago; its properties constitute evidence that canonical supernova explosions can create some black holes. Our analysis precludes other possible origins, including heating by the jets or a bubble blown by disk winds. From the lower mass limit of the compact object in SS 433, we roughly deduce that the progenitor should have been more massive than 25 M$_{\odot}$. The existence of such a young remnant around SS 433 can also lead to new insights into supercritical accretion in young microquasars and the ${\gamma}$-ray emission of this system. The fallback ejecta may provide accretion material for tens of thousands of years, while the shock of the supernova remnant may play a crucial role in cosmic ray (re)acceleration., Comment: 20 pages, 8 figures, accepted for publication in ApJL
- Published
- 2024
39. Instability in supernova fallback disks and its effect on the formation of ultra long period pulsars
- Author
-
Yang, Hao-Ran, Li, Xiang-Dong, Gao, Shi-Jie, and Xu, Kun
- Subjects
Astrophysics - High Energy Astrophysical Phenomena - Abstract
Several pulsars with unusually long periods were discovered recently, comprising a potential population of ultra long period pulsars (ULPPs). The origin of their long periodicity is not well understood, but may be related to magnetars spun down by surrounding fallback disks. While there have been a few systematic investigations of the fallback disk-assisted evolution of magnetars, the instability in the disk, which determines the disk's lifetime, has received little attention. In this work we simulate the evolution of the magnetic field, spin period, and magnetic inclination angle of magnetars with a supernova fallback disk. We find that thermal viscous instability in the disk could significantly affect the formation of ULPPs. Our simulation results also reveal that a large fraction of ULPPs seem to be nearly aligned or orthogonal rotators. This might help place ULPPs above the death line in the pulse period - period derivative plane. However, some extra mechanisms seem to be required to account for the radio emission of ULPPs., Comment: 15 pages, 8 figures, accepted for publication in ApJ
- Published
- 2024
40. Quark correlation functions at three-loop order and extraction of splitting functions
- Author
-
Cheng, Chen, Huang, Li-Hong, Li, Xiang, Li, Zheng-Yang, and Ma, Yan-Qing
- Subjects
High Energy Physics - Phenomenology ,High Energy Physics - Lattice - Abstract
We present the first complete next-to-next-to-next-to-leading-order calculation of the matching coefficients that link unpolarized flavor non-singlet parton distribution functions with lattice QCD computable correlation functions. By using this high-order result, we notice a reduction in theoretical uncertainties compared to relying solely on previously known lower-order matching coefficients. Furthermore, based on this result we have extracted the three-loop unpolarized flavor non-singlet splitting function, which is in agreement with the state-of-the-art result. Due to the simplicity of our method, it has the potential to advance the calculation of splitting functions to the desired four-loop order., Comment: 12 pages, 2 figures
- Published
- 2024
41. SONAR: A Synthetic AI-Audio Detection Framework and Benchmark
- Author
-
Li, Xiang, Chen, Pin-Yu, and Wei, Wenqi
- Subjects
Computer Science - Sound ,Computer Science - Artificial Intelligence ,Electrical Engineering and Systems Science - Audio and Speech Processing - Abstract
Recent advances in Text-to-Speech (TTS) and Voice-Conversion (VC) using generative Artificial Intelligence (AI) technology have made it possible to generate high-quality and realistic human-like audio. This introduces significant challenges to distinguishing AI-synthesized speech from the authentic human voice and could raise potential issues of misuse for malicious purposes such as impersonation and fraud, spreading misinformation, deepfakes, and scams. However, existing detection techniques for AI-synthesized audio have not kept pace and often exhibit poor generalization across diverse datasets. In this paper, we introduce SONAR, a synthetic AI-Audio Detection Framework and Benchmark, aiming to provide a comprehensive evaluation for distinguishing cutting-edge AI-synthesized auditory content. SONAR includes a novel evaluation dataset sourced from 9 diverse audio synthesis platforms, including leading TTS providers and state-of-the-art TTS models. It is the first framework to uniformly benchmark AI-audio detection across both traditional and foundation model-based deepfake detection systems. Through extensive experiments, we reveal the generalization limitations of existing detection methods and demonstrate that foundation models exhibit stronger generalization capabilities, which can be attributed to their model size and the scale and quality of pretraining data. Additionally, we explore the effectiveness and efficiency of few-shot fine-tuning in improving generalization, highlighting its potential for tailored applications, such as personalized detection systems for specific entities or individuals. Code and dataset are available at https://github.com/Jessegator/SONAR.
- Published
- 2024
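A benchmark of the kind described above boils down to scoring every detector on every synthesis source and comparing a common metric. The sketch below assumes hypothetical `detectors` and `datasets` dictionaries and uses AUROC as one reasonable metric; it is not the released SONAR code.

```python
# Illustrative cross-source evaluation loop for AI-audio detectors.
from sklearn.metrics import roc_auc_score

def benchmark(detectors, datasets):
    """detectors: {name: callable returning a fake-probability per clip};
    datasets: {source_name: (clips, labels)} with label 1 = synthetic audio."""
    results = {}
    for det_name, detector in detectors.items():
        for src_name, (clips, labels) in datasets.items():
            scores = [detector(clip) for clip in clips]
            results[(det_name, src_name)] = roc_auc_score(labels, scores)
    return results  # inspect per-source AUROC to expose generalization gaps
```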
42. ECHOPulse: ECG controlled echocardio-grams video generation
- Author
-
Li, Yiwei, Kim, Sekeun, Wu, Zihao, Jiang, Hanqi, Pan, Yi, Jin, Pengfei, Song, Sifan, Shi, Yucheng, Liu, Tianming, Li, Quanzheng, and Li, Xiang
- Subjects
Electrical Engineering and Systems Science - Image and Video Processing ,Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Machine Learning - Abstract
Echocardiography (ECHO) is essential for cardiac assessments, but its video quality and interpretation heavily rely on manual expertise, leading to inconsistent results from clinical and portable devices. ECHO video generation offers a solution by improving automated monitoring through synthetic data and generating high-quality videos from routine health data. However, existing models often face high computational costs, slow inference, and rely on complex conditional prompts that require experts' annotations. To address these challenges, we propose ECHOPULSE, an ECG-conditioned ECHO video generation model. ECHOPULSE introduces two key advancements: (1) it accelerates ECHO video generation by leveraging VQ-VAE tokenization and masked visual token modeling for fast decoding, and (2) it conditions on readily accessible ECG signals, which are highly coherent with ECHO videos, bypassing complex conditional prompts. To the best of our knowledge, this is the first work to use time-series prompts like ECG signals for ECHO video generation. ECHOPULSE not only enables controllable synthetic ECHO data generation but also provides updated cardiac function information for disease monitoring and prediction beyond ECG alone. Evaluations on three public and private datasets demonstrate state-of-the-art performance in ECHO video generation across both qualitative and quantitative measures. Additionally, ECHOPULSE can be easily generalized to other modality generation tasks, such as cardiac MRI, fMRI, and 3D CT generation. A demo can be seen at \url{https://github.com/levyisthebest/ECHOPulse_Prelease}.
- Published
- 2024
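The masked visual token modeling step mentioned above can be sketched generically: mask a fraction of VQ codes and train a transformer to predict them conditioned on an ECG embedding. The `transformer` signature and masking scheme below are assumptions, not the released ECHOPULSE model.

```python
# Hedged sketch of one training step of ECG-conditioned masked token modeling.
import torch
import torch.nn.functional as F

def masked_token_step(transformer, video_tokens, ecg_embedding, mask_ratio=0.5, mask_id=0):
    """video_tokens: (B, N) integer VQ codes; ecg_embedding: (B, N_ecg, D) condition."""
    B, N = video_tokens.shape
    mask = torch.rand(B, N, device=video_tokens.device) < mask_ratio
    inputs = video_tokens.masked_fill(mask, mask_id)       # replace masked codes
    logits = transformer(inputs, ecg_embedding)            # (B, N, vocab_size), assumed API
    loss = F.cross_entropy(logits[mask], video_tokens[mask])
    return loss  # backpropagate; at inference, masked codes are filled in iteratively
```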
43. ImageFolder: Autoregressive Image Generation with Folded Tokens
- Author
-
Li, Xiang, Qiu, Kai, Chen, Hao, Kuen, Jason, Gu, Jiuxiang, Raj, Bhiksha, and Lin, Zhe
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Image tokenizers are crucial for visual generative models, e.g., diffusion models (DMs) and autoregressive (AR) models, as they construct the latent representation for modeling. Increasing token length is a common approach to improve the image reconstruction quality. However, tokenizers with longer token lengths are not guaranteed to achieve better generation quality. There exists a trade-off between reconstruction and generation quality regarding token length. In this paper, we investigate the impact of token length on both image reconstruction and generation and provide a flexible solution to the tradeoff. We propose ImageFolder, a semantic tokenizer that provides spatially aligned image tokens that can be folded during autoregressive modeling to improve both generation efficiency and quality. To enhance the representative capability without increasing token length, we leverage dual-branch product quantization to capture different contexts of images. Specifically, semantic regularization is introduced in one branch to encourage compacted semantic information while another branch is designed to capture the remaining pixel-level details. Extensive experiments demonstrate the superior quality of image generation and shorter token length with ImageFolder tokenizer., Comment: Code: https://github.com/lxa9867/ImageFolder
- Published
- 2024
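Dual-branch product quantization, as mentioned above, can be illustrated by splitting each token feature in half and quantizing each half against its own codebook, so every spatial position carries two code indices that can later be folded for autoregressive modeling. This is a generic sketch of the technique, not the ImageFolder tokenizer.

```python
# Minimal sketch of dual-branch product quantization with a straight-through estimator.
import torch

def dual_branch_quantize(z, codebook_a, codebook_b):
    """z: (N, D) token features; codebook_a, codebook_b: (K, D/2) each."""
    za, zb = z.chunk(2, dim=-1)
    ids_a = torch.cdist(za, codebook_a).argmin(dim=-1)    # nearest code per branch
    ids_b = torch.cdist(zb, codebook_b).argmin(dim=-1)
    z_q = torch.cat([codebook_a[ids_a], codebook_b[ids_b]], dim=-1)
    z_q = z + (z_q - z).detach()                          # gradients flow to the encoder
    return z_q, torch.stack([ids_a, ids_b], dim=-1)       # (N, D), (N, 2)
```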
44. The calibrations of DAMPE $\gamma$-ray effective area
- Author
-
Shen, Zhao-Qiang, Li, Wen-Hao, Duan, Kai-Kai, Jiang, Wei, Xu, Zun-Lei, Yue, Chuan, and Li, Xiang
- Subjects
Astrophysics - Instrumentation and Methods for Astrophysics ,Astrophysics - High Energy Astrophysical Phenomena ,Physics - Instrumentation and Detectors - Abstract
The DArk Matter Particle Explorer (DAMPE) is a cosmic-ray detector as well as a pair-converting $\gamma$-ray telescope. The effective area, reflecting the geometrical cross-section area, the $\gamma$-ray conversion probability and the photon selection efficiency, is important in $\gamma$-ray analyses. In this work, we find a significant time variation in the effective area, as large as $\sim -4\%/{\rm yr}$ at 2 GeV for the high-energy trigger. We derive the data-based correction factors to the effective areas and apply corrections to both the effective areas and the exposure maps. The calibrated exposure can be $\sim 12\%$ smaller than the Monte Carlo one on average at 2 GeV. The calibration is further verified using the observation of the Vela pulsar, showing that the spectral parameters with the correction are more consistent with those in the Fermi-LAT catalog than the ones without correction. All the corrections are now implemented in the latest version of the DAMPE $\gamma$-ray analysis toolkit DmpST., Comment: 10 pages, 9 figures and 1 table. Accepted for publication in ApJ
- Published
- 2024
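Applying a data-based correction to a Monte Carlo effective area can be sketched very simply; the linear per-year rate below is purely illustrative (the entry above quotes roughly -4%/yr at 2 GeV for the high-energy trigger), and the function name is an assumption.

```python
# Simple sketch of a time-dependent correction factor applied to an effective area.
import numpy as np

def corrected_effective_area(aeff_mc_cm2, years_since_reference, rate_per_yr=-0.04):
    factor = 1.0 + rate_per_yr * np.asarray(years_since_reference)
    return aeff_mc_cm2 * factor

# e.g. an assumed 2 GeV effective area three years after the reference epoch:
# aeff = corrected_effective_area(1100.0, 3.0)   # ~12% below the Monte Carlo value
```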
45. GTC optical/NIR upper limits and NICER X-ray analysis of SGR J1935+2154 for the outburst in 2022
- Author
-
Shao, Yi-Xuan, Zhou, Ping, Li, Xiang-Dong, Zhang, Bin-Bin, Castro-Tirado, Alberto Javier, Wang, Pei, Li, Di, Zhang, Zeng-Hua, Zhang, Zi-Jian, Hu, You-Dong, and Pandey, Shashi B.
- Subjects
Astrophysics - High Energy Astrophysical Phenomena - Abstract
The Galactic magnetar SGR J1935+2154 has undergone another outburst since 2022 October 10. We present the results of searching for an optical/NIR counterpart of SGR J1935+2154 before and during this outburst. No counterpart was detected at the magnetar's position in ${r'}$ and ${z'}$ bands, providing stringent upper limits of $r'\gtrsim 28.65$ and $z'\gtrsim 26.27$. Using archival X-ray data from NICER, we investigated the properties of the bursts and the spectral evolution of persistent emission. The burst flux $F$ showed a power-law distribution of $N\propto F^{-0.76\pm0.10}$ for flux $\gtrsim 2.6\times 10^{-9}\rm{\ erg\ cm^{-2}\ s^{-1}}$, while the temperature and radius followed a lognormal distribution with $kT=1.63^{+0.73}_{-0.50}\ \rm{keV}$ and $R_{\rm bb}=4.35_{-1.35}^{+1.95}\ \rm{km}$, respectively. The persistent flux evolution experienced a quick decay and an enhancement $\sim 27$ days after the BAT trigger. Using the near-infrared (NIR) and X-ray emission, together with the GTC optical/NIR upper limits, we discussed the origin of the NIR emission from the magnetar based on the fallback disk model and magnetosphere model. We found that either model cannot be ruled out with currently available data. Further mid-infrared observations are needed to find out the mechanism for producing the NIR emission from SGR J1935+2154., Comment: 21 pages, 7 figures, 4 tables. Accepted for publication in ApJ
- Published
- 2024
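Fitting a power-law index to burst fluxes above a threshold, as reported above, can be sketched with the standard maximum-likelihood estimator for a cumulative distribution $N(>F)\propto F^{-\alpha}$; the quoted index and threshold come from the authors' own analysis, not from this snippet.

```python
# Hedged sketch: MLE power-law index for fluxes above a threshold f_min.
import numpy as np

def powerlaw_index(fluxes, f_min):
    f = np.asarray(fluxes)
    f = f[f >= f_min]
    alpha = len(f) / np.sum(np.log(f / f_min))   # index of N(>F) ~ F^{-alpha}
    err = alpha / np.sqrt(len(f))                # approximate 1-sigma uncertainty
    return alpha, err
```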
46. Investigating Creation Perspectives and Icon Placement Preferences for On-Body Menus in Virtual Reality
- Author
-
Li, Xiang, He, Wei, Jin, Shan, Gugenheimer, Jan, Hui, Pan, Liang, Hai-Ning, and Kristensson, Per Ola
- Subjects
Computer Science - Human-Computer Interaction - Abstract
On-body menus present a novel interaction paradigm within Virtual Reality (VR) environments by embedding virtual interfaces directly onto the user's body. Unlike traditional screen-based interfaces, on-body menus enable users to interact with virtual options or icons visually attached to their physical form. In this paper, we investigated the impact of the creation process on the effectiveness of on-body menus, comparing first-person, third-person, and mirror perspectives. Our first study ($N$ = 12) revealed that the mirror perspective led to faster creation times and more accurate recall compared to the other two perspectives. To further explore user preferences, we conducted a second study ($N$ = 18) utilizing a VR system with integrated body tracking. By combining distributions of icons from both studies ($N$ = 30), we confirmed significant preferences in on-body menu placement based on icon category (e.g., Social Media icons were consistently placed on forearms). We also discovered associations between categories, such as Leisure and Social Media icons frequently co-occurring. Our findings highlight the importance of the creation process, uncover user preferences for on-body menu organization, and provide insights to guide the development of intuitive and effective on-body interactions within virtual environments., Comment: 19 pages. PACM HCI: ISS (ACM ISS 2024)
- Published
- 2024
47. Volumetric Conditional Score-based Residual Diffusion Model for PET/MR Denoising
- Author
-
Yoon, Siyeop, Hu, Rui, Wang, Yuang, Tivnan, Matthew, Son, Young-don, Wu, Dufan, Li, Xiang, Kim, Kyungsang, and Li, Quanzheng
- Subjects
Electrical Engineering and Systems Science - Image and Video Processing ,Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Machine Learning - Abstract
PET imaging is a powerful modality offering quantitative assessments of molecular and physiological processes. The necessity for PET denoising arises from the intrinsic high noise levels in PET imaging, which can significantly hinder the accurate interpretation and quantitative analysis of the scans. With advances in deep learning techniques, diffusion model-based PET denoising techniques have shown remarkable performance improvement. However, these models often face limitations when applied to volumetric data. Additionally, many existing diffusion models do not adequately consider the unique characteristics of PET imaging, such as its 3D volumetric nature, leading to the potential loss of anatomic consistency. Our Conditional Score-based Residual Diffusion (CSRD) model addresses these issues by incorporating a refined score function and 3D patch-wise training strategy, optimizing the model for efficient volumetric PET denoising. The CSRD model significantly lowers computational demands and expedites the denoising process. By effectively integrating volumetric data from PET and MRI scans, the CSRD model maintains spatial coherence and anatomical detail. Lastly, we demonstrate that the CSRD model achieves superior denoising performance in both qualitative and quantitative evaluations while maintaining image details and outperforms existing state-of-the-art methods., Comment: Accepted to MICCAI 2024
- Published
- 2024
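A conditional diffusion denoiser of the general kind described above can be sketched as a DDPM-style training step on 3D patches, with the MRI volume supplied as a conditioning channel. This is a generic illustration, not the authors' CSRD model; the `model` signature and noise schedule are assumptions.

```python
# Simplified sketch of one training step of a conditional 3D diffusion denoiser.
import torch
import torch.nn.functional as F

def diffusion_step(model, pet_patch, mri_patch, alphas_cumprod):
    """pet_patch, mri_patch: (B, 1, D, H, W); alphas_cumprod: (T,) noise schedule."""
    B = pet_patch.shape[0]
    t = torch.randint(0, len(alphas_cumprod), (B,), device=pet_patch.device)
    a = alphas_cumprod[t].view(B, 1, 1, 1, 1)
    noise = torch.randn_like(pet_patch)
    noisy = a.sqrt() * pet_patch + (1 - a).sqrt() * noise
    pred = model(torch.cat([noisy, mri_patch], dim=1), t)   # condition by channel concat
    return F.mse_loss(pred, noise)                           # backpropagate this loss
```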
48. GrokLST: Towards High-Resolution Benchmark and Toolkit for Land Surface Temperature Downscaling
- Author
-
Dai, Qun, Yuan, Chunyang, Dai, Yimian, Li, Yuxuan, Li, Xiang, Ni, Kang, Xu, Jianhui, Shu, Xiangbo, and Yang, Jian
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Electrical Engineering and Systems Science - Image and Video Processing - Abstract
Land Surface Temperature (LST) is a critical parameter for environmental studies, but obtaining high-resolution LST data remains challenging due to the spatio-temporal trade-off in satellite remote sensing. Guided LST downscaling has emerged as a solution, but current methods often neglect spatial non-stationarity and lack an open-source ecosystem for deep learning methods. To address these limitations, we propose the Modality-Conditional Large Selective Kernel (MoCoLSK) Networks, a novel architecture that dynamically fuses multi-modal data through modality-conditioned projections. MoCoLSK re-engineers our previous LSKNet to achieve a confluence of dynamic receptive field adjustment and multi-modal feature integration, leading to enhanced LST prediction accuracy. Furthermore, we establish the GrokLST project, a comprehensive open-source ecosystem featuring the GrokLST dataset, a high-resolution benchmark, and the GrokLST toolkit, an open-source PyTorch-based toolkit encapsulating MoCoLSK alongside 40+ state-of-the-art approaches. Extensive experimental results validate MoCoLSK's effectiveness in capturing complex dependencies and subtle variations within multispectral data, outperforming existing methods in LST downscaling. Our code, dataset, and toolkit are available at https://github.com/GrokCV/GrokLST.
- Published
- 2024
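A modality-conditioned projection, in the loose sense used above, can be sketched as guidance features producing channel weights that modulate projected LST features before fusion. The module below is a simplified stand-in, not the MoCoLSK block; the channel sizes and gating form are assumptions.

```python
# Hedged sketch of modality-conditioned feature modulation for guided downscaling.
import torch
import torch.nn as nn

class ModalityConditionedFusion(nn.Module):
    def __init__(self, lst_ch, guide_ch, hidden=64):
        super().__init__()
        self.proj = nn.Conv2d(lst_ch, hidden, kernel_size=3, padding=1)
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(guide_ch, hidden, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, lst_feat, guide_feat):
        # guidance-derived channel weights modulate the projected LST features
        return self.proj(lst_feat) * self.gate(guide_feat)
```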
49. HazyDet: Open-source Benchmark for Drone-view Object Detection with Depth-cues in Hazy Scenes
- Author
-
Feng, Changfeng, Chen, Zhenyuan, Kou, Renke, Gao, Guangwei, Wang, Chunping, Li, Xiang, Shu, Xiangbo, Dai, Yimian, Fu, Qiang, and Yang, Jian
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Drone-based object detection in adverse weather conditions is crucial for enhancing drones' environmental perception, yet it remains largely unexplored due to the lack of relevant benchmarks. To bridge this gap, we introduce HazyDet, a large-scale dataset tailored for drone-based object detection in hazy scenes. It encompasses 383,000 real-world instances, collected from both naturally hazy environments and normal scenes with synthetically imposed haze effects to simulate adverse weather conditions. By observing the significant variations in object scale and clarity under different depth and haze conditions, we designed a Depth Conditioned Detector (DeCoDet) to incorporate this prior knowledge. DeCoDet features a Multi-scale Depth-aware Detection Head that seamlessly integrates depth perception, with the resulting depth cues harnessed by a dynamic Depth Condition Kernel module. Furthermore, we propose a Scale Invariant Refurbishment Loss to facilitate the learning of robust depth cues from pseudo-labels. Extensive evaluations on the HazyDet dataset demonstrate the flexibility and effectiveness of our method, yielding significant performance improvements. Our dataset and toolkit are available at https://github.com/GrokCV/HazyDet.
- Published
- 2024
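The depth-conditioning idea described above can be sketched as an auxiliary depth prediction whose output modulates the detection features before classification. This is an illustrative stand-in for a depth-conditioned head, not the released DeCoDet code; the number of classes and layer shapes are assumptions.

```python
# Hedged sketch of a depth-conditioned detection head.
import torch
import torch.nn as nn

class DepthConditionedHead(nn.Module):
    def __init__(self, in_ch, n_classes=80):        # class count assumed for illustration
        super().__init__()
        self.depth_head = nn.Conv2d(in_ch, 1, kernel_size=3, padding=1)
        self.depth_to_scale = nn.Sequential(
            nn.Conv2d(1, in_ch, kernel_size=1), nn.Sigmoid())
        self.cls_head = nn.Conv2d(in_ch, n_classes, kernel_size=3, padding=1)

    def forward(self, feat):
        depth = self.depth_head(feat)               # auxiliary per-pixel depth estimate
        feat = feat * self.depth_to_scale(depth)    # depth cue modulates the features
        return self.cls_head(feat), depth           # class logits + depth for its own loss
```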
50. Mixture of Multicenter Experts in Multimodal Generative AI for Advanced Radiotherapy Target Delineation
- Author
-
Oh, Yujin, Park, Sangjoon, Li, Xiang, Yi, Wang, Paly, Jonathan, Efstathiou, Jason, Chan, Annie, Kim, Jun Won, Byun, Hwa Kyung, Lee, Ik Jae, Cho, Jaeho, Wee, Chan Woo, Shu, Peng, Wang, Peilong, Yu, Nathan, Holmes, Jason, Ye, Jong Chul, Li, Quanzheng, Liu, Wei, Koom, Woong Sub, Kim, Jin Sung, and Kim, Kyungsang
- Subjects
Electrical Engineering and Systems Science - Image and Video Processing ,Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Machine Learning - Abstract
Clinical experts employ diverse philosophies and strategies in patient care, influenced by regional patient populations. However, existing medical artificial intelligence (AI) models are often trained on data distributions that disproportionately reflect highly prevalent patterns, reinforcing biases and overlooking the diverse expertise of clinicians. To overcome this limitation, we introduce the Mixture of Multicenter Experts (MoME) approach. This method strategically integrates specialized expertise from diverse clinical strategies, enhancing the AI model's ability to generalize and adapt across multiple medical centers. The MoME-based multimodal target volume delineation model, trained with few-shot samples including images and clinical notes from each medical center, outperformed baseline methods in prostate cancer radiotherapy target delineation. The advantages of MoME were most pronounced when data characteristics varied across centers or when data availability was limited, demonstrating its potential for broader clinical applications. Therefore, the MoME framework enables the deployment of AI-based target volume delineation models in resource-constrained medical facilities by adapting to the specific preferences of each medical center using only a few sample data points, without the need for data sharing between institutions. Expanding the number of multicenter experts within the MoME framework will significantly enhance generalizability, while also improving the usability and adaptability of clinical AI applications in the field of precision radiation oncology., Comment: 39 pages
- Published
- 2024
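A mixture over center-specific experts, as described above, can be sketched as a gate that turns an embedding of the target center's context into softmax weights over stacked expert predictions. The class below is a generic mixture-of-experts sketch with assumed shapes, not the authors' MoME framework.

```python
# Hedged sketch of gated fusion over center-specific expert outputs.
import torch
import torch.nn as nn

class CenterGatedMixture(nn.Module):
    def __init__(self, n_experts, context_dim):
        super().__init__()
        self.gate = nn.Linear(context_dim, n_experts)

    def forward(self, expert_outputs, center_context):
        """expert_outputs: (E, B, ...) stacked per-center predictions;
        center_context: (B, context_dim) embedding of the target center."""
        w = torch.softmax(self.gate(center_context), dim=-1)                   # (B, E)
        w = w.t().view(expert_outputs.shape[0], -1,
                       *([1] * (expert_outputs.dim() - 2)))                    # (E, B, 1, ...)
        return (w * expert_outputs).sum(dim=0)                                 # weighted fusion
```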