Author: "An, Liwei" / Publication Year Range: Last 3 years - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"An, Liwei"' showing total 18,329 results

Start Over Author "An, Liwei" Publication Year Range Last 3 years

18,329 results on '"An, Liwei"'

1. Theoretical Benefit and Limitation of Diffusion Language Model

Author: Feng, Guhao, Geng, Yihan, Guan, Jian, Wu, Wei, Wang, Liwei, and He, Di
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Statistics - Machine Learning
Abstract: Diffusion language models have emerged as a promising approach for text generation. One would naturally expect this method to be an efficient replacement for autoregressive models since multiple tokens can be sampled in parallel during each diffusion step. However, its efficiency-accuracy trade-off is not yet well understood. In this paper, we present a rigorous theoretical analysis of a widely used type of diffusion language model, the Masked Diffusion Model (MDM), and find that its effectiveness heavily depends on the target evaluation metric. Under mild conditions, we prove that when using perplexity as the metric, MDMs can achieve near-optimal perplexity in sampling steps regardless of sequence length, demonstrating that efficiency can be achieved without sacrificing performance. However, when using the sequence error rate--which is important for understanding the "correctness" of a sequence, such as a reasoning chain--we show that the required sampling steps must scale linearly with sequence length to obtain "correct" sequences, thereby eliminating MDM's efficiency advantage over autoregressive models. Our analysis establishes the first theoretical foundation for understanding the benefits and limitations of MDMs. All theoretical findings are supported by empirical studies., Comment: 32 pages, 3 figures
Published: 2025

2. Controlling Symmetries and Quantum Criticality in the Anisotropic Coupled-Top Model

Author: Mao, Wen-Jian, Ye, Tian, Duan, Liwei, and Wang, Yan-Zhi
Subjects: Condensed Matter - Strongly Correlated Electrons, Quantum Physics
Abstract: We investigate the anisotropic coupled-top model, which describes the interactions between two large spins along both $x-$ and $y-$directions. By tuning anisotropic coupling strengths along distinct directions, we can manipulate the system's symmetry, inducing either discrete $Z_2$ or continuous U(1) symmetry. In the thermodynamic limit, the mean-field phase diagram is divided into five phases: the disordered paramagnetic phase, the ordered ferromagnetic or antiferromagnetic phases with symmetry breaking along either $x-$ or $y-$direction. This results in a double degeneracy of the spin projections along the principal direction for $Z_2$ symmetry breaking. When U(1) symmetry is broken, infinite degeneracy associated with the Goldstone mode emerges. Beyond the mean-field ansatz, at the critical points, the energy gap closes, and both quantum fluctuations and entanglement entropy diverge, signaling the onset of second-order quantum phase transitions. These critical behaviors consistently support the universality class of $Z_2$ symmetry. Contrarily, when U(1) symmetry is broken, the energy gap vanishes beyond the critical points, yielding a novel exponent of 1, rather than 1/2 for $Z_2$ symmetry breaking. The framework provides an ideal platform for experimentally controlling symmetries and investigating associated physical phenomena., Comment: 21 pages, 3 figures
Published: 2025

3. Resilient Quantized Consensus in Multi-Hop Relay Networks

Author: Yuan, Liwei and Ishii, Hideaki
Subjects: Computer Science - Multiagent Systems, Electrical Engineering and Systems Science - Systems and Control
Abstract: We study resilient quantized consensus in multi-agent systems, where some agents may malfunction. The network consists of agents taking integer-valued states, and the agents' communication is subject to asynchronous updates and time delays. We utilize the quantized weighted mean subsequence reduced algorithm where agents communicate with others through multi-hop relays. We prove necessary and sufficient conditions for our algorithm to achieve the objective under the malicious and Byzantine attack models. Our approach has tighter graph conditions compared to the one-hop algorithm and the flooding-based algorithms for binary consensus. Numerical examples verify the efficacy of our algorithm., Comment: 13 pages
Published: 2025

4. Neutron star evolution by combining discontinuous Galerkin and finite volume methods

Author: Adhikari, Ananya, Tichy, Wolfgang, Ji, Liwei, and Poudel, Amit
Subjects: Astrophysics - High Energy Astrophysical Phenomena, General Relativity and Quantum Cosmology
Abstract: We present here a new hybrid scheme that combines a discontinuous Galerkin (DG) method with finite volume (FV) and finite difference (FD) methods. The computational mesh is divided into smaller elements that touch but do not overlap. Like a pure DG method, our new hybrid scheme requires information exchange only at the surface of neighboring elements. This avoids the need for ghostzones that are usually many points deep in traditional FV implementations. Furthermore, unlike traditional FV implementations, that require information exchange between each element and its 26 surrounding neighbors on non-cuboid meshes, our new hybrid method exchanges information only between each element and its six nearest neighbors. Through this reduction in communication, we aim to retain the high scalability of DG when using large supercomputers. The goal is to use DG in elements with smooth matter fields and to fall back onto the more robust FV/FD method in elements that contain non-smooth shocks or star surfaces. For this we devise trouble criteria to decide whether an element should be evolved with DG or FV/FD. We use the Nmesh program to implement and test the new scheme. We successfully evolve various single neutron star cases. These include the challenging cases of a neutron star initially in an unstable equilibrium migrating to a stable configuration and a boosted neutron star. These cases are simulated for the first time here in full 3D with general relativistic hydrodynamics using DG methods. We also describe additional numerical methods, such as the limiters and the atmosphere treatment we need for our simulations., Comment: 34 pages, 20 figures
Published: 2025

5. Relativistic Gas Accretion onto Supermassive Black Hole Binaries from Inspiral through Merger

Author: Ennoggi, Lorenzo, Campanelli, Manuela, Zlochower, Yosef, Noble, Scott C., Krolik, Julian, Cattorini, Federico, Kalinani, Jay V., Mewes, Vassilios, Chabanov, Michail, Ji, Liwei, and de Simone, Maria Chiara
Subjects: Astrophysics - High Energy Astrophysical Phenomena, General Relativity and Quantum Cosmology
Abstract: Accreting supermassive black hole binaries are powerful multimessenger sources emitting both gravitational and electromagnetic (EM) radiation. Understanding the accretion dynamics of these systems and predicting their distinctive EM signals is crucial to informing and guiding upcoming efforts aimed at detecting gravitational waves produced by these binaries. To this end, accurate numerical modeling is required to describe both the spacetime and the magnetized gas around the black holes. In this paper, we present two key advances in this field of research. First, we have developed a novel 3D general relativistic magnetohydrodynamics (GRMHD) framework that combines multiple numerical codes to simulate the inspiral and merger of supermassive black hole binaries starting from realistic initial data and running all the way through merger. Throughout the evolution, we adopt a simple but functional prescription to account for gas cooling through the emission of photons. Next, we have applied our new computational method to follow the time evolution of a circular, equal-mass, non-spinning black hole binary for ${\sim\!200}$ orbits starting from a separation of ${20\,r_g}$ and reaching the post-merger evolutionary stage of the system. We have identified how and when the minidisks dissolve as the binary compresses. We also show that even when the binary ``decouples'' from its surrounding disk, its luminosity decreases by only a factor of a few and abruptly increases by ${\sim\!50\%}$ at the time of merger, accompanied by an equally abrupt change in spectrum. Finally, the magnetic flux brought to the spin-parameter ${\sim\!0.68}$ merger remnant is able to drive a relativistic, Poynting-flux-dominated jet., Comment: 36 pages, 22 figures. Submitted to Physical Review D
Published: 2025

6. Online Covariance Estimation in Nonsmooth Stochastic Approximation

Author: Jiang, Liwei, Roy, Abhishek, Balasubramanian, Krishna, Davis, Damek, Drusvyatskiy, Dmitriy, and Na, Sen
Subjects: Statistics - Machine Learning, Computer Science - Machine Learning, Mathematics - Optimization and Control
Abstract: We consider applying stochastic approximation (SA) methods to solve nonsmooth variational inclusion problems. Existing studies have shown that the averaged iterates of SA methods exhibit asymptotic normality, with an optimal limiting covariance matrix in the local minimax sense of H\'ajek and Le Cam. However, no methods have been proposed to estimate this covariance matrix in a nonsmooth and potentially non-monotone (nonconvex) setting. In this paper, we study an online batch-means covariance matrix estimator introduced in Zhu et al.(2023). The estimator groups the SA iterates appropriately and computes the sample covariance among batches as an estimate of the limiting covariance. Its construction does not require prior knowledge of the total sample size, and updates can be performed recursively as new data arrives. We establish that, as long as the batch size sequence is properly specified (depending on the stepsize sequence), the estimator achieves a convergence rate of order $O(\sqrt{d}n^{-1/8+\varepsilon})$ for any $\varepsilon>0$, where $d$ and $n$ denote the problem dimensionality and the number of iterations (or samples) used. Although the problem is nonsmooth and potentially non-monotone (nonconvex), our convergence rate matches the best-known rate for covariance estimation methods using only first-order information in smooth and strongly-convex settings. The consistency of this covariance estimator enables asymptotically valid statistical inference, including constructing confidence intervals and performing hypothesis testing., Comment: 46 pages, 1 figure
Published: 2025

7. BRiTE: Bootstrapping Reinforced Thinking Process to Enhance Language Model Reasoning

Author: Zhong, Han, Yin, Yutong, Zhang, Shenao, Xu, Xiaojun, Liu, Yuanxin, Zuo, Yifei, Liu, Zhihan, Liu, Boyi, Zheng, Sirui, Guo, Hongyi, Wang, Liwei, Hong, Mingyi, and Wang, Zhaoran
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computation and Language
Abstract: Large Language Models (LLMs) have demonstrated remarkable capabilities in complex reasoning tasks, yet generating reliable reasoning processes remains a significant challenge. We present a unified probabilistic framework that formalizes LLM reasoning through a novel graphical model incorporating latent thinking processes and evaluation signals. Within this framework, we introduce the Bootstrapping Reinforced Thinking Process (BRiTE) algorithm, which works in two steps. First, it generates high-quality rationales by approximating the optimal thinking process through reinforcement learning, using a novel reward shaping mechanism. Second, it enhances the base LLM by maximizing the joint probability of rationale generation with respect to the model's parameters. Theoretically, we demonstrate BRiTE's convergence at a rate of $1/T$ with $T$ representing the number of iterations. Empirical evaluations on math and coding benchmarks demonstrate that our approach consistently improves performance across different base models without requiring human-annotated thinking processes. In addition, BRiTE demonstrates superior performance compared to existing algorithms that bootstrap thinking processes use alternative methods such as rejection sampling, and can even match or exceed the results achieved through supervised fine-tuning with human-annotated data.
Published: 2025

8. Emilia: A Large-Scale, Extensive, Multilingual, and Diverse Dataset for Speech Generation

Author: He, Haorui, Shang, Zengqiang, Wang, Chaoren, Li, Xuyuan, Gu, Yicheng, Hua, Hua, Liu, Liwei, Yang, Chen, Li, Jiaqi, Shi, Peiyang, Wang, Yuancheng, Chen, Kai, Zhang, Pengyuan, and Wu, Zhizheng
Subjects: Computer Science - Sound, Computer Science - Computation and Language, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Recent advancements in speech generation have been driven by the large-scale training datasets. However, current models fall short of capturing the spontaneity and variability inherent in real-world human speech, due to their reliance on audiobook datasets limited to formal read-aloud speech styles. To bridge this gap, we introduce Emilia-Pipe, an open-source preprocessing pipeline to extract high-quality training data from valuable yet underexplored in-the-wild data that capture spontaneous human speech in real-world contexts. By leveraging Emilia-Pipe, we construct Emilia, the first multilingual speech generation dataset derived from in-the-wild speech data. This dataset comprises over 101k hours of speech across six languages: English, Chinese, German, French, Japanese, and Korean. Besides, we expand Emilia to Emilia-Large, a dataset exceeding 216k hours, making it the largest open-source speech generation dataset available. Extensive experiments demonstrate that Emilia significantly outperforms traditional audiobook datasets in generating spontaneous and human-like speech, showcasing superior performance in capturing diverse speaker timbre and speaking styles of real-world human speech. Furthermore, this work underscores the importance of scaling dataset size to advance speech generation research and validates the effectiveness of Emilia for both multilingual and crosslingual speech generation., Comment: Extended version of arXiv:2407.05361, submitted to TASLP, dataset is available at: https://huggingface.co/datasets/amphion/Emilia-Dataset
Published: 2025

9. Several classes of linear codes with few weights derived from Weil sums

Author: Hu, Zhao, Qiu, Mingxiu, Li, Nian, Tang, Xiaohu, and Wu, Liwei
Subjects: Computer Science - Information Theory
Abstract: Linear codes with few weights have applications in secret sharing, authentication codes, association schemes and strongly regular graphs. In this paper, several classes of $t$-weight linear codes over ${\mathbb F}_{q}$ are presented with the defining sets given by the intersection, difference and union of two certain sets, where $t=3,4,5,6$ and $q$ is an odd prime power. By using Weil sums and Gauss sums, the parameters and weight distributions of these codes are determined completely. Moreover, three classes of optimal codes meeting the Griesmer bound are obtained, and computer experiments show that many (almost) optimal codes can be derived from our constructions.
Published: 2025

10. Surface-SOS: Self-Supervised Object Segmentation via Neural Surface Representation

Author: Zheng, Xiaoyun, Liao, Liwei, Jiao, Jianbo, Gao, Feng, and Wang, Ronggang
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Self-supervised Object Segmentation (SOS) aims to segment objects without any annotations. Under conditions of multi-camera inputs, the structural, textural and geometrical consistency among each view can be leveraged to achieve fine-grained object segmentation. To make better use of the above information, we propose Surface representation based Self-supervised Object Segmentation (Surface-SOS), a new framework to segment objects for each view by 3D surface representation from multi-view images of a scene. To model high-quality geometry surfaces for complex scenes, we design a novel scene representation scheme, which decomposes the scene into two complementary neural representation modules respectively with a Signed Distance Function (SDF). Moreover, Surface-SOS is able to refine single-view segmentation with multi-view unlabeled images, by introducing coarse segmentation masks as additional input. To the best of our knowledge, Surface-SOS is the first self-supervised approach that leverages neural surface representation to break the dependence on large amounts of annotated data and strong constraints. These constraints typically involve observing target objects against a static background or relying on temporal supervision in videos. Extensive experiments on standard benchmarks including LLFF, CO3D, BlendedMVS, TUM and several real-world scenes show that Surface-SOS always yields finer object masks than its NeRF-based counterparts and surpasses supervised single-view baselines remarkably. Code is available at: https://github.com/zhengxyun/Surface-SOS., Comment: Accepted by TIP
Published: 2025

11. GDiffRetro: Retrosynthesis Prediction with Dual Graph Enhanced Molecular Representation and Diffusion Generation

Author: Sun, Shengyin, Yu, Wenhao, Ren, Yuxiang, Du, Weitao, Liu, Liwei, Zhang, Xuecang, Hu, Ying, and Ma, Chen
Subjects: Computer Science - Artificial Intelligence
Abstract: Retrosynthesis prediction focuses on identifying reactants capable of synthesizing a target product. Typically, the retrosynthesis prediction involves two phases: Reaction Center Identification and Reactant Generation. However, we argue that most existing methods suffer from two limitations in the two phases: (i) Existing models do not adequately capture the ``face'' information in molecular graphs for the reaction center identification. (ii) Current approaches for the reactant generation predominantly use sequence generation in a 2D space, which lacks versatility in generating reasonable distributions for completed reactive groups and overlooks molecules' inherent 3D properties. To overcome the above limitations, we propose GDiffRetro. For the reaction center identification, GDiffRetro uniquely integrates the original graph with its corresponding dual graph to represent molecular structures, which helps guide the model to focus more on the faces in the graph. For the reactant generation, GDiffRetro employs a conditional diffusion model in 3D to further transform the obtained synthon into a complete reactant. Our experimental findings reveal that GDiffRetro outperforms state-of-the-art semi-template models across various evaluative metrics.
Published: 2025

12. A Foundational Generative Model for Breast Ultrasound Image Analysis

Author: Yu, Haojun, Li, Youcheng, Zhang, Nan, Niu, Zihan, Gong, Xuantong, Luo, Yanwen, Ye, Haotian, He, Siyu, Wu, Quanlin, Qin, Wangyan, Zhou, Mengyuan, Han, Jie, Tao, Jia, Zhao, Ziwei, Dai, Di, He, Di, Wang, Dong, Tang, Binghui, Huo, Ling, Zou, James, Zhu, Qingli, Wang, Yong, and Wang, Liwei
Subjects: Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Human-Computer Interaction, Computer Science - Machine Learning
Abstract: Foundational models have emerged as powerful tools for addressing various tasks in clinical settings. However, their potential development to breast ultrasound analysis remains untapped. In this paper, we present BUSGen, the first foundational generative model specifically designed for breast ultrasound image analysis. Pretrained on over 3.5 million breast ultrasound images, BUSGen has acquired extensive knowledge of breast structures, pathological features, and clinical variations. With few-shot adaptation, BUSGen can generate repositories of realistic and informative task-specific data, facilitating the development of models for a wide range of downstream tasks. Extensive experiments highlight BUSGen's exceptional adaptability, significantly exceeding real-data-trained foundational models in breast cancer screening, diagnosis, and prognosis. In breast cancer early diagnosis, our approach outperformed all board-certified radiologists (n=9), achieving an average sensitivity improvement of 16.5% (P-value<0.0001). Additionally, we characterized the scaling effect of using generated data which was as effective as the collected real-world data for training diagnostic models. Moreover, extensive experiments demonstrated that our approach improved the generalization ability of downstream models. Importantly, BUSGen protected patient privacy by enabling fully de-identified data sharing, making progress forward in secure medical data utilization. An online demo of BUSGen is available at https://aibus.bio., Comment: Peking University; Stanford University; Peking University Cancer Hospital & Institute; Peking Union Medical College Hospital; Cancer Hospital, Chinese Academy of Medical Sciences
Published: 2025

13. LarvSeg: Exploring Image Classification Data For Large Vocabulary Semantic Segmentation via Category-wise Attentive Classifier

Author: Yu, Haojun, Dai, Di, Zhao, Ziwei, He, Di, Hu, Han, and Wang, Liwei
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
Abstract: Scaling up the vocabulary of semantic segmentation models is extremely challenging because annotating large-scale mask labels is labour-intensive and time-consuming. Recently, language-guided segmentation models have been proposed to address this challenge. However, their performance drops significantly when applied to out-of-distribution categories. In this paper, we propose a new large vocabulary semantic segmentation framework, called LarvSeg. Different from previous works, LarvSeg leverages image classification data to scale the vocabulary of semantic segmentation models as large-vocabulary classification datasets usually contain balanced categories and are much easier to obtain. However, for classification tasks, the category is image-level, while for segmentation we need to predict the label at pixel level. To address this issue, we first propose a general baseline framework to incorporate image-level supervision into the training process of a pixel-level segmentation model, making the trained network perform semantic segmentation on newly introduced categories in the classification data. We then observe that a model trained on segmentation data can group pixel features of categories beyond the training vocabulary. Inspired by this finding, we design a category-wise attentive classifier to apply supervision to the precise regions of corresponding categories to improve the model performance. Extensive experiments demonstrate that LarvSeg significantly improves the large vocabulary semantic segmentation performance, especially in the categories without mask labels. For the first time, we provide a 21K-category semantic segmentation model with the help of ImageNet21K. The code is available at https://github.com/HaojunYu1998/large_voc_seg., Comment: PRCV 2024
Published: 2025

14. Generalize Your Face Forgery Detectors: An Insertable Adaptation Module Is All You Need

Author: Si, Xiaotian, Li, Linghui, Zhang, Liwei, Guo, Ziduo, Yuan, Kaiguo, Li, Bingyu, and Li, Xiaoyong
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: A plethora of face forgery detectors exist to tackle facial deepfake risks. However, their practical application is hindered by the challenge of generalizing to forgeries unseen during the training stage. To this end, we introduce an insertable adaptation module that can adapt a trained off-the-shelf detector using only online unlabeled test data, without requiring modifications to the architecture or training process. Specifically, we first present a learnable class prototype-based classifier that generates predictions from the revised features and prototypes, enabling effective handling of various forgery clues and domain gaps during online testing. Additionally, we propose a nearest feature calibrator to further improve prediction accuracy and reduce the impact of noisy pseudo-labels during self-training. Experiments across multiple datasets show that our module achieves superior generalization compared to state-of-the-art methods. Moreover, it functions as a plug-and-play component that can be combined with various detectors to enhance the overall performance., Comment: ICASSP2025 accepted
Published: 2024

15. The discrete Painlev\'{e} XXXIV hierarchy arising from the gap probability distributions of Freud random matrix ensembles

Author: Min, Chao and Wang, Liwei
Subjects: Nonlinear Sciences - Exactly Solvable and Integrable Systems, Mathematical Physics, 60B20, 42C05
Abstract: We consider the symmetric gap probability distributions of certain Freud unitary ensembles. This problem is related to the Hankel determinants generated by the Freud weights supported on the complement of a symmetric interval. By using Chen and Ismail's ladder operator approach, we obtain the difference equations satisfied by the recurrence coefficients for the orthogonal polynomials with the discontinuous Freud weights. We find that these equations, with a minor change of variables, are the discrete Painlev\'{e} XXXIV hierarchy proposed by Cresswell and Joshi [{\em J. Phys. A: Math. Gen.} {\bf 32} ({1999}) {655--669}]. This is the first time that the discrete Painlev\'{e} XXXIV hierarchy appears in the study of Random Matrix Theory. We also derive the differential-difference equations for the recurrence coefficients and show the relationship between the logarithmic derivative of the gap probabilities, the nontrivial leading coefficients of the monic orthogonal polynomials and the recurrence coefficients.
Published: 2024

16. Detecting and Classifying Defective Products in Images Using YOLO

Author: Qi, Zhen, Ding, Liwei, Li, Xiangtian, Hu, Jiacheng, Lyu, Bin, and Xiang, Ao
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: With the continuous advancement of industrial automation, product quality inspection has become increasingly important in the manufacturing process. Traditional inspection methods, which often rely on manual checks or simple machine vision techniques, suffer from low efficiency and insufficient accuracy. In recent years, deep learning technology, especially the YOLO (You Only Look Once) algorithm, has emerged as a prominent solution in the field of product defect detection due to its efficient real-time detection capabilities and excellent classification performance. This study aims to use the YOLO algorithm to detect and classify defects in product images. By constructing and training a YOLO model, we conducted experiments on multiple industrial product datasets. The results demonstrate that this method can achieve real-time detection while maintaining high detection accuracy, significantly improving the efficiency and accuracy of product quality inspection. This paper further analyzes the advantages and limitations of the YOLO algorithm in practical applications and explores future research directions.
Published: 2024

17. Efficient Speech Command Recognition Leveraging Spiking Neural Network and Curriculum Learning-based Knowledge Distillation

Author: Wang, Jiaqi, Yu, Liutao, Huang, Liwei, Zhou, Chenlin, Zhang, Han, Song, Zhenxi, Zhang, Min, Ma, Zhengyu, and Zhang, Zhiguo
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Human-Computer Interaction
Abstract: The intrinsic dynamics and event-driven nature of spiking neural networks (SNNs) make them excel in processing temporal information by naturally utilizing embedded time sequences as time steps. Recent studies adopting this approach have demonstrated SNNs' effectiveness in speech command recognition, achieving high performance by employing large time steps for long time sequences. However, the large time steps lead to increased deployment burdens for edge computing applications. Thus, it is important to balance high performance and low energy consumption when detecting temporal patterns in edge devices. Our solution comprises two key components. 1). We propose a high-performance fully spike-driven framework termed SpikeSCR, characterized by a global-local hybrid structure for efficient representation learning, which exhibits long-term learning capabilities with extended time steps. 2). To further fully embrace low energy consumption, we propose an effective knowledge distillation method based on curriculum learning (KDCL), where valuable representations learned from the easy curriculum are progressively transferred to the hard curriculum with minor loss, striking a trade-off between power efficiency and high performance. We evaluate our method on three benchmark datasets: the Spiking Heidelberg Dataset (SHD), the Spiking Speech Commands (SSC), and the Google Speech Commands (GSC) V2. Our experimental results demonstrate that SpikeSCR outperforms current state-of-the-art (SOTA) methods across these three datasets with the same time steps. Furthermore, by executing KDCL, we reduce the number of time steps by 60% and decrease energy consumption by 54.8% while maintaining comparable performance to recent SOTA results. Therefore, this work offers valuable insights for tackling temporal processing challenges with long time sequences in edge neuromorphic computing systems., Comment: Under Review
Published: 2024

18. A Survey on Sequential Recommendation

Author: Pan, Liwei, Pan, Weike, Wei, Meiyan, Yin, Hongzhi, and Ming, Zhong
Subjects: Computer Science - Information Retrieval
Abstract: Different from most conventional recommendation problems, sequential recommendation focuses on learning users' preferences by exploiting the internal order and dependency among the interacted items, which has received significant attention from both researchers and practitioners. In recent years, we have witnessed great progress and achievements in this field, necessitating a new survey. In this survey, we study the SR problem from a new perspective (i.e., the construction of an item's properties), and summarize the most recent techniques used in sequential recommendation such as pure ID-based SR, SR with side information, multi-modal SR, generative SR, LLM-powered SR, ultra-long SR and data-augmented SR. Moreover, we introduce some frontier research topics in sequential recommendation, e.g., open-domain SR, data-centric SR, could-edge collaborative SR, continuous SR, SR for good, and explainable SR. We believe that our survey could be served as a valuable roadmap for readers in this field.
Published: 2024

19. Efficient Large-Scale Traffic Forecasting with Transformers: A Spatial Data Management Perspective

Author: Fang, Yuchen, Liang, Yuxuan, Hui, Bo, Shao, Zezhi, Deng, Liwei, Liu, Xu, Jiang, Xinke, and Zheng, Kai
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence
Abstract: Road traffic forecasting is crucial in real-world intelligent transportation scenarios like traffic dispatching and path planning in city management and personal traveling. Spatio-temporal graph neural networks (STGNNs) stand out as the mainstream solution in this task. Nevertheless, the quadratic complexity of remarkable dynamic spatial modeling-based STGNNs has become the bottleneck over large-scale traffic data. From the spatial data management perspective, we present a novel Transformer framework called PatchSTG to efficiently and dynamically model spatial dependencies for large-scale traffic forecasting with interpretability and fidelity. Specifically, we design a novel irregular spatial patching to reduce the number of points involved in the dynamic calculation of Transformer. The irregular spatial patching first utilizes the leaf K-dimensional tree (KDTree) to recursively partition irregularly distributed traffic points into leaf nodes with a small capacity, and then merges leaf nodes belonging to the same subtree into occupancy-equaled and non-overlapped patches through padding and backtracking. Based on the patched data, depth and breadth attention are used interchangeably in the encoder to dynamically learn local and global spatial knowledge from points in a patch and points with the same index of patches. Experimental results on four real world large-scale traffic datasets show that our PatchSTG achieves train speed and memory utilization improvements up to $10\times$ and $4\times$ with the state-of-the-art performance., Comment: Accepted by SIGKDD 2025
Published: 2024

20. C$^2$LEVA: Toward Comprehensive and Contamination-Free Language Model Evaluation

Author: Li, Yanyang, Wong, Tin Long, Hung, Cheung To, Zhao, Jianqiao, Zheng, Duo, Liu, Ka Wai, Lyu, Michael R., and Wang, Liwei
Subjects: Computer Science - Computation and Language
Abstract: Recent advances in large language models (LLMs) have shown significant promise, yet their evaluation raises concerns, particularly regarding data contamination due to the lack of access to proprietary training data. To address this issue, we present C$^2$LEVA, a comprehensive bilingual benchmark featuring systematic contamination prevention. C$^2$LEVA firstly offers a holistic evaluation encompassing 22 tasks, each targeting a specific application or ability of LLMs, and secondly a trustworthy assessment due to our contamination-free tasks, ensured by a systematic contamination prevention strategy that fully automates test data renewal and enforces data protection during benchmark data release. Our large-scale evaluation of 15 open-source and proprietary models demonstrates the effectiveness of C$^2$LEVA.
Published: 2024

21. AIM: Adaptive Inference of Multi-Modal LLMs via Token Merging and Pruning

Author: Zhong, Yiwu, Liu, Zhuoming, Li, Yin, and Wang, Liwei
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Computation and Language
Abstract: Large language models (LLMs) have enabled the creation of multi-modal LLMs that exhibit strong comprehension of visual data such as images and videos. However, these models usually rely on extensive visual tokens from visual encoders, leading to high computational demands, which limits their applicability in resource-constrained environments and for long-context tasks. In this work, we propose a training-free adaptive inference method for multi-modal LLMs that can accommodate a broad range of efficiency requirements with a minimum performance drop. Our method consists of a) iterative token merging based on embedding similarity before LLMs, and b) progressive token pruning within LLM layers based on multi-modal importance. With a minimalist design, our method can be applied to both video and image LLMs. Extensive experiments on diverse video and image benchmarks demonstrate that, our method substantially reduces computation load (e.g., a $\textbf{7-fold}$ reduction in FLOPs) while preserving the performance of video and image LLMs. Further, under a similar computational cost, our method outperforms the state-of-the-art methods in long video understanding (e.g., $\textbf{+4.6}$ on MLVU). Additionally, our in-depth analysis provides insights into token redundancy and LLM layer behaviors, offering guidance for future research in designing efficient multi-modal LLMs. Our code will be available at https://github.com/LaVi-Lab/AIM., Comment: 12 pages, 2 figures
Published: 2024

22. MILLION: A General Multi-Objective Framework with Controllable Risk for Portfolio Management

Author: Deng, Liwei, Wang, Tianfu, Zhao, Yan, and Zheng, Kai
Subjects: Quantitative Finance - Portfolio Management, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Portfolio management is an important yet challenging task in AI for FinTech, which aims to allocate investors' budgets among different assets to balance the risk and return of an investment. In this study, we propose a general Multi-objectIve framework with controLLable rIsk for pOrtfolio maNagement (MILLION), which consists of two main phases, i.e., return-related maximization and risk control. Specifically, in the return-related maximization phase, we introduce two auxiliary objectives, i.e., return rate prediction, and return rate ranking, combined with portfolio optimization to remit the overfitting problem and improve the generalization of the trained model to future markets. Subsequently, in the risk control phase, we propose two methods, i.e., portfolio interpolation and portfolio improvement, to achieve fine-grained risk control and fast risk adaption to a user-specified risk level. For the portfolio interpolation method, we theoretically prove that the risk can be perfectly controlled if the to-be-set risk level is in a proper interval. In addition, we also show that the return rate of the adjusted portfolio after portfolio interpolation is no less than that of the min-variance optimization, as long as the model in the reward maximization phase is effective. Furthermore, the portfolio improvement method can achieve greater return rates while keeping the same risk level compared to portfolio interpolation. Extensive experiments are conducted on three real-world datasets. The results demonstrate the effectiveness and efficiency of the proposed framework., Comment: accepted by VLDB 2025
Published: 2024

23. Analytical Study of the Non-Hermitian Semiclassical Rabi Model

Author: Liu, Yibo, Duan, Liwei, and Chen, Qing-Hu
Subjects: Quantum Physics
Abstract: The $\mathcal{PT}$ symmetric semiclassical Rabi model explores the fundamental interaction between a two-level atom and a classical field, revealing novel phenomena in open systems through the inclusion of non-Hermitian terms. We propose a single similarity transformation that yields an effective Hamiltonian in rotating-wave approximation, enabling an analytical solution. The phase boundary of the $\mathcal{PT}$-broken phase, derived from the analytical eigenvalues, closely matches the numerical exact one over a wide range of atomic frequencies, demonstrating the effectiveness of the analytical approach, especially at the main resonance. The Floquet parity operator is also introduced, providing a deeper physical understanding of the emergence of the $\mathcal{PT}$-broken phase. Furthermore, by analyzing the dynamics of excited-state population, we observe several stable oscillations in the Fourier spectrum, demonstrating the applicability of the analytical method beyond the single-photon resonance region. The Bloch-Siegert shift is also discussed and, surprisingly, resembles its Hermitian counterpart, except for the higher-order terms in the coupling strength. The present analytical treatment provides a concise and accurate description of the main physics of this non-Hermitian atom-field interaction system., Comment: 11 pages, 7 figures
Published: 2024

24. Occupational physical activity and cardiovascular disease mortality in the United States, 1988-2019.

Author: Xia, Tong, Chen, Liwei, and Li, Jian
Subjects: Cardiovascular disease, Cohort study, Mortality, Occupational physical activity, Humans, Male, Female, United States, Cardiovascular Diseases, Adult, Middle Aged, Exercise, Prospective Studies, Occupations, Social Class, Health Surveys, Young Adult
Abstract: BACKGROUND: Although leisure time physical activity (LTPA) is a beneficial factor for cardiovascular disease (CVD) mortality, relationships between occupational physical activity (OPA) and CVD mortality are inconclusive. We aimed to examine prospective associations of OPA with CVD mortality using a large representative sample of adult workers in the United States (US), and explore how socioeconomic status (SES) may influence these associations. METHODS: This cohort study included US workers (≥ 18 years) participating in the 1988 National Health Interview Survey (NHIS) and passively followed until December 31, 2019. Time (minutes/week) on strenuous OPA (e.g., lifting, pushing, or pulling heavy objects) was assessed at baseline by a questionnaire and categorized into 4 groups [i.e., none, low, medium, and high]. CVD mortality was identified by International Classification of Diseases, Tenth Version (ICD-10) and collected by the National Death Index database. We examined the association of OPA with CVD mortality using multivariable Cox proportional hazard regressions, controlling for age, sex, race/ethnicity, marital status, education, annual household income, occupation type, and pre-existing cardiometabolic disorders. RESULTS: In 28,604 participants (46.2% women; mean age 37.86 years), adjusted hazard ratios (95% CIs) of none, low, medium, and high OPA groups were 1.39 (1.01-1.91), 1.00 (reference), 1.18 (0.83-1.66) and 1.58 (1.12-2.22) for CVD mortality. The associations were stronger in workers with low education level (i.e., high school or less) [estimates of none, low, medium, and high OPA groups were 1.74 (1.09-2.78, P = 0.02), 1.00, 1.49 (0.92-2.42), and 1.87 (1.16-3.00)] or annual household income
Published: 2025

25. Correction: Zhang et al. Hepatitis B Virus X Protein (HBx) Suppresses Transcription Factor EB (TFEB) Resulting in Stabilization of Integrin Beta 1 (ITGB1) in Hepatocellular Carcinoma Cells. Cancers 2021, 13, 1181.

Author: Zhang, Chunyan, Yang, Huan, Pan, Liwei, Zhao, Guangfu, Zhang, Ruofei, Zhang, Tianci, Xiao, Zhixiong, Tong, Ying, Zhang, Yi, Hu, Richard, Pandol, Stephen, and Han, Yuan-Ping
Abstract: In the original publication [...].
Published: 2024

26. High sound pressure piezoelectric micromachined ultrasonic transducers using sputtered potassium sodium niobate.

Author: Xia, Fan, Peng, Yande, Yue, Wei, Luo, Mingze, Teng, Megan, Chen, Chun-Ming, Pala, Sedat, Yu, Xiaoyang, Ma, Yuanzheng, Acharya, Megha, Arakawa, Ryuichi, Martin, Lane, and Lin, Liwei
Abstract: This work presents air-coupled piezoelectric micromachined ultrasonic transducers (pMUTs) with high sound pressure level (SPL) under low-driving voltages by utilizing sputtered potassium sodium niobate K0.34Na0.66NbO3 (KNN) films. A prototype single KNN pMUT has been tested to show a resonant frequency at 106.3 kHz under 4 Vp-p with outstanding characteristics: (1) a large vibration amplitude of 3.74 μm/V, and (2) a high acoustic root mean square (RMS) sound pressure level of 105.5 dB/V at 10 cm, which is 5-10 times higher than those of AlN-based pMUTs at a similar frequency. There are various potential sensing and actuating applications, such as fingerprint sensing, touch point, and gesture recognition. In this work, we present demonstrations in three fields: haptics, loudspeakers, and rangefinders. For haptics, an array of 15 × 15 KNN pMUTs is used as a non-contact actuator to provide a focal pressure of around 160.3 dB RMS SPL at a distance of 15 mm. This represents the highest output pressure achieved by an airborne pMUT for haptic sensation on human palms. When used as a loudspeaker, a single pMUT element with a resonant frequency close to the audible range at 22.8 kHz is characterized. It is shown to be able to generate a uniform acoustic output with an amplitude modulation scheme. In the rangefinder application, pulse-echo measurements using a single pMUT element demonstrate good transceiving results, capable of detecting objects up to 2.82 m away. As such, this new class of high-SPL and low-driving-voltage pMUTs could be further extended to other applications requiring high acoustic pressure and a small form factor.
Published: 2024

27. Video-3D LLM: Learning Position-Aware Video Representation for 3D Scene Understanding

Author: Zheng, Duo, Huang, Shijia, and Wang, Liwei
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Computation and Language
Abstract: The rapid advancement of Multimodal Large Language Models (MLLMs) has significantly impacted various multimodal tasks. However, these models face challenges in tasks that require spatial understanding within 3D environments. Efforts to enhance MLLMs, such as incorporating point cloud features, have been made, yet a considerable gap remains between the models' learned representations and the inherent complexity of 3D scenes. This discrepancy largely stems from the training of MLLMs on predominantly 2D data, which restricts their effectiveness in comprehending 3D spaces. To address this issue, in this paper, we propose a novel generalist model, i.e., Video-3D LLM, for 3D scene understanding. By treating 3D scenes as dynamic videos and incorporating 3D position encoding into these representations, our Video-3D LLM aligns video representations with real-world spatial contexts more accurately. Additionally, we have implemented a maximum coverage sampling technique to optimize the balance between computational costs and performance efficiency. Extensive experiments demonstrate that our model achieves state-of-the-art performance on several 3D scene understanding benchmarks, including ScanRefer, Multi3DRefer, Scan2Cap, ScanQA, and SQA3D., Comment: 14 pages, 4 figures
Published: 2024

28. Efficient Data-aware Distance Comparison Operations for High-Dimensional Approximate Nearest Neighbor Search

Author: Deng, Liwei, Chen, Penghao, Zeng, Ximu, Wang, Tianfu, Zhao, Yan, and Zheng, Kai
Subjects: Computer Science - Databases, Computer Science - Information Retrieval
Abstract: High-dimensional approximate $K$ nearest neighbor search (AKNN) is a fundamental task for various applications, including information retrieval. Most existing algorithms for AKNN can be decomposed into two main components, i.e., candidate generation and distance comparison operations (DCOs). While different methods have unique ways of generating candidates, they all share the same DCO process. In this study, we focus on accelerating the process of DCOs that dominates the time cost in most existing AKNN algorithms. To achieve this, we propose an Data-Aware Distance Estimation approach, called DADE, which approximates the exact distance in a lower-dimensional space. We theoretically prove that the distance estimation in DADE is unbiased in terms of data distribution. Furthermore, we propose an optimized estimation based on the unbiased distance estimation formulation. In addition, we propose a hypothesis testing approach to adaptively determine the number of dimensions needed to estimate the exact distance with sufficient confidence. We integrate DADE into widely-used AKNN search algorithms, e.g., IVF and HNSW, and conduct extensive experiments to demonstrate the superiority., Comment: Accepted by VLDB 2025
Published: 2024

29. Traditional Chinese Medicine Case Analysis System for High-Level Semantic Abstraction: Optimized with Prompt and RAG

Author: Xu, Peng, Wu, Hongjin, Wang, Jinle, Lin, Rongjia, and Tan, Liwei
Subjects: Computer Science - Computation and Language
Abstract: This paper details a technical plan for building a clinical case database for Traditional Chinese Medicine (TCM) using web scraping. Leveraging multiple platforms, including 360doc, we gathered over 5,000 TCM clinical cases, performed data cleaning, and structured the dataset with crucial fields such as patient details, pathogenesis, syndromes, and annotations. Using the $Baidu\_ERNIE\_Speed\_128K$ API, we removed redundant information and generated the final answers through the $DeepSeekv2$ API, outputting results in standard JSON format. We optimized data recall with RAG and rerank techniques during retrieval and developed a hybrid matching scheme. By combining two-stage retrieval method with keyword matching via Jieba, we significantly enhanced the accuracy of model outputs.
Published: 2024

30. Reaching Resilient Leader-Follower Consensus in Time-Varying Networks via Multi-Hop Relays

Author: Yuan, Liwei and Ishii, Hideaki
Subjects: Computer Science - Multiagent Systems, Electrical Engineering and Systems Science - Systems and Control
Abstract: We study resilient leader-follower consensus of multi-agent systems (MASs) in the presence of adversarial agents, where agents' communication is modeled by time-varying topologies. The objective is to develop distributed algorithms for the nonfaulty/normal followers to track an arbitrary reference value propagated by a set of leaders while they are in interaction with the unknown adversarial agents. Our approaches are based on the weighted mean subsequence reduced (W-MSR) algorithms with agents being capable to communicate with multi-hop neighbors. Our algorithms can handle agents possessing first-order and second-order dynamics. Moreover, we characterize necessary and sufficient graph conditions for our algorithms to succeed by the novel notion of jointly robust following graphs. Our graph condition is tighter than the sufficient conditions in the literature when agents use only one-hop communication (without relays). Using multi-hop relays, we can enhance robustness of leader-follower networks without increasing communication links and obtain further relaxed graph requirements for our algorithms to succeed. Numerical examples are given to verify the efficacy of our algorithms., Comment: 15 pages
Published: 2024

31. OpenLS-DGF: An Adaptive Open-Source Dataset Generation Framework for Machine Learning Tasks in Logic Synthesis

Author: Ni, Liwei, Wang, Rui, Liu, Miao, Meng, Xingyu, Lin, Xiaoze, Liu, Junfeng, Luo, Guojie, Chu, Zhufei, Qian, Weikang, Yang, Xiaoyan, Xie, Biwei, Li, Xingquan, and Li, Huawei
Subjects: Computer Science - Artificial Intelligence
Abstract: This paper introduces OpenLS-DGF, an adaptive logic synthesis dataset generation framework, to enhance machine learning~(ML) applications within the logic synthesis process. Previous dataset generation flows were tailored for specific tasks or lacked integrated machine learning capabilities. While OpenLS-DGF supports various machine learning tasks by encapsulating the three fundamental steps of logic synthesis: Boolean representation, logic optimization, and technology mapping. It preserves the original information in both Verilog and machine-learning-friendly GraphML formats. The verilog files offer semi-customizable capabilities, enabling researchers to insert additional steps and incrementally refine the generated dataset. Furthermore, OpenLS-DGF includes an adaptive circuit engine that facilitates the final dataset management and downstream tasks. The generated OpenLS-D-v1 dataset comprises 46 combinational designs from established benchmarks, totaling over 966,000 Boolean circuits. OpenLS-D-v1 supports integrating new data features, making it more versatile for new challenges. This paper demonstrates the versatility of OpenLS-D-v1 through four distinct downstream tasks: circuit classification, circuit ranking, quality of results (QoR) prediction, and probability prediction. Each task is chosen to represent essential steps of logic synthesis, and the experimental results show the generated dataset from OpenLS-DGF achieves prominent diversity and applicability. The source code and datasets are available at https://github.com/Logic-Factory/ACE/blob/master/OpenLS-DGF/readme.md., Comment: 14 pages
Published: 2024

32. Boolean-aware Boolean Circuit Classification: A Comprehensive Study on Graph Neural Network

Author: Ni, Liwei, Li, Xinquan, Xie, Biwei, and Li, Huawei
Subjects: Computer Science - Machine Learning, Computer Science - Hardware Architecture
Abstract: Boolean circuit is a computational graph that consists of the dynamic directed graph structure and static functionality. The commonly used logic optimization and Boolean matching-based transformation can change the behavior of the Boolean circuit for its graph structure and functionality in logic synthesis. The graph structure-based Boolean circuit classification can be grouped into the graph classification task, however, the functionality-based Boolean circuit classification remains an open problem for further research. In this paper, we first define the proposed matching-equivalent class based on its ``Boolean-aware'' property. The Boolean circuits in the proposed class can be transformed into each other. Then, we present a commonly study framework based on graph neural network~(GNN) to analyze the key factors that can affect the Boolean-aware Boolean circuit classification. The empirical experiment results verify the proposed analysis, and it also shows the direction and opportunity to improve the proposed problem. The code and dataset will be released after acceptance.
Published: 2024

33. Sequential optimal contracting in continuous time

Author: Alvarez, Guillermo Alonso, Bayraktar, Erhan, Ekren, Ibrahim, and Huang, Liwei
Subjects: Mathematics - Optimization and Control, Economics - Theoretical Economics, Mathematics - Probability
Abstract: In this paper we study a principal-agent problem in continuous time with multiple lump-sum payments (contracts) paid at different deterministic times. We reduce the non-zero sum Stackelberg game between the principal and agent to a standard stochastic optimal control problem. We apply our result to a benchmark model for which we investigate how different inputs (payment frequencies, payments' distribution, discounting factors, agent's reservation utility) affect the principal's value and agent's optimal compensations.
Published: 2024

34. Structure Consistent Gaussian Splatting with Matching Prior for Few-shot Novel View Synthesis

Author: Peng, Rui, Xu, Wangze, Tang, Luyang, Liao, Liwei, Jiao, Jianbo, and Wang, Ronggang
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Despite the substantial progress of novel view synthesis, existing methods, either based on the Neural Radiance Fields (NeRF) or more recently 3D Gaussian Splatting (3DGS), suffer significant degradation when the input becomes sparse. Numerous efforts have been introduced to alleviate this problem, but they still struggle to synthesize satisfactory results efficiently, especially in the large scene. In this paper, we propose SCGaussian, a Structure Consistent Gaussian Splatting method using matching priors to learn 3D consistent scene structure. Considering the high interdependence of Gaussian attributes, we optimize the scene structure in two folds: rendering geometry and, more importantly, the position of Gaussian primitives, which is hard to be directly constrained in the vanilla 3DGS due to the non-structure property. To achieve this, we present a hybrid Gaussian representation. Besides the ordinary non-structure Gaussian primitives, our model also consists of ray-based Gaussian primitives that are bound to matching rays and whose optimization of their positions is restricted along the ray. Thus, we can utilize the matching correspondence to directly enforce the position of these Gaussian primitives to converge to the surface points where rays intersect. Extensive experiments on forward-facing, surrounding, and complex large scenes show the effectiveness of our approach with state-of-the-art performance and high efficiency. Code is available at https://github.com/prstrive/SCGaussian., Comment: NeurIPS 2024 Accepted
Published: 2024

35. Bridging Geometric States via Geometric Diffusion Bridge

Author: Luo, Shengjie, Xu, Yixian, He, Di, Zheng, Shuxin, Liu, Tie-Yan, and Wang, Liwei
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Quantitative Biology - Quantitative Methods, Statistics - Machine Learning
Abstract: The accurate prediction of geometric state evolution in complex systems is critical for advancing scientific domains such as quantum chemistry and material modeling. Traditional experimental and computational methods face challenges in terms of environmental constraints and computational demands, while current deep learning approaches still fall short in terms of precision and generality. In this work, we introduce the Geometric Diffusion Bridge (GDB), a novel generative modeling framework that accurately bridges initial and target geometric states. GDB leverages a probabilistic approach to evolve geometric state distributions, employing an equivariant diffusion bridge derived by a modified version of Doob's $h$-transform for connecting geometric states. This tailored diffusion process is anchored by initial and target geometric states as fixed endpoints and governed by equivariant transition kernels. Moreover, trajectory data can be seamlessly leveraged in our GDB framework by using a chain of equivariant diffusion bridges, providing a more detailed and accurate characterization of evolution dynamics. Theoretically, we conduct a thorough examination to confirm our framework's ability to preserve joint distributions of geometric states and capability to completely model the underlying dynamics inducing trajectory distributions with negligible error. Experimental evaluations across various real-world scenarios show that GDB surpasses existing state-of-the-art approaches, opening up a new pathway for accurately bridging geometric states and tackling crucial scientific challenges with improved accuracy and applicability., Comment: 33 pages, 5 tables; NeurIPS 2024 Camera Ready version
Published: 2024

36. TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters

Author: Wang, Haiyang, Fan, Yue, Naeem, Muhammad Ferjad, Xian, Yongqin, Lenssen, Jan Eric, Wang, Liwei, Tombari, Federico, and Schiele, Bernt
Subjects: Computer Science - Machine Learning
Abstract: Transformers have become the predominant architecture in foundation models due to their excellent performance across various domains. However, the substantial cost of scaling these models remains a significant concern. This problem arises primarily from their dependence on a fixed number of parameters within linear projections. When architectural modifications (e.g., channel dimensions) are introduced, the entire model typically requires retraining from scratch. As model sizes continue growing, this strategy results in increasingly high computational costs and becomes unsustainable. To overcome this problem, we introduce TokenFormer, a natively scalable architecture that leverages the attention mechanism not only for computations among input tokens but also for interactions between tokens and model parameters, thereby enhancing architectural flexibility. By treating model parameters as tokens, we replace all the linear projections in Transformers with our token-parameter attention layer, where input tokens act as queries and model parameters as keys and values. This reformulation allows for progressive and efficient scaling without necessitating retraining from scratch. Our model scales from 124M to 1.4B parameters by incrementally adding new key-value parameter pairs, achieving performance comparable to Transformers trained from scratch while greatly reducing training costs. Code and models are available at \url{https://github.com/Haiyang-W/TokenFormer}.
Published: 2024

37. Towards Constraint-aware Learning for Resource Allocation in NFV-enabled Networks

Author: Wang, Tianfu, Yang, Long, Wang, Chao, Qin, Chuan, Deng, Liwei, Shen, Li, and Xiong, Hui
Subjects: Computer Science - Networking and Internet Architecture
Abstract: Virtual Network Embedding (VNE) is a challenging combinatorial optimization problem that refers to resource allocation associated with hard and multifaceted constraints in network function virtualization (NFV). Existing works for VNE struggle to handle such complex constraints, leading to compromised system performance and stability. In this paper, we propose a \textbf{CON}straint-\textbf{A}ware \textbf{L}earning framework for VNE, named \textbf{CONAL}, to achieve efficient constraint management. Concretely, we formulate the VNE problem as a constrained Markov decision process with violation tolerance. This modeling approach aims to improve both resource utilization and solution feasibility by precisely evaluating solution quality and the degree of constraint violation. We also propose a reachability-guided optimization with an adaptive reachability budget method that dynamically assigns budget values. This method achieves persistent zero violation to guarantee the feasibility of VNE solutions and more stable policy optimization by handling instances without any feasible solution. Furthermore, we propose a constraint-aware graph representation method to efficiently learn cross-graph relations and constrained path connectivity in VNE. Finally, extensive experimental results demonstrate the superiority of our proposed method over state-of-the-art baselines. Our code is available at https://github.com/GeminiLight/conal-vne.
Published: 2024

38. SafetyAnalyst: Interpretable, transparent, and steerable safety moderation for AI behavior

Author: Li, Jing-Jing, Pyatkin, Valentina, Kleiman-Weiner, Max, Jiang, Liwei, Dziri, Nouha, Collins, Anne G. E., Borg, Jana Schaich, Sap, Maarten, Choi, Yejin, and Levine, Sydney
Subjects: Computer Science - Computation and Language, Computer Science - Computers and Society
Abstract: The ideal AI safety moderation system would be both structurally interpretable (so its decisions can be reliably explained) and steerable (to align to safety standards and reflect a community's values), which current systems fall short on. To address this gap, we present SafetyAnalyst, a novel AI safety moderation framework. Given an AI behavior, SafetyAnalyst uses chain-of-thought reasoning to analyze its potential consequences by creating a structured "harm-benefit tree," which enumerates harmful and beneficial actions and effects the AI behavior may lead to, along with likelihood, severity, and immediacy labels that describe potential impact on any stakeholders. SafetyAnalyst then aggregates all harmful and beneficial effects into a harmfulness score using fully interpretable weight parameters, which can be aligned to particular safety preferences. We applied this conceptual framework to develop, test, and release an open-source LLM prompt safety classification system, distilled from 18.5 million harm-benefit features generated by frontier LLMs on 19k prompts. On a comprehensive set of prompt safety benchmarks, we show that SafetyReporter (average F1=0.81) outperforms existing LLM safety moderation systems (average F1$<$0.72) on prompt safety classification, while offering the additional advantages of interpretability, transparency, and steerability.
Published: 2024

39. Pathologist-like explainable AI for interpretable Gleason grading in prostate cancer

Author: Mittmann, Gesa, Laiouar-Pedari, Sara, Mehrtens, Hendrik A., Haggenmüller, Sarah, Bucher, Tabea-Clara, Chanda, Tirtha, Gaisa, Nadine T., Wagner, Mathias, Klamminger, Gilbert Georg, Rau, Tilman T., Neppl, Christina, Compérat, Eva Maria, Gocht, Andreas, Hämmerle, Monika, Rupp, Niels J., Westhoff, Jula, Krücken, Irene, Seidl, Maximillian, Schürch, Christian M., Bauer, Marcus, Solass, Wiebke, Tam, Yu Chun, Weber, Florian, Grobholz, Rainer, Augustyniak, Jaroslaw, Kalinski, Thomas, Hörner, Christian, Mertz, Kirsten D., Döring, Constanze, Erbersdobler, Andreas, Deubler, Gabriele, Bremmer, Felix, Sommer, Ulrich, Brodhun, Michael, Griffin, Jon, Lenon, Maria Sarah L., Trpkov, Kiril, Cheng, Liang, Chen, Fei, Levi, Angelique, Cai, Guoping, Nguyen, Tri Q., Amin, Ali, Cimadamore, Alessia, Shabaik, Ahmed, Manucha, Varsha, Ahmad, Nazeel, Messias, Nidia, Sanguedolce, Francesca, Taheri, Diana, Baraban, Ezra, Jia, Liwei, Shah, Rajal B., Siadat, Farshid, Swarbrick, Nicole, Park, Kyung, Hassan, Oudai, Sakhaie, Siamak, Downes, Michelle R., Miyamoto, Hiroshi, Williamson, Sean R., Holland-Letz, Tim, Schneider, Carolin V., Kather, Jakob Nikolas, Tolkach, Yuri, and Brinker, Titus J.
Subjects: Electrical Engineering and Systems Science - Image and Video Processing, Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition
Abstract: The aggressiveness of prostate cancer, the most common cancer in men worldwide, is primarily assessed based on histopathological data using the Gleason scoring system. While artificial intelligence (AI) has shown promise in accurately predicting Gleason scores, these predictions often lack inherent explainability, potentially leading to distrust in human-machine interactions. To address this issue, we introduce a novel dataset of 1,015 tissue microarray core images, annotated by an international group of 54 pathologists. The annotations provide detailed localized pattern descriptions for Gleason grading in line with international guidelines. Utilizing this dataset, we develop an inherently explainable AI system based on a U-Net architecture that provides predictions leveraging pathologists' terminology. This approach circumvents post-hoc explainability methods while maintaining or exceeding the performance of methods trained directly for Gleason pattern segmentation (Dice score: 0.713 $\pm$ 0.003 trained on explanations vs. 0.691 $\pm$ 0.010 trained on Gleason patterns). By employing soft labels during training, we capture the intrinsic uncertainty in the data, yielding strong results in Gleason pattern segmentation even in the context of high interobserver variability. With the release of this dataset, we aim to encourage further research into segmentation in medical tasks with high levels of subjectivity and to advance the understanding of pathologists' reasoning processes., Comment: 58 pages, 15 figures (incl. supplementary)
Published: 2024

40. How Numerical Precision Affects Mathematical Reasoning Capabilities of LLMs

Author: Feng, Guhao, Yang, Kai, Gu, Yuntian, Ai, Xinyue, Luo, Shengjie, Sun, Jiacheng, He, Di, Li, Zhenguo, and Wang, Liwei
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Statistics - Machine Learning
Abstract: Despite the remarkable success of Transformer-based Large Language Models (LLMs) across various domains, understanding and enhancing their mathematical capabilities remains a significant challenge. In this paper, we conduct a rigorous theoretical analysis of LLMs' mathematical abilities, with a specific focus on their arithmetic performances. We identify numerical precision as a key factor that influences their effectiveness in mathematical tasks. Our results show that Transformers operating with low numerical precision fail to address arithmetic tasks, such as iterated addition and integer multiplication, unless the model size grows super-polynomially with respect to the input length. In contrast, Transformers with standard numerical precision can efficiently handle these tasks with significantly smaller model sizes. We further support our theoretical findings through empirical experiments that explore the impact of varying numerical precision on arithmetic tasks, providing valuable insights for improving the mathematical reasoning capabilities of LLMs.
Published: 2024

41. To Err is AI : A Case Study Informing LLM Flaw Reporting Practices

Author: McGregor, Sean, Ettinger, Allyson, Judd, Nick, Albee, Paul, Jiang, Liwei, Rao, Kavel, Smith, Will, Longpre, Shayne, Ghosh, Avijit, Fiorelli, Christopher, Hoang, Michelle, Cattell, Sven, and Dziri, Nouha
Subjects: Computer Science - Computers and Society, Computer Science - Machine Learning, Computer Science - Software Engineering
Abstract: In August of 2024, 495 hackers generated evaluations in an open-ended bug bounty targeting the Open Language Model (OLMo) from The Allen Institute for AI. A vendor panel staffed by representatives of OLMo's safety program adjudicated changes to OLMo's documentation and awarded cash bounties to participants who successfully demonstrated a need for public disclosure clarifying the intent, capacities, and hazards of model deployment. This paper presents a collection of lessons learned, illustrative of flaw reporting best practices intended to reduce the likelihood of incidents and produce safer large language models (LLMs). These include best practices for safety reporting processes, their artifacts, and safety program staffing., Comment: 8 pages, 5 figures
Published: 2024

42. Strain-induced two-dimensional topological crystalline insulator

Author: Jing, Liwei, Amini, Mohammad, Fumega, Adolfo O., Silveira, Orlando J., Lado, Jose L., Liljeroth, Peter, and Kezilebieke, Shawulienu
Subjects: Condensed Matter - Mesoscale and Nanoscale Physics, Condensed Matter - Materials Science
Abstract: Topological crystalline insulators (TCIs) host topological phases of matter protected by crystal symmetries. Topological surface states in three-dimensional TCIs have been predicted and observed in IV-VI SnTe-class semiconductors. Despite the prediction of a two-dimensional (2D) TCI characterized by two pairs of edge states inside the bulk gap, materials challenges have thus far prevented its experimental realization. Here we report the growth and characterization of bilayer SnTe on the 2$H$-NbSe$_2$ substrate by molecular beam epitaxy and scanning tunneling microscopy. We experimentally observe two anticorrelated, periodically modulated pairs of conducting edge states along the perimeters of the sample with a large band gap exceeding $0.2$ eV. We identify these states with a 2D TCI through first principles calculations. Finally, we probe the coupling of adjacent topological edge states and demonstrate the resulting energy shift driven by a combination of electrostatic interactions and tunneling coupling. Our work opens the door to investigations of tunable topological states in 2D TCIs, of potential impact for spintronics and nanoelectronics applications at room temperature.
Published: 2024

43. Enhancing Temporal Modeling of Video LLMs via Time Gating

Author: Hu, Zi-Yuan, Zhong, Yiwu, Huang, Shijia, Lyu, Michael R., and Wang, Liwei
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: Video Large Language Models (Video LLMs) have achieved impressive performance on video-and-language tasks, such as video question answering. However, most existing Video LLMs neglect temporal information in video data, leading to struggles with temporal-aware video understanding. To address this gap, we propose a Time Gating Video LLM (TG-Vid) designed to enhance temporal modeling through a novel Time Gating module (TG). The TG module employs a time gating mechanism on its sub-modules, comprising gating spatial attention, gating temporal attention, and gating MLP. This architecture enables our model to achieve a robust understanding of temporal information within videos. Extensive evaluation of temporal-sensitive video benchmarks (i.e., MVBench, TempCompass, and NExT-QA) demonstrates that our TG-Vid model significantly outperforms the existing Video LLMs. Further, comprehensive ablation studies validate that the performance gains are attributed to the designs of our TG module. Our code is available at https://github.com/LaVi-Lab/TG-Vid., Comment: EMNLP 2024 Findings (Short)
Published: 2024

44. AI as Humanity's Salieri: Quantifying Linguistic Creativity of Language Models via Systematic Attribution of Machine Text against Web Text

Author: Lu, Ximing, Sclar, Melanie, Hallinan, Skyler, Mireshghallah, Niloofar, Liu, Jiacheng, Han, Seungju, Ettinger, Allyson, Jiang, Liwei, Chandu, Khyathi, Dziri, Nouha, and Choi, Yejin
Subjects: Computer Science - Computation and Language
Abstract: Creativity has long been considered one of the most difficult aspect of human intelligence for AI to mimic. However, the rise of Large Language Models (LLMs), like ChatGPT, has raised questions about whether AI can match or even surpass human creativity. We present CREATIVITY INDEX as the first step to quantify the linguistic creativity of a text by reconstructing it from existing text snippets on the web. CREATIVITY INDEX is motivated by the hypothesis that the seemingly remarkable creativity of LLMs may be attributable in large part to the creativity of human-written texts on the web. To compute CREATIVITY INDEX efficiently, we introduce DJ SEARCH, a novel dynamic programming algorithm that can search verbatim and near-verbatim matches of text snippets from a given document against the web. Experiments reveal that the CREATIVITY INDEX of professional human authors is on average 66.2% higher than that of LLMs, and that alignment reduces the CREATIVITY INDEX of LLMs by an average of 30.1%. In addition, we find that distinguished authors like Hemingway exhibit measurably higher CREATIVITY INDEX compared to other human writers. Finally, we demonstrate that CREATIVITY INDEX can be used as a surprisingly effective criterion for zero-shot machine text detection, surpassing the strongest existing zero-shot system, DetectGPT, by a significant margin of 30.2%, and even outperforming the strongest supervised system, GhostBuster, in five out of six domains.
Published: 2024

45. Can Language Models Reason about Individualistic Human Values and Preferences?

Author: Jiang, Liwei, Sorensen, Taylor, Levine, Sydney, and Choi, Yejin
Subjects: Computer Science - Computation and Language
Abstract: Recent calls for pluralistic alignment emphasize that AI systems should address the diverse needs of all people. Yet, efforts in this space often require sorting people into fixed buckets of pre-specified diversity-defining dimensions (e.g., demographics, personalities, communication styles), risking smoothing out or even stereotyping the rich spectrum of individualistic variations. To achieve an authentic representation of diversity that respects individuality, we propose individualistic alignment. While individualistic alignment can take various forms, in this paper, we introduce IndieValueCatalog, a dataset transformed from the influential World Values Survey (WVS), to study language models (LMs) on the specific challenge of individualistic value reasoning. Specifically, given a sample of an individual's value-expressing statements, models are tasked with predicting their value judgments in novel cases. With IndieValueCatalog, we reveal critical limitations in frontier LMs' abilities to reason about individualistic human values with accuracies, only ranging between 55% to 65%. Moreover, our results highlight that a precise description of individualistic values cannot be approximated only via demographic information. We also identify a partiality of LMs in reasoning about global individualistic values, as measured by our proposed Value Inequity Index ({\sigma}INEQUITY). Finally, we train a series of Individualistic Value Reasoners (IndieValueReasoner) using IndieValueCatalog to enhance models' individualistic value reasoning capability, revealing new patterns and dynamics into global human values. We outline future research challenges and opportunities for advancing individualistic alignment.
Published: 2024

46. DailyDilemmas: Revealing Value Preferences of LLMs with Quandaries of Daily Life

Author: Chiu, Yu Ying, Jiang, Liwei, and Choi, Yejin
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: As we increasingly seek guidance from LLMs for decision-making in daily life, many of these decisions are not clear-cut and depend significantly on the personal values and ethical standards of the users. We present DailyDilemmas, a dataset of 1,360 moral dilemmas encountered in everyday life. Each dilemma includes two possible actions and with each action, the affected parties and human values invoked. Based on these dilemmas, we consolidated a set of human values across everyday topics e.g., interpersonal relationships, workplace, and environmental issues. We evaluated LLMs on these dilemmas to determine what action they will take and the values represented by these actions. Then, we analyzed these values through the lens of five popular theories inspired by sociology, psychology and philosophy. These theories are: World Value Survey, Moral Foundation Theory, Maslow's Hierarchy of Needs, Aristotle's Virtues, and Plutchik Wheel of Emotion. We find that LLMs are most aligned with the self-expression over survival values in terms of World Value Survey, care over loyalty in Moral Foundation Theory. Interestingly, we find large preferences differences in models for some core values such as truthfulness e.g., Mixtral-8x7B model tends to neglect it by 9.7% while GPT-4-turbo model tends to select it by 9.4%. We also study the recent guidance released by OpenAI (ModelSpec), and Anthropic (Constitutional AI) to understand how their released principles reflect their actual value prioritization when facing nuanced moral reasoning in daily-life settings. We find that end users cannot effectively steer such prioritization using system prompts., Comment: Preprint. Under Review
Published: 2024

47. CulturalBench: a Robust, Diverse and Challenging Benchmark on Measuring the (Lack of) Cultural Knowledge of LLMs

Author: Chiu, Yu Ying, Jiang, Liwei, Lin, Bill Yuchen, Park, Chan Young, Li, Shuyue Stella, Ravi, Sahithya, Bhatia, Mehar, Antoniak, Maria, Tsvetkov, Yulia, Shwartz, Vered, and Choi, Yejin
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: To make large language models (LLMs) more helpful across diverse cultures, it is essential to have effective cultural knowledge benchmarks to measure and track our progress. Effective benchmarks need to be robust, diverse, and challenging. We introduce CulturalBench: a set of 1,227 human-written and human-verified questions for effectively assessing LLMs' cultural knowledge, covering 45 global regions including the underrepresented ones like Bangladesh, Zimbabwe, and Peru. Questions - each verified by five independent annotators - span 17 diverse topics ranging from food preferences to greeting etiquettes. We evaluate models on two setups: CulturalBench-Easy and CulturalBench-Hard which share the same questions but asked differently. We find that LLMs are sensitive to such difference in setups (e.g., GPT-4o with 27.3% difference). Compared to human performance (92.6% accuracy), CulturalBench-Hard is more challenging for frontier LLMs with the best performing model (GPT-4o) at only 61.5% and the worst (Llama3-8b) at 21.4%. Moreover, we find that LLMs often struggle with tricky questions that have multiple correct answers (e.g., What utensils do the Chinese usually use?), revealing a tendency to converge to a single answer. Our results also indicate that OpenAI GPT-4o substantially outperform other proprietary and open source models in questions related to all but one region (Oceania). Nonetheless, all models consistently underperform on questions related to South America and the Middle East., Comment: Preprint. Under review
Published: 2024

48. Extracting the Potential of Emerging Hardware Accelerators for Symmetric Eigenvalue Decomposition

Author: Wang, Hansheng, Shi, Lu, duan, Zhekai, Wu, Panruo, Guo, Liwei, and Zhang, Shaoshuai
Subjects: Computer Science - Distributed, Parallel, and Cluster Computing
Abstract: Benefiting from the advancement of hardware accelerators such as GPUs, deep neural networks and scientific computing applications can achieve superior performance. Recently, the computing capacity of emerging hardware accelerators has increased rapidly, while memory bandwidth has not kept pace with this growth. This disparity exacerbates the gap between computing and memory, leading to inefficiencies on conventional algorithms, as they're likely to be converted from compute-bound to memory-bound. Symmetric eigenvalue decomposition (EVD), a critical operation in various research domains including scientific computing, deep learning training, and inference algorithms, exhibits suboptimal performance due to achieving less than 3\% hardware computing utilization on the H100 GPU. In this paper, we analyze the features of emerging hardware accelerators to identify the bottlenecks inherent in conventional EVD algorithms. To improve EVD performance, we propose several algorithmic optimizations aimed at solving the memory-bound problem and providing a better utilization of the rich computing capacity and parallelism on the emerging hardware accelerators. Experimentally, our proposed method demonstrates significant speedups on tridiagonalization, which is the main workload that takes over 90\% elapsed time of EVD, compared to the SOTA cuSOLVER tridiagonalization, achieving up to 10.1x, 7.5x, and 2.3x improvements on H100, A100, and RTX 4090 GPUs, respectively. And the end-to-end the performance of EVD solver is also up to 4.1x faster than cuSOVLER.
Published: 2024

49. Explainable Diagnosis Prediction through Neuro-Symbolic Integration

Author: Lu, Qiuhao, Li, Rui, Sagheb, Elham, Wen, Andrew, Wang, Jinlian, Wang, Liwei, Fan, Jungwei W., and Liu, Hongfang
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence
Abstract: Diagnosis prediction is a critical task in healthcare, where timely and accurate identification of medical conditions can significantly impact patient outcomes. Traditional machine learning and deep learning models have achieved notable success in this domain but often lack interpretability which is a crucial requirement in clinical settings. In this study, we explore the use of neuro-symbolic methods, specifically Logical Neural Networks (LNNs), to develop explainable models for diagnosis prediction. Essentially, we design and implement LNN-based models that integrate domain-specific knowledge through logical rules with learnable thresholds. Our models, particularly $M_{\text{multi-pathway}}$ and $M_{\text{comprehensive}}$, demonstrate superior performance over traditional models such as Logistic Regression, SVM, and Random Forest, achieving higher accuracy (up to 80.52\%) and AUROC scores (up to 0.8457) in the case study of diabetes prediction. The learned weights and thresholds within the LNN models provide direct insights into feature contributions, enhancing interpretability without compromising predictive power. These findings highlight the potential of neuro-symbolic approaches in bridging the gap between accuracy and explainability in healthcare AI applications. By offering transparent and adaptable diagnostic models, our work contributes to the advancement of precision medicine and supports the development of equitable healthcare solutions. Future research will focus on extending these methods to larger and more diverse datasets to further validate their applicability across different medical conditions and populations., Comment: Proceedings of AMIA Informatics Summit 2025
Published: 2024

50. Gradient descent with adaptive stepsize converges (nearly) linearly under fourth-order growth

Author: Davis, Damek, Drusvyatskiy, Dmitriy, and Jiang, Liwei
Subjects: Mathematics - Optimization and Control, Computer Science - Machine Learning, 65K05, 65K10, 90C30, 90C06
Abstract: A prevalent belief among optimization specialists is that linear convergence of gradient descent is contingent on the function growing quadratically away from its minimizers. In this work, we argue that this belief is inaccurate. We show that gradient descent with an adaptive stepsize converges at a local (nearly) linear rate on any smooth function that merely exhibits fourth-order growth away from its minimizer. The adaptive stepsize we propose arises from an intriguing decomposition theorem: any such function admits a smooth manifold around the optimal solution -- which we call the ravine -- so that the function grows at least quadratically away from the ravine and has constant order growth along it. The ravine allows one to interlace many short gradient steps with a single long Polyak gradient step, which together ensure rapid convergence to the minimizer. We illustrate the theory and algorithm on the problems of matrix sensing and factorization and learning a single neuron in the overparameterized regime., Comment: 58 pages, 5 figures
Published: 2024

Catalog

Books, media, physical & digital resources

See catalog results

Searchworks

Select search scope, currently: Articles Catalog books, media & more in Jio Institute collections Articles journal articles & other e-resources

Search

Search Constraints

Refine your results

Search Limiters

Topic

Publication Year Range

Language

Publication Type

Journal

Region

Database

Publisher

18,329 results on '"An, Liwei"'

Search Results

Catalog

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources