Author: "Mou A" - Searchworks@Jio Institute Digital Library Search Results

1. AgentSense: Benchmarking Social Intelligence of Language Agents through Interactive Scenarios

Author: Mou, Xinyi, Liang, Jingcong, Lin, Jiayu, Zhang, Xinnong, Liu, Xiawei, Yang, Shiyue, Ye, Rong, Chen, Lei, Kuang, Haoyu, Huang, Xuanjing, and Wei, Zhongyu
Subjects: Computer Science - Computation and Language, Computer Science - Computers and Society
Abstract: Large language models (LLMs) are increasingly leveraged to empower autonomous agents to simulate human beings in various fields of behavioral research. However, evaluating their capacity to navigate complex social interactions remains a challenge. Previous studies face limitations due to insufficient scenario diversity, complexity, and a single-perspective focus. To this end, we introduce AgentSense: Benchmarking Social Intelligence of Language Agents through Interactive Scenarios. Drawing on Dramaturgical Theory, AgentSense employs a bottom-up approach to create 1,225 diverse social scenarios constructed from extensive scripts. We evaluate LLM-driven agents through multi-turn interactions, emphasizing both goal completion and implicit reasoning. We analyze goals using ERG theory and conduct comprehensive experiments. Our findings highlight that LLMs struggle with goals in complex social scenarios, especially high-level growth needs, and even GPT-4o requires improvement in private information reasoning.
Published: 2024

2. Synth4Seg -- Learning Defect Data Synthesis for Defect Segmentation using Bi-level Optimization

Author: Mou, Shancong, Vemulapalli, Raviteja, Li, Shiyu, Liu, Yuxuan, Thomas, C, Cao, Meng, Bai, Haoping, Tuzel, Oncel, Huang, Ping, Shan, Jiulong, and Shi, Jianjun
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Defect segmentation is crucial for quality control in advanced manufacturing, yet data scarcity poses challenges for state-of-the-art supervised deep learning. Synthetic defect data generation is a popular approach for mitigating data challenges. However, many current methods simply generate defects following a fixed set of rules, which may not directly relate to downstream task performance. This can lead to suboptimal performance and may even hinder the downstream task. To solve this problem, we leverage a novel bi-level optimization-based synthetic defect data generation framework. We use an online synthetic defect generation module grounded in the commonly-used Cut&Paste framework, and adopt an efficient gradient-based optimization algorithm to solve the bi-level optimization problem. We achieve simultaneous training of the defect segmentation network, and learn various parameters of the data synthesis module by maximizing the validation performance of the trained defect segmentation network. Our experimental results on benchmark datasets under limited data settings show that the proposed bi-level optimization method can be used for learning the most effective locations for pasting synthetic defects thereby improving the segmentation performance by up to 18.3% when compared to pasting defects at random locations. We also demonstrate up to 2.6% performance gain by learning the importance weights for different augmentation-specific defect data sources when compared to giving equal importance to all the data sources.
Published: 2024

3. Computing real-time quantum path integrals on Sewed, almost-Lefschetz thimbles

Author: Mou, Zong-Gang, Saffin, Paul M., and Tranberg, Anders
Subjects: High Energy Physics - Lattice, High Energy Physics - Phenomenology, High Energy Physics - Theory
Abstract: We present a method to compute real-time path integrals numerically, by Monte-Carlo sampling on near-Lefschetz thimbles. We present a collection of new tools, which together provide an alternative to existing methods such as the Generalised thimble. These involve a convenient coordinate parameterization of the thimble, direct numerical integration along a radial coordinate into an effective path integral weight and locally deforming the Lefschetz thimble using its Gaussian (non-interacting theory) counterpart in a region about the critical point. We apply this to quantum mechanics, identify possible pitfalls and benefits, and benchmark its efficiency.
Comment: 24 pages, 10 figures
Published: 2024

4. Development and Testing of a Wood Panels Bark Removal Equipment Based on Deep Learning

Author: Wang, Rijun, Zhang, Guanghao, Chen, Hongyang, Yu, Xinye, Chen, Yesheng, Liang, Fulong, Mou, Xiangwei, and Wang, Bo
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Attempting to apply deep learning methods to wood panels bark removal equipment to enhance the quality and efficiency of bark removal is a significant and challenging endeavor. This study develops and tests a deep learning-based wood panels bark removal equipment. In accordance with the practical requirements of sawmills, a wood panels bark removal equipment equipped with a vision inspection system is designed. Based on a substantial collection of wood panel images obtained using the visual inspection system, the first general wood panels semantic segmentation dataset is constructed for training the BiSeNetV1 model employed in this study. Furthermore, the calculation methods and processes for the essential key data required in the bark removal process are presented in detail. Comparative experiments of the BiSeNetV1 model and tests of bark removal effectiveness are conducted in both laboratory and sawmill environments. The results of the comparative experiments indicate that the application of the BiSeNetV1 segmentation model is rational and feasible. The results of the bark removal effectiveness tests demonstrate a significant improvement in both the quality and efficiency of bark removal. The developed equipment fully meets the sawmill's requirements for precision and efficiency in bark removal processing.
Published: 2024

5. RMB: Comprehensively Benchmarking Reward Models in LLM Alignment

Author: Zhou, Enyu, Zheng, Guodong, Wang, Binghai, Xi, Zhiheng, Dou, Shihan, Bao, Rong, Shen, Wei, Xiong, Limao, Fan, Jessica, Mou, Yurong, Zheng, Rui, Gui, Tao, Zhang, Qi, and Huang, Xuanjing
Subjects: Computer Science - Computation and Language
Abstract: Reward models (RMs) guide the alignment of large language models (LLMs), steering them toward behaviors preferred by humans. Evaluating RMs is the key to better aligning LLMs. However, the current evaluation of RMs may not directly correspond to their alignment performance due to the limited distribution of evaluation data and evaluation methods that are not closely related to alignment objectives. To address these limitations, we propose RMB, a comprehensive RM benchmark that covers over 49 real-world scenarios and includes both pairwise and Best-of-N (BoN) evaluations to better reflect the effectiveness of RMs in guiding alignment optimization. We demonstrate a positive correlation between our benchmark and the downstream alignment task performance. Based on our benchmark, we conduct extensive analysis on the state-of-the-art RMs, revealing their generalization defects that were not discovered by previous benchmarks, and highlighting the potential of generative RMs. Furthermore, we delve into open questions in reward models, specifically examining the effectiveness of majority voting for the evaluation of reward models and analyzing the impact factors of generative RMs, including the influence of evaluation criteria and instructing methods. Our evaluation code and datasets are available at https://github.com/Zhou-Zoey/RMB-Reward-Model-Benchmark.
Published: 2024

6. Tracking the jet-like corona of black hole Swift J1727.8-1613 during a flare state through Type-C quasi-periodic oscillations

Author: Liao, Jie, Chang, Ning, Cui, Lang, Jiang, Pengfei, Mou, Didong, Huang, Yongfeng, An, Tao, Ho, Luis C., Feng, Hua, Fu, Yu-Cong, Cao, Hongmin, and Liu, Xiang
Subjects: Astrophysics - High Energy Astrophysical Phenomena
Abstract: Type-C quasi-periodic oscillations (QPOs) in black hole X-ray transients typically manifest in the low-hard and hard-intermediate states. This study presents a detailed spectral and temporal analysis of the black hole candidate Swift J1727.8-1613 using NICER observations from August and September 2023, with a focus on the first flare period. The time-averaged spectra, along with the rms and phase-lag spectra of the QPOs, were jointly fitted using the time-dependent Comptonization model vkompthdk to examine the geometry of the corona during this flare. The results provide a comprehensive view of the QPO, where we detected type-C QPOs with a centroid frequency increasing from 0.32 Hz to 2.63 Hz, while it elevated when it entered the flare state, and its energy spectral properties as they evolved during the first flare period. Correlations between spectral and temporal properties suggest that type-C QPOs are primarily modulated by Lense-Thirring precession. Based on simultaneous radio observations indicating discrete jet ejections, we propose, for the first time, a scenario where the temporarily extended corona contracts vertically from approximately 2714 km to less than 900 km, overlying the inner accretion disc, with a transient jet seemingly being launched. The corona then recovers to nearly 2000 km by the end of the first flare period of Swift J1727.8-1613, rather than undergoing horizontal changes. A phenomenological analysis of the corona scenario during this flare period was also conducted.
Comment: 16 pages, 6 figures, 3 tables, Submitted to ApJ
Published: 2024

7. Online Control-Informed Learning

Author: Liang, Zihao, Zhou, Tianyu, Lu, Zehui, and Mou, Shaoshuai
Subjects: Mathematics - Optimization and Control, Computer Science - Machine Learning, Computer Science - Robotics, Electrical Engineering and Systems Science - Systems and Control
Abstract: This paper proposes an Online Control-Informed Learning (OCIL) framework, which synthesizes the well-established control theories to solve a broad class of learning and control tasks in real time. This novel integration effectively handles practical issues in machine learning such as noisy measurement data, online learning, and data efficiency. By considering any robot as a tunable optimal control system, we propose an online parameter estimator based on extended Kalman filter (EKF) to incrementally tune the system in real time, enabling it to complete designated learning or control tasks. The proposed method also improves robustness in learning by effectively managing noise in the data. Theoretical analysis is provided to demonstrate the convergence and regret of OCIL. Three learning modes of OCIL, i.e. Online Imitation Learning, Online System Identification, and Policy Tuning On-the-fly, are investigated via experiments, which validate their effectiveness.
Published: 2024

8. $L_2$-approximation using randomized lattice algorithms

Author: Cai, Mou, Goda, Takashi, and Kazashi, Yoshihito
Subjects: Mathematics - Numerical Analysis
Abstract: We propose a randomized lattice algorithm for approximating multivariate periodic functions over the $d$-dimensional unit cube from the weighted Korobov space with mixed smoothness $\alpha > 1/2$ and product weights $\gamma_1,\gamma_2,\ldots\in [0,1]$. Building upon the deterministic lattice algorithm by Kuo, Sloan, and Woźniakowski (2006), we incorporate a randomized quadrature rule by Dick, Goda, and Suzuki (2022) to accelerate the convergence rate. This randomization involves drawing the number of points for function evaluations randomly, and selecting a good generating vector for rank-1 lattice points using the randomized component-by-component algorithm. We prove that our randomized algorithm achieves a worst-case root mean squared $L_2$-approximation error of order $M^{-\alpha/2 - 1/8 + \varepsilon}$ for an arbitrarily small $\varepsilon > 0$, where $M$ denotes the maximum number of function evaluations, and that the error bound is independent of the dimension $d$ if the weights satisfy $\sum_{j=1}^\infty \gamma_j^{1/\alpha} < \infty$. Our upper bound converges faster than a lower bound on the worst-case $L_2$-approximation error for deterministic rank-1 lattice-based approximation proved by Byrenheid, Kämmerer, Ullrich, and Volkmer (2017). We also show a lower error bound of order $M^{-\alpha/2-1/2}$ for our randomized algorithm, leaving a slight gap between the upper and lower bounds open for future research.
Comment: 22 pages
Published: 2024
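
Note: the abstract above relies on rank-1 lattice points. As a rough illustration of that building block only (not the authors' randomized algorithm, and not their component-by-component search), a rank-1 lattice with n points and integer generating vector z is the set of points whose j-th coordinate is the fractional part of k*z_j/n, for k = 0, ..., n-1. The function names, n = 5, and z = (1, 2) below are hypothetical choices made for this sketch.

```python
from fractions import Fraction


def rank1_lattice(n, z):
    """Return the n points (k*z_j/n mod 1)_j for k = 0..n-1, as exact Fractions.

    n and z are illustrative inputs; a real application would choose z
    with a search procedure such as component-by-component construction.
    """
    return [tuple(Fraction((k * zj) % n, n) for zj in z) for k in range(n)]


def lattice_rule(f, n, z):
    """Equal-weight quadrature: the average of f over the lattice points."""
    return sum(f(x) for x in rank1_lattice(n, z)) / n


# Hypothetical example: 5 points in 2 dimensions with generating vector (1, 2).
points = rank1_lattice(5, (1, 2))
mean_first_coord = lattice_rule(lambda x: x[0], 5, (1, 2))
```

Exact fractions are used here only so the points can be inspected without rounding; floating-point arithmetic would be the usual choice in practice.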

9. Deep Heterogeneous Contrastive Hyper-Graph Learning for In-the-Wild Context-Aware Human Activity Recognition

Author: Ge, Wen, Mou, Guanyi, Agu, Emmanuel O., and Lee, Kyumin
Subjects: Computer Science - Machine Learning
Abstract: Human Activity Recognition (HAR) is a challenging, multi-label classification problem as activities may co-occur and sensor signals corresponding to the same activity may vary in different contexts (e.g., different device placements). This paper proposes a Deep Heterogeneous Contrastive Hyper-Graph Learning (DHC-HGL) framework that captures heterogeneous Context-Aware HAR (CA-HAR) hypergraph properties in a message-passing and neighborhood-aggregation fashion. Prior work only explored homogeneous or shallow-node-heterogeneous graphs. DHC-HGL handles heterogeneous CA-HAR data by innovatively 1) Constructing three different types of sub-hypergraphs that are each passed through different custom HyperGraph Convolution (HGC) layers designed to handle edge-heterogeneity and 2) Adopting a contrastive loss function to ensure node-heterogeneity. In rigorous evaluation on two CA-HAR datasets, DHC-HGL significantly outperformed state-of-the-art baselines by 5.8% to 16.7% on Matthews Correlation Coefficient (MCC) and 3.0% to 8.4% on Macro F1 scores. UMAP visualizations of learned CA-HAR node embeddings are also presented to enhance model explainability.
Comment: IMWUT 2023
Published: 2024
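
Note: the abstract above reports gains on the Matthews Correlation Coefficient (MCC). For readers unfamiliar with the metric, the standard binary-case definition is MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The sketch below implements only that binary formula. How the paper aggregates MCC across labels in its multi-label setting is not stated in this record, and the function name and the zero-denominator convention are assumptions of this sketch.

```python
import math


def binary_mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient from binary confusion-matrix counts.

    Returns 0.0 when the denominator is zero, a common convention
    (an assumption here, not something taken from the paper).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


perfect = binary_mcc(tp=5, tn=5, fp=0, fn=0)     # every prediction correct
inverted = binary_mcc(tp=0, tn=0, fp=5, fn=5)    # every prediction wrong
degenerate = binary_mcc(tp=5, tn=0, fp=5, fn=0)  # classifier always predicts positive
```

Unlike accuracy, the score stays at zero for a classifier that always predicts one class, which is one reason it is often reported on imbalanced data.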

10. Heterogeneous Hyper-Graph Neural Networks for Context-aware Human Activity Recognition

Author: Ge, Wen, Mou, Guanyi, Agu, Emmanuel O., and Lee, Kyumin
Subjects: Computer Science - Machine Learning
Abstract: Context-aware Human Activity Recognition (CHAR) is challenging due to the need to recognize the user's current activity from signals that vary significantly with contextual factors such as phone placements and the varied styles with which different users perform the same activity. In this paper, we argue that context-aware activity visit patterns in realistic in-the-wild data can equivocally be considered as a general graph representation learning task. We posit that exploiting underlying graphical patterns in CHAR data can improve CHAR task performance and representation learning. Building on the intuition that certain activities are frequently performed with the phone placed in certain positions, we focus on the context-aware human activity problem of recognizing the tuple. We demonstrate that CHAR data has an underlying graph structure that can be viewed as a heterogeneous hypergraph that has multiple types of nodes and hyperedges (an edge connecting more than two nodes). Subsequently, learning representations becomes a graph node representation learning problem. After task transformation, we further propose a novel Heterogeneous HyperGraph Neural Network architecture for Context-aware Human Activity Recognition (HHGNN-CHAR), with three types of heterogeneous nodes (user, phone placement, and activity). Connections between all types of nodes are represented by hyperedges. Rigorous evaluation demonstrated that on an unscripted, in-the-wild CHAR dataset, our proposed framework significantly outperforms state-of-the-art (SOTA) baselines, including CHAR models that do not exploit graphs and GNN variants that do not incorporate heterogeneous nodes or hyperedges, with overall improvements of 14.04% on Matthews Correlation Coefficient (MCC) and 7.01% on Macro F1 scores.
Comment: PerCom 2023
Published: 2024

11. Reducing and Exploiting Data Augmentation Noise through Meta Reweighting Contrastive Learning for Text Classification

Author: Mou, Guanyi, Li, Yichuan, and Lee, Kyumin
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: Data augmentation has shown its effectiveness in resolving the data-hungry problem and improving models' generalization ability. However, the quality of augmented data can vary, especially compared with the raw/original data. To boost deep learning models' performance given augmented data/samples in text classification tasks, we propose a novel framework, which leverages both meta learning and contrastive learning techniques as parts of our design for reweighting the augmented samples and refining their feature representations based on their quality. As part of the framework, we propose novel weight-dependent enqueue and dequeue algorithms to utilize augmented samples' weight/quality information effectively. Through experiments, we show that our framework can reasonably cooperate with existing deep learning models (e.g., RoBERTa-base and Text-CNN) and augmentation techniques (e.g., Wordnet and Easydata) for specific supervised learning tasks. Experiment results show that our framework achieves an average of 1.6%, up to 4.3% absolute improvement on Text-CNN encoders and an average of 1.4%, up to 4.4% absolute improvement on RoBERTa-base encoders on seven GLUE benchmark datasets compared with the best baseline. We present an in-depth analysis of our framework design, revealing the non-trivial contributions of our network components. Our code is publicly available for better reproducibility.
Comment: IEEE BigData 2021
Published: 2024

12. An Effective, Robust and Fairness-aware Hate Speech Detection Framework

Author: Mou, Guanyi and Lee, Kyumin
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: With the widespread use of online social networks, hate speech is spreading faster and causing more damage than ever before. Existing hate speech detection methods have limitations in several aspects, such as handling data insufficiency, estimating model uncertainty, improving robustness against malicious attacks, and handling unintended bias (i.e., fairness). There is an urgent need for accurate, robust, and fair hate speech classification in online social networks. To bridge the gap, we design a data-augmented, fairness addressed, and uncertainty estimated novel framework. As parts of the framework, we propose Bidirectional Quaternion-Quasi-LSTM layers to balance effectiveness and efficiency. To build a generalized model, we combine five datasets collected from three platforms. Experiment results show that our model outperforms eight state-of-the-art methods under both no attack scenario and various attack scenarios, indicating the effectiveness and robustness of our model. We share our code along with the combined dataset for better future research.
Comment: IEEE BigData 2021
Published: 2024

13. SWE2: SubWord Enriched and Significant Word Emphasized Framework for Hate Speech Detection

Author: Mou, Guanyi, Ye, Pengyi, and Lee, Kyumin
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: Hate speech detection on online social networks has become one of the emerging hot topics in recent years. With its broad spread and fast propagation across online social networks, hate speech has a significant impact on society by increasing prejudice and hurting people. It has therefore attracted attention and concern from both industry and academia. In this paper, we address the hate speech problem and propose a novel hate speech detection framework called SWE2, which only relies on the content of messages and automatically identifies hate speech. In particular, our framework exploits both word-level semantic information and sub-word knowledge. It is intuitively persuasive and also performs well in practice with or without character-level adversarial attack. Experimental results show that our proposed model achieves 0.975 accuracy and 0.953 macro F1, outperforming 7 state-of-the-art baselines under no adversarial attack. Our model also performed robustly under extreme adversarial attack (manipulation of 50% messages), achieving 0.967 accuracy and 0.934 macro F1.
Comment: Published in CIKM 2020
Published: 2024

14. Wildlife Product Trading in Online Social Networks: A Case Study on Ivory-Related Product Sales Promotion Posts

Author: Mou, Guanyi, Yue, Yun, Lee, Kyumin, and Zhang, Ziming
Subjects: Computer Science - Social and Information Networks, Computer Science - Machine Learning
Abstract: Wildlife trafficking (WLT) has emerged as a global issue, with traffickers expanding their operations from offline to online platforms, utilizing e-commerce websites and social networks to enhance their illicit trade. This paper addresses the challenge of detecting and recognizing wildlife product sales promotion behaviors in online social networks, a crucial aspect in combating these environmentally harmful activities. To counter these environmentally damaging illegal operations, in this research, we focus on wildlife product sales promotion behaviors in online social networks. Specifically, 1) A scalable dataset related to wildlife product trading is collected using a network-based approach. This dataset is labeled through a human-in-the-loop machine learning process, distinguishing positive class samples containing wildlife product selling posts and hard-negatives representing normal posts misclassified as potential WLT posts, subsequently corrected by human annotators. 2) We benchmark the machine learning results on the proposed dataset and build a practical framework that automatically identifies suspicious wildlife selling posts and accounts, sufficiently leveraging the multi-modal nature of online social networks. 3) This research delves into an in-depth analysis of trading posts, shedding light on the systematic and organized selling behaviors prevalent in the current landscape. We provide detailed insights into the nature of these behaviors, contributing valuable information for understanding and countering illegal wildlife product trading.
Comment: ICWSM 2024
Published: 2024

15. HelloBench: Evaluating Long Text Generation Capabilities of Large Language Models

Author: Que, Haoran, Duan, Feiyu, He, Liqun, Mou, Yutao, Zhou, Wangchunshu, Liu, Jiaheng, Rong, Wenge, Wang, Zekun Moore, Yang, Jian, Zhang, Ge, Peng, Junran, Zhang, Zhaoxiang, Zhang, Songyang, and Chen, Kai
Subjects: Computer Science - Computation and Language
Abstract: In recent years, Large Language Models (LLMs) have demonstrated remarkable capabilities in various tasks (e.g., long-context understanding), and many benchmarks have been proposed. However, we observe that long text generation capabilities are not well investigated. Therefore, we introduce the Hierarchical Long Text Generation Benchmark (HelloBench), a comprehensive, in-the-wild, and open-ended benchmark to evaluate LLMs' performance in generating long text. Based on Bloom's Taxonomy, HelloBench categorizes long text generation tasks into five subtasks: open-ended QA, summarization, chat, text completion, and heuristic text generation. Besides, we propose Hierarchical Long Text Evaluation (HelloEval), a human-aligned evaluation method that significantly reduces the time and effort required for human evaluation while maintaining a high correlation with human evaluation. We have conducted extensive experiments across around 30 mainstream LLMs and observed that the current LLMs lack long text generation capabilities. Specifically, first, regardless of whether the instructions include explicit or implicit length constraints, we observe that most LLMs cannot generate text that is longer than 4000 words. Second, we observe that while some LLMs can generate longer text, many issues exist (e.g., severe repetition and quality degradation). Third, to demonstrate the effectiveness of HelloEval, we compare HelloEval with traditional metrics (e.g., ROUGE, BLEU, etc.) and LLM-as-a-Judge methods, which show that HelloEval has the highest correlation with human evaluation. We release our code in https://github.com/Quehry/HelloBench.
Published: 2024

16. Scattering diagrams, tight gradings, and generalized positivity

Author: Burcroff, Amanda, Lee, Kyungyong, and Mou, Lang
Subjects: Mathematics - Combinatorics, Mathematics - Commutative Algebra, Mathematics - Algebraic Geometry, Mathematics - Rings and Algebras, Mathematics - Representation Theory, 13F60, 05E10, 14N35
Abstract: In 2013, Lee, Li, and Zelevinsky introduced combinatorial objects called compatible pairs to construct the greedy bases for rank-2 cluster algebras, consisting of indecomposable positive elements including the cluster monomials. Subsequently, Rupel extended this construction to the setting of generalized rank-2 cluster algebras by defining compatible gradings. We discover a new class of combinatorial objects which we call tight gradings. Using this, we give a directly computable, manifestly positive, and elementary but highly nontrivial formula describing rank-2 consistent scattering diagrams. This allows us to show that the coefficients of the wall-functions on a generalized cluster scattering diagram of any rank are positive, which implies the Laurent positivity for generalized cluster algebras and the strong positivity of their theta bases.
Comment: 26 pages, this is an announcement of results with a full paper coming soon. Comments welcome
Published: 2024

17. LLMR: Knowledge Distillation with a Large Language Model-Induced Reward

Author: Li, Dongheng, Hao, Yongchang, and Mou, Lili
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Large language models have become increasingly popular and demonstrated remarkable performance in various natural language processing (NLP) tasks. However, these models are typically computationally expensive and difficult to deploy in resource-constrained environments. In this paper, we propose LLMR, a novel knowledge distillation (KD) method based on a reward function induced from large language models. We conducted experiments on multiple datasets in the dialogue generation and summarization tasks. Empirical results demonstrate that our LLMR approach consistently outperforms traditional KD methods in different tasks and datasets.
Comment: Accepted by LERC COLING 2024
Published: 2024

18. Distributed Deep Koopman Learning for Nonlinear Dynamics

Author: Hao, Wenjian, Wang, Lili, Rai, Ayush, and Mou, Shaoshuai
Subjects: Electrical Engineering and Systems Science - Systems and Control
Abstract: Koopman operator theory has proven to be highly significant in system identification, even for challenging scenarios involving nonlinear time-varying systems (NTVS). In this context, we examine a network of connected agents, each with limited observation capabilities, aiming to estimate the dynamics of an NTVS collaboratively. Drawing inspiration from Koopman operator theory, deep neural networks, and distributed consensus, we introduce a distributed algorithm for deep Koopman learning of the dynamics of an NTVS. This approach enables individual agents to approximate the entire dynamics despite having access to only partial state observations. We guarantee consensus not only on the estimated dynamics but also on its structure, i.e., the matrices encountered in the linear equation of the lifted Koopman system. We provide theoretical insights into the convergence of the learning process and accompanying numerical simulations.
Published: 2024

19. Ideal flat and resolved SU(3) Landau levels in three dimensions

Author: Peng, Mian, Wei, Qiang, Yuan, Jiale, Wang, Da-Wei, Yan, Mou, Cai, Han, and Chen, Gang
Subjects: Condensed Matter - Mesoscale and Nanoscale Physics, Quantum Physics
Abstract: Landau levels (LLs) are of great importance for understanding the quantum Hall effect and associated many-body physics. Recently, their three-dimensional (3D) counterparts, i.e., dispersionless 3D LLs with well-defined quantum numbers, have attracted significant attention but have not yet been reported. Here we theoretically propose and experimentally observe 3D LLs with a sharply quantized spectrum in a diamond acoustic lattice, where the eigenstates are characterized by SU(3) quantum numbers. The engineered inhomogeneous hopping strengths not only introduce pseudomagnetic fields that quantize the nodal lines into LLs but also provide three bosonic degrees of freedom, embedding a generic SU(3) symmetry into the LLs. Using a phased array of acoustic sources, we selectively excite distinct eigenstates within the degenerate LL multiplets and visualize their 3D eigenmodes. Importantly, our approach enables the precise reconstruction of SU(3) quantum numbers directly from eigenmode correlations. Our results establish SU(3) LLs as a tractable model in artificial platforms, and pave the way for synthesizing LLs with zero dispersion and countable quantum numbers in arbitrary dimensions.
Comment: 6 pages, 4 figures
Published: 2024

20. HOLA-Drone: Hypergraphic Open-ended Learning for Zero-Shot Multi-Drone Cooperative Pursuit

Author: Li, Yang, Zhang, Dengyu, Chen, Junfan, Wen, Ying, Zhang, Qingrui, Mou, Shaoshuai, and Pan, Wei
Subjects: Computer Science - Robotics, Computer Science - Artificial Intelligence
Abstract: Zero-shot coordination (ZSC) is a significant challenge in multi-agent collaboration, aiming to develop agents that can coordinate with unseen partners they have not encountered before. Recent cutting-edge ZSC methods have primarily focused on two-player video games such as OverCooked!2 and Hanabi. In this paper, we extend the scope of ZSC research to the multi-drone cooperative pursuit scenario, exploring how to construct a drone agent capable of coordinating with multiple unseen partners to capture multiple evaders. We propose a novel Hypergraphic Open-ended Learning Algorithm (HOLA-Drone) that continuously adapts the learning objective based on our hypergraphic-form game modeling, aiming to improve cooperative abilities with multiple unknown drone teammates. To empirically verify the effectiveness of HOLA-Drone, we build two different unseen drone teammate pools to evaluate their performance in coordination with various unseen partners. The experimental results demonstrate that HOLA-Drone outperforms the baseline methods in coordination with unseen drone teammates. Furthermore, real-world experiments validate the feasibility of HOLA-Drone in physical systems. Videos can be found on the project homepage: https://sites.google.com/view/hola-drone
Comment: 10 pages
Published: 2024

21. Robust Robot Walker: Learning Agile Locomotion over Tiny Traps

Author: Zhu, Shaoting, Huang, Runhan, Mou, Linzhan, and Zhao, Hang
Subjects: Computer Science - Robotics, Computer Science - Artificial Intelligence
Abstract: Quadruped robots must exhibit robust walking capabilities in practical applications. In this work, we propose a novel approach that enables quadruped robots to pass various small obstacles, or "tiny traps". Existing methods often rely on exteroceptive sensors, which can be unreliable for detecting such tiny traps. To overcome this limitation, our approach focuses solely on proprioceptive inputs. We introduce a two-stage training framework incorporating a contact encoder and a classification head to learn implicit representations of different traps. Additionally, we design a set of tailored reward functions to improve both the stability of training and the ease of deployment for goal-tracking tasks. To benefit further research, we design a new benchmark for the tiny trap task. Extensive experiments in both simulation and real-world settings demonstrate the effectiveness and robustness of our method. Project Page: https://robust-robot-walker.github.io/
Comment: 10 pages, 17 figures
Published: 2024

22. Generic bases of skew-symmetrizable affine type cluster algebras

Author: Mou, Lang and Su, Xiuping
Subjects: Mathematics - Representation Theory, 13F60
Abstract: Geiss, Leclerc and Schröer introduced a class of 1-Iwanaga-Gorenstein algebras $H$ associated to symmetrizable Cartan matrices with acyclic orientations, generalizing the path algebras of acyclic quivers. They also proved that indecomposable rigid $H$-modules of finite projective dimension are in bijection with non-initial cluster variables of the corresponding Fomin-Zelevinsky cluster algebra. In this article, we prove in all affine types that their conjectural Caldero-Chapoton type formula on these modules coincides with the Laurent expression of cluster variables. By taking generic Caldero-Chapoton functions on varieties of modules of finite projective dimension, we obtain bases for affine type cluster algebras with full-rank coefficients containing all cluster monomials., Comment: 23 pages
Published: 2024

23. Distributed Optimization under Edge Agreement with Application in Battery Network Management

Author: Lu, Zehui and Mou, Shaoshuai
Subjects: Mathematics - Optimization and Control, Electrical Engineering and Systems Science - Systems and Control
Abstract: This paper investigates a distributed optimization problem under edge agreements, where each agent in the network is also subject to local convex constraints. Generalized from the concept of consensus, a group of edge agreements represents the constraints defined for neighboring agents, with each pair of neighboring agents required to satisfy one edge agreement constraint. Edge agreements are defined locally to allow more flexibility than a global consensus, enabling heterogeneous coordination within the network. This paper proposes a discrete-time algorithm to solve such problems, providing a theoretical analysis to prove its convergence. Additionally, this paper illustrates the connection between the theory of distributed optimization under edge agreements and distributed model predictive control through a distributed battery network energy management problem. This approach enables a new perspective to formulate and solve network control and optimization problems.
Published: 2024

24. Uni-3DAD: GAN-Inversion Aided Universal 3D Anomaly Detection on Model-free Products

Author: Liu, Jiayu, Mou, Shancong, Gaw, Nathan, and Wang, Yinan
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
Abstract: Anomaly detection is a long-standing challenge in manufacturing systems. Traditionally, anomaly detection has relied on human inspectors. However, 3D point clouds have gained attention due to their robustness to environmental factors and their ability to represent geometric data. Existing 3D anomaly detection methods generally fall into two categories. One compares scanned 3D point clouds with design files, assuming these files are always available. However, such assumptions are often violated in many real-world applications where model-free products exist, such as fresh produce (e.g., "Cookie", "Potato"), dentures, bone, etc. The other category compares patches of scanned 3D point clouds with a library of normal patches named memory bank. However, those methods usually fail to detect incomplete shapes, which is a fairly common defect type (i.e., missing pieces of different products). The main challenge is that missing areas in 3D point clouds represent the absence of scanned points. This makes it infeasible to compare the missing region with existing point cloud patches in the memory bank. To address these two challenges, we propose a unified, unsupervised 3D anomaly detection framework capable of identifying all types of defects on model-free products. Our method integrates two detection modules: a feature-based detection module and a reconstruction-based detection module. Feature-based detection covers geometric defects, such as dents, holes, and cracks, while the reconstruction-based method detects missing regions. Additionally, we employ a One-class Support Vector Machine (OCSVM) to fuse the detection results from both modules. The results demonstrate that (1) our proposed method outperforms the state-of-the-art methods in identifying incomplete shapes and (2) it still maintains comparable performance with the SOTA methods in detecting all other types of anomalies.
Published: 2024

25. Performance Analysis of Photon-Limited Free-Space Optical Communications with Practical Photon-Counting Receivers

Author: Wang, Chen, Xu, Zhiyong, Wang, Jingyuan, Li, Jianhua, Mou, Weifeng, Zhu, Huatao, Zhao, Jiyong, Su, Yang, Wang, Yimin, and Qi, Ailin
Subjects: Electrical Engineering and Systems Science - Signal Processing
Abstract: The non-perfect factors of practical photon-counting receiver are recognized as a significant challenge for long-distance photon-limited free-space optical (FSO) communication systems. This paper presents a comprehensive analytical framework for modeling the statistical properties of time-gated single-photon avalanche diode (TG-SPAD) based photon-counting receivers in presence of dead time, non-photon-number-resolving and afterpulsing effect. Drawing upon the non-Markovian characteristic of afterpulsing effect, we formulate a closed-form approximation for the probability mass function (PMF) of photon counts, when high-order pulse amplitude modulation (PAM) is used. Unlike the photon counts from a perfect photon-counting receiver, which adhere to a Poisson arrival process, the photon counts from a practical TG-SPAD based receiver are instead approximated by a binomial distribution. Additionally, by employing the maximum likelihood (ML) criterion, we derive a refined closed-form formula for determining the threshold in high-order PAM, thereby facilitating the development of an analytical model for the symbol error rate (SER). Utilizing this analytical SER model, the system performance is investigated. The numerical results underscore the crucial need to suppress background radiation below the tolerated threshold and to maintain a sufficient number of gates in order to achieve a target SER.
Published: 2024

26. DimeRec: A Unified Framework for Enhanced Sequential Recommendation via Generative Diffusion Models

Author: Li, Wuchao, Huang, Rui, Zhao, Haijun, Liu, Chi, Zheng, Kai, Liu, Qi, Mou, Na, Zhou, Guorui, Lian, Defu, Song, Yang, Bao, Wentian, Yu, Enyun, and Ou, Wenwu
Subjects: Computer Science - Information Retrieval, Computer Science - Machine Learning
Abstract: Sequential Recommendation (SR) plays a pivotal role in recommender systems by tailoring recommendations to user preferences based on their non-stationary historical interactions. Achieving high-quality performance in SR requires attention to both item representation and diversity. However, designing an SR method that simultaneously optimizes these merits remains a long-standing challenge. In this study, we address this issue by integrating recent generative Diffusion Models (DM) into SR. DM has demonstrated utility in representation learning and diverse image generation. Nevertheless, a straightforward combination of SR and DM leads to sub-optimal performance due to discrepancies in learning objectives (recommendation vs. noise reconstruction) and the respective learning spaces (non-stationary vs. stationary). To overcome this, we propose a novel framework called DimeRec (Diffusion with multi-interest enhanced Recommender). DimeRec synergistically combines a guidance extraction module (GEM) and a generative diffusion aggregation module (DAM). The GEM extracts crucial stationary guidance signals from the user's non-stationary interaction history, while the DAM employs a generative diffusion process conditioned on GEM's outputs to reconstruct and generate consistent recommendations. Our numerical experiments demonstrate that DimeRec significantly outperforms established baseline methods across three publicly available datasets. Furthermore, we have successfully deployed DimeRec on a large-scale short video recommendation platform, serving hundreds of millions of users. Live A/B testing confirms that our method improves both users' time spent and result diversification.
Published: 2024

27. Multichannel Attention Networks with Ensembled Transfer Learning to Recognize Bangla Handwritten Charecter

Author: Haque, Farhanul, Al-Hasan, Md., Mou, Sumaiya Tabssum, Miah, Abu Saleh Musa, Shin, Jungpil, and Rahim, Md Abdur
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: The Bengali language is the 5th most spoken native and 7th most spoken language in the world, and Bengali handwritten character recognition has attracted researchers for decades. However, character recognition for other languages such as English, Arabic, Turkish, and Chinese has contributed significantly to developing handwriting recognition systems. Still, little research has been done on Bengali character recognition because of the similarity of the characters, curvature and other complexities. However, many researchers have used traditional machine learning and deep learning models to conduct Bengali handwritten recognition. The study employed a convolutional neural network (CNN) with ensemble transfer learning and a multichannel attention network. We generated the features from the two branches of the CNN, including Inception Net and ResNet, and then produced an ensemble feature fusion by concatenating them. After that, we applied the attention module to produce the contextual information from the ensemble features. Finally, we applied a classification module to refine the features and classification. We evaluated the proposed model using the CAMTERdb 3.1.2 data set and achieved 92% accuracy for the raw dataset and 98.00% for the preprocessed dataset. We believe that our contribution to the Bengali handwritten character recognition domain will be considered a great development.
Published: 2024

28. A Novel Signal Detection Method for Photon-Counting Communications with Nonlinear Distortion Effects

Author: Wang, Chen, Xu, Zhiyong, Wang, Jingyuan, Li, Jianhua, Mou, Weifeng, Zhu, Huatao, Zhao, Jiyong, Su, Yang, Wang, Yimin, and Qi, Ailin
Subjects: Electrical Engineering and Systems Science - Signal Processing
Abstract: This paper proposes a method for estimating and detecting optical signals in practical photon-counting receivers. There are two important aspects of non-perfect photon-counting receivers, namely, (i) dead time which results in blocking loss, and (ii) non-photon-number-resolving, which leads to counting loss during the gate-ON interval. These factors introduce nonlinear distortion to the detected photon counts. The detected photon counts depend not only on the optical intensity but also on the signal waveform, and obey a Poisson binomial process. Using the discrete Fourier transform characteristic function (DFT-CF) method, we derive the probability mass function (PMF) of the detected photon counts. Furthermore, unlike conventional methods that assume an ideal rectangle wave, we propose a novel signal estimation and decision method applicable to arbitrary waveform. We demonstrate that the proposed method achieves superior error performance compared to conventional methods. The proposed algorithm has the potential to become an essential signal processing tool for photon-counting receivers.
Published: 2024

29. FFAA: Multimodal Large Language Model based Explainable Open-World Face Forgery Analysis Assistant

Author: Huang, Zhengchao, Xia, Bin, Lin, Zicheng, Mou, Zhun, and Yang, Wenming
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
Abstract: The rapid advancement of deepfake technologies has sparked widespread public concern, particularly as face forgery poses a serious threat to public information security. However, the unknown and diverse forgery techniques, varied facial features and complex environmental factors pose significant challenges for face forgery analysis. Existing datasets lack descriptions of these aspects, making it difficult for models to distinguish between real and forged faces using only visual information amid various confounding factors. In addition, existing methods do not yield user-friendly and explainable results, complicating the understanding of the model's decision-making process. To address these challenges, we introduce a novel Open-World Face Forgery Analysis VQA (OW-FFA-VQA) task and the corresponding benchmark. To tackle this task, we first establish a dataset featuring a diverse collection of real and forged face images with essential descriptions and reliable forgery reasoning. Based on this dataset, we introduce FFAA: Face Forgery Analysis Assistant, consisting of a fine-tuned Multimodal Large Language Model (MLLM) and Multi-answer Intelligent Decision System (MIDS). By integrating hypothetical prompts with MIDS, the impact of fuzzy classification boundaries is effectively mitigated, enhancing the model's robustness. Extensive experiments demonstrate that our method not only provides user-friendly explainable results but also significantly boosts accuracy and robustness compared to previous methods., Comment: 17 pages, 18 figures; project page: https://ffaa-vl.github.io
Published: 2024

30. Image-based Freeform Handwriting Authentication with Energy-oriented Self-Supervised Learning

Author: Wang, Jingyao, Mou, Luntian, Zheng, Changwen, and Gao, Wen
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Freeform handwriting authentication verifies a person's identity from their writing style and habits in messy handwriting data. This technique has gained widespread attention in recent years as a valuable tool for various fields, e.g., fraud prevention and cultural heritage protection. However, it still remains a challenging task in reality due to three reasons: (i) severe damage, (ii) complex high-dimensional features, and (iii) lack of supervision. To address these issues, we propose SherlockNet, an energy-oriented two-branch contrastive self-supervised learning framework for robust and fast freeform handwriting authentication. It consists of four stages: (i) pre-processing: converting manuscripts into energy distributions using a novel plug-and-play energy-oriented operator to eliminate the influence of noise; (ii) generalized pre-training: learning general representation through two-branch momentum-based adaptive contrastive learning with the energy distributions, which handles the high-dimensional features and spatial dependencies of handwriting; (iii) personalized fine-tuning: calibrating the learned knowledge using a small amount of labeled data from downstream tasks; and (iv) practical application: identifying individual handwriting from scrambled, missing, or forged data efficiently and conveniently. Considering the practicality, we construct EN-HA, a novel dataset that simulates data forgery and severe damage in real applications. Finally, we conduct extensive experiments on six benchmark datasets including our EN-HA, and the results prove the robustness and efficiency of SherlockNet., Comment: Accepted by TMM
Published: 2024

31. ReToMe-VA: Recursive Token Merging for Video Diffusion-based Unrestricted Adversarial Attack

Author: Gao, Ziyi, Chen, Kai, Wei, Zhipeng, Mou, Tingshu, Chen, Jingjing, Tan, Zhiyu, Li, Hao, and Jiang, Yu-Gang
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Recent diffusion-based unrestricted attacks generate imperceptible adversarial examples with high transferability compared to previous unrestricted attacks and restricted attacks. However, existing works on diffusion-based unrestricted attacks are mostly focused on images yet are seldom explored in videos. In this paper, we propose the Recursive Token Merging for Video Diffusion-based Unrestricted Adversarial Attack (ReToMe-VA), which is the first framework to generate imperceptible adversarial video clips with higher transferability. Specifically, to achieve spatial imperceptibility, ReToMe-VA adopts a Timestep-wise Adversarial Latent Optimization (TALO) strategy that optimizes perturbations in diffusion models' latent space at each denoising step. TALO offers iterative and accurate updates to generate more powerful adversarial frames. TALO can further reduce memory consumption in gradient computation. Moreover, to achieve temporal imperceptibility, ReToMe-VA introduces a Recursive Token Merging (ReToMe) mechanism by matching and merging tokens across video frames in the self-attention module, resulting in temporally consistent adversarial videos. ReToMe concurrently facilitates inter-frame interactions into the attack process, inducing more diverse and robust gradients, thus leading to better adversarial transferability. Extensive experiments demonstrate the efficacy of ReToMe-VA, particularly in surpassing state-of-the-art attacks in adversarial transferability by more than 14.16% on average.
Published: 2024

32. The Transition from Galaxy-wide Gas Inflow to Outflow in Quasar Host Galaxies

Author: He, Zhicheng, Chen, Zhifu, Liu, Guilin, Wang, Tinggui, Ho, Luis C., Wang, Junxian, Bian, Weihao, Cai, Zheng, Mou, Guobin, Gu, Qiusheng, and Wang, Zhiwen
Subjects: Astrophysics - Astrophysics of Galaxies
Abstract: Galactic-wide outflows driven by active galactic nuclei (AGNs) are a routinely invoked feedback mechanism in galaxy evolution models. Hitherto, the interplay among the interstellar gas on galactic scales, the propagation of AGN outflows and the fundamental AGN parameters during evolution remains elusive. Powerful nuclear outflows are found to favorably exist at early AGN stages usually associated with high accretion rates and weak narrow emission lines. In a sample of quasars emitting Mg II narrow absorption lines (NALs) from the Sloan Digital Sky Survey, we discover an unprecedented phenomenon in which galaxy-scale gas transforms from inflow-dominated to outflow-dominated, accompanied by an increasing strength of the narrow [O III] line, at a confidence level of 6.7σ. The fact that nuclear outflows diminish while galaxy-wide outflows intensify as AGNs evolve implies that early-stage outflows interact with interstellar medium on galactic scales and trigger the gradual transformation into galaxy-wide outflows, providing observational links to the hypothetical multi-stage propagation of AGN outflows that globally regulates galaxy evolution., Comment: Accepted by SCIENCE CHINA Physics, Mechanics & Astronomy; 15 pages, 4 figures and 6 appendix figures
Published: 2024

33. A Stochastic Precipitating Quasi-Geostrophic Model

Author: Chen, Nan, Mou, Changhong, Smith, Leslie M., and Zhang, Yeyu
Subjects: Physics - Fluid Dynamics, Mathematics - Numerical Analysis, Physics - Geophysics
Abstract: Efficient and effective modeling of complex systems, incorporating cloud physics and precipitation, is essential for accurate climate modeling and forecasting. However, simulating these systems is computationally demanding since microphysics has crucial contributions to the dynamics of moisture and precipitation. In this paper, appropriate stochastic models are developed for the phase-transition dynamics of water, focusing on the precipitating quasi-geostrophic (PQG) model as a prototype. By treating the moisture, phase transitions, and latent heat release as integral components of the system, the PQG model constitutes a set of partial differential equations (PDEs) that involve Heaviside nonlinearities due to phase changes of water. Despite systematically characterizing the precipitation physics, expensive iterative algorithms are needed to find a PDE inversion at each numerical integration time step. As a crucial step toward building an effective stochastic model, a computationally efficient Markov jump process is designed to randomly simulate transitions between saturated and unsaturated states that avoids using the expensive iterative solver. The transition rates, which are deterministic, are derived from the physical fields, guaranteeing physical and statistical consistency with nature. Furthermore, to maintain the consistent spatial pattern of precipitation, the stochastic model incorporates an adaptive parameterization that automatically adjusts the transitions based on spatial information. Numerical tests show the stochastic model retains critical properties of the original PQG system while significantly reducing computational demands. It accurately captures observed precipitation patterns, including the spatial distribution and temporal variability of rainfall, alongside reproducing essential dynamic features such as potential vorticity fields and zonal mean flows.
Published: 2024

34. Shrinking Coarsened Win Ratio and Testing of Composite Endpoint

Author: Mou, Yunhan, Kyriakides, Tassos, Hummel, Scott, Li, Fan, and Huang, Yuan
Subjects: Statistics - Methodology
Abstract: Composite endpoints consisting of both terminal and non-terminal events, such as death and hospitalization, are frequently used as primary endpoints in cardiovascular clinical trials. The Win Ratio method (WR) proposed by Pocock et al. (2012) [1] employs a hierarchical structure to combine fatal and non-fatal events by giving death information an absolute priority, which adversely affects power if the treatment effect is mainly on the non-fatal outcomes. We hereby propose the Shrinking Coarsened Win Ratio method (SCWR) that releases the strict hierarchical structure of the standard WR by adding stages with coarsened thresholds shrinking to zero. A weighted adaptive approach is developed to determine the thresholds in SCWR. This method preserves the good statistical properties of the standard WR and has a greater capacity to detect treatment effects on non-fatal events. We show that SCWR has an overall more favorable performance than WR in our simulation that addresses the influence of follow-up time, the association between events, and the treatment effect levels, as well as a case study based on the Digitalis Investigation Group clinical trial data.
Published: 2024
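
The hierarchical comparison behind the standard Win Ratio described in the abstract above can be illustrated with a minimal sketch. This is not the authors' SCWR method and not a reference implementation: it assumes every patient is fully observed (no censoring), compares all treated-control pairs, and encodes an event that never occurs as `math.inf`. The names `compare_pair` and `win_ratio` are hypothetical.

```python
import math

def compare_pair(treated, control):
    """Compare two patients on outcomes ordered by priority.

    Each patient is a tuple (death_time, hospitalization_time); a later
    event time is better, and math.inf means the event never occurred.
    Returns +1 if the treated patient wins, -1 if the control patient
    wins, and 0 if the pair is tied on every outcome.
    """
    for t, c in zip(treated, control):
        if t > c:
            return 1
        if t < c:
            return -1
        # Tied on this outcome: fall through to the next, lower-priority one.
    return 0

def win_ratio(treated_group, control_group):
    """Wins divided by losses over all treated-control pairs (simplified)."""
    wins = losses = 0
    for t in treated_group:
        for c in control_group:
            result = compare_pair(t, c)
            wins += result == 1
            losses += result == -1
    return wins / losses if losses else math.inf
```

Because death is always compared first, a difference in hospitalization time only matters when the pair is tied on death. This strict ordering is the property that the abstract says SCWR relaxes.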

35. SARO: Space-Aware Robot System for Terrain Crossing via Vision-Language Model

Author: Zhu, Shaoting, Li, Derun, Mou, Linzhan, Liu, Yong, Xu, Ningyi, and Zhao, Hang
Subjects: Computer Science - Robotics
Abstract: The application of vision-language models (VLMs) has achieved impressive success in various robotics tasks. However, there are few explorations for these foundation models used in quadruped robot navigation through terrains in 3D environments. In this work, we introduce SARO (Space Aware Robot System for Terrain Crossing), an innovative system composed of a high-level reasoning module, a closed-loop sub-task execution module, and a low-level control policy. It enables the robot to navigate across 3D terrains and reach the goal position. For high-level reasoning and execution, we propose a novel algorithmic system taking advantage of a VLM, with a design of task decomposition and a closed-loop sub-task execution mechanism. For low-level locomotion control, we utilize the Probability Annealing Selection (PAS) method to effectively train a control policy by reinforcement learning. Numerous experiments show that our whole system can accurately and robustly navigate across several 3D terrains, and its generalization ability ensures the applications in diverse indoor and outdoor scenarios and terrains. Project page: https://saro-vlm.github.io/, Comment: 12 pages, 9 figures
Published: 2024

36. Puzzle Ideals for Grassmannians

Author: Mou, Chenqi and Shang, Weifeng
Subjects: Mathematics - Combinatorics, Computer Science - Symbolic Computation, Mathematics - Commutative Algebra, 05E14 (Primary) 13F20, 14N15 (Secondary)
Abstract: Puzzles are a versatile combinatorial tool to interpret the Littlewood-Richardson coefficients for Grassmannians. In this paper, we propose the concept of puzzle ideals whose varieties correspond one-to-one to the tilings of puzzles and present an algebraic framework to construct the puzzle ideals which works with the Knutson-Tao-Woodward puzzle and its $T$-equivariant and $K$-theoretic variants for Grassmannians. For puzzles for which one side is free, we propose the side-free puzzle ideals whose varieties correspond one-to-one to the tilings of side-free puzzles, and the elimination ideals of the side-free puzzle ideals contain all the information of the structure constants for Grassmannians with respect to the free side. Besides their underlying algebraic importance, these puzzle ideals make it computationally feasible to find all the tilings of the puzzles for Grassmannians by solving the defining polynomial systems, as demonstrated with illustrative puzzles via computation of Gröbner bases., Comment: 40 pages, 21 figures
Published: 2024

37. Triggering the Untriggered: The First Einstein Probe-Detected Gamma-Ray Burst 240219A and Its Implications

Author: Yin, Yi-Han Iris, Zhang, Bin-Bin, Yang, Jun, Sun, Hui, Zhang, Chen, Shao, Yi-Xuan, Hu, You-Dong, Zhu, Zi-Pei, Xu, Dong, An, Li, Gao, He, Wu, Xue-Feng, Zhang, Bing, Castro-Tirado, Alberto Javier, Pandey, Shashi B., Rau, Arne, Lei, Weihua, Xie, Wei, Ghirlanda, Giancarlo, Piro, Luigi, O'Brien, Paul, Troja, Eleonora, Jonker, Peter, Yu, Yun-Wei, An, Jie, Chen, Run-Chao, Chen, Yi-Jing, Dong, Xiao-Fei, Eyles-Ferris, Rob, Fan, Zhou, Fu, Shao-Yu, Fynbo, Johan P. U., Gao, Xing, Huang, Yong-Feng, Jiang, Shuai-Qing, Jiang, Ya-Hui, Julakanti, Yashaswi, Kuulkers, Erik, Lao, Qing-Hui, Li, Dongyue, Ling, Zhi-Xing, Liu, Xing, Liu, Yuan, Mou, Jia-Yu, Varun, Wei, Daming, Wu, Qinyu, Yadav, Muskan, Yang, Yu-Han, Yuan, Weimin, and Zhang, Shuang-Nan
Subjects: Astrophysics - High Energy Astrophysical Phenomena
Abstract: The Einstein Probe (EP) achieved its first detection and localization of a bright X-ray flare, EP240219a, on February 19, 2024, during its commissioning phase. Subsequent targeted searches triggered by the EP240219a alert identified a faint, untriggered gamma-ray burst (GRB) in the archived data of Fermi/GBM, Swift/BAT, Insight-HXMT/HE and INTEGRAL/SPI-ACS. The EP/WXT light curve reveals a long duration of approximately 160 seconds with a slow decay, whereas the Fermi/GBM light curve shows a total duration of approximately 70 seconds. The peak in the Fermi/GBM light curve occurs slightly later with respect to the peak seen in the EP/WXT light curve. Our spectral analysis shows that a single cutoff power-law model effectively describes the joint EP/WXT-Fermi/GBM spectra in general, indicating coherent broad emission typical of GRBs. The model yielded a photon index of $\sim -1.70 \pm 0.05$ and a peak energy of $\sim 257 \pm 134$ keV. After detection of GRB 240219A, long-term observations identified several candidates in optical and radio wavelengths, none of which was confirmed as the afterglow counterpart during subsequent optical and near-infrared follow-ups. The analysis of GRB 240219A classifies it as an X-ray rich GRB with a high peak energy, presenting both challenges and opportunities for studying the physical origins of X-ray flashes (XRFs), X-ray rich GRBs (XRRs), and classical GRBs (C-GRBs). Furthermore, linking the cutoff power-law component to non-thermal synchrotron radiation suggests that the burst is driven by a Poynting flux-dominated outflow., Comment: 14 pages, 8 figures, 3 tables
Published: 2024

38. Let Occ Flow: Self-Supervised 3D Occupancy Flow Prediction

Author: Liu, Yili, Mou, Linzhan, Yu, Xuan, Han, Chenrui, Mao, Sitong, Xiong, Rong, and Wang, Yue
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Accurate perception of the dynamic environment is a fundamental task for autonomous driving and robot systems. This paper introduces Let Occ Flow, the first self-supervised work for joint 3D occupancy and occupancy flow prediction using only camera inputs, eliminating the need for 3D annotations. Utilizing TPV for unified scene representation and deformable attention layers for feature aggregation, our approach incorporates a novel attention-based temporal fusion module to capture dynamic object dependencies, followed by a 3D refine module for fine-grained volumetric representation. Besides, our method extends differentiable rendering to 3D volumetric flow fields, leveraging zero-shot 2D segmentation and optical flow cues for dynamic decomposition and motion optimization. Extensive experiments on nuScenes and KITTI datasets demonstrate the competitive performance of our approach over prior state-of-the-art methods. Our project page is available at https://eliliu2233.github.io/letoccflow/, Comment: Accepted to CoRL 2024
Published: 2024

39. On Bellman equations for continuous-time policy evaluation I: discretization and approximation

Author: Mou, Wenlong and Zhu, Yuhua
Subjects: Computer Science - Machine Learning, Mathematics - Numerical Analysis, Mathematics - Optimization and Control, Mathematics - Probability
Abstract: We study the problem of computing the value function from a discretely-observed trajectory of a continuous-time diffusion process. We develop a new class of algorithms based on easily implementable numerical schemes that are compatible with discrete-time reinforcement learning (RL) with function approximation. We establish high-order numerical accuracy as well as the approximation error guarantees for the proposed approach. In contrast to discrete-time RL problems where the approximation factor depends on the effective horizon, we obtain a bounded approximation factor using the underlying elliptic structures, even if the effective horizon diverges to infinity., Comment: WM and YZ contributed equally to this work
Published: 2024

40. On well (edge) dominated and equimatchable strong product graphs

Author: Cao, Yixin, Mou, Guiqiang, and Wang, Jianxin
Subjects: Mathematics - Combinatorics
Abstract: A graph is well-(edge-)dominated if every minimal (edge) dominating set is minimum. A graph is equimatchable if every maximal matching is maximum. We study these concepts on strong product graphs. We fully characterize well-edge-dominated and equimatchable strong product graphs of nontrivial graphs, and identify a large family of graphs whose strong products with any well-dominated graph are well-dominated.
Published: 2024
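
The definition of an equimatchable graph quoted in the abstract above (every maximal matching is maximum) can be checked directly on small graphs. The following brute-force sketch is only an illustration of that definition, not a method from the paper. It assumes a simple undirected graph given as a list of edges, and the function names are hypothetical. It enumerates every subset of edges, so it is practical only for very small graphs.

```python
from itertools import combinations

def is_matching(edge_subset):
    """True if no two edges in the subset share a vertex."""
    seen = set()
    for u, v in edge_subset:
        if u in seen or v in seen:
            return False
        seen.update((u, v))
    return True

def maximal_matching_sizes(edges):
    """Return the set of sizes of all maximal matchings (brute force)."""
    edges = [tuple(e) for e in edges]
    sizes = set()
    for k in range(len(edges) + 1):
        for subset in combinations(edges, k):
            if not is_matching(subset):
                continue
            covered = {x for e in subset for x in e}
            # A matching is maximal when every edge of the graph touches
            # a matched vertex, so no further edge can be added to it.
            if all(u in covered or v in covered for u, v in edges):
                sizes.add(k)
    return sizes

def is_equimatchable(edges):
    """True if all maximal matchings have the same size."""
    return len(maximal_matching_sizes(edges)) <= 1
```

For example, the path on four vertices is not equimatchable: its middle edge alone is a maximal matching of size 1, while its two end edges form a matching of size 2. The 4-cycle is equimatchable, since every maximal matching there has size 2.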

41. On the equivalence of Noether charge and Hilbert action boundary term formulae for the black hole entropy in F($R_{abcd}$) gravity theory

Author: Guo, Wei, Guo, Xiyao, Li, Mingfeng, Mou, Zili, and Zhang, Hongbao
Subjects: High Energy Physics - Theory, General Relativity and Quantum Cosmology
Abstract: By working with the covariant phase space formalism, we have shown that not only can the Hamiltonian conjugate to a Killing vector field $\xi$ be expressed as the sum of the associated Noether charge and $\xi$ contracted with the Hilbert action boundary term for F($R_{abcd}$) gravity, but also be written as its contraction with another $\xi$ independent tensor field. With this, we have proven the equivalence of Noether charge and Hilbert action boundary term formulae for the stationary black hole entropy in F($R_{abcd}$) gravity, which is further substantiated by our explicit computation using both formulae., Comment: typos corrected, clarifications made, version to appear in PRD
Published: 2024

42. Low-Latency Layer-Aware Proactive and Passive Container Migration in Meta Computing

Author: Liu, Mengjie, Li, Yihua, Mou, Fangyi, Tang, Zhiqing, Lou, Jiong, Guo, Jianxiong, and Jia, Weijia
Subjects: Computer Science - Distributed, Parallel, and Cluster Computing
Abstract: Meta computing is a new computing paradigm that aims to efficiently utilize all network computing resources to provide fault-tolerant, personalized services with strong security and privacy guarantees. It also seeks to virtualize the Internet as many meta computers. In meta computing, tasks can be assigned to containers at edge nodes for processing, based on container images with multiple layers. The dynamic and resource-constrained nature of meta computing environments requires an optimal container migration strategy for mobile users to minimize latency. However, the problem of container migration in meta computing has not been thoroughly explored. To address this gap, we present low-latency, layer-aware container migration strategies that consider both proactive and passive migration. Specifically: 1) We formulate the container migration problem in meta computing, taking into account layer dependencies to reduce migration costs and overall task duration by considering four delays. 2) We introduce a reinforcement learning algorithm based on policy gradients to minimize total latency by identifying layer dependencies for action selection, making decisions for both proactive and passive migration. Expert demonstrations are introduced to enhance exploitation. 3) Experiments using real data trajectories show that the algorithm outperforms baseline algorithms, achieving lower total latency., Comment: to be published in IEEE ICMC 2024
Published: 2024

43. Instruct 4D-to-4D: Editing 4D Scenes as Pseudo-3D Scenes Using 2D Diffusion

Author: Mou, Linzhan, Chen, Jun-Kun, and Wang, Yu-Xiong
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: This paper proposes Instruct 4D-to-4D that achieves 4D awareness and spatial-temporal consistency for 2D diffusion models to generate high-quality instruction-guided dynamic scene editing results. Traditional applications of 2D diffusion models in dynamic scene editing often result in inconsistency, primarily due to their inherent frame-by-frame editing methodology. Addressing the complexities of extending instruction-guided editing to 4D, our key insight is to treat a 4D scene as a pseudo-3D scene, decoupled into two sub-problems: achieving temporal consistency in video editing and applying these edits to the pseudo-3D scene. Following this, we first enhance the Instruct-Pix2Pix (IP2P) model with an anchor-aware attention module for batch processing and consistent editing. Additionally, we integrate optical flow-guided appearance propagation in a sliding window fashion for more precise frame-to-frame editing and incorporate depth-based projection to manage the extensive data of pseudo-3D scenes, followed by iterative editing to achieve convergence. We extensively evaluate our approach in various scenes and editing instructions, and demonstrate that it achieves spatially and temporally consistent editing results, with significantly enhanced detail and sharpness over the prior art. Notably, Instruct 4D-to-4D is general and applicable to both monocular and challenging multi-camera scenes. Code and more results are available at immortalco.github.io/Instruct-4D-to-4D.
Comment: CVPR 2024
Published: 2024

44. Interaction of an outflow with surrounding gaseous clouds as the origin of the late-time radio flares in TDEs

Author: Zhuang, Jialun, Shen, Rong-Feng, Mou, Guobin, and Lu, Wenbin
Subjects: Astrophysics - High Energy Astrophysical Phenomena
Abstract: A close encounter between a star and a supermassive black hole (SMBH) results in the tidal disruption of the star, known as a tidal disruption event (TDE). Recently, a few TDEs, e.g., ASASSN-15oi and AT2018hyz, have shown late-time (hundreds of days after their UV/optical peaks) radio flares with radio luminosities of $10^{38\sim39}$ erg/s. The super-Eddington fallback or accretion in a TDE may generate a mass outflow. Here we investigate a scenario in which the late-time radio flares come from the interaction of the outflow with the circum-nuclear gaseous clouds, in addition to the slow-evolving emission component due to the outflow-diffuse medium interaction. We calculate the associated radio temporal and spectral signatures and find that they reproduce the observations well. The outflows have an inferred velocity of $0.2\sim0.8$ c, a total mass of $10^{-3}\sim10^{-1}$ $\mathrm{M_{\odot}}$ and an ejection duration of a month to a year. The distances of the clouds to the SMBH are $0.1\sim1$ pc. This scenario has advantages in explaining the long delay, the sharpness of the rise and the multiplicity of the late radio flares. Future observations may build up a much larger sample of late-time radio flares and enable their use as a probe of the TDE physics and the host circumnuclear environment.
Comment: 13 pages, 13 figures. Submitted to ApJ. A new version with some modifications. Comments are welcome
Published: 2024

45. A Dual-View Approach to Classifying Radiology Reports by Co-Training

Author: Han, Yutong, Yuan, Yan, and Mou, Lili
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Radiology report analysis provides valuable information that can aid with public health initiatives, and has been attracting increasing attention from the research community. In this work, we present a novel insight that the structure of a radiology report (namely, the Findings and Impression sections) offers different views of a radiology scan. Based on this intuition, we further propose a co-training approach, where two machine learning models are built upon the Findings and Impression sections, respectively, and use each other's information to boost performance with massive unlabeled data in a semi-supervised manner. We conducted experiments in a public health surveillance study, and results show that our co-training approach is able to improve performance using the dual views and surpass competing supervised and semi-supervised methods.
Comment: Accepted by LREC-COLING 2024
Published: 2024

46. DuQuant: Distributing Outliers via Dual Transformation Makes Stronger Quantized LLMs

Author: Lin, Haokun, Xu, Haobo, Wu, Yichen, Cui, Jingzhi, Zhang, Yingtao, Mou, Linzhan, Song, Linqi, Sun, Zhenan, and Wei, Ying
Subjects: Computer Science - Computation and Language
Abstract: Quantization of large language models (LLMs) faces significant challenges, particularly due to the presence of outlier activations that impede efficient low-bit representation. Traditional approaches predominantly address $\textit{Normal Outliers}$, which are activations across all tokens with relatively large magnitudes. However, these methods struggle with smoothing $\textit{Massive Outliers}$ that display significantly larger values, which leads to significant performance degradation in low-bit quantization. In this paper, we introduce DuQuant, a novel approach that utilizes rotation and permutation transformations to more effectively mitigate both massive and normal outliers. First, DuQuant starts by constructing rotation matrices, using specific outlier dimensions as prior knowledge, to redistribute outliers to adjacent channels by block-wise rotation. Second, we further employ a zigzag permutation to balance the distribution of outliers across blocks, thereby reducing block-wise variance. A subsequent rotation further smooths the activation landscape, enhancing model performance. DuQuant simplifies the quantization process and excels in managing outliers, outperforming the state-of-the-art baselines across various sizes and types of LLMs on multiple tasks, even with 4-bit weight-activation quantization. Our code is available at https://github.com/Hsu1023/DuQuant.
Comment: 26 pages, 13 figures, Website at https://duquant.github.io
Published: 2024

47. DSCA: A Digital Subtraction Angiography Sequence Dataset and Spatio-Temporal Model for Cerebral Artery Segmentation

Author: Xie, Qihang, Guo, Mengguo, Mou, Lei, Zhang, Dan, Chen, Da, Shan, Caifeng, Zhao, Yitian, Su, Ruisheng, and Zhang, Jiong
Subjects: Electrical Engineering and Systems Science - Image and Video Processing, Computer Science - Computer Vision and Pattern Recognition
Abstract: Cerebrovascular diseases (CVDs) remain a leading cause of global disability and mortality. Digital Subtraction Angiography (DSA) sequences, recognized as the gold standard for diagnosing CVDs, can clearly visualize the dynamic flow and reveal pathological conditions within the cerebrovasculature. Therefore, precise segmentation of cerebral arteries (CAs) and classification between their main trunks and branches are crucial for physicians to accurately quantify diseases. However, achieving accurate CA segmentation in DSA sequences remains a challenging task due to small vessels with low contrast, and ambiguity between vessels and residual skull structures. Moreover, the lack of publicly available datasets limits exploration in the field. In this paper, we introduce a DSA Sequence-based Cerebral Artery segmentation dataset (DSCA), the first publicly accessible dataset designed specifically for pixel-level semantic segmentation of CAs. Additionally, we propose DSANet, a spatio-temporal network for CA segmentation in DSA sequences. Unlike existing DSA segmentation methods that focus only on a single frame, the proposed DSANet introduces a separate temporal encoding branch to capture dynamic vessel details across multiple frames. To enhance small vessel segmentation and improve vessel connectivity, we design a novel TemporalFormer module to capture global context and correlations among sequential frames. Furthermore, we develop a Spatio-Temporal Fusion (STF) module to effectively integrate spatial and temporal features from the encoder. Extensive experiments demonstrate that DSANet outperforms other state-of-the-art methods in CA segmentation, achieving a Dice of 0.9033.
Published: 2024

48. Exploring the redundancy of Radon transform using a set of partial derivative equations: Could we precisely reconstruct the image from a sparse-view projection without any image prior?

Author: Mou, Xuanqin and Duan, Jiayu
Subjects: Physics - Medical Physics
Abstract: In this study, we propose a universal n-th order partial differential equation (PDE) of the 2-D Radon transform that discloses the relationship of the Radon transform over a neighborhood of the integral line, named the local correlation equation (LCE). It is independent of the imaging object, whereas in present CT theory the relationship of the Radon transform over neighboring integral lines has been described as dependent on the imaging object. Hence, the LCE is the first PDE to reveal the universal correlation property of the Radon transform. The LCE can be applied to either 2-D CT projections or any 2-D profile of 3-D CT projections. The correlation also provides the redundancy property of the Radon transform. In this regard, we carried out a preliminary study on sparse-view CT reconstruction by using a discrete first-order LCE to interpolate missing projections in sparse-view sampling without knowing any image prior. Meanwhile, we also propose a unified reconstruction framework that combines a regularized iterative reconstruction with the LCE-based interpolation method to handle the sparse-view CT problem at higher sparsity levels. The conducted experiments have credibly validated the proposed LCE, the projection interpolation method, and the unified reconstruction scheme. The result of this study suggests an attractive possibility that a sparse-view projection may contain enough information about the complete projection, so that projection completeness in CT scanning may not be a necessity. This possibility would bring profound changes in CT geometry designs and reconstruction algorithms. Moreover, this study initiates an appealing research topic of exploring the redundancy property of the Radon transform and investigating new CT theories based on the redundancy property, which will boost the further development of CT reconstructions.
Comment: correction of typos and some results have been updated
Published: 2024

49. Deep Koopman Learning using the Noisy Data

Author: Hao, Wenjian, Upadhyay, Devesh, and Mou, Shaoshuai
Subjects: Electrical Engineering and Systems Science - Systems and Control
Abstract: This paper proposes a data-driven framework to learn a finite-dimensional approximation of a Koopman operator for approximating the state evolution of a dynamical system under noisy observations. To this end, our proposed solution has two main advantages. First, the proposed method only requires the measurement noise to be bounded. Second, the proposed method modifies the existing deep Koopman operator formulations by characterizing the effect of the measurement noise on the Koopman operator learning and then mitigating it by updating the tunable parameter of the observable functions of the Koopman operator, making it easy to implement. The performance of the proposed method is demonstrated on several standard benchmarks. We further compare the presented method with similar methods proposed in the latest literature on Koopman learning.
Published: 2024

50. ReVideo: Remake a Video with Motion and Content Control

Author: Mou, Chong, Cao, Mingdeng, Wang, Xintao, Zhang, Zhaoyang, Shan, Ying, and Zhang, Jian
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Despite significant advancements in video generation and editing using diffusion models, achieving accurate and localized video editing remains a substantial challenge. Additionally, most existing video editing methods primarily focus on altering visual content, with limited research dedicated to motion editing. In this paper, we present a novel attempt to Remake a Video (ReVideo) which stands out from existing methods by allowing precise video editing in specific areas through the specification of both content and motion. Content editing is facilitated by modifying the first frame, while the trajectory-based motion control offers an intuitive user interaction experience. ReVideo addresses a new task involving the coupling and training imbalance between content and motion control. To tackle this, we develop a three-stage training strategy that progressively decouples these two aspects from coarse to fine. Furthermore, we propose a spatiotemporal adaptive fusion module to integrate content and motion control across various sampling steps and spatial locations. Extensive experiments demonstrate that our ReVideo has promising performance on several accurate video editing applications, i.e., (1) locally changing video content while keeping the motion constant, (2) keeping content unchanged and customizing new motion trajectories, (3) modifying both content and motion trajectories. Our method can also seamlessly extend these applications to multi-area editing without specific training, demonstrating its flexibility and robustness.
Published: 2024

73,236 results on '"Mou A"'
