31,444 results on '"Liu, Rui"'
Search Results
2. Being Too Asian: Migrant Student Time and Resistance within the Canadian University
- Author
-
Govi, Vedanth, Liu, Rui, and Tian, Ian Liujia
- Published
- 2023
3. Analyst Reports and Stock Performance: Evidence from the Chinese Market
- Author
-
Liu, Rui, Liang, Jiayou, Chen, Haolong, and Hu, Yujia
- Subjects
Computer Science - Computation and Language; Quantitative Finance - Computational Finance
- Abstract
This article applies natural language processing (NLP) to extract and quantify textual information to predict stock performance. Using an extensive dataset of Chinese analyst reports and employing a customized BERT deep learning model for Chinese text, this study categorizes the sentiment of the reports as positive, neutral, or negative. The findings underscore the predictive capacity of this sentiment indicator for stock volatility, excess returns, and trading volume. Specifically, analyst reports with strong positive sentiment will increase excess return and intraday volatility, and vice versa, reports with strong negative sentiment also increase volatility and trading volume, but decrease future excess return. The magnitude of this effect is greater for positive sentiment reports than for negative sentiment reports. This article contributes to the empirical literature on sentiment analysis and the response of the stock market to news in the Chinese stock market.
- Published
- 2024
4. Voltage Support Capability Analysis of Grid-Forming Inverters with Current-Limiting Control Under Asymmetrical Grid Faults
- Author
-
Zhang, Han, Liu, Rui, and Li, Yunwei
- Subjects
Electrical Engineering and Systems Science - Systems and Control
- Abstract
Voltage support capability is critical for grid-forming (GFM) inverters with current-limiting control (CLC) during grid faults. Despite the findings on the voltage support for symmetrical grid faults, its applicability to more common but complex asymmetrical grid faults has yet to be verified rigorously. This letter fills the gap in the voltage support capability analysis for asymmetrical grid faults by establishing and analyzing positive- and negative-sequence equivalent circuit models, where the virtual impedance is adopted to emulate various CLCs. It is discovered that matching the phase angle of the virtual impedance, emulated by the CLC, with that of the composed impedance from the capacitor to the fault location can maximize the voltage support capability of GFM inverters under asymmetrical grid faults. Rigorous theoretical analysis and experimental results verify this conclusion.
- Published
- 2024
5. Flexible Coded Distributed Convolution Computing for Enhanced Fault Tolerance and Numerical Stability in Distributed CNNs
- Author
-
Tan, Shuo, Liu, Rui, Long, XianLei, Wan, Kai, Song, Linqi, and Li, Yong
- Subjects
Computer Science - Distributed, Parallel, and Cluster Computing; Computer Science - Artificial Intelligence; Computer Science - Computer Vision and Pattern Recognition; Computer Science - Information Theory; Computer Science - Machine Learning
- Abstract
Deploying Convolutional Neural Networks (CNNs) on resource-constrained devices necessitates efficient management of computational resources, often via distributed systems susceptible to latency from straggler nodes. This paper introduces the Flexible Coded Distributed Convolution Computing (FCDCC) framework to enhance fault tolerance and numerical stability in distributed CNNs. We extend Coded Distributed Computing (CDC) with Circulant and Rotation Matrix Embedding (CRME), which was originally proposed for matrix multiplication, to high-dimensional tensor convolution. For the proposed scheme, referred to as the Numerically Stable Coded Tensor Convolution (NSCTC) scheme, we also propose two new coded partitioning schemes: Adaptive-Padding Coded Partitioning (APCP) for the input tensor and Kernel-Channel Coded Partitioning (KCCP) for the filter tensor. These strategies enable linear decomposition of tensor convolutions and encoding them into CDC sub-tasks, combining model parallelism with coded redundancy for robust and efficient execution. Theoretical analysis identifies an optimal trade-off between communication and storage costs. Empirical results validate the framework's effectiveness in computational efficiency, fault tolerance, and scalability across various CNN architectures.
- Comment
14 pages, 6 figures
- Published
- 2024
6. FairDgcl: Fairness-aware Recommendation with Dynamic Graph Contrastive Learning
- Author
-
Chen, Wei, Yuan, Meng, Zhang, Zhao, Xie, Ruobing, Zhuang, Fuzhen, Wang, Deqing, and Liu, Rui
- Subjects
Computer Science - Artificial Intelligence
- Abstract
As trustworthy AI continues to advance, the fairness issue in recommendations has received increasing attention. A recommender system is considered unfair when it produces unequal outcomes for different user groups based on user-sensitive attributes (e.g., age, gender). Some researchers have proposed data augmentation-based methods aiming at alleviating user-level unfairness by altering the skewed distribution of training data among various user groups. Despite yielding promising results, they often rely on fairness-related assumptions that may not align with reality, potentially reducing the data quality and negatively affecting model effectiveness. To tackle this issue, in this paper, we study how to implement high-quality data augmentation to improve recommendation fairness. Specifically, we propose FairDgcl, a dynamic graph adversarial contrastive learning framework aiming at improving fairness in recommender systems. First, FairDgcl develops an adversarial contrastive network with a view generator and a view discriminator to learn to generate fair augmentation strategies in an adversarial style. Then, we propose two dynamic, learnable models to generate contrastive views within the contrastive learning framework, which automatically fine-tune the augmentation strategies. Meanwhile, we theoretically show that FairDgcl can simultaneously generate enhanced representations that possess both fairness and accuracy. Lastly, comprehensive experiments conducted on four real-world datasets demonstrate the effectiveness of the proposed FairDgcl.
- Comment
12 pages, submitted to TKDE
- Published
- 2024
7. Vision-Language Navigation with Energy-Based Policy
- Author
-
Liu, Rui, Wang, Wenguan, and Yang, Yi
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
Vision-language navigation (VLN) requires an agent to execute actions following human instructions. Existing VLN models are optimized through expert demonstrations by supervised behavioural cloning or incorporating manual reward engineering. While straightforward, these efforts overlook the accumulation of errors in the Markov decision process, and struggle to match the distribution of the expert policy. Going beyond this, we propose an Energy-based Navigation Policy (ENP) to model the joint state-action distribution using an energy-based model. At each step, low energy values correspond to the state-action pairs that the expert is most likely to perform, and vice versa. Theoretically, the optimization objective is equivalent to minimizing the forward divergence between the occupancy measure of the expert and ours. Consequently, ENP learns to globally align with the expert policy by maximizing the likelihood of the actions and modeling the dynamics of the navigation states in a collaborative manner. With a variety of VLN architectures, ENP achieves promising performances on R2R, REVERIE, RxR, and R2R-CE, unleashing the power of existing VLN models.
- Published
- 2024
8. Multi-Source Spatial Knowledge Understanding for Immersive Visual Text-to-Speech
- Author
-
He, Shuwei, Liu, Rui, and Li, Haizhou
- Subjects
Computer Science - Sound; Computer Science - Artificial Intelligence; Electrical Engineering and Systems Science - Audio and Speech Processing
- Abstract
Visual Text-to-Speech (VTTS) aims to take the spatial environmental image as the prompt to synthesize the reverberation speech for the spoken content. Previous research focused on the RGB modality for global environmental modeling, overlooking the potential of multi-source spatial knowledge like depth, speaker position, and environmental semantics. To address these issues, we propose a novel multi-source spatial knowledge understanding scheme for immersive VTTS, termed MS$^2$KU-VTTS. Specifically, we first prioritize the RGB image as the dominant source and consider the depth image, speaker position knowledge from object detection, and semantic captions from an image understanding LLM as supplementary sources. Afterwards, we propose a serial interaction mechanism to deeply engage with both dominant and supplementary sources. The resulting multi-source knowledge is dynamically integrated based on their contributions. This enriched interaction and integration of multi-source spatial knowledge guides the speech generation model, enhancing the immersive spatial speech experience. Experimental results demonstrate that MS$^2$KU-VTTS surpasses existing baselines in generating immersive speech. Demos and code are available at: https://github.com/MS2KU-VTTS/MS2KU-VTTS
- Comment
5 pages, 1 figure
- Published
- 2024
9. Dual-Path Mechanism of Amino Acid Racemization Mediated by Quantum Mechanical Tunneling
- Author
-
Yang, Xinrui, Liu, Rui, Xu, Ruiqi, Cui, Zhaohua, and Wang, Zhigang
- Subjects
Physics - Chemical Physics; Physics - Computational Physics
- Abstract
The racemization of amino acids constitutes one of the most elemental and critical reactions, holding primitive significance for understanding life's origin and maintenance. Nevertheless, its mechanism at the atomic level has been persistently misunderstood for more than a century. In this work, we demonstrate that the racemization of amino acid molecules in aqueous environments can occur simultaneously by two pathways via the carboxyl (COOH) and amino (NH2) groups. Behind this result, the quantum mechanical tunneling (QMT) effect plays a pivotal role, as evidenced by the tunneling hindrance of the NH2 reaction and the tunneling enhancement of the COOH reaction. Notably, the disparity in the QMT effect leads to a crossover between the COOH and NH2 reactions within 200-257 K, such that NH2 reactions dominate at high temperatures and COOH reactions dominate at low temperatures. Our work emphasizes the significance of the QMT effect in the racemization of amino acids and therefore introduces a dual-path coexistence mechanism, offering valuable insights into the origin of homochirality in extreme environments of the early Earth.
- Comment
15 pages, 4 figures
- Published
- 2024
10. Emphasis Rendering for Conversational Text-to-Speech with Multi-modal Multi-scale Context Modeling
- Author
-
Liu, Rui, Jia, Zhenqi, Yang, Jie, Hu, Yifan, and Li, Haizhou
- Subjects
Computer Science - Computation and Language; Computer Science - Sound; Electrical Engineering and Systems Science - Audio and Speech Processing
- Abstract
Conversational Text-to-Speech (CTTS) aims to accurately express an utterance with the appropriate style within a conversational setting, which attracts more attention nowadays. While recognizing the significance of the CTTS task, prior studies have not thoroughly investigated speech emphasis expression, which is essential for conveying the underlying intention and attitude in human-machine interaction scenarios, due to the scarcity of conversational emphasis datasets and the difficulty in context understanding. In this paper, we propose a novel Emphasis Rendering scheme for the CTTS model, termed ER-CTTS, that includes two main components: 1) we simultaneously take into account textual and acoustic contexts, with both global and local semantic modeling to understand the conversation context comprehensively; 2) we deeply integrate multi-modal and multi-scale context to learn the influence of context on the emphasis expression of the current utterance. Finally, the inferred emphasis feature is fed into the neural speech synthesizer to generate conversational speech. To address data scarcity, we create emphasis intensity annotations on the existing conversational dataset (DailyTalk). Both objective and subjective evaluations suggest that our model outperforms the baseline models in emphasis rendering within a conversational setting. The code and audio samples are available at https://github.com/CodeStoreTTS/ER-CTTS
- Comment
submitted to IEEE Transaction
- Published
- 2024
11. Open-vocabulary Multimodal Emotion Recognition: Dataset, Metric, and Benchmark
- Author
-
Lian, Zheng, Sun, Haiyang, Sun, Licai, Chen, Lan, Chen, Haoyu, Gu, Hao, Wen, Zhuofan, Chen, Shun, Zhang, Siyuan, Yao, Hailiang, Xu, Mingyu, Chen, Kang, Liu, Bin, Liu, Rui, Liang, Shan, Li, Ya, Yi, Jiangyan, and Tao, Jianhua
- Subjects
Computer Science - Human-Computer Interaction
- Abstract
Multimodal Emotion Recognition (MER) is an important research topic. This paper advocates for a transformative paradigm in MER. The rationale behind our work is that current approaches often rely on a limited set of basic emotion labels, which do not adequately represent the rich spectrum of human emotions. These traditional and overly simplistic emotion categories fail to capture the inherent complexity and subtlety of human emotional experiences, leading to limited generalizability and practicality. Therefore, we propose a new MER paradigm called Open-vocabulary MER (OV-MER), which encompasses a broader range of emotion labels to reflect the richness of human emotions. This paradigm relaxes the label space, allowing for the prediction of arbitrary numbers and categories of emotions. To support this transition, we provide a comprehensive solution that includes a newly constructed database based on LLM and human collaborative annotations, along with corresponding metrics and a series of benchmarks. We hope this work advances emotion recognition from basic emotions to more nuanced emotions, contributing to the development of emotional AI.
- Published
- 2024
12. FluentEditor+: Text-based Speech Editing by Modeling Local Hierarchical Acoustic Smoothness and Global Prosody Consistency
- Author
-
Liu, Rui, Xi, Jiatian, Jiang, Ziyue, and Li, Haizhou
- Subjects
Computer Science - Computation and Language; Computer Science - Sound; Electrical Engineering and Systems Science - Audio and Speech Processing
- Abstract
Text-based speech editing (TSE) allows users to modify speech by editing the corresponding text and performing operations such as cutting, copying, and pasting to generate updated audio without altering the original recording directly. While current TSE techniques focus on minimizing discrepancies between generated speech and reference targets within edited segments, they often neglect the importance of maintaining both local and global fluency in the context of the original discourse. Additionally, seamlessly integrating edited segments with unaltered portions of the audio remains challenging, typically requiring support from text-to-speech (TTS) systems. This paper introduces a novel approach, FluentEditor+, designed to overcome these limitations. FluentEditor+ employs advanced feature extraction techniques to capture both acoustic and prosodic characteristics, ensuring fluent transitions between edited and unedited regions. The model ensures segmental acoustic smoothness and global prosody consistency, allowing seamless splicing of speech while preserving the coherence and naturalness of the output. Extensive experiments on the VCTK and LibriTTS datasets show that FluentEditor+ surpasses existing TTS-based methods, including EditSpeech, CampNet, $A^3T$, FluentSpeech, and FluentEditor, in both fluency and prosody. Ablation studies further highlight the contributions of each module to the overall effectiveness of the system.
- Comment
Work in progress
- Published
- 2024
13. Building Real-time Awareness of Out-of-distribution in Trajectory Prediction for Autonomous Vehicles
- Author
-
Tongfei, Guo, Banerjee, Taposh, Liu, Rui, and Su, Lili
- Subjects
Computer Science - Robotics; Computer Science - Machine Learning
- Abstract
Trajectory prediction describes the motions of surrounding moving obstacles for an autonomous vehicle; it plays a crucial role in enabling timely decision-making, such as collision avoidance and trajectory replanning. Accurate trajectory planning is the key to reliable vehicle deployments in open-world environment, where unstructured obstacles bring in uncertainties that are impossible to fully capture by training data. For traditional machine learning tasks, such uncertainties are often addressed reasonably well via methods such as continual learning. On the one hand, naively applying those methods to trajectory prediction can result in continuous data collection and frequent model updates, which can be resource-intensive. On the other hand, the predicted trajectories can be far away from the true trajectories, leading to unsafe decision-making. In this paper, we aim to establish real-time awareness of out-of-distribution in trajectory prediction for autonomous vehicles. We focus on the challenging and practically relevant setting where the out-of-distribution is deceptive, that is, the one not easily detectable by human intuition. Drawing on the well-established techniques of sequential analysis, we build real-time awareness of out-of-distribution by monitoring prediction errors using the quickest change point detection (QCD). Our solutions are lightweight and can handle the occurrence of out-of-distribution at any time during trajectory prediction inference. Experimental results on multiple real-world datasets using a benchmark trajectory prediction model demonstrate the effectiveness of our methods.
- Published
- 2024
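The abstract of result 13 describes monitoring prediction errors with quickest change point detection (QCD). The abstract does not say which QCD procedure the authors use, so the following is only a minimal sketch of one classical choice, a one-sided CUSUM on a stream of per-step prediction errors. The function name `cusum_alarm` and the parameters `drift` and `threshold` are hypothetical, and the sketch uses a simplified drift-corrected sum rather than a log-likelihood ratio.

```python
from typing import Iterable, Optional


def cusum_alarm(errors: Iterable[float], drift: float, threshold: float) -> Optional[int]:
    """Return the index of the first alarm, or None if no change is declared.

    Assumptions (not from the source): `drift` is a nominal error level that
    in-distribution errors stay below on average, and `threshold` trades
    detection delay against false alarms.
    """
    stat = 0.0
    for i, err in enumerate(errors):
        # Accumulate evidence that errors exceed the nominal level;
        # clamping at zero lets in-distribution samples reset the statistic.
        stat = max(0.0, stat + err - drift)
        if stat > threshold:
            return i
    return None


# Illustrative synthetic stream: small errors, then a sustained jump.
stream = [0.1] * 10 + [1.0] * 10
alarm_index = cusum_alarm(stream, drift=0.5, threshold=2.0)
```

The statistic only needs one running value per monitored stream, which is in keeping with the abstract's description of the approach as lightweight.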
14. Reactive Multi-Robot Navigation in Outdoor Environments Through Uncertainty-Aware Active Learning of Human Preference Landscape
- Author
-
Huang, Chao, Zang, Wenshuo, Pinciroli, Carlo, Li, Zhi Jane, Banerjee, Taposh, Su, Lili, and Liu, Rui
- Subjects
Computer Science - Robotics; Computer Science - Artificial Intelligence
- Abstract
Compared with single robots, Multi-Robot Systems (MRS) can perform missions more efficiently due to the presence of multiple members with diverse capabilities. However, deploying an MRS in wide real-world environments is still challenging due to uncertain and various obstacles (e.g., building clusters and trees). With a limited understanding of environmental uncertainty on performance, an MRS cannot flexibly adjust its behaviors (e.g., teaming, load sharing, trajectory planning) to ensure both environment adaptation and task accomplishments. In this work, a novel joint preference landscape learning and behavior adjusting framework (PLBA) is designed. PLBA efficiently integrates real-time human guidance to MRS coordination and utilizes Sparse Variational Gaussian Processes with Varying Output Noise to quickly assess human preferences by leveraging spatial correlations between environment characteristics. An optimization-based behavior-adjusting method then safely adapts MRS behaviors to environments. To validate PLBA's effectiveness in MRS behavior adaption, a flood disaster search and rescue task was designed. 20 human users provided 1764 pieces of feedback based on human preferences obtained from MRS behaviors related to "task quality", "task progress", and "robot safety". The prediction accuracy and adaptation speed results show the effectiveness of PLBA in preference learning and MRS behavior adaption.
- Published
- 2024
15. Leveraging Retrieval Augment Approach for Multimodal Emotion Recognition Under Missing Modalities
- Author
-
Fan, Qi, Yuan, Hongyu, Zuo, Haolin, Liu, Rui, and Gao, Guanglai
- Subjects
Computer Science - Computer Vision and Pattern Recognition; Computer Science - Artificial Intelligence
- Abstract
Multimodal emotion recognition utilizes complete multimodal information and robust multimodal joint representation to gain high performance. However, the ideal condition of full modality integrity often does not hold in reality, and some modalities are frequently missing. For example, video, audio, or text data is missing due to sensor failure or network bandwidth problems, which presents a great challenge to MER research. Traditional methods extract useful information from the complete modalities and reconstruct the missing modalities to learn robust multimodal joint representation. These methods have laid a solid foundation for research in this field, and to a certain extent, alleviated the difficulty of multimodal emotion recognition under missing modalities. However, relying solely on internal reconstruction and multimodal joint learning has its limitations, especially when the missing information is critical for emotion recognition. To address this challenge, we propose a novel framework of Retrieval Augment for Missing Modality Multimodal Emotion Recognition (RAMER), which introduces similar multimodal emotion data to enhance the performance of emotion recognition under missing modalities. By leveraging databases that contain related multimodal emotion data, we can retrieve similar multimodal emotion information to fill in the gaps left by missing modalities. Various experimental results demonstrate that our framework is superior to existing state-of-the-art approaches in missing modality MER tasks. Our whole project is publicly available at https://github.com/WooyoohL/Retrieval_Augment_MER
- Comment
Under reviewing
- Published
- 2024
16. IMRL: Integrating Visual, Physical, Temporal, and Geometric Representations for Enhanced Food Acquisition
- Author
-
Liu, Rui, Mahammad, Zahiruddin, Bhaskar, Amisha, and Tokekar, Pratap
- Subjects
Computer Science - Robotics; Computer Science - Artificial Intelligence
- Abstract
Robotic assistive feeding holds significant promise for improving the quality of life for individuals with eating disabilities. However, acquiring diverse food items under varying conditions and generalizing to unseen food presents unique challenges. Existing methods that rely on surface-level geometric information (e.g., bounding box and pose) derived from visual cues (e.g., color, shape, and texture) often lack adaptability and robustness, especially when foods share similar physical properties but differ in visual appearance. We employ imitation learning (IL) to learn a policy for food acquisition. Existing methods employ IL or Reinforcement Learning (RL) to learn a policy based on off-the-shelf image encoders such as ResNet-50. However, such representations are not robust and struggle to generalize across diverse acquisition scenarios. To address these limitations, we propose a novel approach, IMRL (Integrated Multi-Dimensional Representation Learning), which integrates visual, physical, temporal, and geometric representations to enhance the robustness and generalizability of IL for food acquisition. Our approach captures food types and physical properties (e.g., solid, semi-solid, granular, liquid, and mixture), models temporal dynamics of acquisition actions, and introduces geometric information to determine optimal scooping points and assess bowl fullness. IMRL enables IL to adaptively adjust scooping strategies based on context, improving the robot's capability to handle diverse food acquisition scenarios. Experiments on a real robot demonstrate our approach's robustness and adaptability across various foods and bowl configurations, including zero-shot generalization to unseen settings. Our approach achieves an improvement of up to 35% in success rate compared with the best-performing baseline.
- Published
- 2024
17. Medical Report Generation Is A Multi-label Classification Problem
- Author
-
Fan, Yijian, Yang, Zhenbang, Liu, Rui, Li, Mingjie, and Chang, Xiaojun
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
Medical report generation is a critical task in healthcare that involves the automatic creation of detailed and accurate descriptions from medical images. Traditionally, this task has been approached as a sequence generation problem, relying on vision-and-language techniques to generate coherent and contextually relevant reports. However, in this paper, we propose a novel perspective: rethinking medical report generation as a multi-label classification problem. By framing the task this way, we leverage the radiology nodes from the commonly used knowledge graph, which can be better captured through classification techniques. To verify our argument, we introduce a novel report generation framework based on BLIP integrated with classified key nodes, which allows for effective report generation with accurate classification of multiple key aspects within the medical images. This approach not only simplifies the report generation process but also significantly enhances performance metrics. Our extensive experiments demonstrate that leveraging key nodes can achieve state-of-the-art (SOTA) performance, surpassing existing approaches across two benchmark datasets. The results underscore the potential of re-envisioning traditional tasks with innovative methodologies, paving the way for more efficient and accurate medical report generation.
- Comment
Accepted to 2024 IEEE International Conference on Medical Artificial Intelligence
- Published
- 2024
18. Fast Downflows Observed during a Polar Crown Filament Eruption
- Author
-
Sun, Zheng, Tian, Hui, Li, Ting, Liu, Rui, and Duan, Yadan
- Subjects
Astrophysics - Solar and Stellar Astrophysics
- Abstract
Solar filaments can undergo eruptions and result in the formation of coronal mass ejections (CMEs), which could significantly impact planetary space environments. Observations of eruptions involving polar crown filaments, situated in the polar regions of the Sun, are limited. In this study, we report a polar crown filament eruption (SOL2023-06-12), characterized by fast downflows below the filament. The downflows appear instantly after the onset of the filament eruption and persist for approximately 2 hours, exhibiting plane-of-sky (POS) velocities ranging between 92 and 144 km s$^{-1}$. They originate from the leading edge of the filament and no clear acceleration is observed. Intriguingly, these downflows appear at two distinct sites, symmetrically positioned at the opposite ends of the conjugate flare ribbons. Based on the observations, we propose that the filament might be supported by a magnetic flux rope (MFR), and these downflows possibly occur along the legs of the MFR. The downflows likely result from continuous reconnections between the MFR and the overlying magnetic field structures, and could either be reconnection outflows or redirected filament materials. We also observed horizontal drifting of the locations of downflows, which might correspond to the MFR's footpoint drifting. This type of downflows can potentially be utilized to track the footpoints of MFRs during eruptions.
- Published
- 2024
19. The Frame-Dragging effect on the excitation rate of atoms
- Author
-
Liu, Rui-Chen and Sun, C. P.
- Subjects
General Relativity and Quantum Cosmology
- Abstract
The frame-dragging phenomenon in gravitational fields is revisited to explore the geometric effects induced by spacetime curvature. We quantize a massless scalar field in the spacetime of a rotating sphere, incorporating the frame-dragging frequency into the field modes. The excitation rate for an atom undergoing uniform circular motion and interacting with the scalar field is calculated. Our results reveal that the time-dependent excitation rates of atoms following different trajectories exhibit a common envelope, from which the frame-dragging frequency can be effectively extracted. This discovery leads us to propose a novel detection scheme for measuring the frame-dragging frequency caused by rotating celestial bodies, eliminating the need for traditional starlight calibration methods.
- Published
- 2024
20. MCDubber: Multimodal Context-Aware Expressive Video Dubbing
- Author
-
Zhao, Yuan, Jia, Zhenqi, Liu, Rui, Hu, De, Bao, Feilong, and Gao, Guanglai
- Subjects
Computer Science - Multimedia; Computer Science - Computer Vision and Pattern Recognition; Computer Science - Sound; Electrical Engineering and Systems Science - Audio and Speech Processing
- Abstract
Automatic Video Dubbing (AVD) aims to take the given script and generate speech that aligns with lip motion and prosody expressiveness. Current AVD models mainly utilize visual information of the current sentence to enhance the prosody of synthesized speech. However, it is crucial to consider whether the prosody of the generated dubbing aligns with the multimodal context, as the dubbing will be combined with the original context in the final video. This aspect has been overlooked in previous studies. To address this issue, we propose a Multimodal Context-aware video Dubbing model, termed MCDubber, to convert the modeling object from a single sentence to a longer sequence with context information to ensure the consistency of the global context prosody. MCDubber comprises three main components: (1) A context duration aligner aims to learn the context-aware alignment between the text and lip frames; (2) A context prosody predictor seeks to read the global context visual sequence and predict the context-aware global energy and pitch; (3) A context acoustic decoder ultimately predicts the global context mel-spectrogram with the assistance of adjacent ground-truth mel-spectrograms of the target sentence. Through this process, MCDubber fully considers the influence of multimodal context on the prosody expressiveness of the current sentence when dubbing. The extracted mel-spectrogram belonging to the target sentence from the output context mel-spectrograms is the final required dubbing audio. Extensive experiments on the Chem benchmark dataset demonstrate that our MCDubber significantly improves dubbing expressiveness compared to all advanced baselines. The code and demos are available at https://github.com/XiaoYuanJun-zy/MCDubber
- Comment
Accepted by NCMMSC2024
- Published
- 2024
21. Generative Expressive Conversational Speech Synthesis
- Author
-
Liu, Rui, Hu, Yifan, Ren, Yi, Yin, Xiang, and Li, Haizhou
- Subjects
Computer Science - Computation and Language; Computer Science - Sound; Electrical Engineering and Systems Science - Audio and Speech Processing
- Abstract
Conversational Speech Synthesis (CSS) aims to express a target utterance with the proper speaking style in a user-agent conversation setting. Existing CSS methods employ effective multi-modal context modeling techniques to achieve empathy understanding and expression. However, they often need to design complex network architectures and meticulously optimize the modules within them. In addition, due to the limitations of small-scale datasets containing scripted recording styles, they often fail to simulate real natural conversational styles. To address the above issues, we propose a novel generative expressive CSS system, termed GPT-Talker. We transform the multimodal information of the multi-turn dialogue history into discrete token sequences and seamlessly integrate them to form a comprehensive user-agent dialogue context. Leveraging the power of GPT, we predict the token sequence of the agent's response, which includes both semantic and style knowledge. After that, the expressive conversational speech is synthesized by the conversation-enriched VITS to deliver feedback to the user. Furthermore, we propose a large-scale Natural CSS Dataset called NCSSD, which includes both naturally recorded conversational speech in improvised styles and dialogues extracted from TV shows. It encompasses both Chinese and English languages, with a total duration of 236 hours. We conducted comprehensive experiments on the reliability of the NCSSD and the effectiveness of our GPT-Talker. Both subjective and objective evaluations demonstrate that our model outperforms other state-of-the-art CSS systems significantly in terms of naturalness and expressiveness. The Code, Dataset, and Pre-trained Model are available at: https://github.com/AI-S2-Lab/GPT-Talker
- Comment
14 pages, 6 figures, 8 tables. Accepted by ACM MM 2024
- Published
- 2024
22. Nonparametric Statistics on Magnetic Properties at the Footpoints of Erupting Magnetic Flux Ropes
- Author
-
Liu, Rui and Wang, Wensi
- Subjects
Astrophysics - Solar and Stellar Astrophysics ,Physics - Space Physics - Abstract
It is under debate whether the magnetic field in the solar atmosphere carries neutralized electric currents; particularly, whether a magnetic flux rope (MFR), which is considered the core structure of coronal mass ejections, carries neutralized electric currents. Recently Wang et al. (2023, ApJ, 943, 80) studied magnetic flux and electric current measured at the footpoints of 28 eruptive MFRs from 2010 to 2015. Because of the small sample size, no rigorous statistical analysis has been done. Here, we include 9 more events from 2016 to 2023 and perform a series of nonparametric statistical tests at a significance level of 5%. The tests confirm that there exist no significant differences in magnetic properties between conjugated footpoints of the same MFR, which justifies the method of identifying the MFR footpoints through coronal dimming. The tests demonstrate that there exist no significant differences between MFRs with pre-eruption dimming and those with only post-eruption dimming. However, there is a medium level of association between MFRs carrying substantial net current and those that produce pre-eruption dimming, which can be understood by the Lorentz self-force of the current channel. The tests also suggest that in estimating the magnetic twist of MFRs, it is necessary to take into account the spatially inhomogeneous distribution of electric current density and magnetic field., Comment: Accepted for publication in ApJ
- Published
- 2024
23. Navigation Instruction Generation with BEV Perception and Large Language Models
- Author
-
Fan, Sheng, Liu, Rui, Wang, Wenguan, and Yang, Yi
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Navigation instruction generation, which requires embodied agents to describe the navigation routes, has been of great interest in robotics and human-computer interaction. Existing studies directly map the sequence of 2D perspective observations to route descriptions. Though straightforward, they overlook the geometric information and object semantics of the 3D environment. To address these challenges, we propose BEVInstructor, which incorporates Bird's Eye View (BEV) features into Multi-Modal Large Language Models (MLLMs) for instruction generation. Specifically, BEVInstructor constructs a Perspective-BEV Visual Encoder for the comprehension of 3D environments through fusing BEV and perspective features. To leverage the powerful language capabilities of MLLMs, the fused representations are used as visual prompts for MLLMs, and perspective-BEV prompt tuning is proposed for parameter-efficient updating. Based on the perspective-BEV prompts, BEVInstructor further adopts an instance-guided iterative refinement pipeline, which improves the instructions in a progressive manner. BEVInstructor achieves impressive performance across diverse datasets (i.e., R2R, REVERIE, and UrbanWalk)., Comment: ECCV 2024; Project Page: https://github.com/FanScy/BEVInstructor
- Published
- 2024
24. Exploiting Scale-Variant Attention for Segmenting Small Medical Objects
- Author
-
Dai, Wei, Liu, Rui, Wu, Zixuan, Wu, Tianyi, Wang, Min, Zhou, Junxian, Yuan, Yixuan, and Liu, Jun
- Subjects
Electrical Engineering and Systems Science - Image and Video Processing ,Computer Science - Computer Vision and Pattern Recognition - Abstract
Early detection and accurate diagnosis can predict the risk of malignant disease transformation, thereby increasing the probability of effective treatment. Identifying mild syndrome with small pathological regions serves as an ominous warning and is fundamental in the early diagnosis of diseases. While deep learning algorithms, particularly convolutional neural networks (CNNs), have shown promise in segmenting medical objects, analyzing small areas in medical images remains challenging. This difficulty arises due to information losses and compression defects from convolution and pooling operations in CNNs, which become more pronounced as the network deepens, especially for small medical objects. To address these challenges, we propose a novel scale-variant attention-based network (SvANet) for accurately segmenting small-scale objects in medical images. The SvANet consists of scale-variant attention, cross-scale guidance, Monte Carlo attention, and vision transformer, which incorporates cross-scale features and alleviates compression artifacts for enhancing the discrimination of small medical objects. Quantitative experimental results demonstrate the superior performance of SvANet, achieving 96.12%, 96.11%, 89.79%, 84.15%, 80.25%, 73.05%, and 72.58% in mean Dice coefficient for segmenting kidney tumors, skin lesions, hepatic tumors, polyps, surgical excision cells, retinal vasculatures, and sperms, which occupy less than 1% of the image areas in KiTS23, ISIC 2018, ATLAS, PolypGen, TissueNet, FIVES, and SpermHealth datasets, respectively., Comment: 14 pages, 9 figures, under review
- Published
- 2024
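The mean Dice coefficient quoted in the abstract above is a standard overlap metric for segmentation masks, Dice = 2|A∩B| / (|A| + |B|). The following is a minimal sketch of that metric for a single pair of binary masks; the function name, the nested-list mask format, and the convention of returning 1.0 for two empty masks are assumptions made for illustration and are not taken from the SvANet paper or its code.

```python
def dice_coefficient(pred, target):
    """Dice overlap between two binary masks given as nested lists of 0/1.

    Dice = 2 * |pred AND target| / (|pred| + |target|).
    Returns 1.0 when both masks are empty (an assumed convention).
    """
    intersection = 0
    pred_sum = 0
    target_sum = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            intersection += p & t
            pred_sum += p
            target_sum += t
    if pred_sum + target_sum == 0:
        return 1.0
    return 2.0 * intersection / (pred_sum + target_sum)


# Hypothetical 2x3 masks: three foreground pixels each, two of them shared.
predicted = [[0, 1, 1],
             [0, 1, 0]]
ground_truth = [[0, 1, 0],
                [0, 1, 1]]
print(round(dice_coefficient(predicted, ground_truth), 4))
```

Because the denominator is the total foreground area, a one-pixel error costs far more on a small object than on a large one, which is one reason small-object segmentation scores tend to be lower.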
25. Federated Knowledge Transfer Fine-tuning Large Server Model with Resource-Constrained IoT Clients
- Author
-
Chen, Shaoyuan, You, Linlin, Liu, Rui, Yu, Shuo, and Abdelmoniem, Ahmed M.
- Subjects
Computer Science - Machine Learning ,Computer Science - Artificial Intelligence ,Computer Science - Computer Vision and Pattern Recognition - Abstract
The training of large models, involving fine-tuning, faces the scarcity of high-quality data. Compared to the solutions based on centralized data centers, updating large models in the Internet of Things (IoT) faces challenges in coordinating knowledge from distributed clients by using their private and heterogeneous data. To tackle such a challenge, we propose KOALA (Federated Knowledge Transfer Fine-tuning Large Server Model with Resource-Constrained IoT Clients) to impel the training of large models in IoT. Since the resources obtained by IoT clients are limited and restricted, it is infeasible to locally execute large models and also update them in a privacy-preserving manner. Therefore, we leverage federated learning and knowledge distillation to update large models through collaboration with their small models, which can run locally at IoT clients to process their private data separately and enable large-small model knowledge transfer through iterative learning between the server and clients. Moreover, to support clients with similar or different computing capacities, KOALA is designed with two kinds of large-small model joint learning modes, namely to be homogeneous or heterogeneous. Experimental results demonstrate that compared to the conventional approach, our method can not only achieve similar training performance but also significantly reduce the need for local storage and computing power resources.
- Published
- 2024
26. Seed-ASR: Understanding Diverse Speech and Contexts with LLM-based Speech Recognition
- Author
-
Bai, Ye, Chen, Jingping, Chen, Jitong, Chen, Wei, Chen, Zhuo, Ding, Chuang, Dong, Linhao, Dong, Qianqian, Du, Yujiao, Gao, Kepan, Gao, Lu, Guo, Yi, Han, Minglun, Han, Ting, Hu, Wenchao, Hu, Xinying, Hu, Yuxiang, Hua, Deyu, Huang, Lu, Huang, Mingkun, Huang, Youjia, Jin, Jishuo, Kong, Fanliu, Lan, Zongwei, Li, Tianyu, Li, Xiaoyang, Li, Zeyang, Lin, Zehua, Liu, Rui, Liu, Shouda, Lu, Lu, Lu, Yizhou, Ma, Jingting, Ma, Shengtao, Pei, Yulin, Shen, Chen, Tan, Tian, Tian, Xiaogang, Tu, Ming, Wang, Bo, Wang, Hao, Wang, Yuping, Wang, Yuxuan, Xia, Hanzhang, Xia, Rui, Xie, Shuangyi, Xu, Hongmin, Yang, Meng, Zhang, Bihong, Zhang, Jun, Zhang, Wanyi, Zhang, Yang, Zhang, Yawei, Zheng, Yijie, and Zou, Ming
- Subjects
Electrical Engineering and Systems Science - Audio and Speech Processing ,Computer Science - Sound - Abstract
Modern automatic speech recognition (ASR) models are required to accurately transcribe diverse speech signals (from different domains, languages, accents, etc.) given the specific contextual information in various application scenarios. Classic end-to-end models fused with extra language models perform well, but mainly in data matching scenarios and are gradually approaching a bottleneck. In this work, we introduce Seed-ASR, a large language model (LLM) based speech recognition model. Seed-ASR is developed based on the framework of audio conditioned LLM (AcLLM), leveraging the capabilities of LLMs by inputting continuous speech representations together with contextual information into the LLM. Through stage-wise large-scale training and the elicitation of context-aware capabilities in LLM, Seed-ASR demonstrates significant improvement over end-to-end models on comprehensive evaluation sets, including multiple domains, accents/dialects and languages. Additionally, Seed-ASR can be further deployed to support specific needs in various scenarios without requiring extra language models. Compared to recently released large ASR models, Seed-ASR achieves 10%-40% reduction in word (or character, for Chinese) error rates on Chinese and English public test sets, further demonstrating its powerful performance.
- Published
- 2024
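The word (or character) error rate mentioned in the abstract above is conventionally the edit distance between the reference and the hypothesis, divided by the reference length. The sketch below shows that standard definition together with a relative-reduction helper; it assumes the "10%-40% reduction" is meant relatively, and all function names and example sentences are hypothetical rather than taken from Seed-ASR.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences (row-by-row DP)."""
    prev = list(range(len(hyp) + 1))
    for i, ref_tok in enumerate(ref, 1):
        curr = [i]
        for j, hyp_tok in enumerate(hyp, 1):
            cost = 0 if ref_tok == hyp_tok else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(ref_tokens, hyp_tokens):
    """Edit distance normalized by reference length (WER or CER)."""
    if not ref_tokens:
        raise ValueError("reference must not be empty")
    return edit_distance(ref_tokens, hyp_tokens) / len(ref_tokens)


def relative_reduction(baseline_rate, new_rate):
    """Fraction by which new_rate improves on baseline_rate."""
    return (baseline_rate - new_rate) / baseline_rate


# Word-level tokens for English; for character error rate pass list(text).
reference = "the cat sat on the mat".split()
hypothesis = "the cat sit on mat".split()
print(round(error_rate(reference, hypothesis), 4))
print(round(relative_reduction(0.10, 0.07), 2))
```

Passing `list(text)` instead of `text.split()` gives the character error rate used for Chinese, where word boundaries are not marked by spaces.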
27. Emotion and Intent Joint Understanding in Multimodal Conversation: A Benchmarking Dataset
- Author
-
Liu, Rui, Zuo, Haolin, Lian, Zheng, Xing, Xiaofen, Schuller, Björn W., and Li, Haizhou
- Subjects
Computer Science - Computation and Language ,Computer Science - Artificial Intelligence - Abstract
Emotion and Intent Joint Understanding in Multimodal Conversation (MC-EIU) aims to decode the semantic information manifested in a multimodal conversational history, while inferring the emotions and intents simultaneously for the current utterance. MC-EIU is an enabling technology for many human-computer interfaces. However, there is a lack of available datasets in terms of annotation, modality, language diversity, and accessibility. In this work, we propose an MC-EIU dataset, which features 7 emotion categories, 9 intent categories, 3 modalities, i.e., textual, acoustic, and visual content, and two languages, i.e., English and Mandarin. Furthermore, it is completely open-source for free access. To our knowledge, MC-EIU is the first comprehensive and rich emotion and intent joint understanding dataset for multimodal conversation. Together with the release of the dataset, we also develop an Emotion and Intent Interaction (EI$^2$) network as a reference system by modeling the deep correlation between emotion and intent in the multimodal conversation. With comparative experiments and ablation studies, we demonstrate the effectiveness of the proposed EI$^2$ method on the MC-EIU dataset. The dataset and codes will be made available at: https://github.com/MC-EIU/MC-EIU., Comment: 26 pages, 8 figures, 12 tables, NeurIPS 2024 Dataset and Benchmark Track
- Published
- 2024
28. ICAGC 2024: Inspirational and Convincing Audio Generation Challenge 2024
- Author
-
Fu, Ruibo, Liu, Rui, Qiang, Chunyu, Gao, Yingming, Lu, Yi, Shi, Shuchen, Wang, Tao, Li, Ya, Wen, Zhengqi, Zhang, Chen, Bu, Hui, Liu, Yukun, Qi, Xin, and Li, Guanjun
- Subjects
Electrical Engineering and Systems Science - Audio and Speech Processing ,Computer Science - Artificial Intelligence - Abstract
The Inspirational and Convincing Audio Generation Challenge 2024 (ICAGC 2024) is part of the ISCSLP 2024 Competitions and Challenges track. While current text-to-speech (TTS) technology can generate high-quality audio, its ability to convey complex emotions and controlled detail content remains limited. This constraint leads to a discrepancy between the generated audio and human subjective perception in practical applications like companion robots for children and marketing bots. The core issue lies in the inconsistency between high-quality audio generation and the ultimate human subjective experience. Therefore, this challenge aims to enhance the persuasiveness and acceptability of synthesized audio, focusing on human alignment convincing and inspirational audio generation. A total of 19 teams have registered for the challenge, and the competition and its results are described in this paper., Comment: ISCSLP 2024 Challenge description and results
- Published
- 2024
29. Emotion-Aware Speech Self-Supervised Representation Learning with Intensity Knowledge
- Author
-
Liu, Rui and Ma, Zening
- Subjects
Electrical Engineering and Systems Science - Audio and Speech Processing ,Computer Science - Sound - Abstract
Speech Self-Supervised Learning (SSL) has demonstrated considerable efficacy in various downstream tasks. Nevertheless, prevailing self-supervised models often overlook the incorporation of emotion-related prior information, thereby neglecting the potential enhancement of emotion task comprehension through emotion prior knowledge in speech. In this paper, we propose an emotion-aware speech representation learning method with intensity knowledge. Specifically, we extract frame-level emotion intensities using an established speech-emotion understanding model. Subsequently, we propose a novel emotional masking strategy (EMS) to incorporate emotion intensities into the masking process. We selected two representative models based on Transformer and CNN, namely MockingJay and Non-autoregressive Predictive Coding (NPC), and conducted experiments on the IEMOCAP dataset. Experiments have demonstrated that the representations derived from our proposed method outperform the original model in the SER task., Comment: Accepted by InterSpeech2024
- Published
- 2024
30. High-resolution Observation of Blowout Jets Regulated by Sunspot Rotation
- Author
-
Gou, Tingyu, Liu, Rui, Su, Yang, Veronig, Astrid M., Pan, Hanya, Luo, Runbin, and Gan, Weiqun
- Subjects
Astrophysics - Solar and Stellar Astrophysics - Abstract
Coronal jets are believed to be the miniature version of large-scale solar eruptions. In particular, the eruption of a mini-filament inside the base arch is suggested to be the trigger and even driver of blowout jets. Here we propose an alternative triggering mechanism, based on high-resolution H-alpha observations of a blowout jet associated with a mini-filament and an M1.2-class flare. The mini-filament remains largely stationary during the blowout jet, except that it is straddled by flare loops connecting two flare ribbons, indicating that the magnetic arcade embedding the mini-filament has been torn into two parts, with the upper part escaping with the blowout jet. In the wake of the flare, the southern end of the mini-filament fans out like neighboring fibrils, indicative of mass and field exchanges between the mini-filament and the fibrils. The blowout jet is preceded by a standard jet. With H-alpha fibrils moving toward the single-strand spire in a sweeping fashion, the standard jet transitions to the blowout jet. The similar pattern of standard-to-blowout jet transition occurs in an earlier C-class flare before the mini-filament forms. The spiraling morphology and sweeping direction of these fibrils are suggestive of their footpoints being dragged by the leading sunspot that undergoes clockwise rotation for over two days. Soon after the sunspot rotation reaches a peak angular speed as fast as 10 deg/hr, the dormant active region becomes flare-productive, and the mini-filament forms through the interaction of moving magnetic features from the rotating sunspot with satellite spots/pores. Hence, we suggest that the sunspot rotation plays a key role in building up free energy for flares and jets and in triggering blowout jets by inducing sweeping motions of fibrils., Comment: 16 pages, 10 figures, accepted in Solar Physics
- Published
- 2024
31. Relationship between deltamethrin resistance and gut symbiotic bacteria of Aedes albopictus by 16S rDNA sequencing.
- Author
-
Sun, Yingbo, Li, Tingting, Zhou, Guofa, Zhou, Yunfei, Wu, Yuhong, Xu, Jiabao, Chen, Jiarong, Zhong, Saifeng, Zhong, Daibin, Liu, Rui, Lu, Gang, and Li, Yiji
- Subjects
16S rDNA ,Aedes albopictus ,Deltamethrin ,Gut commensal bacteria ,Insecticide resistance ,Animals ,Pyrethrins ,Nitriles ,Aedes ,Insecticide Resistance ,Insecticides ,Larva ,RNA ,Ribosomal ,16S ,Symbiosis ,Bacteria ,Gastrointestinal Microbiome ,Mosquito Vectors ,DNA ,Ribosomal ,Female ,DNA ,Bacterial ,Gastrointestinal Tract - Abstract
BACKGROUND: Aedes albopictus is an important vector for pathogens such as dengue, Zika, and chikungunya viruses. While insecticides are the mainstay for mosquito control, their widespread and excessive use has led to increased resistance in Ae. albopictus globally. Gut symbiotic bacteria are believed to play a potential role in insect physiology, potentially linking to mosquitoes' metabolic resistance against insecticides. METHODS: We investigated the role of symbiotic bacteria in the development of resistance in Ae. albopictus by comparing gut symbiotic bacteria between deltamethrin-sensitive and deltamethrin-resistant populations. Adults were reared from field-collected larvae. Sensitive and resistant mosquitoes were screened using 0.03% and 0.09% deltamethrin, respectively, on the basis of the World Health Organization (WHO) tube bioassay. Sensitive and resistant field-collected larvae were screened using 5 × LC50 (lethal concentration at 50% mortality) and 20 × LC50 concentration of deltamethrin, respectively. Laboratory strain deltamethrin-sensitive adults and larvae were used as controls. The DNA of gut samples from these mosquitoes was extracted using the magnetic bead method. Bacterial 16S rDNA was sequenced using the BGISEQ method. We isolated and cultured gut microorganisms from adult and larval mosquitoes using four different media: Luria Bertani (LB), brain heart infusion (BHI), nutrient agar (NA), and salmonella shigella (SS). RESULTS: Sequencing revealed significantly higher gut microbial diversity in field-resistant larvae compared with field-sensitive and laboratory-sensitive larvae (P
- Published
- 2024
32. Perspectives on the state of cleft lip and cleft palate patient care in Africa.
- Author
-
Liu, Rui, Manana, Wayne, Tollefson, Travis, Ntirenganya, Faustin, and Shaye, David
- Subjects
Cleft Palate ,Humans ,Cleft Lip ,Africa South of the Sahara ,Developing Countries ,Africa ,Delivery of Health Care ,Health Services Accessibility - Abstract
PURPOSE OF REVIEW: Patients with cleft lip-palate (CLP) experience morbidity and social stigma, particularly in low-income and middle-income countries (LMICs) such as those of sub-Saharan Africa (SSA). Delays in treatment, secondary to lack of awareness, skills, equipment, and consumables, poor health infrastructure, limited resources, or a combination of these, have led to SSA having the highest rates of death and second highest rates of disability-adjusted life years in patients with CLP globally. Here we review current perspectives on the state of comprehensive cleft lip and palate repair in Africa. RECENT FINDINGS: To bridge gaps in government health services, nongovernmental organizations (NGOs) have emerged to provide care through short-term surgical interventions (STSIs). Some of these groups effect change through direct provision of care, whereas others strengthen internal systems. However, sustainability is lacking as there continue to be barriers to achieving comprehensive and longitudinal cleft care in SSA, including a lack of awareness of CLP as a treatable condition, prohibitive costs, poor follow-up, and insufficient surgical infrastructure. With dedicated local champions, a comprehensive approach, and reliable partners, establishing sustainable CLP services is possible in countries with limited resources. SUMMARY: The replacement of CLP missions with locally initiated, internationally supported capacity building initiatives, integrated into local healthcare systems, will prove sustainable in the long term.
- Published
- 2024
33. Reliable Object Tracking by Multimodal Hybrid Feature Extraction and Transformer-Based Fusion
- Author
-
Sun, Hongze, Liu, Rui, Cai, Wuque, Wang, Jun, Wang, Yue, Tang, Huajin, Cui, Yan, Yao, Dezhong, and Guo, Daqing
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Quantitative Biology - Neurons and Cognition - Abstract
Visual object tracking, which is primarily based on visible light image sequences, encounters numerous challenges in complicated scenarios, such as low light conditions, high dynamic ranges, and background clutter. To address these challenges, incorporating the advantages of multiple visual modalities is a promising solution for achieving reliable object tracking. However, the existing approaches usually integrate multimodal inputs through adaptive local feature interactions, which cannot leverage the full potential of visual cues, thus resulting in insufficient feature modeling. In this study, we propose a novel multimodal hybrid tracker (MMHT) that utilizes frame-event-based data for reliable single object tracking. The MMHT model employs a hybrid backbone consisting of an artificial neural network (ANN) and a spiking neural network (SNN) to extract dominant features from different visual modalities and then uses a unified encoder to align the features across different domains. Moreover, we propose an enhanced transformer-based module to fuse multimodal features using attention mechanisms. With these methods, the MMHT model can effectively construct a multiscale and multidimensional visual feature space and achieve discriminative feature modeling. Extensive experiments demonstrate that the MMHT model exhibits competitive performance in comparison with that of other state-of-the-art methods. Overall, our results highlight the effectiveness of the MMHT model in terms of addressing the challenges faced in visual object tracking tasks., Comment: 16 pages, 7 figures, 9 tables; This work has been submitted for possible publication
- Published
- 2024
34. High-Resolution Observation and Magnetic Modeling of a Solar Minifilament: the Formation, Eruption and Failing Mechanisms
- Author
-
Teng, Weilin, Su, Yingna, Liu, Rui, Chen, Jialin, Liu, Yanjie, Dai, Jun, Cao, Wenda, Shen, Jinhua, and Ji, Haisheng
- Subjects
Astrophysics - Solar and Stellar Astrophysics - Abstract
Minifilaments are widespread small-scale structures in the solar atmosphere. To better understand their formation and eruption mechanisms, we investigate the entire life of a sigmoidal minifilament located below a large quiescent filament observed by BBSO/GST on 2015 August 3. The Hα structure initially appears as a group of arched threads, then transforms into two J-shaped arcades, and finally forms a sigmoidal shape. SDO/AIA observations in 171 Å show that two coronal jets occur around the southern footpoint of the minifilament before the minifilament eruption. The minifilament eruption starts from the southern footpoint, then interacts with the overlying filament and fails. The aforementioned observational changes correspond to three episodes of flux cancellations observed by SDO/HMI. Unlike previous studies, the flux cancellation occurs between the polarity in which the southern footpoint of the minifilament is rooted and an external polarity. We construct two magnetic field models before the eruption using the flux rope insertion method, and find a hyperbolic flux tube (HFT) above the flux cancellation site. The observation and modeling results suggest that the eruption is triggered by the external magnetic reconnection between the core field of the minifilament and the external fields due to flux cancellations. This study reveals a new triggering mechanism for minifilament eruptions and a new relationship between minifilament eruptions and coronal jets.
- Published
- 2024
35. Enhanced Detection Classification via Clustering SVM for Various Robot Collaboration Task
- Author
-
Liu, Rui, Xu, Xuanzhen, Shen, Yuwei, Zhu, Armando, Yu, Chang, Chen, Tianjian, and Zhang, Ye
- Subjects
Computer Science - Robotics - Abstract
We introduce an advanced, swift pattern recognition strategy for various multiple robotics during curve negotiation. This method, leveraging a sophisticated k-means clustering-enhanced Support Vector Machine algorithm, distinctly categorizes robotics into flying or mobile robots. Initially, the paradigm considers robot locations and features as quintessential parameters indicative of divergent robot patterns. Subsequently, employing the k-means clustering technique facilitates the efficient segregation and consolidation of robotic data, significantly optimizing the support vector delineation process and expediting the recognition phase. Following this preparatory phase, the SVM methodology is adeptly applied to construct a discriminative hyperplane, enabling precise classification and prognostication of the robot category. To substantiate the efficacy and superiority of the k-means framework over traditional SVM approaches, a rigorous cross-validation experiment was orchestrated, evidencing the former's enhanced performance in robot group classification., Comment: This paper has been received by CISCE 2024 Conference
- Published
- 2024
36. Non-Abelian line graph: A generalized approach to flat bands
- Author
-
Liu, Rui-Heng and Liu, Xin
- Subjects
Condensed Matter - Mesoscale and Nanoscale Physics ,Condensed Matter - Materials Science - Abstract
Flat bands (FBs) in materials can enhance the correlation effects, resulting in exotic phenomena. Line graph (LG) lattices are well known for hosting FBs with isotropic hoppings in $s$-orbital models. Despite their prevalent application in the Kagome metals, there has been a lack of a general approach for incorporating higher-angular-momentum orbitals with spin-orbit couplings (SOCs) into LGs to achieve FBs. Here, we introduce a non-Abelian LG theory to construct FBs in realistic systems, which incorporates internal degrees of freedom and goes beyond $s$-orbital models. We modify the lattice edges and sites in the LG to be associated with arbitrary Hermitian matrices, referred to as the multiple LG. A fundamental aspect involves mapping the multiple LG Hamiltonian to a tight-binding (TB) model that respects the lattice symmetry through appropriate local non-Abelian transformations. We establish the general conditions to determine the local transformations. Based on this mechanism, we demonstrate the realization of $d$-orbital FBs in the Kagome lattice, which could serve as a minimal model for understanding the FBs in transition metal Kagome materials. Our approach bridges the gap between the known FBs in pure lattice models and their realization in multi-orbital systems., Comment: 10 pages, 4 figures
- Published
- 2024
37. EEG-Deformer: A Dense Convolutional Transformer for Brain-computer Interfaces
- Author
-
Ding, Yi, Li, Yong, Sun, Hao, Liu, Rui, Tong, Chengxuan, Liu, Chenyu, Zhou, Xinliang, and Guan, Cuntai
- Subjects
Electrical Engineering and Systems Science - Signal Processing ,Computer Science - Machine Learning ,Quantitative Biology - Neurons and Cognition - Abstract
Effectively learning the temporal dynamics in electroencephalogram (EEG) signals is challenging yet essential for decoding brain activities using brain-computer interfaces (BCIs). Although Transformers are popular for their long-term sequential learning ability in the BCI field, most methods combining Transformers with convolutional neural networks (CNNs) fail to capture the coarse-to-fine temporal dynamics of EEG signals. To overcome this limitation, we introduce EEG-Deformer, which incorporates two main novel components into a CNN-Transformer: (1) a Hierarchical Coarse-to-Fine Transformer (HCT) block that integrates a Fine-grained Temporal Learning (FTL) branch into Transformers, effectively discerning coarse-to-fine temporal patterns; and (2) a Dense Information Purification (DIP) module, which utilizes multi-level, purified temporal information to enhance decoding accuracy. Comprehensive experiments on three representative cognitive tasks (cognitive attention, driving fatigue, and mental workload detection) consistently confirm the generalizability of our proposed EEG-Deformer, demonstrating that it either outperforms or performs comparably to existing state-of-the-art methods. Visualization results show that EEG-Deformer learns from neurophysiologically meaningful brain regions for the corresponding cognitive tasks. The source code can be found at https://github.com/yi-ding-cs/EEG-Deformer., Comment: 10 pages, 9 figures. This work has been submitted to the IEEE for possible publication
- Published
- 2024
38. MER 2024: Semi-Supervised Learning, Noise Robustness, and Open-Vocabulary Multimodal Emotion Recognition
- Author
-
Lian, Zheng, Sun, Haiyang, Sun, Licai, Wen, Zhuofan, Zhang, Siyuan, Chen, Shun, Gu, Hao, Zhao, Jinming, Ma, Ziyang, Chen, Xie, Yi, Jiangyan, Liu, Rui, Xu, Kele, Liu, Bin, Cambria, Erik, Zhao, Guoying, Schuller, Björn W., and Tao, Jianhua
- Subjects
Computer Science - Machine Learning ,Computer Science - Human-Computer Interaction - Abstract
Multimodal emotion recognition is an important research topic in artificial intelligence. Over the past few decades, researchers have made remarkable progress by increasing the dataset size and building more effective algorithms. However, due to problems such as complex environments and inaccurate annotations, current systems struggle to meet the demands of practical applications. Therefore, we organize the MER series of competitions to promote the development of this field. Last year, we launched MER2023, focusing on three interesting topics: multi-label learning, noise robustness, and semi-supervised learning. In this year's MER2024, besides expanding the dataset size, we further introduce a new track around open-vocabulary emotion recognition. The motivation for this track is that existing datasets usually fix the label space and use majority voting to enhance annotator consistency. However, this process may lead to inaccurate annotations, such as ignoring non-majority or non-candidate labels. In this track, we encourage participants to generate any number of labels in any category, aiming to describe emotional states as accurately as possible. Our baseline code relies on MERTools and is available at: https://github.com/zeroQiaoba/MERTools/tree/master/MER2024.
- Published
- 2024
39. Infrared Small Target Detection with Scale and Location Sensitivity
- Author
-
Liu, Qiankun, Liu, Rui, Zheng, Bolun, Wang, Hongkui, and Fu, Ying
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Recently, infrared small target detection (IRSTD) has been dominated by deep-learning-based methods. However, these methods mainly focus on the design of complex model structures to extract discriminative features, leaving the loss functions for IRSTD under-explored. For example, the widely used Intersection over Union (IoU) and Dice losses lack sensitivity to the scales and locations of targets, limiting the detection performance of detectors. In this paper, we focus on boosting detection performance with a more effective loss but a simpler model structure. Specifically, we first propose a novel Scale and Location Sensitive (SLS) loss to handle the limitations of existing losses: 1) for scale sensitivity, we compute a weight for the IoU loss based on target scales to help the detector distinguish targets with different scales; 2) for location sensitivity, we introduce a penalty term based on the center points of targets to help the detector localize targets more precisely. Then, we design a simple Multi-Scale Head to the plain U-Net (MSHNet). By applying SLS loss to each scale of the predictions, our MSHNet outperforms existing state-of-the-art methods by a large margin. In addition, the detection performance of existing detectors can be further improved when trained with our SLS loss, demonstrating the effectiveness and generalization of our SLS loss. The code is available at https://github.com/ying-fu/MSHNet., Comment: Accepted by CVPR 2024
- Published
- 2024
40. Volumetric Environment Representation for Vision-Language Navigation
- Author
-
Liu, Rui, Wang, Wenguan, and Yang, Yi
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
Vision-language navigation (VLN) requires an agent to navigate through a 3D environment based on visual observations and natural language instructions. It is clear that the pivotal factor for successful navigation lies in comprehensive scene understanding. Previous VLN agents employ monocular frameworks to extract 2D features of perspective views directly. Though straightforward, they struggle to capture 3D geometry and semantics, leading to a partial and incomplete environment representation. To achieve a comprehensive 3D representation with fine-grained details, we introduce a Volumetric Environment Representation (VER), which voxelizes the physical world into structured 3D cells. For each cell, VER aggregates multi-view 2D features into a unified 3D space via 2D-3D sampling. Through coarse-to-fine feature extraction and multi-task learning for VER, our agent predicts 3D occupancy, 3D room layout, and 3D bounding boxes jointly. Based on online collected VERs, our agent performs volume state estimation and builds episodic memory for predicting the next step. Experimental results show our environment representations from multi-task learning lead to evident performance gains on VLN. Our model achieves state-of-the-art performance across VLN benchmarks (R2R, REVERIE, and R4R). Comment: Accepted at CVPR 2024
- Published
- 2024
41. Adaptive Visual Imitation Learning for Robotic Assisted Feeding Across Varied Bowl Configurations and Food Types
- Author
-
Liu, Rui, Bhaskar, Amisha, and Tokekar, Pratap
- Subjects
Computer Science - Robotics, Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition
- Abstract
In this study, we introduce a novel visual imitation network with a spatial attention module for robotic assisted feeding (RAF). The goal is to acquire (i.e., scoop) food items from a bowl. However, achieving robust and adaptive food manipulation is particularly challenging. To deal with this, we propose a framework that integrates visual perception with imitation learning to enable the robot to handle diverse scenarios during scooping. Our approach, named AVIL (adaptive visual imitation learning), exhibits adaptability and robustness across different bowl configurations in terms of material, size, and position, as well as diverse food types including granular, semi-solid, and liquid, even in the presence of distractors. We validate the effectiveness of our approach by conducting experiments on a real robot. We also compare its performance with a baseline. The results demonstrate improvement over the baseline across all scenarios, with an enhancement of up to 2.5 times in terms of a success metric. Notably, our model, trained solely on data from a transparent glass bowl containing granular cereals, showcases generalization ability when tested zero-shot on other bowl configurations with different types of food.
- Published
- 2024
42. LAVA: Long-horizon Visual Action based Food Acquisition
- Author
-
Bhaskar, Amisha, Liu, Rui, Sharma, Vishnu D., Shi, Guangyao, and Tokekar, Pratap
- Subjects
Computer Science - Robotics, Computer Science - Human-Computer Interaction
- Abstract
Robotic Assisted Feeding (RAF) addresses the fundamental need for individuals with mobility impairments to regain autonomy in feeding themselves. The goal of RAF is to use a robot arm to acquire and transfer food to individuals from the table. Existing RAF methods primarily focus on solid foods, leaving a gap in manipulation strategies for semi-solid and deformable foods. This study introduces Long-horizon Visual Action (LAVA) based food acquisition of liquid, semisolid, and deformable foods. Long-horizon refers to the goal of "clearing the bowl" by sequentially acquiring the food from the bowl. LAVA employs a hierarchical policy for long-horizon food acquisition tasks. The framework uses a high-level policy to determine primitives by leveraging ScoopNet. At the mid-level, LAVA finds parameters for primitives using vision. To carry out sequential plans in the real world, LAVA delegates action execution to a low-level policy that uses parameters received from the mid-level policy and behavior cloning to ensure precise trajectory execution. We validate our approach on complex real-world acquisition trials involving granular, liquid, semisolid, and deformable food types along with fruit chunks and soup acquisition. Across 46 bowls, LAVA acquires much more efficiently than baselines with a success rate of 89 +/- 4% and generalizes across realistic plate variations such as different positions, varieties, and amounts of food in the bowl. Code, datasets, videos, and supplementary materials can be found on our website. Comment: 8 pages, 8 figures
- Published
- 2024
43. Low-cost and Convenient Fabrication of Polymer Micro/Nanopores with the Needle Punching Process and Their Applications in Nanofluidic Sensing
- Author
-
Liu, Rui, Liu, Zhe, Li, Jianfeng, and Qiu, Yinghua
- Subjects
Physics - Chemical Physics
- Abstract
Solid-state micro/nanopores play an important role in the sensing field because of their high stability and controllable size. Aiming at the problems of complex processes and high costs in pore manufacturing, we propose a convenient and low-cost micro/nanopore fabrication technique based on the needle punching method. The thin film is pierced by controlling the feed of a microscale tungsten needle, and the size variations of the micropore are monitored by the current feedback system. Based on the positive correlation between the micropore size and the current threshold, the size-controllable preparation of micropores is achieved. The preparation of nanopores is realized by the combination of needle punching and chemical etching. First, a conical defect is prepared on the film with the tungsten needle. Then, nanopores are obtained by unilateral chemical etching of the film. Using the prepared conical micropores, resistive-pulse detection of nanoparticles is performed. Significant ionic current rectification is also obtained with our conical nanopores. It is shown that the properties of micro/nanopores prepared by our method are comparable to those prepared by the track-etching method. The simple and controllable fabrication process proposed here will advance the development of low-cost micro/nanopore sensors. Comment: 27 pages, 6 figures
- Published
- 2024
44. Towards Efficient Risk-Sensitive Policy Gradient: An Iteration Complexity Analysis
- Author
-
Liu, Rui, Noorani, Erfaun, and Tokekar, Pratap
- Subjects
Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Mathematics - Optimization and Control
- Abstract
Reinforcement Learning (RL) has shown exceptional performance across various applications, enabling autonomous agents to learn optimal policies through interaction with their environments. However, traditional RL frameworks often face challenges in terms of iteration complexity and robustness. Risk-sensitive RL, which balances expected return and risk, has been explored for its potential to yield probabilistically robust policies, yet its iteration complexity analysis remains underexplored. In this study, we conduct a thorough iteration complexity analysis for the risk-sensitive policy gradient method, focusing on the REINFORCE algorithm and employing the exponential utility function. We obtain an iteration complexity of $\mathcal{O}(\epsilon^{-2})$ to reach an $\epsilon$-approximate first-order stationary point (FOSP). We investigate whether risk-sensitive algorithms can potentially achieve better iteration complexity compared to their risk-neutral counterparts. Our theoretical analysis demonstrates that risk-sensitive REINFORCE can potentially have a reduced number of iterations required for convergence. This leads to improved iteration complexity, as employing the exponential utility does not entail additional computation per iteration. We characterize the conditions under which risk-sensitive algorithms can potentially achieve better iteration complexity. Our simulation results also validate that risk-averse cases can converge and stabilize more quickly after $41\%$ of the episodes compared to their risk-neutral counterparts.
- Published
- 2024
45. HGIC: A Hand Gesture Based Interactive Control System for Efficient and Scalable Multi-UAV Operations
- Author
-
Hu, Mengsha, Li, Jinzhou, Jin, Runxiang, Shi, Chao, Xu, Lei, and Liu, Rui
- Subjects
Computer Science - Robotics
- Abstract
As technological advancements continue to expand the capabilities of multi unmanned-aerial-vehicle systems (mUAV), human operators face challenges in scalability and efficiency due to the complex cognitive load and operations associated with motion adjustments and team coordination. Such cognitive demands limit the feasible size of mUAV teams and necessitate extensive operator training, impeding broader adoption. This paper develops Hand Gesture Based Interactive Control (HGIC), a novel interface system that utilizes computer vision techniques to intuitively translate hand gestures into modular commands for robot teaming. Through learning control models, these commands enable efficient and scalable mUAV motion control and adjustments. HGIC eliminates the need for specialized hardware and offers two key benefits: 1) minimal training requirements through natural gestures; and 2) enhanced scalability and efficiency via adaptable commands. By reducing the cognitive burden on operators, HGIC opens the door for more effective large-scale mUAV applications in complex, dynamic, and uncertain scenarios. HGIC will be open-sourced for the research community after the paper is published online, aiming to drive forward innovations in human-mUAV interactions.
- Published
- 2024
46. Federated Joint Learning of Robot Networks in Stroke Rehabilitation
- Author
-
Jiang, Xinyu, Guo, Yibei, Hu, Mengsha, Jin, Ruoming, Phan, Hai, Alberts, Jay, and Liu, Rui
- Subjects
Computer Science - Robotics
- Abstract
Advanced by rich perception and precise execution, robots possess immense potential to provide professional and customized rehabilitation exercises for patients with mobility impairments caused by strokes. Autonomous robotic rehabilitation significantly reduces human workloads in the long and tedious rehabilitation process. However, training a rehabilitation robot is challenging due to the data scarcity issue. This challenge arises from privacy concerns (e.g., the risk of leaking private disease and identity information of patients) during clinical data access and usage. Data from various patients and hospitals cannot be shared for adequate robot training, further compromising rehabilitation safety and limiting implementation scopes. To address this challenge, this work developed a novel federated joint learning (FJL) method to jointly train robots across hospitals. FJL also adopted a long short-term memory network (LSTM)-Transformer learning mechanism to effectively explore the complex tempo-spatial relations among patient mobility conditions and robotic rehabilitation motions. To validate FJL's effectiveness in training a robot network, a clinic-simulation combined experiment was designed. Real rehabilitation exercise data from 200 patients with stroke diseases (upper limb hemiplegia, Parkinson's syndrome, and back pain syndrome) were adopted. Inversely driven by clinical data, 300,000 robotic rehabilitation guidances were simulated. FJL proved to be effective in joint rehabilitation learning, performing 20% - 30% better than baseline methods.
- Published
- 2024
47. Serum small extracellular vesicles-derived BST2 as a biomarker for papillary thyroid microcarcinoma promotes lymph node metastasis
- Author
-
Cao, Zhen, Wang, Yuanyang, Wu, Jianqiang, Tang, Xiaoyue, Qian, Zhihong, Zhang, Zejian, Liu, Rui, Liu, Peng, Li, Zepeng, Xu, Xiequn, and Liu, Ziwen
- Published
- 2024
48. Research on Key Parameters and Fire Extinguishing Effectiveness of Compressed Air Foam System Used for UHV Substation
- Author
-
Zhang, Jiaqing, Shang, Fengju, Zhang, Shanwen, Wang, Liufang, Ke, Yanguo, Huang, Jie, Su, Wen, Liu, Rui, and Sheng, Youjie
- Published
- 2024
49. Insight into the effect of humic acids on transport of Cd2+ in biochar-amended saturated porous media
- Author
-
Zhao, Tian, Liu, Yongyang, Liu, Rui, and Wang, Fang
- Published
- 2024
50. Maternal probiotic supplementation protects against PBDE-induced developmental, behavior and metabolic reprogramming in a sexually dimorphic manner: Role of gut microbiome
- Author
-
Denys, Maximillian E., Kozlova, Elena V., Liu, Rui, Bishay, Anthony E., Do, Elyza A., Piamthai, Varadh, Korde, Yash V., Luna, Crystal N., Lam, Artha A., Hsiao, Ansel, and Currás-Collazo, Margarita
- Published
- 2024