31,769 results on '"Wang, Xi"'
Search Results
2. DeepCircuitX: A Comprehensive Repository-Level Dataset for RTL Code Understanding, Generation, and PPA Analysis
- Author
-
Li, Zeju, Xu, Changran, Shi, Zhengyuan, Peng, Zedong, Liu, Yi, Zhou, Yunhao, Zhou, Lingfeng, Ma, Chengyu, Zhong, Jianyuan, Wang, Xi, Zhao, Jieru, Chu, Zhufei, Yang, Xiaoyan, and Xu, Qiang
- Subjects
Computer Science - Machine Learning, Computer Science - Programming Languages
- Abstract
This paper introduces DeepCircuitX, a comprehensive repository-level dataset designed to advance RTL (Register Transfer Level) code understanding, generation, and power-performance-area (PPA) analysis. Unlike existing datasets that are limited to either file-level RTL code or physical layout data, DeepCircuitX provides a holistic, multilevel resource that spans repository, file, module, and block-level RTL code. This structure enables more nuanced training and evaluation of large language models (LLMs) for RTL-specific tasks. DeepCircuitX is enriched with Chain of Thought (CoT) annotations, offering detailed descriptions of functionality and structure at multiple levels. These annotations enhance its utility for a wide range of tasks, including RTL code understanding, generation, and completion. Additionally, the dataset includes synthesized netlists and PPA metrics, facilitating early-stage design exploration and enabling accurate PPA prediction directly from RTL code. We demonstrate the dataset's effectiveness on various LLMs finetuned with our dataset and confirm the quality with human evaluations. Our results highlight DeepCircuitX as a critical resource for advancing RTL-focused machine learning applications in hardware design automation. Our data is available at https://zeju.gitbook.io/lcm-team.
Comment: 8 pages, 3 figures
- Published
- 2025
3. Electrically Tunable Magnonic Bound States in the Continuum
- Author
-
Wang, Xi-guang, Guo, Guang-hua, Berakdar, Jamal, and Jing, Hui
- Subjects
Condensed Matter - Mesoscale and Nanoscale Physics, Condensed Matter - Strongly Correlated Electrons
- Abstract
Low energy excitations of a magnetically ordered system are spin waves, with magnons being their excitation quanta. Magnons are demonstrated to be useful for data processing and communication. To achieve magnon transport across extended distances, it is essential to minimize magnonic dissipation, which can be accomplished by material engineering to reduce intrinsic damping or by spin torques that can counteract damping. This study introduces an alternative methodology to effectively reduce magnon dissipation based on magnonic bound states in the continuum (BIC). We demonstrate the approach for two antiferromagnetically coupled magnonic waveguides, with one waveguide being attached to a current carrying metallic layer. The current acts on the attached waveguide with a spin-orbit torque effectively amplifying the magnonic signal. The setup maps onto a non-Hermitian system with coupled gain and loss, enabling the formation of dissipationless magnon BIC. We investigate the necessary criteria for the formation of magnon BIC through electric currents. The influences of interlayer coupling constant, anisotropy constants and applied magnetic field on the current-induced magnon BIC are analyzed. The identified effect can be integrated in the design of magnon delay lines, offering opportunities for the enhancement of magnonic devices and circuits.
Comment: 15 pages, 4 figures
- Published
- 2025
4. GOD model: Privacy Preserved AI School for Personal Assistant
- Author
-
PIN AI Team, Sun, Bill, Guo, Gavin, Peng, Regan, Zhang, Boliang, Wang, Shouqiao, Florescu, Laura, Wang, Xi, Crapis, Davide, and Wu, Ben
- Subjects
Computer Science - Cryptography and Security, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
- Abstract
Personal AI assistants (e.g., Apple Intelligence, Meta AI) offer proactive recommendations that simplify everyday tasks, but their reliance on sensitive user data raises concerns about privacy and trust. To address these challenges, we introduce the Guardian of Data (GOD), a secure, privacy-preserving framework for training and evaluating AI assistants directly on-device. Unlike traditional benchmarks, the GOD model measures how well assistants can anticipate user needs, such as suggesting gifts, while protecting user data and autonomy. Functioning like an AI school, it addresses the cold start problem by simulating user queries and employing a curriculum-based approach to refine the performance of each assistant. Running within a Trusted Execution Environment (TEE), it safeguards user data while applying reinforcement and imitation learning to refine AI recommendations. A token-based incentive system encourages users to share data securely, creating a data flywheel that drives continuous improvement. Specifically, users mine with their data, and the mining rate is determined by GOD's evaluation of how well their AI assistant understands them across categories such as shopping, social interactions, productivity, trading, and Web3. By integrating privacy, personalization, and trust, the GOD model provides a scalable, responsible path for advancing personal AI assistants. For community collaboration, part of the framework is open-sourced at https://github.com/PIN-AI/God-Model.
- Published
- 2025
5. FreeTumor: Large-Scale Generative Tumor Synthesis in Computed Tomography Images for Improving Tumor Recognition
- Author
-
Wu, Linshan, Zhuang, Jiaxin, Zhou, Yanning, He, Sunan, Ma, Jiabo, Luo, Luyang, Wang, Xi, Ni, Xuefeng, Zhong, Xiaoling, Wu, Mingxiang, Zhao, Yinghua, Duan, Xiaohui, Vardhanabhuti, Varut, Rajpurkar, Pranav, and Chen, Hao
- Subjects
Electrical Engineering and Systems Science - Image and Video Processing, Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition
- Abstract
Tumor is a leading cause of death worldwide, with an estimated 10 million deaths attributed to tumor-related diseases every year. AI-driven tumor recognition unlocks new possibilities for more precise and intelligent tumor screening and diagnosis. However, the progress is heavily hampered by the scarcity of annotated datasets, which demands extensive annotation efforts by radiologists. To tackle this challenge, we introduce FreeTumor, an innovative Generative AI (GAI) framework to enable large-scale tumor synthesis for mitigating data scarcity. Specifically, FreeTumor effectively leverages a combination of limited labeled data and large-scale unlabeled data for tumor synthesis training. Unleashing the power of large-scale data, FreeTumor is capable of synthesizing a large number of realistic tumors on images for augmenting training datasets. To this end, we create the largest training dataset for tumor synthesis and recognition by curating 161,310 publicly available Computed Tomography (CT) volumes from 33 sources, with only 2.3% containing annotated tumors. To validate the fidelity of synthetic tumors, we engaged 13 board-certified radiologists in a Visual Turing Test to discern between synthetic and real tumors. Rigorous clinician evaluation validates the high quality of our synthetic tumors, as they achieved only 51.1% sensitivity and 60.8% accuracy in distinguishing our synthetic tumors from real ones. Through high-quality tumor synthesis, FreeTumor scales up the recognition training datasets by over 40 times, showcasing a notable superiority over state-of-the-art AI methods including various synthesis methods and foundation models. These findings indicate promising prospects of FreeTumor in clinical applications, potentially advancing tumor treatments and improving the survival rates of patients.
- Published
- 2025
6. Valley resolved dynamics of phonon bottleneck in semiconductor molybdenum ditelluride
- Author
-
Wang, Zhong, Shi, Yijie, Pan, Yu, Li, Min, Wang, Xi, Zhang, Zheng, Zhu, Xiangyu, Hua, Fuyong, You, Qian, Hu, Chunlong, He, Junjie, Ye, Yu, and Liang, Wenxi
- Subjects
Condensed Matter - Materials Science, Condensed Matter - Mesoscale and Nanoscale Physics, Physics - Optics
- Abstract
Semiconductor molybdenum ditelluride (2H-MoTe2) possesses multiple valleys in the band structure, enriching its physical properties and potential for applications. The understanding of its multivalley nature of fundamental processes involving population and relaxation of carriers and phonons is still evolving; particularly, the possible phonon bottleneck has not yet been addressed. Here, we investigate the carrier intra- and intervalley scattering and the phonon dynamics in different valleys in photoexcited few-layer 2H-MoTe2, by using time-resolved measurements of optical absorption and electron diffraction, together with density functional theory calculation and molecular dynamics simulation. The pathways and timescales of carrier relaxation, accompanied by the emissions of optical phonons at the Brillouin zone center and acoustic phonons at the zone border, are revealed. We present a couple of approaches to estimate the population of different phonon modes based on the results of optical and electron diffraction measurements, and hence quantitatively identify the occurrences of phonon bottleneck located in different valleys. Our findings make it possible to construct a comprehensive picture of the complex interactions between carriers and phonons in 2H-MoTe2 with the valley degree of freedom resolved.
Comment: 34 pages, 17 figures (including Supplementary Information)
- Published
- 2025
7. Imaging orbital Rashba induced charge transport anisotropy
- Author
-
Persky, Eylon, Wang, Xi, Sala, Giacomo, van Thiel, Thierry C., Lesne, Edouard, Lau, Alexander, Cuoco, Mario, Gabay, Marc, Ortix, Carmine, Caviglia, Andrea D., and Kalisky, Beena
- Subjects
Condensed Matter - Mesoscale and Nanoscale Physics, Condensed Matter - Materials Science, Condensed Matter - Strongly Correlated Electrons
- Abstract
Identifying orbital textures and their effects on the electronic properties of quantum materials is a critical element in developing orbitronic devices. However, orbital effects are often entangled with the spin degree of freedom, making it difficult to uniquely identify them in charge transport phenomena. Here, we present a combination of scanning superconducting quantum interference device (SQUID) current imaging, global transport measurements, and theoretical analysis that reveals a direct contribution of orbital textures to the linear charge transport of 2D systems. Specifically, we show that in the LaAlO$_3$/SrTiO$_3$ interface, which lacks both rotation and inversion symmetries, an anisotropic orbital Rashba coupling leads to conductivity anisotropy in zero magnetic field. We experimentally demonstrate this result by locally measuring the conductivity anisotropy, and correlating its appearance to the non-linear Hall effect, showing that the two phenomena have a common origin. Our results lay the foundations for an all-electrical probing of orbital currents in two-dimensional systems.
Comment: 22 pages, 15 figures
- Published
- 2025
8. Generalizable Cervical Cancer Screening via Large-scale Pretraining and Test-Time Adaptation
- Author
-
Jiang, Hao, Jin, Cheng, Lin, Huangjing, Zhou, Yanning, Wang, Xi, Ma, Jiabo, Ding, Li, Hou, Jun, Liu, Runsheng, Chai, Zhizhong, Luo, Luyang, Shi, Huijuan, Qian, Yinling, Wang, Qiong, Li, Changzhong, Han, Anjia, Chan, Ronald Cheong Kin, and Chen, Hao
- Subjects
Quantitative Biology - Quantitative Methods, Computer Science - Computer Vision and Pattern Recognition, Electrical Engineering and Systems Science - Image and Video Processing
- Abstract
Cervical cancer is a leading malignancy of the female reproductive system. While AI-assisted cytology offers a cost-effective and non-invasive screening solution, current systems struggle with generalizability in complex clinical scenarios. To address this issue, we introduced Smart-CCS, a generalizable Cervical Cancer Screening paradigm based on pretraining and adaptation to create robust and generalizable screening systems. To develop and validate Smart-CCS, we first curated a large-scale, multi-center dataset named CCS-127K, which comprises a total of 127,471 cervical cytology whole-slide images collected from 48 medical centers. By leveraging large-scale self-supervised pretraining, our CCS models are equipped with strong generalization capability, potentially generalizing across diverse scenarios. Then, we incorporated test-time adaptation to specifically optimize the trained CCS model for complex clinical settings, which adapts and refines predictions, improving real-world applicability. We conducted large-scale system evaluation among various cohorts. In retrospective cohorts, Smart-CCS achieved an overall area under the curve (AUC) value of 0.965 and sensitivity of 0.913 for cancer screening on 11 internal test datasets. In external testing, system performance remained high at 0.950 AUC across 6 independent test datasets. In prospective cohorts, our Smart-CCS achieved AUCs of 0.947, 0.924, and 0.986 in three prospective centers, respectively. Moreover, the system demonstrated superior sensitivity in diagnosing cervical cancer, confirming the accuracy of our cancer screening results by using histology findings for validation. Interpretability analysis with cell and slide predictions further indicated that the system's decision-making aligns with clinical practice. Smart-CCS represents a significant advancement in cancer screening across diverse clinical contexts.
- Published
- 2025
9. Postselection-Free Cavity-Enhanced Narrow-Band Orbital Angular Momentum Entangled Photon Source
- Author
-
Wan, Pei, Zhu, Wen-Zheng, Lou, Yan-Chao, Cheng, Zi-Mo, Ren, Zhi-Cheng, Zhang, Han, Wang, Xi-Lin, and Wang, Hui-Tian
- Subjects
Quantum Physics, Physics - Optics
- Abstract
Cavity-enhanced spontaneous parametric down-conversion (SPDC) provides a significant way to produce $\sim$10 MHz narrow-band photon pairs, which matches the bandwidth of photons for quantum memory. However, the output photon pairs from the cavity are not entangled and postselection is required to create the entanglement outside the cavity, so the direct output of cavity-enhanced narrow-band entangled photon pairs is still an open challenge. Here we propose a solution that realizes the first postselection-free cavity-enhanced narrow-band entangled photon pairs. The entanglement is achieved in the degree of freedom (DOF) of orbital angular momentum (OAM) by implementing an OAM-conservation SPDC process in an actively and precisely controlled cavity supporting degenerate high-order OAM modes. The measured linewidth for the two photons is 13.8 MHz and the measured fidelity is 0.969(3) for the directly generated OAM-entangled photon pairs. We deterministically transfer the OAM entanglement to polarization entanglement with almost no loss and obtain polarization-entangled photon pairs with a fidelity of 0.948(2). Moreover, we produce narrow-band OAM-polarization hyperentangled photon pairs with a fidelity of 0.850(2) by establishing polarization entanglement with preservation of OAM entanglement, which is realized by interfering the two photons on a polarizing beam splitter (PBS) and post-selecting the events with one and only one photon at each PBS port. The novel cavity may find applications in cavity-based light-matter interaction. Our results provide an efficient and promising approach to create narrow-band entangled photon sources for memory-based long-distance quantum communication and networks.
Comment: 7 pages, 5 figures
- Published
- 2025
10. Two-particle quantum interference in a nonlinear optical medium: a witness of timelike indistinguishability
- Author
-
Chen, Chao, Xue, Shu-Tian, Shi, Yu-Peng, Wang, Jing, Cheng, Zi-Mo, Wan, Pei, Ren, Zhi-Cheng, Jabbour, Michael G., Cerf, Nicolas J., Wang, Xi-Lin, and Wang, Hui-Tian
- Subjects
Quantum Physics
- Abstract
The Hong-Ou-Mandel effect is a paradigmatic quantum phenomenon demonstrating the interference of two indistinguishable photons that are linearly coupled at a 50:50 beam splitter. Here, we transpose such a two-particle quantum interference effect to the nonlinear regime, when two single photons are impinging on a parametric down-conversion crystal. Formally, this transposition amounts to exchanging space and time variables, giving rise to an unknown form of timelike quantum interference. The two-photon component of the output state is a superposition of the incident photons being either transmitted or reborn, that is, replaced by indistinguishable substitutes due to their interaction with the nonlinear crystal. We experimentally demonstrate the suppression of the probability of detecting precisely one photon pair when the amplification gain is tuned to 2, which arises from the destructive interference between the transmitted and reborn photon pairs. This heretofore unobserved quantum manifestation of indistinguishability in time pushes nonlinear quantum interference towards a new regime with multiple photons. Hence, composing this effect with larger linear optical circuits should provide a tool to generate multimode quantum non-Gaussian states, which are essential resources for photonic quantum computers.
- Published
- 2025
11. CosyAudio: Improving Audio Generation with Confidence Scores and Synthetic Captions
- Author
-
Zhu, Xinfa, Tian, Wenjie, Wang, Xinsheng, He, Lei, Wang, Xi, Zhao, Sheng, and Xie, Lei
- Subjects
Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Sound
- Abstract
Text-to-Audio (TTA) generation is an emerging area within AI-generated content (AIGC), where audio is created from natural language descriptions. Despite growing interest, developing robust TTA models remains challenging due to the scarcity of well-labeled datasets and the prevalence of noisy or inaccurate captions in large-scale, weakly labeled corpora. To address these challenges, we propose CosyAudio, a novel framework that utilizes confidence scores and synthetic captions to enhance the quality of audio generation. CosyAudio consists of two core components: AudioCapTeller and an audio generator. AudioCapTeller generates synthetic captions for audio and provides confidence scores to evaluate their accuracy. The audio generator uses these synthetic captions and confidence scores to enable quality-aware audio generation. Additionally, we introduce a self-evolving training strategy that iteratively optimizes CosyAudio across both well-labeled and weakly-labeled datasets. Initially trained with well-labeled data, AudioCapTeller leverages its assessment capabilities on weakly-labeled datasets for high-quality filtering and reinforcement learning, which further improves its performance. The well-trained AudioCapTeller refines corpora by generating new captions and confidence scores, which are then used to train the audio generator. Extensive experiments on open-source datasets demonstrate that CosyAudio outperforms existing models in automated audio captioning, generates more faithful audio, and exhibits strong generalization across diverse scenarios.
Comment: 12 pages, 5 figures, 7 tables
- Published
- 2025
12. ZSVC: Zero-shot Style Voice Conversion with Disentangled Latent Diffusion Models and Adversarial Training
- Author
-
Zhu, Xinfa, He, Lei, Xiao, Yujia, Wang, Xi, Tan, Xu, Zhao, Sheng, and Xie, Lei
- Subjects
Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Sound
- Abstract
Style voice conversion aims to transform the speaking style of source speech into a desired style while keeping the original speaker's identity. However, previous style voice conversion approaches primarily focus on well-defined domains such as emotional aspects, limiting their practical applications. In this study, we present ZSVC, a novel Zero-shot Style Voice Conversion approach that utilizes a speech codec and a latent diffusion model with a speech prompting mechanism to facilitate in-context learning for speaking style conversion. To disentangle speaking style and speaker timbre, we introduce an information bottleneck to filter speaking style in the source speech and employ Uncertainty Modeling Adaptive Instance Normalization (UMAdaIN) to perturb the speaker timbre in the style prompt. Moreover, we propose a novel adversarial training strategy to enhance in-context learning and improve style similarity. Experiments conducted on 44,000 hours of speech data demonstrate the superior performance of ZSVC in generating speech with diverse speaking styles in zero-shot scenarios.
Comment: 5 pages, 3 figures, accepted by ICASSP 2025
- Published
- 2025
13. Probing Stress and Magnetism at High Pressures with Two-Dimensional Quantum Sensors
- Author
-
He, Guanghui, Gong, Ruotian, Wang, Zhipan, Liu, Zhongyuan, Hong, Jeonghoon, Zhang, Tongxie, Riofrio, Ariana L., Rehfuss, Zachary, Chen, Mingfeng, Yao, Changyu, Poirier, Thomas, Ye, Bingtian, Wang, Xi, Ran, Sheng, Edgar, James H., Zhang, Shixiong, Yao, Norman Y., and Zu, Chong
- Subjects
Condensed Matter - Mesoscale and Nanoscale Physics, Condensed Matter - Materials Science, Quantum Physics
- Abstract
Pressure serves as a fundamental tuning parameter capable of drastically modifying all properties of matter. The advent of diamond anvil cells (DACs) has enabled a compact and tabletop platform for generating extreme pressure conditions in laboratory settings. However, the limited spatial dimensions and ultrahigh pressures within these environments present significant challenges for conventional spectroscopy techniques. In this work, we integrate optical spin defects within a thin layer of two-dimensional (2D) materials directly into the high-pressure chamber, enabling an in situ quantum sensing platform for mapping local stress and magnetic environments up to 4 GPa. Compared to nitrogen-vacancy (NV) centers embedded in diamond anvils, our 2D sensors exhibit around three times stronger response to local stress and provide nanoscale proximity to the target sample in heterogeneous devices. We showcase the versatility of our approach by imaging both stress gradients within the high-pressure chamber and a pressure-driven magnetic phase transition in a room-temperature self-intercalated van der Waals ferromagnet, Cr$_{1+\delta}$Te$_2$. Our work demonstrates an integrated quantum sensing device for high-pressure experiments, offering potential applications in probing pressure-induced phenomena such as superconductivity, magnetism, and mechanical deformation.
Comment: 9 pages, 7 figures
- Published
- 2025
14. Effects of hair on the image of a rotating black hole illuminated by a thin accretion disk
- Author
-
Meng, Yuan, Wang, Xi-Jing, Li, Yong-Zhuang, and Kuang, Xiao-Mei
- Subjects
General Relativity and Quantum Cosmology
- Abstract
In this paper, we investigate the shadow and optical appearance of the hairy Kerr black hole illuminated by a thin accretion disk, the materials of which outside the innermost stable circular orbit (ISCO) move on the equatorial circular orbit, while inside the ISCO they quickly plunge into the black hole. The deformation parameter $\alpha$ and hair parameter $l_o$ are found to influence the motions of accretion as well as the redshift effect of the photon, such that they significantly affect the shadow and image of the hairy Kerr black hole. In particular, these two parameters have competing effects on the size of the black hole's shadow, and significantly increase the width of the photon ring. This study provides a preliminary theoretical prediction that the image of the hairy Kerr black hole, especially the photon ring structure, may be used to constrain the hair parameters with future high-precision astronomical observation.
Comment: 16 pages, 7 figures
- Published
- 2025
15. Observational appearances of an inner extremal regular black hole illuminated by various accretion flows
- Author
-
Zhang, Dan, Fu, Guoyang, Wang, Xi-Jing, Pan, Qiyuan, Kuang, Xiao-Mei, and Wu, Jian-Pin
- Subjects
General Relativity and Quantum Cosmology
- Abstract
This paper investigates the observational appearances of an inner extremal regular black hole (IERBH) illuminated by various types of accretion models. The study reveals that when the BH is illuminated by specific accretion flows, the effects of quantum gravity become more pronounced, significantly impacting key observational features such as the shadow radius, photon ring, and total observed intensity. Specifically, the introduction of a more realistic radially infalling spherical accretion flow further accentuates these differences. This dynamic flow results in a darker central region in the BH image due to the Doppler effect, which modulates the observed intensity based on the relative motion of the infalling matter. The shadow radius and total observed intensity are notably affected by the quantum correction parameters, providing additional signatures that distinguish regular BHs from their classical counterparts.
- Published
- 2024
16. AKiRa: Augmentation Kit on Rays for optical video generation
- Author
-
Wang, Xi, Courant, Robin, Christie, Marc, and Kalogeiton, Vicky
- Subjects
Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Multimedia
- Abstract
Recent advances in text-conditioned video diffusion have greatly improved video quality. However, these methods offer limited or sometimes no control to users on camera aspects, including dynamic camera motion, zoom, distorted lens and focus shifts. These motion and optical aspects are crucial for adding controllability and cinematic elements to generation frameworks, ultimately resulting in visual content that draws focus, enhances mood, and guides emotions according to filmmakers' controls. In this paper, we aim to close the gap between controllable video generation and camera optics. To achieve this, we propose AKiRa (Augmentation Kit on Rays), a novel augmentation framework that builds and trains a camera adapter with a complex camera model over an existing video generation backbone. It enables fine-tuned control over camera motion as well as complex optical parameters (focal length, distortion, aperture) to achieve cinematic effects such as zoom, fisheye effect, and bokeh. Extensive experiments demonstrate AKiRa's effectiveness in combining and composing camera optics while outperforming all state-of-the-art methods. This work sets a new landmark in controlled and optically enhanced video generation, paving the way for future optical video generation methods.
- Published
- 2024
17. LineArt: A Knowledge-guided Training-free High-quality Appearance Transfer for Design Drawing with Diffusion Model
- Author
-
Wang, Xi, Li, Hongzhen, Fang, Heng, Peng, Yichen, Xie, Haoran, Yang, Xi, and Li, Chuntao
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
Image rendering from line drawings is vital in design, and image generation technologies reduce costs, yet professional line drawings demand preserving complex details. Text prompts struggle with accuracy, and image translation struggles with consistency and fine-grained control. We present LineArt, a framework that transfers complex appearance onto detailed design drawings, facilitating design and artistic creation. It generates high-fidelity appearance while preserving structural accuracy by simulating hierarchical visual cognition and integrating human artistic experience to guide the diffusion process. LineArt overcomes the limitations of current methods in terms of difficulty in fine-grained control and style degradation in design drawings. It requires no precise 3D modeling, physical property specs, or network training, making it more convenient for design tasks. LineArt consists of two stages: a multi-frequency lines fusion module to supplement the input design drawing with detailed structural information and a two-part painting process for Base Layer Shaping and Surface Layer Coloring. We also present a new design drawing dataset ProLines for evaluation. The experiments show that LineArt performs better in accuracy, realism, and material precision compared to SOTAs.
Comment: Project Page: https://meaoxixi.github.io/LineArt/
- Published
- 2024
18. Adaptive Visual Perception for Robotic Construction Process: A Multi-Robot Coordination Framework
- Author
-
Xu, Jia, Dixit, Manish, and Wang, Xi
- Subjects
Computer Science - Robotics
- Abstract
Construction robots operate in unstructured construction sites, where effective visual perception is crucial for ensuring safe and seamless operations. However, construction robots often handle large elements and perform tasks across expansive areas, resulting in occluded views from onboard cameras and necessitating the use of multiple environmental cameras to capture the large task space. This study proposes a multi-robot coordination framework in which a team of supervising robots equipped with cameras adaptively adjust their poses to visually perceive the operation of the primary construction robot and its surrounding environment. A viewpoint selection method is proposed to determine each supervising robot's camera viewpoint, optimizing visual coverage and proximity while considering the visibility of the upcoming construction robot operation. A case study on prefabricated wooden frame installation demonstrates the system's feasibility, and further experiments are conducted to validate the performance and robustness of the proposed viewpoint selection method across various settings. This research advances visual perception of robotic construction processes and paves the way for integrating computer vision techniques to enable real-time adaptation and responsiveness. Such advancements contribute to the safe and efficient operation of construction robots in inherently unstructured construction sites.
- Published
- 2024
19. GEM: A Generalizable Ego-Vision Multimodal World Model for Fine-Grained Ego-Motion, Object Dynamics, and Scene Composition Control
- Author
-
Hassan, Mariam, Stapf, Sebastian, Rahimi, Ahmad, Rezende, Pedro M B, Haghighi, Yasaman, Brüggemann, David, Katircioglu, Isinsu, Zhang, Lin, Chen, Xiaoran, Saha, Suman, Cannici, Marco, Aljalbout, Elie, Ye, Botao, Wang, Xi, Davtyan, Aram, Salzmann, Mathieu, Scaramuzza, Davide, Pollefeys, Marc, Favaro, Paolo, and Alahi, Alexandre
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
We present GEM, a Generalizable Ego-vision Multimodal world model that predicts future frames using a reference frame, sparse features, human poses, and ego-trajectories. Hence, our model has precise control over object dynamics, ego-agent motion and human poses. GEM generates paired RGB and depth outputs for richer spatial understanding. We introduce autoregressive noise schedules to enable stable long-horizon generations. Our dataset is comprised of 4000+ hours of multimodal data across domains like autonomous driving, egocentric human activities, and drone flights. Pseudo-labels are used to get depth maps, ego-trajectories, and human poses. We use a comprehensive evaluation framework, including a new Control of Object Manipulation (COM) metric, to assess controllability. Experiments show GEM excels at generating diverse, controllable scenarios and temporal consistency over long generations. Code, models, and datasets are fully open-sourced.
- Published
- 2024
20. Stochastic Kinematic Optimal Control on SO(3)
- Author
-
Wang, Xi, Wang, Xiaoyi, and Solo, Victor
- Subjects
Mathematics - Optimization and Control
- Abstract
In this paper, we develop a novel method for deriving a global optimal control strategy for stochastic attitude kinematics on the special orthogonal group SO(3). We first introduce a stochastic Lie-Hamilton-Jacobi-Bellman (SL-HJB) equation on SO(3), which theoretically provides an optimality condition for the global optimal control strategy of the stochastic attitude kinematics. Then we propose a novel numerical method, the Successive Wigner-Galerkin Approximation (SWGA) method, to solve the SL-HJB equation on SO(3). The SWGA method leverages the Wigner-D functions to represent the Galerkin solution of the SL-HJB equation in a policy iteration framework, providing a computationally efficient approach to derive a global optimal control strategy for systems on SO(3). We demonstrate the effectiveness of the SWGA method through numerical simulation on stochastic attitude stabilization.
Comment: 16 pages, 4 figures, the supplement begins at page 9
- Published
- 2024
21. Magnomechanically induced transparency in the ferrimagnetic bridge crystal of atom opto-magnomechanical system
- Author
-
Diao, Wenting, Wang, Xi, Di, Ke, Liu, Yu, Cheng, Anyu, Cai, Chunxiao, Yang, Wenhai, and Du, Jiajia
- Subjects
Quantum Physics - Abstract
We investigate the absorption and transmission properties of a weak probe field in an atom opto-magnomechanical system. The system comprises an assembly of two-level atoms and a magnon mode within a ferrimagnetic crystal, which directly interacts with an optical cavity mode through the crystal's deformation displacement. We observe optomechanically induced transparency (OMIT) via radiation pressure and magnomechanically induced transparency (MMIT) due to the nonlinear magnon-phonon interaction. In addition, due to the coupling of the atoms to the detected and signal light, the system's wide transparency window is divided into two narrow windows. We also demonstrate that the group delay depends on the tunable magnon-phonon coupling strength. Our scheme is of significance for the field of quantum precision measurement.
- Published
- 2024
22. Holistic Understanding of 3D Scenes as Universal Scene Description
- Author
-
Halacheva, Anna-Maria, Miao, Yang, Zaech, Jan-Nico, Wang, Xi, Van Gool, Luc, and Paudel, Danda Pani
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Robotics - Abstract
3D scene understanding is a long-standing challenge in computer vision and a key component in enabling mixed reality, wearable computing, and embodied AI. Providing a solution to these applications requires a multifaceted approach that covers scene-centric, object-centric, as well as interaction-centric capabilities. While there exist numerous datasets approaching the former two problems, the task of understanding interactable and articulated objects is underrepresented and only partly covered by current works. In this work, we address this shortcoming and introduce (1) an expertly curated dataset in the Universal Scene Description (USD) format, featuring high-quality manual annotations for instance segmentation and articulation on 280 indoor scenes; (2) a learning-based model together with a novel baseline capable of predicting part segmentation along with a full specification of motion attributes, including motion type, articulated and interactable parts, and motion parameters; (3) a benchmark serving to compare upcoming methods for the task at hand. Overall, our dataset provides 8 types of annotations: object and part segmentations, motion types, movable and interactable parts, motion parameters, connectivity, and object mass annotations. With its broad and high-quality annotations, the data provides the basis for holistic 3D scene understanding models. All data is provided in the USD format, allowing interoperability and easy integration with downstream tasks. We provide open access to our dataset, benchmark, and method's source code.
- Published
- 2024
23. Understanding the World's Museums through Vision-Language Reasoning
- Author
-
Balauca, Ada-Astrid, Garai, Sanjana, Balauca, Stefan, Shetty, Rasesh Udayakumar, Agrawal, Naitik, Shah, Dhwanil Subhashbhai, Fu, Yuqian, Wang, Xi, Toutanova, Kristina, Paudel, Danda Pani, and Van Gool, Luc
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Computation and Language - Abstract
Museums serve as vital repositories of cultural heritage and historical artifacts spanning diverse epochs, civilizations, and regions, preserving well-documented collections. Data reveal key attributes such as age, origin, material, and cultural significance. Understanding museum exhibits from their images requires reasoning beyond visual features. In this work, we facilitate such reasoning by (a) collecting and curating a large-scale dataset of 65M images and 200M question-answer pairs in the standard museum catalog format for exhibits from all around the world; (b) training large vision-language models on the collected dataset; (c) benchmarking their ability on five visual question answering tasks. The complete dataset is labeled by museum experts, ensuring the quality as well as the practical significance of the labels. We train two VLMs from different categories: the BLIP model, with vision-language aligned embeddings, but lacking the expressive power of large language models, and the LLaVA model, a powerful instruction-tuned LLM enriched with vision-language reasoning capabilities. Through exhaustive experiments, we provide several insights on the complex and fine-grained understanding of museum exhibits. In particular, we show that some questions whose answers can often be derived directly from visual features are well answered by both types of models. On the other hand, questions that require the grounding of the visual features in repositories of human knowledge are better answered by the large vision-language models, thus demonstrating their superior capacity to perform the desired reasoning. Find our dataset, benchmarks, and source code at: https://github.com/insait-institute/Museum-65
- Published
- 2024
24. InTraGen: Trajectory-controlled Video Generation for Object Interactions
- Author
-
Liu, Zuhao, Yanev, Aleksandar, Mahmood, Ahmad, Nikolov, Ivan, Motamed, Saman, Zheng, Wei-Shi, Wang, Xi, Van Gool, Luc, and Paudel, Danda Pani
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Advances in video generation have significantly improved the realism and quality of created scenes. This has fueled interest in developing intuitive tools that let users leverage video generation as world simulators. Text-to-video (T2V) generation is one such approach, enabling video creation from text descriptions only. Yet, due to the inherent ambiguity in texts and the limited temporal information offered by text prompts, researchers have explored additional control signals like trajectory-guided systems, for more accurate T2V generation. Nonetheless, methods to evaluate whether T2V models can generate realistic interactions between multiple objects are lacking. We introduce InTraGen, a pipeline for improved trajectory-based generation of object interaction scenarios. We propose 4 new datasets and a novel trajectory quality metric to evaluate the performance of the proposed InTraGen. To achieve object interaction, we introduce a multi-modal interaction encoding pipeline with an object ID injection mechanism that enriches object-environment interactions. Our results demonstrate improvements in both visual fidelity and quantitative performance. Code and datasets are available at https://github.com/insait-institute/InTraGen
- Published
- 2024
25. UVLLM: An Automated Universal RTL Verification Framework using LLMs
- Author
-
Hu, Yuchen, Ye, Junhao, Xu, Ke, Sun, Jialin, Zhang, Shiyue, Jiao, Xinyao, Pan, Dingrong, Zhou, Jie, Wang, Ning, Shan, Weiwei, Fang, Xinwei, Wang, Xi, Guan, Nan, and Jiang, Zhe
- Subjects
Computer Science - Hardware Architecture - Abstract
Verifying hardware designs in embedded systems is crucial but often labor-intensive and time-consuming. While existing solutions have improved automation, they frequently rely on unrealistic assumptions. To address these challenges, we introduce a novel framework, UVLLM, which combines Large Language Models (LLMs) with the Universal Verification Methodology (UVM) to relax these assumptions. UVLLM significantly enhances the automation of testing and repairing error-prone Register Transfer Level (RTL) code, a critical aspect of verification development. Unlike existing methods, UVLLM ensures that all errors are triggered during verification, achieving a syntax error fix rate of 86.99% and a functional error fix rate of 71.92% on our proposed benchmark. These results demonstrate a substantial improvement in verification efficiency. Additionally, our study highlights the current limitations of LLM applications, particularly their reliance on extensive training data. We emphasize the transformative potential of LLMs in hardware design verification and suggest promising directions for future research in AI-driven hardware design methodologies. The dataset and code repository is available at https://anonymous.4open.science/r/UVLLM/.
- Published
- 2024
26. Look a Group at Once: Multi-Slide Modeling for Survival Prediction
- Author
-
Li, Xinyang, Zhang, Yi, Xie, Yi, Yang, Jianfei, Wang, Xi, Chen, Hao, and Zhang, Haixian
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Survival prediction is a critical task in pathology. In clinical practice, pathologists often examine multiple cases, leveraging a broader spectrum of cancer phenotypes to enhance pathological assessment. Despite significant advancements in deep learning, current solutions typically model each slide as a sample, struggling to effectively capture comparable and slide-agnostic pathological features. In this paper, we introduce GroupMIL, a novel framework inspired by the clinical practice of collective analysis, which models multiple slides as a single sample and organizes groups of patches and slides sequentially to capture cross-slide prognostic features. We also present GPAMamba, a model designed to facilitate intra- and inter-slide feature interactions, effectively capturing local micro-environmental characteristics within slide-level graphs while uncovering essential prognostic patterns across an extended patch sequence within the group framework. Furthermore, we develop a dual-head predictor that delivers comprehensive survival risk and probability assessments for each patient. Extensive empirical evaluations demonstrate that our model significantly outperforms state-of-the-art approaches across five datasets from The Cancer Genome Atlas.
- Published
- 2024
27. Responsible AI in Construction Safety: Systematic Evaluation of Large Language Models and Prompt Engineering
- Author
-
Sammour, Farouq, Xu, Jia, Wang, Xi, Hu, Mo, and Zhang, Zhenyu
- Subjects
Computer Science - Artificial Intelligence - Abstract
Construction remains one of the most hazardous sectors. Recent advancements in AI, particularly Large Language Models (LLMs), offer promising opportunities for enhancing workplace safety. However, responsible integration of LLMs requires systematic evaluation, as deploying them without understanding their capabilities and limitations risks generating inaccurate information, fostering misplaced confidence, and compromising worker safety. This study evaluates the performance of two widely used LLMs, GPT-3.5 and GPT-4o, across three standardized exams administered by the Board of Certified Safety Professionals (BCSP). Using 385 questions spanning seven safety knowledge areas, the study analyzes the models' accuracy, consistency, and reliability. Results show that both models consistently exceed the BCSP benchmark, with GPT-4o achieving an accuracy rate of 84.6% and GPT-3.5 reaching 73.8%. Both models demonstrate strengths in safety management systems and hazard identification and control, but exhibit weaknesses in science, mathematics, emergency response, and fire prevention. An error analysis identifies four primary limitations affecting LLM performance: lack of knowledge, reasoning flaws, memory issues, and calculation errors. Our study also highlights the impact of prompt engineering strategies, with variations in accuracy reaching 13.5% for GPT-3.5 and 7.9% for GPT-4o. However, no single prompt configuration proves universally effective. This research advances knowledge in three ways: by identifying areas where LLMs can support safety practices and where human oversight remains essential, by offering practical insights into improving LLM implementation through prompt engineering, and by providing evidence-based direction for future research and development. These contributions support the responsible integration of AI in construction safety management toward achieving zero injuries., Comment: 29 pages, 5 figures
- Published
- 2024
28. Lambda-pure global dimension of Grothendieck categories and some applications
- Author
-
Wang, Xi, Yao, Hailou, and Shen, Lei
- Subjects
Mathematics - Category Theory - Abstract
We study the $\lambda$-pure global dimension of a Grothendieck category $\cal A$, and provide two different applications of this dimension. We obtain that if the $\lambda$-pure global dimension $\plgldA<\infty$, then (1) the ordinary bounded derived category (where $\cal A$ has enough projective objects) and the bounded $\lambda$-pure one differ only by a homotopy category; (2) the $\lambda$-pure singularity category $\DlsgA =0$. Finally, we explore the reason why the general construction of the classical Buchweitz-Happel Theorem is not feasible in the $\lambda$-pure setting.
- Published
- 2024
29. Augmenting the Veracity and Explanations of Complex Fact Checking via Iterative Self-Revision with LLMs
- Author
-
Zhang, Xiaocheng, Wang, Xi, Lu, Yifei, Ye, Zhuangzhuang, Wang, Jianing, Bao, Mengjiao, Yan, Peng, and Su, Xiaohong
- Subjects
Computer Science - Computation and Language - Abstract
Explanation generation plays a more pivotal role than fact verification in producing interpretable results and facilitating comprehensive fact-checking, which has recently garnered considerable attention. However, previous studies on explanation generation have shown several limitations, such as being confined to English scenarios, involving overly complex inference processes, and not fully unleashing the potential of the mutual feedback between veracity labels and explanation texts. To address these issues, we construct two complex fact-checking datasets in Chinese scenarios: CHEF-EG and TrendFact. These datasets involve complex facts in areas such as health, politics, and society, presenting significant challenges for fact verification methods. In response to these challenges, we propose a unified framework called FactISR (Augmenting Fact-Checking via Iterative Self-Revision) to perform mutual feedback between veracity and explanations by leveraging the capabilities of large language models (LLMs). FactISR uses a single model to address tasks such as fact verification and explanation generation. Its self-revision mechanism can further revise the consistency between veracity labels, explanation texts, and evidence, as well as eliminate irrelevant noise. We conducted extensive experiments with baselines and FactISR on the proposed datasets. The experimental results demonstrate the effectiveness of our method.
- Published
- 2024
30. LEAD: Latent Realignment for Human Motion Diffusion
- Author
-
Andreou, Nefeli, Wang, Xi, Abrevaya, Victoria Fernández, Cani, Marie-Paule, Chrysanthou, Yiorgos, and Kalogeiton, Vicky
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Artificial Intelligence ,Computer Science - Graphics - Abstract
Our goal is to generate realistic human motion from natural language. Modern methods often face a trade-off between model expressiveness and text-to-motion alignment. Some align text and motion latent spaces but sacrifice expressiveness; others rely on diffusion models producing impressive motions, but lacking semantic meaning in their latent space. This may compromise realism, diversity, and applicability. Here, we address this by combining latent diffusion with a realignment mechanism, producing a novel, semantically structured space that encodes the semantics of language. Leveraging this capability, we introduce the task of textual motion inversion to capture novel motion concepts from a few examples. For motion synthesis, we evaluate LEAD on HumanML3D and KIT-ML and show comparable performance to the state-of-the-art in terms of realism, diversity, and text-motion consistency. Our qualitative analysis and user study reveal that our synthesized motions are sharper, more human-like and comply better with the text compared to modern methods. For motion textual inversion, our method demonstrates improved capacity in capturing out-of-distribution characteristics in comparison to traditional VAEs.
- Published
- 2024
31. Self-Supervised Scene Flow Estimation with Point-Voxel Fusion and Surface Representation
- Author
-
Xiang, Xuezhi, Wang, Xi, Zhang, Lei, Ombati, Denis, Himu, Himaloy, and Zhen, Xiantong
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Scene flow estimation aims to generate the 3D motion field of points between two consecutive frames of point clouds, which has wide applications in various fields. Existing point-based methods ignore the irregularity of point clouds and have difficulty capturing long-range dependencies due to the inefficiency of point-level computation. Voxel-based methods suffer from the loss of detail information. In this paper, we propose a point-voxel fusion method, where we utilize a voxel branch based on sparse grid attention and the shifted window strategy to capture long-range dependencies and a point branch to capture fine-grained features to compensate for the information loss in the voxel branch. In addition, since xyz coordinates alone can hardly describe the geometric structure of complex 3D objects in the scene, we explicitly encode the local surface information of the point cloud through the umbrella surface feature extraction (USFE) module. We verify the effectiveness of our method by conducting experiments on the Flyingthings3D and KITTI datasets. Our method outperforms all other self-supervised methods and achieves highly competitive results compared to fully supervised methods. We achieve improvements in all metrics, especially EPE, which is reduced by 8.51% on the KITTIo dataset and by 10.52% on the KITTIs dataset.
- Published
- 2024
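For readers unfamiliar with the EPE metric cited in the abstract above: end-point error is conventionally the mean Euclidean distance between predicted and ground-truth 3D flow vectors. The following is a minimal sketch of that common definition, not code from the paper; the function name and the list-of-triples input layout are assumptions made for illustration.

```python
import math

def end_point_error(pred_flow, gt_flow):
    """Mean Euclidean distance between predicted and ground-truth flow vectors.

    Both arguments are sequences of (dx, dy, dz) triples, one per point.
    The name and input layout are illustrative assumptions.
    """
    if len(pred_flow) != len(gt_flow) or len(pred_flow) == 0:
        raise ValueError("flows must be non-empty and have the same length")
    # Per-point L2 error, averaged over all points.
    total = sum(math.dist(p, g) for p, g in zip(pred_flow, gt_flow))
    return total / len(pred_flow)
```

A lower value means the predicted motion field is closer to the ground truth, so a relative reduction in EPE is an improvement.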
32. KBLaM: Knowledge Base augmented Language Model
- Author
-
Wang, Xi, Isazawa, Taketomo, Mikaelyan, Liana, and Hensman, James
- Subjects
Computer Science - Artificial Intelligence ,Computer Science - Computation and Language - Abstract
In this paper, we propose Knowledge Base augmented Language Model (KBLaM), a new method for augmenting Large Language Models (LLMs) with external knowledge. KBLaM works with a knowledge base (KB) constructed from a corpus of documents, transforming each piece of knowledge in the KB into continuous key-value vector pairs via pre-trained sentence encoders with linear adapters and integrating them into pre-trained LLMs via a specialized rectangular attention mechanism. Unlike Retrieval-Augmented Generation, KBLaM eliminates external retrieval modules, and unlike in-context learning, its computational overhead scales linearly with KB size rather than quadratically. Our approach enables integrating a large KB of more than 10K triples into an 8B pre-trained LLM of only 8K context window on one single A100 80GB GPU and allows for dynamic updates without model fine-tuning or retraining. Experiments demonstrate KBLaM's effectiveness in various tasks, including question-answering and open-ended reasoning, while providing interpretable insights into its use of the augmented knowledge. Code and datasets are available at https://github.com/microsoft/KBLaM/
- Published
- 2024
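The scaling claim in the abstract above (linear rather than quadratic in KB size) can be illustrated by counting attention-score entries. This is a rough sketch under stated assumptions, not the authors' implementation: it assumes each knowledge triple becomes exactly one key-value pair, that in rectangular attention only prompt tokens issue queries, and that in-context learning runs dense self-attention over all serialized triples plus the prompt. The function names and the `tokens_per_triple` parameter are hypothetical.

```python
def in_context_scores(num_triples, prompt_tokens, tokens_per_triple=1):
    """Score entries for dense self-attention over KB text plus prompt.

    Assumes every triple is serialized into `tokens_per_triple` tokens and
    all tokens attend to all tokens (causal masking is ignored).
    """
    n = num_triples * tokens_per_triple + prompt_tokens
    return n * n

def rectangular_scores(num_triples, prompt_tokens):
    """Score entries when only prompt tokens issue queries.

    Each prompt token attends to every knowledge key plus every prompt
    token, so the score matrix is prompt_tokens x (num_triples + prompt_tokens).
    """
    return prompt_tokens * (num_triples + prompt_tokens)
```

Under these assumptions, adding a fixed number of triples always adds the same number of entries to the rectangular score matrix, while the dense count grows with the square of the total sequence length.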
33. Formation of Anisotropic Polarons in Antimony Selenide
- Author
-
Shi, Yijie, Wang, Xi, Wang, Zhong, Zhang, Zheng, Hua, Fuyong, Chen, Chao, Hu, Chunlong, Tang, Jiang, and Liang, Wenxi
- Subjects
Condensed Matter - Materials Science ,Physics - Optics - Abstract
Antimony Selenide (Sb$_2$Se$_3$) is an attractive candidate for photovoltaics, although its efficiency is not yet satisfactory. Besides defects, polaron formation originating from lattice distortion was proposed to account for the trapping of free carriers and the subsequent photoexcitation dynamics and optoelectronic properties, but such a mechanism still lacks structural observations. Here we directly track the pathways of carrier and lattice evolutions after photoexcitation through optical and electron diffraction pump-probe methods, revealing the temporal correlations between the dynamics of both degrees of freedom. The observed opposite separation changes of Se2-Sb2 and Sb2-Sb1 atom pairs in a few picoseconds, and the intermediate state induced by local structural distortions lasting several tens of picoseconds, coinciding with the optical phonon population and coupling, and the trapping process of carriers, respectively, together with the analyses of modulation on diffuse scattering by the atomic displacement fields of the polaron model, indicate the formation of anisotropic polarons with large size. Our findings provide carrier and structural information that helps elucidate the polaron scenario in Sb$_2$Se$_3$, and probably in materials with anisotropic structure and soft lattice, which are popular in developing novel optoelectronics.
- Published
- 2024
34. Active nonreciprocal cloaking for pseudo-Hermitian magnons
- Author
-
Schulz, Dominik, Berakdar, Jamal, and Wang, Xi-guang
- Subjects
Condensed Matter - Mesoscale and Nanoscale Physics - Abstract
Cloaking has important applications but entails sophisticated control of signal propagation and scattering characteristics. Here, we show that invisibility for magnon signals is achievable in a non-reciprocal and electrically controlled way by engineering the magnonic channels such that they exhibit PT-symmetry. This is accomplished by attaching current-carrying heavy metal contacts to the magnon waveguides and exerting fields from an attached bias layer. Tuning the current density in the metal layer, the magnons in this setup experience electrically controlled, compensated gain and loss due to spin-orbit torque, which renders the setup PT-symmetric. The magnon dynamics is then shown to be pseudo-Hermitian with exceptional points (EPs) determined actively by an external electric field. We analyze the magnon scattering from single and periodic PT-symmetric regions and identify the conditions necessary for the formation of unidirectional invisibility, which can be steered by specific combinations of bias layers and current amplitudes in the heavy metal so as to reach the EP. The unidirectional invisibility at the EP is found to be extended for a periodic PT-symmetric region. The effect of intrinsic damping on PT-symmetric unidirectional invisibility is shown to be marginal, confirming the experimental feasibility. It is shown how the unidirectional magnons can be utilized to amplify and generate magnonic orbital angular momentum states in coupled magnetic rings, demonstrating a new path for manipulating magnon propagation and processing., Comment: 25 pages, 6 figures
- Published
- 2024
35. Design, manufacturing, and inverse dynamic modeling of soft parallel robots actuated by dielectric elastomer actuators
- Author
-
Chang, Jung-Che, Wang, Xi, Axinte, Dragos, and Dong, Xin
- Subjects
Computer Science - Robotics ,Electrical Engineering and Systems Science - Systems and Control - Abstract
Soft parallel robots, with their manipulation safety and low commercial cost, show a promising future for delicate operations and safe human-robot interactions. However, promoting the use of electroactive polymers (EAPs) is still challenging due to the still-improving quality of the product and the difficulty of dynamically modelling the collaboration between multiple actuators. This article presents the design, fabrication, modelling and control of a parallel kinematics Delta robot actuated by dielectric elastomer actuators (DEAs). The trade-off between the actuation force and stroke is addressed by an angular stroke amplification mechanism, and the weight of the robot frame is reduced by utilizing 3D puzzling strip structures. A generic way of constructing a high-stability conductive paint on a silicon-based film has been achieved by laser scanning the DE-film and then sandwiching a conductive particle-based electrode with a paint which is a mixture of the particles and photosensitive resin. Compared to the widely used carbon grease, the fabricated electrode shows a higher consistency in its dynamic behaviour before and after the on-stand test. Finally, to predict the output force and inverse motion of the robot end effector, we constructed the inverse dynamic model by introducing an expanded Bergstrom-Boyce model to the constitutive behavior of the dielectric film. The experimental results show a prediction of robot output force with an RMSE of 12.4% when the end effector remains stationary, and a well-followed trajectory with an RMSE of less than 2.5%., Comment: 17 pages, 12 figures
- Published
- 2024
36. Bi-stable thin soft robot for in-plane locomotion in narrow space
- Author
-
Wang, Xi, Chang, Jung-che, Wang, Feiran, Axinte, Dragos, and Dong, Xin
- Subjects
Computer Science - Robotics ,Physics - Classical Physics - Abstract
Dielectric elastomer actuators (DEAs), also known as artificial muscles, have been widely developed for soft locomotion robots. With a compliant skeleton and miniaturized dimensions, they are well suited for narrow-space inspection. In this work, we propose a novel low-profile (1.1mm) and lightweight (1.8g) bi-stable in-plane DEA (Bi-DEA) constructed by supporting a dielectric elastomer onto a flat bi-stable mechanism. It has an amplified displacement and output force compared with the in-plane DEA (I-DEA) without the bi-stable mechanism. Then, the Bi-DEA is applied to a thin soft robot, using three electrostatic adhesive pads (EA-Pads) as anchoring elements. This robot is capable of crawling and climbing to access millimetre-scale narrow gaps. Theoretical models of the bi-stable mechanism and the DEA are presented. The enhanced performance of the Bi-DEA induced by the mechanism is experimentally validated. The EA-Pads provide the adhesion between the actuator and the locomotion substrate, allowing crawling and climbing on various surfaces, such as paper and acrylic. The thin soft robot has been demonstrated to be capable of crawling through a 4mm narrow gap with a speed up to 3.3mm/s (0.07 body length per second and 2.78 body thickness per second)., Comment: 8 pages, 12 figures
- Published
- 2024
37. Distinguishing black holes with and without spontaneous scalarization in Einstein-scalar-Gauss-Bonnet theories via optical features
- Author
-
Wang, Xi-Jing, Meng, Yuan, Kuang, Xiao-Mei, and Liao, Kai
- Subjects
General Relativity and Quantum Cosmology - Abstract
Spontaneous scalarization in Einstein-scalar-Gauss-Bonnet theory admits both vacuum general relativity (GR) and scalarized hairy black holes as valid solutions, which provides a distinctive signature of new physics in the strong gravity regime. In this paper, we examine the optical features of Gauss-Bonnet black holes with spontaneous scalarization, which is governed by the coupling parameter $\lambda$. We find that the photon sphere, critical impact parameter and innermost stable circular orbit all decrease as $\lambda$ increases. Using observational data from the Event Horizon Telescope, we establish the upper limit for $\lambda$. Then we construct the optical appearances of the scalarized black holes illuminated by various thin accretions. Our findings reveal that the scalarized black holes consistently exhibit smaller shadow sizes and reduced brightness compared to Schwarzschild black holes. Notably, in the case of thin spherical accretion, the shadow of the scalarized black hole is smaller, but the surrounding bright ring is more pronounced. Our results highlight the observable features of the scalarized black holes, providing a probe to distinguish them from their GR counterparts in the strong gravity regime., Comment: 18 pages, 10 figures, matching the version published in EPJC
- Published
- 2024
38. Report on the Workshop on Simulations for Information Access (Sim4IA 2024) at SIGIR 2024
- Author
-
Breuer, Timo, Kreutz, Christin Katharina, Fuhr, Norbert, Balog, Krisztian, Schaer, Philipp, Bernard, Nolwenn, Frommholz, Ingo, Gohsen, Marcel, Ji, Kaixin, Jones, Gareth J. F., Keller, Jüri, Liu, Jiqun, Mladenov, Martin, Pasi, Gabriella, Trippas, Johanne, Wang, Xi, Zerhoudi, Saber, and Zhai, ChengXiang
- Subjects
Computer Science - Information Retrieval - Abstract
This paper is a report on the Workshop on Simulations for Information Access (Sim4IA) at SIGIR 2024. The workshop had two keynotes, a panel discussion, nine lightning talks, and two breakout sessions. Key takeaways were user simulation's importance in academia and industry, the possible bridging of online and offline evaluation, and the issues of organizing a companion shared task around user simulations for information access. We report on how we organized the workshop, provide a brief overview of what happened at the workshop, and summarize the main topics and findings of the workshop and future work., Comment: Preprint of a SIGIR Forum submission for Vol. 58 No. 2 - December 2024
- Published
- 2024
39. Articulated Object Manipulation using Online Axis Estimation with SAM2-Based Tracking
- Author
-
Wang, Xi, Chen, Tianxing, Yu, Qiaojun, Xu, Tianling, Chen, Zanxin, Fu, Yiting, Lu, Cewu, Mu, Yao, and Luo, Ping
- Subjects
Computer Science - Robotics ,Computer Science - Artificial Intelligence ,Computer Science - Graphics ,Computer Science - Machine Learning - Abstract
Articulated object manipulation requires precise object interaction, where the object's axis must be carefully considered. Previous research employed interactive perception for manipulating articulated objects, but such open-loop approaches often overlook the interaction dynamics. To address this limitation, we present a closed-loop pipeline integrating interactive perception with online axis estimation from segmented 3D point clouds. Our method can build on any interactive perception technique, inducing slight object movement to generate point cloud frames of the evolving dynamic scene. These point clouds are then segmented using Segment Anything Model 2 (SAM2), after which the moving part of the object is masked for accurate online estimation of the motion axis, guiding subsequent robotic actions. Our approach significantly enhances the precision and efficiency of manipulation tasks involving articulated objects. Experiments in simulated environments demonstrate that our method outperforms baseline approaches, especially in tasks that demand precise axis-based control. Project Page: https://hytidel.github.io/video-tracking-for-axis-estimation/
- Published
- 2024
40. MADial-Bench: Towards Real-world Evaluation of Memory-Augmented Dialogue Generation
- Author
-
He, Junqing, Zhu, Liang, Wang, Rui, Wang, Xi, Haffari, Reza, and Zhang, Jiaxing
- Subjects
Computer Science - Computation and Language ,Computer Science - Artificial Intelligence - Abstract
Long-term memory is important for chatbots and dialogue systems (DS) to create consistent and human-like conversations, as evidenced by the numerous memory-augmented DS (MADS) that have been developed. To evaluate the effectiveness of such MADS, existing commonly used evaluation metrics, like retrieval accuracy and perplexity (PPL), mainly focus on query-oriented factualness and language quality assessment. However, these metrics often lack practical value. Moreover, the evaluation dimensions are insufficient for human-like assessment in DS. Regarding memory-recalling paradigms, current evaluation schemes only consider passive memory retrieval while ignoring diverse memory recall with rich triggering factors, e.g., emotions and surroundings, which can be essential in emotional support scenarios. To bridge the gap, we construct a novel Memory-Augmented Dialogue Benchmark (MADial-Bench) covering various memory-recalling paradigms based on cognitive science and psychology theories. The benchmark assesses two tasks separately: memory retrieval and memory recognition with the incorporation of both passive and proactive memory recall data. We introduce new scoring criteria to the evaluation, including memory injection, emotion support (ES) proficiency, and intimacy, to comprehensively assess generated responses. Results from cutting-edge embedding models and large language models on this benchmark indicate the potential for further advancement. Extensive testing further reveals correlations between memory injection, ES proficiency, and intimacy., Comment: Submitted to NAACL 2025
- Published
- 2024
41. Location is Key: Leveraging Large Language Model for Functional Bug Localization in Verilog
- Author
-
Yao, Bingkun, Wang, Ning, Zhou, Jie, Wang, Xi, Gao, Hong, Jiang, Zhe, and Guan, Nan
- Subjects
Computer Science - Hardware Architecture ,Computer Science - Artificial Intelligence - Abstract
Bug localization in Verilog code is a crucial and time-consuming task during the verification of hardware design. Since their introduction, Large Language Models (LLMs) have shown strong programming capabilities. However, no work has yet considered using LLMs for bug localization in Verilog code. This paper presents Location-is-Key (LiK), an open-source LLM solution to locate functional errors in Verilog snippets. LiK achieves high localization accuracy, with a pass@1 localization accuracy of 93.3% on our test dataset based on RTLLM, surpassing GPT-4's 77.9% and comparable to Claude-3.5's 90.8%. Additionally, the bug location obtained by LiK significantly improves GPT-3.5's bug repair efficiency (Functional pass@1 increased from 40.39% to 58.92%), highlighting the importance of bug localization in LLM-based Verilog debugging. Compared to existing methods, LiK only requires the design specification and the erroneous code snippet, without the need for testbenches, assertions, or any other EDA tools. This research demonstrates the feasibility of using LLMs for Verilog error localization, thus providing a new direction for automatic Verilog code debugging.
- Published
- 2024
42. Self-Updating Vehicle Monitoring Framework Employing Distributed Acoustic Sensing towards Real-World Settings
- Author
-
Wang, Xi, Liu, Xin, Zhu, Songming, Li, Zhanwen, and Gao, Lina
- Subjects
Physics - Geophysics ,Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Machine Learning ,Electrical Engineering and Systems Science - Signal Processing - Abstract
The recent emergence of Distributed Acoustic Sensing (DAS) technology has facilitated the effective capture of traffic-induced seismic data. The traffic-induced seismic wave is a prominent contributor to urban vibrations and contains crucial information to advance urban exploration and governance. However, identifying vehicular movements within massive noisy data poses a significant challenge. In this study, we introduce a real-time semi-supervised vehicle monitoring framework tailored to urban settings. It requires only a small fraction of manual labels for initial training and exploits unlabeled data for model improvement. Additionally, the framework can autonomously adapt to newly collected unlabeled data. Before DAS data undergo object detection as two-dimensional images to preserve spatial information, we apply comprehensive one-dimensional signal preprocessing to mitigate noise. Furthermore, we propose a novel prior loss that incorporates the shapes of vehicular traces to track a single vehicle with varying speeds. To evaluate our model, we conducted experiments with seismic data from the Stanford 2 DAS Array. The results showed that our model outperformed the baseline model Efficient Teacher and its supervised counterpart, YOLO (You Only Look Once), in both accuracy and robustness. With only 35 labeled images, our model surpassed YOLO's mAP 0.5:0.95 criterion by 18% and showed a 7% increase over Efficient Teacher. We conducted comparative experiments with multiple update strategies for self-updating and identified an optimal approach. This approach surpasses the performance of non-overfitting training conducted with all data in a single pass.
- Published
- 2024
43. Toward satisfactory public accessibility: A crowdsourcing approach through online reviews to inclusive urban design
- Author
-
Li, Lingyao, Hu, Songhua, Dai, Yinpei, Deng, Min, Momeni, Parisa, Laverghetta, Gabriel, Fan, Lizhou, Ma, Zihui, Wang, Xi, Ma, Siyuan, Ligatti, Jay, and Hemphill, Libby
- Subjects
Computer Science - Social and Information Networks - Abstract
As urban populations grow, the need for accessible urban design has become urgent. Traditional survey methods for assessing public perceptions of accessibility are often limited in scope. Crowdsourcing via online reviews offers a valuable alternative for understanding public perceptions, and advancements in large language models can facilitate their use. This study uses Google Maps reviews across the United States and fine-tunes the Llama 3 model with the Low-Rank Adaptation technique to analyze public sentiment on accessibility. At the POI level, most categories -- restaurants, retail, hotels, and healthcare -- show negative sentiments. Socio-spatial analysis reveals that areas with higher proportions of white residents and greater socioeconomic status report more positive sentiment, while areas with more elderly, highly-educated residents exhibit more negative sentiment. Interestingly, no clear link is found between the presence of disabilities and public sentiments. Overall, this study highlights the potential of crowdsourcing for identifying accessibility challenges and providing insights for urban planners.
- Published
- 2024
44. Learning to Discover Forgery Cues for Face Forgery Detection
- Author
-
Tian, Jiahe, Chen, Peng, Yu, Cai, Fu, Xiaomeng, Wang, Xi, Dai, Jiao, and Han, Jizhong
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Locating manipulation maps, i.e., pixel-level annotation of forgery cues, is crucial for providing interpretable detection results in face forgery detection. Related learning objectives have also been widely adopted as auxiliary tasks to improve the classification performance of detectors, but they require comparisons between paired real and forged faces to obtain manipulation maps as supervision. This requirement restricts their applicability to unpaired faces and contradicts real-world scenarios. Moreover, the comparison methods used annotate all changed pixels, including noise introduced by compression and upsampling. Using such maps as supervision hinders the learning of exploitable cues and makes models prone to overfitting. To address these issues, we introduce a weakly supervised model in this paper, named Forgery Cue Discovery (FoCus), to locate forgery cues in unpaired faces. Unlike some detectors that claim to locate forged regions in attention maps, FoCus is designed to sidestep their shortcomings of capturing partial and inaccurate forgery cues. Specifically, we propose a classification attentive regions proposal module to locate forgery cues during classification and a complementary learning module to facilitate the learning of richer cues. The produced manipulation maps can serve as better supervision to enhance face forgery detectors. Visualization of the manipulation maps of the proposed FoCus exhibits superior interpretability and robustness compared to existing methods. Experiments on five datasets and four multi-task models demonstrate the effectiveness of FoCus in both in-dataset and cross-dataset evaluations., Comment: TIFS 2024
- Published
- 2024
45. SynDL: A Large-Scale Synthetic Test Collection for Passage Retrieval
- Author
-
Rahmani, Hossein A., Wang, Xi, Yilmaz, Emine, Craswell, Nick, Mitra, Bhaskar, and Thomas, Paul
- Subjects
Computer Science - Information Retrieval - Abstract
Large-scale test collections play a crucial role in Information Retrieval (IR) research. However, according to the Cranfield paradigm and the research into publicly available datasets, the existing information retrieval research studies are commonly developed on small-scale datasets that rely on human assessors for relevance judgments - a time-intensive and expensive process. Recent studies have shown the strong capability of Large Language Models (LLMs) in producing reliable relevance judgments with human accuracy but at a greatly reduced cost. In this paper, to address the missing large-scale ad-hoc document retrieval dataset, we extend the TREC Deep Learning Track (DL) test collection via additional language model synthetic labels to enable researchers to test and evaluate their search systems at a large scale. Specifically, such a test collection includes more than 1,900 test queries from the previous years of tracks. We compare system evaluation with human labels from past years and find that our synthetically created large-scale test collection can lead to highly correlated system rankings., Comment: 9 pages, resource paper, WWW 2025
- Published
- 2024
46. Cloud-Based Federation Framework and Prototype for Open, Scalable, and Shared Access to NextG and IoT Testbeds
- Author
-
McManus, Maxwell, Rinchen, Tenzin, Dey, Annoy, Thota, Sumanth, Zhang, Zhaoxi, Hu, Jiangqi, Wang, Xi, Ji, Mingyue, Mastronarde, Nicholas, Bentley, Elizabeth Serena, Medley, Michael, and Guan, Zhangyu
- Subjects
Electrical Engineering and Systems Science - Signal Processing ,Computer Science - Networking and Internet Architecture - Abstract
In this work, we present a new federation framework for UnionLabs, an innovative cloud-based resource-sharing infrastructure designed for next-generation (NextG) and Internet of Things (IoT) over-the-air (OTA) experiments. The framework aims to reduce the federation complexity for testbed developers by automating tedious backend operations, thereby providing scalable federation and remote access to various wireless testbeds. We first describe the key components of the new federation framework, including the Systems Manager Integration Engine (SMIE), the Automated Script Generator (ASG), and the Database Context Manager (DCM). We then prototype and deploy the new Federation Plane on the Amazon Web Services (AWS) public cloud, demonstrating its effectiveness by federating two wireless testbeds: i) UB NeXT, a 5G-and-beyond (5G+) testbed at the University at Buffalo, and ii) UT IoT, an IoT testbed at the University of Utah. Through this work we aim to initiate a grassroots campaign to democratize access to wireless research testbeds with heterogeneous hardware resources and network environments, and accelerate the establishment of a mature, open experimental ecosystem for the wireless community. The API of the new Federation Plane will be released to the community after internal testing is completed.
- Published
- 2024
47. Decoupling Feature Representations of Ego and Other Modalities for Incomplete Multi-modal Brain Tumor Segmentation
- Author
-
Yang, Kaixiang, Shan, Wenqi, Li, Xudong, Wang, Xuan, Yang, Xikai, Wang, Xi, Heng, Pheng-Ann, Li, Qiang, and Wang, Zhiwei
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Multi-modal brain tumor segmentation typically involves four magnetic resonance imaging (MRI) modalities, while incomplete modalities significantly degrade performance. Existing solutions employ explicit or implicit modality adaptation, aligning features across modalities or learning a fused feature robust to modality incompleteness. They share a common goal of encouraging each modality to express both itself and the others. However, the two expression abilities are entangled as a whole in a seamless feature space, resulting in prohibitive learning burdens. In this paper, we propose DeMoSeg to enhance the modality adaptation by Decoupling the task of representing the ego and other Modalities for robust incomplete multi-modal Segmentation. The decoupling is lightweight, simply using two convolutions to map each modality onto four feature sub-spaces. The first sub-space expresses itself (Self-feature), while the remaining sub-spaces substitute for other modalities (Mutual-features). The Self- and Mutual-features interactively guide each other through a carefully designed Channel-wised Sparse Self-Attention (CSSA). After that, a Radiologist-mimic Cross-modality expression Relationships (RCR) is introduced to have available modalities provide Self-feature and also 'lend' their Mutual-features to compensate for the absent ones by exploiting the clinical prior knowledge. The benchmark results on BraTS2020, BraTS2018 and BraTS2015 verify DeMoSeg's superiority thanks to the alleviated modality adaptation difficulty. Concretely, for BraTS2020, DeMoSeg increases Dice by at least 0.92%, 2.95% and 4.95% on whole tumor, tumor core and enhanced tumor regions, respectively, compared to other state-of-the-art methods. Code is available at https://github.com/kk42yy/DeMoSeg, Comment: 8 pages, 4 figures
- Published
- 2024
48. Spatio-Temporal Communication Compression for Distributed Prime-Dual Optimization
- Author
-
Ren, Zihao, Wang, Lei, Yi, Xinlei, Wang, Xi, Yuan, Deming, Yang, Tao, Wu, Zhengguang, and Shi, Guodong
- Subjects
Electrical Engineering and Systems Science - Systems and Control - Abstract
Several data compressors have been proposed in distributed optimization frameworks of network systems to reduce communication overhead in large-scale applications. In this paper, we demonstrate that effective information compression may occur over time or space during sequences of node communications in distributed algorithms, leading to the concept of spatio-temporal compressors. This abstraction classifies existing compressors as spatio-temporal compressors, with their effectiveness described by constructive stability criteria from nonlinear system theory. Subsequently, we apply these spatio-temporal compressors to standard continuous-time consensus flows and distributed primal-dual flows, establishing conditions ensuring convergence. Additionally, we introduce a novel observer-based distributed primal-dual continuous flow integrated with spatio-temporal compressors, which provides broader convergence conditions. These continuous flows achieve exponential convergence to the global optimum when the objective function is strongly convex and can be discretized using Euler approximations. Finally, numerical simulations illustrate the versatility of the proposed spatio-temporal compressors and verify the convergence of algorithms., Comment: arXiv admin note: text overlap with arXiv:2408.02332
- Published
- 2024
49. Context-Driven Index Trimming: A Data Quality Perspective to Enhancing Precision of RALMs
- Author
-
Ma, Kexin, Jin, Ruochun, Wang, Xi, Chen, Huan, Ren, Jing, and Tang, Yuhua
- Subjects
Computer Science - Computation and Language ,Computer Science - Databases - Abstract
Retrieval-Augmented Large Language Models (RALMs) have made significant strides in enhancing the accuracy of generated responses. However, existing research often overlooks the data quality issues within retrieval results, often caused by inaccurate existing vector-distance-based retrieval methods. We propose to boost the precision of RALMs' answers from a data quality perspective through the Context-Driven Index Trimming (CDIT) framework, where Context Matching Dependencies (CMDs) are employed as logical data quality rules to capture and regulate the consistency between retrieved contexts. Based on the semantic comprehension capabilities of Large Language Models (LLMs), CDIT can effectively identify and discard retrieval results that are inconsistent with the query context and further modify indexes in the database, thereby improving answer quality. Experiments demonstrate on challenging question-answering tasks. Also, the flexibility of CDIT is verified through its compatibility with various language models and indexing methods, which offers a promising approach to bolster RALMs' data quality and retrieval precision jointly.
- Published
- 2024
50. Source-Free Domain-Invariant Performance Prediction
- Author
-
Khramtsova, Ekaterina, Baktashmotlagh, Mahsa, Zuccon, Guido, Wang, Xi, and Salzmann, Mathieu
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Accurately estimating model performance poses a significant challenge, particularly in scenarios where the source and target domains follow different data distributions. Most existing performance prediction methods heavily rely on the source data in their estimation process, limiting their applicability in a more realistic setting where only the trained model is accessible. The few methods that do not require source data exhibit considerably inferior performance. In this work, we propose a source-free approach centred on uncertainty-based estimation, using a generative model for calibration in the absence of source data. We establish connections between our approach for unsupervised calibration and temperature scaling. We then employ a gradient-based strategy to evaluate the correctness of the calibrated predictions. Our experiments on benchmark object recognition datasets reveal that existing source-based methods fall short with limited source sample availability. Furthermore, our approach significantly outperforms the current state-of-the-art source-free and source-based methods, affirming its effectiveness in domain-invariant performance estimation., Comment: Accepted in ECCV 2024
- Published
- 2024