136,337 results for "wang, Yu"
Search Results
2. Listening to the Enemy: Radio Consumption and Technological Culture in Maoist China, 1949–1965
- Author
- Wang, Yu
- Published
- 2022
3. On the Hochschild homology of singularity categories
- Author
- Wang, Yu, Arunachalam, Umamaheswaran, and Keller, Bernhard
- Subjects
- Mathematics, QA1-939
- Abstract
Let $k$ be an algebraically closed field and $A$ a finite-dimensional $k$-algebra. In this note, we determine complexes which compute the Hochschild homology of the canonical dg enhancement of the bounded derived category of $A$ and of the canonical dg enhancement of the singularity category of $A$. As an application, we obtain a new approach to the computation of Hochschild homology of Leavitt path algebras.
- Published
- 2022
4. Effectiveness Evaluation of Recombinant Antigen rCPI For iELISA Detection of Camel Parabronemiasis
- Author
- Wang, Yu, Feng, Chenchen, Liu, Chunxia, Li, Jianyun, and Wang, Wenlong
- Published
- 2021
5. Multi-UAV Behavior-based Formation with Static and Dynamic Obstacles Avoidance via Reinforcement Learning
- Author
- Xie, Yuqing, Yu, Chao, Zang, Hongzhi, Gao, Feng, Tang, Wenhao, Huang, Jingyi, Chen, Jiayu, Xu, Botian, Wu, Yi, and Wang, Yu
- Subjects
- Computer Science - Robotics
- Abstract
Formation control of multiple Unmanned Aerial Vehicles (UAVs) is vital for practical applications. This paper tackles the task of behavior-based UAV formation while avoiding static and dynamic obstacles during directed flight. We present a two-stage reinforcement learning (RL) training pipeline to tackle the challenges of multi-objective optimization, large exploration spaces, and the sim-to-real gap. The first stage searches in a simplified scenario for a linear utility function that balances all task objectives simultaneously, whereas the second stage applies the utility function in complex scenarios, utilizing curriculum learning to navigate large exploration spaces. Additionally, we apply an attention-based observation encoder to enhance formation maintenance and handle a varying number of obstacles. Experiments in simulation and in the real world demonstrate that our method outperforms planning-based and RL-based baselines in collision-free rate and formation maintenance in scenarios with static, dynamic, and mixed obstacles.
- Published
- 2024
6. A Comprehensive Analysis of Social Tie Strength: Definitions, Prediction Methods, and Future Directions
- Author
- Cheng, Xueqi, Yang, Catherine, Zhao, Yuying, Wang, Yu, Karimi, Hamid, and Derr, Tyler
- Subjects
- Computer Science - Social and Information Networks
- Abstract
The rapid growth of online social networks has underscored the importance of understanding the intensity of user relationships, referred to as "tie strength." Over the past few decades, extensive efforts have been made to assess tie strength in networks. However, the lack of ground-truth tie strength labels and the differing perspectives on tie strength among researchers have complicated the development of effective prediction methods for real-world applications. In our study, we first categorize mainstream understandings of tie strength into seven standardized definitions and verify their effectiveness by investigating the class distributions and correlations across these definitions. We also draw key insights into tie resilience from the perspective of tie dissolution that (1) stronger ties are more resilient than weaker ones, and (2) this tie resiliency ratio increases as the network evolves. We then conduct extensive experiments to evaluate existing tie strength prediction methods under these definitions, revealing that (1) neural network methods capable of learning from semantic features hold great potential for high performance, (2) models struggle under definitions that offer limited understandings of tie strength in the network, (3) existing models face imbalance issues that cannot be addressed by traditional quantity imbalance techniques, and (4) different definitions of tie strength allow for the inference of not only the current state but also the future state of a tie. Building on these findings, we propose strategies to improve existing methods and suggest several promising directions for future research.
- Published
- 2024
7. Guide for Defense (G4D): Dynamic Guidance for Robust and Balanced Defense in Large Language Models
- Author
- Cao, He, Luo, Weidi, Wang, Yu, Liu, Zijing, Feng, Bing, Yao, Yuan, and Li, Yu
- Subjects
- Computer Science - Artificial Intelligence
- Abstract
With the extensive deployment of Large Language Models (LLMs), ensuring their safety has become increasingly critical. However, existing defense methods often struggle with two key issues: (i) inadequate defense capabilities, particularly in domain-specific scenarios like chemistry, where a lack of specialized knowledge can lead to the generation of harmful responses to malicious queries; and (ii) over-defensiveness, which compromises the general utility and responsiveness of LLMs. To mitigate these issues, we introduce a multi-agent-based defense framework, Guide for Defense (G4D), which leverages accurate external information to provide an unbiased summary of user intentions and analytically grounded safety response guidance. Extensive experiments on popular jailbreak attacks and benign datasets show that our G4D can enhance LLMs' robustness against jailbreak attacks in general and domain-specific scenarios without compromising the model's general functionality.
- Published
- 2024
8. ReflecTool: Towards Reflection-Aware Tool-Augmented Clinical Agents
- Author
- Liao, Yusheng, Jiang, Shuyang, Wang, Yanfeng, and Wang, Yu
- Subjects
- Computer Science - Computation and Language
- Abstract
Large Language Models (LLMs) have shown promising potential in the medical domain, assisting with tasks like clinical note generation and patient communication. However, current LLMs are limited to text-based communication, hindering their ability to interact with diverse forms of information in clinical environments. Despite clinical agents succeeding in diverse signal interaction, they are oriented to a single clinical scenario and hence fail for broader applications. To evaluate clinical agents holistically, we propose ClinicalAgent Bench (CAB), a comprehensive medical agent benchmark consisting of 18 tasks across five key realistic clinical dimensions. Building on this, we introduce ReflecTool, a novel framework that excels at utilizing domain-specific tools within two stages. The first optimization stage progressively enlarges a long-term memory by saving successful solving processes and tool-wise experience of agents in a tiny pre-defined training set. In the following inference stage, ReflecTool can search for supportive successful demonstrations from already built long-term memory to guide the tool selection strategy, and a verifier improves the tool usage according to the tool-wise experience with two verification methods: iterative refinement and candidate selection. Extensive experiments on ClinicalAgent Benchmark demonstrate that ReflecTool surpasses the pure LLMs by more than 10 points and the well-established agent-based methods by 3 points, highlighting its adaptability and effectiveness in solving complex clinical tasks. Comment: 20 pages
- Published
- 2024
9. PLGS: Robust Panoptic Lifting with 3D Gaussian Splatting
- Author
- Wang, Yu, Wei, Xiaobao, Lu, Ming, and Kang, Guoliang
- Subjects
- Computer Science - Computer Vision and Pattern Recognition
- Abstract
Previous methods utilize the Neural Radiance Field (NeRF) for panoptic lifting, but their training and rendering speeds are unsatisfactory. In contrast, 3D Gaussian Splatting (3DGS) has emerged as a prominent technique due to its rapid training and rendering speed. However, unlike NeRF, the conventional 3DGS may not satisfy the basic smoothness assumption as it does not rely on any parameterized structures to render (e.g., MLPs). Consequently, the conventional 3DGS is, in nature, more susceptible to noisy 2D mask supervision. In this paper, we propose a new method called PLGS that enables 3DGS to generate consistent panoptic segmentation masks from noisy 2D segmentation masks while maintaining superior efficiency compared to NeRF-based methods. Specifically, we build a panoptic-aware structured 3D Gaussian model to introduce smoothness and design effective noise reduction strategies. For the semantic field, instead of initialization with structure from motion, we construct reliable semantic anchor points to initialize the 3D Gaussians. We then use these anchor points as smooth regularization during training. Additionally, we present a self-training approach using pseudo labels generated by merging the rendered masks with the noisy masks to enhance the robustness of PLGS. For the instance field, we project the 2D instance masks into 3D space and match them with oriented bounding boxes to generate cross-view consistent instance masks for supervision. Experiments on various benchmarks demonstrate that our method outperforms previous state-of-the-art methods in terms of both segmentation quality and speed.
- Published
- 2024
10. Error estimates between SGD with momentum and underdamped Langevin diffusion
- Author
- Guillin, Arnaud, Wang, Yu, Xu, Lihu, and Yang, Haoran
- Subjects
- Statistics - Machine Learning, Computer Science - Machine Learning, Mathematics - Probability
- Abstract
Stochastic gradient descent with momentum is a popular variant of stochastic gradient descent, which has recently been reported to have a close relationship with the underdamped Langevin diffusion. In this paper, we establish a quantitative error estimate between them in the 1-Wasserstein and total variation distances.
- Published
- 2024
11. Few-shot In-Context Preference Learning Using Large Language Models
- Author
- Yu, Chao, Lu, Hong, Gao, Jiaxuan, Tan, Qixin, Yang, Xinting, Wang, Yu, Wu, Yi, and Vinitsky, Eugene
- Subjects
- Computer Science - Artificial Intelligence, Computer Science - Machine Learning
- Abstract
Designing reward functions is a core component of reinforcement learning but can be challenging for truly complex behavior. Reinforcement Learning from Human Feedback (RLHF) has been used to alleviate this challenge by replacing a hand-coded reward function with a reward function learned from preferences. However, it can be exceedingly inefficient to learn these rewards as they are often learned tabula rasa. We investigate whether Large Language Models (LLMs) can reduce this query inefficiency by converting an iterative series of human preferences into code representing the rewards. We propose In-Context Preference Learning (ICPL), a method that uses the grounding of an LLM to accelerate learning reward functions from preferences. ICPL takes the environment context and task description, synthesizes a set of reward functions, and then repeatedly updates the reward functions using human rankings of videos of the resultant policies. Using synthetic preferences, we demonstrate that ICPL is orders of magnitude more efficient than RLHF and is even competitive with methods that use ground-truth reward functions instead of preferences. Finally, we perform a series of human preference-learning trials and observe that ICPL extends beyond synthetic settings and can work effectively with humans-in-the-loop. Additional information and videos are provided at https://sites.google.com/view/few-shot-icpl/home.
- Published
- 2024
12. DiP-GO: A Diffusion Pruner via Few-step Gradient Optimization
- Author
- Zhu, Haowei, Tang, Dehua, Liu, Ji, Lu, Mingjie, Zheng, Jintu, Peng, Jinzhang, Li, Dong, Wang, Yu, Jiang, Fan, Tian, Lu, Tiwari, Spandan, Sirasao, Ashish, Yong, Jun-Hai, Wang, Bin, and Barsoum, Emad
- Subjects
- Computer Science - Computer Vision and Pattern Recognition
- Abstract
Diffusion models have achieved remarkable progress in the field of image generation due to their outstanding capabilities. However, these models require substantial computing resources because of the multi-step denoising process during inference. While traditional pruning methods have been employed to optimize these models, the retraining process necessitates large-scale training datasets and extensive computational costs to maintain generalization ability, making it neither convenient nor efficient. Recent studies attempt to utilize the similarity of features across adjacent denoising stages to reduce computational costs through simple and static strategies. However, these strategies cannot fully harness the potential of the similar feature patterns across adjacent timesteps. In this work, we propose a novel pruning method that derives an efficient diffusion model via a more intelligent and differentiable pruner. At the core of our approach is casting the model pruning process into a SubNet search process. Specifically, we first introduce a SuperNet based on standard diffusion via adding some backup connections built upon the similar features. We then construct a plugin pruner network and design optimization losses to identify redundant computation. Finally, our method can identify an optimal SubNet through few-step gradient optimization and a simple post-processing procedure. We conduct extensive experiments on various diffusion models including Stable Diffusion series and DiTs. Our DiP-GO approach achieves 4.4 x speedup for SD-1.5 without any loss of accuracy, significantly outperforming the previous state-of-the-art methods.
- Published
- 2024
13. Large Language Model-based Augmentation for Imbalanced Node Classification on Text-Attributed Graphs
- Author
- Wang, Leyao, Wang, Yu, Ni, Bo, Zhao, Yuying, and Derr, Tyler
- Subjects
- Computer Science - Artificial Intelligence, Computer Science - Machine Learning, Computer Science - Social and Information Networks
- Abstract
Node classification on graphs frequently encounters the challenge of class imbalance, leading to biased performance and posing significant risks in real-world applications. Although several data-centric solutions have been proposed, none of them focuses on Text-Attributed Graphs (TAGs), and they therefore overlook the potential of leveraging the rich semantics encoded in textual features for boosting the classification of minority nodes. Given this crucial gap, we investigate the possibility of augmenting graph data in the text space, leveraging the textual generation power of Large Language Models (LLMs) to handle imbalanced node classification on TAGs. Specifically, we propose a novel approach called LA-TAG (LLM-based Augmentation on Text-Attributed Graphs), which prompts LLMs to generate synthetic texts based on existing node texts in the graph. Furthermore, to integrate these synthetic text-attributed nodes into the graph, we introduce a text-based link predictor to connect the synthesized nodes with the existing nodes. Our experiments across multiple datasets and evaluation metrics show that our framework significantly outperforms traditional non-textual-based data augmentation strategies and specific node imbalance solutions. This highlights the promise of using LLMs to resolve imbalance issues on TAGs. Comment: 11 pages, 4 figures
- Published
- 2024
14. Swiss Army Knife: Synergizing Biases in Knowledge from Vision Foundation Models for Multi-Task Learning
- Author
- Lu, Yuxiang, Cao, Shengcao, and Wang, Yu-Xiong
- Subjects
- Computer Science - Computer Vision and Pattern Recognition
- Abstract
Vision Foundation Models (VFMs) have demonstrated outstanding performance on numerous downstream tasks. However, due to their inherent representation biases originating from different training paradigms, VFMs exhibit advantages and disadvantages across distinct vision tasks. Although amalgamating the strengths of multiple VFMs for downstream tasks is an intuitive strategy, effectively exploiting these biases remains a significant challenge. In this paper, we propose a novel and versatile "Swiss Army Knife" (SAK) solution, which adaptively distills knowledge from a committee of VFMs to enhance multi-task learning. Unlike existing methods that use a single backbone for knowledge transfer, our approach preserves the unique representation bias of each teacher by combining lightweight Teacher-Specific Adapter Path modules with the Teacher-Agnostic Stem. Through dynamic selection and combination of representations with Mixture-of-Representations Routers, our SAK is capable of synergizing the complementary strengths of multiple VFMs. Extensive experiments show that our SAK remarkably outperforms prior state-of-the-art methods in multi-task learning by 10% on the NYUD-v2 benchmark, while also providing a flexible and robust framework that can readily accommodate more advanced model designs.
- Published
- 2024
15. SPF-EMPC Planner: A real-time multi-robot trajectory planner for complex environments with uncertainties
- Author
- Liu, Peng, Zhu, Pengming, Zeng, Zhiwen, Qiu, Xuekai, Wang, Yu, and Lu, Huimin
- Subjects
- Computer Science - Robotics
- Abstract
In practical applications, the unpredictable movement of obstacles and the imprecise state observation of robots introduce significant uncertainties for the swarm of robots, especially in cluster environments. However, existing methods struggle to achieve safe navigation when uncertainties, complex environmental structures, and robot swarms must be considered together. This paper introduces an extended state model predictive control planner with a safe probability field to address the multi-robot navigation problem in complex, dynamic, and uncertain environments. Initially, the safe probability field offers an innovative approach to model the uncertainty of external dynamic obstacles, combining it with an unconstrained optimization method to generate safe trajectories for multiple robots online. Subsequently, the extended state model predictive controller can accurately track these generated trajectories while considering the robots' inherent model constraints and state uncertainty, thus ensuring the practical feasibility of the planned trajectories. Simulation experiments show a success rate four times higher than that of state-of-the-art algorithms. Physical experiments demonstrate the method's ability to operate in real time, enabling safe navigation for multiple robots in uncertain environments.
- Published
- 2024
16. Disentangling Likes and Dislikes in Personalized Generative Explainable Recommendation
- Author
- Shimizu, Ryotaro, Wada, Takashi, Wang, Yu, Kruse, Johannes, O'Brien, Sean, HtaungKham, Sai, Song, Linxin, Yoshikawa, Yuya, Saito, Yuki, Tsung, Fugee, Goto, Masayuki, and McAuley, Julian
- Subjects
- Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Information Retrieval
- Abstract
Recent research on explainable recommendation generally frames the task as a standard text generation problem, and evaluates models simply based on the textual similarity between the predicted and ground-truth explanations. However, this approach fails to consider one crucial aspect of the systems: whether their outputs accurately reflect the users' (post-purchase) sentiments, i.e., whether and why they would like and/or dislike the recommended items. To shed light on this issue, we introduce new datasets and evaluation methods that focus on the users' sentiments. Specifically, we construct the datasets by explicitly extracting users' positive and negative opinions from their post-purchase reviews using an LLM, and propose to evaluate systems based on whether the generated explanations 1) align well with the users' sentiments, and 2) accurately identify both positive and negative opinions of users on the target items. We benchmark several recent models on our datasets and demonstrate that achieving strong performance on existing metrics does not ensure that the generated explanations align well with the users' sentiments. Lastly, we find that existing models can provide more sentiment-aware explanations when the users' (predicted) ratings for the target items are directly fed into the models as input. We will release our code and datasets upon acceptance.
- Published
- 2024
17. Soft-Matter-Based Topological Vertical Cavity Surface Emitting Lasers
- Author
- Wang, Yu, Xia, Shiqi, Shao, Jingbin, Xie, Qun, Yang, Donghao, Zhang, Xinzheng, Drevensek-Olenik, Irena, Wu, Qiang, Chen, Zhigang, and Xu, Jingjun
- Subjects
- Physics - Optics
- Abstract
Polarized topological vertical cavity surface-emitting lasers (VCSELs), as stable and efficient on-chip light sources, play an important role in the next generation of optical storage and optical communications. However, most current topological lasers demand complex design and expensive fabrication processes, and their semiconductor-based structures pose challenges for flexible device applications. Using an analogy with two-dimensional Semenov insulators in synthetic parametric space, we design and realize a one-dimensional optical superlattice (stacked polymerized cholesteric liquid crystal films and Mylar films), and thereby demonstrate a flexible, low-threshold, circularly polarized topological VCSEL with high slope efficiency. We show that such a laser maintains a good single-mode property under low pump power and inherits the transverse spatial profile of the pump laser. Thanks to the soft-matter-based flexibility, our topological VCSEL can be "attached" to substrates of various shapes, enabling desired laser properties and robust beam steering even after undergoing hundreds of bends. Our results may find applications in consumer electronics, laser scanning and displays, as well as wearable devices.
- Published
- 2024
18. Instability of steady-state mixed-state symmetry-protected topological order to strong-to-weak spontaneous symmetry breaking
- Author
- Shah, Jeet, Fechisin, Christopher, Wang, Yu-Xin, Iosue, Joseph T., Watson, James D., Wang, Yan-Qi, Ware, Brayden, Gorshkov, Alexey V., and Lin, Cheng-Ju
- Subjects
- Quantum Physics, Condensed Matter - Statistical Mechanics, Condensed Matter - Strongly Correlated Electrons
- Abstract
Recent experimental progress in controlling open quantum systems enables the pursuit of mixed-state nonequilibrium quantum phases. We investigate whether open quantum systems hosting mixed-state symmetry-protected topological states as steady states retain this property under symmetric perturbations. Focusing on the decohered cluster state -- a mixed-state symmetry-protected topological state protected by a combined strong and weak symmetry -- we construct a parent Lindbladian that hosts it as a steady state. This Lindbladian can be mapped onto exactly solvable reaction-diffusion dynamics, even in the presence of certain perturbations, allowing us to solve the parent Lindbladian in detail and reveal previously unknown steady states. Using both analytical and numerical methods, we find that typical symmetric perturbations cause strong-to-weak spontaneous symmetry breaking at arbitrarily small perturbations, destabilizing the steady-state mixed-state symmetry-protected topological order. However, when perturbations introduce only weak symmetry defects, the steady-state mixed-state symmetry-protected topological order remains stable. Additionally, we construct a quantum channel which replicates the essential physics of the Lindbladian and can be efficiently simulated using only Clifford gates, Pauli measurements, and feedback. Comment: 21+12 pages, 10+4 figures
- Published
- 2024
19. Strings and membranes from $\mathcal{A}$-theory five brane
- Author
- Hatsuda, Machiko, Hulík, Ondřej, Linch, William D., Siegel, Warren D., Wang, Di, and Wang, Yu-Ping
- Subjects
- High Energy Physics - Theory
- Abstract
The $\mathcal{A}$-theory takes U-duality symmetry as a guiding principle, with the SL(5) U-duality symmetry being described as the world-volume theory of a 5-brane. Furthermore, by unifying the 6-dimensional world-volume Lorentz symmetry with the SL(5) spacetime symmetry, it extends to SL(6) U-duality symmetry. The SL(5) spacetime vielbein fields and the 5-brane world-volume vielbein fields are mixed under the SL(6) U-duality transformation. We demonstrate that consistent sectionings of the SL(6) $\mathcal{A}$5-brane world-volume Lagrangian yield Lagrangians of the $\mathcal{T}$-string with O(D,D) T-duality symmetry, the conventional string, the $\mathcal{M}$5-brane with GL(4) duality symmetry, and the non-perturbative M2-brane in supergravity theory. The GL(4) covariant Lagrangian of the $\mathcal{M}$5-brane derived in this manner is a new, perturbatively quantizable theory. Comment: 41 pages
- Published
- 2024
20. SceneCraft: Layout-Guided 3D Scene Generation
- Author
- Yang, Xiuyu, Man, Yunze, Chen, Jun-Kun, and Wang, Yu-Xiong
- Subjects
- Computer Science - Computer Vision and Pattern Recognition
- Abstract
The creation of complex 3D scenes tailored to user specifications has been a tedious and challenging task with traditional 3D modeling tools. Although some pioneering methods have achieved automatic text-to-3D generation, they are generally limited to small-scale scenes with restricted control over the shape and texture. We introduce SceneCraft, a novel method for generating detailed indoor scenes that adhere to textual descriptions and spatial layout preferences provided by users. Central to our method is a rendering-based technique, which converts 3D semantic layouts into multi-view 2D proxy maps. Furthermore, we design a semantic and depth conditioned diffusion model to generate multi-view images, which are used to learn a neural radiance field (NeRF) as the final scene representation. Without the constraints of panorama image generation, we surpass previous methods in supporting complicated indoor space generation beyond a single room, even as complicated as a whole multi-bedroom apartment with irregular shapes and layouts. Through experimental analysis, we demonstrate that our method significantly outperforms existing approaches in complex indoor scene generation with diverse textures, consistent geometry, and realistic visual quality. Code and more results are available at: https://orangesodahub.github.io/SceneCraft Comment: NeurIPS 2024. Code: https://github.com/OrangeSodahub/SceneCraft Project Page: https://orangesodahub.github.io/SceneCraft
- Published
- 2024
21. Towards Trustworthy Knowledge Graph Reasoning: An Uncertainty Aware Perspective
- Author
- Ni, Bo, Wang, Yu, Cheng, Lu, Blasch, Erik, and Derr, Tyler
- Subjects
- Computer Science - Artificial Intelligence, Computer Science - Computation and Language
- Abstract
Recently, Knowledge Graphs (KGs) have been successfully coupled with Large Language Models (LLMs) to mitigate their hallucinations and enhance their reasoning capability, such as in KG-based retrieval-augmented frameworks. However, current KG-LLM frameworks lack rigorous uncertainty estimation, limiting their reliable deployment in high-stakes applications. Directly incorporating uncertainty quantification into KG-LLM frameworks presents challenges due to their complex architectures and the intricate interactions between the knowledge graph and language model components. To address this gap, we propose a new trustworthy KG-LLM framework, Uncertainty Aware Knowledge-Graph Reasoning (UAG), which incorporates uncertainty quantification into the KG-LLM framework. We design an uncertainty-aware multi-step reasoning framework that leverages conformal prediction to provide a theoretical guarantee on the prediction set. To manage the error rate of the multi-step process, we additionally introduce an error rate control module to adjust the error rate within the individual components. Extensive experiments show that our proposed UAG can achieve any pre-defined coverage rate while reducing the prediction set/interval size by 40% on average over the baselines.
- Published
- 2024
22. Dynamic Multimodal Evaluation with Flexible Complexity by Vision-Language Bootstrapping
- Author
- Yang, Yue, Zhang, Shuibai, Shao, Wenqi, Zhang, Kaipeng, Bin, Yi, Wang, Yu, and Luo, Ping
- Subjects
- Computer Science - Computer Vision and Pattern Recognition
- Abstract
Large Vision-Language Models (LVLMs) have demonstrated remarkable capabilities across multimodal tasks such as visual perception and reasoning, leading to good performance on various multimodal evaluation benchmarks. However, these benchmarks are static and overlap with the pre-training data, resulting in fixed complexity constraints and data contamination issues. This raises concerns about the validity of the evaluation. To address these two challenges, we introduce a dynamic multimodal evaluation protocol called Vision-Language Bootstrapping (VLB). VLB provides a robust and comprehensive assessment for LVLMs with reduced data contamination and flexible complexity. To this end, VLB dynamically generates new visual question-answering samples through a multimodal bootstrapping module that modifies both images and language, while a judge module ensures that newly generated samples remain consistent with the original ones. By composing various bootstrapping strategies, VLB offers dynamic variants of existing benchmarks with diverse complexities, enabling the evaluation to co-evolve with the ever-evolving capabilities of LVLMs. Extensive experimental results across multiple benchmarks, including SEEDBench, MMBench, and MME, show that VLB significantly reduces data contamination and exposes performance limitations of LVLMs.
- Published
- 2024
23. Emerging Pixel Grounding in Large Multimodal Models Without Grounding Supervision
- Author
- Cao, Shengcao, Gui, Liang-Yan, and Wang, Yu-Xiong
- Subjects
- Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
- Abstract
Current large multimodal models (LMMs) face challenges in grounding, which requires the model to relate language components to visual entities. Contrary to the common practice that fine-tunes LMMs with additional grounding supervision, we find that the grounding ability can in fact emerge in LMMs trained without explicit grounding supervision. To reveal this emerging grounding, we introduce an "attend-and-segment" method which leverages attention maps from standard LMMs to perform pixel-level segmentation. Furthermore, to enhance the grounding ability, we propose DIFFLMM, an LMM utilizing a diffusion-based visual encoder, as opposed to the standard CLIP visual encoder, and trained with the same weak supervision. Without being constrained by the biases and limited scale of grounding-specific supervision data, our approach is more generalizable and scalable. We achieve competitive performance on both grounding-specific and general visual question answering benchmarks, compared with grounding LMMs and generalist LMMs, respectively. Notably, we achieve a 44.2 grounding mask recall on grounded conversation generation without any grounding supervision, outperforming the extensively supervised model GLaMM. Project page: https://groundLMM.github.io.
- Published
- 2024
24. Multi-messenger signatures of a deformed magnetar in gamma-ray bursts
- Author
- Hashemi, Parisa, Shakeri, Soroush, Wang, Yu, Li, Liang, and Moradi, Rahim
- Subjects
- Astrophysics - High Energy Astrophysical Phenomena, Astrophysics - Cosmology and Nongalactic Astrophysics, High Energy Physics - Phenomenology
- Abstract
We study the evolution of a newly formed magnetized neutron star (NS) as a power source of gamma-ray bursts (GRBs) in the light of both gravitational wave (GW) and electromagnetic (EM) radiation. Compressible and incompressible fluids are employed in order to model the secular evolution of Maclaurin spheroids. It is shown that the GW and EM light curves evolve as a function of eccentricity and rotational frequency with time. We find that the light curve characteristics crucially depend on NS parameters such as the magnitude and structure of the magnetic field, the ellipticity, and the equation of state (EOS) of the fluid. The presence of X-ray flares, whose origins are not yet well understood, can be captured in our model for some specific nuclear EOSs. Our model allows us to explain flares that occur within the wide range of $10$ to $10^4$ seconds, with peak luminosities on the order of $10^{46}$-$10^{51}$ $\mathrm{erg/s}$, using a reasonable set of parameters such as a magnetic field strength of around $10^{14}$-$10^{16}$ Gauss and a quadrupole-to-dipole ratio of the magnetic field of up to 500. By applying our model to a sample of GRB X-ray flares observed by Swift/XRT, we try to constrain the crucial parameters of a deformed magnetar via the MCMC fitting method. Our analysis shows that ongoing and upcoming joint multi-messenger detections can be used to understand the nature of the GRB central engine and its evolution at the early times of the burst formation.
- Published
- 2024
25. InstructG2I: Synthesizing Images from Multimodal Attributed Graphs
- Author
-
Jin, Bowen, Pang, Ziqi, Guo, Bingjun, Wang, Yu-Xiong, You, Jiaxuan, and Han, Jiawei
- Subjects
Computer Science - Artificial Intelligence ,Computer Science - Computation and Language ,Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Machine Learning ,Computer Science - Social and Information Networks - Abstract
In this paper, we approach an overlooked yet critical task, Graph2Image: generating images from multimodal attributed graphs (MMAGs). This task poses significant challenges due to the explosion in graph size, dependencies among graph entities, and the need for controllability in graph conditions. To address these challenges, we propose a graph context-conditioned diffusion model called InstructG2I. InstructG2I first exploits the graph structure and multimodal information to conduct informative neighbor sampling by combining personalized PageRank and re-ranking based on vision-language features. Then, a Graph-QFormer encoder adaptively encodes the graph nodes into an auxiliary set of graph prompts to guide the denoising process of diffusion. Finally, we propose graph classifier-free guidance, enabling controllable generation by varying the strength of graph guidance and multiple connected edges to a node. Extensive experiments conducted on three datasets from different domains demonstrate the effectiveness and controllability of our approach. The code is available at https://github.com/PeterGriffinJin/InstructG2I., Comment: 16 pages
- Published
- 2024
26. 3D UAV Trajectory Planning for IoT Data Collection via Matrix-Based Evolutionary Computation
- Author
-
Sun, Pei-Fa, Song, Yujae, Gao, Kang-Yu, Wang, Yu-Kai, Zhou, Changjun, Jeon, Sang-Woon, and Zhang, Jun
- Subjects
Computer Science - Neural and Evolutionary Computing - Abstract
UAVs are increasingly becoming vital tools in various wireless communication applications including internet of things (IoT) and sensor networks, thanks to their rapid and agile non-terrestrial mobility. Despite recent research, planning three-dimensional (3D) UAV trajectories over a continuous temporal-spatial domain remains challenging due to the need to solve computationally intensive optimization problems. In this paper, we study UAV-assisted IoT data collection aimed at minimizing total energy consumption while accounting for the UAV's physical capabilities, the heterogeneous data demands of IoT nodes, and 3D terrain. We propose a matrix-based differential evolution with constraint handling (MDE-CH), a computation-efficient evolutionary algorithm designed to address non-convex constrained optimization problems with several different types of constraints. Numerical evaluations demonstrate that the proposed MDE-CH algorithm provides a continuous 3D temporal-spatial UAV trajectory capable of efficiently minimizing energy consumption under various practical constraints and outperforms the conventional fly-hover-fly model for both two-dimensional (2D) and 3D trajectory planning.
- Published
- 2024
27. Reconstruction of Particle Flow Energy Distribution Using Deep Learning Algorithms
- Author
-
Zhang, Han, Lin, Shengxiang, Zhang, Xingyi, Wang, Yu, and Zhang, Yangguang
- Subjects
Physics - Instrumentation and Detectors ,Computer Science - Artificial Intelligence - Abstract
In high-energy particle physics, extracting information from complex detector signals is crucial for energy reconstruction. Recent advancements involve using deep learning to process calorimeter images from various sub-detectors in experiments like the Large Hadron Collider (LHC) for energy map reconstruction. This paper compares classical algorithms (MLP, CNN, U-Net, and RNN) with variants that include self-attention and 3D convolution modules to evaluate their effectiveness in reconstructing the initial energy distribution. Additionally, a test dataset of jet events is utilized to analyze and compare the models' performance in handling anomalous high-energy events. The analysis highlights the effectiveness of deep learning techniques for energy image reconstruction and explores their potential in this area., Comment: 11 pages, 1 table, 9 figures. Code available at https://github.com/Image-processing-Particle-flow/Project1
- Published
- 2024
28. Supermassive primordial black holes for the GHZ9 and UHZ1 observed by the JWST
- Author
-
Huang, Hai-Long, Wang, Yu-Tong, and Piao, Yun-Song
- Subjects
Astrophysics - Astrophysics of Galaxies ,Astrophysics - Cosmology and Nongalactic Astrophysics ,General Relativity and Quantum Cosmology - Abstract
The high-redshift ($z>10$) galaxies GHZ9 and UHZ1 observed by the James Webb Space Telescope (JWST) are very massive and have exceptionally high black hole-to-star mass ratios, with central black hole masses $M\gtrsim 10^7\rm~M_\odot$. In this paper, we explore the possibility that they are seeded by supermassive primordial black holes (SMPBHs), which came into being in the very early universe, with initial masses $\sim 10^7\rm~M_\odot$. We present the self-similar accretion solutions for SMPBHs, and find that the mass growth of SMPBHs during the pregalactic era may be negligible. These SMPBHs, when the redshift $z\lesssim 20$, can accelerate the seeding of high-redshift galaxies and their baryonic content, and consequently explain the central supermassive black holes (SMBHs) of high-redshift massive galaxies through sub-Eddington accretion. According to our results, SMPBHs could actually lead to the existence of more massive SMBHs at higher redshifts compared to other SMBH seed scenarios; in particular, SMBHs with masses $M\gtrsim 10^7~\rm M_\odot$ at $z>20$ might only originate from SMPBHs, so the corresponding observations can serve as a potential probe of PBHs., Comment: 14 pages, 5 figures
- Published
- 2024
29. Exponential entanglement advantage in sensing correlated noise
- Author
-
Wang, Yu-Xin, Bringewatt, Jacob, Seif, Alireza, Brady, Anthony J., Oh, Changhun, and Gorshkov, Alexey V.
- Subjects
Quantum Physics - Abstract
In this work, we propose a new form of exponential quantum advantage in the context of sensing correlated noise. Specifically, we focus on the problem of estimating parameters associated with Lindblad dephasing dynamics, and show that entanglement can lead to an exponential enhancement in the sensitivity (as quantified via quantum Fisher information of the sensor state) for estimating a small parameter characterizing the deviation of system Lindbladians from a class of maximally correlated dephasing dynamics. This result stands in stark contrast with previously studied scenarios of sensing uncorrelated dephasing noise, where one can prove that entanglement does not lead to an advantage in the signal-to-noise ratio. Our work thus opens a novel pathway towards achieving entanglement-based sensing advantage, which may find applications in characterizing decoherence dynamics of near-term quantum devices. Further, our approach provides a potential quantum-enhanced probe of many-body correlated phases by measuring noise generated by a sensing target. We also discuss realization of our protocol using near-term quantum hardware., Comment: 7+2 pages, 1 figure
- Published
- 2024
30. Large Language Model Inference Acceleration: A Comprehensive Hardware Perspective
- Author
-
Li, Jinhao, Xu, Jiaming, Huang, Shan, Chen, Yonghua, Li, Wen, Liu, Jun, Lian, Yaoxiu, Pan, Jiayi, Ding, Li, Zhou, Hao, Wang, Yu, and Dai, Guohao
- Subjects
Computer Science - Hardware Architecture ,Computer Science - Machine Learning - Abstract
Large Language Models (LLMs) have demonstrated remarkable capabilities across various fields, from natural language understanding to text generation. Compared to non-generative LLMs like BERT and DeBERTa, generative LLMs like GPT series and Llama series are currently the main focus due to their superior algorithmic performance. The advancements in generative LLMs are closely intertwined with the development of hardware capabilities. Various hardware platforms exhibit distinct hardware characteristics, which can help improve LLM inference performance. Therefore, this paper comprehensively surveys efficient generative LLM inference on different hardware platforms. First, we provide an overview of the algorithm architecture of mainstream generative LLMs and delve into the inference process. Then, we summarize different optimization methods for different platforms such as CPU, GPU, FPGA, ASIC, and PIM/NDP, and provide inference results for generative LLMs. Furthermore, we perform a qualitative and quantitative comparison of inference performance with batch sizes 1 and 8 on different hardware platforms by considering hardware power consumption, absolute inference speed (tokens/s), and energy efficiency (tokens/J). We compare the performance of the same optimization methods across different hardware platforms, the performance across different hardware platforms, and the performance of different methods on the same hardware platform. This provides a systematic and comprehensive summary of existing inference acceleration work by integrating software optimization methods and hardware platforms, which can point to the future trends and potential developments of generative LLMs and hardware technology for edge-side scenarios., Comment: 43 pages, 15 figures
- Published
- 2024
31. Efficiently Identifying Watermarked Segments in Mixed-Source Texts
- Author
-
Zhao, Xuandong, Liao, Chenwen, Wang, Yu-Xiang, and Li, Lei
- Subjects
Computer Science - Computation and Language - Abstract
Text watermarks in large language models (LLMs) are increasingly used to detect synthetic text, mitigating misuse cases like fake news and academic dishonesty. While existing watermarking detection techniques primarily focus on classifying entire documents as watermarked or not, they often neglect the common scenario of identifying individual watermark segments within longer, mixed-source documents. Drawing inspiration from plagiarism detection systems, we propose two novel methods for partial watermark detection. First, we develop a geometry cover detection framework aimed at determining whether there is a watermark segment in long text. Second, we introduce an adaptive online learning algorithm to pinpoint the precise location of watermark segments within the text. Evaluated on three popular watermarking techniques (KGW-Watermark, Unigram-Watermark, and Gumbel-Watermark), our approach achieves high accuracy, significantly outperforming baseline methods. Moreover, our framework is adaptable to other watermarking techniques, offering new insights for precise watermark detection.
- Published
- 2024
32. Beyond the Phase Ordering Problem: Finding the Globally Optimal Code w.r.t. Optimization Phases
- Author
-
Wang, Yu, Chen, Hongyu, and Wang, Ke
- Subjects
Computer Science - Programming Languages - Abstract
In this paper, we propose a new concept called semantic equivalence w.r.t. optimization phases, which defines the set of programs a compiler considers semantically equivalent to the input using a set of optimization phases. We show both theoretically and empirically that solving the phase ordering problem does not necessarily result in the most efficient code among all programs that a compiler deems semantically equivalent to the input, hereinafter referred to as the globally optimal code w.r.t. optimization phases. To find the globally optimal code w.r.t. optimization phases, we present a conceptual framework that leverages the reverse of existing optimization phases. In theory, we prove that the framework is capable of finding the globally optimal code for any program. We realize this framework as a technique, called iterative bi-directional optimization, which performs both the normal and reverse optimizations to increase and decrease the efficiency of the generated code, respectively. We evaluate this technique on C/C++ files randomly extracted from highly mature and influential programs (e.g., Linux kernel, OpenSSL, Z3). Results show that it frequently generates more efficient code -- measured by either code size or runtime performance -- than exhaustive search, which is the solution to the phase ordering problem. We also find that by simply incorporating its reverse optimization phases, the effectiveness of the optimization of state-of-the-art compilers (e.g., GCC/LLVM) can be significantly improved.
- Published
- 2024
33. Complexity factor for a static self-gravitating sphere in Rastall-Rainbow gravity
- Author
-
Ye, Zhou-Li, Wang, Yu, Yang, Rui-Xin, and Liu, Dao-Jun
- Subjects
General Relativity and Quantum Cosmology - Abstract
We generalize Herrera's definition of the complexity factor for static spherically symmetric fluid distributions to the Rastall-Rainbow theory of gravity. For this purpose, an energy-dependent equation of motion is employed in accordance with the principle of gravity's rainbow. It is found that the complexity factor appears in the orthogonal splitting of the Riemann curvature tensor, and measures the deviation of the value of the active gravitational mass from the simplest system under the combined corrections of Rastall and rainbow. In the low-energy limit, all the results we have obtained reduce to their counterparts in general relativity when the non-conserved parameter is taken to be one. We also demonstrate how to build an anisotropic or isotropic star model using the complexity approach. In particular, the vanishing complexity factor condition in Rastall-Rainbow gravity is exactly the same as that derived in general relativity. This fact may imply a deeper geometric foundation for the complexity factor., Comment: 10 two-column pages, 1 figure; accepted by Physics of the Dark Universe
- Published
- 2024
34. Accelerating Auto-regressive Text-to-Image Generation with Training-free Speculative Jacobi Decoding
- Author
-
Teng, Yao, Shi, Han, Liu, Xian, Ning, Xuefei, Dai, Guohao, Wang, Yu, Li, Zhenguo, and Liu, Xihui
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
The current large auto-regressive models can generate high-quality, high-resolution images, but these models require hundreds or even thousands of steps of next-token prediction during inference, resulting in substantial time consumption. In existing studies, Jacobi decoding, an iterative parallel decoding algorithm, has been used to accelerate the auto-regressive generation and can be executed without training. However, the Jacobi decoding relies on a deterministic criterion to determine the convergence of iterations. Thus, it works for greedy decoding but is incompatible with sampling-based decoding which is crucial for visual quality and diversity in the current auto-regressive text-to-image generation. In this paper, we propose a training-free probabilistic parallel decoding algorithm, Speculative Jacobi Decoding (SJD), to accelerate auto-regressive text-to-image generation. By introducing a probabilistic convergence criterion, our SJD accelerates the inference of auto-regressive text-to-image generation while maintaining the randomness in sampling-based token decoding and allowing the model to generate diverse images. Specifically, SJD facilitates the model to predict multiple tokens at each step and accepts tokens based on the probabilistic criterion, enabling the model to generate images with fewer steps than the conventional next-token-prediction paradigm. We also investigate the token initialization strategies that leverage the spatial locality of visual data to further improve the acceleration ratio under specific scenarios. We conduct experiments for our proposed SJD on multiple auto-regressive text-to-image generation models, showing the effectiveness of model acceleration without sacrificing the visual quality.
- Published
- 2024
35. Topological one-way Weyl fiber
- Author
-
Lin, Hao, Wang, Yu, Ji, Zitao, Zheng, Yidong, Chen, Jianfeng, and Li, Zhi-Yuan
- Subjects
Physics - Optics ,Condensed Matter - Mesoscale and Nanoscale Physics - Abstract
Topological photonics enables unprecedented photon manipulation by realizing various topological states, such as corner states, edge states, and surface states. However, achieving a topological fiber state has remained elusive. Here, we demonstrate a topological fiber state in a Weyl gyromagnetic photonic crystal fiber. By applying an in-plane magnetic bias to a gyromagnetic photonic crystal fiber with broken parity-inversion symmetry, we create an asymmetrical Weyl bandgap that supports one-way fiber states associated with type-II Weyl points. Dispersion and topological invariant calculations reveal the transition from Weyl surface states to one-way Weyl fiber states. Electromagnetic field simulations confirm the existence of one-way Weyl fiber states and their robust transport in the presence of a metallic obstacle along the transport path. Our findings offer an intriguing pathway for exploring novel topological states and guiding the design of topological fibers., Comment: 19 pages, 7 figures
- Published
- 2024
36. Atmospheric Pressure Ammonia Synthesis on AuRu Catalysts Enabled by Plasmon-Controlled Hydrogenation and Nitrogen-species Desorption
- Author
-
Yuan, Lin, Bourgeois, Briley B., Begin, Elijah, Zhang, Yirui, Dai, Alan X., Cheng, Zhihua, McKeown-Green, Amy S., Xue, Zhichen, Cui, Yi, Xu, Kun, Wang, Yu, Jones, Matthew R., Majumdar, Arun, Bao, Junwei Lucas, and Dionne, Jennifer A.
- Subjects
Physics - Chemical Physics - Abstract
Ammonia is a key component of fertilizer and a potential clean fuel and hydrogen carrier. The Haber-Bosch process for ammonia synthesis consumes more than half of industrial hydrogen and contributes up to ~3% of global greenhouse gas emissions. Light-driven reactions via surface plasmon resonances offer a less energy-intensive pathway for ammonia production by altering reaction intermediates. Here, we report gold-ruthenium plasmonic bimetallic alloys for ammonia synthesis at room temperature and pressure, driven by visible light. We use colloidal synthesis to create AuRu$_x$ alloys ($x$ = 0.1, 0.2, 0.3) and disperse these nanoparticles on MgO supports for gas-phase ammonia synthesis. We observe a ~60 $\mu$mol/g/h reactivity and ~0.12% external quantum efficiency on a AuRu$_{0.2}$ sample under 100 mW/cm$^2$ visible light. In-situ diffuse reflective infrared Fourier transform spectroscopic measurements show that hydrogenation of nitrogen adsorbates is accelerated under light compared to thermocatalysis. Combining wavelength-dependent reactivity and spectroscopic findings with semi-classical electromagnetic modeling, we show plasmonic bimetallic alloys expedite ammonia synthesis by aiding hydrogenation of adsorbed nitrogen species via plasmon-mediated hot electrons. Quantum mechanical calculations reveal hydrogen-assisted N$_2$ splitting in the excited state is key to activating the reaction under ambient conditions. Therefore, light or H$_2$ alone cannot dissociate N$_2$ -- the key bottleneck to breaking N$_2$'s triple bond. Our findings are consistent with recent hypotheses on how nitrogenase enzymes catalyze ammonia production at mild conditions and provide insights for sustainable photochemical transformations., Comment: 21 pages, 4 figures, journal article submission soon
- Published
- 2024
37. Self-Updatable Large Language Models with Parameter Integration
- Author
-
Wang, Yu, Liu, Xinshuang, Chen, Xiusi, O'Brien, Sean, Wu, Junda, and McAuley, Julian
- Subjects
Computer Science - Computation and Language - Abstract
Despite significant advancements in large language models (LLMs), the rapid and frequent integration of small-scale experiences, such as interactions with surrounding objects, remains a substantial challenge. Two critical factors in assimilating these experiences are (1) Efficacy: the ability to accurately remember recent events; (2) Retention: the capacity to recall long-past experiences. Current methods either embed experiences within model parameters using continual learning, model editing, or knowledge distillation techniques, which often struggle with rapid updates and complex interactions, or rely on external storage to achieve long-term retention, thereby increasing storage requirements. In this paper, we propose SELF-PARAM (Self-Updatable Large Language Models with Parameter Integration). SELF-PARAM requires no extra parameters while ensuring near-optimal efficacy and long-term retention. Our method employs a training objective that minimizes the Kullback-Leibler (KL) divergence between the predictions of an original model (with access to contextual information) and a target model (without such access). By generating diverse question-answer pairs related to the knowledge and minimizing the KL divergence across this dataset, we update the target model to internalize the knowledge seamlessly within its parameters. Evaluations on question-answering and conversational recommendation tasks demonstrate that SELF-PARAM significantly outperforms existing methods, even when accounting for non-zero storage requirements. This advancement paves the way for more efficient and scalable integration of experiences in large language models by embedding knowledge directly into model parameters.
- Published
- 2024
38. Developing Instruction-Following Speech Language Model Without Speech Instruction-Tuning Data
- Author
-
Lu, Ke-Han, Chen, Zhehuai, Fu, Szu-Wei, Yang, Chao-Han Huck, Balam, Jagadeesh, Ginsburg, Boris, Wang, Yu-Chiang Frank, and Lee, Hung-yi
- Subjects
Electrical Engineering and Systems Science - Audio and Speech Processing ,Computer Science - Computation and Language ,Computer Science - Sound - Abstract
Recent end-to-end speech language models (SLMs) have expanded upon the capabilities of large language models (LLMs) by incorporating pre-trained speech models. However, these SLMs often undergo extensive speech instruction-tuning to bridge the gap between speech and text modalities. This requires significant annotation efforts and risks catastrophic forgetting of the original language capabilities. In this work, we present a simple yet effective automatic process for creating speech-text pair data that carefully injects speech paralinguistic understanding abilities into SLMs while preserving the inherent language capabilities of the text-based LLM. Our model demonstrates general capabilities for speech-related tasks without the need for speech instruction-tuning data, achieving impressive performance on Dynamic-SUPERB and AIR-Bench-Chat benchmarks. Furthermore, our model exhibits the ability to follow complex instructions derived from LLMs, such as specific output formatting and chain-of-thought reasoning. Our approach not only enhances the versatility and effectiveness of SLMs but also reduces reliance on extensive annotated datasets, paving the way for more efficient and capable speech understanding systems., Comment: Submitted to ICASSP 2025
- Published
- 2024
39. Multi-Designated Detector Watermarking for Language Models
- Author
-
Huang, Zhengan, Zeng, Gongxian, Mu, Xin, Wang, Yu, and Yu, Yue
- Subjects
Computer Science - Cryptography and Security ,Computer Science - Artificial Intelligence - Abstract
In this paper, we initiate the study of multi-designated detector watermarking (MDDW) for large language models (LLMs). This technique allows model providers to generate watermarked outputs from LLMs with two key properties: (i) only specific, possibly multiple, designated detectors can identify the watermarks, and (ii) there is no perceptible degradation in the output quality for ordinary users. We formalize the security definitions for MDDW and present a framework for constructing MDDW for any LLM using multi-designated verifier signatures (MDVS). Recognizing the significant economic value of LLM outputs, we introduce claimability as an optional security feature for MDDW, enabling model providers to assert ownership of LLM outputs within designated-detector settings. To support claimable MDDW, we propose a generic transformation converting any MDVS to a claimable MDVS. Our implementation of the MDDW scheme highlights its advanced functionalities and flexibility over existing methods, with satisfactory performance metrics.
- Published
- 2024
40. Towards the Mitigation of Confirmation Bias in Semi-supervised Learning: a Debiased Training Perspective
- Author
-
Wang, Yu, Yin, Yuxuan, and Li, Peng
- Subjects
Computer Science - Machine Learning ,Statistics - Machine Learning - Abstract
Semi-supervised learning (SSL) commonly exhibits confirmation bias, where models disproportionately favor certain classes, leading to errors in predicted pseudo labels that accumulate under a self-training paradigm. Unlike supervised settings, which benefit from a rich, static data distribution, SSL inherently lacks mechanisms to correct this self-reinforced bias, necessitating debiased interventions at each training step. Although the generation of debiased pseudo labels has been extensively studied, their effective utilization remains underexplored. Our analysis indicates that data from biased classes should have a reduced influence on parameter updates, while more attention should be given to underrepresented classes. To address these challenges, we introduce TaMatch, a unified framework for debiased training in SSL. TaMatch employs a scaling ratio derived from both a prior target distribution and the model's learning status to estimate and correct bias at each training step. This ratio adjusts the raw predictions on unlabeled data to produce debiased pseudo labels. In the utilization phase, these labels are weighted differently according to their predicted class, enhancing training equity and minimizing class bias. Additionally, TaMatch dynamically adjusts the target distribution in response to the model's learning progress, facilitating robust handling of practical scenarios where the prior distribution is unknown. Empirical evaluations show that TaMatch significantly outperforms existing state-of-the-art methods across a range of challenging image classification tasks, highlighting the critical importance of both the debiased generation and utilization of pseudo labels in SSL., Comment: 11 pages, 4 figures
- Published
- 2024
41. Behavior evolution-inspired approach to walking gait reinforcement training for quadruped robots
- Author
-
Wang, Yu, Jia, Wenchuan, Sun, Yi, and He, Dong
- Subjects
Computer Science - Robotics - Abstract
Reinforcement learning is highly competitive among gait generation techniques for quadrupedal robots, mainly because stochastic exploration in reinforcement training is beneficial for achieving an autonomous gait. Nevertheless, although incremental reinforcement learning is employed to improve training success and movement smoothness by relying on the continuity inherent in limb movements, challenges remain in adapting the gait policy to diverse terrain and external disturbances. Inspired by the association between reinforcement learning and the evolution of animal motion behavior, this paper introduces a self-improvement mechanism for the reference gait, so that incremental learning of actions and self-improvement of the reference action proceed together to imitate the evolution of animal motion behavior. Further, a new framework for reinforcement training of quadruped gait is proposed. In this framework, a genetic algorithm is adopted to perform a global probabilistic search for the initial value of the arbitrary foot trajectory, updating the reference trajectory to one with better fitness. Subsequently, the improved reference gait is used for incremental reinforcement learning of the gait. The above process is executed repeatedly and alternately to finally train the gait policy. An analysis considering terrain, model dimensions, and locomotion conditions is presented in detail based on simulation, and the results show that the framework is significantly more adaptive to terrain than regular incremental reinforcement learning.
- Published
- 2024
42. Multi-UAV Pursuit-Evasion with Online Planning in Unknown Environments by Deep Reinforcement Learning
- Author
-
Chen, Jiayu, Yu, Chao, Li, Guosheng, Tang, Wenhao, Yang, Xinyi, Xu, Botian, Yang, Huazhong, and Wang, Yu
- Subjects
Computer Science - Robotics ,Computer Science - Machine Learning - Abstract
Multi-UAV pursuit-evasion, where pursuers aim to capture evaders, poses a key challenge for UAV swarm intelligence. Multi-agent reinforcement learning (MARL) has demonstrated potential in modeling cooperative behaviors, but most RL-based approaches remain constrained to simplified simulations with limited dynamics or fixed scenarios. Previous attempts to deploy RL policy to real-world pursuit-evasion are largely restricted to two-dimensional scenarios, such as ground vehicles or UAVs at fixed altitudes. In this paper, we address multi-UAV pursuit-evasion by considering UAV dynamics and physical constraints. We introduce an evader prediction-enhanced network to tackle partial observability in cooperative strategy learning. Additionally, we propose an adaptive environment generator within MARL training, enabling higher exploration efficiency and better policy generalization across diverse scenarios. Simulations show our method significantly outperforms all baselines in challenging scenarios, generalizing to unseen scenarios with a 100% capture rate. Finally, we derive a feasible policy via a two-stage reward refinement and deploy the policy on real quadrotors in a zero-shot manner. To our knowledge, this is the first work to derive and deploy an RL-based policy using collective thrust and body rates control commands for multi-UAV pursuit-evasion in unknown environments. The open-source code and videos are available at https://sites.google.com/view/pursuit-evasion-rl.
- Published
- 2024
43. Partial disruption of a planet around a white dwarf: the effect of perturbation from the remnant planet on the accretion
- Author
-
Kurban, Abdusattar, Zhou, Xia, Wang, Na, Huang, Yong-Feng, Wang, Yu-Bin, and Nurmamat, Nurimangul
- Subjects
Astrophysics - Earth and Planetary Astrophysics ,Astrophysics - Solar and Stellar Astrophysics - Abstract
About 25%-50% of white dwarfs (WDs) are found to be polluted by heavy elements. It has been argued that the pollution could be caused by the tidal disruption of an approaching planet around the WD, during which a large number of clumps would be produced and would finally fall onto the WD. The planet's approach to the WD is usually attributed to gravitational perturbations from another distant planet or stellar companion. However, the dynamics of the perturbation and the detailed partial disruption process are still poorly understood. In this study, we present an in-depth investigation of these issues. A triple system composed of a WD, an inner orbit planet, and an outer orbit planet is considered. The inner planet would be partially disrupted periodically in the long-term evolution. Fragments generated in the process are affected by the gravitational perturbations from the remnant planet, facilitating their falling toward the WD. The mass loss rate of the inner planet depends on both its internal structure and the orbital configuration of the planetary system., Comment: 14 pages, 6 figures, 4 tables, Accepted for publication in The Astrophysical Journal
- Published
- 2024
44. The sparseness of g-convex functions
- Author
-
Wang, Yu and Ye, Ke
- Subjects
Mathematics - Differential Geometry ,Mathematics - Optimization and Control - Abstract
The g-convexity of functions on manifolds is a generalization of the convexity of functions on $\mathbb{R}^n$. It plays an essential role in both differential geometry and non-convex optimization theory. This paper is concerned with g-convex smooth functions on manifolds. We establish criteria for the existence of a Riemannian metric (or connection) with respect to which a given function is g-convex. Using these criteria, we obtain three sparseness results for g-convex functions: (1) The set of g-convex functions on a compact manifold is nowhere dense in the space of smooth functions. (2) Most polynomials on $\mathbb{R}^n$ that are g-convex with respect to some geodesically complete connection have at most one critical point. (3) The density of g-convex univariate (resp. quadratic, monomial, additively separable) polynomials asymptotically decreases to zero.
- Published
- 2024
45. Anisotropic Diffusion Probabilistic Model for Imbalanced Image Classification
- Author
-
Kong, Jingyu, Guo, Yuan, Wang, Yu, and Duan, Yuping
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Real-world data often has a long-tailed distribution, where the scarcity of tail samples significantly limits the model's generalization ability. Denoising Diffusion Probabilistic Models (DDPM) are generative models based on stochastic differential equation theory and have demonstrated impressive performance in image classification tasks. However, existing diffusion probabilistic models do not perform satisfactorily in classifying tail classes. In this work, we propose the Anisotropic Diffusion Probabilistic Model (ADPM) for imbalanced image classification problems. We utilize the data distribution to control the diffusion speed of different class samples during the forward process, effectively improving the classification accuracy of the denoiser in the reverse process. Specifically, we provide a theoretical strategy for selecting noise levels for different categories in the diffusion process based on error analysis theory to address the imbalanced classification problem. Furthermore, we integrate global and local image priors in the forward process to enhance the model's discriminative ability in the spatial dimension, while incorporating semantic-level contextual information in the reverse process to boost the model's discriminative power and robustness. Through comparisons with state-of-the-art methods on four medical benchmark datasets, we validate the effectiveness of the proposed method in handling long-tail data. Our results confirm that the anisotropic diffusion model significantly improves the classification accuracy of rare classes while maintaining the accuracy of head classes. On the skin lesion datasets PAD-UFES and HAM10000, the F1-scores of our method improved by 4% and 3%, respectively, compared to the original diffusion probabilistic model.
- Published
- 2024
46. Towards LifeSpan Cognitive Systems
- Author
-
Wang, Yu, Han, Chi, Wu, Tongtong, He, Xiaoxin, Zhou, Wangchunshu, Sadeq, Nafis, Chen, Xiusi, He, Zexue, Wang, Wei, Haffari, Gholamreza, Ji, Heng, and McAuley, Julian
- Subjects
Computer Science - Computation and Language - Abstract
Building a human-like system that continuously interacts with complex environments -- whether simulated digital worlds or human society -- presents several key challenges. Central to this is enabling continuous, high-frequency interactions, where the interactions are termed experiences. We refer to this envisioned system as the LifeSpan Cognitive System (LSCS). A critical feature of LSCS is its ability to engage in incremental and rapid updates while retaining and accurately recalling past experiences. We identify two major challenges in achieving this: (1) Abstraction and Experience Merging, and (2) Long-term Retention with Accurate Recall. These properties are essential for storing new experiences, organizing past experiences, and responding to the environment in ways that leverage relevant historical data. Unlike language models with continual learning, which typically rely on large corpora for fine-tuning and focus on improving performance within specific domains or tasks, LSCS must rapidly and incrementally update with new information from its environment at a high frequency. Existing technologies with the potential to solve these two challenges can be classified into four classes based on a conceptual metric called Storage Complexity, which measures the relative space required to store past experiences. Each of these four classes of technologies has its own strengths and limitations. Given that none of the existing technologies can achieve LSCS alone, we propose a novel paradigm for LSCS that integrates all four classes of technologies. The new paradigm operates through two core processes: Absorbing Experiences and Generating Responses.
- Published
- 2024
47. Human-Robot Cooperative Distribution Coupling for Hamiltonian-Constrained Social Navigation
- Author
-
Wang, Weizheng, Yu, Chao, Wang, Yu, and Min, Byung-Cheol
- Subjects
Computer Science - Robotics - Abstract
Navigating in human-filled public spaces is a critical challenge for deploying autonomous robots in real-world environments. This paper introduces NaviDIFF, a novel Hamiltonian-constrained socially-aware navigation framework designed to address the complexities of human-robot interaction and socially-aware path planning. NaviDIFF integrates a port-Hamiltonian framework to model dynamic physical interactions and a diffusion model to manage uncertainty in human-robot cooperation. The framework leverages a spatial-temporal transformer to capture social and temporal dependencies, enabling more accurate pedestrian strategy predictions and port-Hamiltonian dynamics construction. Additionally, reinforcement learning from human feedback is employed to fine-tune robot policies, ensuring adaptation to human preferences and social norms. Extensive experiments demonstrate that NaviDIFF outperforms state-of-the-art methods in social navigation tasks, offering improved stability, efficiency, and adaptability.
- Published
- 2024
48. Efficient Entanglement Routing for Satellite-Aerial-Terrestrial Quantum Networks
- Author
-
Zhang, Yu, Gong, Yanmin, Fan, Lei, Wang, Yu, Han, Zhu, and Guo, Yuanxiong
- Subjects
Quantum Physics ,Computer Science - Networking and Internet Architecture - Abstract
In the era of 6G and beyond, space-aerial-terrestrial quantum networks (SATQNs) are shaping the future of the global-scale quantum Internet. This paper investigates the collaboration among satellite, aerial, and terrestrial quantum networks to efficiently transmit high-fidelity quantum entanglements over long distances. We begin with a comprehensive overview of existing satellite-, aerial-, and terrestrial-based quantum networks. Subsequently, we address the entanglement routing problem with the objective of maximizing quantum network throughput by jointly optimizing path selection and entanglement generation rates (PS-EGR). Given that the original problem is formulated as a mixed-integer linear programming (MILP) problem, which is inherently intractable, we propose a Benders' decomposition (BD)-based algorithm to solve the problem efficiently. Numerical results validate the effectiveness of the proposed PS-EGR scheme, offering valuable insights into various optimizable factors within the system. Finally, we discuss the current challenges and propose promising avenues for future research in SATQNs.
- Published
- 2024
49. Quantum-Assisted Joint Virtual Network Function Deployment and Maximum Flow Routing for Space Information Networks
- Author
-
Zhang, Yu, Gong, Yanmin, Fan, Lei, Wang, Yu, Han, Zhu, and Guo, Yuanxiong
- Subjects
Computer Science - Networking and Internet Architecture - Abstract
Network function virtualization (NFV)-enabled space information network (SIN) has emerged as a promising method to facilitate global coverage and seamless service. This paper proposes a novel NFV-enabled SIN to provide end-to-end communication and computation services for ground users. Based on the multi-functional time expanded graph (MF-TEG), we jointly optimize the user association, virtual network function (VNF) deployment, and flow routing strategy (U-VNF-R) to maximize the total processed data received by users. The original problem is a mixed-integer linear program (MILP) that is intractable for classical computers. Inspired by quantum computing techniques, we propose a hybrid quantum-classical Benders' decomposition (HQCBD) algorithm. Specifically, we convert the master problem of the Benders' decomposition into the quadratic unconstrained binary optimization (QUBO) model and solve it with quantum computers. To further accelerate the optimization, we also design a multi-cut strategy based on the quantum advantages in parallel computing. Numerical results demonstrate the effectiveness and efficiency of the proposed algorithm and U-VNF-R scheme.
- Published
- 2024
50. Enhanced Krylov Methods for Molecular Hamiltonians: Reduced Memory Cost and Complexity Scaling via Tensor Hypercontraction
- Author
-
Wang, Yu, Luo, Maxine, and Mendl, Christian B.
- Subjects
Condensed Matter - Strongly Correlated Electrons ,Physics - Chemical Physics ,Physics - Computational Physics ,Quantum Physics - Abstract
We present a matrix product operator (MPO) construction based on the tensor hypercontraction (THC) format for ab initio molecular Hamiltonians. Such an MPO construction dramatically lowers the memory requirement and cost scaling of Krylov subspace methods. These can find low-lying eigenstates while avoiding local minima and simulate quantum time evolution with high accuracy. In our approach, the molecular Hamiltonian is represented as a sum of products of four MPOs, each with a bond dimension of only $2$. Iteratively applying the MPOs to the current quantum state in matrix product state (MPS) form, summing and re-compressing the MPS leads to a scheme with the same asymptotic memory cost as the bare MPS and reduces the computational cost scaling compared to the Krylov method based on a conventional MPO construction. We provide a detailed theoretical derivation of these statements and conduct supporting numerical experiments.
- Published
- 2024