70,727 results for "Zhang, Hao"
Search Results
2. Body size effect in Eothenomys miletus of different regions from Hengduan mountain regions: Roles of growth factor contents
- Author
-
Jia, Ting, Liao, Yuqiu, Zhang, Di, Zhang, Hao, Fan, Lixian, and Zhu, Wanlong
- Published
- 2024
3. Two hormones: Ghrelin and leptin, based on AMPK signaling pathway, play a role in body mass control of Eothenomys miletus during fasting in Kunming and Dali regions
- Author
-
Jia, Ting, Liu, Yu-Ting, Zhang, Hao, and Zhu, Wan-Long
- Published
- 2024
4. Iterative Proximal-Minimization for Computing Saddle Points with Fixed Index
- Author
-
Gu, Shuting, Zhang, Hao, Zhang, Xiaoqun, and Zhou, Xiang
- Subjects
Mathematics - Optimization and Control, Mathematics - Numerical Analysis
- Abstract
Computing saddle points with a prescribed Morse index on potential energy surfaces is crucial for characterizing transition states for noise-induced rare transition events in physics and chemistry. Many numerical algorithms for this type of saddle point are based on the eigenvector-following idea and can be cast as an iterative minimization formulation (SINUM. Vol. 53, p.1786, 2015), but they may struggle with convergence issues and require good initial guesses. To address this challenge, we discuss the differential game interpretation of this iterative minimization formulation and investigate the relationship between this game's Nash equilibrium and saddle points on the potential energy surface. Our main contribution is that adding a proximal term, which grows faster than quadratic, to the game's cost function can enhance the stability and robustness. This approach produces a robust Iterative Proximal Minimization (IPM) algorithm for saddle point computation. We show that the IPM algorithm surpasses the preceding methods in robustness without compromising the convergence rate or increasing computational expense. The algorithm's efficacy and robustness are showcased through a two-dimensional test problem and the Allen-Cahn and Cahn-Hilliard equations, underscoring its numerical robustness., Comment: arXiv admin note: text overlap with arXiv:2212.08256
- Published
- 2025
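A schematic of the proximal idea mentioned in the abstract above, for orientation only: one iterative-minimization step augmented with a proximal penalty that grows faster than quadratically. The surrogate cost f_k, the exponent p, and the exact step form are assumptions for illustration, not the paper's IPM formulation.

```latex
% One schematic iteration: minimize a local surrogate f_k of the energy plus a
% super-quadratic proximal penalty (p > 2), which damps large steps; the exact
% surrogate and penalty used by IPM may differ.
\[
  x_{k+1} = \arg\min_{x} \Big[ f_k(x) + \frac{1}{p}\, \| x - x_k \|^{p} \Big], \qquad p > 2 .
\]
```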
5. Knowledge-Informed Multi-Agent Trajectory Prediction at Signalized Intersections for Infrastructure-to-Everything
- Author
-
Yin, Huilin, Xu, Yangwenhui, Li, Jiaxiang, Zhang, Hao, and Rigoll, Gerhard
- Subjects
Computer Science - Robotics, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Multiagent Systems
- Abstract
Multi-agent trajectory prediction at signalized intersections is crucial for developing efficient intelligent transportation systems and safe autonomous driving systems. Due to the complexity of intersection scenarios and the limitations of single-vehicle perception, the performance of vehicle-centric prediction methods has reached a plateau. Furthermore, most works underutilize critical intersection information, including traffic signals, and behavior patterns induced by road structures. Therefore, we propose a multi-agent trajectory prediction framework at signalized intersections dedicated to Infrastructure-to-Everything (I2XTraj). Our framework leverages dynamic graph attention to integrate knowledge from traffic signals and driving behaviors. A continuous signal-informed mechanism is proposed to adaptively process real-time traffic signals from infrastructure devices. Additionally, leveraging the prior knowledge of the intersection topology, we propose a driving strategy awareness mechanism to model the joint distribution of goal intentions and maneuvers. To the best of our knowledge, I2XTraj represents the first multi-agent trajectory prediction framework explicitly designed for infrastructure deployment, supplying subscribable prediction services to all vehicles at intersections. I2XTraj demonstrates state-of-the-art performance on both the Vehicle-to-Infrastructure dataset V2X-Seq and the aerial-view dataset SinD for signalized intersections. Quantitative evaluations show that our approach outperforms existing methods by more than 30% in both multi-agent and single-agent scenarios.
- Published
- 2025
6. Predicting Compact Phrasal Rewrites with Large Language Models for ASR Post Editing
- Author
-
Zhang, Hao, Stahlberg, Felix, and Kumar, Shankar
- Subjects
Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
- Abstract
Large Language Models (LLMs) excel at rewriting tasks such as text style transfer and grammatical error correction. While there is considerable overlap between the inputs and outputs in these tasks, the decoding cost still increases with output length, regardless of the amount of overlap. By leveraging the overlap between the input and the output, Kaneko and Okazaki (2023) proposed model-agnostic edit span representations to compress the rewrites to save computation. They reported an output length reduction rate of nearly 80% with minimal accuracy impact in four rewriting tasks. In this paper, we propose alternative edit phrase representations inspired by phrase-based statistical machine translation. We systematically compare our phrasal representations with their span representations. We apply the LLM rewriting model to the task of Automatic Speech Recognition (ASR) post editing and show that our target-phrase-only edit representation has the best efficiency-accuracy trade-off. On the LibriSpeech test set, our method closes 50-60% of the WER gap between the edit span model and the full rewrite model while losing only 10-20% of the length reduction rate of the edit span model., Comment: accepted by ICASSP 2025
- Published
- 2025
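A minimal sketch of the kind of compressed rewrite described in the abstract above: the model emits short phrasal edits rather than the full corrected sentence. The (start, end, replacement) edit format and the helper function below are hypothetical illustrations, not the representation proposed in the paper.

```python
# Hypothetical illustration of a phrasal edit representation for ASR post-editing:
# instead of decoding the full corrected sentence, the model emits short edits,
# each pairing a source token span with a replacement phrase.

from typing import List, Tuple

# (start, end, replacement) -- replace source tokens [start:end) with the phrase.
Edit = Tuple[int, int, str]

def apply_phrasal_edits(source: str, edits: List[Edit]) -> str:
    """Apply span/phrase edits to a whitespace-tokenized ASR hypothesis."""
    tokens = source.split()
    # Apply right-to-left so earlier spans keep their original indices.
    for start, end, replacement in sorted(edits, key=lambda e: e[0], reverse=True):
        tokens[start:end] = replacement.split() if replacement else []
    return " ".join(tokens)

if __name__ == "__main__":
    hypothesis = "i want to go too the store on fifth avenue"
    # Only the short edited phrase is generated; the untouched tokens cost nothing to decode.
    edits = [(4, 5, "to")]
    print(apply_phrasal_edits(hypothesis, edits))  # i want to go to the store on fifth avenue
```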
7. Sunny.jl: A Julia Package for Spin Dynamics
- Author
-
Dahlbom, David, Zhang, Hao, Miles, Cole, Quinn, Sam, Niraula, Alin, Thipe, Bhushan, Wilson, Matthew, Matin, Sakib, Mankad, Het, Hahn, Steven, Pajerowski, Daniel, Johnston, Steve, Wang, Zhentao, Lane, Harry, Li, Ying Wai, Bai, Xiaojian, Mourigal, Martin, Batista, Cristian D., and Barros, Kipton
- Subjects
Quantum Physics, Condensed Matter - Strongly Correlated Electrons, Physics - Computational Physics
- Abstract
Sunny is a Julia package designed to serve the needs of the quantum magnetism community. It supports the specification of a very broad class of spin models and a diverse suite of numerical solvers. These include powerful methods for simulating spin dynamics both in and out of equilibrium. Uniquely, it features a broad generalization of classical and semiclassical approaches to SU(N) coherent states, which is useful for studying systems exhibiting strong spin-orbit coupling or local entanglement effects. Sunny also offers a well-developed framework for calculating the dynamical spin structure factor, enabling direct comparison with scattering experiments. Ease of use is a priority, with tools for symmetry-guided modeling and interactive visualization.
- Published
- 2025
8. Hunyuan3D 2.0: Scaling Diffusion Models for High Resolution Textured 3D Assets Generation
- Author
-
Zhao, Zibo, Lai, Zeqiang, Lin, Qingxiang, Zhao, Yunfei, Liu, Haolin, Yang, Shuhui, Feng, Yifei, Yang, Mingxin, Zhang, Sheng, Yang, Xianghui, Shi, Huiwen, Liu, Sicong, Wu, Junta, Lian, Yihang, Yang, Fan, Tang, Ruining, He, Zebin, Wang, Xinzhou, Liu, Jian, Zuo, Xuhui, Chen, Zhuo, Lei, Biwen, Weng, Haohan, Xu, Jing, Zhu, Yiling, Liu, Xinhai, Xu, Lixin, Hu, Changrong, Huang, Tianyu, Wang, Lifu, Zhang, Jihong, Chen, Meng, Dong, Liang, Jia, Yiwen, Cai, Yulin, Yu, Jiaao, Tang, Yixuan, Zhang, Hao, Ye, Zheng, He, Peng, Wu, Runzhou, Zhang, Chao, Tan, Yonghao, Xiao, Jie, Tao, Yangyu, Zhu, Jianchen, Xue, Jinbao, Liu, Kai, Zhao, Chongqing, Wu, Xinming, Hu, Zhichao, Qin, Lei, Peng, Jianbing, Li, Zhan, Chen, Minghui, Zhang, Xipeng, Niu, Lin, Wang, Paige, Wang, Yingkai, Kuang, Haozhao, Fan, Zhongyi, Zheng, Xu, Zhuang, Weihao, He, YingPing, Liu, Tian, Yang, Yong, Wang, Di, Liu, Yuhong, Jiang, Jie, Huang, Jingwei, and Guo, Chunchao
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
We present Hunyuan3D 2.0, an advanced large-scale 3D synthesis system for generating high-resolution textured 3D assets. This system includes two foundation components: a large-scale shape generation model -- Hunyuan3D-DiT, and a large-scale texture synthesis model -- Hunyuan3D-Paint. The shape generative model, built on a scalable flow-based diffusion transformer, aims to create geometry that properly aligns with a given condition image, laying a solid foundation for downstream applications. The texture synthesis model, benefiting from strong geometric and diffusion priors, produces high-resolution and vibrant texture maps for either generated or hand-crafted meshes. Furthermore, we build Hunyuan3D-Studio -- a versatile, user-friendly production platform that simplifies the re-creation process of 3D assets. It allows both professional and amateur users to manipulate or even animate their meshes efficiently. We systematically evaluate our models, showing that Hunyuan3D 2.0 outperforms previous state-of-the-art models, including open-source and closed-source models, in geometry details, condition alignment, and texture quality. Hunyuan3D 2.0 is publicly released in order to fill the gaps in the open-source 3D community for large-scale foundation generative models. The code and pre-trained weights of our models are available at: https://github.com/Tencent/Hunyuan3D-2, Comment: GitHub link: https://github.com/Tencent/Hunyuan3D-2
- Published
- 2025
9. Kimi k1.5: Scaling Reinforcement Learning with LLMs
- Author
-
Kimi Team, Du, Angang, Gao, Bofei, Xing, Bowei, Jiang, Changjiu, Chen, Cheng, Li, Cheng, Xiao, Chenjun, Du, Chenzhuang, Liao, Chonghua, Tang, Chuning, Wang, Congcong, Zhang, Dehao, Yuan, Enming, Lu, Enzhe, Tang, Fengxiang, Sung, Flood, Wei, Guangda, Lai, Guokun, Guo, Haiqing, Zhu, Han, Ding, Hao, Hu, Hao, Yang, Hao, Zhang, Hao, Yao, Haotian, Zhao, Haotian, Lu, Haoyu, Li, Haoze, Yu, Haozhen, Gao, Hongcheng, Zheng, Huabin, Yuan, Huan, Chen, Jia, Guo, Jianhang, Su, Jianlin, Wang, Jianzhou, Zhao, Jie, Zhang, Jin, Liu, Jingyuan, Yan, Junjie, Wu, Junyan, Shi, Lidong, Ye, Ling, Yu, Longhui, Dong, Mengnan, Zhang, Neo, Ma, Ningchen, Pan, Qiwei, Gong, Qucheng, Liu, Shaowei, Ma, Shengling, Wei, Shupeng, Cao, Sihan, Huang, Siying, Jiang, Tao, Gao, Weihao, Xiong, Weimin, He, Weiran, Huang, Weixiao, Wu, Wenhao, He, Wenyang, Wei, Xianghui, Jia, Xianqing, Wu, Xingzhe, Xu, Xinran, Zu, Xinxing, Zhou, Xinyu, Pan, Xuehai, Charles, Y., Li, Yang, Hu, Yangyang, Liu, Yangyang, Chen, Yanru, Wang, Yejie, Liu, Yibo, Qin, Yidao, Liu, Yifeng, Yang, Ying, Bao, Yiping, Du, Yulun, Wu, Yuxin, Wang, Yuzhi, Zhou, Zaida, Wang, Zhaoji, Li, Zhaowei, Zhu, Zhen, Zhang, Zheng, Wang, Zhexu, Yang, Zhilin, Huang, Zhiqi, Huang, Zihao, Xu, Ziyao, and Yang, Zonghan
- Subjects
Computer Science - Artificial Intelligence, Computer Science - Machine Learning
- Abstract
Language model pretraining with next token prediction has proved effective for scaling compute but is limited to the amount of available training data. Scaling reinforcement learning (RL) unlocks a new axis for the continued improvement of artificial intelligence, with the promise that large language models (LLMs) can scale their training data by learning to explore with rewards. However, prior published work has not produced competitive results. In light of this, we report on the training practice of Kimi k1.5, our latest multi-modal LLM trained with RL, including its RL training techniques, multi-modal data recipes, and infrastructure optimization. Long context scaling and improved policy optimization methods are key ingredients of our approach, which establishes a simplistic, effective RL framework without relying on more complex techniques such as Monte Carlo tree search, value functions, and process reward models. Notably, our system achieves state-of-the-art reasoning performance across multiple benchmarks and modalities -- e.g., 77.5 on AIME, 96.2 on MATH 500, 94-th percentile on Codeforces, 74.9 on MathVista -- matching OpenAI's o1. Moreover, we present effective long2short methods that use long-CoT techniques to improve short-CoT models, yielding state-of-the-art short-CoT reasoning results -- e.g., 60.8 on AIME, 94.6 on MATH500, 47.3 on LiveCodeBench -- outperforming existing short-CoT models such as GPT-4o and Claude Sonnet 3.5 by a large margin (up to +550%)., Comment: 25 pages
- Published
- 2025
10. Eagle 2: Building Post-Training Data Strategies from Scratch for Frontier Vision-Language Models
- Author
-
Li, Zhiqi, Chen, Guo, Liu, Shilong, Wang, Shihao, VS, Vibashan, Ji, Yishen, Lan, Shiyi, Zhang, Hao, Zhao, Yilin, Radhakrishnan, Subhashree, Chang, Nadine, Sapra, Karan, Deshmukh, Amala Sanjay, Rintamaki, Tuomas, Le, Matthieu, Karmanov, Ilia, Voegtle, Lukas, Fischer, Philipp, Huang, De-An, Roman, Timo, Lu, Tong, Alvarez, Jose M., Catanzaro, Bryan, Kautz, Jan, Tao, Andrew, Liu, Guilin, and Yu, Zhiding
- Subjects
Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
- Abstract
Recently, promising progress has been made by open-source vision-language models (VLMs) in bringing their capabilities closer to those of proprietary frontier models. However, most open-source models only publish their final model weights, leaving the critical details of data strategies and implementation largely opaque. In this work, we address VLM post-training from a data-centric perspective, showing the key role of data strategy in developing frontier VLMs. By studying and building our post-training data strategy from scratch, we share detailed insights into the development processes, aiming to benefit the development of competitive models for the open-source community. Our introduced data strategy, together with training recipes and model design, leads to a family of performant VLMs named Eagle2. Specifically, Eagle2-9B achieves state-of-the-art results across various multimodal benchmarks, matching certain competitive models with up to 70B parameters.
- Published
- 2025
11. OpenMLDB: A Real-Time Relational Data Feature Computation System for Online ML
- Author
-
Zhou, Xuanhe, Zhou, Wei, Qi, Liguo, Zhang, Hao, Chen, Dihao, He, Bingsheng, Lu, Mian, Li, Guoliang, Wu, Fan, and Chen, Yuqiang
- Subjects
Computer Science - Databases, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
- Abstract
Efficient and consistent feature computation is crucial for a wide range of online ML applications. Typically, feature computation is divided into two distinct phases, i.e., offline stage for model training and online stage for model serving. These phases often rely on execution engines with different interface languages and function implementations, causing significant inconsistencies. Moreover, many online ML features involve complex time-series computations (e.g., functions over varied-length table windows) that differ from standard streaming and analytical queries. Existing data processing systems (e.g., Spark, Flink, DuckDB) often incur multi-second latencies for these computations, making them unsuitable for real-time online ML applications that demand timely feature updates. This paper presents OpenMLDB, a feature computation system deployed in 4Paradigm's SageOne platform and over 100 real scenarios. Technically, OpenMLDB first employs a unified query plan generator for consistent computation results across the offline and online stages, significantly reducing feature deployment overhead. Second, OpenMLDB provides an online execution engine that resolves performance bottlenecks caused by long window computations (via pre-aggregation) and multi-table window unions (via data self-adjusting). It also provides a high-performance offline execution engine with window parallel optimization and time-aware data skew resolving. Third, OpenMLDB features a compact data format and stream-focused indexing to maximize memory usage and accelerate data access. Evaluations in testing and real workloads reveal significant performance improvements and resource savings compared to the baseline systems. The open community of OpenMLDB now has over 150 contributors and gained 1.6k stars on GitHub.
- Published
- 2025
12. Overcoming the surface paradox: Buried perovskite quantum dots in wide-bandgap perovskite thin films
- Author
-
Zhang, Hao, Pasha, Altaf, Metcalf, Isaac, Zhou, Jianlin, Staunstrup, Mathias, Zhu, Yunxuan, Liao, Shusen, Ssennyimba, Ken, Chen, Jia-Shiang, Reddy, Surya Prakash, Thébaud, Simon, Hou, Jin, Shuai, Xinting, Mandani, Faiz, Sidhik, Siraj, Jones, Matthew R., Ma, Xuedan, Balakrishna, R Geetha, Susarla, Sandhya, Ginger, David S., Katan, Claudine, Kanatzidis, Mercouri G., Bawendi, Moungi G., Natelson, Douglas, Tamarat, Philippe, Lounis, Brahim, Even, Jacky, and Mohite, Aditya D.
- Subjects
Physics - Optics, Condensed Matter - Materials Science, Physics - Medical Physics, Quantum Physics
- Abstract
Colloidal perovskite quantum dots (PQDs) are an exciting platform for on-demand quantum, and classical optoelectronic and photonic devices. However, their potential success is limited by the extreme sensitivity and low stability arising from their weak intrinsic lattice bond energy and complex surface chemistry. Here we report a novel platform of buried perovskite quantum dots (b-PQDs) in a three-dimensional perovskite thin-film, fabricated using one-step, flash annealing, which overcomes surface related instabilities in colloidal perovskite dots. The b-PQDs demonstrate ultrabright and stable single-dot emission, with resolution-limited linewidths below 130 {\mu}eV, photon-antibunching (g^2(0)=0.1), no blinking, suppressed spectral diffusion, and high photon count rates of 10^4/s, consistent with unity quantum yield. The ultrasharp linewidth resolves exciton fine-structures (dark and triplet excitons) and their dynamics under a magnetic field. Additionally, b-PQDs can be electrically driven to emit single photons with 1 meV linewidth and photon-antibunching (g^2(0)=0.4). These results pave the way for on-chip, low-cost single-photon sources for next generation quantum optical communication and sensing., Comment: 26 pages, 4 figures
- Published
- 2025
13. Anisotropy of PbTe nanowires with and without a superconductor
- Author
-
Li, Zonglin, Song, Wenyu, Zhang, Shan, Wang, Yuhao, Wang, Zhaoyu, Yu, Zehao, Li, Ruidong, Yan, Zeyu, Xu, Jiaye, Gao, Yichun, Yang, Shuai, Yang, Lining, Feng, Xiao, Wang, Tiantian, Zang, Yunyi, Li, Lin, Shang, Runan, Xue, Qi-Kun, He, Ke, and Zhang, Hao
- Subjects
Condensed Matter - Mesoscale and Nanoscale Physics
- Abstract
We investigate the anisotropic behaviors in PbTe and PbTe-Pb hybrid nanowires. In previous studies on PbTe, wire-to-wire variations in anisotropy indicate poor device control, posing a serious challenge for applications. Here, we achieve reproducible anisotropy in PbTe nanowires through a substantial reduction of disorder. We then couple PbTe to a superconductor Pb, and observe a pronounced deviation in the anisotropy behavior compared to bare PbTe nanowires. This deviation is gate-tunable and attributed to spin-orbit interaction and orbital effect, controlled by charge transfer between Pb and PbTe. These results provide a guidance for the controlled engineering of exotic quantum states in this hybrid material platform.
- Published
- 2025
14. Spatiotemporal Gaussian Optimization for 4D Cone Beam CT Reconstruction from Sparse Projections
- Author
-
Fu, Yabo, Zhang, Hao, Cai, Weixing, Xie, Huiqiao, Kuo, Licheng, Cervino, Laura, Moran, Jean, Li, Xiang, and Li, Tianfang
- Subjects
Physics - Medical Physics, Electrical Engineering and Systems Science - Image and Video Processing
- Abstract
In image-guided radiotherapy (IGRT), four-dimensional cone-beam computed tomography (4D-CBCT) is critical for assessing tumor motion during a patient's breathing cycle prior to beam delivery. However, generating 4D-CBCT images with sufficient quality requires significantly more projection images than a standard 3D-CBCT scan, leading to extended scanning times and increased imaging dose to the patient. To address these limitations, there is a strong demand for methods capable of reconstructing high-quality 4D-CBCT images from a 1-minute 3D-CBCT acquisition. The challenge lies in the sparse sampling of projections, which introduces severe streaking artifacts and compromises image quality. This paper introduces a novel framework leveraging spatiotemporal Gaussian representation for 4D-CBCT reconstruction from sparse projections, achieving a balance between streak artifact reduction, dynamic motion preservation, and fine detail restoration. Each Gaussian is characterized by its 3D position, covariance, rotation, and density. Two-dimensional X-ray projection images can be rendered from the Gaussian point cloud representation via X-ray rasterization. The properties of each Gaussian are optimized by minimizing the discrepancy between the measured projections and the rendered X-ray projections. A Gaussian deformation network is jointly optimized to deform these Gaussian properties to obtain a 4D Gaussian representation for dynamic CBCT scene modeling. The final 4D-CBCT images are reconstructed by voxelizing the 4D Gaussians, achieving a high-quality representation that preserves both motion dynamics and spatial detail. The code and reconstruction results can be found at https://github.com/fuyabo/4DGS_for_4DCBCT/tree/main, Comment: 11 pages, 10 figures
- Published
- 2025
15. Re-Visible Dual-Domain Self-Supervised Deep Unfolding Network for MRI Reconstruction
- Author
-
Zhang, Hao, Wang, Qi, Sun, Jian, Wen, Zhijie, Shi, Jun, and Ying, Shihui
- Subjects
Electrical Engineering and Systems Science - Image and Video Processing, Computer Science - Computer Vision and Pattern Recognition
- Abstract
Magnetic Resonance Imaging (MRI) is widely used in clinical practice, but suffers from prolonged acquisition times. Although deep learning methods have been proposed to accelerate acquisition and demonstrate promising performance, they rely on high-quality fully-sampled datasets for training in a supervised manner. However, such datasets are time-consuming and expensive to collect, which constrains their broader applications. On the other hand, self-supervised methods offer an alternative by enabling learning from under-sampled data alone, but most existing methods rely on further partitioned under-sampled k-space data as the model's input for training, resulting in a loss of valuable information. Additionally, their models have not fully incorporated image priors, leading to degraded reconstruction performance. In this paper, we propose a novel re-visible dual-domain self-supervised deep unfolding network to address these issues when only under-sampled datasets are available. Specifically, by incorporating re-visible dual-domain loss, all under-sampled k-space data are utilized during training to mitigate information loss caused by further partitioning. This design enables the model to implicitly adapt to all under-sampled k-space data as input. Additionally, we design a deep unfolding network based on the Chambolle and Pock Proximal Point Algorithm (DUN-CP-PPA) to achieve end-to-end reconstruction, incorporating imaging physics and image priors to guide the reconstruction process. By employing a Spatial-Frequency Feature Extraction (SFFE) block to capture global and local feature representations, we enhance the model's ability to learn comprehensive image priors. Experiments conducted on the fastMRI and IXI datasets demonstrate that our method significantly outperforms state-of-the-art approaches in terms of reconstruction performance.
- Published
- 2025
16. Atomic Higgsings of 6D SCFTs II: Induced Flows
- Author
-
Bao, Jiakang and Zhang, Hao Y.
- Subjects
High Energy Physics - Theory, Mathematical Physics, Mathematics - Group Theory
- Abstract
We study a specific type of atomic Higgsings of the 6d $\mathcal{N}=(1,0)$ theories, which we call the induced flows. For the conformal matter theory associated with a pair of nilpotent orbits, the induced flows are given by the inductions of the orbits. We also consider the induced flows for the orbi-instanton theories (as well as some little string theories) that are associated with the homomorphisms from the discrete subgroups of $\mathrm{SU}(2)$ to $E_8$. This gives a physical definition of the inductions among these discrete homomorphisms, analogous to the inductions of the nilpotent orbits. We analyze the Higgs branch dimensions, the monotonicity of the Weyl anomalies (or the 2-group structure constants for LSTs) and the brane pictures under the induced flows., Comment: Part II of a series of papers following 2409.17224, 40 pages and appendices
- Published
- 2025
17. DepthMaster: Taming Diffusion Models for Monocular Depth Estimation
- Author
-
Song, Ziyang, Wang, Zerong, Li, Bo, Zhang, Hao, Zhu, Ruijie, Liu, Li, Jiang, Peng-Tao, and Zhang, Tianzhu
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
Monocular depth estimation within the diffusion-denoising paradigm demonstrates impressive generalization ability but suffers from low inference speed. Recent methods adopt a single-step deterministic paradigm to improve inference efficiency while maintaining comparable performance. However, they overlook the gap between generative and discriminative features, leading to suboptimal results. In this work, we propose DepthMaster, a single-step diffusion model designed to adapt generative features for the discriminative depth estimation task. First, to mitigate overfitting to texture details introduced by generative features, we propose a Feature Alignment module, which incorporates high-quality semantic features to enhance the denoising network's representation capability. Second, to address the lack of fine-grained details in the single-step deterministic framework, we propose a Fourier Enhancement module to adaptively balance low-frequency structure and high-frequency details. We adopt a two-stage training strategy to fully leverage the potential of the two modules. In the first stage, we focus on learning the global scene structure with the Feature Alignment module, while in the second stage, we exploit the Fourier Enhancement module to improve the visual quality. Through these efforts, our model achieves state-of-the-art performance in terms of generalization and detail preservation, outperforming other diffusion-based methods across various datasets. Our project page can be found at https://indu1ge.github.io/DepthMaster_page., Comment: 11 pages, 6 figures, 6 tables
- Published
- 2025
18. On Computational Complexity of 3D Ising Spin Glass: Lessons from D-Wave Annealer
- Author
-
Zhang, Hao and Kamenev, Alex
- Subjects
Condensed Matter - Disordered Systems and Neural Networks, Quantum Physics
- Abstract
Finding an exact ground state of a 3D Ising spin glass is proven to be an NP-hard problem. Given the validity of the exponential time hypothesis, its computational complexity was proven to be no less than $2^{N^{2/3}}$, where $N$ is the total number of spins. Here we report results of extensive experimentation with a D-Wave 3D annealer with $N\leq 5627$. We found exact ground states (in a probabilistic sense) for typical realizations of 3D spin glasses with an efficiency that scales as $2^{N/\beta}$ with $\beta\approx 10^{3}$. Based on statistical analysis of low energy states, we argue that with an improvement of annealing protocols and device noise reduction, $\beta$ can be increased even further. This suggests that, for $N<\beta^3$, annealing devices provide a most efficient way to find the ground state., Comment: 9 pages, 6 figures
- Published
- 2025
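A quick arithmetic check of the two scalings quoted in the abstract above, comparing the exponent of the worst-case bound 2^(N^(2/3)) with the observed 2^(N/beta) at the largest reported problem size; the values of N and beta are taken directly from the abstract.

```python
# Compare the exponents of 2 in the two scalings quoted in the abstract:
# lower bound ~ 2^(N^(2/3)) versus observed annealer efficiency ~ 2^(N/beta).
N = 5627          # largest problem size reported in the abstract
beta = 1e3        # empirically fitted scale, beta ~ 10^3

lower_bound_exponent = N ** (2.0 / 3.0)   # ~ 316
observed_exponent = N / beta              # ~ 5.6

print(f"2^{lower_bound_exponent:.1f} (worst-case bound) vs 2^{observed_exponent:.1f} (observed scaling)")
```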
19. Loss-Aware Curriculum Learning for Chinese Grammatical Error Correction
- Author
-
Zhang, Ding, Li, Yangning, Bai, Lichen, Zhang, Hao, Li, Yinghui, Lin, Haiye, Zheng, Hai-Tao, Su, Xin, and Shan, Zifei
- Subjects
Computer Science - Computation and Language, Computer Science - Artificial Intelligence
- Abstract
Chinese grammatical error correction (CGEC) aims to detect and correct errors in the input Chinese sentences. Recently, Pre-trained Language Models (PLMs) have been employed to improve the performance. However, current approaches ignore that correction difficulty varies across different instances and treat these samples equally, which increases the difficulty of model learning. To address this problem, we propose a multi-granularity Curriculum Learning (CL) framework. Specifically, we first calculate the correction difficulty of these samples and feed them into the model from easy to hard batch by batch. Then Instance-Level CL is employed to help the model optimize in the appropriate direction automatically by regulating the loss function. Extensive experimental results and comprehensive analyses of various datasets prove the effectiveness of our method., Comment: ICASSP 2025
- Published
- 2024
20. Efficiently Serving LLM Reasoning Programs with Certaindex
- Author
-
Fu, Yichao, Chen, Junda, Zhu, Siqi, Fu, Zheyu, Dai, Zhongdongming, Qiao, Aurick, and Zhang, Hao
- Subjects
Computer Science - Machine Learning, Computer Science - Computation and Language
- Abstract
The rapid evolution of large language models (LLMs) has unlocked their capabilities in advanced reasoning tasks like mathematical problem-solving, code generation, and legal analysis. Central to this progress are inference-time reasoning algorithms, which refine outputs by exploring multiple solution paths, at the cost of increasing compute demands and response latencies. Existing serving systems fail to adapt to the scaling behaviors of these algorithms or the varying difficulty of queries, leading to inefficient resource use and unmet latency targets. We present Dynasor, a system that optimizes inference-time compute for LLM reasoning queries. Unlike traditional engines, Dynasor tracks and schedules requests within reasoning queries and uses Certaindex, a proxy that measures statistical reasoning progress based on model certainty, to guide compute allocation dynamically. Dynasor co-adapts scheduling with reasoning progress: it allocates more compute to hard queries, reduces compute for simpler ones, and terminates unpromising queries early, balancing accuracy, latency, and cost. On diverse datasets and algorithms, Dynasor reduces compute by up to 50% in batch processing and sustains 3.3x higher query rates or 4.7x tighter latency SLOs in online serving.
- Published
- 2024
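A heavily simplified sketch of certainty-guided compute allocation in the spirit of the Certaindex idea described above; the step generator, certainty estimator, and thresholds are placeholders, not Dynasor's actual interface or scheduling policy.

```python
# Illustrative sketch only: allocate reasoning steps to a query until a certainty
# proxy says the answer is settled (stop early) or the query looks unpromising
# (terminate). The estimator and thresholds are placeholders.

from typing import Callable, List

def solve_with_budget(
    generate_step: Callable[[List[str]], str],          # produces the next reasoning step
    estimate_certainty: Callable[[List[str]], float],   # proxy in [0, 1] for reasoning progress
    max_steps: int = 64,
    stop_threshold: float = 0.9,     # confident enough: stop early and save compute
    giveup_threshold: float = 0.05,  # persistently uncertain: terminate the query
) -> List[str]:
    steps: List[str] = []
    for _ in range(max_steps):
        steps.append(generate_step(steps))
        certainty = estimate_certainty(steps)
        if certainty >= stop_threshold or certainty <= giveup_threshold:
            break  # the remaining budget can be reallocated to harder queries
    return steps
```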
21. Exploiting Hybrid Policy in Reinforcement Learning for Interpretable Temporal Logic Manipulation
- Author
-
Zhang, Hao, Wang, Hao, Huang, Xiucai, Chen, Wenrui, and Kan, Zhen
- Subjects
Computer Science - Robotics, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
- Abstract
Reinforcement Learning (RL) based methods have been increasingly explored for robot learning. However, RL-based methods often suffer from low sampling efficiency in the exploration phase, especially for long-horizon manipulation tasks, and generally neglect the semantic information from the task level, resulting in delayed convergence or even task failure. To tackle these challenges, we propose a Temporal-Logic-guided Hybrid policy framework (HyTL) which leverages three-level decision layers to improve the agent's performance. Specifically, the task specifications are encoded via linear temporal logic (LTL) to improve performance and offer interpretability. A waypoint planning module is designed with feedback from the LTL-encoded task level as a high-level policy to improve the exploration efficiency. The middle-level policy selects which behavior primitives to execute, and the low-level policy specifies the corresponding parameters to interact with the environment. We evaluate HyTL on four challenging manipulation tasks, which demonstrate its effectiveness and interpretability. Our project is available at: https://sites.google.com/view/hytl-0257/., Comment: Accepted by IROS 2024. Code:https://github.com/Charlie0257/HyTL
- Published
- 2024
22. LoGFiLM: Fine-Tuning A Large Language Model for Automated Generation of Log Statements
- Author
-
Zhang, Hao, Yu, Dongjun, Zhang, Lei, Rong, Guoping, Yu, Yongda, Shen, Haifeng, Zhang, He, Shao, Dong, and Kuang, Hongyu
- Subjects
Computer Science - Software Engineering
- Abstract
Log statements have become an integral part of modern software systems. Prior research efforts have focused on supporting the decisions of placing log statements, such as where/what to log, while automated generation or completion of log statements has received little attention. With the increasing use of Large Language Models (LLMs) for code-related tasks such as code completion or generation, automated methods for generating or completing log statements have gained much momentum. Fine-tuning open-source LLMs like the Llama series is often preferred by enterprises over using commercial ones like the GPT series due to considerations including privacy, security, openness, performance, etc. Fine-tuning LLMs requires task-specific training data and custom-designed processing algorithms, which, however, have not been thoroughly explored for the log statement generation task. This paper fills this gap by contributing such a fine-tuning method LoGFiLM and an exemplar model by using the proposed method to fine-tune Llama-3-8B. Experiments with our own curated dataset and a public dataset show that LoGFiLM consistently outperforms the original Llama-3-8B and the commercial LLMs of GPT-3.5 and GPT-4. The results further reveal that fine-tuning Llama-3-8B with data encompassing broader contextual ranges surrounding log statements yields a better model for the automated generation of log statements.
- Published
- 2024
23. Molar: Multimodal LLMs with Collaborative Filtering Alignment for Enhanced Sequential Recommendation
- Author
-
Luo, Yucong, Qin, Qitao, Zhang, Hao, Cheng, Mingyue, Yan, Ruiran, Wang, Kefan, and Ouyang, Jie
- Subjects
Computer Science - Information Retrieval, Computer Science - Artificial Intelligence
- Abstract
Sequential recommendation (SR) systems have evolved significantly over the past decade, transitioning from traditional collaborative filtering to deep learning approaches and, more recently, to large language models (LLMs). While the adoption of LLMs has driven substantial advancements, these models inherently lack collaborative filtering information, relying primarily on textual content data while neglecting other modalities, and thus failing to achieve optimal recommendation performance. To address this limitation, we propose Molar, a Multimodal large language sequential recommendation framework that integrates multiple content modalities with ID information to capture collaborative signals effectively. Molar employs an MLLM to generate unified item representations from both textual and non-textual data, facilitating comprehensive multimodal modeling and enriching item embeddings. Additionally, it incorporates collaborative filtering signals through a post-alignment mechanism, which aligns user representations from content-based and ID-based models, ensuring precise personalization and robust performance. By seamlessly combining multimodal content with collaborative filtering insights, Molar captures both user interests and contextual semantics, leading to superior recommendation accuracy. Extensive experiments validate that Molar significantly outperforms traditional and LLM-based baselines, highlighting its strength in utilizing multimodal data and collaborative signals for sequential recommendation tasks. The source code is available at https://anonymous.4open.science/r/Molar-8B06/.
- Published
- 2024
24. Fast Causal Discovery by Approximate Kernel-based Generalized Score Functions with Linear Computational Complexity
- Author
-
Ren, Yixin, Zhang, Haocheng, Xia, Yewei, Zhang, Hao, Guan, Jihong, and Zhou, Shuigeng
- Subjects
Computer Science - Machine Learning, Statistics - Machine Learning
- Abstract
Score-based causal discovery methods can effectively identify causal relationships by evaluating candidate graphs and selecting the one with the highest score. One popular class of scores is kernel-based generalized score functions, which can adapt to a wide range of scenarios and work well in practice because they circumvent assumptions about causal mechanisms and data distributions. Despite these advantages, kernel-based generalized score functions pose serious computational challenges in time and space, with a time complexity of $\mathcal{O}(n^3)$ and a memory complexity of $\mathcal{O}(n^2)$, where $n$ is the sample size. In this paper, we propose an approximate kernel-based generalized score function with $\mathcal{O}(n)$ time and space complexities by using a low-rank technique and designing a set of rules to handle the complex composite matrix operations required to calculate the score, as well as developing sampling algorithms for different data types so that diverse data types are handled efficiently. Our extensive causal discovery experiments on both synthetic and real-world data demonstrate that compared to the state-of-the-art method, our method can not only significantly reduce computational costs, but also achieve comparable accuracy, especially for large datasets.
- Published
- 2024
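For intuition, the sketch below shows a generic Nystrom-style low-rank kernel factorization, a standard way an n-by-n kernel matrix can be avoided; it illustrates the general low-rank idea only and is not the paper's score function, rule set, or sampling algorithm.

```python
# Generic Nystrom-style low-rank kernel approximation, shown only to illustrate how
# an n x n kernel matrix can be avoided; this is not the paper's method.
import numpy as np

def rbf(a: np.ndarray, b: np.ndarray, gamma: float = 1.0) -> np.ndarray:
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def nystrom_factor(X: np.ndarray, m: int = 50, seed: int = 0) -> np.ndarray:
    """Return F with F @ F.T ~ K, using only O(n * m) kernel evaluations."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=min(m, len(X)), replace=False)
    C = rbf(X, X[idx])                 # n x m block of the kernel
    W = rbf(X[idx], X[idx])            # m x m block on the landmark points
    # Symmetric inverse square root of W via its eigendecomposition.
    vals, vecs = np.linalg.eigh(W + 1e-8 * np.eye(len(idx)))
    W_inv_sqrt = vecs @ np.diag(1.0 / np.sqrt(np.maximum(vals, 1e-12))) @ vecs.T
    return C @ W_inv_sqrt              # n x m factor; K itself is never materialized

if __name__ == "__main__":
    X = np.random.default_rng(1).normal(size=(1000, 5))
    F = nystrom_factor(X, m=100)
    print(F.shape)  # (1000, 100): downstream score terms can work with F instead of K
```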
25. SyNeg: LLM-Driven Synthetic Hard-Negatives for Dense Retrieval
- Author
-
Li, Xiaopeng, Li, Xiangyang, Zhang, Hao, Du, Zhaocheng, Jia, Pengyue, Wang, Yichao, Zhao, Xiangyu, Guo, Huifeng, and Tang, Ruiming
- Subjects
Computer Science - Information Retrieval
- Abstract
The performance of dense retrieval (DR) is significantly influenced by the quality of negative sampling. Traditional DR methods primarily depend on naive negative sampling techniques or on mining hard negatives through an external retriever and meticulously crafted strategies. However, naive negative sampling often fails to adequately capture the accurate boundaries between positive and negative samples, whereas existing hard negative sampling methods are prone to false negatives, resulting in performance degradation and training instability. Recent advancements in large language models (LLMs) offer an innovative solution to these challenges by generating contextually rich and diverse negative samples. In this work, we present a framework that harnesses LLMs to synthesize high-quality hard negative samples. We first devise a \textit{multi-attribute self-reflection prompting strategy} to direct LLMs in hard negative sample generation. Then, we implement a \textit{hybrid sampling strategy} that integrates these synthetic negatives with traditionally retrieved negatives, thereby stabilizing the training process and improving retrieval performance. Extensive experiments on five benchmark datasets demonstrate the efficacy of our approach, and code is also publicly available.
- Published
- 2024
26. Unit roots of the unit root $L$-functions
- Author
-
Yang, Liping and Zhang, Hao
- Subjects
Mathematics - Number Theory, 11T23, 11S40
- Abstract
Adolphson and Sperber characterized the unique unit root of the $L$-function associated with toric exponential sums in terms of the $\mathcal{A}$-hypergeometric functions. For the unit root $L$-function associated with a family of toric exponential sums, Haessig and Sperber conjectured that its unit root behaves similarly to the classical case studied by Adolphson and Sperber. Under the assumption of a lower deformation hypothesis, Haessig and Sperber proved this conjecture. In this paper, we demonstrate that Haessig and Sperber's conjecture holds in general., Comment: 18 pages
- Published
- 2024
27. MotionBridge: Dynamic Video Inbetweening with Flexible Controls
- Author
-
Tanveer, Maham, Zhou, Yang, Niklaus, Simon, Amiri, Ali Mahdavi, Zhang, Hao, Singh, Krishna Kumar, and Zhao, Nanxuan
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
By generating plausible and smooth transitions between two image frames, video inbetweening is an essential tool for video editing and long video synthesis. Traditional works lack the capability to generate complex large motions. While recent video generation techniques are powerful in creating high-quality results, they often lack fine control over the details of intermediate frames, which can lead to results that do not align with the creative mind. We introduce MotionBridge, a unified video inbetweening framework that allows flexible controls, including trajectory strokes, keyframes, masks, guide pixels, and text. However, learning such multi-modal controls in a unified framework is a challenging task. We thus design two generators to extract the control signal faithfully and encode features through dual-branch embedders to resolve ambiguities. We further introduce a curriculum training strategy to smoothly learn various controls. Extensive qualitative and quantitative experiments have demonstrated that such multi-modal controls enable a more dynamic, customizable, and contextually accurate visual narrative., Comment: Project website: [https://motionbridge.github.io/]
- Published
- 2024
28. Advancing Comprehensive Aesthetic Insight with Multi-Scale Text-Guided Self-Supervised Learning
- Author
-
Liu, Yuti, Liu, Shice, Gao, Junyuan, Jiang, Pengtao, Zhang, Hao, Chen, Jinwei, and Li, Bo
- Subjects
Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
- Abstract
Image Aesthetic Assessment (IAA) is a vital and intricate task that entails analyzing and assessing an image's aesthetic values, and identifying its highlights and areas for improvement. Traditional methods of IAA often concentrate on a single aesthetic task and suffer from inadequate labeled datasets, thus impairing in-depth aesthetic comprehension. Despite efforts to overcome this challenge through the application of Multi-modal Large Language Models (MLLMs), such models remain underdeveloped for IAA purposes. To address this, we propose a comprehensive aesthetic MLLM capable of nuanced aesthetic insight. Central to our approach is an innovative multi-scale text-guided self-supervised learning technique. This technique features a multi-scale feature alignment module and capitalizes on a wealth of unlabeled data in a self-supervised manner to structurally and functionally enhance aesthetic ability. The empirical evidence indicates that accompanied with extensive instruct-tuning, our model sets new state-of-the-art benchmarks across multiple tasks, including aesthetic scoring, aesthetic commenting, and personalized image aesthetic assessment. Remarkably, it also demonstrates zero-shot learning capabilities in the emerging task of aesthetic suggesting. Furthermore, for personalized image aesthetic assessment, we harness the potential of in-context learning and showcase its inherent advantages., Comment: Accepted by AAAI 2025
- Published
- 2024
29. TrimLLM: Progressive Layer Dropping for Domain-Specific LLMs
- Author
-
Hu, Lanxiang, Rosing, Tajana, and Zhang, Hao
- Subjects
Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computation and Language
- Abstract
Specializing large language models (LLMs) for local deployment in domain-specific use cases is necessary for strong performance while meeting latency and privacy constraints. However, conventional task-specific adaptation approaches do not show simultaneous memory saving and inference speedup at deployment time. Practical compression techniques like quantization and pruning require dedicated hardware or kernel support to achieve measured inference speedup. We develop TrimLLM based on the layer-wise specialization phenomenon we empirically observed and verified on contemporary LLMs. TrimLLM reduces the depth of LLMs via progressive layer dropping. We show it retains LLMs' capacity in specific domains and achieves inference speedup irrespective of hardware and deep learning frameworks. We evaluated TrimLLM on LLMs of various sizes for inference; models adapted on medical, legal, and financial datasets all demonstrate $2.1-5.7\times$ inference speedup on consumer GPUs and up to $3.1\times$ speedup on A100 when compared to state-of-the-art model compression algorithms, with no loss in accuracy at 50$\sim$60\% model compression ratio.
- Published
- 2024
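A rough sketch of progressive layer dropping as a generic procedure, loosely following the abstract above; the evaluation function, acceptance tolerance, and stopping rule are placeholders rather than TrimLLM's actual calibration method.

```python
# Schematic of progressively dropping transformer layers while a domain score holds up.
# The evaluate() callback and tolerance are placeholders; not TrimLLM's procedure.

from typing import Callable, List, Sequence

def progressively_drop_layers(
    layers: Sequence[str],                         # e.g. ["block_0", ..., "block_31"]
    evaluate: Callable[[List[str]], float],        # domain score of a model built from these layers
    target_keep_ratio: float = 0.5,
    tolerance: float = 0.01,                       # max allowed score drop per removal
) -> List[str]:
    kept = list(layers)
    baseline = evaluate(kept)
    while len(kept) > int(target_keep_ratio * len(layers)):
        # Try removing the single layer whose absence hurts the domain score least.
        candidates = [(evaluate(kept[:i] + kept[i + 1:]), i) for i in range(len(kept))]
        best_score, best_i = max(candidates)
        if baseline - best_score > tolerance:
            break  # every remaining layer matters; stop shrinking
        kept.pop(best_i)
        baseline = best_score
    return kept
```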
30. B2 1308+326: a changing-look blazar or not?
- Author
-
Pandey, Ashwani, Hu, Chen, Wang, Jian-Min, Czerny, Bozena, Chen, Yong-Jie, Songsheng, Yu-Yang, Wang, Yi-Lin, Zhang, Hao, and Aceituno, Jesus
- Subjects
Astrophysics - High Energy Astrophysical Phenomena, Astrophysics - Astrophysics of Galaxies
- Abstract
In our previous study, we identified a shift in the synchrotron peak frequency of the blazar B2 1308$+$326 from 10$^{12.9}$ Hz to 10$^{14.8}$ Hz during a flare, suggesting it could be a changing-look blazar (CLB). In this work, we investigate the CL behaviour of B2 1308+326 by analysing a newly acquired optical spectrum and comparing it with an archival spectrum. We find that between the two epochs, the continuum flux increased by a factor of $\sim$4.4, while the Mg II emission line flux decreased by a factor of 1.4$\pm$0.2. Additionally, the equivalent width of the Mg II line reduced from $\sim 20$ \AA \ to $\sim 3$ \AA, indicating an apparent shift from a flat-spectrum radio quasar (FSRQ) class to a BL Lacertae (BL Lac) class. Despite this apparent change, the ratio of accretion disk luminosity to Eddington luminosity remains $>$ 10$^{-2}$ during both epochs, indicating efficient accretion persists in B2 1308$+$326. The measured black hole mass remains consistent with an average $\log M_{\rm BH} = 8.44$ M$_{\odot}$. Our findings suggest that B2 1308$+$326 is not a genuine CLB, but rather an intrinsic FSRQ that emerges as a BL Lac during high-flux states due to enhanced non-thermal emission., Comment: 8 pages, 4 figures, 2 tables, accepted for publication in The Astrophysical Journal
- Published
- 2024
31. Local forms for the double $A_n$ quiver
- Author
-
Zhang, Hao
- Subjects
Mathematics - Algebraic Geometry, Mathematics - Representation Theory
- Abstract
This paper studies the noncommutative singularity theory of the double $A_n$ quiver $Q_n$ (with a single loop at each vertex), with applications to algebraic geometry and representation theory. We give various intrinsic definitions of a Type A potential on $Q_n$, then via coordinate changes we (1) prove a monomialization result that expresses these potentials in a particularly nice form, (2) prove that Type A potentials precisely correspond to crepant resolutions of cAn singularities, (3) solve the Realisation Conjecture of Brown-Wemyss in this setting. For $n \leq 3$, we furthermore give a full classification of Type A potentials (without loops) up to isomorphism, and those with finite-dimensional Jacobi algebras up to derived equivalence. There are various algebraic corollaries, including to certain tame algebras of quaternion type due to Erdmann, where we describe all basic algebras in the derived equivalence class., Comment: 50 pages, minor typos corrected
- Published
- 2024
32. Hesitation and Tolerance in Recommender Systems
- Author
-
Zou, Kuan, Sun, Aixin, Jiang, Xuemeng, Ji, Yitong, Zhang, Hao, Wang, Jing, and Guo, Ruijie
- Subjects
Computer Science - Information Retrieval
- Abstract
User interactions in recommender systems are inherently complex, often involving behaviors that go beyond simple acceptance or rejection. One particularly common behavior is hesitation, where users deliberate over recommended items, signaling uncertainty. Our large-scale surveys, with 6,644 and 3,864 responses respectively, confirm that hesitation is not only widespread but also has a profound impact on user experiences. When users spend additional time engaging with content they are ultimately uninterested in, this can lead to negative emotions, a phenomenon we term as tolerance. The surveys reveal that such tolerance behaviors often arise after hesitation and can erode trust, satisfaction, and long-term loyalty to the platform. For instance, a click might reflect a need for more information rather than genuine interest, and prolonged exposure to unsuitable content amplifies frustration. This misalignment between user intent and system interpretation introduces noise into recommendation training, resulting in suggestions that increase uncertainty and disengagement. To address these issues, we identified signals indicative of tolerance behavior and analyzed datasets from both e-commerce and short-video platforms. The analysis shows a strong correlation between increased tolerance behavior and decreased user activity. We integrated these insights into the training process of a recommender system for a major short-video platform. Results from four independent online A/B experiments demonstrated significant improvements in user retention, achieved with minimal additional computational costs. These findings underscore the importance of recognizing hesitation as a ubiquitous user behavior and addressing tolerance to enhance satisfaction, build trust, and sustain long-term engagement in recommender systems., Comment: 30 pages, 6 figures, 6 tables
- Published
- 2024
33. Bilevel Learning for Dual-Quadruped Collaborative Transportation under Kinematic and Anisotropic Velocity Constraints
- Author
-
Jose, Williard Joshua and Zhang, Hao
- Subjects
Computer Science - Robotics
- Abstract
Multi-robot collaborative transportation is a critical capability that has attracted significant attention over recent years. To reliably transport a kinematically constrained payload, a team of robots must closely collaborate and coordinate their individual velocities to achieve the desired payload motion. For quadruped robots, a key challenge is caused by their anisotropic velocity limits, where forward and backward movement is faster and more stable than lateral motion. In order to enable dual-quadruped collaborative transportation and address the above challenges, we propose a novel Bilevel Learning for Collaborative Transportation (BLCT) approach. In the upper-level, BLCT learns a team collaboration policy for the two quadruped robots to move the payload to the goal position, while accounting for the kinematic constraints imposed by their connection to the payload. In the lower-level, BLCT optimizes velocity controls of each individual robot to closely follow the collaboration policy while satisfying the anisotropic velocity constraints and avoiding obstacles. Experiments demonstrate that our BLCT approach well enables collaborative transportation in challenging scenarios and outperforms baseline approaches., Comment: 8 pages, 5 figures, project website: https://hcrlab.gitlab.io/project/blct
- Published
- 2024
34. Boundary anomaly detection in two-dimensional subsystem symmetry-protected topological phases
- Author
-
Ding, Ke, Zhang, Hao-Ran, Liu, Bai-Ting, and Yang, Shuo
- Subjects
Condensed Matter - Strongly Correlated Electrons, Quantum Physics
- Abstract
We develop a method to detect quantum anomalies in systems with subsystem symmetry, building on the concept of anomaly indicators. This approach allows us to distinguish different subsystem symmetry-protected topological (SSPT) phases and uncover new ones. Using numerical simulations, we demonstrate the power of this method by identifying strong and weak $Z_2^\tau\times Z_2^\sigma$ SSPT phases in a tunable tensor network state. Our analysis reveals an intrinsic $Z_2$ SSPT phase characterized by its degenerate entanglement spectrum. Furthermore, we extend the anomaly indicator to mixed-state density matrices and show that quantum anomalies of subsystem symmetry can persist under both uniform and alternating disorders. This finding establishes a connection between boundary quantum anomalies in pure and mixed states. Our work provides a comprehensive framework for detecting and constructing topological quantum phases protected by subsystem symmetries, offering new insights into these exotic quantum phases., Comment: 25 pages, 13 figures
- Published
- 2024
35. GameArena: Evaluating LLM Reasoning through Live Computer Games
- Author
-
Hu, Lanxiang, Li, Qiyu, Xie, Anze, Jiang, Nan, Stoica, Ion, Jin, Haojian, and Zhang, Hao
- Subjects
Computer Science - Artificial Intelligence, Computer Science - Computation and Language
- Abstract
Evaluating the reasoning abilities of large language models (LLMs) is challenging. Existing benchmarks often depend on static datasets, which are vulnerable to data contamination and may get saturated over time, or on binary live human feedback that conflates reasoning with other abilities. As the most prominent dynamic benchmark, Chatbot Arena evaluates open-ended questions in real-world settings, but lacks the granularity in assessing specific reasoning capabilities. We introduce GameArena, a dynamic benchmark designed to evaluate LLM reasoning capabilities through interactive gameplay with humans. GameArena consists of three games designed to test specific reasoning capabilities (e.g., deductive and inductive reasoning), while keeping participants entertained and engaged. We analyze the gaming data retrospectively to uncover the underlying reasoning processes of LLMs and measure their fine-grained reasoning capabilities. We collect over 2000 game sessions and provide detailed assessments of various reasoning capabilities for five state-of-the-art LLMs. Our user study with 100 participants suggests that GameArena improves user engagement compared to Chatbot Arena. For the first time, GameArena enables the collection of step-by-step LLM reasoning data in the wild.
- Published
- 2024
36. FreeSim: Toward Free-viewpoint Camera Simulation in Driving Scenes
- Author
-
Fan, Lue, Zhang, Hao, Wang, Qitai, Li, Hongsheng, and Zhang, Zhaoxiang
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
We propose FreeSim, a camera simulation method for autonomous driving. FreeSim emphasizes high-quality rendering from viewpoints beyond the recorded ego trajectories. In such viewpoints, previous methods have unacceptable degradation because the training data of these viewpoints is unavailable. To address such data scarcity, we first propose a generative enhancement model with a matched data construction strategy. The resulting model can generate high-quality images in a viewpoint slightly deviated from the recorded trajectories, conditioned on the degraded rendering of this viewpoint. We then propose a progressive reconstruction strategy, which progressively adds generated images of unrecorded views into the reconstruction process, starting from slightly off-trajectory viewpoints and moving progressively farther away. With this progressive generation-reconstruction pipeline, FreeSim supports high-quality off-trajectory view synthesis under large deviations of more than 3 meters., Comment: Project page: https://drive-sim.github.io/freesim
- Published
- 2024
37. Large Language Models show both individual and collective creativity comparable to humans
- Author
-
Sun, Luning, Yuan, Yuzhuo, Yao, Yuan, Li, Yanyan, Zhang, Hao, Xie, Xing, Wang, Xiting, Luo, Fang, and Stillwell, David
- Subjects
Computer Science - Artificial Intelligence - Abstract
Artificial intelligence has, so far, largely automated routine tasks, but what does it mean for the future of work if Large Language Models (LLMs) show creativity comparable to humans? To measure the creativity of LLMs holistically, the current study uses 13 creative tasks spanning three domains. We benchmark the LLMs against individual humans, and also take a novel approach by comparing them to the collective creativity of groups of humans. We find that the best LLMs (Claude and GPT-4) rank in the 52nd percentile against humans, and overall LLMs excel in divergent thinking and problem solving but lag in creative writing. When questioned 10 times, an LLM's collective creativity is equivalent to 8-10 humans. When more responses are requested, two additional responses of LLMs equal one extra human. Ultimately, LLMs, when optimally applied, may compete with a small group of humans in the future of work.
- Published
- 2024
38. CRAYM: Neural Field Optimization via Camera RAY Matching
- Author
-
Lin, Liqiang, Wu, Wenpeng, Fu, Chi-Wing, Zhang, Hao, and Huang, Hui
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Graphics - Abstract
We introduce camera ray matching (CRAYM) into the joint optimization of camera poses and neural fields from multi-view images. The optimized field, referred to as a feature volume, can be "probed" by the camera rays for novel view synthesis (NVS) and 3D geometry reconstruction. One key reason for matching camera rays, instead of pixels as in prior works, is that the camera rays can be parameterized by the feature volume to carry both geometric and photometric information. Multi-view consistencies involving the camera rays and scene rendering can be naturally integrated into the joint optimization and network training, to impose physically meaningful constraints to improve the final quality of both the geometric reconstruction and photorealistic rendering. We formulate our per-ray optimization and matched ray coherence by focusing on camera rays passing through keypoints in the input images to elevate both the efficiency and accuracy of scene correspondences. Accumulated ray features along the feature volume provide a means to discount the coherence constraint amid erroneous ray matching. We demonstrate the effectiveness of CRAYM for both NVS and geometry reconstruction, over dense- or sparse-view settings, with qualitative and quantitative comparisons to state-of-the-art alternatives., Comment: Published in NeurIPS 2024
- Published
- 2024
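The core idea of parameterizing camera rays through image keypoints, described in the CRAYM abstract above, can be illustrated with the short PyTorch sketch below: back-project keypoint pixels into world-space rays and penalize feature disagreement between matched rays. The tensor shapes and the weighted coherence term are assumptions for illustration, not the CRAYM code.

import torch

def rays_through_keypoints(K_inv, cam_to_world, keypoints_px):
    """Back-project (N, 2) keypoint pixels into world-space ray origins and directions."""
    ones = torch.ones(keypoints_px.shape[0], 1)
    pix_h = torch.cat([keypoints_px, ones], dim=1)        # homogeneous pixel coordinates (N, 3)
    dirs_cam = (K_inv @ pix_h.T).T                        # ray directions in camera space
    dirs_world = (cam_to_world[:3, :3] @ dirs_cam.T).T    # rotate into world space
    dirs_world = dirs_world / dirs_world.norm(dim=1, keepdim=True)
    origins = cam_to_world[:3, 3].expand_as(dirs_world)   # camera center repeated per ray
    return origins, dirs_world

def matched_ray_coherence(feat_a, feat_b, weights):
    """Weighted discrepancy between accumulated features of matched rays;
    low weights discount erroneous matches."""
    return (weights * (feat_a - feat_b).pow(2).sum(dim=-1)).mean()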
39. Learning Adaptive Lighting via Channel-Aware Guidance
- Author
-
Yang, Qirui, Jiang, Peng-Tao, Zhang, Hao, Chen, Jinwei, Li, Bo, Yue, Huanjing, and Yang, Jingyu
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Electrical Engineering and Systems Science - Image and Video Processing - Abstract
Learning lighting adaptation is a key step in obtaining good visual perception and supporting downstream vision tasks. There are multiple light-related tasks (e.g., image retouching and exposure correction), and previous studies have mainly investigated them individually. However, we observe that light-related tasks share fundamental properties: i) different color channels have different light properties, and ii) the channel differences reflected in the time and frequency domains are different. Guided by these common light properties, we propose the Learning Adaptive Lighting Network (LALNet), a unified framework capable of processing different light-related tasks. Specifically, we introduce color-separated features that emphasize the light differences among color channels and combine them with traditional color-mixed features via Light Guided Attention (LGA). The LGA uses the color-separated features to guide the color-mixed features toward channel differences and to ensure visual consistency across channels. We introduce dual-domain channel modulation to generate the color-separated features, and a wavelet transform followed by a vision state space module to generate the color-mixed features. Extensive experiments on four representative light-related tasks demonstrate that LALNet significantly outperforms state-of-the-art methods on benchmark tests while requiring fewer computational resources. We provide an anonymous online demo at https://xxxxxx2025.github.io/LALNet/.
- Published
- 2024
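To make the Light Guided Attention idea in the LALNet abstract above concrete, here is a generic PyTorch sketch in which channel weights derived from a color-separated branch re-weight the color-mixed features. The layer sizes and the squeeze-and-excitation-style gate are assumptions for illustration, not the released LALNet architecture.

import torch
import torch.nn as nn

class LightGuidedGate(nn.Module):
    """Use color-separated features to produce channel weights for color-mixed features."""
    def __init__(self, channels):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)                  # per-channel statistics
        self.fc = nn.Sequential(
            nn.Conv2d(channels, channels // 4, 1), nn.ReLU(inplace=True),
            nn.Conv2d(channels // 4, channels, 1), nn.Sigmoid(),
        )

    def forward(self, mixed, separated):
        weights = self.fc(self.pool(separated))              # guidance from the separated branch
        return mixed + mixed * weights                       # residual channel re-weighting

# usage (shapes only): gate = LightGuidedGate(64); out = gate(mixed_feats, separated_feats)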
40. CPA: Camera-pose-awareness Diffusion Transformer for Video Generation
- Author
-
Wang, Yuelei, Zhang, Jian, Jiang, Pengtao, Zhang, Hao, Chen, Jinwei, and Li, Bo
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Despite the significant advancements made by Diffusion Transformer (DiT)-based methods in video generation, there remains a notable gap in controllable camera-pose perspectives. Existing works such as OpenSora do not adhere precisely to anticipated trajectories and physical interactions, thereby limiting flexibility in downstream applications. To alleviate this issue, we introduce CPA, a unified camera-pose-awareness text-to-video generation approach that models camera movement and integrates textual, visual, and spatial conditions. Specifically, we deploy the Sparse Motion Encoding (SME) module to transform camera pose information into a spatial-temporal embedding, and the Temporal Attention Injection (TAI) module to inject motion patches into each ST-DiT block. Our plug-in architecture is compatible with the original DiT parameters, accommodating diverse types of camera poses and flexible object movement. Extensive qualitative and quantitative experiments demonstrate that our method outperforms LDM-based methods for long video generation while achieving optimal performance in trajectory consistency and object consistency.
- Published
- 2024
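The camera-pose conditioning described in the CPA abstract above can be illustrated by the small PyTorch sketch below: per-frame extrinsics are mapped to an embedding and added to that frame's patch tokens. The 3x4 extrinsics layout, the MLP, and the additive injection are illustrative assumptions, not the SME/TAI modules themselves.

import torch
import torch.nn as nn

class PoseEmbedding(nn.Module):
    """Map flattened 3x4 camera extrinsics of each frame to the token dimension."""
    def __init__(self, dim):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(12, dim), nn.SiLU(), nn.Linear(dim, dim))

    def forward(self, extrinsics):               # (T, 3, 4) -> (T, dim)
        return self.mlp(extrinsics.flatten(1))

def inject_pose(tokens, pose_emb):
    """Add each frame's pose embedding to all of its spatial tokens.
    tokens: (T, N, dim) video patch tokens; pose_emb: (T, dim)."""
    return tokens + pose_emb.unsqueeze(1)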
41. Research on the changes and predictions of the burden of type 2 diabetes mellitus in Pacific Island countries from 1990 to 2019
- Author
-
Li, Yan, Zhang, Hao, and Jiang, Yi
- Published
- 2023
42. Effects of food restriction on energy metabolism in male Apodemus chevrieri from Hengduan mountain region of China
- Author
-
Gong, Xue-na, Chen, Li-xin, Zhang, Hao, and Zhu, Wan-long
- Published
- 2020
- Full Text
- View/download PDF
43. Consensus Recommendations for Studies of Outflow Facility and Intraocular Pressure Regulation Using Ex Vivo Perfusion Approaches.
- Author
-
Acott, Ted, Fautsch, Michael, Mao, Weiming, Ethier, C, Huang, Alex, Kelley, Mary, Aga, Mini, Bhattacharya, Sanjoy, Borras, Terete, Bovenkamp, Diane, Chowdhury, Uttio, Clark, Abbot, Dibas, Mohammed, Du, Yiqin, Elliott, Michael, Faralli, Jennifer, Gong, Haiyan, Herberg, Samuel, Johnstone, Murray, Kaufman, Paul, Keller, Kate, Kelly, Ruth, Krizaj, David, Kuehn, Markus, Li, Hoi, Lieberman, Raquel, Lin, Shan, Liu, Yutao, McDonnell, Fiona, McDowell, Colleen, McLellan, Gillian, Mzyk, Philip, Nair, Kayarat, Overby, Darryl, Peters, Donna, Raghunathan, VijayKrishna, Rao, Ponugoti, Roddy, Gavin, Sharif, Najam, Shim, Myoung, Sun, Yang, Thomson, Benjamin, Toris, Carol, Willoughby, Colin, Zhang, Hao, Freddo, Thomas, Fuchshofer, Rudolf, Hill, Kamisha, Karimi, Alireza, Kizhatil, Krishnakumar, Kopcyznski, Casey, Liton, Paloma, Patel, Gaurang, Peng, Michael, Pattabiraman, Padmanabhan, Prasanna, Ganesh, Reina-Torres, Ester, Samples, E, Samples, John, Steel, Cynthia, Strohmaier, Clemens, Subramanian, Preeti, Sugali, Chenna, van Batenburg-Sherwood, Joseph, Wong, Cydney, Youngblood, Hannah, Zode, Gulab, White, Elizabeth, and Stamer, W
- Subjects
Animals ,Humans ,Anterior Eye Segment ,Aqueous Humor ,Consensus ,Glaucoma ,Intraocular Pressure ,Organ Culture Techniques ,Perfusion ,Trabecular Meshwork - Abstract
Intraocular pressure (IOP) elevation is the primary risk factor and currently the main treatable factor for progression of glaucomatous optic neuropathy. In addition to direct clinical and living animal in vivo studies, ex vivo perfusion of anterior segments and whole eyes is a key technique for studying conventional outflow function as it is responsible for IOP regulation. We present well-tested experimental details, protocols, considerations, advantages, and limitations of several ex vivo model systems for studying IOP regulation. These include: (1) perfused whole globes, (2) stationary anterior segment organ culture, (3) perfused human anterior segment organ culture, (4) perfused animal anterior segment organ culture, (5) perfused human corneal rims, and (6) perfused human anterior segment wedges. These methods, with due consideration paid to their strengths and limitations, comprise a set of very strong tools for extending our understanding of IOP regulation.
- Published
- 2024
44. Simultaneous two-dimensional velocity and distance measurements based on laser triangulation
- Author
-
Zhang, Hao and Wang, Shiji
- Subjects
Physics - Instrumentation and Detectors ,Physics - Optics - Abstract
Laser triangulation sensors are widely used in industry for surface inspection owing to their simple setup, micron-level precision, and low cost. Conventional laser triangulation methods only enable axial distance measurement, which limits further applications, and their lateral resolution is limited by the surface microstructure. To overcome these issues, we propose theoretical models and methods, based on geometric optics, that achieve lateral velocity measurement. Moreover, a novel axial distance measurement method using edge detection is presented, which can increase the lateral resolution by an order of magnitude. The performance of the proposed methods is validated through simultaneous orthogonal velocity and distance measurements on a moving metal specimen, showing that the relative error and relative uncertainty can reach 10^{-4}. The versatility of this multi-degree-of-freedom measurement method paves the way for its broad application across laser triangulation systems. This simultaneous two-dimensional velocity and distance sensing approach can therefore propel advancements in the study of dynamic behavior, including but not limited to motion mechanics and fluid mechanics.
- Published
- 2024
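For readers unfamiliar with the measurement principle behind the abstract above, the sketch below shows the textbook similar-triangles relation for axial distance and a magnification-based lateral-velocity estimate. These are the idealized classroom formulas, not the models proposed in the paper, and the parameter names are assumptions.

def axial_distance(spot_offset_px, px_size_m, focal_m, baseline_m):
    """Idealized triangulation: distance = baseline * focal length / spot offset on the detector."""
    x = spot_offset_px * px_size_m                 # spot offset on the detector [m], must be nonzero
    return baseline_m * focal_m / x

def lateral_velocity(spot_px_t0, spot_px_t1, px_size_m, magnification, dt_s):
    """Lateral surface velocity from the laser-spot image displacement between two frames."""
    dx_image = (spot_px_t1 - spot_px_t0) * px_size_m
    return dx_image / magnification / dt_s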
45. GATE OpenING: A Comprehensive Benchmark for Judging Open-ended Interleaved Image-Text Generation
- Author
-
Zhou, Pengfei, Peng, Xiaopeng, Song, Jiajun, Li, Chuanhao, Xu, Zhaopan, Yang, Yue, Guo, Ziyao, Zhang, Hao, Lin, Yuqi, He, Yefei, Zhao, Lirui, Liu, Shuo, Li, Tianhua, Xie, Yuxuan, Chang, Xiaojun, Qiao, Yu, Shao, Wenqi, and Zhang, Kaipeng
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Multimodal Large Language Models (MLLMs) have made significant strides in visual understanding and generation tasks. However, generating interleaved image-text content remains a challenge, as it requires integrated multimodal understanding and generation abilities. While progress in unified models offers new solutions, existing benchmarks are insufficient for evaluating these methods due to limitations in data size and diversity. To bridge this gap, we introduce GATE OpenING (OpenING), a comprehensive benchmark comprising 5,400 high-quality human-annotated instances across 56 real-world tasks. OpenING covers diverse daily scenarios such as travel guides, design, and brainstorming, offering a robust platform for challenging interleaved generation methods. In addition, we present IntJudge, a judge model for evaluating open-ended multimodal generation methods. Trained with a novel data pipeline, IntJudge achieves an agreement rate of 82.42% with human judgments, outperforming GPT-based evaluators by 11.34%. Extensive experiments on OpenING reveal that current interleaved generation methods still have substantial room for improvement. Key findings on interleaved image-text generation are further presented to guide the development of next-generation models. OpenING is open-sourced at https://opening-benchmark.github.io., Comment: 53 pages, 19 figures
- Published
- 2024
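The agreement rate quoted in the OpenING abstract above is, at its core, the fraction of pairwise comparisons on which the judge and the human annotators pick the same side; the toy sketch below computes that quantity. The data layout is an assumption, not the OpenING evaluation protocol.

def agreement_rate(judge_choices, human_choices):
    """Fraction of pairwise comparisons where the judge picks the same side as the humans."""
    assert len(judge_choices) == len(human_choices) and judge_choices
    agree = sum(j == h for j, h in zip(judge_choices, human_choices))
    return agree / len(judge_choices)

# usage: agreement_rate(["A", "B", "A"], ["A", "B", "B"]) -> 2/3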
46. Cross-modal Medical Image Generation Based on Pyramid Convolutional Attention Network
- Author
-
Mao, Fuyou, Lin, Lixin, Jiang, Ming, Dai, Dong, Yang, Chao, Zhang, Hao, and Tang, Yan
- Subjects
Computer Science - Computational Engineering, Finance, and Science ,Electrical Engineering and Systems Science - Image and Video Processing - Abstract
The integration of multimodal medical imaging can provide complementary and comprehensive information for the diagnosis of Alzheimer's disease (AD). However, in clinical practice, positron emission tomography (PET) is often missing, so multimodal image sets might be incomplete. To address this problem, we propose a method that efficiently exploits structural magnetic resonance imaging (sMRI) information to generate high-quality PET images. Our generation model uses pyramid convolution combined with a channel attention mechanism to extract multi-scale local features from sMRI, and injects global correlation information into these features using a self-attention mechanism, ensuring that the generated PET image restores both local texture and global structure. Furthermore, we introduce additional loss functions to guide the model toward higher-quality PET images. In experiments on the publicly available ADNI databases, the generated images outperform those of previous methods on various performance indicators (average absolute error: 0.0194, peak signal-to-noise ratio: 29.65, structural similarity: 0.9486) and are close to real images. For AD diagnosis, the generated images combined with their corresponding sMRI also performed well (classification accuracy: 94.21%), outperforming previous methods of the same type. The experimental results demonstrate that our method surpasses other competing methods in quantitative metrics, qualitative visualization, and evaluation criteria., Comment: 18 pages, 6 figures, Machine Vision and Applications
- Published
- 2024
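As a generic illustration of the pyramid convolution plus channel attention combination mentioned in the abstract above, the PyTorch sketch below runs parallel convolutions at several kernel sizes and re-weights the concatenated output per channel. The kernel sizes and reduction ratio are assumptions; this is not the paper's network.

import torch
import torch.nn as nn

class PyramidConvAttention(nn.Module):
    """Parallel multi-scale convolutions followed by squeeze-and-excitation-style channel attention."""
    def __init__(self, in_ch, out_ch, kernel_sizes=(3, 5, 7)):
        super().__init__()
        branch_ch = out_ch // len(kernel_sizes)
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, branch_ch, k, padding=k // 2) for k in kernel_sizes
        )
        total = branch_ch * len(kernel_sizes)
        self.attn = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(total, max(total // 4, 1), 1), nn.ReLU(inplace=True),
            nn.Conv2d(max(total // 4, 1), total, 1), nn.Sigmoid(),
        )

    def forward(self, x):
        feats = torch.cat([b(x) for b in self.branches], dim=1)  # multi-scale local features
        return feats * self.attn(feats)                          # per-channel re-weighting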
47. Specifications: The missing link to making the development of LLM systems an engineering discipline
- Author
-
Stoica, Ion, Zaharia, Matei, Gonzalez, Joseph, Goldberg, Ken, Sen, Koushik, Zhang, Hao, Angelopoulos, Anastasios, Patil, Shishir G., Chen, Lingjiao, Chiang, Wei-Lin, and Davis, Jared Q.
- Subjects
Computer Science - Software Engineering ,Computer Science - Artificial Intelligence ,Computer Science - Computation and Language - Abstract
Despite the significant strides made by generative AI in just a few short years, its future progress is constrained by the challenge of building modular and robust systems. This capability has been a cornerstone of past technological revolutions, which relied on combining components to create increasingly sophisticated and reliable systems. Cars, airplanes, computers, and software consist of components, such as engines, wheels, CPUs, and libraries, that can be assembled, debugged, and replaced. A key tool for building such reliable and modular systems is specification: the precise description of the expected behavior, inputs, and outputs of each component. However, the generality of LLMs and the inherent ambiguity of natural language make defining specifications for LLM-based components (e.g., agents) both a challenging and urgent problem. In this paper, we discuss the progress the field has made so far, through advances like structured outputs, process supervision, and test-time compute, and outline several future directions for research to enable the development of modular and reliable LLM-based systems through improved specifications.
- Published
- 2024
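As one concrete (and entirely hypothetical) example of the structured-output specifications discussed in the abstract above, the sketch below declares a JSON-style schema for an LLM-based triage component and validates the model's raw output against it. The field names and the validation logic are illustrative assumptions, not a format proposed in the paper.

import json

# Hypothetical component-level specification for an LLM-based triage step.
TRIAGE_SPEC = {
    "type": "object",
    "properties": {
        "severity": {"type": "string", "enum": ["low", "medium", "high"]},
        "summary": {"type": "string", "maxLength": 280},
        "needs_human_review": {"type": "boolean"},
    },
    "required": ["severity", "summary", "needs_human_review"],
    "additionalProperties": False,
}

def validate_output(raw_text):
    """Reject component outputs that do not parse or that violate the declared spec."""
    data = json.loads(raw_text)
    for field in TRIAGE_SPEC["required"]:
        if field not in data:
            raise ValueError(f"missing required field: {field}")
    if data["severity"] not in TRIAGE_SPEC["properties"]["severity"]["enum"]:
        raise ValueError("severity outside the allowed enum")
    return data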
48. Local Learning for Covariate Selection in Nonparametric Causal Effect Estimation with Latent Variables
- Author
-
Li, Zheng, Xie, Feng, Guo, Xichen, Zeng, Yan, Zhang, Hao, and Geng, Zhi
- Subjects
Computer Science - Machine Learning ,Mathematics - Statistics Theory ,Statistics - Machine Learning - Abstract
Estimating causal effects from nonexperimental data is a fundamental problem in many fields of science. A key component of this task is selecting an appropriate set of covariates for confounding adjustment to avoid bias. Most existing methods for covariate selection often assume the absence of latent variables and rely on learning the global network structure among variables. However, identifying the global structure can be unnecessary and inefficient, especially when our primary interest lies in estimating the effect of a treatment variable on an outcome variable. To address this limitation, we propose a novel local learning approach for covariate selection in nonparametric causal effect estimation, which accounts for the presence of latent variables. Our approach leverages testable independence and dependence relationships among observed variables to identify a valid adjustment set for a target causal relationship, ensuring both soundness and completeness under standard assumptions. We validate the effectiveness of our algorithm through extensive experiments on both synthetic and real-world data.
- Published
- 2024
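To give a flavor of covariate selection driven by testable (in)dependence relationships, as discussed in the abstract above, the sketch below applies a simple screen: keep a covariate if it is dependent on the treatment and, given the treatment, on the outcome. The ci_test callable is a hypothetical conditional-independence test, and this toy heuristic is not the sound-and-complete procedure proposed in the paper.

def confounder_screen(data, treatment, outcome, candidates, ci_test):
    """Keep covariates dependent on the treatment and, conditional on it, on the outcome.
    ci_test(data, x, y, given) is assumed to return True when x is independent of y given `given`."""
    selected = []
    for z in candidates:
        dependent_on_treatment = not ci_test(data, z, treatment, [])
        dependent_on_outcome = not ci_test(data, z, outcome, [treatment])
        if dependent_on_treatment and dependent_on_outcome:
            selected.append(z)
    return selected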
49. DINO-X: A Unified Vision Model for Open-World Object Detection and Understanding
- Author
-
Ren, Tianhe, Chen, Yihao, Jiang, Qing, Zeng, Zhaoyang, Xiong, Yuda, Liu, Wenlong, Ma, Zhengyu, Shen, Junyi, Gao, Yuan, Jiang, Xiaoke, Chen, Xingyu, Song, Zhuheng, Zhang, Yuhong, Huang, Hongjie, Gao, Han, Liu, Shilong, Zhang, Hao, Li, Feng, Yu, Kent, and Zhang, Lei
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
In this paper, we introduce DINO-X, which is a unified object-centric vision model developed by IDEA Research with the best open-world object detection performance to date. DINO-X employs the same Transformer-based encoder-decoder architecture as Grounding DINO 1.5 to pursue an object-level representation for open-world object understanding. To make long-tailed object detection easy, DINO-X extends its input options to support text prompt, visual prompt, and customized prompt. With such flexible prompt options, we develop a universal object prompt to support prompt-free open-world detection, making it possible to detect anything in an image without requiring users to provide any prompt. To enhance the model's core grounding capability, we have constructed a large-scale dataset with over 100 million high-quality grounding samples, referred to as Grounding-100M, for advancing the model's open-vocabulary detection performance. Pre-training on such a large-scale grounding dataset leads to a foundational object-level representation, which enables DINO-X to integrate multiple perception heads to simultaneously support multiple object perception and understanding tasks, including detection, segmentation, pose estimation, object captioning, object-based QA, etc. Experimental results demonstrate the superior performance of DINO-X. Specifically, the DINO-X Pro model achieves 56.0 AP, 59.8 AP, and 52.4 AP on the COCO, LVIS-minival, and LVIS-val zero-shot object detection benchmarks, respectively. Notably, it scores 63.3 AP and 56.5 AP on the rare classes of LVIS-minival and LVIS-val benchmarks, improving the previous SOTA performance by 5.8 AP and 5.0 AP. Such a result underscores its significantly improved capacity for recognizing long-tailed objects., Comment: Technical Report
- Published
- 2024
50. An Evaluation-Driven Approach to Designing LLM Agents: Process and Architecture
- Author
-
Xia, Boming, Lu, Qinghua, Zhu, Liming, Xing, Zhenchang, Zhao, Dehai, and Zhang, Hao
- Subjects
Computer Science - Software Engineering ,Computer Science - Artificial Intelligence - Abstract
The advent of Large Language Models (LLMs) has enabled the development of LLM agents capable of autonomously achieving under-specified goals and continuously evolving through post-deployment improvement, sometimes without requiring code or model updates. Conventional approaches, such as pre-defined test cases and code/model redevelopment pipelines, are inadequate for addressing the unique challenges of LLM agent development, particularly in terms of quality and risk control. This paper introduces an evaluation-driven design approach, inspired by test-driven development, to address these challenges. Through a multivocal literature review (MLR), we synthesize existing LLM evaluation methods and propose a novel process model and reference architecture specifically designed for LLM agents. The proposed approach integrates online and offline evaluations to support adaptive runtime adjustments and systematic offline redevelopment, improving runtime pipelines, artifacts, system architecture, and LLMs by continuously incorporating evaluation results, including fine-grained feedback from human and AI evaluators.
- Published
- 2024
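The online/offline split described in the abstract above can be pictured as the small control loop below: every run is scored at runtime, and low-scoring runs are queued for offline analysis and redevelopment. All names are hypothetical placeholders, not the paper's process model or reference architecture.

def evaluation_driven_loop(agent, tasks, online_eval, offline_queue, threshold=0.8):
    """Run the agent, score each run online, and queue weak runs for offline redevelopment."""
    for task in tasks:
        trace = agent.run(task)                          # execute with full trace capture
        score = online_eval(task, trace)                 # runtime evaluator (human or AI judge)
        if score < threshold:
            offline_queue.append((task, trace, score))   # feed systematic offline evaluation
    return offline_queue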