128,835 results for "Zhang, Li"
Search Results
2. Water Diplomacy and China's Bid for Soft Power in the Mekong
- Author
Zhang, Li and Zhang, Hongzhou
- Published
- 2021
3. Polymorphism of Nramp1 gene and its association with diarrhea in pigs
- Author
Chen, Lang, Peng, Shuai, Bao-Fu, Qiang, Qian, Du, Zhang, Li, and Li-Liu, Xia
- Published
- 2021
4. DG-SLAM: Robust Dynamic Gaussian Splatting SLAM with Hybrid Pose Optimization
- Author
Xu, Yueming, Jiang, Haochen, Xiao, Zhongyang, Feng, Jianfeng, and Zhang, Li
- Subjects
Computer Science - Robotics
- Abstract
Achieving robust and precise pose estimation in dynamic scenes is a significant research challenge in Visual Simultaneous Localization and Mapping (SLAM). Recent advancements integrating Gaussian Splatting into SLAM systems have proven effective in creating high-quality renderings using explicit 3D Gaussian models, significantly improving environmental reconstruction fidelity. However, these approaches depend on a static environment assumption and face challenges in dynamic environments due to inconsistent observations of geometry and photometry. To address this problem, we propose DG-SLAM, the first robust dynamic visual SLAM system grounded in 3D Gaussians, which provides precise camera pose estimation alongside high-fidelity reconstructions. Specifically, we propose effective strategies, including motion mask generation, adaptive Gaussian point management, and a hybrid camera tracking algorithm, to improve the accuracy and robustness of pose estimation. Extensive experiments demonstrate that DG-SLAM delivers state-of-the-art performance in camera pose estimation, map reconstruction, and novel-view synthesis in dynamic scenes, outperforming existing methods while preserving real-time rendering ability.
- Published
- 2024
5. WassFFed: Wasserstein Fair Federated Learning
- Author
Han, Zhongxuan, Zhang, Li, Chen, Chaochao, Zheng, Xiaolin, Zheng, Fei, Li, Yuyuan, and Yin, Jianwei
- Subjects
Computer Science - Machine Learning, Statistics - Machine Learning
- Abstract
Federated Learning (FL) trains models in scenarios where users' data cannot be shared across clients. Achieving fairness in FL is imperative since training data in FL is inherently geographically distributed among diverse user groups. Existing research on fairness predominantly assumes access to the entire training data, making direct transfer to FL challenging. Moreover, the limited existing research on fairness in FL does not effectively address two key challenges: (CH1) current methods fail to deal with the inconsistency between fair optimization results obtained with surrogate functions and fair classification results; (CH2) directly aggregating local fair models does not always yield a globally fair model because data are not independent and identically distributed (non-IID) across clients. To address these challenges, we propose a Wasserstein Fair Federated Learning framework, namely WassFFed. To tackle CH1, we ensure that the outputs of local models, rather than the loss calculated with surrogate functions or the classification results obtained with a threshold, remain independent of the user groups. To resolve CH2, we compute a Wasserstein barycenter of all local models' outputs for each user group, bringing local model outputs closer to the global output distribution to ensure consistency between the global model and the local models. We conduct extensive experiments on three real-world datasets, demonstrating that WassFFed outperforms existing approaches in striking a balance between accuracy and fairness.
- Comment
Submitted to TKDE
- Published
- 2024
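The WassFFed abstract relies on a Wasserstein barycenter of the local models' outputs for each user group. As a minimal sketch, assuming each model emits a scalar score (for example, a binary-classification probability) and that every client contributes the same number of samples for the group, the 2-Wasserstein barycenter in one dimension can be obtained by sorting each sample and averaging position by position, because in 1-D the barycenter's quantile function is the weighted mean of the input quantile functions. The function names are hypothetical; this is not the paper's implementation.

```python
import math
from typing import List, Optional, Sequence


def wasserstein_barycenter_1d(
    samples: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
) -> List[float]:
    """2-Wasserstein barycenter of equally sized 1-D empirical distributions.

    In 1-D the barycenter's quantile function is the weighted mean of the
    input quantile functions; for equal-size samples this means sorting
    each sample and averaging the i-th smallest values.
    """
    if not samples:
        raise ValueError("need at least one distribution")
    n = len(samples[0])
    if any(len(s) != n for s in samples):
        raise ValueError("this sketch assumes equal sample sizes")
    if weights is None:
        weights = [1.0] * len(samples)  # uniform weights by default
    total = float(sum(weights))
    sorted_samples = [sorted(s) for s in samples]
    return [
        sum(w * s[i] for w, s in zip(weights, sorted_samples)) / total
        for i in range(n)
    ]


def wasserstein2_1d(a: Sequence[float], b: Sequence[float]) -> float:
    """W2 distance between two equal-size 1-D empirical distributions."""
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal sample sizes")
    sa, sb = sorted(a), sorted(b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(sa, sb)) / len(sa))


if __name__ == "__main__":
    # Hypothetical scalar outputs of two local models for one user group.
    client_a = [2.0, 0.0]
    client_b = [6.0, 4.0]
    bary = wasserstein_barycenter_1d([client_a, client_b])
    print(bary)
    print(wasserstein2_1d(client_a, client_b), wasserstein2_1d(client_a, bary))
```

In this toy case the barycenter is `[2.0, 4.0]`, and each client's W2 distance to it (2.0) is half the distance between the two clients (4.0), which illustrates the "bringing local outputs closer" effect the abstract describes.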
6. Detection of two TeV gamma-ray outbursts from NGC 1275 by LHAASO
- Author
Cao, Zhen, Aharonian, F., Axikegu, Bai, Y. X., Bao, Y. W., Bastieri, D., Bi, X. J., Bi, Y. J., Cai, J. T., Cao, Q., Cao, W. Y., Cao, Zhe, Chang, J., Chang, J. F., Chen, A. M., Chen, E. S., Chen, Liang, Chen, Lin, Chen, Long, Chen, M. J., Chen, M. L., Chen, Q. H., Chen, S. H., Chen, S. Z., Chen, T. L., Chen, Y., Cheng, N., Cheng, Y. D., Cui, M. Y., Cui, S. W., Cui, X. H., Cui, Y. D., Dai, B. Z., Dai, H. L., Dai, Z. G., Danzengluobu, della Volpe, D., Dong, X. Q., Duan, K. K., Fan, J. H., Fan, Y. Z., Fang, J., Fang, K., Feng, C. F., Feng, L., Feng, S. H., Feng, X. T., Feng, Y. L., Gabici, S., Gao, B., Gao, C. D., Gao, L. Q., Gao, Q., Gao, W., Gao, W. K., Ge, M. M., Geng, L. S., Giacinti, G., Gong, G. H., Gou, Q. B., Gu, M. H., Guo, F. L., Guo, X. L., Guo, Y. Q., Guo, Y. Y., Han, Y. A., He, H. H., He, H. N., He, J. Y., He, X. B., He, Y., Heller, M., Hor, Y. K., Hou, B. W., Hou, C., Hou, X., Hu, H. B., Hu, Q., Hu, S. C., Huang, D. H., Huang, T. Q., Huang, W. J., Huang, X. T., Huang, X. Y., Huang, Y., Huang, Z. C., Ji, X. L., Jia, H. Y., Jia, K., Jiang, K., Jiang, X. W., Jiang, Z. J., Jin, M., Kang, M. M., Ke, T., Kuleshov, D., Kurinov, K., Li, B. B., Li, Cheng, Li, Cong, Li, D., Li, F., Li, H. B., Li, H. C., Li, H. Y., Li, J., Li, Jian, Li, Jie, Li, K., Li, W. L., Li, X. R., Li, Xin, Li, Y. Z., Li, Zhe, Li, Zhuo, Liang, E. W., Liang, Y. F., Lin, S. J., Liu, B., Liu, C., Liu, D., Liu, H., Liu, H. D., Liu, J., Liu, J. L., Liu, J. Y., Liu, M. Y., Liu, R. Y., Liu, S. M., Liu, W., Liu, Y., Liu, Y. N., Lu, R., Luo, Q., Lv, H. K., Ma, B. Q., Ma, L. L., Ma, X. H., Mao, J. R., Min, Z., Mitthumsiri, W., Mu, H. J., Nan, Y. C., Neronov, A., Ou, Z. W., Pang, B. Y., Pattarakijwanich, P., Pei, Z. Y., Qi, M. Y., Qi, Y. Q., Qiao, B. Q., Qin, J. J., Ruffolo, D., Sáiz, A., Semikoz, D., Shao, C. Y., Shao, L., Shchegolev, O., Sheng, X. D., Shu, F. W., Song, H. C., Stenkin, Yu. V., Stepanov, V., Su, Y., Sun, Q. N., Sun, X. N., Sun, Z. B., Tam, P. H. T., Tang, Q. W., Tang, Z. B., Tian, W. W., Wang, C., Wang, C. B., Wang, G. W., Wang, H. G., Wang, H. H., Wang, J. C., Wang, K., Wang, L. P., Wang, L. Y., Wang, P. H., Wang, R., Wang, W., Wang, X. G., Wang, X. Y., Wang, Y., Wang, Y. D., Wang, Y. J., Wang, Z. H., Wang, Z. X., Wang, Zhen, Wang, Zheng, Wei, D. M., Wei, J. J., Wei, Y. J., Wen, T., Wu, C. Y., Wu, H. R., Wu, S., Wu, X. F., Wu, Y. S., Xi, S. Q., Xia, J., Xia, J. J., Xiang, G. M., Xiao, D. X., Xiao, G., Xin, G. G., Xin, Y. L., Xing, Y., Xiong, Z., Xu, D. L., Xu, R. F., Xu, R. X., Xu, W. L., Xue, L., Yan, D. H., Yan, J. Z., Yan, T., Yang, C. W., Yang, F., Yang, F. F., Yang, H. W., Yang, J. Y., Yang, L. L., Yang, M. J., Yang, R. Z., Yang, S. B., Yao, Y. H., Yao, Z. G., Ye, Y. M., Yin, L. Q., Yin, N., You, X. H., You, Z. Y., Yu, Y. H., Yuan, Q., Yue, H., Zeng, H. D., Zeng, T. X., Zeng, W., Zha, M., Zhang, B. B., Zhang, F., Zhang, H. M., Zhang, H. Y., Zhang, J. L., Zhang, L. X., Zhang, Li, Zhang, P. F., Zhang, P. P., Zhang, R., Zhang, S. B., Zhang, S. R., Zhang, S. S., Zhang, X., Zhang, X. P., Zhang, Y. F., Zhang, Yi, Zhang, Yong, Zhao, B., Zhao, J., Zhao, L., Zhao, L. Z., Zhao, S. P., Zheng, F., Zhou, B., Zhou, H., Zhou, J. N., Zhou, M., Zhou, P., Zhou, R., Zhou, X. X., Zhu, C. G., Zhu, F. R., Zhu, H., Zhu, K. J., and Zuo, X.
- Subjects
Astrophysics - High Energy Astrophysical Phenomena
- Abstract
The Water Cherenkov Detector Array (WCDA) is one of the components of the Large High Altitude Air Shower Observatory (LHAASO) and can monitor any source within two-thirds of the sky for up to 7 hours per day with a duty cycle above 98%. In this work, we report the detection of two outbursts of the Fanaroff-Riley I radio galaxy NGC 1275 by LHAASO-WCDA between November 2022 and January 2023, with statistical significances of $5.2\sigma$ and $8.3\sigma$. The observed spectral energy distributions in the range from 500 GeV to 3 TeV are fitted by power laws with best-fit spectral indices of $\alpha = -3.37 \pm 0.52$ and $\alpha = -3.35 \pm 0.29$, respectively. The outburst fluxes above 0.5 TeV were $(4.55 \pm 4.21) \times 10^{-11}\,\mathrm{cm^{-2}\,s^{-1}}$ and $(3.45 \pm 1.78) \times 10^{-11}\,\mathrm{cm^{-2}\,s^{-1}}$, corresponding to 60% and 45% of the Crab Nebula flux, respectively. Variability analysis reveals a variability timescale of days in the TeV energy band. A simple test with a one-zone synchrotron self-Compton model reproduces the gamma-ray data well.
- Comment
11 pages, 8 figures, 3 tables
- Published
- 2024
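The fluxes "above 0.5 TeV" quoted in the NGC 1275 abstract are integrals of the fitted differential power law over all energies above that threshold. A minimal sketch, assuming the common parameterization $dN/dE = N_0 (E/E_\mathrm{ref})^{\alpha}$ with $\alpha < -1$ (true for the quoted $\alpha \approx -3.4$); `n0` and `e_ref` are hypothetical placeholders, not fitted values from the paper.

```python
def integral_flux(n0: float, alpha: float, e_min: float, e_ref: float = 1.0) -> float:
    """Integral flux above e_min for dN/dE = n0 * (E / e_ref) ** alpha.

    Integrating from e_min to infinity converges only when alpha < -1:
        F(>e_min) = n0 * e_ref / -(alpha + 1) * (e_min / e_ref) ** (alpha + 1)
    Units follow the inputs: with n0 in TeV^-1 cm^-2 s^-1 and energies in
    TeV, the result is in cm^-2 s^-1.
    """
    if alpha >= -1:
        raise ValueError("integral diverges unless alpha < -1")
    return n0 * e_ref / -(alpha + 1) * (e_min / e_ref) ** (alpha + 1)


if __name__ == "__main__":
    alpha = -3.37  # best-fit index of the first outburst quoted above
    # The normalization cancels in this ratio, so n0 = 1 is only a placeholder.
    ratio = integral_flux(1.0, alpha, 0.5) / integral_flux(1.0, alpha, 1.0)
    print(f"F(>0.5 TeV) / F(>1 TeV) = {ratio:.2f}")
```

For such a steep spectrum the integral flux is dominated by photons near the threshold, which is why the threshold energy must always be stated alongside the flux.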
7. Zero-energy Quantum Many-Body Scar under Emergent Chiral Symmetry and Pseudo Hilbert Space Fragmentation
- Author
Zhang, Li, Ke, Yongguan, and Lee, Chaohong
- Subjects
Quantum Physics
- Abstract
Hilbert space fragmentation (HSF) is a mechanism for generating quantum many-body scars (QMBSs), which provide a route to weak ergodicity breaking. Zero-energy QMBSs exist widely across various systems owing to the intertwining of chiral symmetry and spatial inversion symmetry. In this work, we study the phenomenology of zero-energy QMBSs under the interplay between chiral symmetry and pseudo HSF, where the Hilbert space is approximately fragmented into different blocks. We consider a tilted chain of interacting spinless fermions with periodically varying tunneling strength. At small tunneling strength and under a resonance condition, the system is described by an effective model with chiral symmetry and pseudo HSF. We find that the interplay between the two gives rise to a highly localized zero-energy QMBS when the particle number is even. We identify a simple product state that signals the zero-energy QMBS and gives rise to unusual scarred dynamics: the fidelity oscillates around a fixed value without decaying, instead of showing the collapse and revival common in scarred systems. We show that the signature of the zero-energy QMBS can also be captured by the original Hamiltonian. Our results uncover a new scar phenomenon and provide an example that does not require the intertwining of chiral and spatial symmetries to support a zero-energy QMBS.
- Published
- 2024
8. Deep Learning-based Software Engineering: Progress, Challenges, and Opportunities
- Author
Chen, Xiangping, Hu, Xing, Huang, Yuan, Jiang, He, Ji, Weixing, Jiang, Yanjie, Jiang, Yanyan, Liu, Bo, Liu, Hui, Li, Xiaochen, Lian, Xiaoli, Meng, Guozhu, Peng, Xin, Sun, Hailong, Shi, Lin, Wang, Bo, Wang, Chong, Wang, Jiayi, Wang, Tiantian, Xuan, Jifeng, Xia, Xin, Yang, Yibiao, Yang, Yixin, Zhang, Li, Zhou, Yuming, and Zhang, Lu
- Subjects
Computer Science - Software Engineering
- Abstract
Researchers have recently achieved significant advances in deep learning techniques, which in turn have substantially advanced other research disciplines, such as natural language processing, image processing, speech recognition, and software engineering. Various deep learning techniques have been successfully employed to facilitate software engineering tasks, including code generation, software refactoring, and fault localization. Many papers in top conferences and journals have demonstrated applications of deep learning techniques to various software engineering tasks. However, although several surveys have provided overall pictures of the application of deep learning techniques in software engineering, they focus more on the learning techniques themselves, that is, which deep learning techniques are employed and how deep models are trained or fine-tuned for software engineering tasks. We still lack surveys explaining the advances of subareas in software engineering driven by deep learning techniques, as well as the challenges and opportunities in each subarea. To this end, in this paper, we present the first task-oriented survey on deep learning-based software engineering. It covers twelve major software engineering subareas significantly impacted by deep learning techniques. These subareas span the whole lifecycle of software development and maintenance, including requirements engineering, software development, testing, maintenance, and developer collaboration. As we believe that deep learning may provide an opportunity to revolutionize the whole discipline of software engineering, providing one survey covering as many subareas as possible can help future research push forward the frontier of deep learning-based software engineering more systematically.
- Comment
Accepted in SCIENCE CHINA Information Sciences
- Published
- 2024
9. Meta-DT: Offline Meta-RL as Conditional Sequence Modeling with World Model Disentanglement
- Author
Wang, Zhi, Zhang, Li, Wu, Wenhao, Zhu, Yuanheng, Zhao, Dongbin, and Chen, Chunlin
- Subjects
Computer Science - Machine Learning
- Abstract
A longstanding goal of artificial general intelligence is highly capable generalists that can learn from diverse experiences and generalize to unseen tasks. The language and vision communities have made remarkable progress toward this goal by scaling up transformer-based models trained on massive datasets, while reinforcement learning (RL) agents still generalize poorly under such paradigms. To tackle this challenge, we propose Meta Decision Transformer (Meta-DT), which leverages the sequential modeling ability of the transformer architecture and robust task representation learning via world model disentanglement to achieve efficient generalization in offline meta-RL. We pretrain a context-aware world model to learn a compact task representation and inject it as a contextual condition into the causal transformer to guide task-oriented sequence generation. Then, we utilize history trajectories generated by the meta-policy as a self-guided prompt to exploit the architectural inductive bias. We select the trajectory segment that yields the largest prediction error on the pretrained world model to construct the prompt, aiming to encode task-specific information that maximally complements the world model. Notably, the proposed framework requires no expert demonstrations or domain knowledge at test time. Experimental results on MuJoCo and Meta-World benchmarks across various dataset types show that Meta-DT exhibits superior few- and zero-shot generalization compared to strong baselines while being more practical, with fewer prerequisites. Our code is available at https://github.com/NJU-RL/Meta-DT.
- Comment
NeurIPS 2024. TLDR: We leverage the sequential modeling ability of the transformer architecture and robust task representation learning via world model disentanglement to achieve efficient generalization in offline meta-RL
- Published
- 2024
10. NSSI-Net: Multi-Concept Generative Adversarial Network for Non-Suicidal Self-Injury Detection Using High-Dimensional EEG Signals in a Semi-Supervised Learning Framework
- Author
Liang, Zhen, Ye, Weishan, Liu, Qile, Zhang, Li, Huang, Gan, and Zhou, Yongjie
- Subjects
Computer Science - Machine Learning, Computer Science - Artificial Intelligence
- Abstract
Non-suicidal self-injury (NSSI) is a serious threat to the physical and mental health of adolescents, significantly increasing the risk of suicide and attracting widespread public concern. Electroencephalography (EEG), as an objective tool for identifying brain disorders, holds great promise. However, extracting meaningful and reliable features from high-dimensional EEG data, especially by integrating spatiotemporal brain dynamics into informative representations, remains a major challenge. In this study, we introduce NSSI-Net, a semi-supervised adversarial network, to model EEG features related to NSSI. NSSI-Net consists of two key modules: a spatial-temporal feature extraction module and a multi-concept discriminator. In the spatial-temporal feature extraction module, an integrated 2D convolutional neural network (2D-CNN) and a bi-directional Gated Recurrent Unit (BiGRU) capture both spatial and temporal dynamics in EEG data. In the multi-concept discriminator, signal, gender, domain, and disease levels are explored to extract meaningful EEG features, accounting for individual, demographic, and disease variations across a diverse population. On self-collected NSSI data (n=114), the model's effectiveness and reliability are demonstrated, with a 7.44% improvement in performance compared to existing machine learning and deep learning methods. This study advances the understanding and early diagnosis of NSSI in adolescents with depression, enabling timely intervention. The source code is available at https://github.com/Vesan-yws/NSSINet.
- Published
- 2024
11. Spatio-Temporal Distortion Aware Omnidirectional Video Super-Resolution
- Author
An, Hongyu, Zhang, Xinfeng, Zhang, Li, and Xiong, Ruiqin
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
Omnidirectional video (ODV) provides an immersive experience and is widely used in virtual reality and augmented reality. However, restricted capturing devices and transmission bandwidth lead to low-resolution ODVs. Video super-resolution (VSR) methods have been proposed to enhance video resolution, but applying them directly does not adequately address ODV projection distortions. To achieve better super-resolution reconstruction quality, we propose a novel Spatio-Temporal Distortion Aware Network (STDAN) tailored to ODV characteristics. Specifically, a spatio-temporal distortion modulation module is introduced to mitigate spatial ODV projection distortions and to exploit temporal correlation through intra- and inter-alignment. Next, we design a multi-frame reconstruction and fusion mechanism to refine the consistency of reconstructed ODV frames. Furthermore, we incorporate latitude-saliency adaptive maps into the loss function to concentrate on important viewpoint regions with higher texture complexity and human viewing interest. In addition, we collect a new ODV-SR dataset with various scenarios. Extensive experimental results demonstrate that the proposed STDAN achieves superior super-resolution performance on ODVs and outperforms state-of-the-art methods.
- Published
- 2024
12. Constraints on Covariant Horava-Lifshitz Gravity from precision measurement of planetary gravitomagnetic field
- Author
Zhang, Li-dong, Li, Li-Fang, Xu, Peng, Bian, Xing, and Luo, Ziren
- Subjects
General Relativity and Quantum Cosmology, High Energy Physics - Phenomenology, High Energy Physics - Theory
- Abstract
As a generalization of Einstein's theory, Horava-Lifshitz gravity has attracted significant interest due to its healthy ultraviolet behavior. In this paper, we analyze the impact of Horava-Lifshitz corrections on the gravitomagnetic field. We propose a new method for measuring the planetary gravitomagnetic field with space-based laser interferometry, which we further use to constrain the Horava-Lifshitz parameters. Our analysis shows that high-precision laser gradiometers can indeed constrain the parameters of Horava-Lifshitz gravity, improving on existing results by one to two orders of magnitude. Our novel method provides insights into constraining the parameters of modified gravitational theories, which facilitates a deeper understanding of this complex framework and paves the way for potential technological advancements in the field.
- Published
- 2024
13. AM-SAM: Automated Prompting and Mask Calibration for Segment Anything Model
- Author
Li, Yuchen, Zhang, Li, Liang, Youwei, and Xie, Pengtao
- Subjects
Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
- Abstract
Segment Anything Model (SAM) has gained significant recognition in the field of semantic segmentation due to its versatile capabilities and impressive performance. Despite its success, SAM faces two primary limitations: (1) it relies heavily on meticulous human-provided prompts such as key points, bounding boxes, or text, which is labor-intensive; (2) the mask decoder's feature representation is sometimes inaccurate, because it relies solely on dot-product operations at the end of the mask decoder, which inadequately capture the correlations needed for precise segmentation. Current solutions to these problems, such as fine-tuning SAM, often require retraining a large number of parameters, which demands substantial time and computing resources. To address these limitations, we propose an automated prompting and mask calibration method called AM-SAM, based on a bi-level optimization framework. Our approach automatically generates prompts for an input image, eliminating the need for human involvement while performing well in early training epochs and converging faster. Additionally, we freeze the main part of SAM and modify the mask decoder with Low-Rank Adaptation (LoRA), enhancing the mask decoder's feature representation with techniques that go beyond simple dot-product operations to more accurately capture and utilize feature correlations. Our experimental results demonstrate that AM-SAM achieves highly accurate segmentation, matching or exceeding the effectiveness of human-generated and default prompts. Notably, on the body segmentation dataset, our method yields a 5% higher Dice score with a 4-example few-shot training set compared to the SOTA method, underscoring its superiority in semantic segmentation tasks.
- Published
- 2024
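The AM-SAM abstract reports its gain as a Dice score. For reference, here is a minimal sketch of the standard Dice coefficient, $2|P \cap T| / (|P| + |T|)$, for binary masks flattened to 0/1 sequences. The function name is hypothetical, and returning 1.0 when both masks are empty is a convention assumed here, not something the abstract specifies.

```python
from typing import Sequence


def dice_score(pred: Sequence[int], target: Sequence[int]) -> float:
    """Dice coefficient 2|P & T| / (|P| + |T|) for flattened binary masks."""
    if len(pred) != len(target):
        raise ValueError("masks must contain the same number of pixels")
    # Pixels that are foreground in both the prediction and the ground truth.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    size_sum = sum(1 for p in pred if p) + sum(1 for t in target if t)
    if size_sum == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * intersection / size_sum


if __name__ == "__main__":
    # Hypothetical 2x3 masks, flattened row by row.
    predicted = [1, 1, 0,
                 0, 1, 0]
    ground_truth = [1, 0, 0,
                    0, 1, 1]
    print(dice_score(predicted, ground_truth))
```

Over pixels, Dice equals the F1 score, $2\,TP / (2\,TP + FP + FN)$, so it rewards overlap while penalizing both missed and spurious foreground.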
14. ECVC: Exploiting Non-Local Correlations in Multiple Frames for Contextual Video Compression
- Author
Jiang, Wei, Li, Junru, Zhang, Kai, and Zhang, Li
- Subjects
Electrical Engineering and Systems Science - Image and Video Processing
- Abstract
In Learned Video Compression (LVC), improving inter prediction, such as enhancing temporal context mining and mitigating accumulated errors, is crucial for boosting rate-distortion performance. Existing LVCs mainly focus on mining the temporal movements within adjacent frames, neglecting non-local correlations among frames. Additionally, current contextual video compression models use a single reference frame, which is insufficient for handling complex movements. To address these issues, we propose leveraging non-local correlations across multiple frames to enhance temporal priors, significantly boosting rate-distortion performance. To mitigate error accumulation, we introduce a partial cascaded fine-tuning strategy that supports fine-tuning on full-length sequences with constrained computational resources. This method reduces the train-test mismatch in sequence lengths and significantly decreases accumulated errors. Based on the proposed techniques, we present a video compression scheme, ECVC. Experiments demonstrate that ECVC achieves state-of-the-art performance, saving 7.3% and 10.5% more bit-rate than DCVC-DC and DCVC-FM, respectively, measured against VTM-13.2 low delay B (LDB) when the intra period (IP) is 32. Additionally, ECVC saves 11.1% more bit-rate than DCVC-FM against VTM-13.2 LDB when the IP is -1. Our code will be available at https://github.com/JiangWeibeta/ECVC.
- Published
- 2024
15. FastFixer: An Efficient and Effective Approach for Repairing Programming Assignments
- Author
Liu, Fang, Liu, Zhenwei, Zhao, Qianhui, Jiang, Jing, Zhang, Li, Li, Ge, Sun, Zian, Li, Zhongqi, and Ma, Yuchi
- Subjects
Computer Science - Computers and Society, Computer Science - Software Engineering
- Abstract
Providing personalized and timely feedback on students' programming assignments is useful for programming education. Automated program repair (APR) techniques have been used to fix bugs in programming assignments, and approaches based on Large Language Models (LLMs) have shown promising results. Given the growing complexity of identifying and fixing bugs in advanced programming assignments, current fine-tuning strategies for APR are inadequate for guiding the LLM to identify bugs and make accurate edits during the generative repair process. Furthermore, the autoregressive decoding approach employed by the LLM can impede repair efficiency, hindering the ability to provide timely feedback. To tackle these challenges, we propose FastFixer, an efficient and effective approach for programming assignment repair. To help the LLM accurately identify and repair bugs, we first propose a novel repair-oriented fine-tuning strategy, which aims to enhance the LLM's attention to learning how to generate the necessary patch and its associated context. Furthermore, to speed up patch generation, we propose an inference acceleration approach specifically tailored to the program repair task. The evaluation results demonstrate that FastFixer obtains an overall improvement of 20.46% in assignment fixing compared to the state-of-the-art baseline. In terms of repair efficiency, FastFixer achieves an inference speedup of 16.67 times compared to the autoregressive decoding algorithm.
- Comment
Accepted by the 39th IEEE/ACM International Conference on Automated Software Engineering (ASE 2024)
- Published
- 2024
16. Observation of Higgs and Goldstone modes in U(1) symmetry-broken Rydberg atomic systems
- Author
Liu, Bang, Zhang, Li-Hua, Wang, Ya-Jun, Zhang, Jun, Wang, Qi-Feng, Ma, Yu, Han, Tian-Yu, Zhang, Zheng-Yuan, Shao, Shi-Yao, Li, Qing, Chen, Han-Chao, Nan, Jia-Dou, Zhu, Dong-Yang, Yin, Yi-Ming, Shi, Bao-Sen, and Ding, Dong-Sheng
- Subjects
Condensed Matter - Quantum Gases, Quantum Physics
- Abstract
Higgs and Goldstone modes manifest as fluctuations in the order parameter of a system, offering insights into its phase transitions and symmetry properties. Exploring the dynamics of these collective excitations in Rydberg atom systems advances various branches of condensed matter physics, particle physics, and cosmology. Here, we report an experimental signature of Higgs and Goldstone modes in U(1) symmetry-broken Rydberg atomic gases. By constructing two probe fields to excite atoms, we observe distinct phase and amplitude fluctuations of the collective excitations of Rydberg atoms under particle-hole symmetry. Due to the van der Waals interactions between Rydberg atoms, we detect a symmetric variance spectrum divided by the divergent regime and the phase boundary, capturing the full dynamics of the additional Higgs and Goldstone modes. Studying Higgs and Goldstone modes in Rydberg atoms allows us to explore fundamental aspects of quantum phase transitions and symmetry-breaking phenomena, while leveraging the unique properties of these strongly interacting systems to uncover new physics and potential applications in quantum simulation.
- Published
- 2024
17. Motion Forecasting in Continuous Driving
- Author
Song, Nan, Zhang, Bozhou, Zhu, Xiatian, and Zhang, Li
- Subjects
Computer Science - Computer Vision and Pattern Recognition
- Abstract
Motion forecasting for agents in autonomous driving is highly challenging due to the numerous possibilities for each agent's next action and their complex interactions in space and time. In real applications, motion forecasting takes place repeatedly and continuously as the self-driving car moves. However, existing forecasting methods typically process each driving scene within a certain range independently, ignoring the situational and contextual relationships between successive driving scenes. This significantly simplifies the forecasting task, making the solutions suboptimal and inefficient in practice. To address this fundamental limitation, we propose RealMotion, a novel motion forecasting framework for continuous driving. It comprises two integral streams, both at the scene level: (1) the scene context stream progressively accumulates historical scene information up to the present moment, capturing temporal interactive relationships among scene elements; (2) the agent trajectory stream optimizes current forecasting by sequentially relaying past predictions. In addition, a data reorganization strategy consistent with our network is introduced to narrow the gap between existing benchmarks and real-world applications. Together, these approaches more broadly exploit the situational and progressive insights of dynamic motion across space and time. Extensive experiments on the Argoverse series under different settings demonstrate that RealMotion achieves state-of-the-art performance, along with efficient real-world inference. The source code will be available at https://github.com/fudan-zvg/RealMotion.
- Comment
Accepted at NeurIPS 2024 Spotlight
- Published
- 2024
18. DeMo: Decoupling Motion Forecasting into Directional Intentions and Dynamic States
- Author
Zhang, Bozhou, Song, Nan, and Zhang, Li
- Subjects
Computer Science - Computer Vision and Pattern Recognition, Computer Science - Robotics
- Abstract
Accurate motion forecasting for traffic agents is crucial for ensuring the safety and efficiency of autonomous driving systems in dynamically changing environments. Mainstream methods adopt a one-query-one-trajectory paradigm, where each query corresponds to a unique trajectory for predicting multi-modal trajectories. While straightforward and effective, this paradigm lacks a detailed representation of future trajectories, which may yield suboptimal outcomes given that agent states evolve dynamically over time. To address this problem, we introduce DeMo, a framework that decouples multi-modal trajectory queries into two types: mode queries capturing distinct directional intentions and state queries tracking the agent's dynamic states over time. Using this formulation, we separately optimize the multi-modality and the dynamic evolutionary properties of trajectories. The mode and state queries are then integrated to obtain a comprehensive and detailed representation of the trajectories. To achieve these operations, we additionally introduce combined Attention and Mamba techniques for global information aggregation and state sequence modeling, leveraging their respective strengths. Extensive experiments on both the Argoverse 2 and nuScenes benchmarks demonstrate that DeMo achieves state-of-the-art performance in motion forecasting.
- Comment
NeurIPS 2024
- Published
- 2024
19. LHAASO detection of very-high-energy gamma-ray emission surrounding PSR J0248+6021
- Author
Cao, Zhen, Aharonian, F., An, Q., Axikegu, Bai, Y. X., Bao, Y. W., Bastieri, D., Bi, X. J., Bi, Y. J., Cai, J. T., Cao, Q., Cao, W. Y., Cao, Zhe, Chang, J., Chang, J. F., Chen, A. M., Chen, E. S., Chen, Liang, Chen, Lin, Chen, Long, Chen, M. J., Chen, M. L., Chen, Q. H., Chen, S. H., Chen, S. Z., Chen, T. L., Chen, Y., Cheng, N., Cheng, Y. D., Cui, M. Y., Cui, S. W., Cui, X. H., Cui, Y. D., Dai, B. Z., Dai, H. L., Dai, Z. G., Danzengluobu, Dong, X. Q., Duan, K. K., Fan, J. H., Fan, Y. Z., Fang, J., Fang, K., Feng, C. F., Feng, L., Feng, S. H., Feng, X. T., Feng, Y. L., Gabici, S., Gao, B., Gao, C. D., Gao, L. Q., Gao, Q., Gao, W., Gao, W. K., Ge, M. M., Geng, L. S., Giacinti, G., Gong, G. H., Gou, Q. B., Gu, M. H., Guo, F. L., Guo, X. L., Guo, Y. Q., Guo, Y. Y., Han, Y. A., He, H. H., He, H. N., He, J. Y., He, X. B., He, Y., Hor, Y. K., Hou, B. W., Hou, C., Hou, X., Hu, H. B., Hu, Q., Hu, S. C., Huang, D. H., Huang, T. Q., Huang, W. J., Huang, X. T., Huang, X. Y., Huang, Y., Huang, Z. C., Ji, X. L., Jia, H. Y., Jia, K., Jiang, K., Jiang, X. W., Jiang, Z. J., Jin, M., Kang, M. M., Ke, T., Kuleshov, D., Kurinov, K., Li, B. B., Li, Cheng, Li, Cong, Li, D., Li, F., Li, H. B., Li, H. C., Li, H. Y., Li, J., Li, Jian, Li, Jie, Li, K., Li, W. L., Li, X. R., Li, Xin, Li, Y. Z., Li, Zhe, Li, Zhuo, Liang, E. W., Liang, Y. F., Lin, J., Liu, B., Liu, C., Liu, D., Liu, H., Liu, H. D., Liu, J., Liu, J. L., Liu, J. Y., Liu, M. Y., Liu, R. Y., Liu, S. M., Liu, W., Liu, Y., Liu, Y. N., Lu, R., Luo, Q., Lv, H. K., Ma, B. Q., Ma, L. L., Ma, X. H., Mao, J. R., Min, Z., Mitthumsiri, W., Mu, H. J., Nan, Y. C., Neronov, A., Ou, Z. W., Pang, B. Y., Pattarakijwanich, P., Pei, Z. Y., Qi, M. Y., Qi, Y. Q., Qiao, B. Q., Qin, J. J., Ruffolo, D., Sáiz, A., Semikoz, D., Shao, C. Y., Shao, L., Shchegolev, O., Sheng, X. D., Shu, F. W., Song, H. C., Stenkin, Yu. V., Stepanov, V., Su, Y., Sun, Q. N., Sun, X. N., Sun, Z. B., Tam, P. H. T., Tang, Q. W., Tang, Z. B., Tian, W. W., Wang, C., Wang, C. B., Wang, G. W., Wang, H. G., Wang, H. H., Wang, J. C., Wang, K., Wang, L. P., Wang, L. Y., Wang, P. H., Wang, R., Wang, W., Wang, X. G., Wang, X. Y., Wang, Y., Wang, Y. D., Wang, Y. J., Wang, Z. H., Wang, Z. X., Wang, Zhen, Wang, Zheng, Wei, D. M., Wei, J. J., Wei, Y. J., Wen, T., Wu, C. Y., Wu, H. R., Wu, S., Wu, X. F., Wu, Y. S., Xi, S. Q., Xia, J., Xia, J. J., Xiang, G. M., Xiao, D. X., Xiao, G., Xin, G. G., Xin, Y. L., Xing, Y., Xiong, Z., Xu, D. L., Xu, R. F., Xu, R. X., Xu, W. L., Xue, L., Yan, D. H., Yan, J. Z., Yan, T., Yang, C. W., Yang, F., Yang, F. F., Yang, H. W., Yang, J. Y., Yang, L. L., Yang, M. J., Yang, R. Z., Yang, S. B., Yao, Y. H., Yao, Z. G., Ye, Y. M., Yin, L. Q., Yin, N., You, X. H., You, Z. Y., Yu, Y. H., Yuan, Q., Yue, H., Zeng, H. D., Zeng, T. X., Zeng, W., Zha, M., Zhang, B. B., Zhang, F., Zhang, H. M., Zhang, H. Y., Zhang, J. L., Zhang, L. X., Zhang, Li, Zhang, P. F., Zhang, P. P., Zhang, R., Zhang, S. B., Zhang, S. R., Zhang, S. S., Zhang, X., Zhang, X. P., Zhang, Y. F., Zhang, Yi, Zhang, Yong, Zhao, B., Zhao, J., Zhao, L., Zhao, L. Z., Zhao, S. P., Zheng, F., Zheng, J. H., Zhou, B., Zhou, H., Zhou, J. N., Zhou, M., Zhou, P., Zhou, R., Zhou, X. X., Zhu, C. G., Zhu, F. R., Zhu, H., Zhu, K. J., Zou, Y. C., and Zuo, X.
- Subjects
Astrophysics - High Energy Astrophysical Phenomena - Abstract
We report the detection of an extended very-high-energy (VHE) $\gamma$-ray source coincident with the location of the middle-aged ($62.4~\mathrm{kyr}$) pulsar PSR J0248+6021, using 796 days of live LHAASO-WCDA data and 1216 days of live LHAASO-KM2A data. A significant excess of $\gamma$-ray-induced showers is observed both by WCDA in the $1$-$25~\mathrm{TeV}$ energy band and by KM2A at $>25~\mathrm{TeV}$, with significances of $7.3\sigma$ and $13.5\sigma$, respectively. The best-fit position derived from the WCDA data is R.A. $= 42.06^\circ \pm 0.12^\circ$ and Dec. $= 60.24^\circ \pm 0.13^\circ$, with an extension of $0.69^\circ \pm 0.15^\circ$; that derived from the KM2A data is R.A. $= 42.29^\circ \pm 0.13^\circ$ and Dec. $= 60.38^\circ \pm 0.07^\circ$, with an extension of $0.37^\circ \pm 0.07^\circ$. No clear extended multiwavelength counterpart of this LHAASO source has been found from the radio band to the GeV band. The most plausible explanation of the VHE $\gamma$-ray emission is the inverse Compton process of highly relativistic electrons and positrons injected by the pulsar. These electrons/positrons are hypothesized to be either confined within the pulsar wind nebula or to have already escaped into the interstellar medium, forming a pulsar halo., Comment: 12 pages, 10 figures, Accepted by Sci. China-Phys. Mech. Astron
- Published
- 2024
20. Metadata Matters for Time Series: Informative Forecasting with Transformers
- Author
-
Dong, Jiaxiang, Wu, Haixu, Wang, Yuxuan, Zhang, Li, Wang, Jianmin, and Long, Mingsheng
- Subjects
Computer Science - Machine Learning ,Computer Science - Computation and Language - Abstract
Time series forecasting is used in a wide range of real-world applications, such as financial analysis and energy planning. Previous studies primarily focus on the time series modality, endeavoring to capture the intricate variations and dependencies inherent in time series. Beyond numerical time series data, we notice that metadata (e.g., dataset and variate descriptions) also carries valuable information essential for forecasting, which can be used to identify the application scenario and provide more interpretable knowledge than digit sequences. Inspired by this observation, we propose a Metadata-informed Time Series Transformer (MetaTST), which incorporates multiple levels of context-specific metadata into Transformer forecasting models to enable informative time series forecasting. To tackle the unstructured nature of metadata, MetaTST formalizes it into natural language using pre-designed templates and leverages large language models (LLMs) to encode these texts into metadata tokens that supplement the classic series tokens, resulting in an informative embedding. Further, a Transformer encoder is employed to exchange information between series and metadata tokens, which can enrich series representations with metadata information for more accurate forecasting. This design also allows the model to adaptively learn context-specific patterns across various scenarios, which is particularly effective in handling large-scale, diverse-scenario forecasting tasks. Experimentally, MetaTST achieves state-of-the-art performance compared to advanced time series models and LLM-based methods on widely acknowledged short- and long-term forecasting benchmarks, covering both single-dataset individual and multi-dataset joint training settings.
- Published
- 2024
21. Releasing the Parameter Latency of Neural Representation for High-Efficiency Video Compression
- Author
-
Zhang, Gai, Zhang, Xinfeng, Tang, Lv, Li, Yue, Zhang, Kai, and Zhang, Li
- Subjects
Electrical Engineering and Systems Science - Image and Video Processing ,Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Multimedia - Abstract
For decades, video compression technology has been a prominent research area. Traditional hybrid video compression frameworks and end-to-end frameworks continue to explore various intra- and inter-frame reference and prediction strategies based on discrete transforms and deep learning techniques. In contrast, the emerging implicit neural representation (INR) technique models entire videos as basic units, automatically capturing intra-frame and inter-frame correlations and obtaining promising performance. INR uses a compact neural network to store video information in network parameters, effectively eliminating spatial and temporal redundancy in the original video. However, our exploration and verification in this paper reveal that current INR video compression methods do not fully exploit their potential to preserve information. We investigate the potential of enhancing network parameter storage through parameter reuse. By deepening the network, we design a feasible INR parameter reuse scheme to further improve compression performance. Extensive experimental results show that our method significantly enhances the rate-distortion performance of INR video compression.
- Published
- 2024
22. Sensing-assisted Near-field Energy Beam Focusing with ELAA Over Non-stationary Channels
- Author
-
Zhang, Li, Ren, Zixiang, Fang, Yuan, Qiu, Ling, and Xu, Jie
- Subjects
Computer Science - Information Theory ,Computer Science - Emerging Technologies - Abstract
This paper studies a novel training-free energy beam focusing approach for a near-field wireless power transfer (WPT) system with an extremely large-scale antenna array (ELAA). In particular, we focus on the setup with one access point (AP) equipped with an extremely large-scale uniform planar array (UPA) serving multiple single-antenna energy receivers (ERs), in which the line-of-sight (LoS) dominated wireless channels depend on the relative positions of the ERs and exhibit spatial non-stationarity. Unlike conventional designs that rely on training and feedback, we present a novel energy beam focusing design assisted by wireless radar sensing and based on a two-stage transmission protocol. In the first stage, the AP performs wireless radar sensing to identify the ERs' visibility regions (VRs) and estimate their three-dimensional (3D) positions for constructing the corresponding channel state information (CSI). In the second stage, the AP performs transmit energy beam focusing based on the constructed CSI to efficiently charge these ERs. Under this setup, we first minimize the sensing duration in the first stage while guaranteeing a specific accuracy threshold for position estimation. Next, we optimize the energy beamformers at the AP in the second stage to maximize the weighted harvested energy of all ERs subject to the maximum transmit power constraint. In this approach, the time resource allocation between the two stages is properly designed to optimize the ultimate energy transfer performance. Numerical results show that the proposed design performs close to the performance upper bound with perfect VR and CSI and significantly outperforms other benchmark schemes., Comment: 6 pages, 5 figures, received by Globecom Workshop
- Published
- 2024
23. VPTQ: Extreme Low-bit Vector Post-Training Quantization for Large Language Models
- Author
-
Liu, Yifei, Wen, Jicheng, Wang, Yang, Ye, Shengyu, Zhang, Li Lyna, Cao, Ting, Li, Cheng, and Yang, Mao
- Subjects
Computer Science - Artificial Intelligence - Abstract
Scaling model size significantly challenges the deployment and inference of Large Language Models (LLMs). Due to the redundancy in LLM weights, recent research has focused on pushing weight-only quantization to extremely low bit widths (even down to 2 bits). This reduces memory requirements, optimizes storage costs, and decreases memory bandwidth needs during inference. However, due to numerical representation limitations, traditional scalar-based weight quantization struggles to reach such extremely low bit widths. Recent research on Vector Quantization (VQ) for LLMs has demonstrated the potential for extremely low-bit model quantization by compressing vectors into indices using lookup tables. In this paper, we introduce Vector Post-Training Quantization (VPTQ) for extremely low-bit quantization of LLMs. We use Second-Order Optimization to formulate the LLM VQ problem and guide our quantization algorithm design by solving the optimization. We further refine the weights using Channel-Independent Second-Order Optimization for a granular VQ. In addition, by decomposing the optimization problem, we propose a simple and effective codebook initialization algorithm. We also extend VPTQ to support residual and outlier quantization, which enhances model accuracy and further compresses the model. Our experimental results show that, at 2 bits, VPTQ reduces model quantization perplexity relative to SOTA by $0.01$-$0.34$ on LLaMA-2, $0.38$-$0.68$ on Mistral-7B, and $4.41$-$7.34$ on LLaMA-3, with an average accuracy improvement on QA tasks of $0.79$-$1.5\%$ on LLaMA-2, $1\%$ on Mistral-7B, and $11$-$22\%$ on LLaMA-3. We require only $10.4$-$18.6\%$ of the quantization algorithm execution time, resulting in a $1.6$-$1.8\times$ increase in inference throughput compared to SOTA., Comment: EMNLP 2024, Main, Poster
- Published
- 2024
24. Lessons Learned from a Unifying Empirical Study of Parameter-Efficient Transfer Learning (PETL) in Visual Recognition
- Author
-
Mai, Zheda, Zhang, Ping, Tu, Cheng-Hao, Chen, Hong-You, Zhang, Li, and Chao, Wei-Lun
- Subjects
Computer Science - Machine Learning ,Computer Science - Artificial Intelligence ,Computer Science - Computer Vision and Pattern Recognition - Abstract
Parameter-efficient transfer learning (PETL) has attracted significant attention lately, due to the increasing size of pre-trained models and the need to fine-tune (FT) them for superior downstream performance. This community-wide enthusiasm has sparked a plethora of approaches. Nevertheless, a systematic study to understand their performance and suitable application scenarios is lacking, leaving questions like when to apply PETL and which approach to use largely unanswered. In this paper, we conduct a unifying empirical study of representative PETL methods in the context of Vision Transformers. We systematically tune their hyper-parameters to fairly compare their accuracy on downstream tasks. Our study not only offers a valuable user guide but also unveils several new insights. First, if tuned carefully, different PETL methods can obtain similar accuracy on the low-shot benchmark VTAB-1K. This includes simple methods, such as fine-tuning only the bias terms, that were previously reported to be inferior. Second, despite their similar accuracy, we find that PETL methods make different mistakes and high-confidence predictions, likely due to their different inductive biases. Such an inconsistency (or complementarity) opens up the opportunity for ensemble methods, and we make preliminary attempts at this. Third, going beyond the commonly used low-shot tasks, we find that PETL is also useful in many-shot regimes: it achieves comparable and sometimes better accuracy than full FT, using far fewer learnable parameters. Last but not least, we investigate PETL's ability to preserve a pre-trained model's robustness to distribution shifts (e.g., a CLIP backbone). Perhaps not surprisingly, PETL methods outperform full FT alone. However, with weight-space ensembles, the fully fine-tuned model can better balance target (i.e., downstream) distribution and distribution shift performance, suggesting a future research direction for PETL., Comment: Code is available at https://github.com/OSU-MLB/PETL_Vision
- Published
- 2024
25. MobileViews: A Large-Scale Mobile GUI Dataset
- Author
-
Gao, Longxi, Zhang, Li, Wang, Shihe, Wang, Shangguang, Li, Yuanchun, and Xu, Mengwei
- Subjects
Computer Science - Human-Computer Interaction - Abstract
Mobile screen assistants help smartphone users by interpreting mobile screens and responding to user requests. The large amount of private information on mobile screens necessitates small, on-device models to power these assistants. However, there is no comprehensive, large-scale, and highly diverse mobile screen dataset with which to train and enhance these models. To construct such a dataset efficiently, we use an LLM-enhanced automatic app traversal tool to minimize human intervention. We then employ two SoC clusters to provide high-fidelity mobile environments, including more than 200 Android instances to parallelize app interactions. Using this system to collect mobile screens over 81,600 device-hours, we introduce MobileViews, the largest mobile screen dataset, which includes over 600K screenshot-view hierarchy pairs from more than 20K modern Android apps. We demonstrate the effectiveness of MobileViews by training SOTA multimodal LLMs that power mobile screen assistants on it and on the Rico dataset, which was introduced seven years ago. Evaluation results on mobile screen tasks show that the scale and quality of the mobile screens in MobileViews give it significant advantages over Rico in augmenting mobile screen assistants., Comment: Dataset: https://huggingface.co/datasets/mllmTeam/MobileViews
- Published
- 2024
26. RRD-Bio: Building An Integrated Research Resource Database for Biomedicine
- Author
-
Zhang, Li, Sun, Mengting, Jiang, Chong, and Chen, Haihua
- Subjects
Computer Science - Digital Libraries - Abstract
Research resources (RRs) such as data, software, and tools are essential pillars of scientific research. The field of biomedicine, a critical scientific discipline, is witnessing a surge in research publications, resulting in the accumulation of a substantial number of RRs. However, these resources are dispersed among various biomedical articles and can be challenging to locate and reuse due to their transient nature. In this paper, we report our recent progress in biomedical data curation: building a large research resource database for biomedicine (RRD-Bio), based on a collection of 40 million papers from two large biomedical literature databases, PubMed and PubMed Central. The database contains 2,555,116 RRs, each identified by a location on the Internet (URL) and descriptive information (Context). We made the RRD-Bio database publicly available (https://zenodo.org/records/10526493) to enhance the visibility of biomedical research resources, the ability to preserve important resources, and the reproducibility of biomedical research.
- Published
- 2024
27. Thermodynamic topological classes of the rotating, accelerating black holes
- Author
-
Liu, Wentao, Zhang, Li, Wu, Di, and Wang, Jieci
- Subjects
High Energy Physics - Theory ,General Relativity and Quantum Cosmology - Abstract
In this paper, we extend our previous work [D. Wu, Phys. Rev. D 108, 084041 (2023)] to more general cases by including a rotation parameter. We investigate the topological numbers for the rotating, accelerating neutral black hole and its AdS extension, as well as the rotating, accelerating charged black hole and its AdS extension. We find that the topological number of an asymptotically flat accelerating black hole consistently differs by one from that of its non-accelerating counterpart. Furthermore, we show that for an asymptotically AdS accelerating black hole, the topological number is reduced by one compared to its non-accelerating AdS counterpart. In addition, we demonstrate that within the framework of general relativity, the acceleration parameter and the negative cosmological constant each independently add one to the topological number. However, when both factors are present, their effects neutralize each other, resulting in no overall change to the topological number., Comment: 28 pages, 9 figures, 1 table, JHEP3.cls
- Published
- 2024
28. Dynamical topological phase transition in cold Rydberg quantum gases
- Author
-
Zhang, Jun, Wang, Ya-Jun, Liu, Bang, Zhang, Li-Hua, Zhang, Zheng-Yuan, Shao, Shi-Yao, Li, Qing, Chen, Han-Chao, Ma, Yu, Han, Tian-Yu, Wang, Qi-Feng, Nan, Jia-Dou, Yin, Yi-Ming, Zhu, Dong-Yang, Shi, Bao-Sen, and Ding, Dong-Sheng
- Subjects
Condensed Matter - Quantum Gases ,Quantum Physics - Abstract
The study of phase transitions provides insight into how a many-body system behaves under different conditions, enabling us to understand symmetry breaking, critical phenomena, and topological properties. Strong long-range interactions in highly excited Rydberg atoms create a versatile platform for exploring exotic emergent topological phases. Here, we report the experimental observation of dynamical topological phase transitions in cold Rydberg atomic gases driven by a microwave field. By measuring the system's transmission curves while varying the probe intensity, we observe complex hysteresis trajectories characterized by distinct winding numbers as they cross the critical point. At the transition state, where the winding number flips, the topology of these hysteresis trajectories evolves into more non-trivial structures. The topological trajectories are shown to be robust against noise, confirming their rigidity under dynamic conditions. These findings provide insight into the emergence of complex dynamical topological phases in many-body systems.
- Published
- 2024
29. COSCO: A Sharpness-Aware Training Framework for Few-shot Multivariate Time Series Classification
- Author
-
Barreda, Jesus, Gomez, Ashley, Puga, Ruben, Zhou, Kaixiong, and Zhang, Li
- Subjects
Computer Science - Machine Learning ,Computer Science - Artificial Intelligence ,Computer Science - Neural and Evolutionary Computing - Abstract
Multivariate time series classification is an important task with applications in a wide range of domains. Recently, deep neural networks (DNNs) have achieved state-of-the-art performance in time series classification. However, they often require large expert-labeled training datasets, which can be infeasible to obtain in practice. In few-shot settings, i.e., when only a limited number of samples per class are available in the training data, DNNs show a significant drop in testing accuracy and poor generalization ability. In this paper, we propose to address these problems from an optimization and a loss function perspective. Specifically, we propose a new learning framework named COSCO consisting of a sharpness-aware minimization (SAM) optimization and a Prototypical loss function to improve the generalization ability of DNNs for multivariate time series classification problems under the few-shot setting. Our experiments demonstrate that our proposed method outperforms the existing baseline methods. Our source code is available at: https://github.com/JRB9/COSCO., Comment: 5 pages, 5 figures, CIKM '24 Short Paper Track
- Published
- 2024
30. On Admissibility in Bipartite Incidence Graph Sampling
- Author
-
García-Segador, Pedro and Zhang, Li-Chun
- Subjects
Mathematics - Statistics Theory ,62D05 - Abstract
In bipartite incidence graph sampling, the target study units may be formed as connected population elements, which are distinct from the sampling units, and there may generally exist more than one way by which a given study unit can be observed via sampling units. This generalizes finite-population element or multistage sampling, where each element can only be sampled directly or via a single primary sampling unit. We study the admissibility of estimators in bipartite incidence graph sampling and identify admissible estimators other than the classic Horvitz-Thompson estimator. Our admissibility results encompass those for finite-population sampling., Comment: 20 pages, 7 figures
- Published
- 2024
31. Awaking the Slides: A Tuning-free and Knowledge-regulated AI Tutoring System via Language Model Coordination
- Author
-
Zhang-Li, Daniel, Zhang, Zheyuan, Yu, Jifan, Yin, Joy Lim Jia, Tu, Shangqing, Gong, Linlu, Wang, Haohua, Liu, Zhiyuan, Liu, Huiqin, Hou, Lei, and Li, Juanzi
- Subjects
Computer Science - Computation and Language ,Computer Science - Artificial Intelligence ,Computer Science - Human-Computer Interaction - Abstract
The vast number of existing lecture slides serve as rich and important materials carrying lecture knowledge. However, effectively leveraging lecture slides to serve students is difficult due to the multi-modal nature of slide content and the heterogeneous teaching actions. We study the problem of discovering effective designs that convert a slide into an interactive lecture. We develop Slide2Lecture, a tuning-free and knowledge-regulated intelligent tutoring system that can (1) effectively convert an input lecture slide into a structured teaching agenda consisting of a set of heterogeneous teaching actions; and (2) create and manage an interactive lecture that generates responsive interactions catering to student learning demands while regulating the interactions to follow the teaching actions. Slide2Lecture contains a complete pipeline that gives learners an interactive classroom experience for learning the slide content. For teachers and developers, Slide2Lecture supports customization to meet personalized demands. Evaluations rated by annotators and students show that Slide2Lecture outperforms the alternative implementations. Slide2Lecture's online deployment has handled more than 200K interactions with students across 3K lecture sessions. We open-source Slide2Lecture's implementation at https://anonymous.4open.science/r/slide2lecture-4210/.
- Published
- 2024
32. Multi-scale Feature Fusion with Point Pyramid for 3D Object Detection
- Author
-
Lu, Weihao, Zhao, Dezong, Premebida, Cristiano, Zhang, Li, Zhao, Wenjing, and Tian, Daxin
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Robotics ,Electrical Engineering and Systems Science - Systems and Control - Abstract
Effective point cloud processing is crucial to LiDAR-based autonomous driving systems. The capability to understand features at multiple scales is required for object detection in intelligent vehicles, where road users may appear in different sizes. Recent methods focus on the design of feature aggregation operators, which collect features at different scales from the encoder backbone and assign them to the points of interest. While much effort has gone into the aggregation modules, the importance of how to fuse these multi-scale features has been overlooked. This leads to insufficient feature communication across scales. To address this issue, this paper proposes the Point Pyramid RCNN (POP-RCNN), a feature pyramid-based framework for 3D object detection on point clouds. POP-RCNN consists of a Point Pyramid Feature Enhancement (PPFE) module that establishes connections across spatial scales and semantic depths for information exchange. The PPFE module effectively fuses multi-scale features for rich information without increasing the complexity of feature aggregation. To mitigate the impact of inconsistent point densities, a point density confidence module is deployed. This integrated design enables the use of a lightweight feature aggregator and an emphasis on both shallow and deep semantics, realising a framework for 3D object detection. Being highly adaptable, the proposed method can be applied to a variety of existing frameworks to increase feature richness, especially for long-distance detection. When the PPFE is adopted in the voxel-based and point-voxel-based baselines, experimental results on KITTI and the Waymo Open Dataset show that the proposed method achieves remarkable performance even with limited computational headroom., Comment: 12 pages
- Published
- 2024
33. From MOOC to MAIC: Reshaping Online Teaching and Learning through LLM-driven Agents
- Author
-
Yu, Jifan, Zhang, Zheyuan, Zhang-li, Daniel, Tu, Shangqing, Hao, Zhanxin, Li, Rui Miao, Li, Haoxuan, Wang, Yuanchun, Li, Hanming, Gong, Linlu, Cao, Jie, Lin, Jiayin, Zhou, Jinchang, Qin, Fei, Wang, Haohua, Jiang, Jianxiao, Deng, Lijun, Zhan, Yisi, Xiao, Chaojun, Dai, Xusheng, Yan, Xuan, Lin, Nianyi, Zhang, Nan, Ni, Ruixin, Dang, Yang, Hou, Lei, Zhang, Yu, Han, Xu, Li, Manli, Li, Juanzi, Liu, Zhiyuan, Liu, Huiqin, and Sun, Maosong
- Subjects
Computer Science - Computers and Society ,Computer Science - Computation and Language - Abstract
Since the first instances of online education, where courses were uploaded to accessible and shared online platforms, this form of scaling the dissemination of human knowledge to reach a broader audience has sparked extensive discussion and widespread adoption. Recognizing that personalized learning still holds significant potential for improvement, new AI technologies have been continuously integrated into this learning format, resulting in a variety of educational AI applications such as educational recommendation and intelligent tutoring. The emergence of intelligence in large language models (LLMs) has allowed for these educational enhancements to be built upon a unified foundational model, enabling deeper integration. In this context, we propose MAIC (Massive AI-empowered Course), a new form of online education that leverages LLM-driven multi-agent systems to construct an AI-augmented classroom, balancing scalability with adaptivity. Beyond exploring the conceptual framework and technical innovations, we conduct preliminary experiments at Tsinghua University, one of China's leading universities. Drawing from over 100,000 learning records of more than 500 students, we obtain a series of valuable observations and initial analyses. This project will continue to evolve, ultimately aiming to establish a comprehensive open platform that supports and unifies research, technology, and applications in exploring the possibilities of online education in the era of large model AI. We envision this platform as a collaborative hub, bringing together educators, researchers, and innovators to collectively explore the future of AI-driven online education.
- Published
- 2024
34. Longitudinal associations between microRNAs and weight in the diabetes prevention program
- Author
-
Flowers, Elena, Stroebel, Benjamin, Lewis, Kimberly A, Aouizerat, Bradley E, Gadgil, Meghana, Kanaya, Alka M, Zhang, Li, and Gong, Xingyue
- Subjects
Biomedical and Clinical Sciences ,Clinical Sciences ,Behavioral and Social Science ,Clinical Research ,Nutrition ,Biotechnology ,Genetics ,Diabetes ,Obesity ,Prevention ,2.1 Biological and endogenous factors ,Cardiovascular ,Metabolic and endocrine ,Cancer ,Good Health and Well Being ,Nutrition and Dietetics ,Clinical sciences - Abstract
Objective: Circulating microRNAs show cross-sectional associations with overweight and obesity. Few studies have provided data to differentiate between a snapshot perspective on these associations versus how microRNAs characterize prodromal risk from disease pathology and complications. This study assessed longitudinal relationships between circulating microRNAs and weight at multiple time-points in the Diabetes Prevention Program trial. Research design and methods: A subset of participants (n=150) from the Diabetes Prevention Program was included. MicroRNAs were measured from banked plasma using a Fireplex Assay. We used generalized linear mixed models to evaluate relationships between microRNAs and changes in weight at baseline, year 1, and year 2. Logistic regression was used to evaluate whether microRNAs at baseline were associated with weight change after 2 years. Results: In fully adjusted models that included relevant covariates, seven miRs (including miR-126, miR-15a, miR-192, miR-23a, and miR-27a) were statistically associated with weight over 2 years. MiR-197 and miR-320a remained significant after adjustment for multiple comparisons. Baseline levels of let-7f, miR-17, and miR-320c were significantly associated with 3% weight loss after 2 years in fully adjusted models. Discussion: This study provided evidence for longitudinal relationships between circulating microRNAs and weight. Because microRNAs characterize the combined effects of genetic determinants and responses to behavioral determinants, they may provide insights about the etiology of overweight and obesity in the context of risk for common, complex diseases. Additional studies are needed to validate the potential genes and biological pathways that might be targeted by these microRNA biomarkers and have mechanistic implications for weight loss and disease prevention.
- Published
- 2024
35. Generative AI Enables Medical Image Segmentation in Ultra Low-Data Regimes
- Author
-
Zhang, Li, Jindal, Basu, Alaa, Ahmed, Weinreb, Robert, Wilson, David, Segal, Eran, Zou, James, and Xie, Pengtao
- Subjects
Electrical Engineering and Systems Science - Image and Video Processing ,Computer Science - Computer Vision and Pattern Recognition - Abstract
Semantic segmentation of medical images is pivotal in applications like disease diagnosis and treatment planning. While deep learning has excelled in automating this task, a major hurdle is the need for numerous annotated segmentation masks, which are resource-intensive to produce due to the required expertise and time. This scenario often leads to ultra low-data regimes, where annotated images are extremely limited, posing significant challenges for the generalization of conventional deep learning methods on test images. To address this, we introduce a generative deep learning framework, which uniquely generates high-quality paired segmentation masks and medical images, serving as auxiliary data for training robust models in data-scarce environments. Unlike traditional generative models that treat data generation and segmentation model training as separate processes, our method employs multi-level optimization for end-to-end data generation. This approach allows segmentation performance to directly influence the data generation process, ensuring that the generated data is specifically tailored to enhance the performance of the segmentation model. Our method demonstrated strong generalization performance across 9 diverse medical image segmentation tasks and on 16 datasets, in ultra-low data regimes, spanning various diseases, organs, and imaging modalities. When applied to various segmentation models, it achieved performance improvements of 10-20% (absolute), in both same-domain and out-of-domain scenarios. Notably, it requires 8 to 20 times less training data than existing methods to achieve comparable results. This advancement significantly improves the feasibility and cost-effectiveness of applying deep learning in medical imaging, particularly in scenarios with limited data availability.
- Published
- 2024
36. Structured Event Reasoning with Large Language Models
- Author
-
Zhang, Li
- Subjects
Computer Science - Computation and Language - Abstract
Reasoning about real-life events is a unifying challenge in AI and NLP that has profound utility in a variety of domains, while errors in high-stakes applications could be catastrophic. Able to work with diverse text in these domains, large language models (LLMs) have proven capable of answering questions and solving problems. However, I show that end-to-end LLMs still systematically fail to reason about complex events, and they lack interpretability due to their black-box nature. To address these issues, I propose three general approaches to use LLMs in conjunction with a structured representation of events. The first is a language-based representation involving relations of sub-events that can be learned by LLMs via fine-tuning. The second is a semi-symbolic representation involving states of entities that can be predicted and leveraged by LLMs via few-shot prompting. The third is a fully symbolic representation that can be predicted by LLMs trained with structured data and be executed by symbolic solvers. On a suite of event reasoning tasks spanning common-sense inference and planning, I show that each approach greatly outperforms end-to-end LLMs with more interpretability. These results suggest ways in which LLMs and structured representations can work together for event reasoning and beyond., Comment: PhD thesis
- Published
- 2024
37. CURE4Rec: A Benchmark for Recommendation Unlearning with Deeper Influence
- Author
-
Chen, Chaochao, Zhang, Jiaming, Zhang, Yizhao, Zhang, Li, Lyu, Lingjuan, Li, Yuyuan, Gong, Biao, and Yan, Chenggang
- Subjects
Computer Science - Information Retrieval ,Computer Science - Machine Learning - Abstract
With increasing privacy concerns in artificial intelligence, regulations have mandated the right to be forgotten, granting individuals the right to withdraw their data from models. Machine unlearning has emerged as a potential solution to enable selective forgetting in models, particularly in recommender systems where historical data contains sensitive user information. Despite recent advances in recommendation unlearning, evaluating unlearning methods comprehensively remains challenging due to the absence of a unified evaluation framework and overlooked aspects of deeper influence, e.g., fairness. To address these gaps, we propose CURE4Rec, the first comprehensive benchmark for recommendation unlearning evaluation. CURE4Rec covers four aspects, i.e., unlearning Completeness, recommendation Utility, unleaRning efficiency, and recommendation fairnEss, under three data selection strategies, i.e., core data, edge data, and random data. Specifically, we consider the deeper influence of unlearning on recommendation fairness and robustness towards data with varying impact levels. We construct multiple datasets with CURE4Rec evaluation and conduct extensive experiments on existing recommendation unlearning methods. Our code is released at https://github.com/xiye7lai/CURE4Rec.
- Published
- 2024
38. PAM: A Propagation-Based Model for Segmenting Any 3D Objects across Multi-Modal Medical Images
- Author
-
Chen, Zifan, Nan, Xinyu, Li, Jiazheng, Zhao, Jie, Li, Haifeng, Lin, Ziling, Li, Haoshen, Chen, Heyun, Liu, Yiting, Tang, Lei, Zhang, Li, and Dong, Bin
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Computer Science - Artificial Intelligence - Abstract
Volumetric segmentation is important in medical imaging, but current methods face challenges like requiring lots of manual annotations and being tailored to specific tasks, which limits their versatility. General segmentation models used for natural images don't perform well with the unique features of medical images. There's a strong need for an adaptable approach that can effectively handle different 3D medical structures and imaging modalities. In this study, we present PAM (Propagating Anything Model), a segmentation approach that uses a 2D prompt, like a bounding box or sketch, to create a complete 3D segmentation of medical image volumes. PAM works by modeling relationships between slices, maintaining information flow across the 3D structure. It combines a CNN-based UNet for processing within slices and a Transformer-based attention module for propagating information between slices, leading to better generalizability across various imaging modalities. PAM significantly outperformed existing models like MedSAM and SegVol, with an average improvement of over 18.1% in dice similarity coefficient (DSC) across 44 medical datasets and various object types. It also showed stable performance despite prompt deviations and different propagation setups, and faster inference speeds compared to other models. PAM's one-view prompt design made it more efficient, reducing interaction time by about 63.6% compared to two-view prompts. Thanks to its focus on structural relationships, PAM handled unseen and complex objects well, showing a unique ability to generalize to new situations. PAM represents an advancement in medical image segmentation, effectively reducing the need for extensive manual work and specialized training. Its adaptability makes it a promising tool for more automated and reliable analysis in clinical settings., Comment: 28 pages, 6 figures
- Published
- 2024
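The PAM abstract reports its gains in the Dice similarity coefficient (DSC), defined for two binary masks A and B as 2|A ∩ B| / (|A| + |B|). A minimal sketch of that metric for 3D masks stored as nested 0/1 lists (slices × rows × columns) is shown below; the helper names and the convention for two empty masks are assumptions, not PAM's evaluation code.

```python
from typing import List, Sequence

Volume = List[List[List[int]]]  # slices x rows x columns of 0/1 labels


def flatten(volume: Volume) -> List[int]:
    """Flatten a 3D mask into a single list of voxel labels."""
    return [voxel for slice_ in volume for row in slice_ for voxel in row]


def dice_coefficient(pred: Sequence[int], target: Sequence[int]) -> float:
    """DSC = 2|A ∩ B| / (|A| + |B|) for two flat binary masks."""
    if len(pred) != len(target):
        raise ValueError("masks must contain the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    if total == 0:
        # Both masks empty: scored as perfect agreement (an assumed convention).
        return 1.0
    return 2.0 * intersection / total


if __name__ == "__main__":
    pred = [[[1, 1], [0, 0]], [[1, 0], [0, 0]]]
    target = [[[1, 0], [0, 0]], [[1, 1], [0, 0]]]
    print(dice_coefficient(flatten(pred), flatten(target)))
```

Averaging this per-volume score over datasets and object types would give a summary figure of the kind quoted in the abstract, though the exact averaging used there is not stated.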
39. What Do You Want? User-centric Prompt Generation for Text-to-image Synthesis via Multi-turn Guidance
- Author
-
Liu, Yilun, He, Minggui, Yao, Feiyu, Ji, Yuhe, Tao, Shimin, Du, Jingzhou, Li, Duan, Gao, Jian, Zhang, Li, Yang, Hao, Chen, Boxing, and Yoshie, Osamu
- Subjects
Computer Science - Artificial Intelligence - Abstract
The emergence of text-to-image synthesis (TIS) models has significantly influenced digital image creation by producing high-quality visuals from written descriptions. Yet these models heavily rely on the quality and specificity of textual prompts, posing a challenge for novice users who may not be familiar with TIS-model-preferred prompt writing. Existing solutions relieve this via automatic model-preferred prompt generation from user queries. However, this single-turn manner suffers from limited user-centricity in terms of result interpretability and user interactivity. To address these issues, we propose DialPrompt, a multi-turn dialogue-based TIS prompt generation model that emphasises user-centricity. DialPrompt is designed to follow a multi-turn guidance workflow, where in each round of dialogue the model asks the user about their preferences along possible optimization dimensions before generating the final TIS prompt. To achieve this, we mined 15 essential dimensions for high-quality prompts from advanced users and curated a multi-turn dataset. Through training on this dataset, DialPrompt can improve interpretability by allowing users to understand the correlation between specific phrases and image attributes. Additionally, it enables greater user control and engagement in the prompt generation process, leading to more personalized and visually satisfying outputs. Experiments indicate that DialPrompt achieves a competitive result in the quality of synthesized images, outperforming existing prompt engineering approaches by 5.7%. Furthermore, in our user evaluation, DialPrompt outperforms existing approaches by 46.5% in user-centricity score and is rated 7.9/10 by 19 human reviewers.
- Published
- 2024
40. Enhancing Automated Program Repair with Solution Design
- Author
-
Zhao, Jiuang, Yang, Donghao, Zhang, Li, Lian, Xiaoli, Yang, Zitian, and Liu, Fang
- Subjects
Computer Science - Software Engineering ,Computer Science - Artificial Intelligence - Abstract
Automatic Program Repair (APR) endeavors to autonomously rectify issues within specific projects, which generally encompasses three categories of tasks: bug resolution, new feature development, and feature enhancement. Despite extensive research proposing various methodologies, their efficacy in addressing real issues remains unsatisfactory. It's worth noting that engineers typically have design rationales (DR) on solutions (planned solutions and a set of underlying reasons) before they start patching code. In open-source projects, these DRs are frequently captured in issue logs through project management tools like Jira. This raises a compelling question: How can we leverage DR scattered across the issue logs to efficiently enhance APR? To investigate this premise, we introduce DRCodePilot, an approach designed to augment GPT-4-Turbo's APR capabilities by incorporating DR into the prompt instruction. Furthermore, given GPT-4's constraints in fully grasping the broader project context and occasional shortcomings in generating precise identifiers, we have devised a feedback-based self-reflective framework, in which we prompt GPT-4 to reconsider and refine its outputs by referencing a provided patch and suggested identifiers. We have established a benchmark comprising 938 issue-patch pairs sourced from two open-source repositories hosted on GitHub and Jira. Our experimental results are impressive: DRCodePilot achieves a full-match ratio that is a remarkable 4.7x higher than when GPT-4 is utilized directly. The CodeBLEU scores also exhibit promising enhancements. Moreover, our findings reveal that the standalone application of DR can yield a promising increase in the full-match ratio across CodeLlama, GPT-3.5, and GPT-4 within our benchmark suite. We believe that our DRCodePilot initiative heralds a novel human-in-the-loop avenue for advancing the field of APR., Comment: *These authors contributed equally to this work. †Corresponding author. Will appear in ASE'24
- Published
- 2024
41. Folded multistability and hidden critical point in microwave-driven Rydberg atoms
- Author
-
Ma, Yu, Liu, Bang, Zhang, Li-Hua, Wang, Ya-Jun, Zhang, Zheng-Yuan, Shao, Shi-Yao, Li, Qing, Chen, Han-Chao, Zhang, Jun, Han, Tian-Yu, Wang, Qi-Feng, Nan, Jia-Dou, Yin, Yi-Ming, Zhu, Dong-Yang, Shi, Bao-Sen, and Ding, Dong-Sheng
- Subjects
Condensed Matter - Quantum Gases ,Quantum Physics - Abstract
The interactions between Rydberg atoms and microwave fields provide a valuable framework for studying complex out-of-equilibrium dynamics, exotic phases, and critical phenomena in many-body physics. This unique interplay allows us to explore various regimes of nonlinearity and phase transitions. Here, we observe a phase transition from the bistable regime to the multistable regime in strongly interacting Rydberg atoms by varying the microwave field intensity, accompanied by the breaking of Z3 symmetry. During the phase transition, the system passes through a hidden critical point, at which the multistable states are difficult to identify. By changing the initial state of the system, we can identify a hidden multistable state and reveal a hidden trajectory of the phase transition, allowing us to track down the hidden critical point. In addition, we observe multiple phase transitions in the spectra, suggesting higher-order symmetry breaking. The reported results shed light on manipulating multistability in dissipative Rydberg atom systems and hold promise for applications in non-equilibrium many-body physics., Comment: 10 pages, 5 figures
- Published
- 2024
42. Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers
- Author
-
Qi, Zhenting, Ma, Mingyuan, Xu, Jiahang, Zhang, Li Lyna, Yang, Fan, and Yang, Mao
- Subjects
Computer Science - Computation and Language - Abstract
This paper introduces rStar, a self-play mutual reasoning approach that significantly improves reasoning capabilities of small language models (SLMs) without fine-tuning or superior models. rStar decouples reasoning into a self-play mutual generation-discrimination process. First, a target SLM augments the Monte Carlo Tree Search (MCTS) with a rich set of human-like reasoning actions to construct higher-quality reasoning trajectories. Next, another SLM, with capabilities similar to the target SLM, acts as a discriminator to verify each trajectory generated by the target SLM. Trajectories on which both models agree are considered mutually consistent and are therefore more likely to be correct. Extensive experiments across five SLMs demonstrate that rStar can effectively solve diverse reasoning problems, including GSM8K, GSM-Hard, MATH, SVAMP, and StrategyQA. Remarkably, rStar boosts GSM8K accuracy from 12.51% to 63.91% for LLaMA2-7B, from 36.46% to 81.88% for Mistral-7B, and from 74.53% to 91.13% for LLaMA3-8B-Instruct. Code will be available at https://github.com/zhentingqi/rStar.
- Published
- 2024
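The rStar abstract describes a generate-then-verify step: the target SLM proposes reasoning trajectories and a second SLM checks each one, keeping only those both agree on. A minimal sketch of that filtering step follows, with both models abstracted as plain Python callables. Showing the discriminator only a prefix of each trajectory, the 0.5 prefix fraction, and the majority vote over surviving answers are illustrative assumptions, not rStar's released code; the MCTS generation stage is omitted entirely.

```python
from collections import Counter
from typing import Callable, List, Optional, Tuple

# A trajectory is a list of reasoning steps plus the final answer it reaches.
Trajectory = Tuple[List[str], str]
# Hypothetical discriminator: given the question and a partial trajectory,
# it completes the reasoning and returns its own final answer.
Discriminator = Callable[[str, List[str]], str]


def mutually_consistent(
    question: str,
    trajectories: List[Trajectory],
    discriminator: Discriminator,
    prefix_fraction: float = 0.5,
) -> List[Trajectory]:
    """Keep trajectories whose answer the discriminator reproduces from a prefix."""
    agreed = []
    for steps, answer in trajectories:
        cut = max(1, int(len(steps) * prefix_fraction))
        if discriminator(question, steps[:cut]) == answer:
            agreed.append((steps, answer))
    return agreed


def select_answer(agreed: List[Trajectory]) -> Optional[str]:
    """Majority vote over the answers of mutually consistent trajectories."""
    if not agreed:
        return None
    counts = Counter(answer for _, answer in agreed)
    return counts.most_common(1)[0][0]
```

In a real pipeline the discriminator would wrap a call to the second SLM; any function with the same signature can stand in for testing.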
43. DeepInteraction++: Multi-Modality Interaction for Autonomous Driving
- Author
-
Yang, Zeyu, Song, Nan, Li, Wei, Zhu, Xiatian, Zhang, Li, and Torr, Philip H. S.
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
Existing top-performance autonomous driving systems typically rely on the multi-modal fusion strategy for reliable scene understanding. However, this design is fundamentally restricted because it overlooks modality-specific strengths, ultimately hampering model performance. To address this limitation, in this work, we introduce a novel modality interaction strategy that allows individual per-modality representations to be learned and maintained throughout, enabling their unique characteristics to be exploited during the whole perception pipeline. To demonstrate the effectiveness of the proposed strategy, we design DeepInteraction++, a multi-modal interaction framework characterized by a multi-modal representational interaction encoder and a multi-modal predictive interaction decoder. Specifically, the encoder is implemented as a dual-stream Transformer with specialized attention operations for information exchange and integration between separate modality-specific representations. Our multi-modal representational learning incorporates both object-centric, precise sampling-based feature alignment and global dense information spreading, essential for the more challenging planning task. The decoder is designed to iteratively refine the predictions by alternately aggregating information from separate representations in a unified modality-agnostic manner, realizing multi-modal predictive interaction. Extensive experiments demonstrate the superior performance of the proposed framework on both 3D object detection and end-to-end autonomous driving tasks. Our code is available at https://github.com/fudan-zvg/DeepInteraction., Comment: Journal extension of NeurIPS 2022. arXiv admin note: text overlap with arXiv:2208.11112
- Published
- 2024
44. Exceptional point and hysteresis trajectories in cold Rydberg atomic gases
- Author
-
Zhang, Jun, Li, En-Ze, Wang, Ya-Jun, Liu, Bang, Zhang, Li-Hua, Zhang, Zheng-Yuan, Shao, Shi-Yao, Li, Qing, Chen, Han-Chao, Ma, Yu, Han, Tian-Yu, Wang, Qi-Feng, Nan, Jia-Dou, Ying, Yi-Ming, Zhu, Dong-Yang, Shi, Bao-Sen, and Ding, Dong-Sheng
- Subjects
Condensed Matter - Quantum Gases ,Quantum Physics - Abstract
The interplay between strong long-range interactions and coherent driving contributes to the formation of complex patterns, symmetry, and novel phases of matter in many-body systems. However, long-range interactions may induce an additional dissipation channel, resulting in non-Hermitian many-body dynamics and the emergence of exceptional points in the spectrum. Here, we report the experimental observation of interaction-induced exceptional points in cold Rydberg atomic gases, revealing the breaking of charge-conjugation parity symmetry. By measuring the transmission spectrum under increasing and decreasing probe intensity, we observe interaction-induced hysteresis trajectories, which give rise to non-Hermitian dynamics. We record the area enclosed by the hysteresis loops and investigate their dynamics. The reported exceptional points and hysteresis trajectories in cold Rydberg atomic gases provide valuable insights into the underlying non-Hermitian physics in many-body systems, allowing us to study the interplay between long-range interactions and non-Hermiticity.
- Published
- 2024
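The hysteresis abstract mentions recording the area enclosed by hysteresis loops. If a loop is sampled as (probe intensity, transmission) points along the increasing sweep followed by the decreasing sweep, one standard way to estimate that area is the shoelace formula, sketched below. This is an assumption about the analysis, not the authors' procedure, and it presumes a simple (non-self-crossing) loop, since the signed contributions of a figure-eight loop would partially cancel.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (probe intensity, transmission), arbitrary units


def loop_area(points: Sequence[Point]) -> float:
    """Area of a simple closed polygon via the shoelace formula."""
    n = len(points)
    if n < 3:
        return 0.0
    twice_area = 0.0
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]  # wrap around to close the loop
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def hysteresis_loop(up_sweep: List[Point], down_sweep: List[Point]) -> List[Point]:
    """Join an increasing sweep and a decreasing sweep into one closed loop."""
    # Duplicated turning points are harmless: they add zero to the shoelace sum.
    return list(up_sweep) + list(down_sweep)
```

For example, an up-sweep through (0, 0), (1, 0), (2, 1) and a down-sweep back through (1, 1) to (0, 0) trace a parallelogram of area 1.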
45. Nighttime Pedestrian Detection Based on Fore-Background Contrast Learning
- Author
-
Yao, He, Zhang, Yongjun, Jian, Huachun, Zhang, Li, and Cheng, Ruzhong
- Subjects
Computer Science - Computer Vision and Pattern Recognition - Abstract
The significance of background information is frequently overlooked in contemporary research concerning channel attention mechanisms. This study addresses the issue of suboptimal single-spectral nighttime pedestrian detection performance under low-light conditions by incorporating background information into the channel attention mechanism. Despite numerous studies focusing on the development of efficient channel attention mechanisms, the relevance of background information has been largely disregarded. By adopting a contrast learning approach, we reexamine channel attention with regard to pedestrian objects and background information for nighttime pedestrian detection, resulting in the proposed Fore-Background Contrast Attention (FBCA). FBCA possesses two primary attributes: (1) channel descriptors form remote dependencies with global spatial feature information; (2) the integration of background information enhances the distinction between channels concentrating on low-light pedestrian features and those focusing on background information. Consequently, the acquired channel descriptors exhibit a higher semantic level and spatial accuracy. Experimental outcomes demonstrate that FBCA significantly outperforms existing methods in single-spectral nighttime pedestrian detection, achieving state-of-the-art results on the NightOwls and TJU-DHD-pedestrian datasets. Furthermore, this methodology also yields performance improvements for the multispectral LLVIP dataset. These findings indicate that integrating background information into the channel attention mechanism effectively mitigates detector performance degradation caused by illumination factors in nighttime scenarios.
- Published
- 2024
46. MoleNetwork: A tool for the generation of synthetic optical network topologies
- Author
-
Sánchez-Macián, Alfonso, Koneva, Nataliia, Quagliotti, Marco, Rivas-Moscoso, José M., Arpanaei, Farhad, Hernández, José Alberto, Fernández-Palacios, Juan P., Zhang, Li, and Riccardi, Emilio
- Subjects
Computer Science - Networking and Internet Architecture - Abstract
Model networks and their underlying topologies have been used as a reference for techno-economic studies for several decades. Existing reference topologies for optical networks may cover different network segments such as backbone, metro core, metro aggregation, access and/or data center. While telco operators work on the optimization of their own existing deployed optical networks, the availability of different topologies is useful for researchers and technology developers to test their solutions in a variety of scenarios and validate the performance in terms of energy efficiency or cost reduction. This paper presents an open-source tool, MoleNetwork, to generate graphs inspired by real network topologies of telecommunication operators that can be used as benchmarks for techno-economic studies.
- Published
- 2024
47. Controllable Unlearning for Image-to-Image Generative Models via $\varepsilon$-Constrained Optimization
- Author
-
Feng, Xiaohua, Chen, Chaochao, Li, Yuyuan, and Zhang, Li
- Subjects
Computer Science - Machine Learning ,Computer Science - Artificial Intelligence ,Computer Science - Computer Vision and Pattern Recognition - Abstract
While generative models have made significant advancements in recent years, they also raise concerns such as privacy breaches and biases. Machine unlearning has emerged as a viable solution, aiming to remove specific training data, e.g., data containing private information or bias, from models. In this paper, we study the machine unlearning problem in Image-to-Image (I2I) generative models. Previous studies mainly treat it as a single-objective optimization problem, offering a solitary solution and thereby neglecting varied user expectations regarding the trade-off between complete unlearning and model utility. To address this issue, we propose a controllable unlearning framework that uses a control coefficient $\varepsilon$ to control the trade-off. We reformulate the I2I generative model unlearning problem into an $\varepsilon$-constrained optimization problem and solve it with a gradient-based method to find optimal solutions for unlearning boundaries. These boundaries define the valid range for the control coefficient. Within this range, every resulting solution is theoretically guaranteed to be Pareto optimal. We also analyze the convergence rate of our framework under various control functions. Extensive experiments on two benchmark datasets across three mainstream I2I models demonstrate the effectiveness of our controllable unlearning framework., Comment: 40 pages, 54 figures
- Published
- 2024
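The ε-constrained idea in the abstract can be illustrated on a one-dimensional toy problem: minimize a hypothetical "unlearning" loss f1(x) = (x - 1)^2 subject to a hypothetical "utility" loss f2(x) = x^2 <= ε, solved with projected gradient descent. Both objectives, the step size, and the choice of projection are stand-ins chosen only to show how ε trades one objective against the other; they are not the paper's losses or solver.

```python
import math
from typing import List, Tuple


def f1(x: float) -> float:
    """Hypothetical unlearning loss (lower means more complete forgetting)."""
    return (x - 1.0) ** 2


def f2(x: float) -> float:
    """Hypothetical utility loss (lower means less damage to the model)."""
    return x ** 2


def solve_eps_constrained(eps: float, lr: float = 0.1, steps: int = 500) -> float:
    """Minimize f1(x) subject to f2(x) <= eps with projected gradient descent."""
    if eps < 0:
        raise ValueError("eps must be non-negative")
    radius = math.sqrt(eps)  # feasible set {x : x**2 <= eps} is [-radius, radius]
    x = 0.0  # feasible starting point for every eps >= 0
    for _ in range(steps):
        x -= lr * 2.0 * (x - 1.0)  # gradient step on f1
        x = max(-radius, min(radius, x))  # project back onto the feasible set
    return x


def trade_off_curve(eps_values: List[float]) -> List[Tuple[float, float, float]]:
    """Return (eps, f1, f2) at the constrained optimum for each eps."""
    curve = []
    for e in eps_values:
        x = solve_eps_constrained(e)
        curve.append((e, f1(x), f2(x)))
    return curve
```

In this toy, any ε >= 1 leaves the constraint inactive and returns the unconstrained optimum x = 1, so only ε in [0, 1] moves the solution, and each solution in that range lies on the toy problem's Pareto front. This is only loosely analogous to the "unlearning boundaries" the abstract describes.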
48. DuA: Dual Attentive Transformer in Long-Term Continuous EEG Emotion Analysis
- Author
-
Pan, Yue, Liu, Qile, Liu, Qing, Zhang, Li, Huang, Gan, Chen, Xin, Li, Fali, Xu, Peng, and Liang, Zhen
- Subjects
Computer Science - Human-Computer Interaction ,Computer Science - Artificial Intelligence - Abstract
Affective brain-computer interfaces (aBCIs) are increasingly recognized for their potential in monitoring and interpreting emotional states through electroencephalography (EEG) signals. Current EEG-based emotion recognition methods perform well with short segments of EEG data. However, these methods encounter significant challenges in real-life scenarios where emotional states evolve over extended periods. To address this issue, we propose a Dual Attentive (DuA) transformer framework for long-term continuous EEG emotion analysis. Unlike segment-based approaches, the DuA transformer processes an entire EEG trial as a whole, identifying emotions at the trial level, referred to as trial-based emotion analysis. This framework is designed to adapt to varying signal lengths, providing a substantial advantage over traditional methods. The DuA transformer incorporates three key modules: the spatial-spectral network module, the temporal network module, and the transfer learning module. The spatial-spectral network module simultaneously captures spatial and spectral information from EEG signals, while the temporal network module detects temporal dependencies within long-term EEG data. The transfer learning module enhances the model's adaptability across different subjects and conditions. We extensively evaluate the DuA transformer using a self-constructed long-term EEG emotion database, along with two benchmark EEG emotion databases. On the basis of the trial-based leave-one-subject-out cross-subject cross-validation protocol, our experimental results demonstrate that the proposed DuA transformer significantly outperforms existing methods in long-term continuous EEG emotion analysis, with an average enhancement of 5.28%., Comment: 11 pages, 3 figures
- Published
- 2024
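The DuA abstract evaluates with a trial-based leave-one-subject-out (LOSO) cross-subject protocol. A minimal sketch of how such folds can be built is shown below, assuming each trial is identified by a (trial_id, subject_id) pair: each fold holds out all trials of one subject for testing, and whole trials (not segments) go to one side or the other. The data layout and function name are hypothetical.

```python
from typing import Iterator, List, Tuple

Trial = Tuple[str, str]  # (trial_id, subject_id)


def leave_one_subject_out(
    trials: List[Trial],
) -> Iterator[Tuple[str, List[str], List[str]]]:
    """Yield (held_out_subject, train_trial_ids, test_trial_ids) per fold."""
    # Unique subjects in first-seen order.
    subjects = list(dict.fromkeys(subject for _, subject in trials))
    for held_out in subjects:
        train = [trial for trial, subject in trials if subject != held_out]
        test = [trial for trial, subject in trials if subject == held_out]
        yield held_out, train, test
```

A cross-subject score is then typically the mean of the per-fold metrics, so no subject's data ever appears in both training and testing within a fold.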
49. MoE-DiffIR: Task-customized Diffusion Priors for Universal Compressed Image Restoration
- Author
-
Ren, Yulin, Li, Xin, Li, Bingchen, Wang, Xingrui, Guo, Mengxi, Zhao, Shijie, Zhang, Li, and Chen, Zhibo
- Subjects
Electrical Engineering and Systems Science - Image and Video Processing ,Computer Science - Computer Vision and Pattern Recognition - Abstract
We present MoE-DiffIR, an innovative universal compressed image restoration (CIR) method with task-customized diffusion priors. This intends to handle two pivotal challenges in the existing CIR methods: (i) lacking adaptability and universality for different image codecs, e.g., JPEG and WebP; (ii) poor texture generation capability, particularly at low bitrates. Specifically, our MoE-DiffIR develops the powerful mixture-of-experts (MoE) prompt module, where some basic prompts cooperate to excavate the task-customized diffusion priors from Stable Diffusion (SD) for each compression task. Moreover, the degradation-aware routing mechanism is proposed to enable the flexible assignment of basic prompts. To activate and reuse the cross-modality generation prior of SD, we design the visual-to-text adapter for MoE-DiffIR, which aims to adapt the embedding of low-quality images from the visual domain to the textual domain as the textual guidance for SD, enabling more consistent and reasonable texture generation. We also construct one comprehensive benchmark dataset for universal CIR, covering 21 types of degradations from 7 popular traditional and learned codecs. Extensive experiments on universal CIR have demonstrated the excellent robustness and texture restoration capability of our proposed MoE-DiffIR. The project can be found at https://renyulin-f.github.io/MoE-DiffIR.github.io/., Comment: Accepted by ECCV 2024
- Published
- 2024
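The MoE-DiffIR abstract describes a mixture-of-experts prompt module in which a degradation-aware router assigns basic prompts to each compression task. One common way to realize such routing is top-k softmax gating over a small bank of prompt vectors, sketched below. The router logits are taken as given (in the paper they would be derived from degradation-aware features), and the top-k gating, the renormalization, and the vector sizes are assumptions rather than the paper's design.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route_prompts(
    logits: Sequence[float],
    basic_prompts: Sequence[Sequence[float]],
    k: int = 2,
) -> Tuple[List[float], List[int]]:
    """Combine basic prompt vectors using top-k softmax routing weights."""
    if len(logits) != len(basic_prompts):
        raise ValueError("need one router logit per basic prompt")
    if not 1 <= k <= len(basic_prompts):
        raise ValueError("k must be between 1 and the number of prompts")
    weights = softmax(logits)
    # Indices of the k largest weights (ties keep their original order).
    top = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)[:k]
    norm = sum(weights[i] for i in top)
    combined = [0.0] * len(basic_prompts[0])
    for i in top:
        w = weights[i] / norm  # renormalize over the selected prompts
        for d, value in enumerate(basic_prompts[i]):
            combined[d] += w * value
    return combined, top
```

Keeping only the top-k prompts lets different degradations share some prompts while still receiving a task-specific mixture.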
50. Whisper-SV: Adapting Whisper for Low-data-resource Speaker Verification
- Author
-
Zhang, Li, Jiang, Ning, Wang, Qing, Li, Yue, Lu, Quan, and Xie, Lei
- Subjects
Computer Science - Sound ,Electrical Engineering and Systems Science - Audio and Speech Processing - Abstract
Trained on 680,000 hours of speech data, Whisper is a multitasking, multilingual speech foundation model demonstrating superior performance in automatic speech recognition, translation, and language identification. However, its applicability in speaker verification (SV) tasks remains unexplored, particularly in low-data-resource scenarios where labeled speaker data in specific domains are limited. To fill this gap, we propose a lightweight adaptor framework to boost SV with Whisper, namely Whisper-SV. Given that Whisper is not specifically optimized for SV tasks, we introduce a representation selection module to quantify the speaker-specific characteristics contained in each layer of Whisper and select the top-k layers with prominent discriminative speaker features. To aggregate pivotal speaker-related features while diminishing non-speaker redundancies across the selected top-k distinct layers of Whisper, we design a multi-layer aggregation module in Whisper-SV to integrate multi-layer representations into a single, compact representation for SV. In the multi-layer aggregation module, we employ convolutional layers with shortcut connections among different layers to refine speaker characteristics derived from multi-layer representations from Whisper. In addition, an attention aggregation layer is used to reduce non-speaker interference and amplify speaker-specific cues for SV tasks. Finally, a simple classification module is used for speaker classification. Experiments on VoxCeleb1, FFSVC, and IMSV datasets demonstrate that Whisper-SV achieves EER/minDCF of 2.22%/0.307, 6.14%/0.488, and 7.50%/0.582, respectively, showing superior performance in low-data-resource SV scenarios.
- Published
- 2024
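Whisper-SV is evaluated by equal error rate (EER), the operating point at which the false-acceptance rate (FAR) on impostor trials equals the false-rejection rate (FRR) on target trials. A minimal sketch that sweeps thresholds over the observed scores and reports the mean of FAR and FRR where they are closest is shown below. Evaluation toolkits usually interpolate the ROC curve, so this discrete version is an approximation, and the function name is hypothetical.

```python
from typing import Sequence


def equal_error_rate(
    target_scores: Sequence[float], impostor_scores: Sequence[float]
) -> float:
    """Approximate EER by sweeping thresholds over all observed scores."""
    if not target_scores or not impostor_scores:
        raise ValueError("need at least one target and one impostor score")
    best_gap, best_eer = float("inf"), 1.0
    for thr in sorted(set(target_scores) | set(impostor_scores)):
        # Accept a trial when its score is >= thr.
        far = sum(s >= thr for s in impostor_scores) / len(impostor_scores)
        frr = sum(s < thr for s in target_scores) / len(target_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, best_eer = gap, (far + frr) / 2.0
    return best_eer
```

Perfectly separated target and impostor scores yield an EER of 0; heavily overlapping scores push it toward 0.5.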
Discovery Service for Jio Institute Digital Library