Author: "Chen, Jian" / Publication Year Range: Last 3 years - Searchworks@Jio Institute Digital Library Search Results

Your search for the keyword "Chen, Jian" returned 15,221 results.

1. A Survey of Medical Vision-and-Language Applications and Their Techniques

Author: Chen, Qi, Zhao, Ruoshan, Wang, Sinuo, Phan, Vu Minh Hieu, Hengel, Anton van den, Verjans, Johan, Liao, Zhibin, To, Minh-Son, Xia, Yong, Chen, Jian, Xie, Yutong, and Wu, Qi
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Medical vision-and-language models (MVLMs) have attracted substantial interest due to their capability to offer a natural language interface for interpreting complex medical data. Their applications are versatile and have the potential to improve diagnostic accuracy and decision-making for individual patients while also contributing to enhanced public health monitoring, disease surveillance, and policy-making through more efficient analysis of large data sets. MVLMs integrate natural language processing with medical images to enable a more comprehensive and contextual understanding of medical images alongside their corresponding textual information. Unlike general vision-and-language models trained on diverse, non-specialized datasets, MVLMs are purpose-built for the medical domain, automatically extracting and interpreting critical information from medical images and textual reports to support clinical decision-making. Popular clinical applications of MVLMs include automated medical report generation, medical visual question answering, medical multimodal segmentation, diagnosis and prognosis, and medical image-text retrieval. Here, we provide a comprehensive overview of MVLMs and the various medical tasks to which they have been applied. We conduct a detailed analysis of various vision-and-language model architectures, focusing on their distinct strategies for cross-modal integration/exploitation of medical visual and textual features. We also examine the datasets used for these tasks and compare the performance of different models based on standardized evaluation metrics. Furthermore, we highlight potential challenges and summarize future research trends and directions. The full collection of papers and codes is available at: https://github.com/YtongXie/Medical-Vision-and-Language-Tasks-and-Methodologies-A-Survey.
Published: 2024

2. MICCAI-CDMRI 2023 QuantConn Challenge Findings on Achieving Robust Quantitative Connectivity through Harmonized Preprocessing of Diffusion MRI

Author: Newlin, Nancy R., Schilling, Kurt, Koudoro, Serge, Chandio, Bramsh Qamar, Kanakaraj, Praitayini, Moyer, Daniel, Kelly, Claire E., Genc, Sila, Chen, Jian, Yang, Joseph Yuan-Mou, Wu, Ye, He, Yifei, Zhang, Jiawei, Zeng, Qingrun, Zhang, Fan, Adluru, Nagesh, Nath, Vishwesh, Pathak, Sudhir, Schneider, Walter, Gade, Anurag, Rathi, Yogesh, Hendriks, Tom, Vilanova, Anna, Chamberland, Maxime, Pieciak, Tomasz, Ciupek, Dominika, Vega, Antonio Tristán, Aja-Fernández, Santiago, Malawski, Maciej, Ouedraogo, Gani, Machnio, Julia, Ewert, Christian, Thompson, Paul M., Jahanshad, Neda, Garyfallidis, Eleftherios, and Landman, Bennett A.
Subjects: Physics - Medical Physics, Computer Science - Machine Learning
Abstract: White matter alterations are increasingly implicated in neurological diseases and their progression. International-scale studies use diffusion-weighted magnetic resonance imaging (DW-MRI) to qualitatively identify changes in white matter microstructure and connectivity. Yet, quantitative analysis of DW-MRI data is hindered by inconsistencies stemming from varying acquisition protocols. There is a pressing need to harmonize the preprocessing of DW-MRI datasets to ensure the derivation of robust quantitative diffusion metrics across acquisitions. In the MICCAI-CDMRI 2023 QuantConn challenge, participants were provided raw data from the same individuals collected on the same scanner but with two different acquisitions and tasked with preprocessing the DW-MRI to minimize acquisition differences while retaining biological variation. Submissions are evaluated on the reproducibility and comparability of cross-acquisition bundle-wise microstructure measures, bundle shape features, and connectomics. The key innovations of the QuantConn challenge are that (1) we assess bundles and tractography in the context of harmonization for the first time, (2) we assess connectomics in the context of harmonization for the first time, and (3) we have 10x additional subjects over the prior harmonization challenge, MUSHAC, and 100x over SuperMUDI. We find that bundle surface area, fractional anisotropy, connectome assortativity, betweenness centrality, edge count, modularity, nodal strength, and participation coefficient measures are most biased by acquisition and that machine learning voxel-wise correction, RISH mapping, and NeSH methods effectively reduce these biases. In addition, microstructure measures AD, MD, RD, bundle length, connectome density, efficiency, and path length are least biased by these acquisition differences.
Comment: Accepted for publication in the Journal of Machine Learning for Biomedical Imaging (MELBA) https://melba-journal.org/2024/019
Published: 2024

3. LoRA-Contextualizing Adaptation of Large Multimodal Models for Long Document Understanding

Author: Chen, Jian, Zhang, Ruiyi, Zhou, Yufan, Yu, Tong, Dernoncourt, Franck, Gu, Jiuxiang, Rossi, Ryan A., Chen, Changyou, and Sun, Tong
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Large multimodal models (LMMs) have recently shown great progress in text-rich image understanding, yet they still struggle with complex, multi-page, visually-rich documents. Traditional methods using document parsers for retrieval-augmented generation suffer from performance and efficiency limitations, while directly presenting all pages to LMMs leads to inefficiencies, especially with lengthy documents. In this work, we present a novel framework named LoRA-Contextualizing Adaptation of Large multimodal models (LoCAL), which broadens the capabilities of any LMM to support long-document understanding. We demonstrate that LMMs can effectively serve as multimodal retrievers, fetching relevant pages to answer user questions based on these pages. LoCAL is implemented with two specific LMM adapters: one for evidence page retrieval and another for question answering. Empirical results show state-of-the-art performance on public benchmarks, demonstrating the effectiveness of LoCAL.
Comment: Currently Under Review
Published: 2024

4. Scaled Proximal Gradient Methods for Multiobjective Optimization: Improved Linear Convergence and Nesterov's Acceleration

Author: Chen, Jian, Tang, Liping, and Yang, Xinmin
Subjects: Mathematics - Optimization and Control, 90C29, 90C30
Abstract: Over the past two decades, descent methods have received substantial attention within the multiobjective optimization field. Nonetheless, both theoretical analyses and empirical evidence reveal that existing first-order methods for multiobjective optimization converge slowly, even for well-conditioned problems, due to objective imbalances. To address this limitation, we incorporate curvature information to scale each objective within the direction-finding subproblem, introducing a scaled proximal gradient method for multiobjective optimization (SPGMO). We demonstrate that the proposed method achieves improved linear convergence, exhibiting rapid convergence in well-conditioned scenarios. Furthermore, by applying small scaling to linear objectives, we prove that the SPGMO attains improved linear convergence for problems with multiple linear objectives. Additionally, integrating Nesterov's acceleration technique further enhances the linear convergence of SPGMO. To the best of our knowledge, this advancement in linear convergence is the first theoretical result that directly addresses objective imbalances in multiobjective first-order methods. Finally, we provide numerical experiments to validate the efficiency of the proposed methods and confirm the theoretical findings.
Published: 2024

5. A Survey of Small Language Models

Author: Van Nguyen, Chien, Shen, Xuan, Aponte, Ryan, Xia, Yu, Basu, Samyadeep, Hu, Zhengmian, Chen, Jian, Parmar, Mihir, Kunapuli, Sasidhar, Barrow, Joe, Wu, Junda, Singh, Ashish, Wang, Yu, Gu, Jiuxiang, Dernoncourt, Franck, Ahmed, Nesreen K., Lipka, Nedim, Zhang, Ruiyi, Chen, Xiang, Yu, Tong, Kim, Sungchul, Deilamsalehy, Hanieh, Park, Namyong, Rimer, Mike, Zhang, Zhehao, Yang, Huanrui, Rossi, Ryan A., and Nguyen, Thien Huu
Subjects: Computer Science - Computation and Language
Abstract: Small Language Models (SLMs) have become increasingly important due to their efficiency and their ability to perform various language tasks with minimal computational resources, making them ideal for various settings, including on-device, mobile, and edge deployments, among many others. In this article, we present a comprehensive survey on SLMs, focusing on their architectures, training techniques, and model compression techniques. We propose a novel taxonomy for categorizing the methods used to optimize SLMs, including model compression, pruning, and quantization techniques. We summarize the benchmark datasets that are useful for benchmarking SLMs along with the evaluation metrics commonly used. Additionally, we highlight key open challenges that remain to be addressed. Our survey aims to serve as a valuable resource for researchers and practitioners interested in developing and deploying small yet efficient language models.
Published: 2024

6. Unraveling structural, electronic, and magnetic ambiguities in Pb1-δCrO3 with an insulating charge-transfer band structure

Author: Chen, Jian, Song, Guozhu, Ge, Han, Santos, Antonio M. Dos, Wu, Liusuo, Zhao, Yusheng, and Wang, Shanmin
Subjects: Condensed Matter - Materials Science
Abstract: As a recently identified Mott system, PbCrO3 remains largely unexplored, especially its band structure, leading to many contentious issues regarding its structural, electronic, and magnetic properties. Here we present a comprehensive study of two different Pb1-δCrO3 (δ = 0 and 0.15) samples with atomic deficiency prepared under pressure. By means of state-of-the-art diffraction techniques, the crystal structure of PbCrO3 is definitively determined to adopt the pristine Pm-3m symmetry, rather than the previously misassigned M2-Pm-3m and Pmnm structures. The two materials exhibit a similar charge-transfer-type insulating band structure, and the charge-transfer effect splits both the Cr 2p and Pb 4f orbitals, rationalizing the doublet splitting of the associated spectral lines. Nearly identical nominal cationic valence states of Cr4+ and Pb2+ are identified for this oxide system, calling into question the validity of recently proposed charge disproportionation mechanisms. In addition, Pb0.85CrO3 exhibits an anomalously higher Néel temperature (~240 K) than PbCrO3 (~200 K), likely due to deficiency-induced enhancements of the Cr 3d-O 2p orbital overlap and magnetic exchange. These findings provide solid evidence for probing the fundamental properties of this important material system.
Published: 2024

7. Intelligent Reflecting Surface-Assisted Symbiotic Radio Systems: A Double-Reflection Covert Communication Design

Author: Feng, Yunpeng, Chen, Jian, Lv, Lu, Zhou, Yuchen, Yang, Long, Al-Dhahir, Naofal, and Adachi, Fumiyuki
Subjects: Computer Science - Information Theory
Abstract: We investigate covert communication in an intelligent reflecting surface (IRS)-assisted symbiotic radio (SR) system under the parasitic SR (PSR) and the commensal SR (CSR) cases, where an IRS is exploited to create a double reflection link for legitimate users and degrade the detection performance of the warden (W). Specifically, we derive an analytical expression for the average detection error probability of W and design an optimal strategy to determine the transmit power and backscatter reflection coefficient. To further enhance the covert performance, the joint optimization of the source transmit power, backscatter device (BD) reflection coefficient, and IRS phase-shifter is formulated as an expectation-based quadratic-fractional (EQF) problem. By reformulating the original problem into a fraction-eliminated backscatter power leakage minimization problem, we further develop the phase alignment pursuit and the power leakage minimization algorithms for the PSR and the CSR cases, respectively. Numerical results confirm the accuracy of the derived results and the superiority of our proposed strategy in terms of covertness.
Published: 2024

8. TextLap: Customizing Language Models for Text-to-Layout Planning

Author: Chen, Jian, Zhang, Ruiyi, Zhou, Yufan, Healey, Jennifer, Gu, Jiuxiang, Xu, Zhiqiang, and Chen, Changyou
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: Automatic generation of graphical layouts is crucial for many real-world applications, including designing posters, flyers, advertisements, and graphical user interfaces. Given the incredible ability of large language models (LLMs) in both natural language understanding and generation, we believe that we could customize an LLM to help people create compelling graphical layouts starting with only text instructions from the user. We call our method TextLap (text-based layout planning). It uses a curated instruction-based layout planning dataset (InsLap) to customize LLMs as a graphic designer. We demonstrate the effectiveness of TextLap and show that it outperforms strong baselines, including GPT-4 based methods, for image generation and graphical design benchmarks.
Comment: Accepted to EMNLP Findings
Published: 2024

9. Gap Preserving Distillation by Building Bidirectional Mappings with A Dynamic Teacher

Author: Guo, Yong, Zhang, Shulian, Pan, Haolin, Liu, Jing, Zhang, Yulun, and Chen, Jian
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Knowledge distillation aims to transfer knowledge from a large teacher model to a compact student counterpart, often coming with a significant performance gap between them. We find that a too-large performance gap can hamper the training process, which is also verified in recent studies. To address this, we propose a Gap Preserving Distillation (GPD) method that trains an additional dynamic teacher model from scratch along with training the student to bridge this gap. In this way, it becomes possible to maintain a reasonable performance gap between teacher and student during the whole distillation process. To further strengthen distillation from the dynamic teacher to the student, we develop a hard strategy by forcing them to share parameters and encouraging parameter inheritance. Besides the hard strategy, we also build soft bidirectional mappings between them, which are built on an Inverse Reparameterization (IR) method and a Channel-Branch Reparameterization (CBR) strategy. We highlight that our IR is able to initialize a larger dynamic teacher with an arbitrary expansion ratio, while preserving exactly the same accuracy as the given student model. In this way, it guarantees that the dynamic teacher and student start from the same point and avoids too large a gap in the early stage of training. As for our CBR, with parameter-sharing, it directly extracts an effective student model from the well-learned dynamic teacher without any post-training, making our method highly flexible for model deployment. In the experiments, GPD significantly outperforms existing distillation methods on top of both CNN and transformer architectures, achieving up to 1.58% accuracy improvement. Interestingly, GPD also generalizes well to scenarios without a pre-trained teacher, including training from scratch and fine-tuning, yielding large improvements of 1.80% and 0.89% on ResNet18, respectively.
Comment: 10 pages for the main paper
Published: 2024

10. Measuring the Diffuse Interstellar Bands at 5780, 5797, and 6614 Å in Low-Resolution Spectra of Cool Stars from LAMOST

Author: Ma, Xiao-Xiao, Chen, Jian-Jun, Luo, A-Li, Zhao, He, Shi, Ji-Wei, Chen, Jing, Liang, Jun-Chao, Ma, Shu-Guo, Qu, Cai-Xia, and Jiang, Bi-Wei
Subjects: Astrophysics - Astrophysics of Galaxies, Astrophysics - Solar and Stellar Astrophysics, Physics - Data Analysis, Statistics and Probability
Abstract: We attempt to measure the DIBs λ5780, λ5797 and λ6614 in over two million low-resolution spectra of cool stars from LAMOST. Based on the DIB measurements, the correlation between DIBs and extinction, the kinematics of DIBs, and the Galactic distribution of DIBs are reviewed and investigated statistically. A pipeline is developed to measure the DIBs λ5780, λ5797 and λ6614 in the LAMOST low-resolution spectra. We obtain the DIB measurements of spectra of late-type stars from LAMOST, and screen out 176,831, 13,473 and 110,152 high-quality (HQ) measurements of the DIBs λ5780, λ5797 and λ6614, respectively, corresponding to 142,074, 11,480 and 85,301 unique sources. Utilizing these HQ measurements, we present the Galactic maps of the DIBs λ5780 and λ6614 in the northern sky for the first time. The central wavelengths of the DIBs λ5780, λ5797 and λ6614 in air are determined to be 5780.48 ± 0.01, 5796.94 ± 0.02 and 6613.64 ± 0.01 Å, respectively, based on their kinematics. The equivalent widths of these three DIBs per unit extinction are statistically fitted to be 0.565, 0.176 and 0.256 Å/mag. As a part of our work, three catalogs of the HQ measurements for the DIBs λ5780, λ5797 and λ6614 are provided online. To the best of our knowledge, this is the largest number of measurements of these three DIBs to date. It is also the first time that the Galactic maps of the DIBs λ5780 and λ6614 in the northern hemisphere are presented, and the central wavelengths of the DIBs λ5780, λ5797 and λ6614 are estimated from the kinematics.
Comment: 19 pages, 5 tables, 13 figures, 1 appendix, accepted for publication in A&A
Published: 2024

11. New Measurements of the Deuteron to Proton F2 Structure Function Ratio

Author: Biswas, Debaditya, Gonzalez, Fernando Araiza, Henry, William, Karki, Abishek, Morean, Casey, Nadeeshani, Sooriyaarachchilage, Sun, Abel, Abrams, Daniel, Ahmed, Zafar, Aljawrneh, Bashar, Alsalmi, Sheren, Ambrose, George, Armstrong, Whitney, Asaturyan, Arshak, Assumin-Gyimah, Kofi, Gayoso, Carlos Ayerbe, Bandari, Anashe, Basnet, Samip, Berdnikov, Vladimir, Bhatt, Hem, Bhetuwal, Deepak, Boeglin, Werner, Bosted, Peter, Brash, Edward, Bukhari, Masroor, Chen, Haoyu, Chen, Jian-Ping, Chen, Mingyu, Christy, Michael Eric, Dusa, Silviu Covrig, Craycraft, Kayla, Danagoulian, Samuel, Day, Donal, Diefenthaler, Markus, Dlamini, Mongi, Dunne, James, Duran, Burcu, Dutta, Dipangkar, Ent, Rolf, Evans, Rory, Fenker, Howard, Fomin, Nadia, Fuchey, Eric, Gaskell, David, Gautam, Thir Narayan, Hansen, Jens-Ole, Hauenstein, Florian, Hernandez, A., Horn, Tanja, Huber, Garth, Jones, Mark, Joosten, Sylvester, Kabir, Md Latiful, Keppel, Cynthia, Khanal, Achyut, King, Paul, Kinney, Edward, Kohl, Michael, Lashley-Colthirst, Nathaniel, Li, Shujie, Li, Wenliang, Liyanage, Anusha Habarakada, Mack, David, Malace, Simona, Markowitz, Pete, Matter, John, Meekins, David, Michaels, Robert, Mkrtchyan, Arthur, Mkrtchyan, Hamlet, Moore, Zae, Nazeer, S. J., Nanda, Shirsendu, Niculescu, Gabriel, Niculescu, Maria, Nguyen, Huong, Nuruzzaman, Nuruzzaman, Pandey, Bishnu, Park, Sanghwa, Pooser, Eric, Puckett, Andrew, Rehfuss, Melanie, Reinhold, Joerg, Sawatzky, Bradley, Smith, G., Szumila-Vance, Holly, Tadepalli, Arun, Tadevosyan, Vardan, Trotta, Richard, Wood, Stephen, Yero, Carlos, and Zhang, Jinlong
Subjects: High Energy Physics - Experiment
Abstract: Nucleon structure functions, as measured in lepton-nucleon scattering, have historically provided a critical observable in the study of partonic dynamics within the nucleon. However, at very large parton momenta it is both experimentally and theoretically challenging to extract parton distributions due to the probable onset of non-perturbative contributions and the unavailability of high-precision data at critical kinematics. Extraction of the neutron structure and the d-quark distribution has been further complicated by the necessity of applying nuclear corrections when utilizing scattering data from a deuteron target to extract free neutron structure. However, a program of experiments has been carried out recently at the energy-upgraded Jefferson Lab electron accelerator aimed at significantly reducing the nuclear correction uncertainties on the d-quark distribution function at large partonic momentum. This allows the vast body of deuterium data covering a large kinematic range to be leveraged for d-quark parton distribution function extraction. We present new data from experiment E12-10-002 carried out in Jefferson Lab Hall C on the deuteron to proton cross-section ratio at large Bjorken-x. These results significantly improve the precision of existing data, and provide a first look at the expected impact on quark distributions extracted from global parton distribution function fits.
Published: 2024

12. FracGM: A Fast Fractional Programming Technique for Geman-McClure Robust Estimator

Author: Chen, Bang-Shien, Lin, Yu-Kai, Chen, Jian-Yu, Huang, Chih-Wei, Chern, Jann-Long, and Sun, Ching-Cherng
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Robotics, Mathematics - Optimization and Control
Abstract: Robust estimation is essential in computer vision, robotics, and navigation, aiming to minimize the impact of outlier measurements for improved accuracy. We present a fast algorithm for Geman-McClure robust estimation, FracGM, leveraging fractional programming techniques. This solver reformulates the original non-convex fractional problem to a convex dual problem and a linear equation system, iteratively solving them in an alternating optimization pattern. Compared to graduated non-convexity approaches, this strategy exhibits a faster convergence rate and better outlier rejection capability. In addition, the global optimality of the proposed solver can be guaranteed under given conditions. We demonstrate the proposed FracGM solver with Wahba's rotation problem and 3-D point-cloud registration along with relaxation pre-processing and projection post-processing. Compared to state-of-the-art algorithms, when the outlier rates increase from 20% to 80%, FracGM shows 53% and 88% lower rotation and translation increases. In real-world scenarios, FracGM achieves better results in 13 out of 18 outcomes, while having a 19.43% improvement in the computation time.
Comment: 8 pages, 6 figures
Published: 2024

13. Generative Semantic Communication via Textual Prompts: Latency Performance Tradeoffs

Author: Ren, Mengmeng, Qiao, Li, Yang, Long, Gao, Zhen, Chen, Jian, Mashhadi, Mahdi Boloursaz, Xiao, Pei, Tafazolli, Rahim, and Bennis, Mehdi
Subjects: Computer Science - Information Theory, Computer Science - Computer Science and Game Theory
Abstract: This paper develops an edge-device collaborative Generative Semantic Communications (Gen SemCom) framework leveraging pre-trained Multi-modal/Vision Language Models (M/VLMs) for ultra-low-rate semantic communication via textual prompts. The proposed framework optimizes the use of M/VLMs on the wireless edge/device to generate high-fidelity textual prompts through visual captioning/question answering, which are then transmitted over a wireless channel for SemCom. Specifically, we develop a multi-user Gen SemCom framework using pre-trained M/VLMs, and formulate a joint optimization problem of prompt generation offloading, communication and computation resource allocation to minimize the latency and maximize the resulting semantic quality. Due to the nonconvex nature of the problem with highly coupled discrete and continuous variables, we decompose it into a two-level problem and propose a low-complexity swap/leaving/joining (SLJ)-based matching algorithm. Simulation results demonstrate significant performance improvements over the conventional semantic-unaware/non-collaborative offloading benchmarks.
Published: 2024

14. Mitigating Hallucination in Visual-Language Models via Re-Balancing Contrastive Decoding

Author: Liang, Xiaoyu, Yu, Jiayuan, Mu, Lianrui, Zhuang, Jiedong, Hu, Jiaqi, Yang, Yuchen, Ye, Jiangnan, Lu, Lu, Chen, Jian, and Hu, Haoji
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Although Visual-Language Models (VLMs) have shown impressive capabilities in tasks like visual question answering and image captioning, they still struggle with hallucinations. Analysis of attention distribution in these models shows that VLMs tend to focus on textual tokens rather than visual tokens. This imbalance of attention distribution causes VLMs to favor textual knowledge in the case of multimodal knowledge conflicts, resulting in outputs that deviate from the image information. In this paper, we propose the Re-Balancing Contrastive Decoding (RBD) method, which employs textual and visual branches to recalibrate attention distribution in VLMs. Specifically, the textual branch injects image noise to stimulate the model's dependency on text, thereby reducing textual bias. Concurrently, the visual branch focuses on the selection of significant tokens, refining the attention mechanism to highlight the primary subject. This dual-branch strategy enables the RBD method to diminish textual bias while enhancing visual information. Experimental results demonstrate that our method, RBD, outperforms existing methods on the CHAIR and POPE metrics, mitigating hallucinations without reducing the model's general capabilities.
Comment: PRCV
Published: 2024

15. Symmetry Breaking in Neural Network Optimization: Insights from Input Dimension Expansion

Author: Zhang, Jun-Jie, Cheng, Nan, Li, Fu-Peng, Wang, Xiu-Cheng, Chen, Jian-Nan, Pang, Long-Gang, and Meng, Deyu
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Mathematical Physics
Abstract: Understanding the mechanisms behind neural network optimization is crucial for improving network design and performance. While various optimization techniques have been developed, a comprehensive understanding of the underlying principles that govern these techniques remains elusive. Specifically, the role of symmetry breaking, a fundamental concept in physics, has not been fully explored in neural network optimization. This gap in knowledge limits our ability to design networks that are both efficient and effective. Here, we propose the symmetry breaking hypothesis to elucidate the significance of symmetry breaking in enhancing neural network optimization. We demonstrate that a simple input expansion can significantly improve network performance across various tasks, and we show that this improvement can be attributed to the underlying symmetry breaking mechanism. We further develop a metric to quantify the degree of symmetry breaking in neural networks, providing a practical approach to evaluate and guide network design. Our findings confirm that symmetry breaking is a fundamental principle that underpins various optimization techniques, including dropout, batch normalization, and equivariance. By quantifying the degree of symmetry breaking, our work offers a practical technique for performance enhancement and a metric to guide network design without the need for complete datasets and extensive training processes.
Comment: 29 pages, 8 figures
Published: 2024

16. Eliminating Timing Anomalies in Scheduling Periodic Segmented Self-Suspending Tasks with Release Jitter

Author: Lin, Ching-Chi, Günzel, Mario, Shi, Junjie, Seidl, Tristan Taylan, Chen, Kuan-Hsun, and Chen, Jian-Jia
Subjects: Computer Science - Distributed, Parallel, and Cluster Computing
Abstract: Ensuring timing guarantees for every individual task is critical in real-time systems. Even for periodic tasks, providing timing guarantees for tasks with segmented self-suspending behavior is challenging due to timing anomalies, i.e., the reduction of execution or suspension time of some jobs increases the response time of another job. The release jitter of tasks can add further complexity to the situation, affecting the predictability and timing guarantees of real-time systems. The existing worst-case response time analyses for sporadic self-suspending tasks are only over-approximations and lead to overly pessimistic results. In this work, we address timing anomalies without compromising the worst-case response time (WCRT) analysis when scheduling periodic segmented self-suspending tasks with release jitter. We propose two treatments: segment release time enforcement and segment priority modification, and prove their effectiveness in eliminating timing anomalies. Our evaluation demonstrates that the proposed treatments achieve higher acceptance ratios in terms of schedulability compared to state-of-the-art scheduling algorithms. Additionally, we implement the segment-level fixed-priority scheduling mechanism on RTEMS and verify the validity of our segment priority modification treatment. This work expands our previous conference publication at the 29th IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS 2023), which considers only periodic segmented self-suspending tasks without release jitter.
Comment: This is an extension from a previous conference publication at the 29th IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS 2023)
Published: 2024

17. MMR: Evaluating Reading Ability of Large Multimodal Models

Author: Chen, Jian, Zhang, Ruiyi, Zhou, Yufan, Rossi, Ryan, Gu, Jiuxiang, and Chen, Changyou
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Large multimodal models (LMMs) have demonstrated impressive capabilities in understanding various types of images, including text-rich images. Most existing text-rich image benchmarks are simple extraction-based question answering, and many LMMs now easily achieve high scores. This means that current benchmarks fail to accurately reflect the performance of different models, and a natural idea is to build a new benchmark to evaluate their complex reasoning and spatial understanding abilities. In this work, we propose the Multi-Modal Reading (MMR) benchmark, comprising 11 diverse tasks, to evaluate LMMs for text-rich image understanding. MMR is the first text-rich image benchmark built on human annotations with the help of language models. Evaluating several state-of-the-art LMMs, including GPT-4o, reveals the limited capabilities of existing LMMs, underscoring the value of our benchmark.
Published: 2024

18. Investigating Language-Specific Calibration For Pruning Multilingual Large Language Models

Author: Kurz, Simon, Chen, Jian-Jia, Flek, Lucie, and Zhao, Zhixue
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Recent advances in large language model (LLM) pruning have shown state-of-the-art (SotA) compression results in post-training and retraining-free settings while maintaining high predictive performance. However, previous research mainly considered calibrating based on English text, despite the multilingual nature of modern LLMs and their frequent use in non-English languages. In this paper, we set out to investigate calibrating the pruning of multilingual language models for monolingual applications. We present the first comprehensive empirical study, comparing different calibration languages for pruning multilingual models across diverse languages, tasks, models, and SotA pruning techniques. Our results offer practical suggestions, for example, calibrating in the target language can efficiently retain the language modeling capability but does not necessarily benefit downstream tasks. Through further analysis of latent subspaces, pruning masks, and individual neurons within pruned models, we find that while pruning generally preserves strong language-specific features, it may fail to retain language-specific neuron activation patterns and subtle, language-agnostic features associated with knowledge and reasoning that are needed for complex tasks.
Published: 2024
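To make the role of the calibration set concrete, here is a minimal sketch of one published post-training pruning criterion, Wanda: each weight is scored by its magnitude times the L2 norm of the matching input activation over the calibration tokens, and the lowest scores are pruned within each output row. The abstract does not say which pruning techniques the study used, so the choice of Wanda is an illustrative assumption. The two calibration sets in the example are hypothetical toy activations standing in for two languages, not data from the paper.

```python
import math

def wanda_scores(weight, calib_inputs):
    """Score each weight as |W[i][j]| * ||X[:, j]||_2.

    weight: list of rows (out_features x in_features).
    calib_inputs: calibration activation vectors (tokens x in_features).
    """
    in_features = len(weight[0])
    # L2 norm of each input feature across all calibration tokens
    norms = [math.sqrt(sum(x[j] ** 2 for x in calib_inputs)) for j in range(in_features)]
    return [[abs(w) * norms[j] for j, w in enumerate(row)] for row in weight]

def prune_rows(weight, calib_inputs, sparsity):
    """Zero the lowest-scoring fraction `sparsity` of weights in each output row."""
    scores = wanda_scores(weight, calib_inputs)
    pruned = []
    for row, srow in zip(weight, scores):
        k = int(len(row) * sparsity)  # number of weights to remove in this row
        drop = set(sorted(range(len(row)), key=lambda j: srow[j])[:k])
        pruned.append([0.0 if j in drop else w for j, w in enumerate(row)])
    return pruned

if __name__ == "__main__":
    w = [[1.0, 1.0]]
    # Hypothetical calibration sets: "language A" activates feature 0, "language B" feature 1
    print(prune_rows(w, [[3.0, 0.1]], 0.5))  # [[1.0, 0.0]]
    print(prune_rows(w, [[0.1, 3.0]], 0.5))  # [[0.0, 1.0]]
```

With identical weights, the two calibration sets keep different weights. This is the mechanism by which the calibration language can change the pruning mask.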

19. MagicDec: Breaking the Latency-Throughput Tradeoff for Long Context Generation with Speculative Decoding

Author: Chen, Jian, Tiwari, Vashisth, Sadhukhan, Ranajoy, Chen, Zhuoming, Shi, Jinyuan, Yen, Ian En-Hsu, and Chen, Beidi
Subjects: Computer Science - Computation and Language
Abstract: Large Language Models (LLMs) have become more prevalent in long-context applications such as interactive chatbots, document analysis, and agent workflows, but it is challenging to serve long-context requests with low latency and high throughput. Speculative decoding (SD) is a widely used technique to reduce latency without sacrificing performance, but conventional wisdom suggests that its efficacy is limited to small batch sizes. In MagicDec, we show that, surprisingly, SD can achieve speedup even in a high-throughput inference regime for moderate to long sequences. More interestingly, our rigorous analysis shows that an intelligent drafting strategy can achieve better speedup as batch size increases. MagicDec first identifies how the bottleneck shifts with increasing batch size and sequence length, and uses these insights to deploy speculative decoding more effectively for high-throughput inference. It then leverages draft models with a sparse KV cache to address the KV bottleneck, which scales with both sequence length and batch size. This finding underscores the broad applicability of speculative decoding in long-context serving, as it can enhance throughput and reduce latency without compromising accuracy. For moderate to long sequences, we demonstrate up to 2x speedup for LLaMA-2-7B-32K and 1.84x speedup for LLaMA-3.1-8B when serving batch sizes ranging from 32 to 256 on 8 NVIDIA A100 GPUs. The code is available at https://github.com/Infini-AI-Lab/MagicDec/.
Published: 2024
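The abstract builds on the standard speculative decoding loop without restating it. As a reminder of the verification step that standard lossless SD relies on, here is a minimal sketch of the well-known speculative sampling acceptance rule. A draft token x is accepted with probability min(1, p(x)/q(x)), where p is the target model's distribution and q is the draft model's. On rejection, a replacement token is drawn from the normalized residual max(0, p - q). This sketch does not model MagicDec's sparse-KV drafting or batching. The function names and toy distributions are hypothetical.

```python
import random

def residual_distribution(p, q):
    """Normalized max(0, p - q): the distribution used to resample after a rejection."""
    diff = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
    total = sum(diff)
    return [d / total for d in diff]

def verify_token(token, p, q, rng):
    """Accept a draft token with probability min(1, p/q); otherwise resample.

    p: target-model distribution, q: draft-model distribution over the same vocabulary.
    The draft token is assumed to have been sampled from q, so q[token] > 0.
    Returns (accepted, emitted_token).
    """
    if rng.random() < min(1.0, p[token] / q[token]):
        return True, token
    # Rejection is only possible when p != q, so the residual mass is positive here
    r = residual_distribution(p, q)
    return False, rng.choices(range(len(p)), weights=r)[0]
```

Resampling from the residual is what makes the emitted token follow p exactly, so drafting can only change speed, not the output distribution.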

20. Two points are enough

Author: Liu, Hao, Zhao, Yanbin, Zheng, Huarong, Fan, Xiulin, Deng, Zhihua, Chen, Mengchi, Wang, Xingkai, Liu, Zhiyang, Lu, Jianguo, and Chen, Jian
Subjects: Condensed Matter - Materials Science, Physics - Data Analysis, Statistics and Probability
Abstract: Prognosis and diagnosis play an important role in accelerating the development of lithium-ion batteries, as well as in ensuring their reliable and long-life operation. In this work, we answer an important question: what is the minimum amount of data required to extract features for accurate battery prognosis and diagnosis? Based on first principles, we extracted the best two-point feature (BTPF) for accurate battery prognosis and diagnosis using the fewest data points (only two) and the simplest feature selection method (the Pearson correlation coefficient). The BTPF extraction method is tested on 820 cells from 6 open-source datasets (covering five different chemistry types, seven manufacturers, and three data types). It achieves accuracy comparable to state-of-the-art features in both prognosis and diagnosis tasks. This work challenges the prevailing view in existing studies on the difficulty of battery prognosis and diagnosis tasks, overturns the fixed pattern of building prognosis and diagnosis methods for complex dynamic systems through deliberate feature engineering, highlights the promise of data-driven methods for field battery prognosis and diagnosis applications, and provides a new benchmark for future studies.
Published: 2024

21. Bridging the Gap between ROS 2 and Classical Real-Time Scheduling for Periodic Tasks

Author: Teper, Harun, Bell, Oren, Günzel, Mario, Gill, Chris, and Chen, Jian-Jia
Subjects: Computer Science - Robotics
Abstract: The Robot Operating System 2 (ROS 2) is a widely used middleware that provides software libraries and tools for developing robotic systems. In these systems, tasks are scheduled by ROS 2 executors. Since the scheduling behavior of the default ROS 2 executor is inherently different from classical real-time scheduling theory, dedicated analyses or alternative executors requiring substantial changes to ROS 2 have been necessary. In 2023, the events executor, which features an events queue and allows scheduling decisions to be made immediately after a job completes, was introduced into ROS 2. In this paper, we show that, with only minor modifications of the events executor, a large body of research results from classical real-time scheduling theory becomes applicable. This enables analytical bounds on the worst-case response time and the end-to-end latency that outperform the bounds for the default ROS 2 executor in many scenarios. Our solution is easy to integrate into existing ROS 2 systems, since it requires only minor backend modifications of the events executor, which is natively included in ROS 2. The evaluation results show that our slightly modified ROS 2 events executor can yield significant improvements in terms of dropped jobs, worst-case response time, end-to-end latency, and performance compared to the default ROS 2 executor.
Published: 2024
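As an example of the classical real-time results that the abstract says become applicable, here is a minimal sketch of fixed-priority response-time analysis for independent periodic tasks. It iterates R = C_i + sum over higher-priority tasks j of ceil(R / T_j) * C_j until the value stops changing. The assumptions are ours: deadlines equal periods, there is no blocking or release jitter, and priority follows list order. This is not the paper's analysis of the modified events executor.

```python
import math

def response_time(tasks, i):
    """Worst-case response time of task i under fixed-priority preemptive scheduling.

    tasks: list of (C, T) pairs (worst-case execution time, period), highest priority first.
    Assumes implicit deadlines (D = T). Returns None if the recurrence exceeds the period.
    """
    c_i, t_i = tasks[i]
    r = c_i
    while True:
        # Own execution plus interference from higher-priority releases within r
        nxt = c_i + sum(math.ceil(r / t_j) * c_j for c_j, t_j in tasks[:i])
        if nxt == r:
            return r
        if nxt > t_i:
            return None  # deadline miss: task i is unschedulable
        r = nxt
```

For the toy task set [(1, 4), (2, 6), (3, 12)], this yields worst-case response times of 1, 3, and 10. All of them are within the periods, so the set is schedulable under these assumptions.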

22. LLaVA-Read: Enhancing Reading Ability of Multimodal Language Models

Author: Zhang, Ruiyi, Zhou, Yufan, Chen, Jian, Gu, Jiuxiang, Chen, Changyou, and Sun, Tong
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
Abstract: Large multimodal language models have demonstrated impressive capabilities in understanding and manipulating images. However, many of these models struggle to comprehend text-dense content embedded within images, primarily due to limited text recognition and layout understanding ability. To understand the sources of these limitations, we perform an exploratory analysis showing the drawbacks of classical visual encoders on visual text understanding. Hence, we present LLaVA-Read, a multimodal large language model that utilizes dual visual encoders along with a visual text encoder. Our model surpasses existing state-of-the-art models in various text-rich image understanding tasks, showcasing enhanced comprehension of textual content within images. Together, our research suggests that visual text understanding remains an open challenge and that an efficient visual text encoder is crucial for future successful multimodal systems. Comment: NeurIPS 2024 Under Review
Published: 2024

23. Descent Methods for Vector Optimization Problems: A Majorization-Minimization Perspective

Author: Chen, Jian, Tang, Liping, and Yang, Xinmin
Subjects: Mathematics - Optimization and Control, 90C29, 90C30
Abstract: In this paper, we develop a unified framework and convergence analysis of descent methods for vector optimization problems (VOPs) from a majorization-minimization perspective. By choosing different surrogate functions, the generic method reduces to some existing descent methods with and without line search, respectively. The unified convergence analysis shows that the slow convergence of the steepest descent method is mainly due to the large gap between the surrogate and objective functions. As a result, the performance of descent methods can be improved by narrowing the surrogate gap. Interestingly, we observe that selecting a tighter surrogate function is equivalent to using an appropriate base of the dual cone in the direction-finding subproblem. Furthermore, we use the Barzilai-Borwein method to narrow the surrogate gap and devise a Barzilai-Borwein descent method for VOPs with a polyhedral cone. By reformulating the subproblem, we provide a new insight into the Barzilai-Borwein descent method and bridge it to the steepest descent method. Finally, several numerical experiments confirm the efficiency of the proposed method.
Published: 2024

24. Nematic Ising superconductivity with hidden magnetism in few-layer 6R-TaS2

Author: Liu, Shao-Bo, Tian, Congkuan, Fang, Yuqiang, Rong, Hongtao, Cao, Lu, Wei, Xinjian, Cui, Hang, Chen, Mantang, Chen, Di, Song, Yuanjun, Cui, Jian, Li, Jiankun, Guan, Shuyue, Jia, Shuang, Chen, Chaoyu, He, Wenyu, Huang, Fuqiang, Jiang, Yuhang, Mao, Jinhai, Xie, X. C., Law, K. T., and Chen, Jian-Hao
Subjects: Condensed Matter - Strongly Correlated Electrons, Condensed Matter - Materials Science, Condensed Matter - Superconductivity
Abstract: In van der Waals heterostructures (vdWHs), the manipulation of interlayer stacking/coupling allows for the construction of customizable quantum systems exhibiting exotic physics. An illustrative example is the diverse range of states of matter achieved through varying the proximity coupling between two-dimensional (2D) quantum spin liquid (QSL) and superconductors within the TaS2 family. This study presents a demonstration of the intertwined physics of spontaneous rotational symmetry breaking, hidden magnetism, and Ising superconductivity in the three-fold rotationally symmetric, non-magnetic natural vdWHs 6R-TaS2. A distinctive phase emerges in 6R-TaS2 below a characteristic temperature (T*) of approximately 30 K, which is characterized by a remarkable set of features, including a giant extrinsic anomalous Hall effect (AHE), Kondo screening, magnetic field-tunable thermal hysteresis, and nematic magneto-resistance. At lower temperatures, a coexistence of nematicity and Kondo screening with Ising superconductivity is observed, providing compelling evidence of hidden magnetism within a superconductor. This research not only sheds light on unexpected emergent physics resulting from the coupling of itinerant electrons and localized/correlated electrons in natural vdWHs but also emphasizes the potential for tailoring exotic quantum states through the manipulation of interlayer interactions. Comment: 16 pages, 4 figures
Published: 2024

25. BECAUSE: Bilinear Causal Representation for Generalizable Offline Model-based Reinforcement Learning

Author: Lin, Haohong, Ding, Wenhao, Chen, Jian, Shi, Laixi, Zhu, Jiacheng, Li, Bo, and Zhao, Ding
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence
Abstract: Offline model-based reinforcement learning (MBRL) enhances data efficiency by utilizing pre-collected datasets to learn models and policies, especially in scenarios where exploration is costly or infeasible. Nevertheless, its performance often suffers from the objective mismatch between model and policy learning, resulting in inferior performance despite accurate model predictions. This paper first identifies that the primary source of this mismatch is the underlying confounders present in offline data for MBRL. Subsequently, we introduce BilinEar CAUSal rEpresentation (BECAUSE), an algorithm that captures causal representations of both states and actions to reduce the influence of distribution shift, thus mitigating the objective mismatch problem. Comprehensive evaluations on 18 tasks that vary in data quality and environment context demonstrate the superior performance of BECAUSE over existing offline RL algorithms. We show the generalizability and robustness of BECAUSE under fewer samples or larger numbers of confounders. Additionally, we provide a theoretical analysis of BECAUSE that proves its error bound and sample efficiency when integrating causal representation into offline MBRL.
Published: 2024

26. Extraction of fissile isotope antineutrino spectra using feedforward neural network

Author: Chen, Jian, Wang, Jun, Wang, Wei, and Wei, Yuehuan
Subjects: High Energy Physics - Phenomenology
Abstract: Precise measurement of the antineutrino spectra produced by isotope fission in reactors is of great significance for studying neutrino oscillations, refining nuclear databases, and addressing the reactor antineutrino anomaly. This work reports a method that uses a feedforward neural network (FNN) model to decompose the reconstructed prompt energy spectrum measured by a short-baseline reactor neutrino experiment and to extract the antineutrino spectra produced by the fission of major isotopes such as ²³⁵U, ²³⁸U, ²³⁹Pu, and ²⁴¹Pu in a nuclear reactor. We present two training strategies for this model and compare them with the traditional χ² minimization method, analyzing the same set of pseudo-data for a total exposure of (2.9 × 5 × 1800) GW_th·tons·days. The results show that the FNN model not only converges faster and better during the fitting process but also achieves relative errors in the extracted spectra within 1% in the 2-8 MeV range, outperforming the χ² minimization method. The feasibility and superiority of this method have been validated in this study.
Published: 2024

27. Particle-In-Cell simulations of filamentation process in magnetized plasma of capacitively-coupled radio-frequency discharge

Author: Huang, Huidong, Chen, Jian, and Wang, Zhibin
Subjects: Physics - Plasma Physics
Abstract: In a uniform radio-frequency capacitively-coupled plasma (RF-CCP) between a large electrode pair, adding an axial magnetic field induces diverse longitudinal filaments. This phenomenon, termed 'filamentation', challenges conventional understanding and remains poorly understood to date. To reveal its pattern dynamics, we conduct 2D Particle-In-Cell simulations that comprehensively examine the whole process of filamentation, identifying two distinct stages. Initially, standing waves grow through a modulational instability, forming regular filaments. Subsequently, when the initial wavenumber matching relation breaks, the plasma shifts towards a dynamic regime governed by competition between the Lorentz and thermal pressure forces, characterized by chaotic evolution of the filaments. These new clues pave the way toward a theoretical understanding of the filamentation instability and provide essential references for effectively manipulating magnetized plasmas.
Published: 2024

28. Enhanced Long-Tailed Recognition with Contrastive CutMix Augmentation

Author: Pan, Haolin, Guo, Yong, Yu, Mianjie, and Chen, Jian
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Real-world data often follows a long-tailed distribution, where a few head classes occupy most of the data and a large number of tail classes contain only very limited samples. In practice, deep models often show poor generalization performance on tail classes due to the imbalanced distribution. To tackle this, data augmentation has become an effective approach that synthesizes new samples for tail classes. One popular method is CutMix, which explicitly mixes images of tail classes with images of other classes and constructs the labels according to the ratio of the areas cropped from the two images. However, these area-based labels entirely ignore the inherent semantic information of the augmented samples, often leading to misleading training signals. To address this issue, we propose Contrastive CutMix (ConCutMix), which constructs augmented samples with semantically consistent labels to boost the performance of long-tailed recognition. Specifically, we compute the similarities between samples in the semantic space learned by contrastive learning, and use them to rectify the area-based labels. Experiments show that our ConCutMix significantly improves the accuracy on tail classes as well as the overall performance. For example, based on ResNeXt-50, we improve the overall accuracy on ImageNet-LT by 3.0% thanks to the significant improvement of 3.3% on tail classes. We highlight that the improvement also generalizes well to other benchmarks and models. Our code and pretrained models are available at https://github.com/PanHaulin/ConCutMix. Comment: 16 pages and 13 figures
Published: 2024
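For reference, the area-based CutMix label that ConCutMix sets out to rectify can be sketched as follows. This sketch assumes the usual CutMix recipe: a box covering roughly a 1 - lam fraction of the image is pasted from a second image, and the one-hot labels are mixed by the actual pasted area. Here lam is passed in directly rather than sampled from a Beta distribution. The semantic rectification with contrastive similarities is not reproduced, and the function names are hypothetical.

```python
import random

def cutmix_box(height, width, lam, rng):
    """Sample a box whose area is roughly (1 - lam) of the image, clipped to its borders."""
    cut_ratio = (1.0 - lam) ** 0.5
    cut_h, cut_w = int(height * cut_ratio), int(width * cut_ratio)
    cy, cx = rng.randrange(height), rng.randrange(width)  # box centre
    y1, y2 = max(cy - cut_h // 2, 0), min(cy + cut_h // 2, height)
    x1, x2 = max(cx - cut_w // 2, 0), min(cx + cut_w // 2, width)
    return y1, y2, x1, x2

def area_based_label(label_a, label_b, box, height, width):
    """Mix labels by the pasted area ratio (the label that ConCutMix rectifies semantically)."""
    y1, y2, x1, x2 = box
    pasted = (y2 - y1) * (x2 - x1) / (height * width)  # fraction of pixels from image b
    return [(1.0 - pasted) * a + pasted * b for a, b in zip(label_a, label_b)]
```

The label uses the clipped box area rather than lam itself, because clipping at the image border can shrink the pasted region. The abstract's point is that even this corrected ratio says nothing about what the pasted patch actually depicts.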

29. Graph Linear Canonical Transform: Definition, Vertex-Frequency Analysis and Filter Design

Author: Chen, Jian Yi and Li, Bing Zhao
Subjects: Mathematics - General Mathematics
Abstract: This paper proposes a graph linear canonical transform (GLCT) by decomposing the linear canonical parameter matrix into fractional Fourier transform, scale transform, and chirp modulation for graph signal processing. The GLCT enables adjustable smoothing modes, enhancing alignment with graph signals. Leveraging traditional fractional domain time-frequency analysis, we investigate vertex-frequency analysis in the graph linear canonical domain, aiming to overcome limitations in capturing local information. Filter design methods, including optimal design and learning with stochastic gradient descent, are analyzed and applied to image classification tasks. The proposed GLCT and vertex-frequency analysis present innovative approaches to signal processing challenges, with potential applications in various fields. Comment: 13 pages, 11 figures
Published: 2024

30. New Spin Structure Constraints on Hyperfine Splitting and Proton Size

Author: Ruth, David, Slifer, Karl, Chen, Jian-Ping, Carlson, Carl E., Hagelstein, Franziska, Pascalutsa, Vladimir, Deur, Alexandre, Kuhn, Sebastian, Ripani, Marco, Zheng, Xiaochao, Zielinski, Ryan, and Gu, Chao
Subjects: Nuclear Experiment, Nuclear Theory, Physics - Atomic Physics
Abstract: The 1S hyperfine splitting in hydrogen has been measured to an impressive ppt precision and will soon be measured to ppm precision in muonic hydrogen. The latter measurement will rely on theoretical predictions, which are limited by knowledge of the proton polarizability effect Δ_pol. Data-driven evaluations of Δ_pol have long been in significant tension with baryon chiral perturbation theory. Here we present improved results for Δ_pol driven by new spin structure data, reducing the long-standing tension between theory and experiment and halving the dominant uncertainty in hyperfine splitting calculations.
Published: 2024

31. Analysis of the Causes of Car Accidents in the United States of America in 2023: Gauge People Understanding of Data Visualisation

Author: Alhazmi, Hamoud, Morales, Marcelo, Jiang, Jiachen, Zhou, Jinxin, and Chen, Jian
Subjects: Computer Science - Human-Computer Interaction
Abstract: This paper presents a comprehensive examination of interactive data visualization tools and their efficacy in the context of United States car accident data for the year 2023. We developed interactive heatmaps, histograms, and pie charts to enhance the understanding of accident severity distribution over time and location. Our research included the creation and distribution of an online survey consisting of nine questions designed to test participants' comprehension of the presented data. Fifteen respondents were recruited to complete the survey, with the intent of assessing the effectiveness of both static and interactive versions of each visualization tool. The results indicated that participants using interactive heatmaps showed a greater understanding of the data than those using histograms and pie charts. In contrast, no notable difference in comprehension was observed between users of static and interactive histograms. Unexpectedly, static pie charts were found to be slightly more effective than their interactive counterparts. These findings suggest that while interactive visualizations can be powerful, their utility may vary depending on the type and complexity of the data presented. Future research is recommended to explore the influence of socioeconomic factors on the understanding of car accident data, potentially leading to more tailored and effective visualization strategies. This could provide deeper insights into the patterns and causes of car accidents, facilitating better-informed decision-making for stakeholders. Visit our website to explore our interactive plots and engage directly with the data for a more comprehensive understanding of our findings. Comment: 5 pages, 7 figures
Published: 2024

32. Graph Neural Networks for Job Shop Scheduling Problems: A Survey

Author: Smit, Igor G., Zhou, Jianan, Reijnen, Robbert, Wu, Yaoxin, Chen, Jian, Zhang, Cong, Bukhsh, Zaharah, Nuijten, Wim, and Zhang, Yingqian
Subjects: Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Job shop scheduling problems (JSSPs) represent a critical and challenging class of combinatorial optimization problems. Recent years have witnessed a rapid increase in the application of graph neural networks (GNNs) to solve JSSPs, but a systematic survey of the relevant literature has been lacking. This paper aims to thoroughly review prevailing GNN methods for different types of JSSPs and the closely related flow-shop scheduling problems (FSPs), especially those leveraging deep reinforcement learning (DRL). We begin by presenting the graph representations of various JSSPs, followed by an introduction to the most commonly used GNN architectures. We then review current GNN-based methods for each problem type, highlighting key technical elements such as graph representations, GNN architectures, GNN tasks, and training algorithms. Finally, we summarize and analyze the advantages and limitations of GNNs in solving JSSPs and provide potential future research opportunities. We hope this survey can inspire more powerful GNN-based approaches for tackling JSSPs and other scheduling problems.
Published: 2024

33. TREE: Tree Regularization for Efficient Execution

Author: Schmid, Lena, Biebert, Daniel, Hakert, Christian, Chen, Kuan-Hsun, Lang, Michel, Pauly, Markus, and Chen, Jian-Jia
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: The rise of machine learning methods on heavily resource-constrained devices requires not only the choice of a suitable model architecture for the target platform, but also the optimization of the chosen model with regard to inference execution time in order to make the best use of the available resources. Random forests and decision trees have been shown to be suitable models for such scenarios, since they are not only highly tunable with respect to total model size, but also offer high potential for optimizing their execution according to the underlying memory architecture. In addition to the straightforward strategy of enforcing shorter paths through decision trees and hence reducing inference execution time, hardware-aware implementations can optimize execution time in an orthogonal manner. One particular hardware-aware optimization is to lay out the memory of decision trees in such a way that more probable paths are less likely to be evicted from system caches. This works particularly well when splits within tree nodes are uneven and have a high probability of visiting one of the child nodes. In this paper, we present a method to reduce path lengths by rewarding uneven probability distributions during the training of decision trees, at the cost of minimal accuracy degradation. Specifically, we regularize the impurity computation of the CART algorithm to favor not only low impurity but also highly asymmetric distributions when evaluating split criteria, and hence offer high optimization potential for a memory-architecture-aware implementation. We show that, especially for binary classification data sets and data sets with many samples, this form of regularization can reduce execution time by up to approximately four times with minimal accuracy degradation.
Published: 2024

34. Three-dimensional quantum Griffiths singularity in bulk iron-pnictide superconductors

Author: Liu, Shao-Bo, Tian, Congkuan, Cai, Yongqing, Cui, Hang, Wei, Xinjian, Chen, Mantang, Zhao, Yang, Sui, Yuan, Guan, Shuyue, Jia, Shuang, Zhang, Yu, Feng, Ya, Li, Jiankun, Cui, Jian, Song, Yuanjun, Hao, Tingting, Chen, Chaoyu, and Chen, Jian-Hao
Subjects: Condensed Matter - Strongly Correlated Electrons, Condensed Matter - Superconductivity
Abstract: The quantum Griffiths singularity (QGS) is a phenomenon driven by quenched disorder that breaks conventional scaling invariance and results in a divergent dynamical critical exponent during quantum phase transitions (QPT). While this phenomenon has been well documented in low-dimensional conventional superconductors and in three-dimensional (3D) magnetic metal systems, its presence in 3D superconducting systems and in unconventional high-temperature superconductors (high-Tc SCs) remains unclear. In this study, we report the observation of robust QGS in the superconductor-metal transition (SMT) of both quasi-2D and 3D anisotropic unconventional high-Tc superconductor CaFe1-xNixAsF (x < 5%) bulk single crystals, where the QGS states persist up to 5.3 K. A comprehensive quantum phase diagram is established that delineates the 3D anisotropic QGS of SMT induced by perpendicular and parallel magnetic fields. Our findings reveal the universality of QGS in 3D superconducting systems and unconventional high-Tc SCs, thereby substantially expanding the range of applicability of QGS. Comment: 17 pages, 4 figures
Published: 2024

35. Vision-Language Models Meet Meteorology: Developing Models for Extreme Weather Events Detection with Heatmaps

Author: Chen, Jian, Zhou, Peilin, Hua, Yining, Chong, Dading, Cao, Meng, Li, Yaowei, Yuan, Zixuan, Zhu, Bing, and Liang, Junwei
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
Abstract: Real-time detection and prediction of extreme weather protect human lives and infrastructure. Traditional methods rely on numerical threshold setting and manual interpretation of weather heatmaps with Geographic Information Systems (GIS), which can be slow and error-prone. Our research redefines Extreme Weather Events Detection (EWED) by framing it as a Visual Question Answering (VQA) problem, thereby introducing a more precise and automated solution. Leveraging Vision-Language Models (VLM) to simultaneously process visual and textual data, we offer an effective aid to enhance the analysis process of weather heatmaps. Our initial assessment of general-purpose VLMs (e.g., GPT-4-Vision) on EWED revealed poor performance, characterized by low accuracy and frequent hallucinations due to inadequate color differentiation and insufficient meteorological knowledge. To address these challenges, we introduce ClimateIQA, the first meteorological VQA dataset, which includes 8,760 wind gust heatmaps and 254,040 question-answer pairs covering four question types, both generated from the latest climate reanalysis data. We also propose Sparse Position and Outline Tracking (SPOT), an innovative technique that leverages OpenCV and K-Means clustering to capture and depict color contours in heatmaps, providing ClimateIQA with more accurate color spatial location information. Finally, we present Climate-Zoo, the first meteorological VLM collection, which adapts VLMs to meteorological applications using the ClimateIQA dataset. Experiment results demonstrate that models from Climate-Zoo substantially outperform state-of-the-art general VLMs, achieving an accuracy increase from 0% to over 90% in EWED verification. The datasets and models in this study are publicly available for future climate science research: https://github.com/AlexJJJChen/Climate-Zoo.
Published: 2024

36. TRINS: Towards Multimodal Language Models that Can Read

Author: Zhang, Ruiyi, Zhang, Yanzhe, Chen, Jian, Zhou, Yufan, Gu, Jiuxiang, Chen, Changyou, and Sun, Tong
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
Abstract: Large multimodal language models have shown remarkable proficiency in understanding and editing images. However, a majority of these visually-tuned models struggle to comprehend the textual content embedded in images, primarily due to the limitation of training data. In this work, we introduce TRINS: a Text-Rich image INStruction dataset, with the objective of enhancing the reading ability of multimodal large language models. TRINS is built upon LAION using hybrid data annotation strategies that include machine-assisted and human-assisted annotation processes. It contains 39,153 text-rich images, captions, and 102,437 questions. Specifically, we show that annotations in TRINS contain significantly more words than those of related datasets, providing new challenges. Furthermore, we introduce a simple and effective architecture, called a Language-vision Reading Assistant (LaRA), which is good at understanding textual content within images. LaRA outperforms existing state-of-the-art multimodal large language models on the TRINS dataset, as well as on other classical benchmarks. Lastly, we conducted a comprehensive evaluation with TRINS on various text-rich image understanding and generation tasks, demonstrating its effectiveness. Comment: CVPR 2024
Published: 2024

37. Physics-informed Inverse Design of Multi-bit Programmable Metasurfaces

Author: Xu, Yucheng, Yang, Jia-Qi, Fan, Kebin, Wang, Sheng, Wu, Jingbo, Zhang, Caihong, Zhan, De-Chuan, Padilla, Willie J., Jin, Biaobing, Chen, Jian, and Wu, Peiheng
Subjects: Physics - Optics, Physics - Applied Physics
Abstract: Emerging reconfigurable metasurfaces offer various possibilities in programmatically manipulating electromagnetic waves across spatial, spectral, and temporal domains, showcasing great potential for enhancing terahertz applications. However, they are hindered by limited tunability, particularly evident in relatively small phase tuning over 270°, due to the design constraints of time-intensive forward design methodologies. Here, we demonstrate a multi-bit programmable metasurface capable of terahertz beam steering, facilitated by a developed physics-informed inverse design (PIID) approach. Through integrating a modified coupled mode theory (MCMT) into residual neural networks, our PIID algorithm not only significantly increases the design accuracy compared to conventional neural networks but also elucidates the intricate physical relations between the geometry and the modes. Without decreasing the reflection intensity, our method achieves an enhanced phase tuning as large as 300°. Additionally, we experimentally validate the inverse-designed programmable beam steering metasurface, which is adaptable across 1-bit, 2-bit, and tri-state coding schemes, yielding a deflection angle of up to 68° and broadened steering coverage. Our demonstration provides a promising pathway for rapidly exploring advanced metasurface devices, with potentially great impact on communication and imaging technologies.
Published: 2024

38. On the Inflation of KNN-Shapley Value

Author: Yang, Ziao, Yue, Han, Chen, Jian, and Liu, Hongfu
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence
Abstract: Shapley value-based data valuation methods, originating from cooperative game theory, quantify the usefulness of each individual sample by considering its contribution to all possible training subsets. Despite their extensive applications, these methods encounter the challenge of value inflation: while samples with negative Shapley values are detrimental, some samples with positive values can also be harmful. This challenge prompts two fundamental questions: whether zero is a suitable threshold for distinguishing detrimental from beneficial samples, and how to determine an appropriate threshold. To address these questions, we focus on KNN-Shapley and propose Calibrated KNN-Shapley (CKNN-Shapley), which calibrates zero as the threshold to distinguish detrimental samples from beneficial ones by mitigating the negative effects of small-sized training subsets. Through extensive experiments, we demonstrate the effectiveness of CKNN-Shapley in alleviating data valuation inflation, detecting detrimental samples, and assessing data quality. We also extend our approach beyond conventional classification settings, applying it to diverse and practical scenarios such as learning with mislabeled data, online learning with stream data, and active learning for label annotation.
Published: 2024
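For context, here is a minimal sketch of standard KNN-Shapley, the uncalibrated baseline that CKNN-Shapley modifies. It uses the exact recursion introduced with KNN-Shapley. Training points are first sorted by distance to a test point. The farthest point then receives 1[y = y_test] / N. Each closer point at rank i adds (1[y_i = y_test] - 1[y_(i+1) = y_test]) / K * min(K, i) / i to the value of the next point out. The simplifying assumptions are scalar features, absolute distance, and exact fractions. The paper's calibration is not shown.

```python
from fractions import Fraction

def knn_shapley(train, test_x, test_y, k):
    """Exact (uncalibrated) KNN-Shapley values of training points for one test point.

    train: list of (x, y) pairs with scalar x (assumption); distance is |x - test_x|.
    Returns one Fraction per training point, in the original order.
    """
    n = len(train)
    order = sorted(range(n), key=lambda i: abs(train[i][0] - test_x))
    match = [1 if train[i][1] == test_y else 0 for i in order]
    s = [Fraction(0)] * n
    s[n - 1] = Fraction(match[n - 1], n)  # farthest point
    for pos in range(n - 2, -1, -1):
        rank = pos + 1  # the recursion uses 1-based ranks
        s[pos] = s[pos + 1] + Fraction(match[pos] - match[pos + 1], k) * Fraction(min(k, rank), rank)
    values = [Fraction(0)] * n
    for pos, idx in enumerate(order):
        values[idx] = s[pos]  # map back from sorted rank to original index
    return values

if __name__ == "__main__":
    # Hypothetical toy data: the middle point carries the "wrong" label
    print(knn_shapley([(0.0, 1), (1.0, 0), (2.0, 1)], 0.0, 1, 1))
```

In the toy call, the mislabeled middle point gets a negative value (-1/6). The values sum to the K-NN utility of the full training set, here 1, as Shapley efficiency requires.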

39. Revisit, Extend, and Enhance Hessian-Free Influence Functions

Author: Yang, Ziao, Yue, Han, Chen, Jian, and Liu, Hongfu
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: Influence functions serve as crucial tools for assessing sample influence in model interpretation, training subset selection, noisy label detection, and more. By employing a first-order Taylor expansion, influence functions can estimate sample influence without expensive model retraining. However, applying influence functions directly to deep models is challenging, primarily due to the non-convexity of the loss function and the large number of model parameters. This not only makes computing the inverse of the Hessian matrix costly but also means that the inverse may not exist in some cases. Various approaches, including matrix decomposition, have been explored to speed up and approximate the inversion of the Hessian matrix, with the aim of making influence functions applicable to deep models. In this paper, we revisit a specific, naive yet effective approximation method known as TracIn, which substitutes an identity matrix for the inverse of the Hessian matrix. We provide deeper insights into why this simple approximation performs well. Furthermore, we extend its applications beyond measuring model utility to include considerations of fairness and robustness. Finally, we enhance TracIn through an ensemble strategy. To validate its effectiveness, we conduct experiments on synthetic data and extensive evaluations on noisy label detection, sample selection for large language model fine-tuning, and defense against adversarial attacks.
Published: 2024
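The substitution at the heart of TracIn can be made concrete. The classical influence of a training sample z on a test sample z' is -∇L(z')ᵀ H⁻¹ ∇L(z); replacing H⁻¹ with the identity leaves, up to sign, a plain gradient dot product, which TracIn accumulates over training checkpoints weighted by the learning rate. The sketch below assumes a binary logistic-regression model so that per-sample gradients have a closed form, and its function names are hypothetical. It does not include the paper's fairness and robustness extensions or its ensemble strategy.

```python
import numpy as np

def logistic_grad(w, x, y):
    """Per-sample gradients of the logistic loss, labels y in {0, 1}; shape (n, d)."""
    p = 1.0 / (1.0 + np.exp(-x @ w))
    return (p - y)[:, None] * x

def tracin_scores(checkpoints, lrs, x_train, y_train, x_test, y_test):
    """TracIn-style scores: sum over checkpoints of lr * <grad_train, grad_test>.

    This is the influence-function estimate with H^{-1} replaced by the identity,
    sign-flipped so that a positive score marks a proponent (a training sample
    whose gradient step lowers the test loss).
    """
    scores = np.zeros(len(x_train))
    for w, lr in zip(checkpoints, lrs):
        g_train = logistic_grad(w, x_train, y_train)                      # (n_train, d)
        g_test = logistic_grad(w, x_test[None, :], np.array([y_test]))[0]  # (d,)
        scores += lr * (g_train @ g_test)
    return scores

# Toy usage: one checkpoint at w = 0; the first sample agrees with the test point.
scores = tracin_scores(
    [np.zeros(2)], [1.0],
    np.array([[1.0, 0.0], [-1.0, 0.0]]), np.array([1.0, 1.0]),
    np.array([1.0, 0.0]), 1.0,
)
# scores == [0.25, -0.25]: a proponent and an opponent
```

No Hessian is formed or inverted, which is why the cost stays linear in the number of parameters. The approximation's quality then rests entirely on how well the identity stands in for H⁻¹, which is the question the abstract revisits.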

40. FinTextQA: A Dataset for Long-form Financial Question Answering

Author: Chen, Jian, Zhou, Peilin, Hua, Yining, Loh, Yingxin, Chen, Kehui, Li, Ziyuan, Zhu, Bing, and Liang, Junwei
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Accurate evaluation of financial question answering (QA) systems necessitates a comprehensive dataset encompassing diverse question types and contexts. However, current financial QA datasets lack scope diversity and question complexity. This work introduces FinTextQA, a novel dataset for long-form question answering (LFQA) in finance. FinTextQA comprises 1,262 high-quality, source-attributed QA pairs extracted and selected from finance textbooks and government agency websites. Moreover, we developed a Retrieval-Augmented Generation (RAG)-based LFQA system comprising an embedder, retriever, reranker, and generator. A multi-faceted evaluation approach, including human ranking, automatic metrics, and GPT-4 scoring, was employed to benchmark the performance of different LFQA system configurations under heightened noise conditions. The results indicate that: (1) among all compared generators, Baichuan2-7B competes closely with GPT-3.5-turbo in accuracy score; (2) the most effective system configuration on our dataset sets the embedder, retriever, reranker, and generator to Ada2, Automated Merged Retrieval, Bge-Reranker-Base, and Baichuan2-7B, respectively; (3) models become less susceptible to noise once the context length reaches a specific threshold.
Published: 2024

41. Integrated and DC-powered superconducting microcomb

Author: Wang, Chen-Guang, Xu, Wuyue, Li, Chong, Shi, Lili, Jiang, Junliang, Guo, Tingting, Yue, Wen-Cheng, Li, Tianyu, Zhang, Ping, Lyu, Yang-Yang, Pan, Jiazheng, Deng, Xiuhao, Dong, Ying, Tu, Xuecou, Dong, Sining, Cao, Chunhai, Zhang, Labao, Jia, Xiaoqing, Sun, Guozhu, Kang, Lin, Chen, Jian, Wang, Yong-Lei, Wang, Huabing, and Wu, Peiheng
Subjects: Condensed Matter - Superconductivity, Physics - Applied Physics, Quantum Physics
Abstract: Frequency combs, specialized laser sources emitting multiple equidistant frequency lines, have revolutionized science and technology with unprecedented precision and versatility. Recently, integrated frequency combs are emerging as scalable solutions for on-chip photonics. Here, we demonstrate a fully integrated superconducting microcomb that is easy to manufacture, simple to operate, and consumes ultra-low power. Our turnkey apparatus comprises a basic nonlinear superconducting device, a Josephson junction, directly coupled to a superconducting microstrip resonator. We showcase coherent comb generation through self-started mode-locking. Therefore, comb emission is initiated solely by activating a DC bias source, with power consumption as low as tens of picowatts. The resulting comb spectrum resides in the microwave domain and spans multiple octaves. The linewidths of all comb lines can be narrowed down to 1 Hz through a unique coherent injection-locking technique. Our work represents a critical step towards fully integrated microwave photonics and offers the potential for integrated quantum processors.
Published: 2024

42. Tunable superconducting resonators via on-chip control of local magnetic field

Author: Wang, Chen-Guang, Yue, Wen-Cheng, Tu, Xuecou, Chi, Tianyuan, Guo, Tingting, Lyu, Yang-Yang, Dong, Sining, Cao, Chunhai, Zhang, Labao, Jia, Xiaoqing, Sun, Guozhu, Kang, Lin, Chen, Jian, Wang, Yong-Lei, Wang, Huabing, and Wu, Peiheng
Subjects: Condensed Matter - Superconductivity, Physics - Applied Physics
Abstract: Superconducting microwave resonators play a pivotal role in superconducting quantum circuits. The ability to fine-tune their resonant frequencies provides enhanced control and flexibility. Here, we introduce a frequency-tunable superconducting coplanar waveguide resonator. By applying electrical currents through specifically designed ground wires, we achieve the generation and control of a localized magnetic field on the central line of the resonator, enabling continuous tuning of its resonant frequency. We demonstrate a frequency tuning range of 54.85 MHz in a 6.21 GHz resonator. This integrated and tunable resonator holds great potential as a dynamically tunable filter and as a key component of communication buses and memory elements in superconducting quantum computing.
Published: 2024

43. Efficient Pretraining Model based on Multi-Scale Local Visual Field Feature Reconstruction for PCB CT Image Element Segmentation

Author: Chen, Chen, Qiao, Kai, Yang, Jie, Chen, Jian, and Yan, Bin
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Element segmentation is a key step in the nondestructive testing of Printed Circuit Boards (PCBs) based on Computed Tomography (CT). In recent years, rapidly developing self-supervised pretraining techniques have made it possible to learn general image features without labeled samples and then solve downstream tasks with a small number of labeled samples, which shows good potential for PCB element segmentation. At present, Masked Image Modeling (MIM) pretraining models have been initially applied to PCB CT image element segmentation. However, because PCB elements such as vias, wires, and pads are small and regular in size, the global visual field is redundant for reconstructing a single element, which may degrade the model's performance. To address this issue, we propose an efficient pretraining model based on multi-scale local visual field feature reconstruction for PCB CT image element segmentation (EMLR-seg). In this model, the teacher-guided MIM pretraining model is introduced into PCB CT image element segmentation for the first time, and a multi-scale local visual field extraction (MVE) module is proposed to reduce redundancy by focusing on local visual fields. In addition, a simple decoder consisting of four Transformer blocks is used. Experiments show that EMLR-seg achieves 88.6% mIoU on the PCB CT image dataset we propose, exceeding the baseline model by 1.2%, while reducing training time by 29.6 hours (17.4%) under the same experimental conditions, which reflects the advantages of EMLR-seg in both performance and efficiency.
Published: 2024

44. The Douglas question on the Bergman and Fock spaces

Author: Chen, Jian-hua, Leng, Qianrui, and Zhao, Xianfeng
Subjects: Mathematics - Functional Analysis, 47B35
Abstract: Let $\mu$ be a positive Borel measure and $T_\mu$ be the bounded Toeplitz operator induced by $\mu$ on the Bergman or Fock space. In this paper, we mainly investigate the invertibility of the Toeplitz operator $T_\mu$ and the Douglas question on the Bergman and Fock spaces. In the Bergman-space setting, we obtain several necessary and sufficient conditions for the invertibility of $T_\mu$ in terms of the Berezin transform of $\mu$ and the reverse Carleson condition in two classical cases: (1) $\mu$ is absolutely continuous with respect to the normalized area measure on the open unit disk $\mathbb D$; (2) $\mu$ is the pull-back measure of the normalized area measure under an analytic self-mapping of $\mathbb D$. Nonetheless, we show that there exists a Carleson measure for the Bergman space such that its Berezin transform is bounded below but the corresponding Toeplitz operator is not invertible. On the Fock space, we show that $T_\mu$ is invertible if and only if $\mu$ is a reverse Carleson measure, but the invertibility of $T_\mu$ is not completely determined by the invertibility of the Berezin transform of $\mu$. These results suggest that the answers to the Douglas question for Toeplitz operators induced by positive measures on the Bergman and Fock spaces are both negative in general., Comment: 17 pages
Published: 2024
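For orientation, the objects named in this abstract can be written out in the Bergman-space case under the standard normalization (normalized area measure $dA$ on $\mathbb D$, so the Bergman kernel is $K(z,w) = (1 - z\overline{w})^{-2}$). These are textbook definitions, not formulas taken from the paper:

$$
T_\mu f(z) = \int_{\mathbb D} \frac{f(w)}{(1 - z\overline{w})^{2}}\, d\mu(w), \qquad
\widetilde{\mu}(z) = \int_{\mathbb D} |k_z(w)|^{2}\, d\mu(w), \qquad
k_z(w) = \frac{1 - |z|^{2}}{(1 - \overline{z} w)^{2}},
$$

$$
\mu \text{ is reverse Carleson} \iff \exists\, c > 0:\ \int_{\mathbb D} |f|^{2}\, d\mu \ \ge\ c\, \|f\|^{2} \quad \text{for all } f \text{ in the Bergman space.}
$$

Since $\langle T_\mu f, f\rangle = \int_{\mathbb D} |f|^2\, d\mu$ and $\widetilde{\mu}(z) = \langle T_\mu k_z, k_z\rangle$ with $\|k_z\| = 1$, the operator $T_\mu$ is positive, and its invertibility forces $\widetilde{\mu}$ to be bounded below. The Douglas question asks whether the converse holds, and the abstract reports that in general it does not.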

45. Three-dimensional hidden phase probed by in-plane magnetotransport in kagome metal CsV$_3$Sb$_5$ thin flakes

Author: Wei, Xinjian, Tian, Congkuan, Cui, Hang, Zhai, Yuxin, Li, Yongkai, Liu, Shaobo, Song, Yuanjun, Feng, Ya, Huang, Miaoling, Wang, Zhiwei, Liu, Yi, Xiong, Qihua, Yao, Yugui, Xie, X. C., and Chen, Jian-Hao
Subjects: Condensed Matter - Strongly Correlated Electrons, Condensed Matter - Mesoscale and Nanoscale Physics
Abstract: Transition metal compounds with kagome structure have been found to exhibit a variety of exotic structural, electronic, and magnetic orders. These orders compete at energies very close to one another, resulting in complex phase transitions. Some of the phases, such as the charge density wave (CDW) and the superconducting phase, are easily observable, while others are more challenging to identify and characterize. Here we present magneto-transport evidence of a new phase below ~35 K in thin flakes of the kagome topological metal CsV$_3$Sb$_5$ (CVS), between the CDW and superconducting transition temperatures. This phase is characterized by six-fold rotational symmetry in the in-plane magnetoresistance (MR) and is connected to the orbital current order in CVS. Furthermore, the phase is characterized by a large in-plane negative magnetoresistance, which suggests the existence of a three-dimensional, magnetic field-tunable orbital current ordered phase. Our results highlight the potential of magneto-transport to reveal the interactions between exotic quantum states of matter and to uncover the symmetry of such hidden phases.
Published: 2024

46. Outlier Gradient Analysis: Efficiently Identifying Detrimental Training Samples for Deep Learning Models

Author: Chhabra, Anshuman, Li, Bo, Chen, Jian, Mohapatra, Prasant, and Liu, Hongfu
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence
Abstract: A core data-centric learning challenge is the identification of training samples that are detrimental to model performance. Influence functions serve as a prominent tool for this task and offer a robust framework for assessing the influence of training data on model predictions. Despite their widespread use, the high computational cost of calculating the inverse of the Hessian matrix poses constraints, particularly when analyzing large deep models. In this paper, we establish a bridge between identifying detrimental training samples via influence functions and outlier gradient detection. This transformation not only yields a straightforward, Hessian-free formulation but also provides insights into the role of the gradient in sample impact. Through systematic empirical evaluations, we first validate the hypothesis of our proposed outlier gradient analysis approach on synthetic datasets. We then demonstrate its effectiveness in detecting mislabeled samples in vision models and in selecting data samples to improve the performance of natural language processing transformer models. We also extend its use to identifying influential samples for fine-tuning Large Language Models.
Published: 2024
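The reformulation above treats detrimental samples as outliers in gradient space. As a rough illustration only, the sketch below flags samples whose per-sample gradient norm is unusually large under a median-absolute-deviation (MAD) robust z-score. This detector, the 3.5 cutoff, and the function name are stand-in assumptions, not necessarily the outlier detector or statistics the authors use. `grads` is assumed to hold one loss gradient per training sample.

```python
import numpy as np

def flag_outlier_gradients(grads, threshold=3.5):
    """Flag samples whose gradient norm is an upper outlier (hypothetical helper).

    grads: array of shape (n, d), one per-sample loss gradient per row.
    Uses a one-sided robust z-score based on the median absolute deviation;
    the factor 0.6745 rescales MAD to a normal standard deviation.
    """
    norms = np.linalg.norm(grads, axis=1)
    med = np.median(norms)
    mad = np.median(np.abs(norms - med))
    if mad == 0:
        # All norms (nearly) identical: nothing stands out.
        return np.zeros(len(norms), dtype=bool)
    robust_z = 0.6745 * (norms - med) / mad
    return robust_z > threshold

# Toy usage: four ordinary gradients and one with a much larger norm.
grads = np.array([[0.1, 0.0], [0.12, 0.0], [0.09, 0.0], [0.11, 0.0], [2.0, 0.0]])
flags = flag_outlier_gradients(grads)
# flags.tolist() == [False, False, False, False, True]
```

The appeal of the gradient view is visible even in this toy: no Hessian is involved, and mislabeled samples tend to have large loss gradients because the model disagrees with their labels.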

47. A Subspace Minimization Barzilai-Borwein Method for Multiobjective Optimization Problems

Author: Chen, Jian, Tang, Liping, and Yang, Xinmin
Subjects: Mathematics - Optimization and Control, 90C29, 90C30
Abstract: Nonlinear conjugate gradient methods have recently garnered significant attention within the multiobjective optimization community. These methods aim to keep their conjugate parameters consistent with their single-objective optimization counterparts. However, in multiobjective conjugate gradient methods it remains uncertain whether the attractive conjugacy property of search directions is preserved, even for quadratic problems. This loss of interpretability of the last search direction significantly limits the applicability of these methods. To shed light on the role of the last search direction, we introduce a novel approach called the subspace minimization Barzilai-Borwein method for multiobjective optimization problems (SMBBMO). In SMBBMO, each search direction is derived by optimizing a preconditioned Barzilai-Borwein subproblem within a two-dimensional subspace spanned by the last search direction and the current Barzilai-Borwein descent direction. Furthermore, to ensure the global convergence of SMBBMO, we employ a modified Cholesky factorization on a transformed scale matrix, capturing the local curvature information of the problem within the two-dimensional subspace. Under mild assumptions, we establish both global and $Q$-linear convergence of the proposed method. Finally, comparative numerical experiments confirm the efficacy of SMBBMO, even on large-scale and ill-conditioned problems., Comment: arXiv admin note: text overlap with arXiv:2309.06929
Published: 2024
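SMBBMO builds on the Barzilai-Borwein (BB) step size. For orientation, the sketch below shows only the classical single-objective BB1 gradient method, with step α_k = sᵀs / sᵀy where s = x_k - x_{k-1} and y = g_k - g_{k-1}, applied to a strictly convex quadratic. The multiobjective subproblem, the two-dimensional subspace minimization, and the modified Cholesky safeguard from the abstract are not reproduced. The fixed first step, the fallback when sᵀy ≤ 0, and the function name are assumptions of this sketch.

```python
import numpy as np

def bb_gradient_descent(grad, x0, alpha0=1e-3, iters=100, tol=1e-10):
    """Single-objective gradient descent with the BB1 step size (hypothetical helper).

    alpha_k = (s^T s) / (s^T y), s = x_k - x_{k-1}, y = g_k - g_{k-1}.
    """
    x_prev = np.asarray(x0, dtype=float)
    g_prev = grad(x_prev)
    x = x_prev - alpha0 * g_prev  # the first step has no history, so use a fixed step
    for _ in range(iters):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        s, y = x - x_prev, g - g_prev
        sy = s @ y
        # Fall back to alpha0 if the curvature estimate is not positive.
        alpha = (s @ s) / sy if sy > 0 else alpha0
        x_prev, g_prev = x, g
        x = x - alpha * g
    return x

# Toy usage: minimize 0.5 * x^T A x - b^T x, whose minimizer solves A x = b.
A = np.diag([1.0, 10.0])
b = np.array([1.0, 1.0])
x_star = bb_gradient_descent(lambda x: A @ x - b, np.zeros(2))
# x_star is close to [1.0, 0.1]
```

The BB step uses two consecutive iterates as a cheap secant estimate of curvature along the last step, which is the same kind of information SMBBMO captures within its two-dimensional subspace.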

48. Segmented Model-Based Hydrogen Delivery Control for PEM Fuel Cells: a Port-Hamiltonian Approach

Author: Kumar, Lalitesh, Chen, Jian, Wu, Chengshuai, Chen, Yuzhu, and van der Schaft, Arjan
Subjects: Electrical Engineering and Systems Science - Systems and Control
Abstract: This paper proposes an extended interconnection and damping assignment passivity-based control technique (IDA-PBC) to control the pressure dynamics in the fuel delivery subsystem (FDS) of proton exchange membrane fuel cells. The fuel cell stack is a distributed parameter system that can be modeled by partial differential equations (PDEs). In this paper, the segmentation concept is used to approximate the PDE model by an ordinary differential equation (ODE) model, so that each segment is described by multiple ODEs, yielding a lumped model of the segments. Subsequently, a generalized multi-input multi-output lumped parameter model is developed in the port-Hamiltonian framework based on mass balance to minimize the modeling error. The modeling errors arise from the differences between the spatially distributed pressures in the FDS segments, and also from the difference between the actual stack pressure and the measured output pressure of the anode. The feasibility of interconnecting the segments is ensured by maintaining the passivity of each segment. With re-circulation and bleeding of the anode considered in the modeling, an extended energy-shaping and output-tracking IDA-PBC-based state-feedback controller is proposed to control the spatially distributed pressure dynamics in the anode. Furthermore, a high-order sliding mode observer is designed to estimate the unmeasurable pressures in the FDS with known disturbances. Performance recovery of output feedback control is accomplished with explicit stability analysis. The effectiveness of the proposed IDA-PBC approach is validated by simulation results., Comment: 12 pages, 11 Figures
Published: 2024

49. Register Your Forests: Decision Tree Ensemble Optimization by Explicit CPU Register Allocation

Author: Biebert, Daniel, Hakert, Christian, Chen, Kuan-Hsun, and Chen, Jian-Jia
Subjects: Computer Science - Machine Learning
Abstract: Bringing high-level machine learning models to efficient, well-suited machine implementations often involves a chain of tools, e.g., code generators, compilers, and optimizers. Along such tool chains, abstractions have to be applied, which leads to suboptimal use of CPU registers. This is a shortcoming, especially in resource-constrained embedded setups. In this work, we present a code generation approach for decision tree ensembles that produces machine assembly code directly from the high-level model representation in a single conversion step. Specifically, we develop various approaches to effectively allocate registers for the inference of decision tree ensembles. Extensive evaluations of the proposed method are conducted against a baseline that generates C code from the high-level machine learning model and then compiles it. The results show that the performance of decision tree ensemble inference can be significantly improved (by up to $\approx1.6\times$) if the methods are applied carefully to the appropriate scenario.
Published: 2024

50. Exploring the Necessity of Visual Modality in Multimodal Machine Translation using Authentic Datasets

Author: Long, Zi, Tang, Zhenhao, Fu, Xianghua, Chen, Jian, Hou, Shilong, and Lyu, Jinze
Subjects: Computer Science - Computation and Language
Abstract: Recent research in the field of multimodal machine translation (MMT) has indicated that the visual modality is either dispensable or offers only marginal advantages. However, most of these conclusions are drawn from experimental results on a limited set of bilingual sentence-image pairs, such as Multi30k. In these datasets, the content of each bilingual parallel sentence pair must be well represented by a manually annotated image, which differs from real-world translation scenarios. In this work, we adopt the universal multimodal machine translation framework proposed by Tang et al. (2022), which allows us to investigate the impact of the visual modality on translation efficacy using real-world translation datasets. Through a comprehensive exploration via probing tasks, we find that the visual modality proves advantageous for the majority of authentic translation datasets. Notably, translation performance primarily hinges on the alignment and coherence between textual and visual contents. Furthermore, our results suggest that visual information serves a supplementary role in multimodal translation and can be substituted., Comment: accepted at BUCC 2024
Published: 2024
