Author: "An, Qingyu" / Publication Year Range: Last 50 years - Searchworks@Jio Institute Digital Library Search Results

Your search for "An, Qingyu" returned 31,538 results


1. RelationBooth: Towards Relation-Aware Customized Object Generation

Author: Shi, Qingyu, Qi, Lu, Wu, Jianzong, Bai, Jinbin, Wang, Jingbo, Tong, Yunhai, Li, Xiangtai, and Yang, Ming-Hsuan
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Customized image generation is crucial for delivering personalized content based on user-provided image prompts, aligning large-scale text-to-image diffusion models with individual needs. However, existing models often overlook the relationships between customized objects in generated images. This work addresses that gap by focusing on relation-aware customized image generation, which aims to preserve the identities from image prompts while maintaining the predicate relations described in text prompts. Specifically, we introduce RelationBooth, a framework that disentangles identity and relation learning through a well-curated dataset. Our training data consists of relation-specific images, independent object images containing identity information, and text prompts to guide relation generation. We then propose two key modules to tackle the two main challenges: generating accurate and natural relations, especially when significant pose adjustments are required, and avoiding object confusion in cases of overlap. First, we introduce a keypoint matching loss that effectively guides the model in adjusting object poses closely tied to their relationships. Second, we incorporate local features from the image prompts to better distinguish between objects, preventing confusion in overlapping cases. Extensive results on three benchmarks demonstrate the superiority of RelationBooth in generating precise relations while preserving object identities across a diverse set of objects and relations. The source code and trained models will be made available to the public.
Published: 2024

2. Shopping MMLU: A Massive Multi-Task Online Shopping Benchmark for Large Language Models

Author: Jin, Yilun, Li, Zheng, Zhang, Chenwei, Cao, Tianyu, Gao, Yifan, Jayarao, Pratik, Li, Mao, Liu, Xin, Sarkhel, Ritesh, Tang, Xianfeng, Wang, Haodong, Wang, Zhengyang, Xu, Wenju, Yang, Jingfeng, Yin, Qingyu, Li, Xian, Nigam, Priyanka, Xu, Yi, Chen, Kai, Yang, Qiang, Jiang, Meng, and Yin, Bing
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence
Abstract: Online shopping is a complex multi-task, few-shot learning problem with a wide and evolving range of entities, relations, and tasks. However, existing models and benchmarks are commonly tailored to specific tasks, falling short of capturing the full complexity of online shopping. Large Language Models (LLMs), with their multi-task and few-shot learning abilities, have the potential to profoundly transform online shopping by alleviating task-specific engineering efforts and by providing users with interactive conversations. Despite the potential, LLMs face unique challenges in online shopping, such as domain-specific concepts, implicit knowledge, and heterogeneous user behaviors. Motivated by the potential and challenges, we propose Shopping MMLU, a diverse multi-task online shopping benchmark derived from real-world Amazon data. Shopping MMLU consists of 57 tasks covering 4 major shopping skills: concept understanding, knowledge reasoning, user behavior alignment, and multi-linguality, and can thus comprehensively evaluate the abilities of LLMs as general shop assistants. With Shopping MMLU, we benchmark over 20 existing LLMs and uncover valuable insights about practices and prospects of building versatile LLM-based shop assistants. Shopping MMLU can be publicly accessed at https://github.com/KL4805/ShoppingMMLU. In addition, with Shopping MMLU, we host a competition in KDD Cup 2024 with over 500 participating teams. The winning solutions and the associated workshop can be accessed at our website https://amazon-kddcup24.github.io/.
Comment: NeurIPS 2024 Datasets and Benchmarks Track Accepted. Modified typos in Figure 9
Published: 2024

3. Demystifying Large Language Models for Medicine: A Primer

Author: Jin, Qiao, Wan, Nicholas, Leaman, Robert, Tian, Shubo, Wang, Zhizheng, Yang, Yifan, Wang, Zifeng, Xiong, Guangzhi, Lai, Po-Ting, Zhu, Qingqing, Hou, Benjamin, Sarfo-Gyamfi, Maame, Zhang, Gongbo, Gilson, Aidan, Bhasuran, Balu, He, Zhe, Zhang, Aidong, Sun, Jimeng, Weng, Chunhua, Summers, Ronald M., Chen, Qingyu, Peng, Yifan, and Lu, Zhiyong
Subjects: Computer Science - Artificial Intelligence, Computer Science - Computation and Language
Abstract: Large language models (LLMs) represent a transformative class of AI tools capable of revolutionizing various aspects of healthcare by generating human-like responses across diverse contexts and adapting to novel tasks following human instructions. Their potential application spans a broad range of medical tasks, such as clinical documentation, matching patients to clinical trials, and answering medical questions. In this primer paper, we propose an actionable guideline to help healthcare professionals more efficiently utilize LLMs in their work, along with a set of best practices. This approach consists of several main phases, including formulating the task, choosing LLMs, prompt engineering, fine-tuning, and deployment. We start with the discussion of critical considerations in identifying healthcare tasks that align with the core capabilities of LLMs and selecting models based on the selected task and data, performance requirements, and model interface. We then review the strategies, such as prompt engineering and fine-tuning, to adapt standard LLMs to specialized medical tasks. Deployment considerations, including regulatory compliance, ethical guidelines, and continuous monitoring for fairness and bias, are also discussed. By providing a structured step-by-step methodology, this tutorial aims to equip healthcare professionals with the tools necessary to effectively integrate LLMs into clinical practice, ensuring that these powerful technologies are applied in a safe, reliable, and impactful manner.
Published: 2024

4. MedINST: Meta Dataset of Biomedical Instructions

Author: Han, Wenhan, Fang, Meng, Zhang, Zihan, Yin, Yu, Song, Zirui, Chen, Ling, Pechenizkiy, Mykola, and Chen, Qingyu
Subjects: Computer Science - Computation and Language
Abstract: The integration of large language model (LLM) techniques in the field of medical analysis has brought about significant advancements, yet the scarcity of large, diverse, and well-annotated datasets remains a major challenge. Medical data and tasks, which vary in format, size, and other parameters, require extensive preprocessing and standardization for effective use in training LLMs. To address these challenges, we introduce MedINST, the Meta Dataset of Biomedical Instructions, a novel multi-domain, multi-task instructional meta-dataset. MedINST comprises 133 biomedical NLP tasks and over 7 million training samples, making it the most comprehensive biomedical instruction dataset to date. Using MedINST as the meta dataset, we curate MedINST32, a challenging benchmark with different task difficulties aiming to evaluate LLMs' generalization ability. We fine-tune several LLMs on MedINST and evaluate on MedINST32, showcasing enhanced cross-task generalization.
Published: 2024

5. Deeper Insights Without Updates: The Power of In-Context Learning Over Fine-Tuning

Author: Yin, Qingyu, He, Xuzheng, Deng, Luoao, Leong, Chak Tou, Wang, Fan, Yan, Yanzhao, Shen, Xiaoyu, and Zhang, Qiang
Subjects: Computer Science - Machine Learning, Computer Science - Computation and Language
Abstract: Fine-tuning and in-context learning (ICL) are two prevalent methods for imbuing large language models with task-specific knowledge. It is commonly believed that fine-tuning can surpass ICL given sufficient training samples, as it allows the model to adjust its internal parameters based on the data. However, this paper presents a counterintuitive finding: for tasks with implicit patterns, ICL captures these patterns significantly better than fine-tuning. We developed several datasets featuring implicit patterns, such as sequences determining answers through parity or identifying reducible terms in calculations. We then evaluated the models' understanding of these patterns under both fine-tuning and ICL across models ranging from 0.5B to 7B parameters. The results indicate that models employing ICL can quickly grasp deep patterns and significantly improve accuracy. In contrast, fine-tuning, despite utilizing thousands of times more training samples than ICL, achieved only limited improvements. We also propose circuit shift theory, from a mechanistic interpretability perspective, to explain why ICL wins.
Comment: EMNLP'24 Findings
Published: 2024

6. Distinguishing Electronic Band Structure of Single-layer and Bilayer Ruddlesden-Popper Nickelates Probed by in-situ High Pressure X-ray Absorption Near-edge Spectroscopy

Author: Li, Mingtao, Wang, Yiming, Pei, Cuiying, Zhang, Mingxin, Li, Nana, Guan, Jiayi, Amboage, Monica, Adama, N-Diaye, Kong, Qingyu, Qi, Yanpeng, and Yang, Wenge
Subjects: Condensed Matter - Superconductivity, Condensed Matter - Strongly Correlated Electrons
Abstract: We report a comprehensive study of electronic band structure for single-layer (SL) and bilayer (BL) RP-nickelates probed by in-situ HP X-ray absorption near edge spectroscopy (XANES). At ambient pressure (AP), the energy splitting delta_E of the d_3z^2-r^2 and d_x^2-y^2 bands is directly observed in La3Ni2O7 (BL-La327) but not in La2NiO4 (SL-La214) above E_F, underlining the critical role of inner apical O atoms. A combination of DFT-based electronic band structure and projected density of states (PDOS) calculations with simulated XANES enables us to explain the observed main XANES features labelled by a, A, B', B and C when considering the orbital hybridizations, crystal field splitting (CFS) and core-hole screening of different 3d configurations for SL-La214 and BL-La327 nickelates. At high pressure (HP), the delta_E values of the pre-edge peak show a dome-like evolution above 7.7 GPa, with the maximum located at around 20 GPa for metallic BL-La327. Analysis of its integrated area and FWHM provides strong evidence that the bonding d_3z^2-r^2 band crosses E_F above about 7.7 GPa for the metallic BL-La327. Growth of the integrated area of the pre-edge peak and C peak provides further evidence of a pressure-induced hole doping effect. Meanwhile, the pressure-dependent FWHM of the pre-edge peak implies a nonmonotonic evolution of orbital-selective electronic correlation above 7.7 GPa, with extrema emerging at about 20 GPa. Moreover, we estimate the relative hole doping level using the energy shift of the pre-edge peak, yielding 0.074 hole per Ni site, or equivalently 1.1*10^21 cm^-3, at 20 GPa for the metallic BL-La327, which is comparable to cuprates. Our results provide a timely examination of the electronic band structures obtained from theoretical calculations, emphasizing the essential role of both the d_3z^2-r^2 and d_x^2-y^2 bands as well as the electronic correlation in superconducting pairing for pressurized La3Ni2O7.
Published: 2024

7. LMOD: A Large Multimodal Ophthalmology Dataset and Benchmark for Large Vision-Language Models

Author: Qin, Zhenyue, Yin, Yu, Campbell, Dylan, Wu, Xuansheng, Zou, Ke, Tham, Yih-Chung, Liu, Ninghao, Zhang, Xiuzhen, and Chen, Qingyu
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: The prevalence of vision-threatening eye diseases is a significant global burden, with many cases remaining undiagnosed or diagnosed too late for effective treatment. Large vision-language models (LVLMs) have the potential to assist in understanding anatomical information, diagnosing eye diseases, and drafting interpretations and follow-up plans, thereby reducing the burden on clinicians and improving access to eye care. However, limited benchmarks are available to assess LVLMs' performance in ophthalmology-specific applications. In this study, we introduce LMOD, a large-scale multimodal ophthalmology benchmark consisting of 21,993 instances across (1) five ophthalmic imaging modalities: optical coherence tomography, color fundus photographs, scanning laser ophthalmoscopy, lens photographs, and surgical scenes; (2) free-text, demographic, and disease biomarker information; and (3) primary ophthalmology-specific applications such as anatomical information understanding, disease diagnosis, and subgroup analysis. In addition, we benchmarked 13 state-of-the-art LVLM representatives from closed-source, open-source, and medical domains. The results demonstrate a significant performance drop for LVLMs in ophthalmology compared to other domains. Systematic error analysis further identified six major failure modes: misclassification, failure to abstain, inconsistent reasoning, hallucination, assertions without justification, and lack of domain-specific knowledge. In contrast, supervised neural networks specifically trained on these tasks as baselines demonstrated high accuracy. These findings underscore the pressing need for benchmarks in the development and validation of ophthalmology-specific LVLMs.
Comment: Project Page: https://kfzyqin.github.io/lmod/
Published: 2024

8. Spectral Graph Sample Weighting for Interpretable Sub-cohort Analysis in Predictive Models for Neuroimaging

Author: Paschali, Magdalini, Jiang, Yu Hang, Siegel, Spencer, Gonzalez, Camila, Pohl, Kilian M., Chaudhari, Akshay, and Zhao, Qingyu
Subjects: Electrical Engineering and Systems Science - Image and Video Processing, Computer Science - Machine Learning
Abstract: Recent advancements in medicine have confirmed that brain disorders often comprise multiple subtypes of mechanisms, developmental trajectories, or severity levels. Such heterogeneity is often associated with demographic aspects (e.g., sex) or disease-related contributors (e.g., genetics). Thus, the predictive power of machine learning models used for symptom prediction varies across subjects based on such factors. To model this heterogeneity, one can assign each training sample a factor-dependent weight, which modulates the subject's contribution to the overall objective loss function. To this end, we propose to model the subject weights as a linear combination of the eigenbases of a spectral population graph that captures the similarity of factors across subjects. In doing so, the learned weights smoothly vary across the graph, highlighting sub-cohorts with high and low predictability. Our proposed sample weighting scheme is evaluated on two tasks. First, we predict initiation of heavy alcohol drinking in young adulthood from imaging and neuropsychological measures from the National Consortium on Alcohol and NeuroDevelopment in Adolescence (NCANDA). Next, we detect Dementia vs. Mild Cognitive Impairment (MCI) using imaging and demographic measurements in subjects from the Alzheimer's Disease Neuroimaging Initiative (ADNI). Compared to existing sample weighting schemes, our sample weights improve interpretability and highlight sub-cohorts with distinct characteristics and varying model accuracy.
Published: 2024

9. Language Enhanced Model for Eye (LEME): An Open-Source Ophthalmology-Specific Large Language Model

Author: Gilson, Aidan, Ai, Xuguang, Xie, Qianqian, Srinivasan, Sahana, Pushpanathan, Krithi, Singer, Maxwell B., Huang, Jimin, Kim, Hyunjae, Long, Erping, Wan, Peixing, Del Priore, Luciano V., Ohno-Machado, Lucila, Xu, Hua, Liu, Dianbo, Adelman, Ron A., Tham, Yih-Chung, and Chen, Qingyu
Subjects: Computer Science - Computation and Language
Abstract: Large Language Models (LLMs) are poised to revolutionize healthcare. Ophthalmology-specific LLMs remain scarce and underexplored. We introduced an open-source, specialized LLM for ophthalmology, termed Language Enhanced Model for Eye (LEME). LEME was initially pre-trained on the Llama2 70B framework and further fine-tuned with a corpus of ~127,000 non-copyrighted training instances curated from ophthalmology-specific case reports, abstracts, and open-source study materials. We benchmarked LEME against eight other LLMs, namely, GPT-3.5, GPT-4, three Llama2 models (7B, 13B, 70B), PMC-LLAMA 13B, Meditron 70B, and EYE-Llama (another ophthalmology-specific LLM). Evaluations included four internal validation tasks: abstract completion, fill-in-the-blank, multiple-choice questions (MCQ), and short-answer QA. External validation tasks encompassed long-form QA, MCQ, patient EHR summarization, and clinical QA. Evaluation metrics included Rouge-L scores, accuracy, and expert evaluation of correctness, completeness, and readability. In internal validations, LEME consistently outperformed its counterparts, achieving Rouge-L scores of 0.20 in abstract completion (all p<0.05), 0.82 in fill-in-the-blank (all p<0.0001), and 0.22 in short-answer QA (all p<0.0001, except versus GPT-4). In external validations, LEME excelled in long-form QA with a Rouge-L of 0.19 (all p<0.0001), ranked second in MCQ accuracy (0.68; all p<0.0001), and scored highest in EHR summarization and clinical QA (ranging from 4.24 to 4.83 out of 5 for correctness, completeness, and readability). LEME's emphasis on robust fine-tuning and the use of non-copyrighted data represents a breakthrough in open-source ophthalmology-specific LLMs, offering the potential to revolutionize execution of clinical tasks while democratizing research collaboration.
Published: 2024

10. Learning Wheelchair Tennis Navigation from Broadcast Videos with Domain Knowledge Transfer and Diffusion Motion Planning

Author: Wu, Zixuan, Zaidi, Zulfiqar, Patil, Adithya, Xiao, Qingyu, and Gombolay, Matthew
Subjects: Computer Science - Robotics
Abstract: In this paper, we propose a novel and generalizable zero-shot knowledge transfer framework that distills expert sports navigation strategies from web videos into robotic systems with adversarial constraints and out-of-distribution image trajectories. Our pipeline enables diffusion-based imitation learning by reconstructing the full 3D task space from multiple partial views, warping it into 2D image space, closing the planning loop within this 2D space, and transferring the constrained motion of interest back to task space. Additionally, we demonstrate that the learned policy can serve as a local planner in conjunction with position control. We apply this framework to the wheelchair tennis navigation problem to guide the wheelchair into the ball-hitting region. Our pipeline achieves a navigation success rate of 97.67% in reaching real-world recorded tennis ball trajectories with a physical robot wheelchair, and achieves a success rate of 68.49% in a real-world, real-time experiment on a full-sized tennis court.
Comment: This manuscript has been submitted to 2025 IEEE International Conference on Robotics & Automation (ICRA)
Published: 2024

11. SymILO: A Symmetry-Aware Learning Framework for Integer Linear Optimization

Author: Chen, Qian, Zhang, Tianjian, Yang, Linxin, Han, Qingyu, Wang, Akang, Sun, Ruoyu, Luo, Xiaodong, and Chang, Tsung-Hui
Subjects: Mathematics - Optimization and Control
Abstract: Integer linear programs (ILPs) are commonly employed to model diverse practical problems such as scheduling and planning. Recently, machine learning techniques have been utilized to solve ILPs. A straightforward idea is to train a model via supervised learning, with an ILP as the input and an optimal solution as the label. An ILP is symmetric if its variables can be permuted without changing the problem structure, resulting in numerous equivalent and optimal solutions. Randomly selecting an optimal solution as the label can introduce variability in the training data, which may hinder the model from learning stable patterns. In this work, we incorporate the intrinsic symmetry of ILPs and propose a novel training framework called SymILO. Specifically, we modify the learning task by introducing solution permutation along with neural network weights as learnable parameters and then design an alternating algorithm to jointly optimize the loss function. We conduct extensive experiments on ILPs involving different symmetries and the computational results demonstrate that our symmetry-aware approach significantly outperforms three existing methods, achieving 50.3%, 66.5%, and 45.4% average improvements, respectively.
Published: 2024

12. Learning Dynamics of a Ball with Differentiable Factor Graph and Roto-Translational Invariant Representations

Author: Xiao, Qingyu, Wu, Zixuan, and Gombolay, Matthew
Subjects: Computer Science - Robotics
Abstract: Robots in dynamic environments need fast, accurate models of how objects move in their environments to support agile planning. In sports such as ping pong, analytical models often struggle to accurately predict ball trajectories with spins due to complex aerodynamics, elastic behaviors, and the challenges of modeling sliding and rolling friction. On the other hand, despite the promise of data-driven methods, machine learning struggles to make accurate, consistent predictions without precise input. In this paper, we propose an end-to-end learning framework that can jointly train a dynamics model and a factor graph estimator. Our approach leverages a Gram-Schmidt (GS) process to extract roto-translational invariant representations to improve the model performance, which can further reduce the validation error compared to a data augmentation method. Additionally, we propose a network architecture that enhances nonlinearity by using self-multiplicative bypasses in the layer connections. By leveraging these novel methods, our proposed approach predicts the ball's position with an RMSE of 37.2 mm of the paddle radius at the apex after the first bounce, and 71.5 mm after the second bounce.
Comment: ICRA 2025
Published: 2024

13. Learning Diverse Robot Striking Motions with Diffusion Models and Kinematically Constrained Gradient Guidance

Author: Lee, Kin Man, Ye, Sean, Xiao, Qingyu, Wu, Zixuan, Zaidi, Zulfiqar, D'Ambrosio, David B., Sanketi, Pannag R., and Gombolay, Matthew
Subjects: Computer Science - Robotics, Computer Science - Machine Learning
Abstract: Advances in robot learning have enabled robots to generate skills for a variety of tasks. Yet, robot learning is typically sample inefficient, struggles to learn from data sources exhibiting varied behaviors, and does not naturally incorporate constraints. These properties are critical for fast, agile tasks such as playing table tennis. Modern techniques for learning from demonstration improve sample efficiency and scale to diverse data, but are rarely evaluated on agile tasks. In the case of reinforcement learning, achieving good performance requires training on high-fidelity simulators. To overcome these limitations, we develop a novel diffusion modeling approach that is offline, constraint-guided, and expressive of diverse agile behaviors. The key to our approach is a kinematic constraint gradient guidance (KCGG) technique that computes gradients through both the forward kinematics of the robot arm and the diffusion model to direct the sampling process. KCGG minimizes the cost of violating constraints while simultaneously keeping the sampled trajectory in-distribution of the training data. We demonstrate the effectiveness of our approach for time-critical robotic tasks by evaluating KCGG in two challenging domains: simulated air hockey and real table tennis. In simulated air hockey, we achieved a 25.4% increase in block rate, while in table tennis, we saw a 17.3% increase in success rate compared to imitation learning baselines.
Published: 2024

14. Towards Accountable AI-Assisted Eye Disease Diagnosis: Workflow Design, External Validation, and Continual Learning

Author: Chen, Qingyu, Keenan, Tiarnan D L, Agron, Elvira, Allot, Alexis, Guan, Emily, Duong, Bryant, Elsawy, Amr, Hou, Benjamin, Xue, Cancan, Bhandari, Sanjeeb, Broadhead, Geoffrey, Cousineau-Krieger, Chantal, Davis, Ellen, Gensheimer, William G, Grasic, David, Gupta, Seema, Haddock, Luis, Konstantinou, Eleni, Lamba, Tania, Maiberger, Michele, Mantopoulos, Dimosthenis, Mehta, Mitul C, Nahri, Ayman G, AL-Nawaflh, Mutaz, Oshinsky, Arnold, Powell, Brittany E, Purt, Boonkit, Shin, Soo, Stiefel, Hillary, Thavikulwat, Alisa T, Wroblewski, Keith James, Chung, Tham Yih, Cheung, Chui Ming Gemmy, Cheng, Ching-Yu, Chew, Emily Y, Hribar, Michelle R., Chiang, Michael F., and Lu, Zhiyong
Subjects: Electrical Engineering and Systems Science - Image and Video Processing, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
Abstract: Timely disease diagnosis is challenging due to increasing disease burdens and limited clinician availability. AI shows promise in diagnosis accuracy but faces real-world application issues due to insufficient validation in clinical workflows and diverse populations. This study addresses gaps in medical AI downstream accountability through a case study on age-related macular degeneration (AMD) diagnosis and severity classification. We designed and implemented an AI-assisted diagnostic workflow for AMD, comparing diagnostic performance with and without AI assistance among 24 clinicians from 12 institutions with real patient data sampled from the Age-Related Eye Disease Study (AREDS). Additionally, we demonstrated continual enhancement of an existing AI model by incorporating approximately 40,000 additional medical images (named AREDS2 dataset). The improved model was then systematically evaluated using both AREDS and AREDS2 test sets, as well as an external test set from Singapore. AI assistance markedly enhanced diagnostic accuracy and classification for 23 out of 24 clinicians, with the average F1-score increasing by 20% from 37.71 (Manual) to 45.52 (Manual + AI) (P-value < 0.0001), achieving an improvement of over 50% in some cases. In terms of efficiency, AI assistance reduced diagnostic times for 17 out of the 19 clinicians tracked, with time savings of up to 40%. Furthermore, a model equipped with continual learning showed robust performance across three independent datasets, recording a 29% increase in accuracy, and elevating the F1-score from 42 to 54 in the Singapore population.
Published: 2024

15. MentalImager: Exploring Generative Images for Assisting Support-Seekers' Self-Disclosure in Online Mental Health Communities

Author: Zhang, Han, Zhang, Jiaqi, Zhou, Yuxiang, Louie, Ryan, Kim, Taewook, Guo, Qingyu, Li, Shuailin, and Peng, Zhenhui
Subjects: Computer Science - Human-Computer Interaction
Abstract: Support-seekers' self-disclosure of their suffering experiences, thoughts, and feelings in their posts can help them get needed peer support in online mental health communities (OMHCs). However, such mental health self-disclosure can be challenging. Images can facilitate the manifestation of relevant experiences and feelings in the text; yet, relevant images are not always available. In this paper, we present a technical prototype named MentalImager and validate in a human evaluation study that it can generate topically and emotionally relevant images based on the seekers' drafted posts or specified keywords. Two user studies demonstrate that MentalImager not only improves seekers' satisfaction with their self-disclosure in their posts but also invokes support-providers' empathy for the seekers and willingness to offer help. Such improvements are credited to the generated images, which help seekers express their emotions and inspire them to add more details about their experiences and feelings. We report concerns about MentalImager and discuss insights for supporting self-disclosure in OMHCs.
Published: 2024

16. Identify As A Human Does: A Pathfinder of Next-Generation Anti-Cheat Framework for First-Person Shooter Games

Author: Zhang, Jiayi, Sun, Chenxin, Gu, Yue, Zhang, Qingyu, Lin, Jiayi, Du, Xiaojiang, and Qian, Chenxiong
Subjects: Computer Science - Cryptography and Security, Computer Science - Artificial Intelligence, Computer Science - Human-Computer Interaction, Computer Science - Machine Learning
Abstract: The gaming industry has experienced substantial growth, but cheating in online games poses a significant threat to the integrity of the gaming experience. Cheating, particularly in first-person shooter (FPS) games, can lead to substantial losses for the game industry. Existing anti-cheat solutions have limitations: client-side approaches face hardware constraints and security risks, server-side methods are unreliable, and both suffer from a lack of comprehensive real-world datasets. To address these limitations, the paper proposes HAWK, a server-side FPS anti-cheat framework for the popular game CS:GO. HAWK utilizes machine learning techniques to mimic human experts' identification process, leverages novel multi-view features, and is equipped with a well-defined workflow. The authors evaluate HAWK with the first large, real-world datasets containing multiple cheat types and levels of cheating sophistication, and it exhibits promising efficiency and acceptable overheads, shorter ban times compared to the in-use anti-cheat, a significant reduction in manual labor, and the ability to capture cheaters who evaded official inspections.
Published: 2024

17. MQM-APE: Toward High-Quality Error Annotation Predictors with Automatic Post-Editing in LLM Translation Evaluators

Author: Lu, Qingyu, Ding, Liang, Zhang, Kanjian, Zhang, Jinxia, and Tao, Dacheng
Subjects: Computer Science - Computation and Language
Abstract: Large Language Models (LLMs) have shown significant potential as judges for Machine Translation (MT) quality assessment, providing both scores and fine-grained feedback. Although approaches such as GEMBA-MQM have shown SOTA performance on reference-free evaluation, the predicted errors do not align well with those annotated by humans, limiting their interpretability as feedback signals. To enhance the quality of error annotations predicted by LLM evaluators, we introduce a universal and training-free framework, MQM-APE, based on the idea of filtering out non-impactful errors by Automatically Post-Editing (APE) the original translation based on each error, leaving only those errors that contribute to quality improvement. Specifically, we prompt the LLM to act as 1) evaluator to provide error annotations, 2) post-editor to determine whether errors impact quality improvement and 3) pairwise quality verifier as the error filter. Experiments show that our approach consistently improves both the reliability and quality of error spans against GEMBA-MQM, across eight LLMs in both high- and low-resource languages. Orthogonal to trained approaches, MQM-APE complements translation-specific evaluators such as Tower, highlighting its broad applicability. Further analyses confirm the effectiveness of each module and offer valuable insights into evaluator design and LLM selection. The code will be released to facilitate the community.
Comment: Under Review
Published: 2024

18. Federated Graph Learning with Adaptive Importance-based Sampling

Author: Li, Anran, Chen, Yuanyuan, Ren, Chao, Wang, Wenhan, Hu, Ming, Li, Tianlin, Yu, Han, and Chen, Qingyu
Subjects: Computer Science - Distributed, Parallel, and Cluster Computing, Computer Science - Cryptography and Security, Computer Science - Machine Learning
Abstract: For privacy-preserving graph learning tasks involving distributed graph datasets, federated learning (FL)-based GCN (FedGCN) training is required. A key challenge for FedGCN is scaling to large-scale graphs, which typically incurs high computation and communication costs when dealing with the explosively increasing number of neighbors. Existing graph sampling-enhanced FedGCN training approaches ignore graph structural information or the dynamics of optimization, resulting in high variance and inaccurate node embeddings. To address this limitation, we propose the Federated Adaptive Importance-based Sampling (FedAIS) approach. It achieves substantial computational cost savings by focusing the limited resources on training important nodes, while reducing communication overhead via adaptive historical embedding synchronization. The proposed adaptive importance-based sampling method jointly considers the graph structural heterogeneity and the optimization dynamics to achieve an optimal trade-off between efficiency and accuracy. Extensive evaluations against five state-of-the-art baselines on five real-world graph datasets show that FedAIS achieves comparable or up to 3.23% higher test accuracy, while saving communication and computation costs by 91.77% and 85.59%.
Published: 2024

19. CONGRA: Benchmarking Automatic Conflict Resolution

Author: Zhang, Qingyu, Su, Liangcai, Ye, Kai, and Qian, Chenxiong
Subjects: Computer Science - Software Engineering, Computer Science - Machine Learning, D.2, D.3
Abstract: Resolving conflicts from merging different software versions is a challenging task. To reduce the overhead of manual merging, researchers have developed various program analysis-based tools, which solve only specific types of conflicts and have a limited scope of application. With the development of language models, researchers treat conflict code as text, which theoretically allows for addressing almost all types of conflicts. However, the absence of effective conflict difficulty grading methods hinders a comprehensive evaluation of large language models (LLMs), making it difficult to gain a deeper understanding of their limitations. Furthermore, there is a notable lack of large-scale open benchmarks for evaluating the performance of LLMs in automatic conflict resolution. To address these issues, we introduce ConGra, a CONflict-GRAded benchmarking scheme designed to evaluate the performance of software merging tools under conflict scenarios of varying complexity. We propose a novel approach to classify conflicts based on code operations and use it to build a large-scale evaluation dataset based on 44,948 conflicts from 34 real-world projects. Using this dataset, we evaluate multiple state-of-the-art LLMs and code LLMs on conflict resolution tasks, ultimately uncovering two counterintuitive yet insightful phenomena. ConGra will be released at https://github.com/HKU-System-Security-Lab/ConGra.
Published: 2024

20. Enhancing Large Language Models with Domain-specific Retrieval Augment Generation: A Case Study on Long-form Consumer Health Question Answering in Ophthalmology

Author: Gilson, Aidan, Ai, Xuguang, Arunachalam, Thilaka, Chen, Ziyou, Cheong, Ki Xiong, Dave, Amisha, Duic, Cameron, Kibe, Mercy, Kaminaka, Annette, Prasad, Minali, Siddig, Fares, Singer, Maxwell, Wong, Wendy, Jin, Qiao, Keenan, Tiarnan D. L., Hu, Xia, Chew, Emily Y., Lu, Zhiyong, Xu, Hua, Adelman, Ron A., Tham, Yih-Chung, and Chen, Qingyu
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Despite the potential of Large Language Models (LLMs) in medicine, they may generate responses lacking supporting evidence or based on hallucinated evidence. While Retrieval Augment Generation (RAG) is popular to address this issue, few studies have implemented and evaluated RAG in downstream domain-specific applications. We developed a RAG pipeline with 70,000 ophthalmology-specific documents that retrieves relevant documents to augment LLMs during inference time. In a case study on long-form consumer health questions, we systematically evaluated the responses including over 500 references of LLMs with and without RAG on 100 questions with 10 healthcare professionals. The evaluation focuses on factuality of evidence, selection and ranking of evidence, attribution of evidence, and answer accuracy and completeness. LLMs without RAG provided 252 references in total. Of these, 45.3% were hallucinated, 34.1% contained minor errors, and 20.6% were correct. In contrast, LLMs with RAG significantly improved accuracy (54.5% being correct) and reduced error rates (18.8% with minor hallucinations and 26.7% with errors). 62.5% of the top 10 documents retrieved by RAG were selected as the top references in the LLM response, with an average ranking of 4.9. The use of RAG also improved evidence attribution (increasing from 1.85 to 2.49 on a 5-point scale, P<0.001), albeit with slight decreases in accuracy (from 3.52 to 3.23, P=0.03) and completeness (from 3.47 to 3.27, P=0.17). The results demonstrate that LLMs frequently exhibited hallucinated and erroneous evidence in the responses, raising concerns for downstream applications in the medical domain. RAG substantially reduced the proportion of such evidence but encountered challenges.
Published: 2024

21. Brain-Cognition Fingerprinting via Graph-GCCA with Contrastive Learning

Author: Wang, Yixin, Peng, Wei, Zhang, Yu, Adeli, Ehsan, Zhao, Qingyu, and Pohl, Kilian M.
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Many longitudinal neuroimaging studies aim to improve the understanding of brain aging and diseases by studying the dynamic interactions between brain function and cognition. Doing so requires accurate encoding of their multidimensional relationship while accounting for individual variability over time. For this purpose, we propose an unsupervised learning model, Contrastive Learning-based Graph Generalized Canonical Correlation Analysis (CoGraCa), that encodes their relationship via Graph Attention Networks and generalized Canonical Correlational Analysis. To create brain-cognition fingerprints reflecting the unique neural and cognitive phenotype of each person, the model also relies on individualized and multimodal contrastive learning. We apply CoGraCa to a longitudinal dataset of healthy individuals consisting of resting-state functional MRI and cognitive measures acquired at multiple visits for each participant. The generated fingerprints effectively capture significant individual differences and outperform current single-modal and CCA-based multimodal models in identifying sex and age. More importantly, our encoding provides interpretable interactions between those two modalities.
Published: 2024

22. Directional WPT Charging for Routing-Asymmetric WRSNs with a Mobile Charger

Author: Gao, Zhenguo, Zhang, Qi, Gao, Qingyu, Zhao, Yunlong, and Wu, Hsiao-Chun
Subjects: Computer Science - Networking and Internet Architecture
Abstract: Mobile Charge Scheduling for wirelessly charging nodes in Wireless Rechargeable Sensor Networks (WRSNs) is a promising but still evolving research area. Existing research mostly assumes a symmetric environment, where the routing costs in opposite directions between two locations are considered identical. However, various factors such as terrain restrictions and wind or water flows may invalidate the routing-symmetric assumption in practical environments, thereby significantly limiting the performance of these solutions in routing-asymmetric WRSNs (RA-WRSNs). To address the routing-asymmetric challenges in mobile charge scheduling for WRSNs, this paper systematically investigates the underlying Asymmetric Directional Mobile Charger (DMC) Charge Scheduling (ADMCCS) problem, aiming to minimize energy loss while satisfying the charging demands of the network nodes. The DMC model is assumed because its results can be easily applied to the specialized case of an Omnidirectional Mobile Charger (OMC). To solve the ADMCCS problem, we propose a four-step framework. First, a minimum-size efficient charging position set is selected using our designed K-means-based Charging Position Generation (KCPG) algorithm, addressing the challenge of the unlimited charging position selection space. Next, minimum-size functional-equivalent direction sets at these positions are determined using an optimal algorithm, tackling the challenge of infinite charging directions. Subsequently, the optimal energy transmission time lengths for all directions at the positions are obtained by formulating and solving a Nonlinear Program (NLP) problem. Finally, the Lin-Kernighan Heuristic (LKH) algorithm for the Asymmetric Traveling Salesman Problem is adapted to obtain a highly probable optimal loop tour, addressing the routing-asymmetric challenge., Comment: 15 pages, 5 figures
Published: 2024

23. RNR: Teaching Large Language Models to Follow Roles and Rules

Author: Wang, Kuan, Bukharin, Alexander, Jiang, Haoming, Yin, Qingyu, Wang, Zhengyang, Zhao, Tuo, Shang, Jingbo, Zhang, Chao, Yin, Bing, Li, Xian, Chen, Jianshu, and Li, Shiyang
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Human-Computer Interaction
Abstract: Instruction fine-tuning (IFT) elicits instruction following capabilities and steers the behavior of large language models (LLMs) via supervised learning. However, existing models trained on open-source IFT datasets only have the ability to follow instructions from users, and often fail to follow complex roles and rules specified by developers, a.k.a. system prompts. The ability to follow these roles and rules is essential for deployment, as it ensures that the model safely interacts with users within developer-defined guidelines. To improve such role and rule following ability, we propose RNR, an automated data generation pipeline that generates diverse roles and rules from existing IFT instructions, along with corresponding responses. This data can then be used to train models that follow complex system prompts. The models are evaluated on our newly created benchmarks for role and rule following ability, as well as standard instruction-following benchmarks and general NLP tasks. Our framework significantly improves role and rule following capability in LLMs, as evidenced by over 25% increase in pass-rate on rule adherence, i.e. following all requirements, in our experiments with the Alpaca and Ultrachat datasets. Moreover, our models achieve this increase without any regression on popular instruction following benchmarks.
Published: 2024

24. Latent 3D Brain MRI Counterfactual

Author: Peng, Wei, Xia, Tian, Ribeiro, Fabio De Sousa, Bosschieter, Tomas, Adeli, Ehsan, Zhao, Qingyu, Glocker, Ben, and Pohl, Kilian M.
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
Abstract: The number of samples in structural brain MRI studies is often too small to properly train deep learning models. Generative models show promise in addressing this issue by effectively learning the data distribution and generating high-fidelity MRI. However, they struggle to produce diverse, high-quality data outside the distribution defined by the training data. One way to address the issue is using causal models developed for 3D volume counterfactuals. However, accurately modeling causality in high-dimensional spaces is challenging, so these models generally generate 3D brain MRIs of lower quality. To address these challenges, we propose a two-stage method that constructs a Structural Causal Model (SCM) within the latent space. In the first stage, we employ a VQ-VAE to learn a compact embedding of the MRI volume. Subsequently, we integrate our causal model into this latent space and execute a three-step counterfactual procedure using a closed-form Generalized Linear Model (GLM). Our experiments conducted on real-world high-resolution MRI data (1mm) demonstrate that our method can generate high-quality 3D MRI counterfactuals.
Published: 2024

25. Epidemiological characteristics of overseas-imported infectious diseases identified through airport health-screening measures: A case study on Fuzhou, China

Author: Li, Hong, Yang, Yan, Chen, Jiake, Li, Qingyu, Chen, Yifeng, Zhang, Yilin, Cai, Shaojian, Zhan, Meirong, Wu, Chuancheng, Lin, Xinwu, and Xiang, Jianjun
Published: 2024

26. ICSD: An Open-source Dataset for Infant Cry and Snoring Detection

Author: Liu, Qingyu, Song, Longfei, Xu, Dongxing, and Long, Yanhua
Subjects: Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: The detection and analysis of infant cry and snoring events are crucial tasks within the field of audio signal processing. While existing datasets for general sound event detection are plentiful, they often fall short in providing sufficient, strongly labeled data specific to infant cries and snoring. To provide a benchmark dataset and thus foster the research of infant cry and snoring detection, this paper introduces the Infant Cry and Snoring Detection (ICSD) dataset, a novel, publicly available dataset specially designed for ICSD tasks. The ICSD comprises three types of subsets: a real strongly labeled subset with event-based labels annotated manually, a weakly labeled subset with only clip-level event annotations, and a synthetic subset generated and labeled with strong annotations. This paper provides a detailed description of the ICSD creation process, including the challenges encountered and the solutions adopted. We offer a comprehensive characterization of the dataset, discussing its limitations and key factors for ICSD usage. Additionally, we conduct extensive experiments on the ICSD dataset to establish baseline systems and offer insights into the main factors when using this dataset for ICSD research. Our goal is to develop a dataset that will be widely adopted by the community as a new open benchmark for future ICSD research., Comment: 11 pages, 6 figures
Published: 2024

27. Visualizing $p$-orbital texture in the charge-density-wave state of CeSbTe

Author: Que, Xinglu, He, Qingyu, Zhou, Lihui, Lei, Shiming, Schoop, Leslie, Huang, Dennis, and Takagi, Hidenori
Subjects: Condensed Matter - Mesoscale and Nanoscale Physics
Abstract: The collective reorganization of electrons into a charge density wave (CDW) inside a crystal has long served as a textbook example of an ordered phase in condensed matter physics. Two-dimensional square lattices with $p$ electrons are well-suited to the realization of CDW, due to the anisotropy of the $p$ orbitals and the resulting one dimensionality of the electronic structure. In spite of a long history of study of CDW in square-lattice systems, few reports have recognized the existence and significance of a hidden orbital degree of freedom. The degeneracy of $p_x$ and $p_y$ electrons inherent to a square lattice may give rise to nontrivial orbital patterns in real space that endow the CDW with additional broken symmetries or unusual order parameters. Using scanning tunneling microscopy, we visualize signatures of $p$-orbital texture in the CDW state of the topological semimetal candidate CeSbTe, which contains Sb square lattices with 5$p$ electrons. We image atomic-sized, anisotropic lobes of charge density with periodically modulating anisotropy, that ultimately can be mapped onto a microscopic pattern of $p_x$ and $p_y$ bond density waves. Our results show that even delocalized $p$ orbitals can reorganize into unexpected and emergent electronic states of matter., Comment: 19 pages, 6 figures
Published: 2024

28. Large Models for Aerial Edges: An Edge-Cloud Model Evolution and Communication Paradigm

Author: Zhang, Shuhang, Liu, Qingyu, Chen, Ke, Di, Boya, Zhang, Hongliang, Yang, Wenhan, Niyato, Dusit, Han, Zhu, and Poor, H. Vincent
Subjects: Computer Science - Networking and Internet Architecture, Electrical Engineering and Systems Science - Signal Processing
Abstract: The future sixth-generation (6G) of wireless networks is expected to surpass its predecessors by offering ubiquitous coverage through integrated air-ground facility deployments in both communication and computing domains. In this network, aerial facilities, such as unmanned aerial vehicles (UAVs), conduct artificial intelligence (AI) computations based on multi-modal data to support diverse applications including surveillance and environment construction. However, these multi-domain inference and content generation tasks require large AI models, demanding powerful computing capabilities, thus posing significant challenges for UAVs. To tackle this problem, we propose an integrated edge-cloud model evolution framework, where UAVs serve as edge nodes for data collection and edge model computation. Through wireless channels, UAVs collaborate with ground cloud servers, providing cloud model computation and model updating for edge UAVs. With limited wireless communication bandwidth, the proposed framework faces the challenge of information exchange scheduling between the edge UAVs and the cloud server. To tackle this, we present joint task allocation, transmission resource allocation, transmission data quantization design, and edge model update design to enhance the inference accuracy of the integrated air-ground edge-cloud model evolution framework by mean average precision (mAP) maximization. A closed-form lower bound on the mAP of the proposed framework is derived, and the solution to the mAP maximization problem is optimized accordingly. Simulations, based on results from vision-based classification experiments, consistently demonstrate that the mAP of the proposed framework outperforms both a centralized cloud model framework and a distributed edge model framework across various communication bandwidths and data sizes.
Published: 2024

29. Methodological Explainability Evaluation of an Interpretable Deep Learning Model for Post-Hepatectomy Liver Failure Prediction Incorporating Counterfactual Explanations and Layerwise Relevance Propagation: A Prospective In Silico Trial

Author: Zhong, Xian, Salahuddin, Zohaib, Chen, Yi, Woodruff, Henry C, Long, Haiyi, Peng, Jianyun, Udawatte, Nuwan, Casale, Roberto, Mokhtari, Ayoub, Zhang, Xiaoer, Huang, Jiayao, Wu, Qingyu, Tan, Li, Chen, Lili, Li, Dongming, Xie, Xiaoyan, Lin, Manxia, and Lambin, Philippe
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Artificial intelligence (AI)-based decision support systems have demonstrated value in predicting post-hepatectomy liver failure (PHLF) in hepatocellular carcinoma (HCC). However, they often lack transparency, and the impact of model explanations on clinicians' decisions has not been thoroughly evaluated. Building on prior research, we developed a variational autoencoder-multilayer perceptron (VAE-MLP) model for preoperative PHLF prediction. This model integrated counterfactuals and layerwise relevance propagation (LRP) to provide insights into its decision-making mechanism. Additionally, we proposed a methodological framework for evaluating the explainability of AI systems. This framework includes qualitative and quantitative assessments of explanations against recognized biomarkers, usability evaluations, and an in silico clinical trial. Our evaluations demonstrated that the model's explanation correlated with established biomarkers and exhibited high usability at both the case and system levels. Furthermore, results from the three-track in silico clinical trial showed that clinicians' prediction accuracy and confidence increased when AI explanations were provided.
Published: 2024

30. HG-PIPE: Vision Transformer Acceleration with Hybrid-Grained Pipeline

Author: Guo, Qingyu, Wan, Jiayong, Xu, Songqiang, Li, Meng, and Wang, Yuan
Subjects: Computer Science - Hardware Architecture, Computer Science - Artificial Intelligence, 68T07
Abstract: Vision Transformer (ViT) acceleration with field programmable gate array (FPGA) is promising but challenging. Existing FPGA-based ViT accelerators mainly rely on temporal architectures, which process different operators by reusing the same hardware blocks and suffer from extensive memory access overhead. Pipelined architectures, either coarse-grained or fine-grained, unroll the ViT computation spatially for memory access efficiency. However, they usually suffer from significant hardware resource constraints and pipeline bubbles induced by the global computation dependency of ViT. In this paper, we introduce HG-PIPE, a pipelined FPGA accelerator for high-throughput and low-latency ViT processing. HG-PIPE features a hybrid-grained pipeline architecture to reduce on-chip buffer cost and couples the computation dataflow and parallelism design to eliminate the pipeline bubbles. HG-PIPE further introduces careful approximations to implement both linear and non-linear operators with abundant Lookup Tables (LUTs), thus alleviating resource constraints. On a ZCU102 FPGA, HG-PIPE achieves 2.78 times better throughput and 2.52 times better resource efficiency than the prior-art accelerators, e.g., AutoViTAcc. With a VCK190 FPGA, HG-PIPE realizes end-to-end ViT acceleration on a single device and achieves 7118 images/s, which is 2.81 times faster than a V100 GPU., Comment: Accepted by ICCAD 2024
Published: 2024

31. MMM: Multilingual Mutual Reinforcement Effect Mix Datasets & Test with Open-domain Information Extraction Large Language Models

Author: Gan, Chengguang, Yin, Qingyu, He, Xinyang, Wei, Hanjun, Liang, Yunhao, Lim, Younghun, Wang, Shijian, Huang, Hexiang, Zhang, Qinghao, Ni, Shiwen, and Mori, Tatsunori
Subjects: Computer Science - Computation and Language
Abstract: The Mutual Reinforcement Effect (MRE) represents a promising avenue in information extraction and multitasking research. Nevertheless, its applicability has been constrained due to the exclusive availability of MRE mix datasets in Japanese, thereby limiting comprehensive exploration by the global research community. To address this limitation, we introduce a Multilingual MRE mix dataset (MMM) that encompasses 21 sub-datasets in English, Japanese, and Chinese. In this paper, we also propose a method for dataset translation assisted by Large Language Models (LLMs), which significantly reduces the manual annotation time required for dataset construction by leveraging LLMs to translate the original Japanese datasets. Additionally, we have enriched the dataset by incorporating open-domain Named Entity Recognition (NER) and sentence classification tasks. Utilizing this expanded dataset, we developed a unified input-output framework to train an Open-domain Information Extraction Large Language Model (OIELLM). The OIELLM model demonstrates the capability to effectively process novel MMM datasets, exhibiting significant improvements in performance., Comment: Under Review. 11 pages, 5 figures
Published: 2024

32. Enhanced Battery Degradation-Aware Scheduling for Distribution Network with Electric Vehicle Load

Author: Pamshetti, Vijay Babu, Zhang, Wei, Ng, Andy Man-Fai, Yan, Qingyu, and Tan, Kuan Tak
Subjects: Electrical Engineering and Systems Science - Systems and Control
Abstract: Batteries play a key role in today's power grid. In this paper, we investigate the impact of battery degradation on the distribution network. We formulate a multi-objective framework for optimizing battery scheduling with the goals of minimizing monetary costs and improving network performance. Our framework incorporates energy purchase and battery degradation into the costs and measures the network performance through energy losses and voltage deviation. We propose Bach for battery degradation-aware scheduling based on ε-constraint and fuzzy logic methods. Bach is implemented for the IEEE 33-bus network for an experimental study. The results show the effectiveness of Bach in optimizing costs and performance simultaneously with battery degradation awareness and demonstrate the flexibility of further customization., Comment: 3 figures
Published: 2024

33. The condition for constructing a finite element from a superspline

Author: Hu, Jun, Lin, Ting, Wu, Qingyu, and Yuan, Beihui
Subjects: Mathematics - Numerical Analysis, 65N30, 65D07
Abstract: This paper addresses the sufficient and necessary conditions for constructing $C^r$ conforming finite element spaces from superspline spaces on general simplicial triangulations. We introduce the concept of extendability for the pre-element spaces, which encompasses both the superspline space and the finite element space. By examining the extendability condition for both types of spaces, we provide an answer to the conditions regarding the construction. A corollary of our results is that constructing $C^r$ conforming elements in $d$ dimensions should in general require an extra $C^{2^{s}r}$ continuity on $s$-codimensional simplices, and the polynomial degree is at least $(2^d r + 1)$., Comment: 22 pages, 4 figures
Published: 2024

34. WizardMerge -- Save Us From Merging Without Any Clues

Author: Zhang, Qingyu, Li, Junzhe, Lin, Jiayi, Ding, Jie, Lin, Lanteng, and Qian, Chenxiong
Subjects: Computer Science - Software Engineering, Computer Science - Emerging Technologies, Computer Science - Programming Languages, D.2, D.3
Abstract: Modern software development necessitates efficient version-oriented collaboration among developers. While Git is the most popular version control system, it generates unsatisfactory version merging results due to its text-based workflow, leading to potentially unexpected results in the merged version of the project. Although numerous merging tools have been proposed for improving merge results, developers still struggle to resolve the conflicts and fix incorrectly modified code without clues. We present WizardMerge, an auxiliary tool that leverages merging results from Git to retrieve code block dependency at the text and LLVM-IR levels and provide suggestions for developers to resolve errors introduced by textual merging. In our evaluation, we tested WizardMerge on 227 conflicts within five large-scale projects. The outcomes demonstrate that WizardMerge reduces conflict merging time costs by 23.85%. Beyond addressing conflicts, WizardMerge provides merging suggestions for over 70% of the code blocks potentially affected by the conflicts. Notably, WizardMerge exhibits the capability to identify conflict-unrelated code blocks that require manual intervention yet are harmfully applied by Git during merging., Comment: 22 pages
Published: 2024

35. CLEME2.0: Towards More Interpretable Evaluation by Disentangling Edits for Grammatical Error Correction

Author: Ye, Jingheng, Xu, Zishan, Li, Yinghui, Cheng, Xuxin, Song, Linlin, Zhou, Qingyu, Zheng, Hai-Tao, Shen, Ying, and Su, Xin
Subjects: Computer Science - Computation and Language
Abstract: The paper focuses on improving the interpretability of Grammatical Error Correction (GEC) metrics, which has received little attention in previous studies. To bridge the gap, we propose CLEME2.0, a reference-based evaluation strategy that can describe four elementary dimensions of GEC systems, namely hit-correction, error-correction, under-correction, and over-correction. They collectively contribute to revealing the critical characteristics and locating drawbacks of GEC systems. Evaluating systems by combining these dimensions leads to high human consistency over other reference-based and reference-less metrics. Extensive experiments on 2 human judgement datasets and 6 reference datasets demonstrate the effectiveness and robustness of our method. All code will be released after the peer review., Comment: 16 pages, 8 tables, 2 figures. Under review
Published: 2024

36. MindSpore Quantum: A User-Friendly, High-Performance, and AI-Compatible Quantum Computing Framework

Author: Xu, Xusheng, Cui, Jiangyu, Cui, Zidong, He, Runhong, Li, Qingyu, Li, Xiaowei, Lin, Yanling, Liu, Jiale, Liu, Wuxin, Lu, Jiale, Luo, Maolin, Lyu, Chufan, Pan, Shijie, Pavel, Mosharev, Shu, Runqiu, Tang, Jialiang, Xu, Ruoqian, Xu, Shu, Yang, Kang, Yu, Fan, Zeng, Qingguo, Zhao, Haiying, Zheng, Qiang, Zhou, Junyuan, Zhou, Xu, Zhu, Yikang, Zou, Zuoheng, Bayat, Abolfazl, Cao, Xi, Cui, Wei, Li, Zhendong, Long, Guilu, Su, Zhaofeng, Wang, Xiaoting, Wang, Zizhu, Wei, Shijie, Wu, Re-Bing, Zhang, Pan, and Yung, Man-Hong
Subjects: Quantum Physics
Abstract: We introduce MindSpore Quantum, a pioneering hybrid quantum-classical framework with a primary focus on the design and implementation of noisy intermediate-scale quantum (NISQ) algorithms. Leveraging the robust support of MindSpore, an advanced open-source deep learning training/inference framework, MindSpore Quantum exhibits exceptional efficiency in the design and training of variational quantum algorithms on both CPU and GPU platforms, delivering remarkable performance. Furthermore, this framework places a strong emphasis on enhancing the operational efficiency of quantum algorithms when executed on real quantum hardware. This encompasses the development of algorithms for quantum circuit compilation and qubit mapping, crucial components for achieving optimal performance on quantum processors. In addition to the core framework, we introduce QuPack, a meticulously crafted quantum computing acceleration engine. QuPack significantly accelerates the simulation speed of MindSpore Quantum, particularly in variational quantum eigensolver (VQE), quantum approximate optimization algorithm (QAOA), and tensor network simulations, providing astonishing speed. This combination of cutting-edge technologies empowers researchers and practitioners to explore the frontiers of quantum computing with unprecedented efficiency and performance.
Published: 2024

37. Machine Unlearning with Minimal Gradient Dependence for High Unlearning Ratios

Author: Huang, Tao, Chen, Ziyang, Meng, Jiayang, Huang, Qingyu, Yang, Xu, Yi, Xun, and Khalil, Ibrahim
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Cryptography and Security
Abstract: In the context of machine unlearning, the primary challenge lies in effectively removing traces of private data from trained models while maintaining model performance and security against privacy attacks like membership inference attacks. Traditional gradient-based unlearning methods often rely on extensive historical gradients, which becomes impractical with high unlearning ratios and may reduce the effectiveness of unlearning. Addressing these limitations, we introduce Mini-Unlearning, a novel approach that capitalizes on a critical observation: unlearned parameters correlate with retrained parameters through contraction mapping. Our method, Mini-Unlearning, utilizes a minimal subset of historical gradients and leverages this contraction mapping to facilitate scalable, efficient unlearning. This lightweight, scalable method significantly enhances model accuracy and strengthens resistance to membership inference attacks. Our experiments demonstrate that Mini-Unlearning not only works under higher unlearning ratios but also outperforms existing techniques in both accuracy and security, offering a promising solution for applications requiring robust unlearning capabilities.
Published: 2024

38. Visible-Thermal Tiny Object Detection: A Benchmark Dataset and Baselines

Author: Ying, Xinyi, Xiao, Chao, Li, Ruojing, He, Xu, Li, Boyang, Li, Zhaoxu, Wang, Yingqian, Hu, Mingyuan, Xu, Qingyu, Lin, Zaiping, Li, Miao, Zhou, Shilin, An, Wei, Sheng, Weidong, and Liu, Li
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Small object detection (SOD) has been a longstanding yet challenging task for decades, with numerous datasets and algorithms being developed. However, they mainly focus on either visible or thermal modality, while visible-thermal (RGBT) bimodality is rarely explored. Although some RGBT datasets have been developed recently, the insufficient quantity, limited category, misaligned images and large target size cannot provide an impartial benchmark to evaluate multi-category visible-thermal small object detection (RGBT SOD) algorithms. In this paper, we build the first large-scale benchmark with high diversity for RGBT SOD (namely RGBT-Tiny), including 115 paired sequences, 93K frames and 1.2M manual annotations. RGBT-Tiny contains abundant targets (7 categories) and high-diversity scenes (8 types that cover different illumination and density variations). Note that, over 81% of targets are smaller than 16x16, and we provide paired bounding box annotations with tracking ID to offer an extremely challenging benchmark with wide-range applications, such as RGBT fusion, detection and tracking. In addition, we propose a scale adaptive fitness (SAFit) measure that exhibits high robustness on both small and large targets. The proposed SAFit can provide reasonable performance evaluation and promote detection performance. Based on the proposed RGBT-Tiny dataset and SAFit measure, extensive evaluations have been conducted, including 23 recent state-of-the-art algorithms that cover four different types (i.e., visible generic detection, visible SOD, thermal SOD and RGBT object detection). Project is available at https://github.com/XinyiYing24/RGBT-Tiny.
Published: 2024

39. MedCalc-Bench: Evaluating Large Language Models for Medical Calculations

Author: Khandekar, Nikhil, Jin, Qiao, Xiong, Guangzhi, Dunn, Soren, Applebaum, Serina S, Anwar, Zain, Sarfo-Gyamfi, Maame, Safranek, Conrad W, Anwar, Abid A, Zhang, Andrew, Gilson, Aidan, Singer, Maxwell B, Dave, Amisha, Taylor, Andrew, Zhang, Aidong, Chen, Qingyu, and Lu, Zhiyong
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: As opposed to evaluating computation and logic-based reasoning, current benchmarks for evaluating large language models (LLMs) in medicine are primarily focused on question-answering involving domain knowledge and descriptive reasoning. While such qualitative capabilities are vital to medical diagnosis, in real-world scenarios, doctors frequently use clinical calculators that follow quantitative equations and rule-based reasoning paradigms for evidence-based decision support. To this end, we propose MedCalc-Bench, a first-of-its-kind dataset focused on evaluating the medical calculation capability of LLMs. MedCalc-Bench contains an evaluation set of over 1000 manually reviewed instances from 55 different medical calculation tasks. Each instance in MedCalc-Bench consists of a patient note, a question requesting to compute a specific medical value, a ground truth answer, and a step-by-step explanation showing how the answer is obtained. While our evaluation results show the potential of LLMs in this area, none of them are effective enough for clinical settings. Common issues include extracting the incorrect entities, not using the correct equation or rules for a calculation task, or incorrectly performing the arithmetic for the computation. We hope our study highlights the quantitative knowledge and reasoning gaps in LLMs within medical settings, encouraging future improvements of LLMs for various clinical calculation tasks.
Comment: GitHub link: https://github.com/ncbi-nlp/MedCalc-Bench; HuggingFace link: https://huggingface.co/datasets/nsk7153/MedCalc-Bench
Published: 2024

40. MIND: Multimodal Shopping Intention Distillation from Large Vision-language Models for E-commerce Purchase Understanding

Author: Xu, Baixuan, Wang, Weiqi, Shi, Haochen, Ding, Wenxuan, Jing, Huihao, Fang, Tianqing, Bai, Jiaxin, Liu, Xin, Yu, Changlong, Li, Zheng, Luo, Chen, Yin, Qingyu, Yin, Bing, Chen, Long, and Song, Yangqiu
Subjects: Computer Science - Computation and Language
Abstract: Improving user experience and providing personalized search results in E-commerce platforms heavily rely on understanding purchase intention. However, existing methods for acquiring large-scale intentions bank on distilling large language models with human annotation for verification. Such an approach tends to generate product-centric intentions, overlook valuable visual information from product images, and incurs high costs for scalability. To address these issues, we introduce MIND, a multimodal framework that allows Large Vision-Language Models (LVLMs) to infer purchase intentions from multimodal product metadata and prioritize human-centric ones. Using Amazon Review data, we apply MIND and create a multimodal intention knowledge base, which contains 1,264,441 intentions derived from 126,142 co-buy shopping records across 107,215 products. Extensive human evaluations demonstrate the high plausibility and typicality of our obtained intentions and validate the effectiveness of our distillation framework and filtering mechanism. Additional experiments reveal that our obtained intentions significantly enhance large language models in two intention comprehension tasks.
Comment: EMNLP 2024 main conference
Published: 2024

41. Augmenting Biomedical Named Entity Recognition with General-domain Resources

Author: Yin, Yu, Kim, Hyunjae, Xiao, Xiao, Wei, Chih Hsuan, Kang, Jaewoo, Lu, Zhiyong, Xu, Hua, Fang, Meng, and Chen, Qingyu
Subjects: Computer Science - Computation and Language
Abstract: Training a neural network-based biomedical named entity recognition (BioNER) model usually requires extensive and costly human annotations. While several studies have employed multi-task learning with multiple BioNER datasets to reduce human effort, this approach does not consistently yield performance improvements and may introduce label ambiguity in different biomedical corpora. We aim to tackle those challenges through transfer learning from easily accessible resources with fewer concept overlaps with biomedical datasets. In this paper, we proposed GERBERA, a simple-yet-effective method that utilized a general-domain NER dataset for training. Specifically, we performed multi-task learning to train a pre-trained biomedical language model with both the target BioNER dataset and the general-domain dataset. Subsequently, we fine-tuned the models specifically for the BioNER dataset. We systematically evaluated GERBERA on five datasets of eight entity types, collectively consisting of 81,410 instances. Despite using fewer biomedical resources, our models demonstrated superior performance compared to baseline models trained with multiple additional BioNER datasets. Specifically, our models consistently outperformed the baselines in six out of eight entity types, achieving an average improvement of 0.9% over the best baseline performance across eight biomedical entity types sourced from five different corpora. Our method was especially effective in amplifying performance on BioNER datasets characterized by limited data, with a 4.7% improvement in F1 scores on the JNLPBA-RNA dataset.
Comment: We make data, codes, and models publicly available via https://github.com/qingyu-qc/bioner_gerbera
Published: 2024
Full Text: View/download PDF

42. IntentionQA: A Benchmark for Evaluating Purchase Intention Comprehension Abilities of Language Models in E-commerce

Author: Ding, Wenxuan, Wang, Weiqi, Kwok, Sze Heng Douglas, Liu, Minghao, Fang, Tianqing, Bai, Jiaxin, Liu, Xin, Yu, Changlong, Li, Zheng, Luo, Chen, Yin, Qingyu, Yin, Bing, He, Junxian, and Song, Yangqiu
Subjects: Computer Science - Computation and Language
Abstract: Enhancing Language Models' (LMs) ability to understand purchase intentions in E-commerce scenarios is crucial for their effective assistance in various downstream tasks. However, previous approaches that distill intentions from LMs often fail to generate meaningful and human-centric intentions applicable in real-world E-commerce contexts. This raises concerns about the true comprehension and utilization of purchase intentions by LMs. In this paper, we present IntentionQA, a double-task multiple-choice question answering benchmark to evaluate LMs' comprehension of purchase intentions in E-commerce. Specifically, LMs are tasked to infer intentions based on purchased products and utilize them to predict additional purchases. IntentionQA consists of 4,360 carefully curated problems across three difficulty levels, constructed using an automated pipeline to ensure scalability on large E-commerce platforms. Human evaluations demonstrate the high quality and low false-negative rate of our benchmark. Extensive experiments across 19 language models show that they still struggle with certain scenarios, such as understanding products and intentions accurately, jointly reasoning with products and intentions, and more, in which they fall far behind human performances. Our code and data are publicly available at https://github.com/HKUST-KnowComp/IntentionQA.
Comment: Findings of EMNLP 2024
Published: 2024

43. Identifying high school risk factors that forecast heavy drinking onset in understudied young adults

Author: Zhao, Qingyu, Paschali, Magdalini, Dehoney, Joseph, Baker, Fiona C, de Zambotti, Massimiliano, De Bellis, Michael D, Goldston, David B, Nooner, Kate B, Clark, Duncan B, Luna, Beatriz, Nagel, Bonnie J, Brown, Sandra A, Tapert, Susan F, Eberson, Sonja, Thompson, Wesley K, Pfefferbaum, Adolf, Sullivan, Edith V, and Pohl, Kilian M
Subjects: Biomedical and Clinical Sciences, Biological Psychology, Clinical and Health Psychology, Neurosciences, Psychology, Pediatric, Health Disparities, Underage Drinking, Alcoholism, Alcohol Use and Health, Minority Health, Prevention, Clinical Research, Behavioral and Social Science, Social Determinants of Health, Substance Misuse, 2.3 Psychological, social and economic factors, Good Health and Well Being, Alcohol, Forecasting, Young adult, Adolescence, College, Humans, Risk Factors, Longitudinal Studies, Alcohol Drinking, Schools, Students, Adolescent, Adult, United States, Female, Male, Young Adult, Clinical Sciences, Cognitive Sciences, Biological psychology, Clinical and health psychology
Abstract: Heavy alcohol drinking is a major, preventable problem that adversely impacts the physical and mental health of US young adults. Studies seeking drinking risk factors typically focus on young adults who enrolled in 4-year residential college programs (4YCP) even though most high school graduates join the workforce, military, or community colleges. We examined 106 of these understudied young adults (USYA) and 453 4YCPs from the National Consortium on Alcohol and NeuroDevelopment in Adolescence (NCANDA) by longitudinally following their drinking patterns for 8 years from adolescence to young adulthood. All participants were no-to-low drinkers during high school. Whereas 4YCP individuals were more likely to initiate heavy drinking during college years, USYA participants did so later. Using mental health metrics recorded during high school, machine learning forecasted individual-level risk for initiating heavy drinking after leaving high school. The risk factors differed between demographically matched USYA and 4YCP individuals and between sexes. Predictors for USYA drinkers were sexual abuse, physical abuse for girls, and extraversion for boys, whereas 4YCP drinkers were predicted by the ability to recognize facial emotion and, for boys, greater openness. Thus, alcohol prevention programs need to give special consideration to those joining the workforce, military, or community colleges, who make up the majority of this age group.
Published: 2024

44. Base of RoPE Bounds Context Length

Author: Men, Xin, Xu, Mingyu, Wang, Bingning, Zhang, Qingyu, Lin, Hongyu, Han, Xianpei, and Chen, Weipeng
Subjects: Computer Science - Computation and Language
Abstract: Position embedding is a core component of current Large Language Models (LLMs). Rotary position embedding (RoPE), a technique that encodes the position information with a rotation matrix, has been the de facto choice for position embedding in many LLMs, such as the Llama series. RoPE has been further utilized to extend long context capability, which is roughly based on adjusting the base parameter of RoPE to mitigate out-of-distribution (OOD) problems in position embedding. However, in this paper, we find that LLMs may obtain a superficial long-context ability based on the OOD theory. We revisit the role of RoPE in LLMs and propose a novel property of long-term decay, we derive that the base of RoPE bounds context length: there is an absolute lower bound for the base value to obtain certain context length capability. Our work reveals the relationship between context length and RoPE base both theoretically and empirically, which may shed light on future long context training.
Comment: 17 pages
Published: 2024

45. Large Language Models in the Clinic: A Comprehensive Benchmark

Author: Liu, Fenglin, Li, Zheng, Zhou, Hongjian, Yin, Qingyu, Yang, Jingfeng, Tang, Xianfeng, Luo, Chen, Zeng, Ming, Jiang, Haoming, Gao, Yifan, Nigam, Priyanka, Nag, Sreyashi, Yin, Bing, Hua, Yining, Zhou, Xuan, Rohanian, Omid, Thakur, Anshul, Clifton, Lei, and Clifton, David A.
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: The adoption of large language models (LLMs) to assist clinicians has attracted remarkable attention. Existing works mainly adopt the close-ended question-answering (QA) task with answer options for evaluation. However, many clinical decisions involve answering open-ended questions without pre-set options. To better understand LLMs in the clinic, we construct a benchmark ClinicBench. We first collect eleven existing datasets covering diverse clinical language generation, understanding, and reasoning tasks. Furthermore, we construct six novel datasets and clinical tasks that are complex but common in real-world practice, e.g., open-ended decision-making, long document processing, and emerging drug analysis. We conduct an extensive evaluation of twenty-two LLMs under both zero-shot and few-shot settings. Finally, we invite medical experts to evaluate the clinical usefulness of LLMs. The benchmark data is available at https://github.com/AI-in-Health/ClinicBench.
Comment: Accepted at EMNLP 2024 Main Conference
Published: 2024

46. Superoscillations in High Energy Physics and Gravity

Author: Addazi, Andrea and Gan, Qingyu
Subjects: High Energy Physics - Theory, General Relativity and Quantum Cosmology, High Energy Physics - Phenomenology, Quantum Physics
Abstract: We explore superoscillations within the context of classical and quantum field theories, presenting novel solutions to Klein-Gordon's, Dirac's, Maxwell's and Einstein's equations. In particular, we illustrate a procedure of second quantization of fields and how to construct a Fock space which encompasses Superoscillating states. Furthermore, we extend the application of superoscillations to quantum tunnelings, scatterings and mixings of particles, squeezed states and potential advancements in laser interferometry, which could open new avenues for experimental tests of Quantum Gravity effects. By delving into the relationship among superoscillations and phenomena such as Hawking radiation, the Black Hole (BH) information and the Firewall paradox, we propose an alternative mechanism for information transfer across the BH event horizon.
Published: 2024

47. GlobalBuildingMap -- Unveiling the Mystery of Global Buildings

Author: Zhu, Xiao Xiang, Li, Qingyu, Shi, Yilei, Wang, Yuanyuan, Stewart, Adam, and Prexl, Jonathan
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Understanding how buildings are distributed globally is crucial to revealing the human footprint on our home planet. This built environment affects local climate, land surface albedo, resource distribution, and many other key factors that influence well-being and human health. Despite this, quantitative and comprehensive data on the distribution and properties of buildings worldwide is lacking. To this end, by using a big data analytics approach and nearly 800,000 satellite images, we generated the highest resolution and highest accuracy building map ever created: the GlobalBuildingMap (GBM). A joint analysis of building maps and solar potentials indicates that rooftop solar energy can supply the global energy consumption need at a reasonable cost. Specifically, if solar panels were placed on the roofs of all buildings, they could supply 1.1-3.3 times -- depending on the efficiency of the solar device -- the global energy consumption in 2020, which is the year with the highest consumption on record. We also identified a clear geospatial correlation between building areas and key socioeconomic variables, which indicates our global building map can serve as an important input to modeling global socioeconomic needs and drivers.
Published: 2024

48. Enforcing Conditional Independence for Fair Representation Learning and Causal Image Generation

Author: Hwa, Jensen, Zhao, Qingyu, Lahiri, Aditya, Masood, Adnan, Salimi, Babak, and Adeli, Ehsan
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Conditional independence (CI) constraints are critical for defining and evaluating fairness in machine learning, as well as for learning unconfounded or causal representations. Traditional methods for ensuring fairness either blindly learn invariant features with respect to a protected variable (e.g., race when classifying sex from face images) or enforce CI relative to the protected attribute only on the model output (e.g., the sex label). Neither of these methods are effective in enforcing CI in high-dimensional feature spaces. In this paper, we focus on a nascent approach characterizing the CI constraint in terms of two Jensen-Shannon divergence terms, and we extend it to high-dimensional feature spaces using a novel dynamic sampling strategy. In doing so, we introduce a new training paradigm that can be applied to any encoder architecture. We are able to enforce conditional independence of the diffusion autoencoder latent representation with respect to any protected attribute under the equalized odds constraint and show that this approach enables causal image generation with controllable latent spaces. Our experimental results demonstrate that our approach can achieve high accuracy on downstream tasks while upholding equality of odds.
Comment: To appear at the 2024 IEEE CVPR Workshop on Fair, Data-Efficient, and Trusted Computer Vision
Published: 2024

49. Practical Battery Health Monitoring using Uncertainty-Aware Bayesian Neural Network

Author: Zhao, Yunyi, Wei, Zhang, Yan, Qingyu, Ng, Man-Fai, Sivaneasan, B., and Xiang, Cheng
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Emerging Technologies
Abstract: Battery health monitoring and prediction are critically important in the era of electric mobility with a huge impact on safety, sustainability, and economic aspects. Existing research often focuses on prediction accuracy but tends to neglect practical factors that may hinder the technology's deployment in real-world applications. In this paper, we address these practical considerations and develop models based on the Bayesian neural network for predicting battery end-of-life. Our models use sensor data related to battery health and apply distributions, rather than single-point, for each parameter of the models. This allows the models to capture the inherent randomness and uncertainty of battery health, which leads to not only accurate predictions but also quantifiable uncertainty. We conducted an experimental study and demonstrated the effectiveness of our proposed models, with a prediction error rate averaging 13.9%, and as low as 2.9% for certain tested batteries. Additionally, all predictions include quantifiable certainty, which improved by 66% from the initial to the mid-life stage of the battery. This research has practical values for battery technologies and contributes to accelerating the technology adoption in the industry.
Comment: 6 pages
Published: 2024

50. BatSort: Enhanced Battery Classification with Transfer Learning for Battery Sorting and Recycling

Author: Zhao, Yunyi, Zhang, Wei, Hu, Erhai, Yan, Qingyu, Xiang, Cheng, Tseng, King Jet, and Niyato, Dusit
Subjects: Computer Science - Computational Engineering, Finance, and Science, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Multimedia
Abstract: Battery recycling is a critical process for minimizing environmental harm and resource waste for used batteries. However, it is challenging, largely because sorting batteries is costly and hardly automated to group batteries based on battery types. In this paper, we introduce a machine learning-based approach for battery-type classification and address the daunting problem of data scarcity for the application. We propose BatSort which applies transfer learning to utilize the existing knowledge optimized with large-scale datasets and customizes ResNet to be specialized for classifying battery types. We collected our in-house battery-type dataset of small-scale to guide the knowledge transfer as a case study and evaluate the system performance. We conducted an experimental study and the results show that BatSort can achieve outstanding accuracy of 92.1% on average and up to 96.2% and the performance is stable for battery-type classification. Our solution helps realize fast and automated battery sorting with minimized cost and can be transferred to related industry applications with insufficient data.
Published: 2024
