Author: "Liu, Jiang" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Liu, Jiang"' showing total 12,479 results

1. DeepFlow: Serverless Large Language Model Serving at Scale

Author: Hu, Junhao, Xu, Jiang, Liu, Zhixia, He, Yulong, Chen, Yuetao, Xu, Hao, Liu, Jiang, Zhang, Baoquan, Wan, Shining, Dan, Gengyuan, Dong, Zhiyu, Ren, Zhihao, Meng, Jie, He, Chao, Liu, Changhong, Xie, Tao, Lin, Dayun, Zhang, Qin, Yu, Yue, Feng, Hao, Chen, Xusheng, and Shan, Yizhou
Subjects: Computer Science - Distributed, Parallel, and Cluster Computing
Abstract: This paper introduces DeepFlow, a scalable and serverless AI platform designed to efficiently serve large language models (LLMs) at scale in cloud environments. DeepFlow addresses key challenges such as resource allocation, serving efficiency, and cold start latencies through four main design components. First, it uses a simple serverless abstraction called the request-job-task model, which helps manage AI workloads across post-training and model serving tasks. Second, it builds an in-house serving engine FlowServe using a microkernel-inspired design, NPU-centric execution, and SPMD-based parallelism to optimize LLM serving. The system also includes novel scheduling policies tailored for both PD-disaggregated and PD-colocated configurations. With optimizations like pre-warmed pods, DRAM pre-loading, and NPU-fork, DeepFlow can scale up to 64 instances in seconds. DeepFlow has been in production for over a year, operating on a large Ascend NPU cluster and providing industry-standard APIs for fine-tuning, agent serving, and model serving to our customers.
Published: 2025

2. Fundus Image Quality Assessment and Enhancement: a Systematic Review

Author: Li, Heng, Li, Haojin, Ou, Mingyang, Yu, Xiangyang, Zhang, Xiaoqing, Niu, Ke, Fu, Huazhu, and Liu, Jiang
Subjects: Electrical Engineering and Systems Science - Image and Video Processing, Computer Science - Computer Vision and Pattern Recognition
Abstract: As an affordable and convenient eye scan, fundus photography holds the potential for preventing vision impairment, especially in resource-limited regions. However, fundus image degradation is common under intricate imaging environments, impacting following diagnosis and treatment. Consequently, image quality assessment (IQA) and enhancement (IQE) are essential for ensuring the clinical value and reliability of fundus images. While existing reviews offer some overview of this field, a comprehensive analysis of the interplay between IQA and IQE, along with their clinical deployment challenges, is lacking. This paper addresses this gap by providing a thorough review of fundus IQA and IQE algorithms, research advancements, and practical applications. We outline the fundamentals of the fundus photography imaging system and the associated interferences, and then systematically summarize the paradigms in fundus IQA and IQE. Furthermore, we discuss the practical challenges and solutions in deploying IQA and IQE, as well as offer insights into potential future research directions.
Published: 2025

3. Agent Laboratory: Using LLM Agents as Research Assistants

Author: Schmidgall, Samuel, Su, Yusheng, Wang, Ze, Sun, Ximeng, Wu, Jialian, Yu, Xiaodong, Liu, Jiang, Liu, Zicheng, and Barsoum, Emad
Subjects: Computer Science - Human-Computer Interaction, Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: Historically, scientific discovery has been a lengthy and costly process, demanding substantial time and resources from initial conception to final results. To accelerate scientific discovery, reduce research costs, and improve research quality, we introduce Agent Laboratory, an autonomous LLM-based framework capable of completing the entire research process. This framework accepts a human-provided research idea and progresses through three stages--literature review, experimentation, and report writing to produce comprehensive research outputs, including a code repository and a research report, while enabling users to provide feedback and guidance at each stage. We deploy Agent Laboratory with various state-of-the-art LLMs and invite multiple researchers to assess its quality by participating in a survey, providing human feedback to guide the research process, and then evaluate the final paper. We found that: (1) Agent Laboratory driven by o1-preview generates the best research outcomes; (2) The generated machine learning code is able to achieve state-of-the-art performance compared to existing methods; (3) Human involvement, providing feedback at each stage, significantly improves the overall quality of research; (4) Agent Laboratory significantly reduces research expenses, achieving an 84% decrease compared to previous autonomous research methods. We hope Agent Laboratory enables researchers to allocate more effort toward creative ideation rather than low-level coding and writing, ultimately accelerating scientific discovery.
Published: 2025

4. AIF-SFDA: Autonomous Information Filter-driven Source-Free Domain Adaptation for Medical Image Segmentation

Author: Li, Haojin, Li, Heng, Chen, Jianyu, Zhong, Rihan, Niu, Ke, Fu, Huazhu, and Liu, Jiang
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Decoupling domain-variant information (DVI) from domain-invariant information (DII) serves as a prominent strategy for mitigating domain shifts in the practical implementation of deep learning algorithms. However, in medical settings, concerns surrounding data collection and privacy often restrict access to both training and test data, hindering the empirical decoupling of information by existing methods. To tackle this issue, we propose an Autonomous Information Filter-driven Source-free Domain Adaptation (AIF-SFDA) algorithm, which leverages a frequency-based learnable information filter to autonomously decouple DVI and DII. Information Bottleneck (IB) and Self-supervision (SS) are incorporated to optimize the learnable frequency filter. The IB governs the information flow within the filter to diminish redundant DVI, while SS preserves DII in alignment with the specific task and image modality. Thus, the autonomous information filter can overcome domain shifts relying solely on target data. A series of experiments covering various medical image modalities and segmentation tasks were conducted to demonstrate the benefits of AIF-SFDA through comparisons with leading algorithms and ablation studies. The code is available at https://github.com/JingHuaMan/AIF-SFDA., Comment: 9 pages total (7 pages main text, 2 pages references), 6 figures, accepted by AAAI 2025
Published: 2025

5. COph100: A comprehensive fundus image registration dataset from infants constituting the 'RIDIRP' database

Author: Hu, Yan, Gong, Mingdao, Qiu, Zhongxi, Liu, Jiabao, Shen, Hongli, Yuan, Mingzhen, Zhang, Xiaoqing, Li, Heng, Lu, Hai, and Liu, Jiang
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Computational Engineering, Finance, and Science
Abstract: Retinal image registration is vital for diagnostic and therapeutic applications within the field of ophthalmology. Existing public datasets, focusing on adult retinal pathologies with high-quality images, have a limited number of image pairs and neglect clinical challenges. To address this gap, we introduce COph100, a novel and challenging dataset known as the Comprehensive Ophthalmology Retinal Image Registration dataset for infants with a wide range of image quality issues constituting the public "RIDIRP" database. COph100 consists of 100 eyes, each with 2 to 9 examination sessions, amounting to a total of 491 image pairs carefully selected from the publicly available dataset. We manually labeled the corresponding ground truth image points and provided automatic vessel segmentation masks for each image. We have assessed COph100 in terms of image quality and registration outcomes using state-of-the-art algorithms. This resource enables a robust comparison of retinal registration methodologies and aids in the analysis of disease progression in infants, thereby deepening our understanding of pediatric ophthalmic conditions., Comment: 12 pages, 7 figures
Published: 2025

6. Boosting Private Domain Understanding of Efficient MLLMs: A Tuning-free, Adaptive, Universal Prompt Optimization Framework

Author: Liu, Jiang, Li, Bolin, Li, Haoyuan, Lin, Tianwei, Zhang, Wenqiao, Zhong, Tao, Yu, Zhelun, Wei, Jinghao, Cheng, Hao, Jiang, Hao, Lv, Zheqi, Li, Juncheng, Tang, Siliang, and Zhuang, Yueting
Subjects: Computer Science - Artificial Intelligence
Abstract: Efficient multimodal large language models (EMLLMs), in contrast to multimodal large language models (MLLMs), reduce model size and computational costs and are often deployed on resource-constrained devices. However, due to data privacy concerns, existing open-source EMLLMs rarely have access to private domain-specific data during the pre-training process, making them difficult to directly apply in device-specific domains, such as certain business scenarios. To address this weakness, this paper focuses on the efficient adaptation of EMLLMs to private domains, specifically in two areas: 1) how to reduce data requirements, and 2) how to avoid parameter fine-tuning. Specifically, we propose a tunIng-free, aDaptivE, universAL Prompt Optimization Framework, abbreviated as IDEALPrompt, which consists of two stages: 1) Predefined Prompt, based on the reinforcement searching strategy, generate a prompt optimization strategy tree to acquire optimization priors; 2) Prompt Reflection initializes the prompt based on optimization priors, followed by self-reflection to further search and refine the prompt. By doing so, IDEALPrompt elegantly generates the "ideal prompts" for processing private domain-specific data. Note that our method requires no parameter fine-tuning and only a small amount of data to quickly adapt to the data distribution of private data. Extensive experiments across multiple tasks demonstrate that our proposed IDEALPrompt significantly improves both efficiency and performance compared to baselines.
Published: 2024

7. Descriptive Caption Enhancement with Visual Specialists for Multimodal Perception

Author: Sun, Yanpeng, Hao, Jing, Zhu, Ke, Liu, Jiang-Jiang, Zhao, Yuxiang, Li, Xiaofan, Zhang, Gang, Li, Zechao, and Wang, Jingdong
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Training Large Multimodality Models (LMMs) relies on descriptive image caption that connects image and language. Existing methods either distill the caption from the LMM models or construct the captions from the internet images or by human. We propose to leverage off-the-shelf visual specialists, which were trained from annotated images initially not for image captioning, for enhancing the image caption. Our approach, named DCE, explores object low-level and fine-grained attributes (e.g., depth, emotion and fine-grained categories) and object relations (e.g., relative location and human-object-interaction (HOI)), and combine the attributes into the descriptive caption. Experiments demonstrate that such visual specialists are able to improve the performance for visual understanding tasks as well as reasoning that benefits from more accurate visual understanding. We will release the source code and the pipeline so that other visual specialists are easily combined into the pipeline. The complete source code of DCE pipeline and datasets will be available at https://github.com/syp2ysy/DCE., Comment: An open-source data engine for generating detailed image captions
Published: 2024

8. SoftVQ-VAE: Efficient 1-Dimensional Continuous Tokenizer

Author: Chen, Hao, Wang, Ze, Li, Xiang, Sun, Ximeng, Chen, Fangyi, Liu, Jiang, Wang, Jindong, Raj, Bhiksha, Liu, Zicheng, and Barsoum, Emad
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Efficient image tokenization with high compression ratios remains a critical challenge for training generative models. We present SoftVQ-VAE, a continuous image tokenizer that leverages soft categorical posteriors to aggregate multiple codewords into each latent token, substantially increasing the representation capacity of the latent space. When applied to Transformer-based architectures, our approach compresses 256x256 and 512x512 images using as few as 32 or 64 1-dimensional tokens. Not only does SoftVQ-VAE show consistent and high-quality reconstruction, more importantly, it also achieves state-of-the-art and significantly faster image generation results across different denoising-based generative models. Remarkably, SoftVQ-VAE improves inference throughput by up to 18x for generating 256x256 images and 55x for 512x512 images while achieving competitive FID scores of 1.78 and 2.21 for SiT-XL. It also improves the training efficiency of the generative models by reducing the number of training iterations by 2.3x while maintaining comparable performance. With its fully-differentiable design and semantic-rich latent space, our experiment demonstrates that SoftVQ-VAE achieves efficient tokenization without compromising generation quality, paving the way for more efficient generative models. Code and model are released., Comment: Code and model: https://github.com/Hhhhhhao/continuous_tokenizer
Published: 2024

9. M$^{3}$D: A Multimodal, Multilingual and Multitask Dataset for Grounded Document-level Information Extraction

Author: Liu, Jiang, Li, Bobo, Yang, Xinran, Yang, Na, Fei, Hao, Zhang, Mingyao, Li, Fei, and Ji, Donghong
Subjects: Computer Science - Computation and Language
Abstract: Multimodal information extraction (IE) tasks have attracted increasing attention because many studies have shown that multimodal information benefits text information extraction. However, existing multimodal IE datasets mainly focus on sentence-level image-facilitated IE in English text, and pay little attention to video-based multimodal IE and fine-grained visual grounding. Therefore, in order to promote the development of multimodal IE, we constructed a multimodal multilingual multitask dataset, named M$^{3}$D, which has the following features: (1) It contains paired document-level text and video to enrich multimodal information; (2) It supports two widely-used languages, namely English and Chinese; (3) It includes more multimodal IE tasks such as entity recognition, entity chain extraction, relation extraction and visual grounding. In addition, our dataset introduces an unexplored theme, i.e., biography, enriching the domains of multimodal IE resources. To establish a benchmark for our dataset, we propose an innovative hierarchical multimodal IE model. This model effectively leverages and integrates multimodal information through a Denoised Feature Fusion Module (DFFM). Furthermore, in non-ideal scenarios, modal information is often incomplete. Thus, we designed a Missing Modality Construction Module (MMCM) to alleviate the issues caused by missing modalities. Our model achieved an average performance of 53.80% and 53.77% on four tasks in English and Chinese datasets, respectively, which set a reasonable standard for subsequent research. In addition, we conducted more analytical experiments to verify the effectiveness of our proposed module. We believe that our work can promote the development of the field of multimodal IE., Comment: 14 pages, 9 figures, 6 tables
Published: 2024

10. Memory-Augmented Multimodal LLMs for Surgical VQA via Self-Contained Inquiry

Author: Hou, Wenjun, Cheng, Yi, Xu, Kaishuai, Hu, Yan, Li, Wenjie, and Liu, Jiang
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Computation and Language
Abstract: Comprehensively understanding surgical scenes in Surgical Visual Question Answering (Surgical VQA) requires reasoning over multiple objects. Previous approaches address this task using cross-modal fusion strategies to enhance reasoning ability. However, these methods often struggle with limited scene understanding and question comprehension, and some rely on external resources (e.g., pre-extracted object features), which can introduce errors and generalize poorly across diverse surgical environments. To address these challenges, we propose SCAN, a simple yet effective memory-augmented framework that leverages Multimodal LLMs to improve surgical context comprehension via Self-Contained Inquiry. SCAN operates autonomously, generating two types of memory for context augmentation: Direct Memory (DM), which provides multiple candidates (or hints) to the final answer, and Indirect Memory (IM), which consists of self-contained question-hint pairs to capture broader scene context. DM directly assists in answering the question, while IM enhances understanding of the surgical scene beyond the immediate query. Reasoning over these object-aware memories enables the model to accurately interpret images and respond to questions. Extensive experiments on three publicly available Surgical VQA datasets demonstrate that SCAN achieves state-of-the-art performance, offering improved accuracy and robustness across various surgical scenarios.
Published: 2024

11. Gravitational Wave-Sensitive Photonic-Like Electronic Transport in Graphene for Efficient High-Frequency Gravitational Wave Detection

Author: Shen, Shen, Lin, Liangzhong, Li, Linfu, Liu, Jiang-Tao, Wu, Xin, and Wu, Zhenhua
Subjects: Physics - Instrumentation and Detectors, Astrophysics - Instrumentation and Methods for Astrophysics, General Relativity and Quantum Cosmology
Abstract: High-frequency gravitational waves are crucial for understanding the very early universe and distinguishing between various cosmological models, but detecting them remains a significant challenge. We investigated the effects of high-frequency gravitational waves on photonic-like electronic transport in graphene. The results show that, unlike the influence of gravitational waves on the propagation of light, the influence of gravitational waves on photonic-like electronic transport can accumulate not only in real space but also in $k$-space. This makes photonic-like electronic transport under gravitational waves similar to the propagation of light in a medium where the refractive index varies dramatically due to gravitational waves, and with shorter wavelengths. As a result, the relative intensity variation in photonic-like electronic transport under gravitational waves exceeds that of a laser interferometer with the same arm length by six orders of magnitude. At low temperatures, the influence of phonons on photon-like transport in the context of high-frequency gravitational waves can be ignored. These findings indicate a strong interaction between gravitational waves and electron transport, which helps to deepen the understanding of the interaction between gravitational waves and matter, and provides a different method for detecting high-frequency gravitational waves.
Published: 2024

12. Design and Experimental Application of a Radon Diffusion Chamber for Determining Diffusion Coefficients in Membrane Materials

Author: Wu, Liang-Yu, Si, Lin, Wu, Yuan, Gao, Zhi-Xing, Heng, Yue-Kun, Li, Yuan, Liu, Jiang-Lai, Luo, Xiao-Lan, Ma, Fei, Meng, Yue, Qian, Xiao-Hui, Qian, Zhi-Cheng, Wang, Hao, Yun, You-Hui, Zhang, Gao-Feng, and Zhao, Jie
Subjects: Physics - Instrumentation and Detectors, High Energy Physics - Experiment
Abstract: In recent years, the issue of radon emanation and diffusion has become a critical concern for rare decay experiments, such as JUNO and PandaX-4T. This paper introduces a detector design featuring a symmetric radon detector cavity for the quantitative assessment of membrane materials' radon blocking capabilities. The performance of this design is evaluated through the application of Fick's Law and the diffusion equation considering material solubility. Our detector has completed measurements of radon diffusion coefficients for four types of membrane materials currently used in experiments, which also confirms the rationality of this detector design. The findings are instrumental in guiding the selection and evaluation of optimal materials for radon shielding to reduce radon background, contributing to boost sensitivities of rare event research., Comment: 7 pages, 10 figures and 2 tables
Published: 2024

13. Accelerating Non-Maximum Suppression: A Graph Theory Perspective

Author: Si, King-Siong, Sun, Lu, Zhang, Weizhan, Gong, Tieliang, Wang, Jiahao, Liu, Jiang, and Sun, Hao
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
Abstract: Non-maximum suppression (NMS) is an indispensable post-processing step in object detection. With the continuous optimization of network models, NMS has become the "last mile" to enhance the efficiency of object detection. This paper systematically analyzes NMS from a graph theory perspective for the first time, revealing its intrinsic structure. Consequently, we propose two optimization methods, namely QSI-NMS and BOE-NMS. The former is a fast recursive divide-and-conquer algorithm with negligible mAP loss, and its extended version (eQSI-NMS) achieves optimal complexity of $\mathcal{O}(n\log n)$. The latter, concentrating on the locality of NMS, achieves an optimization at a constant level without an mAP loss penalty. Moreover, to facilitate rapid evaluation of NMS methods for researchers, we introduce NMS-Bench, the first benchmark designed to comprehensively assess various NMS methods. Taking the YOLOv8-N model on MS COCO 2017 as the benchmark setup, our method QSI-NMS provides $6.2\times$ speed of original NMS on the benchmark, with a $0.1\%$ decrease in mAP. The optimal eQSI-NMS, with only a $0.3\%$ mAP decrease, achieves $10.7\times$ speed. Meanwhile, BOE-NMS exhibits $5.1\times$ speed with no compromise in mAP.
Published: 2024

14. Align$^2$LLaVA: Cascaded Human and Large Language Model Preference Alignment for Multi-modal Instruction Curation

Author: Huang, Hongzhe, Liu, Jiang, Yu, Zhewen, Cai, Li, Jiao, Dian, Zhang, Wenqiao, Tang, Siliang, Li, Juncheng, Jiang, Hao, Li, Haoyuan, and Zhuang, Yueting
Subjects: Computer Science - Artificial Intelligence
Abstract: Recent advances in Multi-modal Large Language Models (MLLMs), such as LLaVA-series models, are driven by massive machine-generated instruction-following data tuning. Such automatic instruction collection pipelines, however, inadvertently introduce significant variability in data quality. This paper introduces a novel instruction curation algorithm, derived from two unique perspectives, human and LLM preference alignment, to compress this vast corpus of machine-generated multimodal instructions to a compact and high-quality form: (i) For human preference alignment, we have collected a machine-generated multimodal instruction dataset and established a comprehensive set of both subjective and objective criteria to guide the data quality assessment critically from human experts. By doing so, a reward model was trained on the annotated dataset to internalize the nuanced human understanding of instruction alignment. (ii) For LLM preference alignment, given the instruction selected by the reward model, we propose leveraging the inner LLM used in MLLM to align the writing style of visual instructions with that of the inner LLM itself, resulting in LLM-aligned instruction improvement. Extensive experiments demonstrate that we can maintain or even improve model performance by compressing synthetic multimodal instructions by up to 90%. Impressively, by aggressively reducing the training instructions from 158k to 14k (9$\times$ smaller), our model consistently outperforms its full-size dataset counterpart across various MLLM benchmarks. Our project is available at https://github.com/DCDmllm/Align2LLaVA.
Published: 2024

15. Learning Multiple Probabilistic Decisions from Latent World Model in Autonomous Driving

Author: Xiao, Lingyu, Liu, Jiang-Jiang, Yang, Sen, Li, Xiaofan, Ye, Xiaoqing, Yang, Wankou, and Wang, Jingdong
Subjects: Computer Science - Robotics, Computer Science - Artificial Intelligence
Abstract: The autoregressive world model exhibits robust generalization capabilities in vectorized scene understanding but encounters difficulties in deriving actions due to insufficient uncertainty modeling and self-delusion. In this paper, we explore the feasibility of deriving decisions from an autoregressive world model by addressing these challenges through the formulation of multiple probabilistic hypotheses. We propose LatentDriver, a framework that models the environment's next states and the ego vehicle's possible actions as a mixture distribution, from which a deterministic control signal is then derived. By incorporating mixture modeling, the stochastic nature of decision-making is captured. Additionally, the self-delusion problem is mitigated by providing intermediate actions sampled from a distribution to the world model. Experimental results on the recently released closed-loop benchmark Waymax demonstrate that LatentDriver surpasses state-of-the-art reinforcement learning and imitation learning methods, achieving expert-level performance. The code and models will be made available at https://github.com/Sephirex-X/LatentDriver.
Published: 2024

16. Statistical Characteristics of the Proton Isotropy Boundary

Author: Wilkins, Colin, Angelopoulos, Vassilis, Artemyev, Anton, Runov, Andrei, Zhang, Xiao-Jia, Liu, Jiang, and Tsai, Ethan
Subjects: Physics - Space Physics, Physics - Geophysics, Physics - Plasma Physics
Abstract: Using particle data from the ELFIN satellites, we present a statistical study of 284 proton isotropy boundary events on the nightside magnetosphere, characterizing their occurrence and distribution in local time, latitude (L-shell), energy, and precipitating energy flux, as a function of geomagnetic activity. For a given charged particle species and energy, its isotropy boundary (IB) is the magnetic latitude poleward of which persistently isotropic pitch-angle distributions ($J_{prec}/J_{perp}\sim 1$) are first observed to occur. This isotropization is interpreted as resulting from magnetic field-line curvature (FLC) scattering in the equatorial magnetosphere. We find that proton IBs are observed under all observed activity levels, spanning 16 to 05 MLT with $\sim$100% occurrence between 19 and 03 MLT, trending toward 60% at dawn/dusk. These results are also compared with electron IB properties observed using ELFIN, where we find similar trends across local time and activity, with the onset in $\geq$50 keV proton IB occurring on average 2 L-shells lower, and providing between 3 and 10 times as much precipitating power. Proton IBs typically span $64^\circ$-$66^\circ$ in magnetic latitude (5-6 in L-shell), corresponding to the outer edge of the ring current, tending toward lower IGRF latitudes as geomagnetic activity increases. The IBs were found to commonly occur 0.3-2.1 Re beyond the plasmapause. Proton IBs typically span $<$50 keV to $\sim$1 MeV in energy, maximizing near 22 MLT, and decreasing to a typical upper limit of 300-400 keV toward dawn and dusk, with peak observed isotropic energy increasing by $\sim$500 keV during active intervals. These results suggest that FLC in the vicinity of IBs can provide a substantial depletion mechanism for energetic protons, with the total nightside precipitating power from FLC-scattering found to be on the order of 100 MW, at times $\geq$10 GW.
Published: 2024

17. EasyChauffeur: A Baseline Advancing Simplicity and Efficiency on Waymax

Author: Xiao, Lingyu, Liu, Jiang-Jiang, Ye, Xiaoqing, Yang, Wankou, and Wang, Jingdong
Subjects: Computer Science - Robotics
Abstract: Recent advancements in deep-learning-based driving planners have primarily focused on elaborate network engineering, yielding limited improvements. This paper diverges from conventional approaches by exploring three fundamental yet underinvestigated aspects: training policy, data efficiency, and evaluation robustness. We introduce EasyChauffeur, a reproducible and effective planner for both imitation learning (IL) and reinforcement learning (RL) on Waymax, a GPU-accelerated simulator. Notably, our findings indicate that the incorporation of on-policy RL significantly boosts performance and data efficiency. To further enhance this efficiency, we propose SNE-Sampling, a novel method that selectively samples data from the encoder's latent space, substantially improving EasyChauffeur's performance with RL. Additionally, we identify a deficiency in current evaluation methods, which fail to accurately assess the robustness of different planners due to significant performance drops from minor changes in the ego vehicle's initial state. In response, we propose Ego-Shifting, a new evaluation setting for assessing planners' robustness. Our findings advocate for a shift from a primary focus on network architectures to adopting a holistic approach encompassing training strategies, data efficiency, and robust evaluation methods.
Published: 2024

18. TeamLoRA: Boosting Low-Rank Adaptation with Expert Collaboration and Competition

Author: Lin, Tianwei, Liu, Jiang, Zhang, Wenqiao, Li, Zhaocheng, Dai, Yang, Li, Haoyuan, Yu, Zhelun, He, Wanggui, Li, Juncheng, Jiang, Hao, Tang, Siliang, and Zhuang, Yueting
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: While Parameter-Efficient Fine-Tuning (PEFT) methods like LoRA have effectively addressed GPU memory constraints during fine-tuning, their performance often falls short, especially in multidimensional task scenarios. To address this issue, one straightforward solution is to introduce task-specific LoRA modules as domain experts, leveraging the modeling of multiple experts' capabilities and thus enhancing the general capability of multi-task learning. Although promising, these additional components often add complexity to the training and inference process, contravening the efficiency that PEFT is designed for. Considering this, we introduce an innovative PEFT method, TeamLoRA, consisting of a collaboration and competition module for experts, and thus achieving the right balance of effectiveness and efficiency: (i) For collaboration, a novel knowledge-sharing and -organizing mechanism is devised to appropriately reduce the scale of matrix operations, thereby boosting the training and inference speed. (ii) For competition, we propose leveraging a game-theoretic interaction mechanism for experts, encouraging experts to transfer their domain-specific knowledge while facing diverse downstream tasks, and thus enhancing the performance. By doing so, TeamLoRA elegantly connects the experts as a "Team" with internal collaboration and competition, enabling a faster and more accurate PEFT paradigm for multi-task learning. To validate the superiority of TeamLoRA, we curate a comprehensive multi-task evaluation (CME) benchmark to thoroughly assess the capability of multi-task learning. Experiments conducted on our CME and other benchmarks indicate the effectiveness and efficiency of TeamLoRA. Our project is available at https://github.com/Lin-Tianwei/TeamLoRA.
Published: 2024

19. MM-UNet: A Mixed MLP Architecture for Improved Ophthalmic Image Segmentation

Author: Xiao, Zunjie, Zhang, Xiaoqing, Higashita, Risa, and Liu, Jiang
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
Abstract: Ophthalmic image segmentation serves as a critical foundation for ocular disease diagnosis. Although fully convolutional neural networks (CNNs) are commonly employed for segmentation, they are constrained by inductive biases and face challenges in establishing long-range dependencies. Transformer-based models address these limitations but introduce substantial computational overhead. Recently, a simple yet efficient Multilayer Perceptron (MLP) architecture was proposed for image classification, achieving competitive performance relative to advanced transformers. However, its effectiveness for ophthalmic image segmentation remains unexplored. In this paper, we introduce MM-UNet, an efficient Mixed MLP model tailored for ophthalmic image segmentation. Within MM-UNet, we propose a multi-scale MLP (MMLP) module that facilitates the interaction of features at various depths through a grouping strategy, enabling simultaneous capture of global and local information. We conducted extensive experiments on both a private anterior segment optical coherence tomography (AS-OCT) image dataset and a public fundus image dataset. The results demonstrated the superiority of our MM-UNet model in comparison to state-of-the-art deep segmentation networks., Comment: OMIA2024
Published: 2024

20. Technical Application of Recovery Nickel and Molybdenum from Nickel-Molybdenum Ore by Shearing-Enhanced Ammonia Leaching

Author: Tang, Shiyang, Yang, Jianguang, Long, Wei, Nan, Tianxiang, Zhu, Qiang, and Liu, Jiang
Published: 2025
Full Text: View/download PDF

21. G3BP2 promotes tumor progression and gemcitabine resistance in PDAC via regulating PDIA3-DKC1-hENT in a stress granules-dependent manner

Author: Xing, Fa-liang, Li, Bo-rui, Fang, Ying-jin, Liang, Chen, Liu, Jiang, Wang, Wei, Xu, Jin, Yu, Xian-jun, Qin, Yi, and Zhang, Bo
Published: 2025
Full Text: View/download PDF

22. Reaction Mechanism and Technical Application of Recovery FeAs, Sb, and Na2CO3 from Arsenic-Alkali Residue

Author: Tang, Shiyang, Yang, Jian-guang, Nan, Tian-xiang, Zhu, Qiang, Liu, Jiang, Zhu, Rong-bo, Su, An-bang, Fan, Xiao-bin, and Tang, Chao-bo
Published: 2025
Full Text: View/download PDF

23. Restricted Gröbner Basis Theory for Normalization of Indexed Differential Riemann Metric Tensor Polynomials

Author: Liu, Jiang, Wang, Tao, Hou, Pingjing, Ni, Feng, Zhu, Kun, and Zhang, Leyi
Published: 2025
Full Text: View/download PDF

24. A Study of an Active Noise Control System with Continuous Tracking of the Human Ear and Noise Segmentation Control: A Study of An Active Noise Control System

Author: Su, Hehua, Liu, Jiang, Liu, Anqing, and Li, Baogang
Published: 2025
Full Text: View/download PDF

25. eIF4A1 exacerbates myocardial ischemia-reperfusion injury in mice by promoting nuclear translocation of transgelin/p53

Author: Li, Dan-yang, Hu, Xiao-xi, Tian, Zhong-rui, Ning, Qi-wen, Liu, Jiang-qi, Yue, Ying, Yuan, Wei, Meng, Bo, Li, Jia-liang, Zhang, Yang, Pan, Zhen-wei, Zhuang, Yu-ting, and Lu, Yan-jie
Published: 2025
Full Text: View/download PDF

26. Dissolution Characteristics and Behavior of Antimony, Sulfur, and Antimony Sulfide in Molten NaCl–KCl

Author: Zhu, Qiang, Yang, Jianguang, Zhou, Wei, Nan, Tianxiang, Tang, Shiyang, Liu, Jiang, Huang, Jiadeng, and Tang, Chaobo
Published: 2024
Full Text: View/download PDF

27. White matter microstructure damage measured by automated fiber quantification correlates with pain symptoms in lung cancer patients

Author: Ran, Li, Liu, Jiang, Lan, Xiaosong, Zhou, Xiaoyu, Tan, Yong, Zhang, Jing, Tang, Yu, Tang, Lin, Zhang, Jiuquan, and Liu, Daihong
Published: 2024
Full Text: View/download PDF

28. Factors associated with the distribution of brain metastases in lung cancer: a retrospective study

Author: Hu, Yixin, Lei, Weiwei, Xin, Enhui, Cheng, Tan, Liu, Jiang, Tang, Yu, Lai, Yong, Yu, Hong, Tan, Yong, Yang, Jing, Huang, Junhao, Liu, Daihong, and Zhang, Jiuquan
Published: 2024
Full Text: View/download PDF

29. Physico-Chemical Properties of Molten NaCl–KCl–Na2S System for Sulfide Electrolytic Desulfurization

Author: Zhu, Qiang, Yang, Jianguang, Ding, Ruize, Nan, Tianxiang, Tang, Shiyang, Liu, Jiang, and Tang, Chaobo
Published: 2024
Full Text: View/download PDF

30. Design of 3D Printing Control System Based on Embedded Web

Author: Chen, Ting, Zhang, Fan, Wang, HongLi, Shen, Hong, Liu, Jiang, Howlett, Robert J., Series Editor, Jain, Lakhmi C., Series Editor, Kountcheva, Roumiana, editor, Nakamatsu, Kazumi, editor, and Patnaik, Srikanta, editor
Published: 2025
Full Text: View/download PDF

31. Data Heterogeneity-Aware Personalized Federated Learning for Diagnosis

Author: Lin, Huiyan, Li, Heng, Li, Haojin, Yu, Xiangyang, Yu, Kuai, Liang, Chenhao, Fu, Huazhu, Liu, Jiang, Goos, Gerhard, Series Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Bhavna, Antony, editor, Chen, Hao, editor, Fang, Huihui, editor, Fu, Huazhu, editor, and Lee, Cecilia S., editor
Published: 2025
Full Text: View/download PDF

32. MM-UNet: A Mixed MLP Architecture for Improved Ophthalmic Image Segmentation

Author: Xiao, Zunjie, Zhang, Xiaoqing, Higashita, Risa, Liu, Jiang, Goos, Gerhard, Series Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Bhavna, Antony, editor, Chen, Hao, editor, Fang, Huihui, editor, Fu, Huazhu, editor, and Lee, Cecilia S., editor
Published: 2025
Full Text: View/download PDF

33. Is Dataset Quality Still a Concern in Diagnosis Using Large Foundation Model?

Author: Lin, Ziqin, Li, Heng, Li, Zinan, Fu, Huazhu, and Liu, Jiang
Subjects: Electrical Engineering and Systems Science - Image and Video Processing, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
Abstract: Recent advancements in pre-trained large foundation models (LFM) have yielded significant breakthroughs across various domains, including natural language processing and computer vision. These models have been particularly impactful in the domain of medical diagnostic tasks. With abundant unlabeled data, an LFM has been developed for fundus images using the Vision Transformer (VIT) and a self-supervised learning framework. This LFM has shown promising performance in fundus disease diagnosis across multiple datasets. On the other hand, deep learning models have long been challenged by dataset quality issues, such as image quality and dataset bias. To investigate the influence of data quality on LFM, we conducted explorations in two fundus diagnosis tasks using datasets of varying quality. Specifically, we explored the following questions: Is LFM more robust to image quality? Is LFM affected by dataset bias? Can fine-tuning techniques alleviate these effects? Our investigation found that LFM exhibits greater resilience to dataset quality issues, including image quality and dataset bias, compared to typical convolutional networks. Furthermore, we discovered that overall fine-tuning is an effective adapter for LFM to mitigate the impact of dataset quality issues.
Comment: 10 pages, 6 figures
Published: 2024

34. Edge Intelligence Optimization for Large Language Model Inference with Batching and Quantization

Author: Zhang, Xinyuan, Liu, Jiang, Xiong, Zehui, Huang, Yudong, Xie, Gaochang, and Zhang, Ran
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Networking and Internet Architecture
Abstract: Generative Artificial Intelligence (GAI) is taking the world by storm with its unparalleled content creation ability. Large Language Models (LLMs) are at the forefront of this movement. However, the significant resource demands of LLMs often require cloud hosting, which raises issues regarding privacy, latency, and usage limitations. Although edge intelligence has long been utilized to solve these challenges by enabling real-time AI computation on ubiquitous edge resources close to data sources, most research has focused on traditional AI models and has left a gap in addressing the unique characteristics of LLM inference, such as considerable model size, auto-regressive processes, and self-attention mechanisms. In this paper, we present an edge intelligence optimization problem tailored for LLM inference. Specifically, with the deployment of the batching technique and model quantization on resource-limited edge devices, we formulate an inference model for transformer decoder-based LLMs. Furthermore, our approach aims to maximize the inference throughput via batch scheduling and joint allocation of communication and computation resources, while also considering edge resource constraints and varying user requirements of latency and accuracy. To address this NP-hard problem, we develop an optimal Depth-First Tree-Searching algorithm with online tree-Pruning (DFTSP) that operates within a feasible time complexity. Simulation results indicate that DFTSP surpasses other batching benchmarks in throughput across diverse user settings and quantization techniques, and it reduces time complexity by over 45% compared to the brute-force searching method.
Published: 2024

35. RaffeSDG: Random Frequency Filtering enabled Single-source Domain Generalization for Medical Image Segmentation

Author: Li, Heng, Li, Haojin, Chen, Jianyu, Qiu, Zhongxi, Fu, Huazhu, Wang, Lidai, Hu, Yan, and Liu, Jiang
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Deep learning models often encounter challenges in making accurate inferences when there are domain shifts between the source and target data. This issue is particularly pronounced in clinical settings due to the scarcity of annotated data resulting from the professional and private nature of medical data. Despite the existence of decent solutions, many of them are hindered in clinical settings due to limitations in data collection and computational complexity. To tackle domain shifts in data-scarce medical scenarios, we propose a Random frequency filtering enabled Single-source Domain Generalization algorithm (RaffeSDG), which promises robust out-of-domain inference with segmentation models trained on a single-source domain. A filter-based data augmentation strategy is first proposed to promote domain variability within a single-source domain by introducing variations in frequency space and blending homologous samples. Then Gaussian filter-based structural saliency is also leveraged to learn robust representations across augmented samples, further facilitating the training of generalizable segmentation models. To validate the effectiveness of RaffeSDG, we conducted extensive experiments involving out-of-domain inference on segmentation tasks for three human tissues imaged by four diverse modalities. Through thorough investigations and comparisons, compelling evidence was observed in these experiments, demonstrating the potential and generalizability of RaffeSDG. The code is available at https://github.com/liamheng/Non-IID_Medical_Image_Segmentation.
Published: 2024

36. LASER: Tuning-Free LLM-Driven Attention Control for Efficient Text-conditioned Image-to-Animation

Author: Zheng, Haoyu, Zhang, Wenqiao, Wang, Yaoke, Zhou, Hao, Liu, Jiang, Li, Juncheng, Lv, Zheqi, Tang, Siliang, and Zhuang, Yueting
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Revolutionary advancements in text-to-image models have unlocked new dimensions for sophisticated content creation, e.g., text-conditioned image editing, allowing us to edit the diverse images that convey highly complex visual concepts according to the textual guidance. Despite being promising, existing methods focus on texture- or non-rigid-based visual manipulation, which struggles to produce the fine-grained animation of smooth text-conditioned image morphing without fine-tuning, i.e., due to their highly unstructured latent space. In this paper, we introduce a tuning-free LLM-driven attention control framework, encapsulated by the progressive process of LLM planning, prompt-Aware editing, StablE animation geneRation, abbreviated as LASER. LASER employs a large language model (LLM) to refine coarse descriptions into detailed prompts, guiding pre-trained text-to-image models for subsequent image generation. We manipulate the model's spatial features and self-attention mechanisms to maintain animation integrity and enable seamless morphing directly from text prompts, eliminating the need for additional fine-tuning or annotations. Our meticulous control over spatial features and self-attention ensures structural consistency in the images. This paper presents a novel framework integrating LLMs with text-to-image models to create high-quality animations from a single text input. We also propose a Text-conditioned Image-to-Animation Benchmark to validate the effectiveness and efficacy of LASER. Extensive experiments demonstrate that LASER produces impressive, consistent, and efficient results in animation generation, positioning it as a powerful tool for advanced digital content creation.
Comment: 10 pages, 7 figures
Published: 2024

37. Instrument-tissue Interaction Detection Framework for Surgical Video Understanding

Author: Lin, Wenjun, Hu, Yan, Fu, Huazhu, Yang, Mingming, Chng, Chin-Boon, Kawasaki, Ryo, Chui, Cheekong, and Liu, Jiang
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Instrument-tissue interaction detection task, which helps understand surgical activities, is vital for constructing computer-assisted surgery systems but with many challenges. Firstly, most models represent instrument-tissue interaction in a coarse-grained way which only focuses on classification and lacks the ability to automatically detect instruments and tissues. Secondly, existing works do not fully consider relations between intra- and inter-frame of instruments and tissues. In the paper, we propose to represent instrument-tissue interaction as quintuple and present an Instrument-Tissue Interaction Detection Network (ITIDNet) to detect the quintuple for surgery videos understanding. Specifically, we propose a Snippet Consecutive Feature (SCF) Layer to enhance features by modeling relationships of proposals in the current frame using global context information in the video snippet. We also propose a Spatial Corresponding Attention (SCA) Layer to incorporate features of proposals between adjacent frames through spatial encoding. To reason relationships between instruments and tissues, a Temporal Graph (TG) Layer is proposed with intra-frame connections to exploit relationships between instruments and tissues in the same frame and inter-frame connections to model the temporal information for the same instance. For evaluation, we build a cataract surgery video (PhacoQ) dataset and a cholecystectomy surgery video (CholecQ) dataset. Experimental results demonstrate the promising performance of our model, which outperforms other state-of-the-art models on both datasets.
Published: 2024

38. Medical Image Registration and Its Application in Retinal Images: A Review

Author: Nie, Qiushi, Zhang, Xiaoqing, Hu, Yan, Gong, Mingdao, and Liu, Jiang
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Medical image registration is vital for disease diagnosis and treatment with its ability to merge diverse information of images, which may be captured under different times, angles, or modalities. Although several surveys have reviewed the development of medical image registration, these surveys have not systematically summarized methodologies of existing medical image registration methods. To this end, we provide a comprehensive review of these methods from traditional and deep learning-based directions, aiming to help audiences understand the development of medical image registration quickly. In particular, we review recent advances in retinal image registration at the end of each section, which has not attracted much attention. Additionally, we also discuss the current challenges of retinal image registration and provide insights and prospects for future research.
Published: 2024

39. HyperLLaVA: Dynamic Visual and Language Expert Tuning for Multimodal Large Language Models

Author: Zhang, Wenqiao, Lin, Tianwei, Liu, Jiang, Shu, Fangxun, Li, Haoyuan, Zhang, Lei, Wanggui, He, Zhou, Hao, Lv, Zheqi, Jiang, Hao, Li, Juncheng, Tang, Siliang, and Zhuang, Yueting
Subjects: Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Computer Vision and Pattern Recognition
Abstract: Recent advancements indicate that scaling up Multimodal Large Language Models (MLLMs) effectively enhances performance on downstream multimodal tasks. The prevailing MLLM paradigm, e.g., LLaVA, transforms visual features into text-like tokens using a static vision-language mapper, thereby enabling static LLMs to develop the capability to comprehend visual information through visual instruction tuning. Although promising, the static tuning strategy (static tuning refers to a trained model with static parameters) that shares the same parameters may constrain performance across different downstream multimodal tasks. In light of this, we introduce HyperLLaVA, which involves adaptive tuning of the projector and LLM parameters, in conjunction with a dynamic visual expert and language expert, respectively. These experts are derived from HyperNetworks, which generate adaptive parameter shifts through visual and language guidance, enabling dynamic projector and LLM modeling in two-stage training. Our experiments demonstrate that our solution significantly surpasses LLaVA on existing MLLM benchmarks, including MME, MMBench, SEED-Bench, and LLaVA-Bench. Our project is available at https://github.com/DCDmllm/HyperLLaVA.
Published: 2024

40. Are Dense Labels Always Necessary for 3D Object Detection from Point Cloud?

Author: Gao, Chenqiang, Liu, Chuandong, Shu, Jun, Liu, Fangcen, Liu, Jiang, Yang, Luyu, Gao, Xinbo, and Meng, Deyu
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Current state-of-the-art (SOTA) 3D object detection methods often require a large amount of 3D bounding box annotations for training. However, collecting such large-scale densely-supervised datasets is notoriously costly. To reduce the cumbersome data annotation process, we propose a novel sparsely-annotated framework, in which we just annotate one 3D object per scene. Such a sparse annotation strategy could significantly reduce the heavy annotation burden, while inexact and incomplete sparse supervision may severely deteriorate the detection performance. To address this issue, we develop the SS3D++ method that alternatively improves 3D detector training and confident fully-annotated scene generation in a unified learning scheme. Using sparse annotations as seeds, we progressively generate confident fully-annotated scenes based on designing a missing-annotated instance mining module and reliable background mining module. Our proposed method produces competitive results when compared with SOTA weakly-supervised methods using the same or even more annotation costs. Besides, compared with SOTA fully-supervised methods, we achieve on-par or even better performance on the KITTI dataset with about 5x less annotation cost, and 90% of their performance on the Waymo dataset with about 15x less annotation cost. The additional unlabeled training scenes could further boost the performance. The code will be available at https://github.com/gaocq/SS3D2.
Published: 2024

41. Flattening Singular Values of Factorized Convolution for Medical Images

Author: Feng, Zexin, Zeng, Na, Fang, Jiansheng, Wang, Xingyue, Lu, Xiaoxi, Meng, Heng, and Liu, Jiang
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Convolutional neural networks (CNNs) have long been the paradigm of choice for robust medical image processing (MIP). Therefore, it is crucial to effectively and efficiently deploy CNNs on devices with different computing capabilities to support computer-aided diagnosis. Many methods employ factorized convolutional layers to alleviate the burden of limited computational resources at the expense of expressiveness. To this end, given weak medical image-driven CNN model optimization, a Singular value equalization generalizer-induced Factorized Convolution (SFConv) is proposed to improve the expressive power of factorized convolutions in MIP models. We first decompose the weight matrix of convolutional filters into two low-rank matrices to achieve model reduction. Then minimize the KL divergence between the two low-rank weight matrices and the uniform distribution, thereby reducing the number of singular value directions with significant variance. Extensive experiments on fundus and OCTA datasets demonstrate that our SFConv yields competitive expressiveness over vanilla convolutions while reducing complexity.
Published: 2024

42. Perovskite/silicon tandem solar cells with bilayer interface passivation

Author: Liu, Jiang, He, Yongcai, Ding, Lei, Zhang, Hua, Li, Qiaoyan, Jia, Lingbo, Yu, Jia, Lau, Ting Wai, Li, Minghui, Qin, Yuan, Gu, Xiaobing, Zhang, Fu, Li, Qibo, Yang, Ying, Zhao, Shuangshuang, Wu, Xiaoyong, Liu, Jie, Liu, Tong, Gao, Yajun, Wang, Yonglei, Dong, Xin, Chen, Hao, Li, Ping, Zhou, Tianxiang, Yang, Miao, Ru, Xiaoning, Peng, Fuguo, Yin, Shi, Qu, Minghao, Zhao, Dongming, Zhao, Zhiguo, Li, Menglei, Guo, Penghui, Yan, Hui, Xiao, Chuanxiao, Xiao, Ping, Yin, Jun, Zhang, Xiaohong, Li, Zhenguo, He, Bo, and Xu, Xixiang
Published: 2024
Full Text: View/download PDF

43. Multiple Machine Learning Identifies Key Gene PHLDA1 Suppressing NAFLD Progression

Author: Yang, Zhenwei, Chen, Zhiqin, Wang, Jingchao, Li, Yizhang, Zhang, Hailin, Xiang, Yu, Zhang, Yuwei, Shao, Zhaozhao, Wu, Pei, Lu, Ding, Lin, Huajiang, Tong, Zhaowei, Liu, Jiang, and Dong, Quan
Published: 2024
Full Text: View/download PDF

44. Prediction of the lowest energy structure of Sn(BH4)2 and its electronic properties

Author: Wu, Junlin, Ma, Li, Liu, Jiang, Chen, Hongshan, Chen, Xiaofeng, Zhang, Sa, and Wang, Xiaoxia
Published: 2024
Full Text: View/download PDF

45. Evaluation of fragility fracture risk using deep learning based on ultrasound radio frequency signal

Author: Luo, Wenqiang, Wu, Jionglin, Chen, Zhiwei, Guo, Peidong, Zhang, Qi, Lei, Baiying, Chen, Zhong, Li, Shixun, Li, Changchuan, Liu, Haoxian, Ma, Teng, Liu, Jiang, Chen, Xiaoyi, and Ding, Yue
Published: 2024
Full Text: View/download PDF

46. Pre-seismic anomaly analysis of the Turkey earthquakes on 6 February 2023 based on multi-source satellite observations

Author: Liu, Jiang, Zhang, Xuemin, Yang, Muping, Yang, Yang, He, Fuxiu, Xue, Lian, Yao, Xianliang, Yang, Xianhe, Wu, Weiwei, and Qiu, Guilan
Published: 2024
Full Text: View/download PDF

47. Synthesis of highly stable Ni nanoparticles via electrostatic self-assembly for enhanced hydrogen storage of MgH2

Author: Tang, Qin-Ke, Liu, Jiang-Chuan, Shi, Rui, Zhu, Yun-Feng, Zhang, Ji-Guang, Liu, Ya-Na, Wang, Jun, Zhang, Yao, Hu, Xiao-Hui, Liu, Zhi-Bin, and Li, Li-Quan
Published: 2024
Full Text: View/download PDF

48. ICON: Improving Inter-Report Consistency in Radiology Report Generation via Lesion-aware Mixup Augmentation

Author: Hou, Wenjun, Cheng, Yi, Xu, Kaishuai, Hu, Yan, Li, Wenjie, and Liu, Jiang
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Computation and Language
Abstract: Previous research on radiology report generation has made significant progress in terms of increasing the clinical accuracy of generated reports. In this paper, we emphasize another crucial quality that it should possess, i.e., inter-report consistency, which refers to the capability of generating consistent reports for semantically equivalent radiographs. This quality is even of greater significance than the overall report accuracy in terms of ensuring the system's credibility, as a system prone to providing conflicting results would severely erode users' trust. Regrettably, existing approaches struggle to maintain inter-report consistency, exhibiting biases towards common patterns and susceptibility to lesion variants. To address this issue, we propose ICON, which improves the inter-report consistency of radiology report generation. Aiming to enhance the system's ability to capture similarities in semantically equivalent lesions, our approach first involves extracting lesions from input images and examining their characteristics. Then, we introduce a lesion-aware mixup technique to ensure that the representations of the semantically equivalent lesions align with the same attributes, achieved through a linear combination during the training phase. Extensive experiments on three publicly available chest X-ray datasets verify the effectiveness of our approach, both in terms of improving the consistency and accuracy of the generated reports.
Published: 2024

49. Scale Optimization Using Evolutionary Reinforcement Learning for Object Detection on Drone Imagery

Author: Zhang, Jialu, Yang, Xiaoying, He, Wentao, Ren, Jianfeng, Zhang, Qian, Zhao, Titian, Bai, Ruibin, He, Xiangjian, and Liu, Jiang
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Object detection in aerial imagery presents a significant challenge due to large scale variations among objects. This paper proposes an evolutionary reinforcement learning agent, integrated within a coarse-to-fine object detection framework, to optimize the scale for more effective detection of objects in such images. Specifically, a set of patches potentially containing objects are first generated. A set of rewards measuring the localization accuracy, the accuracy of predicted labels, and the scale consistency among nearby patches are designed in the agent to guide the scale optimization. The proposed scale-consistency reward ensures similar scales for neighboring objects of the same category. Furthermore, a spatial-semantic attention mechanism is designed to exploit the spatial semantic relations between patches. The agent employs the proximal policy optimization strategy in conjunction with the evolutionary strategy, effectively utilizing both the current patch status and historical experience embedded in the agent. The proposed model is compared with state-of-the-art methods on two benchmark datasets for object detection on drone imagery. It significantly outperforms all the compared methods.
Comment: Accepted by AAAI 2024
Published: 2023

50. VSR-Net: Vessel-like Structure Rehabilitation Network with Graph Clustering

Author: Ye, Haili, Zhang, Xiaoqing, Hu, Yan, Fu, Huazhu, and Liu, Jiang
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
Abstract: The morphologies of vessel-like structures, such as blood vessels and nerve fibres, play significant roles in disease diagnosis, e.g., Parkinson's disease. Deep network-based refinement segmentation methods have recently achieved promising vessel-like structure segmentation results. There are still two challenges: (1) existing methods have limitations in rehabilitating subsection ruptures in segmented vessel-like structures; (2) they are often overconfident in predicted segmentation results. To tackle these two challenges, this paper attempts to leverage the potential of spatial interconnection relationships among subsection ruptures from the structure rehabilitation perspective. Based on this, we propose a novel Vessel-like Structure Rehabilitation Network (VSR-Net) to rehabilitate subsection ruptures and improve the model calibration based on coarse vessel-like structure segmentation results. VSR-Net first constructs subsection rupture clusters with Curvilinear Clustering Module (CCM). Then, the well-designed Curvilinear Merging Module (CMM) is applied to rehabilitate the subsection ruptures to obtain the refined vessel-like structures. Extensive experiments on five 2D/3D medical image datasets show that VSR-Net significantly outperforms state-of-the-art (SOTA) refinement segmentation methods with lower calibration error. Additionally, we provide quantitative analysis to explain the morphological difference between the rehabilitation results of VSR-Net and ground truth (GT), which is smaller than SOTA methods and GT, demonstrating that our method better rehabilitates vessel-like structures by restoring subsection ruptures.
Published: 2023
