Author: "Hu, Xiaolin" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Hu, Xiaolin"' showing total 1,809 results

Start Over Author "Hu, Xiaolin"

1,809 results on '"Hu, Xiaolin"'

1. Faster-GCG: Efficient Discrete Optimization Jailbreak Attacks against Aligned Large Language Models

Author: Li, Xiao, Li, Zhuhong, Li, Qiongxiu, Lee, Bingze, Cui, Jinghao, and Hu, Xiaolin
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Cryptography and Security
Abstract: Aligned Large Language Models (LLMs) have demonstrated remarkable performance across various tasks. However, LLMs remain susceptible to jailbreak adversarial attacks, where adversaries manipulate prompts to elicit malicious responses that aligned LLMs should have avoided. Identifying these vulnerabilities is crucial for understanding the inherent weaknesses of LLMs and preventing their potential misuse. One pioneering work in jailbreaking is the GCG attack, a discrete token optimization algorithm that seeks to find a suffix capable of jailbreaking aligned LLMs. Despite the success of GCG, we find it suboptimal, requiring significantly large computational costs, and the achieved jailbreaking performance is limited. In this work, we propose Faster-GCG, an efficient adversarial jailbreak method by delving deep into the design of GCG. Experiments demonstrate that Faster-GCG can surpass the original GCG with only 1/10 of the computational cost, achieving significantly higher attack success rates on various open-source aligned LLMs. In addition, We demonstrate that Faster-GCG exhibits improved attack transferability when testing on closed-sourced LLMs such as ChatGPT.
Published: 2024

2. Natural Language Induced Adversarial Images

Author: Zhu, Xiaopei, Xu, Peiyang, Zeng, Guanning, Dong, Yingpeng, and Hu, Xiaolin
Subjects: Computer Science - Cryptography and Security, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Multimedia
Abstract: Research of adversarial attacks is important for AI security because it shows the vulnerability of deep learning models and helps to build more robust models. Adversarial attacks on images are most widely studied, which include noise-based attacks, image editing-based attacks, and latent space-based attacks. However, the adversarial examples crafted by these methods often lack sufficient semantic information, making it challenging for humans to understand the failure modes of deep learning models under natural conditions. To address this limitation, we propose a natural language induced adversarial image attack method. The core idea is to leverage a text-to-image model to generate adversarial images given input prompts, which are maliciously constructed to lead to misclassification for a target model. To adopt commercial text-to-image models for synthesizing more natural adversarial images, we propose an adaptive genetic algorithm (GA) for optimizing discrete adversarial prompts without requiring gradients and an adaptive word space reduction method for improving query efficiency. We further used CLIP to maintain the semantic consistency of the generated images. In our experiments, we found that some high-frequency semantic information such as "foggy", "humid", "stretching", etc. can easily cause classifier errors. This adversarial semantic information exists not only in generated images but also in photos captured in the real world. We also found that some adversarial semantic information can be transferred to unknown classification tasks. Furthermore, our attack method can transfer to different text-to-image models (e.g., Midjourney, DALL-E 3, etc.) and image classifiers. Our code is available at: https://github.com/zxp555/Natural-Language-Induced-Adversarial-Images., Comment: Carmera-ready version. To appear in ACM MM 2024
Published: 2024

3. Theoretical Insights into Fine-Tuning Attention Mechanism: Generalization and Optimization

Author: Yao, Xinhao, Qian, Hongjin, Hu, Xiaolin, Xu, Gengze, and Liu, Yong
Subjects: Computer Science - Machine Learning
Abstract: Large Language Models (LLMs), built on Transformer architectures, exhibit remarkable generalization across a wide range of tasks. However, fine-tuning these models for specific tasks remains resource-intensive due to their extensive parameterization. In this paper, we investigate two remarkable phenomena observed during the fine-tuning of LLMs, particularly focusing on the attention mechanism: (1) Different Impact, optimizing the $\mathbf{W}_v$ matrix significantly improves performance over optimizing the $\mathbf{W}_k$ matrix. Fine-tuning only the $\mathbf{W}_q$ and $\mathbf{W}_v$ matrices is computationally efficient, delivering results that are comparable to, or even better than, fine-tuning all three matrices $\mathbf{W}_q$, $\mathbf{W}_k$, and $\mathbf{W}_v$. (2) Efficient Convergence, employing distinct learning rates for these matrices is crucial for optimal performance, with a higher learning rate for the $\mathbf{W}_v$ matrix expediting convergence. However, theoretical analyses of these phenomena are still relatively limited. We present a theoretical analysis of these phenomena from two perspectives: (i) Generalization, where we demonstrate that fine-tuning only $\mathbf{W}_q$ and $\mathbf{W}_v$ improves generalization bounds, enhances memory efficiency, and (ii) Optimization, where we emphasize that the feature learning of the attention mechanism is efficient, particularly when using distinct learning rates for the matrices, which leads to more effective fine-tuning. Building on these insights, we propose a new strategy that improves fine-tuning efficiency in terms of both storage and time. Experimental results on benchmark datasets validate the effectiveness of this approach, supporting our theoretical findings. Our analysis lays the theoretical groundwork for configuring and improving lightweight algorithms in LLMs fine-tuning.
Published: 2024

4. SonicSim: A customizable simulation platform for speech processing in moving sound source scenarios

Author: Li, Kai, Sang, Wendi, Zeng, Chang, Yang, Runxuan, Chen, Guo, and Hu, Xiaolin
Subjects: Computer Science - Sound, Computer Science - Artificial Intelligence, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: The systematic evaluation of speech separation and enhancement models under moving sound source conditions typically requires extensive data comprising diverse scenarios. However, real-world datasets often contain insufficient data to meet the training and evaluation requirements of models. Although synthetic datasets offer a larger volume of data, their acoustic simulations lack realism. Consequently, neither real-world nor synthetic datasets effectively fulfill practical needs. To address these issues, we introduce SonicSim, a synthetic toolkit de-designed to generate highly customizable data for moving sound sources. SonicSim is developed based on the embodied AI simulation platform, Habitat-sim, supporting multi-level adjustments, including scene-level, microphone-level, and source-level, thereby generating more diverse synthetic data. Leveraging SonicSim, we constructed a moving sound source benchmark dataset, SonicSet, using the Librispeech, the Freesound Dataset 50k (FSD50K) and Free Music Archive (FMA), and 90 scenes from the Matterport3D to evaluate speech separation and enhancement models. Additionally, to validate the differences between synthetic data and real-world data, we randomly selected 5 hours of raw data without reverberation from the SonicSet validation set to record a real-world speech separation dataset, which was then compared with the corresponding synthetic datasets. Similarly, we utilized the real-world speech enhancement dataset RealMAN to validate the acoustic gap between other synthetic datasets and the SonicSet dataset for speech enhancement. The results indicate that the synthetic data generated by SonicSim can effectively generalize to real-world scenarios. Demo and code are publicly available at https://cslikai.cn/SonicSim/., Comment: Technical report
Published: 2024

5. TIGER: Time-frequency Interleaved Gain Extraction and Reconstruction for Efficient Speech Separation

Author: Xu, Mohan, Li, Kai, Chen, Guo, and Hu, Xiaolin
Subjects: Computer Science - Sound, Computer Science - Artificial Intelligence, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: In recent years, much speech separation research has focused primarily on improving model performance. However, for low-latency speech processing systems, high efficiency is equally important. Therefore, we propose a speech separation model with significantly reduced parameters and computational costs: Time-frequency Interleaved Gain Extraction and Reconstruction network (TIGER). TIGER leverages prior knowledge to divide frequency bands and compresses frequency information. We employ a multi-scale selective attention module to extract contextual features, while introducing a full-frequency-frame attention module to capture both temporal and frequency contextual information. Additionally, to more realistically evaluate the performance of speech separation models in complex acoustic environments, we introduce a dataset called EchoSet. This dataset includes noise and more realistic reverberation (e.g., considering object occlusions and material properties), with speech from two speakers overlapping at random proportions. Experimental results showed that models trained on EchoSet had better generalization ability than those trained on other datasets to the data collected in the physical world, which validated the practical value of the EchoSet. On EchoSet and real-world data, TIGER significantly reduces the number of parameters by 94.3% and the MACs by 95.3% while achieving performance surpassing state-of-the-art (SOTA) model TF-GridNet. This is the first speech separation model with fewer than 1 million parameters that achieves performance comparable to the SOTA model., Comment: Technical report, demo page: https://cslikai.cn/TIGER/
Published: 2024

6. PMSS: Pretrained Matrices Skeleton Selection for LLM Fine-tuning

Author: Wang, Qibin, Hu, Xiaolin, Xu, Weikai, Liu, Wei, Luan, Jian, and Wang, Bin
Subjects: Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: Low-rank adaptation (LoRA) and its variants have recently gained much interest due to their ability to avoid excessive inference costs. However, LoRA still encounters the following challenges: (1) Limitation of low-rank assumption; and (2) Its initialization method may be suboptimal. To this end, we propose PMSS(Pre-trained Matrices Skeleton Selection), which enables high-rank updates with low costs while leveraging semantic and linguistic information inherent in pre-trained weight. It achieves this by selecting skeletons from the pre-trained weight matrix and only learning a small matrix instead. Experiments demonstrate that PMSS outperforms LoRA and other fine-tuning methods across tasks with much less trainable parameters. We demonstrate its effectiveness, especially in handling complex tasks such as DROP benchmark(+3.4%/+5.9% on LLaMA2-7B/13B) and math reasoning(+12.89%/+5.61%/+3.11% on LLaMA2-7B, Mistral-7B and Gemma-7B of GSM8K). The code and model will be released soon.
Published: 2024

7. Perfect Gradient Inversion in Federated Learning: A New Paradigm from the Hidden Subset Sum Problem

Author: Li, Qiongxiu, Luo, Lixia, Gini, Agnese, Ji, Changlong, Hu, Zhanhao, Li, Xiao, Fang, Chengfang, Shi, Jie, and Hu, Xiaolin
Subjects: Computer Science - Cryptography and Security
Abstract: Federated Learning (FL) has emerged as a popular paradigm for collaborative learning among multiple parties. It is considered privacy-friendly because local data remains on personal devices, and only intermediate parameters -- such as gradients or model updates -- are shared. Although gradient inversion is widely viewed as a common attack method in FL, analytical research on reconstructing input training samples from shared gradients remains limited and is typically confined to constrained settings like small batch sizes. In this paper, we aim to overcome these limitations by addressing the problem from a cryptographic perspective. We mathematically formulate the input reconstruction problem using the gradient information shared in FL as the Hidden Subset Sum Problem (HSSP), an extension of the well-known NP-complete Subset Sum Problem (SSP). Leveraging this formulation allows us to achieve perfect input reconstruction, thereby mitigating issues such as dependence on label diversity and underperformance with large batch sizes that hinder existing empirical gradient inversion attacks. Moreover, our analysis provides insights into why empirical input reconstruction attacks degrade with larger batch sizes. By modeling the problem as HSSP, we demonstrate that the batch size $ B $ significantly affects attack complexity, with time complexity reaching $ \mathcal{O}(B^9) $. We further show that applying secure data aggregation techniques -- such as homomorphic encryption and secure multiparty computation -- provides a strong defense by increasing the time complexity to $ \mathcal{O}(N^9 B^9) $, where $ N $ is the number of local clients in FL. To the best of our knowledge, this is the first work to rigorously analyze privacy issues in FL by modeling them as HSSP, providing a concrete analytical foundation for further exploration and development of defense strategies.
Published: 2024

8. ADBM: Adversarial diffusion bridge model for reliable adversarial purification

Author: Li, Xiao, Sun, Wenxuan, Chen, Huanran, Li, Qiongxiu, Liu, Yining, He, Yingzhe, Shi, Jie, and Hu, Xiaolin
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition
Abstract: Recently Diffusion-based Purification (DiffPure) has been recognized as an effective defense method against adversarial examples. However, we find DiffPure which directly employs the original pre-trained diffusion models for adversarial purification, to be suboptimal. This is due to an inherent trade-off between noise purification performance and data recovery quality. Additionally, the reliability of existing evaluations for DiffPure is questionable, as they rely on weak adaptive attacks. In this work, we propose a novel Adversarial Diffusion Bridge Model, termed ADBM. ADBM directly constructs a reverse bridge from the diffused adversarial data back to its original clean examples, enhancing the purification capabilities of the original diffusion models. Through theoretical analysis and experimental validation across various scenarios, ADBM has proven to be a superior and robust defense mechanism, offering significant promise for practical applications., Comment: 20 pages
Published: 2024

9. PartImageNet++ Dataset: Scaling up Part-based Models for Robust Recognition

Author: Li, Xiao, Liu, Yining, Dong, Na, Qin, Sitian, and Hu, Xiaolin
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Deep learning-based object recognition systems can be easily fooled by various adversarial perturbations. One reason for the weak robustness may be that they do not have part-based inductive bias like the human recognition process. Motivated by this, several part-based recognition models have been proposed to improve the adversarial robustness of recognition. However, due to the lack of part annotations, the effectiveness of these methods is only validated on small-scale nonstandard datasets. In this work, we propose PIN++, short for PartImageNet++, a dataset providing high-quality part segmentation annotations for all categories of ImageNet-1K (IN-1K). With these annotations, we build part-based methods directly on the standard IN-1K dataset for robust recognition. Different from previous two-stage part-based models, we propose a Multi-scale Part-supervised Model (MPM), to learn a robust representation with part annotations. Experiments show that MPM yielded better adversarial robustness on the large-scale IN-1K over strong baselines across various attack settings. Furthermore, MPM achieved improved robustness on common corruptions and several out-of-distribution datasets. The dataset, together with these results, enables and encourages researchers to explore the potential of part-based models in more real applications., Comment: Accepted by ECCV2024
Published: 2024

10. Controllable Navigation Instruction Generation with Chain of Thought Prompting

Author: Kong, Xianghao, Chen, Jinyu, Wang, Wenguan, Su, Hang, Hu, Xiaolin, Yang, Yi, and Liu, Si
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
Abstract: Instruction generation is a vital and multidisciplinary research area with broad applications. Existing instruction generation models are limited to generating instructions in a single style from a particular dataset, and the style and content of generated instructions cannot be controlled. Moreover, most existing instruction generation methods also disregard the spatial modeling of the navigation environment. Leveraging the capabilities of Large Language Models (LLMs), we propose C-Instructor, which utilizes the chain-of-thought-style prompt for style-controllable and content-controllable instruction generation. Firstly, we propose a Chain of Thought with Landmarks (CoTL) mechanism, which guides the LLM to identify key landmarks and then generate complete instructions. CoTL renders generated instructions more accessible to follow and offers greater controllability over the manipulation of landmark objects. Furthermore, we present a Spatial Topology Modeling Task to facilitate the understanding of the spatial structure of the environment. Finally, we introduce a Style-Mixed Training policy, harnessing the prior knowledge of LLMs to enable style control for instruction generation based on different prompts within a single model instance. Extensive experiments demonstrate that instructions generated by C-Instructor outperform those generated by previous methods in text metrics, navigation guidance evaluation, and user studies., Comment: ECCV 2024
Published: 2024
Full Text: View/download PDF

11. CooHOI: Learning Cooperative Human-Object Interaction with Manipulated Object Dynamics

Author: Gao, Jiawei, Wang, Ziqin, Xiao, Zeqi, Wang, Jingbo, Wang, Tai, Cao, Jinkun, Hu, Xiaolin, Liu, Si, Dai, Jifeng, and Pang, Jiangmiao
Subjects: Computer Science - Robotics, Computer Science - Artificial Intelligence
Abstract: Enabling humanoid robots to clean rooms has long been a pursued dream within humanoid research communities. However, many tasks require multi-humanoid collaboration, such as carrying large and heavy furniture together. Given the scarcity of motion capture data on multi-humanoid collaboration and the efficiency challenges associated with multi-agent learning, these tasks cannot be straightforwardly addressed using training paradigms designed for single-agent scenarios. In this paper, we introduce Cooperative Human-Object Interaction (CooHOI), a framework designed to tackle the challenge of multi-humanoid object transportation problem through a two-phase learning paradigm: individual skill learning and subsequent policy transfer. First, a single humanoid character learns to interact with objects through imitation learning from human motion priors. Then, the humanoid learns to collaborate with others by considering the shared dynamics of the manipulated object using centralized training and decentralized execution (CTDE) multi-agent RL algorithms. When one agent interacts with the object, resulting in specific object dynamics changes, the other agents learn to respond appropriately, thereby achieving implicit communication and coordination between teammates. Unlike previous approaches that relied on tracking-based methods for multi-humanoid HOI, CooHOI is inherently efficient, does not depend on motion capture data of multi-humanoid interactions, and can be seamlessly extended to include more participants and a wide range of object types., Comment: Project website: https://gao-jiawei.com/Research/CooHOI/. NeurIPS 2024 Spotlight
Published: 2024

12. Full-ECE: A Metric For Token-level Calibration on Large Language Models

Author: Liu, Han, Zhang, Yupeng, Wang, Bingning, Chen, Weipeng, and Hu, Xiaolin
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Deep Neural Networks (DNNs) excel in various domains but face challenges in providing accurate uncertainty estimates, which are crucial for high-stakes applications. Large Language Models (LLMs) have recently emerged as powerful tools, demonstrating exceptional performance in language tasks. However, traditional calibration metrics such as Expected Calibration Error (ECE) and classwise-ECE (cw-ECE) are inadequate for LLMs due to their vast vocabularies, data complexity, and distributional focus. To address this, we propose a novel calibration concept called full calibration and introduce its corresponding metric, Full-ECE. Full-ECE evaluates the entire predicted probability distribution, offering a more accurate and robust measure of calibration for LLMs.
Published: 2024

13. Enhancing In-Context Learning Performance with just SVD-Based Weight Pruning: A Theoretical Perspective

Author: Yao, Xinhao, Hu, Xiaolin, Yang, Shenzhi, and Liu, Yong
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence
Abstract: Pre-trained large language models (LLMs) based on Transformer have demonstrated striking in-context learning (ICL) abilities. With a few demonstration input-label pairs, they can predict the label for an unseen input without any parameter updates. In this paper, we show an exciting phenomenon that SVD-based weight pruning can enhance ICL performance, and more surprising, pruning weights in deep layers often results in more stable performance improvements than in shallow layers. However, the underlying mechanism of those findings still remains an open question. To reveal those findings, we conduct an in-depth theoretical analysis by presenting the implicit gradient descent (GD) trajectories of ICL and giving the mutual information based generalization bounds of ICL via full implicit GD trajectories. This helps us reasonably explain the surprising experimental findings. Besides, based on all our experimental and theoretical insights, we intuitively propose a simple, model-compression and derivative-free algorithm for downstream tasks in enhancing ICL inference. Experiments on benchmark datasets and open source LLMs display the method effectiveness\footnote{The code is available at \url{https://github.com/chen123CtrlS/EnhancingICL_SVDPruning}.}., Comment: NeurIPS 2024
Published: 2024

14. Accurate and Reliable Predictions with Mutual-Transport Ensemble

Author: Liu, Han, Cui, Peng, Wang, Bingning, Zhu, Jun, and Hu, Xiaolin
Subjects: Computer Science - Artificial Intelligence
Abstract: Deep Neural Networks (DNNs) have achieved remarkable success in a variety of tasks, especially when it comes to prediction accuracy. However, in complex real-world scenarios, particularly in safety-critical applications, high accuracy alone is not enough. Reliable uncertainty estimates are crucial. Modern DNNs, often trained with cross-entropy loss, tend to be overconfident, especially with ambiguous samples. To improve uncertainty calibration, many techniques have been developed, but they often compromise prediction accuracy. To tackle this challenge, we propose the ``mutual-transport ensemble'' (MTE). This approach introduces a co-trained auxiliary model and adaptively regularizes the cross-entropy loss using Kullback-Leibler (KL) divergence between the prediction distributions of the primary and auxiliary models. We conducted extensive studies on various benchmarks to validate the effectiveness of our method. The results show that MTE can simultaneously enhance both accuracy and uncertainty calibration. For example, on the CIFAR-100 dataset, our MTE method on ResNet34/50 achieved significant improvements compared to previous state-of-the-art method, with absolute accuracy increases of 2.4%/3.7%, relative reductions in ECE of $42.3%/29.4%, and relative reductions in classwise-ECE of 11.6%/15.3%.
Published: 2024

15. Infrared Adversarial Car Stickers

Author: Zhu, Xiaopei, Liu, Yuqiu, Hu, Zhanhao, Li, Jianmin, and Hu, Xiaolin
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Infrared physical adversarial examples are of great significance for studying the security of infrared AI systems that are widely used in our lives such as autonomous driving. Previous infrared physical attacks mainly focused on 2D infrared pedestrian detection which may not fully manifest its destructiveness to AI systems. In this work, we propose a physical attack method against infrared detectors based on 3D modeling, which is applied to a real car. The goal is to design a set of infrared adversarial stickers to make cars invisible to infrared detectors at various viewing angles, distances, and scenes. We build a 3D infrared car model with real infrared characteristics and propose an infrared adversarial pattern generation method based on 3D mesh shadow. We propose a 3D control points-based mesh smoothing algorithm and use a set of smoothness loss functions to enhance the smoothness of adversarial meshes and facilitate the sticker implementation. Besides, We designed the aluminum stickers and conducted physical experiments on two real Mercedes-Benz A200L cars. Our adversarial stickers hid the cars from Faster RCNN, an object detector, at various viewing angles, distances, and scenes. The attack success rate (ASR) was 91.49% for real cars. In comparison, the ASRs of random stickers and no sticker were only 6.21% and 0.66%, respectively. In addition, the ASRs of the designed stickers against six unseen object detectors such as YOLOv3 and Deformable DETR were between 73.35%-95.80%, showing good transferability of the attack performance across detectors., Comment: Accepted by CVPR 2024
Published: 2024

16. SPMamba: State-space model is all you need in speech separation

Author: Li, Kai, Chen, Guo, Yang, Runxuan, and Hu, Xiaolin
Subjects: Computer Science - Sound, Computer Science - Artificial Intelligence, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Existing CNN-based speech separation models face local receptive field limitations and cannot effectively capture long time dependencies. Although LSTM and Transformer-based speech separation models can avoid this problem, their high complexity makes them face the challenge of computational resources and inference efficiency when dealing with long audio. To address this challenge, we introduce an innovative speech separation method called SPMamba. This model builds upon the robust TF-GridNet architecture, replacing its traditional BLSTM modules with bidirectional Mamba modules. These modules effectively model the spatiotemporal relationships between the time and frequency dimensions, allowing SPMamba to capture long-range dependencies with linear computational complexity. Specifically, the bidirectional processing within the Mamba modules enables the model to utilize both past and future contextual information, thereby enhancing separation performance. Extensive experiments conducted on public datasets, including WSJ0-2Mix, WHAM!, and Libri2Mix, as well as the newly constructed Echo2Mix dataset, demonstrated that SPMamba significantly outperformed existing state-of-the-art models, achieving superior results while also reducing computational complexity. These findings highlighted the effectiveness of SPMamba in tackling the intricate challenges of speech separation in complex environments., Comment: Technical Report. Code is available at https://github.com/JusperLee/SPMamba
Published: 2024

17. SAFDNet: A Simple and Effective Network for Fully Sparse 3D Object Detection

Author: Zhang, Gang, Chen, Junnan, Gao, Guohuan, Li, Jianmin, Liu, Si, and Hu, Xiaolin
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: LiDAR-based 3D object detection plays an essential role in autonomous driving. Existing high-performing 3D object detectors usually build dense feature maps in the backbone network and prediction head. However, the computational costs introduced by the dense feature maps grow quadratically as the perception range increases, making these models hard to scale up to long-range detection. Some recent works have attempted to construct fully sparse detectors to solve this issue; nevertheless, the resulting models either rely on a complex multi-stage pipeline or exhibit inferior performance. In this work, we propose SAFDNet, a straightforward yet highly effective architecture, tailored for fully sparse 3D object detection. In SAFDNet, an adaptive feature diffusion strategy is designed to address the center feature missing problem. We conducted extensive experiments on Waymo Open, nuScenes, and Argoverse2 datasets. SAFDNet performed slightly better than the previous SOTA on the first two datasets but much better on the last dataset, which features long-range detection, verifying the efficacy of SAFDNet in scenarios where long-range detection is required. Notably, on Argoverse2, SAFDNet surpassed the previous best hybrid detector HEDNet by 2.6% mAP while being 2.1x faster, and yielded 2.1% mAP gains over the previous best sparse detector FSDv2 while being 1.3x faster. The code will be available at https://github.com/zhanggang001/HEDNet., Comment: Accepted by CVPR 2024 (Oral)
Published: 2024

18. TDFNet: An Efficient Audio-Visual Speech Separation Model with Top-down Fusion

Author: Pegg, Samuel, Li, Kai, and Hu, Xiaolin
Subjects: Computer Science - Sound, Computer Science - Artificial Intelligence, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Audio-visual speech separation has gained significant traction in recent years due to its potential applications in various fields such as speech recognition, diarization, scene analysis and assistive technologies. Designing a lightweight audio-visual speech separation network is important for low-latency applications, but existing methods often require higher computational costs and more parameters to achieve better separation performance. In this paper, we present an audio-visual speech separation model called Top-Down-Fusion Net (TDFNet), a state-of-the-art (SOTA) model for audio-visual speech separation, which builds upon the architecture of TDANet, an audio-only speech separation method. TDANet serves as the architectural foundation for the auditory and visual networks within TDFNet, offering an efficient model with fewer parameters. On the LRS2-2Mix dataset, TDFNet achieves a performance increase of up to 10\% across all performance metrics compared with the previous SOTA method CTCNet. Remarkably, these results are achieved using fewer parameters and only 28\% of the multiply-accumulate operations (MACs) of CTCNet. In essence, our method presents a highly effective and efficient solution to the challenges of speech separation within the audio-visual domain, making significant strides in harnessing visual information optimally.
Published: 2024
Full Text: View/download PDF

19. Real-time autonomous path planning for dynamic wildfire monitoring with uneven importance

Author: Islam, S M Towhidul and Hu, Xiaolin
Published: 2024
Full Text: View/download PDF

20. LncRNA CCRR maintains Ca2+ homeostasis against myocardial infarction through the FTO-SERCA2a pathway

Author: Yang, Hua, Xuan, Lina, Wang, Shengjie, Luo, Huishan, Duan, Xiaomeng, Guo, Jianjun, Cui, Shijia, Xin, Jieru, Hao, Junwei, Li, Xiufang, Chen, Jun, Sun, Feihan, Hu, Xiaolin, Li, Siyun, Zhang, Ying, Jiao, Lei, Yang, Baofeng, and Sun, Lihua
Published: 2024
Full Text: View/download PDF

21. Neural Retrievers are Biased Towards LLM-Generated Content

Author: Dai, Sunhao, Zhou, Yuqi, Pang, Liang, Liu, Weihao, Hu, Xiaolin, Liu, Yong, Zhang, Xiao, Wang, Gang, and Xu, Jun
Subjects: Computer Science - Information Retrieval, Computer Science - Artificial Intelligence, Computer Science - Computation and Language
Abstract: Recently, the emergence of large language models (LLMs) has revolutionized the paradigm of information retrieval (IR) applications, especially in web search, by generating vast amounts of human-like texts on the Internet. As a result, IR systems in the LLM era are facing a new challenge: the indexed documents are now not only written by human beings but also automatically generated by the LLMs. How these LLM-generated documents influence the IR systems is a pressing and still unexplored question. In this work, we conduct a quantitative evaluation of IR models in scenarios where both human-written and LLM-generated texts are involved. Surprisingly, our findings indicate that neural retrieval models tend to rank LLM-generated documents higher. We refer to this category of biases in neural retrievers towards the LLM-generated content as the \textbf{source bias}. Moreover, we discover that this bias is not confined to the first-stage neural retrievers, but extends to the second-stage neural re-rankers. Then, in-depth analyses from the perspective of text compression indicate that LLM-generated texts exhibit more focused semantics with less noise, making it easier for neural retrieval models to semantic match. To mitigate the source bias, we also propose a plug-and-play debiased constraint for the optimization objective, and experimental results show its effectiveness. Finally, we discuss the potential severe concerns stemming from the observed source bias and hope our findings can serve as a critical wake-up call to the IR community and beyond. To facilitate future explorations of IR in the LLM era, the constructed two new benchmarks are available at https://github.com/KID-22/Source-Bias., Comment: KDD 2024
Published: 2023
Full Text: View/download PDF

22. HEDNet: A Hierarchical Encoder-Decoder Network for 3D Object Detection in Point Clouds

Author: Zhang, Gang, Chen, Junnan, Gao, Guohuan, Li, Jianmin, and Hu, Xiaolin
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: 3D object detection in point clouds is important for autonomous driving systems. A primary challenge in 3D object detection stems from the sparse distribution of points within the 3D scene. Existing high-performance methods typically employ 3D sparse convolutional neural networks with small kernels to extract features. To reduce computational costs, these methods resort to submanifold sparse convolutions, which prevent the information exchange among spatially disconnected features. Some recent approaches have attempted to address this problem by introducing large-kernel convolutions or self-attention mechanisms, but they either achieve limited accuracy improvements or incur excessive computational costs. We propose HEDNet, a hierarchical encoder-decoder network for 3D object detection, which leverages encoder-decoder blocks to capture long-range dependencies among features in the spatial space, particularly for large and distant objects. We conducted extensive experiments on the Waymo Open and nuScenes datasets. HEDNet achieved superior detection accuracy on both datasets than previous state-of-the-art methods with competitive efficiency. The code is available at https://github.com/zhanggang001/HEDNet., Comment: Accepted by NeurIPS 2023
Published: 2023

23. Interaction mechanism of discharge readiness between discharge teaching and post-discharge outcomes in gynecological inpatients: a mediation analysis

Author: You, Huaxuan, Lei, Anjiang, Liu, Li, and Hu, Xiaolin
Published: 2024
Full Text: View/download PDF

24. RTFS-Net: Recurrent Time-Frequency Modelling for Efficient Audio-Visual Speech Separation

Author: Pegg, Samuel, Li, Kai, and Hu, Xiaolin
Subjects: Computer Science - Sound, Computer Science - Computer Vision and Pattern Recognition, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Audio-visual speech separation methods aim to integrate different modalities to generate high-quality separated speech, thereby enhancing the performance of downstream tasks such as speech recognition. Most existing state-of-the-art (SOTA) models operate in the time domain. However, their overly simplistic approach to modeling acoustic features often necessitates larger and more computationally intensive models in order to achieve SOTA performance. In this paper, we present a novel time-frequency domain audio-visual speech separation method: Recurrent Time-Frequency Separation Network (RTFS-Net), which applies its algorithms on the complex time-frequency bins yielded by the Short-Time Fourier Transform. We model and capture the time and frequency dimensions of the audio independently using a multi-layered RNN along each dimension. Furthermore, we introduce a unique attention-based fusion technique for the efficient integration of audio and visual information, and a new mask separation approach that takes advantage of the intrinsic spectral nature of the acoustic features for a clearer separation. RTFS-Net outperforms the prior SOTA method in both inference speed and separation quality while reducing the number of parameters by 90% and MACs by 83%. This is the first time-frequency domain audio-visual speech separation method to outperform all contemporary time-domain counterparts., Comment: Accepted by The Twelfth International Conference on Learning Representations (ICLR) 2024, see https://openreview.net/forum?id=PEuDO2EiDr
Published: 2023

25. IIANet: An Intra- and Inter-Modality Attention Network for Audio-Visual Speech Separation

Author: Li, Kai, Yang, Runxuan, Sun, Fuchun, and Hu, Xiaolin
Subjects: Computer Science - Sound, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Multimedia, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Recent research has made significant progress in designing fusion modules for audio-visual speech separation. However, they predominantly focus on multi-modal fusion at a single temporal scale of auditory and visual features without employing selective attention mechanisms, which is in sharp contrast with the brain. To address this issue, We propose a novel model called Intra- and Inter-Attention Network (IIANet), which leverages the attention mechanism for efficient audio-visual feature fusion. IIANet consists of two types of attention blocks: intra-attention (IntraA) and inter-attention (InterA) blocks, where the InterA blocks are distributed at the top, middle and bottom of IIANet. Heavily inspired by the way how human brain selectively focuses on relevant content at various temporal scales, these blocks maintain the ability to learn modality-specific features and enable the extraction of different semantics from audio-visual features. Comprehensive experiments on three standard audio-visual separation benchmarks (LRS2, LRS3, and VoxCeleb2) demonstrate the effectiveness of IIANet, outperforming previous state-of-the-art methods while maintaining comparable inference time. In particular, the fast version of IIANet (IIANet-fast) has only 7% of CTCNet's MACs and is 40% faster than CTCNet on CPUs while achieving better separation quality, showing the great potential of attention mechanism for efficient and effective multimodal fusion., Comment: 18 pages, 6 figures
Published: 2023

26. NP-SemiSeg: When Neural Processes meet Semi-Supervised Semantic Segmentation

Author: Wang, Jianfeng, Massiceti, Daniela, Hu, Xiaolin, Pavlovic, Vladimir, and Lukasiewicz, Thomas
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
Abstract: Semi-supervised semantic segmentation involves assigning pixel-wise labels to unlabeled images at training time. This is useful in a wide range of real-world applications where collecting pixel-wise labels is not feasible in time or cost. Current approaches to semi-supervised semantic segmentation work by predicting pseudo-labels for each pixel from a class-wise probability distribution output by a model. If the predicted probability distribution is incorrect, however, this leads to poor segmentation results, which can have knock-on consequences in safety critical systems, like medical images or self-driving cars. It is, therefore, important to understand what a model does not know, which is mainly achieved by uncertainty quantification. Recently, neural processes (NPs) have been explored in semi-supervised image classification, and they have been a computationally efficient and effective method for uncertainty quantification. In this work, we move one step forward by adapting NPs to semi-supervised semantic segmentation, resulting in a new model called NP-SemiSeg. We experimentally evaluated NP-SemiSeg on the public benchmarks PASCAL VOC 2012 and Cityscapes, with different training settings, and the results verify its effectiveness., Comment: Appear at ICML2023. Source codes are available at: https://github.com/Jianf-Wang/NP-SemiSeg
Published: 2023

27. Physically Realizable Natural-Looking Clothing Textures Evade Person Detectors via 3D Modeling

Author: Hu, Zhanhao, Chu, Wenda, Zhu, Xiaopei, Zhang, Hui, Zhang, Bo, and Hu, Xiaolin
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Cryptography and Security
Abstract: Recent works have proposed to craft adversarial clothes for evading person detectors, while they are either only effective at limited viewing angles or very conspicuous to humans. We aim to craft adversarial texture for clothes based on 3D modeling, an idea that has been used to craft rigid adversarial objects such as a 3D-printed turtle. Unlike rigid objects, humans and clothes are non-rigid, leading to difficulties in physical realization. In order to craft natural-looking adversarial clothes that can evade person detectors at multiple viewing angles, we propose adversarial camouflage textures (AdvCaT) that resemble one kind of the typical textures of daily clothes, camouflage textures. We leverage the Voronoi diagram and Gumbel-softmax trick to parameterize the camouflage textures and optimize the parameters via 3D modeling. Moreover, we propose an efficient augmentation pipeline on 3D meshes combining topologically plausible projection (TopoProj) and Thin Plate Spline (TPS) to narrow the gap between digital and real-world objects. We printed the developed 3D texture pieces on fabric materials and tailored them into T-shirts and trousers. Experiments show high attack success rates of these clothes against multiple detectors., Comment: Accepted by CVPR 2023
Published: 2023

28. The development of an ALE finite element and discontinuous Galerkin method for the non-isothermal non-Newtonian FSI problem

Author: Gao, Puyang and Hu, Xiaolin
Published: 2024
Full Text: View/download PDF

29. A promising composite room temperature solid electrolyte via incorporating LLZTO into cross-linked ETPTA/PEO/SN matrix for all solid state lithium batteries

Author: Li, Bangxing, Yi, Xianlin, Xie, Zhenjun, Wu, Fei, Kang, Xing, Kang, Shuai, and Hu, Xiaolin
Published: 2024
Full Text: View/download PDF

30. Audio-Visual Speech Separation in Noisy Environments with a Lightweight Iterative Model

Author: Martel, Héctor, Richter, Julius, Li, Kai, Hu, Xiaolin, and Gerkmann, Timo
Subjects: Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Machine Learning, Computer Science - Sound
Abstract: We propose Audio-Visual Lightweight ITerative model (AVLIT), an effective and lightweight neural network that uses Progressive Learning (PL) to perform audio-visual speech separation in noisy environments. To this end, we adopt the Asynchronous Fully Recurrent Convolutional Neural Network (A-FRCNN), which has shown successful results in audio-only speech separation. Our architecture consists of an audio branch and a video branch, with iterative A-FRCNN blocks sharing weights for each modality. We evaluated our model in a controlled environment using the NTCD-TIMIT dataset and in-the-wild using a synthetic dataset that combines LRS3 and WHAM!. The experiments demonstrate the superiority of our model in both settings with respect to various audio-only and audio-visual baselines. Furthermore, the reduced footprint of our model makes it suitable for low resource applications., Comment: Accepted by Interspeech 2023
Published: 2023

31. Bridging the Sim-to-Real Gap from the Information Bottleneck Perspective

Author: He, Haoran, Wu, Peilin, Bai, Chenjia, Lai, Hang, Wang, Lingxiao, Pan, Ling, Hu, Xiaolin, and Zhang, Weinan
Subjects: Computer Science - Machine Learning, Computer Science - Robotics
Abstract: Reinforcement Learning (RL) has recently achieved remarkable success in robotic control. However, most works in RL operate in simulated environments where privileged knowledge (e.g., dynamics, surroundings, terrains) is readily available. Conversely, in real-world scenarios, robot agents usually rely solely on local states (e.g., proprioceptive feedback of robot joints) to select actions, leading to a significant sim-to-real gap. Existing methods address this gap by either gradually reducing the reliance on privileged knowledge or performing a two-stage policy imitation. However, we argue that these methods are limited in their ability to fully leverage the available privileged knowledge, resulting in suboptimal performance. In this paper, we formulate the sim-to-real gap as an information bottleneck problem and therefore propose a novel privileged knowledge distillation method called the Historical Information Bottleneck (HIB). In particular, HIB learns a privileged knowledge representation from historical trajectories by capturing the underlying changeable dynamic information. Theoretical analysis shows that the learned privileged knowledge representation helps reduce the value discrepancy between the oracle and learned policies. Empirical experiments on both simulated and real-world tasks demonstrate that HIB yields improved generalizability compared to previous methods. Videos of real-world experiments are available at https://sites.google.com/view/history-ib ., Comment: Accepted by CoRL 2024
Published: 2023

32. Amplification trojan network: Attack deep neural networks by amplifying their inherent weakness

Author: Hu, Zhanhao, Zhu, Jun, Zhang, Bo, and Hu, Xiaolin
Subjects: Computer Science - Cryptography and Security, Computer Science - Artificial Intelligence
Abstract: Recent works found that deep neural networks (DNNs) can be fooled by adversarial examples, which are crafted by adding adversarial noise on clean inputs. The accuracy of DNNs on adversarial examples will decrease as the magnitude of the adversarial noise increase. In this study, we show that DNNs can be also fooled when the noise is very small under certain circumstances. This new type of attack is called Amplification Trojan Attack (ATAttack). Specifically, we use a trojan network to transform the inputs before sending them to the target DNN. This trojan network serves as an amplifier to amplify the inherent weakness of the target DNN. The target DNN, which is infected by the trojan network, performs normally on clean data while being more vulnerable to adversarial examples. Since it only transforms the inputs, the trojan network can hide in DNN-based pipelines, e.g. by infecting the pre-processing procedure of the inputs before sending them to the DNNs. This new type of threat should be considered in developing safe DNNs., Comment: Published Sep 2022 in Neurocomputing
Published: 2023
Full Text: View/download PDF

33. On the Importance of Backbone to the Adversarial Robustness of Object Detectors

Author: Li, Xiao, Chen, Hang, and Hu, Xiaolin
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Cryptography and Security, Computer Science - Machine Learning
Abstract: Object detection is a critical component of various security-sensitive applications, such as autonomous driving and video surveillance. However, existing deep learning-based object detectors are vulnerable to adversarial attacks, which poses a significant challenge to their reliability and safety. Through experiments, we found that existing works on improving the adversarial robustness of object detectors have given a false sense of security. We argue that using adversarially pre-trained backbone networks is essential for enhancing the adversarial robustness of object detectors. We propose a simple yet effective recipe for fast adversarial fine-tuning on object detectors with adversarially pre-trained backbones. Without any modifications to the structure of object detectors, our recipe achieved significantly better adversarial robustness than previous works. Moreover, we explore the potential of different modern object detectors to improve adversarial robustness using our recipe and demonstrate several interesting findings. Our empirical results set a new milestone and deepen the understanding of adversarially robust object detection. Code and trained checkpoints will be publicly available., Comment: 12 pages
Published: 2023

34. Meta Semantics: Towards better natural language understanding and reasoning

Author: Hu, Xiaolin
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, 03B65(Primary) 68T50(Secondary), I.2.4, I.2.7
Abstract: Natural language understanding is one of the most challenging topics in artificial intelligence. Deep neural network methods, particularly large language module (LLM) methods such as ChatGPT and GPT-3, have powerful flexibility to adopt informal text but are weak on logical deduction and suffer from the out-of-vocabulary (OOV) problem. On the other hand, rule-based methods such as Mathematica, Semantic web, and Lean, are excellent in reasoning but cannot handle the complex and changeable informal text. Inspired by pragmatics and structuralism, we propose two strategies to solve the OOV problem and a semantic model for better natural language understanding and reasoning., Comment: 10 pages, 8 figures, 2 tables
Published: 2023

35. The development of semi-implicit partitioned ALE finite element/discontinuous Galerkin method for the non-Newtonian fluid structure interaction(非牛顿流固耦合问题的半隐式分区ALE有限元-间断有限元耦合算法研究)

Author: 高普阳(GAO Puyang) and 胡小林(HU Xiaolin)
Subjects: fluid structure interaction (fsi)(流固耦合), non-newtonian fluid(非牛顿流体), power law model(幂律模型), semi-implicit(半隐式), finite element(有限元), Electronic computers. Computer science, QA75.5-76.95, Physics, QC1-999
Abstract: In this paper, we develop the semi-implicit partitioned finite element/discontinuous Galerkin method for the non-Newtonian fluid structure interaction (NFSI) problem within the arbitrary Lagrangian-Eulerian (ALE) framework. The whole mathematical model of this problem involves the governing equations of the non-Newtonian fluid, the solid structure and the boundary conditions on the contacting interface. The structure is composed of elastic solid material and the rheological behavior of non-Newtonian fluid is described according to the power law constitutive equation. The governing system of non-Newtonian fluid is split into several sub-equations and the finite element and discontinuous Galerkin method are employed to solve the appropriate type equations. As for the governing equation of structure, the standard finite element method is chosen to deal with it. In addition, the modified Laplace moving mesh technique is utilized to handle the deformation of the structure and update the interface between the fluid and structure. The problem involving a flexible trapezoidal structure fixed on the bottom of a rectangular tank full of non-Newtonian fluid is investigated. The influences of the height of the structure, the fluid inlet velocity and the behavior of the fluid on the FSI problem are all analyzed.(针对非牛顿流固耦合问题，提出了任意拉格朗日-欧拉（arbitrary Lagrangian-Eulerian，ALE）框架下的半隐式分区有限元-间断有限元耦合算法。其数学模型主要包括非牛顿流体和固体结构的控制方程，以及流体-固体交界面上的边界条件。固体结构由弹性材料组成，非牛顿流体由幂律型本构模型描述。用分裂格式对非牛顿流体控制方程进行解耦，得到若干子方程；通过有限元、间断有限元方法求解对应恰当类型的子方程。用中心差分及标准有限元方法分别对固体结构弹性动力学方程进行时间和空间离散；用修正的Laplace移动网格方法处理固体结构的形变及流体区域网格的变化过程。最后用数值算法研究了带有圆弧顶端梯形结构体的非牛顿流固耦合问题，并详细分析了结构体高度、入口速度及流体特性等因素对流固耦合问题的影响及机理。)
Published: 2024
Full Text: View/download PDF

36. The Efficacy of Virtual Reality–Based Interventions on Pain, Anxiety, Depression, and Quality of Life Among Patients With Cancer: A Meta-analysis of Randomized Controlled Trials

Author: Chen, Yang, Chen, Xiaoli, Li, Linna, Li, Yunhuan, Yan, Qianwen, and Hu, Xiaolin
Published: 2024
Full Text: View/download PDF

37. MIXPGD: Hybrid Adversarial Training for Speech Recognition Systems

Author: Huq, Aminul, Zhang, Weiyi, and Hu, Xiaolin
Subjects: Computer Science - Sound, Computer Science - Computation and Language, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Automatic speech recognition (ASR) systems based on deep neural networks are weak against adversarial perturbations. We propose mixPGD adversarial training method to improve the robustness of the model for ASR systems. In standard adversarial training, adversarial samples are generated by leveraging supervised or unsupervised methods. We merge the capabilities of both supervised and unsupervised approaches in our method to generate new adversarial samples which aid in improving model robustness. Extensive experiments and comparison across various state-of-the-art defense methods and adversarial attacks have been performed to show that mixPGD gains 4.1% WER of better performance than previous best performing models under white-box adversarial attack setting. We tested our proposed defense method against both white-box and transfer based black-box attack settings to ensure that our defense strategy is robust against various types of attacks. Empirical results on several adversarial attacks validate the effectiveness of our proposed approach.
Published: 2023

38. CEDNet: A Cascade Encoder-Decoder Network for Dense Prediction

Author: Zhang, Gang, Li, Ziyi, Tang, Chufeng, Li, Jianmin, and Hu, Xiaolin
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Multi-scale features are essential for dense prediction tasks, such as object detection, instance segmentation, and semantic segmentation. The prevailing methods usually utilize a classification backbone to extract multi-scale features and then fuse these features using a lightweight module (e.g., the fusion module in FPN and BiFPN, two typical object detection methods). However, as these methods allocate most computational resources to the classification backbone, the multi-scale feature fusion in these methods is delayed, which may lead to inadequate feature fusion. While some methods perform feature fusion from early stages, they either fail to fully leverage high-level features to guide low-level feature learning or have complex structures, resulting in sub-optimal performance. We propose a streamlined cascade encoder-decoder network, dubbed CEDNet, tailored for dense \mbox{prediction} tasks. All stages in CEDNet share the same encoder-decoder structure and perform multi-scale feature fusion within the decoder. A hallmark of CEDNet is its ability to incorporate high-level features from early stages to guide low-level feature learning in subsequent stages, thereby enhancing the effectiveness of multi-scale feature fusion. We explored three well-known encoder-decoder structures: Hourglass, UNet, and FPN. When integrated into CEDNet, they performed much better than traditional methods that use a pre-designed classification backbone combined with a lightweight fusion module. Extensive experiments on object detection, instance segmentation, and semantic segmentation demonstrated the effectiveness of our method. The code is available at https://github.com/zhanggang001/CEDNet., Comment: Technical report; Code: https://github.com/zhanggang001/CEDNet
Published: 2023

39. NP-Match: Towards a New Probabilistic Model for Semi-Supervised Learning

Author: Wang, Jianfeng, Hu, Xiaolin, and Lukasiewicz, Thomas
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
Abstract: Semi-supervised learning (SSL) has been widely explored in recent years, and it is an effective way of leveraging unlabeled data to reduce the reliance on labeled data. In this work, we adjust neural processes (NPs) to the semi-supervised image classification task, resulting in a new method named NP-Match. NP-Match is suited to this task for two reasons. Firstly, NP-Match implicitly compares data points when making predictions, and as a result, the prediction of each unlabeled data point is affected by the labeled data points that are similar to it, which improves the quality of pseudo-labels. Secondly, NP-Match is able to estimate uncertainty that can be used as a tool for selecting unlabeled samples with reliable pseudo-labels. Compared with uncertainty-based SSL methods implemented with Monte-Carlo (MC) dropout, NP-Match estimates uncertainty with much less computational overhead, which can save time at both the training and the testing phases. We conducted extensive experiments on five public datasets under three semi-supervised image classification settings, namely, the standard semi-supervised image classification, the imbalanced semi-supervised image classification, and the multi-label semi-supervised image classification, and NP-Match outperforms state-of-the-art (SOTA) approaches or achieves competitive results on them, which shows the effectiveness of NP-Match and its potential for SSL. The codes are at https://github.com/Jianf-Wang/NP-Match, Comment: An extended version of our previous ICML 2022 paper arXiv:2207.01066 with more experiments
Published: 2023

40. Language-Driven Anchors for Zero-Shot Adversarial Robustness

Author: Li, Xiao, Zhang, Wei, Liu, Yining, Hu, Zhanhao, Zhang, Bo, and Hu, Xiaolin
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Deep Neural Networks (DNNs) are known to be susceptible to adversarial attacks. Previous researches mainly focus on improving adversarial robustness in the fully supervised setting, leaving the challenging domain of zero-shot adversarial robustness an open question. In this work, we investigate this domain by leveraging the recent advances in large vision-language models, such as CLIP, to introduce zero-shot adversarial robustness to DNNs. We propose LAAT, a Language-driven, Anchor-based Adversarial Training strategy. LAAT utilizes the features of a text encoder for each category as fixed anchors (normalized feature embeddings) for each category, which are then employed for adversarial training. By leveraging the semantic consistency of the text encoders, LAAT aims to enhance the adversarial robustness of the image model on novel categories. However, naively using text encoders leads to poor results. Through analysis, we identified the issue to be the high cosine similarity between text encoders. We then design an expansion algorithm and an alignment cross-entropy loss to alleviate the problem. Our experimental results demonstrated that LAAT significantly improves zero-shot adversarial robustness over state-of-the-art methods. LAAT has the potential to enhance adversarial robustness by large-scale multimodal models, especially when labeled data is unavailable during training., Comment: Accepted by CVPR 2024
Published: 2023

41. Functional Knowledge Graph Towards Knowledge Application and Data Management for General Users

Author: Hu, Xiaolin, Goos, Gerhard, Series Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Zhang, Wenjie, editor, Tung, Anthony, editor, Zheng, Zhonglong, editor, Yang, Zhengyi, editor, Wang, Xiaoyang, editor, and Guo, Hongjie, editor
Published: 2024
Full Text: View/download PDF

42. nuScenesComplex: A More Rigorous Evaluation Framework for End-to-End Autonomous Driving Planning

Author: Nguyen, Dung, Zhang, Gang, Pan, Hujie, Hu, Xiaolin, Hartmanis, Juris, Founding Editor, van Leeuwen, Jan, Series Editor, Hutchison, David, Editorial Board Member, Kanade, Takeo, Editorial Board Member, Kittler, Josef, Editorial Board Member, Kleinberg, Jon M., Editorial Board Member, Kobsa, Alfred, Series Editor, Mattern, Friedemann, Editorial Board Member, Mitchell, John C., Editorial Board Member, Naor, Moni, Editorial Board Member, Nierstrasz, Oscar, Series Editor, Pandu Rangan, C., Editorial Board Member, Sudan, Madhu, Series Editor, Terzopoulos, Demetri, Editorial Board Member, Tygar, Doug, Editorial Board Member, Weikum, Gerhard, Series Editor, Vardi, Moshe Y, Series Editor, Goos, Gerhard, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Le, Xinyi, editor, and Zhang, Zhijun, editor
Published: 2024
Full Text: View/download PDF

43. DCFNet: Dense Complementary Fusion for RGB-Thermal Urban Scene Perception

Author: Zhang, Yu-Wen Michael, Zhang, Gang, Hu, Xiaolin, Hartmanis, Juris, Founding Editor, van Leeuwen, Jan, Series Editor, Hutchison, David, Editorial Board Member, Kanade, Takeo, Editorial Board Member, Kittler, Josef, Editorial Board Member, Kleinberg, Jon M., Editorial Board Member, Kobsa, Alfred, Series Editor, Mattern, Friedemann, Editorial Board Member, Mitchell, John C., Editorial Board Member, Naor, Moni, Editorial Board Member, Nierstrasz, Oscar, Series Editor, Pandu Rangan, C., Editorial Board Member, Sudan, Madhu, Series Editor, Terzopoulos, Demetri, Editorial Board Member, Tygar, Doug, Editorial Board Member, Weikum, Gerhard, Series Editor, Vardi, Moshe Y, Series Editor, Goos, Gerhard, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Le, Xinyi, editor, and Zhang, Zhijun, editor
Published: 2024
Full Text: View/download PDF

44. Improve Adversarial Robustness of MNIST Classification via Topological Data Analysis

Author: Liu, Yining, Li, Xiao, Qin, Sitian, Hu, Xiaolin, Hartmanis, Juris, Founding Editor, van Leeuwen, Jan, Series Editor, Hutchison, David, Editorial Board Member, Kanade, Takeo, Editorial Board Member, Kittler, Josef, Editorial Board Member, Kleinberg, Jon M., Editorial Board Member, Kobsa, Alfred, Series Editor, Mattern, Friedemann, Editorial Board Member, Mitchell, John C., Editorial Board Member, Naor, Moni, Editorial Board Member, Nierstrasz, Oscar, Series Editor, Pandu Rangan, C., Editorial Board Member, Sudan, Madhu, Series Editor, Terzopoulos, Demetri, Editorial Board Member, Tygar, Doug, Editorial Board Member, Weikum, Gerhard, Series Editor, Vardi, Moshe Y, Series Editor, Goos, Gerhard, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Le, Xinyi, editor, and Zhang, Zhijun, editor
Published: 2024
Full Text: View/download PDF

45. CSFuser: A Cascade Siamese Fusion Architecture for RGB-Infrared Object Detection

Author: Li, Ziyi, Zhang, Gang, Zeng, Zhigang, Hu, Xiaolin, Hartmanis, Juris, Founding Editor, van Leeuwen, Jan, Series Editor, Hutchison, David, Editorial Board Member, Kanade, Takeo, Editorial Board Member, Kittler, Josef, Editorial Board Member, Kleinberg, Jon M., Editorial Board Member, Kobsa, Alfred, Series Editor, Mattern, Friedemann, Editorial Board Member, Mitchell, John C., Editorial Board Member, Naor, Moni, Editorial Board Member, Nierstrasz, Oscar, Series Editor, Pandu Rangan, C., Editorial Board Member, Sudan, Madhu, Series Editor, Terzopoulos, Demetri, Editorial Board Member, Tygar, Doug, Editorial Board Member, Weikum, Gerhard, Series Editor, Vardi, Moshe Y, Series Editor, Goos, Gerhard, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Le, Xinyi, editor, and Zhang, Zhijun, editor
Published: 2024
Full Text: View/download PDF

46. PhonHuBERT: A Phoneme Transcription Tool for Song Datasets

Author: Prat, Amaury, Yang, Runxuan, Hu, Xiaolin, Hartmanis, Juris, Founding Editor, van Leeuwen, Jan, Series Editor, Hutchison, David, Editorial Board Member, Kanade, Takeo, Editorial Board Member, Kittler, Josef, Editorial Board Member, Kleinberg, Jon M., Editorial Board Member, Kobsa, Alfred, Series Editor, Mattern, Friedemann, Editorial Board Member, Mitchell, John C., Editorial Board Member, Naor, Moni, Editorial Board Member, Nierstrasz, Oscar, Series Editor, Pandu Rangan, C., Editorial Board Member, Sudan, Madhu, Series Editor, Terzopoulos, Demetri, Editorial Board Member, Tygar, Doug, Editorial Board Member, Weikum, Gerhard, Series Editor, Vardi, Moshe Y, Series Editor, Goos, Gerhard, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Le, Xinyi, editor, and Zhang, Zhijun, editor
Published: 2024
Full Text: View/download PDF

47. An Audio-Visual Speech Separation Model Inspired by Cortico-Thalamo-Cortical Circuits

Author: Li, Kai, Xie, Fenghua, Chen, Hang, Yuan, Kexin, and Hu, Xiaolin
Subjects: Computer Science - Sound, Computer Science - Computer Vision and Pattern Recognition
Abstract: Audio-visual approaches involving visual inputs have laid the foundation for recent progress in speech separation. However, the optimization of the concurrent usage of auditory and visual inputs is still an active research area. Inspired by the cortico-thalamo-cortical circuit, in which the sensory processing mechanisms of different modalities modulate one another via the non-lemniscal sensory thalamus, we propose a novel cortico-thalamo-cortical neural network (CTCNet) for audio-visual speech separation (AVSS). First, the CTCNet learns hierarchical auditory and visual representations in a bottom-up manner in separate auditory and visual subnetworks, mimicking the functions of the auditory and visual cortical areas. Then, inspired by the large number of connections between cortical regions and the thalamus, the model fuses the auditory and visual information in a thalamic subnetwork through top-down connections. Finally, the model transmits this fused information back to the auditory and visual subnetworks, and the above process is repeated several times. The results of experiments on three speech separation benchmark datasets show that CTCNet remarkably outperforms existing AVSS methods with considerably fewer parameters. These results suggest that mimicking the anatomical connectome of the mammalian brain has great potential for advancing the development of deep neural networks. Project repo is https://github.com/JusperLee/CTCNet., Comment: Accepted by TPAMI 2024
Published: 2022

48. Recognizing Object by Components with Human Prior Knowledge Enhances Adversarial Robustness of Deep Neural Networks

Author: Li, Xiao, Wang, Ziqi, Zhang, Bo, Sun, Fuchun, and Hu, Xiaolin
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Adversarial attacks can easily fool object recognition systems based on deep neural networks (DNNs). Although many defense methods have been proposed in recent years, most of them can still be adaptively evaded. One reason for the weak adversarial robustness may be that DNNs are only supervised by category labels and do not have part-based inductive bias like the recognition process of humans. Inspired by a well-known theory in cognitive psychology -- recognition-by-components, we propose a novel object recognition model ROCK (Recognizing Object by Components with human prior Knowledge). It first segments parts of objects from images, then scores part segmentation results with predefined human prior knowledge, and finally outputs prediction based on the scores. The first stage of ROCK corresponds to the process of decomposing objects into parts in human vision. The second stage corresponds to the decision process of the human brain. ROCK shows better robustness than classical recognition models across various attack settings. These results encourage researchers to rethink the rationality of currently widely-used DNN-based object recognition models and explore the potential of part-based models, once important but recently ignored, for improving robustness., Comment: Under review. Submitted to TPAMI on June 10, 2022. Major revision on September 4, 2022
Published: 2022

49. Extracting Semantic Knowledge from GANs with Unsupervised Learning

Author: Xu, Jianjin, Zhang, Zhaoxiang, and Hu, Xiaolin
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Recently, unsupervised learning has made impressive progress on various tasks. Despite the dominance of discriminative models, increasing attention is drawn to representations learned by generative models and in particular, Generative Adversarial Networks (GANs). Previous works on the interpretation of GANs reveal that GANs encode semantics in feature maps in a linearly separable form. In this work, we further find that GAN's features can be well clustered with the linear separability assumption. We propose a novel clustering algorithm, named KLiSH, which leverages the linear separability to cluster GAN's features. KLiSH succeeds in extracting fine-grained semantics of GANs trained on datasets of various objects, e.g., car, portrait, animals, and so on. With KLiSH, we can sample images from GANs along with their segmentation masks and synthesize paired image-segmentation datasets. Using the synthesized datasets, we enable two downstream applications. First, we train semantic segmentation networks on these datasets and test them on real images, realizing unsupervised semantic segmentation. Second, we train image-to-image translation networks on the synthesized datasets, enabling semantic-conditional image synthesis without human annotations.
Published: 2022

50. Drug repositioning for Alzheimer's disease with transfer learning

Author: Wu, Yetao, Liu, Han, Yan, Jie, and Hu, Xiaolin
Subjects: Quantitative Biology - Quantitative Methods
Abstract: Deep Learning and DRUG-seq (Digital RNA with perturbation of genes) have attracted attention in drug discovery. However, the public DRUG-seq dataset is too small to be used for directly training a deep learning neural network from scratch. Inspired by the transfer learning technique, we pretrain a drug efficacy prediction neural network model with the Library of Integrated Network-based Cell-Signature (LINCS) L1000 data and then use human neural cell DRUG-seq data to fine-tune it. After training, the model is used for virtual screening to find potential drugs for Alzheimer's disease (AD) treatment. Finally, we find 27 potential drugs for AD treatment including Irsogladine (PDE4 inhibitor), Tasquinimod (HDAC4 selective inhibitor), Suprofen (dual COX-1/COX-2 inhibitor) et al., Comment: 13 pages, 1 figure
Published: 2022

Catalog

Books, media, physical & digital resources

See catalog results

Searchworks

Select search scope, currently: Articles Catalog books, media & more in Jio Institute collections Articles journal articles & other e-resources

Search

Search Constraints

Refine your results

Search Limiters

Topic

Publication Year Range

Language

Publication Type

Journal

Region

Database

Publisher

1,809 results on '"Hu, Xiaolin"'

Search Results

Catalog

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources