Author: "Hu, Xiaolin" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Hu, Xiaolin"' showing total 1,809 results

51. An efficient encoder-decoder architecture with top-down attention for speech separation

Author: Li, Kai, Yang, Runxuan, and Hu, Xiaolin
Subjects: Computer Science - Sound, Computer Science - Machine Learning, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Deep neural networks have shown excellent prospects in speech separation tasks. However, obtaining good results while keeping a low model complexity remains challenging in real-world applications. In this paper, we provide a bio-inspired efficient encoder-decoder architecture by mimicking the brain's top-down attention, called TDANet, with decreased model complexity without sacrificing performance. The top-down attention in TDANet is extracted by the global attention (GA) module and the cascaded local attention (LA) layers. The GA module takes multi-scale acoustic features as input to extract global attention signal, which then modulates features of different scales by direct top-down connections. The LA layers use features of adjacent layers as input to extract the local attention signal, which is used to modulate the lateral input in a top-down manner. On three benchmark datasets, TDANet consistently achieved competitive separation performance to previous state-of-the-art (SOTA) methods with higher efficiency. Specifically, TDANet's multiply-accumulate operations (MACs) are only 5% of Sepformer, one of the previous SOTA models, and CPU inference time is only 10% of Sepformer. In addition, a large-size version of TDANet obtained SOTA results on three datasets, with MACs still only 10% of Sepformer and the CPU inference time only 24% of Sepformer.
Comment: Accepted by ICLR 2023; Code & Demos: https://cslikai.cn/project/TDANet/
Published: 2022

52. On the Privacy Effect of Data Enhancement via the Lens of Memorization

Author: Li, Xiao, Li, Qiongxiu, Hu, Zhanhao, and Hu, Xiaolin
Subjects: Computer Science - Machine Learning, Computer Science - Cryptography and Security, Computer Science - Computer Vision and Pattern Recognition
Abstract: Machine learning poses severe privacy concerns as it has been shown that the learned models can reveal sensitive information about their training data. Many works have investigated the effect of widely adopted data augmentation and adversarial training techniques, termed data enhancement in the paper, on the privacy leakage of machine learning models. Such privacy effects are often measured by membership inference attacks (MIAs), which aim to identify whether a particular example belongs to the training set or not. We propose to investigate privacy from a new perspective called memorization. Through the lens of memorization, we find that previously deployed MIAs produce misleading results as they are less likely to identify samples with higher privacy risks as members compared to samples with low privacy risks. To solve this problem, we deploy a recent attack that can capture individual samples' memorization degrees for evaluation. Through extensive experiments, we unveil several findings about the connections between three essential properties of machine learning models, including privacy, generalization gap, and adversarial robustness. We demonstrate that the generalization gap and privacy leakage are less correlated than those of the previous results. Moreover, there is not necessarily a trade-off between adversarial robustness and privacy as stronger adversarial robustness does not make the model more susceptible to privacy attacks.
Comment: Accepted by IEEE TIFS, 17 pages
Published: 2022

53. Visual Recognition by Request

Author: Tang, Chufeng, Xie, Lingxi, Zhang, Xiaopeng, Hu, Xiaolin, and Tian, Qi
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Humans have the ability of recognizing visual semantics in an unlimited granularity, but existing visual recognition algorithms cannot achieve this goal. In this paper, we establish a new paradigm named visual recognition by request (ViRReq) to bridge the gap. The key lies in decomposing visual recognition into atomic tasks named requests and leveraging a knowledge base, a hierarchical and text-based dictionary, to assist task definition. ViRReq allows for (i) learning complicated whole-part hierarchies from highly incomplete annotations and (ii) inserting new concepts with minimal efforts. We also establish a solid baseline by integrating language-driven recognition into recent semantic and instance segmentation methods, and demonstrate its flexible recognition ability on CPP and ADE20K, two datasets with hierarchical whole-part annotations.
Published: 2022

54. Active Pointly-Supervised Instance Segmentation

Author: Tang, Chufeng, Xie, Lingxi, Zhang, Gang, Zhang, Xiaopeng, Tian, Qi, and Hu, Xiaolin
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: The requirement of expensive annotations is a major burden for training a well-performed instance segmentation model. In this paper, we present an economic active learning setting, named active pointly-supervised instance segmentation (APIS), which starts with box-level annotations and iteratively samples a point within the box and asks if it falls on the object. The key of APIS is to find the most desirable points to maximize the segmentation accuracy with limited annotation budgets. We formulate this setting and propose several uncertainty-based sampling strategies. The model developed with these strategies yields consistent performance gain on the challenging MS-COCO dataset, compared against other learning strategies. The results suggest that APIS, integrating the advantages of active learning and point-based supervision, is an effective learning paradigm for label-efficient instance segmentation.
Comment: ECCV 2022
Published: 2022

55. NP-Match: When Neural Processes meet Semi-Supervised Learning

Author: Wang, Jianfeng, Lukasiewicz, Thomas, Massiceti, Daniela, Hu, Xiaolin, Pavlovic, Vladimir, and Neophytou, Alexandros
Subjects: Computer Science - Machine Learning, Computer Science - Computer Vision and Pattern Recognition
Abstract: Semi-supervised learning (SSL) has been widely explored in recent years, and it is an effective way of leveraging unlabeled data to reduce the reliance on labeled data. In this work, we adjust neural processes (NPs) to the semi-supervised image classification task, resulting in a new method named NP-Match. NP-Match is suited to this task for two reasons. Firstly, NP-Match implicitly compares data points when making predictions, and as a result, the prediction of each unlabeled data point is affected by the labeled data points that are similar to it, which improves the quality of pseudo-labels. Secondly, NP-Match is able to estimate uncertainty that can be used as a tool for selecting unlabeled samples with reliable pseudo-labels. Compared with uncertainty-based SSL methods implemented with Monte Carlo (MC) dropout, NP-Match estimates uncertainty with much less computational overhead, which can save time at both the training and the testing phases. We conducted extensive experiments on four public datasets, and NP-Match outperforms state-of-the-art (SOTA) results or achieves competitive results on them, which shows the effectiveness of NP-Match and its potential for SSL.
Comment: To appear at ICML 2022. The source codes are at https://github.com/Jianf-Wang/NP-Match
Published: 2022

56. PRMT5-mediated FUBP1 methylation accelerates prostate cancer progression

Author: Yan, Weiwei, Liu, Xun, Qiu, Xuefeng, Zhang, Xuebin, Chen, Jiahui, Xiao, Kai, Wu, Ping, Peng, Chao, Hu, Xiaolin, Wang, Zengming, Qin, Jun, Sun, Liming, Chen, Luonan, Wu, Denglong, Huang, Shengsong, Yin, Lichen, and Li, Zhenfei
Subjects: Binding proteins -- Health aspects, Medical research, Medicine, Experimental, Methylation -- Health aspects, Prostate cancer -- Physiological aspects -- Development and progression, Methyltransferases -- Health aspects
Abstract: Strategies beyond hormone-related therapy need to be developed to improve prostate cancer mortality. Here, we show that FUBP1 and its methylation were essential for prostate cancer progression, and a competitive peptide interfering with FUBP1 methylation suppressed the development of prostate cancer. FUBP1 accelerated prostate cancer development in various preclinical models. PRMT5-mediated FUBP1 methylation, regulated by BRD4, was crucial for its oncogenic effect and correlated with earlier biochemical recurrence in our patient cohort. Suppressed prostate cancer progression was observed in various genetic mouse models expressing the FUBP1 mutant deficient in PRMT5-mediated methylation. A competitive peptide, which was delivered through nanocomplexes, disrupted the interaction of FUBP1 with PRMT5, blocked FUBP1 methylation, and inhibited prostate cancer development in various preclinical models. Overall, our findings suggest that targeting FUBP1 methylation provides a potential therapeutic strategy for prostate cancer management.
Published: 2024

57. On the Use of Deep Mask Estimation Module for Neural Source Separation Systems

Author: Li, Kai, Hu, Xiaolin, and Luo, Yi
Subjects: Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Most of the recent neural source separation systems rely on a masking-based pipeline where a set of multiplicative masks are estimated from and applied to a signal representation of the input mixture. The estimation of such masks, in almost all network architectures, is done by a single layer followed by an optional nonlinear activation function. However, recent literatures have investigated the use of a deep mask estimation module and observed performance improvement compared to a shallow mask estimation module. In this paper, we analyze the role of such deeper mask estimation module by connecting it to a recently proposed unsupervised source separation method, and empirically show that the deep mask estimation module is an efficient approximation of the so-called overseparation-grouping paradigm with the conventional shallow mask estimation layers.
Comment: Accepted by Interspeech 2022
Published: 2022

58. Bridging the Gap Between Training and Inference of Bayesian Controllable Language Models

Author: Liu, Han, Wang, Bingning, Yao, Ting, Liang, Haijin, Xu, Jianjin, and Hu, Xiaolin
Subjects: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Abstract: Large-scale pre-trained language models have achieved great success on natural language generation tasks. However, it is difficult to control the pre-trained language models to generate sentences with the desired attribute such as topic and sentiment, etc. Recently, Bayesian Controllable Language Models (BCLMs) have been shown to be efficient in controllable language generation. Rather than fine-tuning the parameters of pre-trained language models, BCLMs use external discriminators to guide the generation of pre-trained language models. However, the mismatch between training and inference of BCLMs limits the performance of the models. To address the problem, in this work we propose a "Gemini Discriminator" for controllable language generation which alleviates the mismatch problem with a small computational cost. We tested our method on two controllable language generation tasks: sentiment control and topic control. On both tasks, our method achieved new state-of-the-art results in automatic and human evaluations.
Comment: Submitted to NeurIPS 2022
Published: 2022

59. Infrared Invisible Clothing: Hiding from Infrared Detectors at Multiple Angles in Real World

Author: Zhu, Xiaopei, Hu, Zhanhao, Huang, Siyuan, Li, Jianmin, and Hu, Xiaolin
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
Abstract: Thermal infrared imaging is widely used in body temperature measurement, security monitoring, and so on, but its safety research attracted attention only in recent years. We proposed the infrared adversarial clothing, which could fool infrared pedestrian detectors at different angles. We simulated the process from cloth to clothing in the digital world and then designed the adversarial "QR code" pattern. The core of our method is to design a basic pattern that can be expanded periodically, and make the pattern after random cropping and deformation still have an adversarial effect, then we can process the flat cloth with an adversarial pattern into any 3D clothes. The results showed that the optimized "QR code" pattern lowered the Average Precision (AP) of YOLOv3 by 87.7%, while the random "QR code" pattern and blank pattern lowered the AP of YOLOv3 by 57.9% and 30.1%, respectively, in the digital world. We then manufactured an adversarial shirt with a new material: aerogel. Physical-world experiments showed that the adversarial "QR code" pattern clothing lowered the AP of YOLOv3 by 64.6%, while the random "QR code" pattern clothing and fully heat-insulated clothing lowered the AP of YOLOv3 by 28.3% and 22.8%, respectively. We used the model ensemble technique to improve the attack transferability to unseen models.
Comment: Accepted by CVPR 2022, ORAL
Published: 2022

60. The effect of targeted palliative care interventions on depression, quality of life and caregiver burden in informal caregivers of advanced cancer patients: A systematic review and meta-analysis of randomized controlled trials

Author: Yan, Qianwen, Zhu, Chuanmei, Li, Linna, Li, Yunhuan, Chen, Yang, and Hu, Xiaolin
Published: 2024

61. CCRR regulate MYZAP-PKP2-Nav1.5 signaling pathway in atrial fibrillation following myocardial infarction

Author: Xuan, Lina, Guo, Jianjun, Luo, Huishan, Cui, Shijia, Sun, Feihan, Wang, Guangze, Yang, Xingmei, Li, Siyun, Zhang, Hailong, Zhang, Qingqing, Yang, Hua, Wang, Shengjie, Hu, Xiaolin, Yang, Baofeng, and Sun, Lihua
Published: 2024

62. Molybdenum triggers the bifunctional mechanism of oxygen evolution reaction of Fe34-xNi25Co25MoxB8P8 amorphous alloy with boosted catalytic activity

Author: Wu, Yong, Guo, Xiaolong, Chen, Hongguo, Xin, Yuci, Dong, Xing’an, Hu, Xiaolin, Xia, Lei, and Yu, Peng
Published: 2024

63. DHS-DETR: Efficient DETRs with dynamic head switching

Author: Chen, Hang, Tang, Chufeng, and Hu, Xiaolin
Published: 2024

64. Synthesis of Zr2ON2 via a urea-glass route to modulate the bifunctional catalytic activity of NiFe layered double hydroxide in a rechargeable zinc-air battery

Author: Hu, Xiaolin, Tian, Wenping, Wu, Zhenkun, Li, Xiang, Li, Yanhong, and Wang, Haozhi
Published: 2024

65. Precise construction of RuPt dual single-atomic sites to optimize oxygen electrocatalytic behaviors for high-performance Zn-air batteries

Author: Hu, Xiaolin, Wu, Zhenkun, and Xu, Chaohe
Published: 2024

66. Effect of Tele-exercise Interventions on Quality of Life in Cancer Patients: A Meta-analysis

Author: Chen, Xiaoli, Zhu, Chuanmei, Li, Juejin, Zhou, Lin, Zhang, Shu, Zhang, Yun, and Hu, Xiaolin
Published: 2024

67. Perceptions of Telehealth Services Among Rural Lung Cancer Patients in China: A Qualitative Study Using the Technology Acceptance Model

Author: Zhou, Lin, Li, Yunhuan, Zhang, Yun, Chen, Xiaoli, Zhang, Shu, and Hu, Xiaolin
Published: 2024

68. Multiple uniform lithium-ion transport channels in Li6.4La3Zr1.4Ta0.6O12/Ce(OH)3 modified polypropylene composite separator for high-performance lithium metal batteries

Author: Li, Bangxing, Kang, Xing, Wu, Xiaofeng, and Hu, Xiaolin
Published: 2024

69. Hybrid response dynamic multi-objective optimization algorithm based on multi-arm bandit model

Author: Hu, Xiaolin, Wu, Lingyu, Han, Mingzhang, Zhao, Xinchao, and Sang, Xinzhu
Published: 2024

70. High acid-base tolerance and long storage time lanthanum cerium co-doped carbon quantum dots for Fe3+ detection

Author: Li, Bangxing, Wu, Fei, Xie, Zhenjun, Kang, Xing, Wang, Yanghua, Li, Wei, and Hu, Xiaolin
Published: 2025

71. Analysis of thermoacoustic instability and emission behaviors of lean premixed biogas/ammonia flame

Author: Wei, Dongliang, Li, Huaan, Fang, Hao, Zhou, Hao, Li, Hui, Liu, Hongtao, Hu, Xiaolin, and Zhang, Huanxiang
Published: 2025

72. CEDNet: A cascade encoder–decoder network for dense prediction

Author: Zhang, Gang, Li, Ziyi, Tang, Chufeng, Li, Jianmin, and Hu, Xiaolin
Published: 2025

73. An STDP-Based Supervised Learning Algorithm for Spiking Neural Networks

Author: Hu, Zhanhao, Wang, Tao, and Hu, Xiaolin
Subjects: Computer Science - Neural and Evolutionary Computing, Computer Science - Artificial Intelligence
Abstract: Compared with rate-based artificial neural networks, Spiking Neural Networks (SNN) provide a more biologically plausible model for the brain. But how they perform supervised learning remains elusive. Inspired by recent works of Bengio et al., we propose a supervised learning algorithm based on Spike-Timing Dependent Plasticity (STDP) for a hierarchical SNN consisting of Leaky Integrate-and-fire (LIF) neurons. A time window is designed for the presynaptic neuron and only the spikes in this window take part in the STDP updating process. The model is trained on the MNIST dataset. The classification accuracy approaches that of a Multilayer Perceptron (MLP) with similar architecture trained by the standard back-propagation algorithm.
Published: 2022

74. Adversarial Texture for Fooling Person Detectors in the Physical World

Author: Hu, Zhanhao, Huang, Siyuan, Zhu, Xiaopei, Sun, Fuchun, Zhang, Bo, and Hu, Xiaolin
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Nowadays, cameras equipped with AI systems can capture and analyze images to detect people automatically. However, the AI system can make mistakes when receiving deliberately designed patterns in the real world, i.e., physical adversarial examples. Prior works have shown that it is possible to print adversarial patches on clothes to evade DNN-based person detectors. However, these adversarial examples could have catastrophic drops in the attack success rate when the viewing angle (i.e., the camera's angle towards the object) changes. To perform a multi-angle attack, we propose Adversarial Texture (AdvTexture). AdvTexture can cover clothes with arbitrary shapes so that people wearing such clothes can hide from person detectors from different viewing angles. We propose a generative method, named Toroidal-Cropping-based Expandable Generative Attack (TC-EGA), to craft AdvTexture with repetitive structures. We printed several pieces of cloth with AdvTexture and then made T-shirts, skirts, and dresses in the physical world. Experiments showed that these clothes could fool person detectors in the physical world.
Comment: Accepted by CVPR 2022
Published: 2022

75. The Winning Solution to the iFLYTEK Challenge 2021 Cultivated Land Extraction from High-Resolution Remote Sensing Image

Author: Zhao, Zhen, Liu, Yuqiu, Zhang, Gang, Tang, Liang, and Hu, Xiaolin
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Abstract: Extracting cultivated land accurately from high-resolution remote images is a basic task for precision agriculture. This report introduces our solution to the iFLYTEK challenge 2021 cultivated land extraction from high-resolution remote sensing image. The challenge requires segmenting cultivated land objects in very high-resolution multispectral remote sensing images. We established a highly effective and efficient pipeline to solve this problem. We first divided the original images into small tiles and separately performed instance segmentation on each tile. We explored several instance segmentation algorithms that work well on natural images and developed a set of effective methods that are applicable to remote sensing images. Then we merged the prediction results of all small tiles into seamless, continuous segmentation results through our proposed overlap-tile fusion strategy. We achieved the first place among 486 teams in the challenge.
Published: 2022

76. Hiding from infrared detectors in real world with adversarial clothes

Author: Zhu, Xiaopei, Hu, Zhanhao, Huang, Siyuan, Li, Jianmin, Hu, Xiaolin, and Wang, Zheyao
Published: 2023

77. Speech Separation Using an Asynchronous Fully Recurrent Convolutional Neural Network

Author: Hu, Xiaolin, Li, Kai, Zhang, Weiyi, Luo, Yi, Lemercier, Jean-Marie, and Gerkmann, Timo
Subjects: Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: Recent advances in the design of neural network architectures, in particular those specialized in modeling sequences, have provided significant improvements in speech separation performance. In this work, we propose to use a bio-inspired architecture called Fully Recurrent Convolutional Neural Network (FRCNN) to solve the separation task. This model contains bottom-up, top-down and lateral connections to fuse information processed at various time-scales represented by stages. In contrast to the traditional approach updating stages in parallel, we propose to first update the stages one by one in the bottom-up direction, then fuse information from adjacent stages simultaneously and finally fuse information from all stages to the bottom stage together. Experiments showed that this asynchronous updating scheme achieved significantly better results with much fewer parameters than the traditional synchronous updating scheme. In addition, the proposed model achieved good balance between speech separation accuracy and computational efficiency as compared to other state-of-the-art models on three benchmark datasets.
Comment: Accepted by NeurIPS 2021, Demo at https://cslikai.cn/project/AFRCNN
Published: 2021

78. Adjacency constraint for efficient hierarchical reinforcement learning

Author: Zhang, Tianren, Guo, Shangqi, Tan, Tian, Hu, Xiaolin, and Chen, Feng
Subjects: Computer Science - Machine Learning
Abstract: Goal-conditioned Hierarchical Reinforcement Learning (HRL) is a promising approach for scaling up reinforcement learning (RL) techniques. However, it often suffers from training inefficiency as the action space of the high-level, i.e., the goal space, is large. Searching in a large goal space poses difficulty for both high-level subgoal generation and low-level policy learning. In this paper, we show that this problem can be effectively alleviated by restricting the high-level action space from the whole goal space to a k-step adjacent region of the current state using an adjacency constraint. We theoretically prove that in a deterministic Markov Decision Process (MDP), the proposed adjacency constraint preserves the optimal hierarchical policy, while in a stochastic MDP the adjacency constraint induces a bounded state-value suboptimality determined by the MDP's transition structure. We further show that this constraint can be practically implemented by training an adjacency network that can discriminate between adjacent and non-adjacent subgoals. Experimental results on discrete and continuous control tasks including challenging simulated robot locomotion and manipulation tasks show that incorporating the adjacency constraint significantly boosts the performance of state-of-the-art goal-conditioned HRL approaches.
Comment: Accepted by IEEE Transactions on Pattern Analysis and Machine Intelligence. DOI: 10.1109/TPAMI.2022.3192418. arXiv admin note: substantial text overlap with arXiv:2006.11485
Published: 2021

79. Dietary vitamin B3 supplementation induces the antitumor immunity against liver cancer via biased GPR109A signaling in myeloid cell

Author: Yang, Yang, Pei, Tianduo, Hu, Xiaolin, Lu, Yu, Huang, Yanqiu, Wan, Tingya, Liu, Chaobao, Chen, Fengqian, Guo, Bao, Hong, Yuemei, Ba, Qian, Li, Xiaoguang, and Wang, Hui
Published: 2024

80. Targeting MDM2-p53 interaction in Glioblastoma: Transcriptomic analysis and Peptide-Based inhibition strategy

Author: Han, Manman, Kakar, Mohibullah, Li, Wei, Iqbal, Imran, Hu, Xiaolin, Liu, Yiting, Tang, Qing, Sun, Lizhu, Shakir, Yasmeen, and Liu, Tiantian
Published: 2024

81. Interface polarity brings a way to modulate magneto-optical effect of cubic Ni2+/Fe3+ co-doped Bi25FeO40 film: Implications in piezoelectricity

Author: Gao, Teng, Wu, Junying, Liang, Jinlun, Wang, Cheng, Liu, Mengli, Yang, Yanduan, Cao, Qinyu, Chen, Xin, Hu, Xiaolin, and Zhuang, Naifeng
Published: 2024

82. LncRNA CCRR Attenuates Postmyocardial Infarction Inflammatory Response by Inhibiting the TLR Signalling Pathway

Author: Wang, Shengjie, Xuan, Lina, Hu, Xiaolin, Sun, Feihan, Li, Siyun, Li, Xiufang, Yang, Hua, Guo, Jianjun, Duan, Xiaomeng, Luo, Huishan, Xin, Jieru, Chen, Jun, Hao, Junwei, Cui, Shijia, Liu, Dongping, Jiao, Lei, Zhang, Ying, Du, Zhimin, and Sun, Lihua
Published: 2024

83. RSG: A Simple but Effective Module for Learning Imbalanced Datasets

Author: Wang, Jianfeng, Lukasiewicz, Thomas, Hu, Xiaolin, Cai, Jianfei, and Xu, Zhenghua
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Imbalanced datasets widely exist in practice and are a great challenge for training deep neural models with a good generalization on infrequent classes. In this work, we propose a new rare-class sample generator (RSG) to solve this problem. RSG aims to generate some new samples for rare classes during training, and it has in particular the following advantages: (1) it is convenient to use and highly versatile, because it can be easily integrated into any kind of convolutional neural network, and it works well when combined with different loss functions, and (2) it is only used during the training phase, and therefore, no additional burden is imposed on deep neural networks during the testing phase. In extensive experimental evaluations, we verify the effectiveness of RSG. Furthermore, by leveraging RSG, we obtain competitive results on Imbalanced CIFAR and new state-of-the-art results on Places-LT, ImageNet-LT, and iNaturalist 2018. The source code is available at https://github.com/Jianf-Wang/RSG.
Comment: To appear at CVPR 2021. We propose a flexible data generation/data augmentation module for long-tailed classification. Codes are available at: https://github.com/Jianf-Wang/RSG
Published: 2021

84. Convolutional Neural Networks with Gated Recurrent Connections

Author: Wang, Jianfeng and Hu, Xiaolin
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: The convolutional neural network (CNN) has become a basic model for solving many computer vision problems. In recent years, a new class of CNNs, recurrent convolution neural network (RCNN), inspired by abundant recurrent connections in the visual systems of animals, was proposed. The critical element of RCNN is the recurrent convolutional layer (RCL), which incorporates recurrent connections between neurons in the standard convolutional layer. With increasing number of recurrent computations, the receptive fields (RFs) of neurons in RCL expand unboundedly, which is inconsistent with biological facts. We propose to modulate the RFs of neurons by introducing gates to the recurrent connections. The gates control the amount of context information inputting to the neurons and the neurons' RFs therefore become adaptive. The resulting layer is called gated recurrent convolution layer (GRCL). Multiple GRCLs constitute a deep model called gated RCNN (GRCNN). The GRCNN was evaluated on several computer vision tasks including object recognition, scene text recognition and object detection, and obtained much better results than the RCNN. In addition, when combined with other adaptive RF techniques, the GRCNN demonstrated competitive performance to the state-of-the-art models on benchmark datasets for these tasks. The codes are released at https://github.com/Jianf-Wang/GRCNN.
Comment: Accepted by TPAMI. An extension of our previous NeurIPS 2017 paper "Gated recurrent convolution neural network for OCR". We demonstrate the good performance of GRCNN on image classification and object detection. Codes are available at: https://github.com/Jianf-Wang/GRCNN
Published: 2021

85. Attack on practical speaker verification system using universal adversarial perturbations

Author: Zhang, Weiyi, Zhao, Shuning, Liu, Le, Li, Jianmin, Cheng, Xingliang, Zheng, Thomas Fang, and Hu, Xiaolin
Subjects: Computer Science - Sound, Computer Science - Artificial Intelligence, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: In authentication scenarios, applications of practical speaker verification systems usually require a person to read a dynamic authentication text. Previous studies played an audio adversarial example as a digital signal to perform physical attacks, which would be easily rejected by audio replay detection modules. This work shows that by playing our crafted adversarial perturbation as a separate source when the adversary is speaking, the practical speaker verification system will misjudge the adversary as a target speaker. A two-step algorithm is proposed to optimize the universal adversarial perturbation to be text-independent and has little effect on the authentication text recognition. We also estimated room impulse response (RIR) in the algorithm which allowed the perturbation to be effective after being played over the air. In the physical experiment, we achieved targeted attacks with success rate of 100%, while the word error rate (WER) on speech recognition was only increased by 3.55%. And recorded audios could pass replay detection for the live person speaking.
Comment: 6 pages, 2 figures
Published: 2021

86. RefineMask: Towards High-Quality Instance Segmentation with Fine-Grained Features

Author: Zhang, Gang, Lu, Xin, Tan, Jingru, Li, Jianmin, Zhang, Zhaoxiang, Li, Quanquan, and Hu, Xiaolin
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: The two-stage methods for instance segmentation, e.g. Mask R-CNN, have achieved excellent performance recently. However, the segmented masks are still very coarse due to the downsampling operations in both the feature pyramid and the instance-wise pooling process, especially for large objects. In this work, we propose a new method called RefineMask for high-quality instance segmentation of objects and scenes, which incorporates fine-grained features during the instance-wise segmenting process in a multi-stage manner. Through fusing more detailed information stage by stage, RefineMask is able to refine high-quality masks consistently. RefineMask succeeds in segmenting hard cases such as bent parts of objects that are over-smoothed by most previous methods and outputs accurate boundaries. Without bells and whistles, RefineMask yields significant gains of 2.6, 3.4, 3.8 AP over Mask R-CNN on COCO, LVIS, and Cityscapes benchmarks respectively at a small amount of additional computational cost. Furthermore, our single-model result outperforms the winner of the LVIS Challenge 2020 by 1.3 points on the LVIS test-dev set and establishes a new state-of-the-art. Code will be available at https://github.com/zhanggang001/RefineMask.
Comment: Accepted by CVPR 2021. Code is available at https://github.com/zhanggang001/RefineMask
Published: 2021

87. Look Closer to Segment Better: Boundary Patch Refinement for Instance Segmentation

Author: Tang, Chufeng, Chen, Hang, Li, Xiao, Li, Jianmin, Zhang, Zhaoxiang, and Hu, Xiaolin
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Tremendous efforts have been made on instance segmentation but the mask quality is still not satisfactory. The boundaries of predicted instance masks are usually imprecise due to the low spatial resolution of feature maps and the imbalance problem caused by the extremely low proportion of boundary pixels. To address these issues, we propose a conceptually simple yet effective post-processing refinement framework to improve the boundary quality based on the results of any instance segmentation model, termed BPR. Following the idea of looking closer to segment boundaries better, we extract and refine a series of small boundary patches along the predicted instance boundaries. The refinement is accomplished by a boundary patch refinement network at higher resolution. The proposed BPR framework yields significant improvements over the Mask R-CNN baseline on Cityscapes benchmark, especially on the boundary-aware metrics. Moreover, by applying the BPR framework to the PolyTransform + SegFix baseline, we reached 1st place on the Cityscapes leaderboard., Comment: CVPR 2021
Published: 2021

88. CloudAAE: Learning 6D Object Pose Regression with On-line Data Synthesis on Point Clouds

Author: Gao, Ge, Lauri, Mikko, Hu, Xiaolin, Zhang, Jianwei, and Frintrop, Simone
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: It is often desired to train 6D pose estimation systems on synthetic data because manual annotation is expensive. However, due to the large domain gap between the synthetic and real images, synthesizing color images is expensive. In contrast, this domain gap is considerably smaller and easier to fill for depth information. In this work, we present a system that regresses 6D object pose from depth information represented by point clouds, and a lightweight data synthesis pipeline that creates synthetic point cloud segments for training. We use an augmented autoencoder (AAE) for learning a latent code that encodes 6D object pose information for pose regression. The data synthesis pipeline only requires texture-less 3D object models and desired viewpoints, and it is cheap in terms of both time and hardware storage. Our data synthesis process is up to three orders of magnitude faster than commonly applied approaches that render RGB image data. We show the effectiveness of our system on the LineMOD, LineMOD Occlusion, and YCB Video datasets. The implementation of our system is available at: https://github.com/GeeeG/CloudAAE., Comment: Accepted to ICRA 2021
Published: 2021

89. Rethinking Natural Adversarial Examples for Classification Models

Author: Li, Xiao, Li, Jianmin, Dai, Ting, Shi, Jie, Zhu, Jun, and Hu, Xiaolin
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Recently, it was found that many real-world examples without intentional modifications can fool machine learning models, and such examples are called "natural adversarial examples". ImageNet-A is a famous dataset of natural adversarial examples. By analyzing this dataset, we hypothesized that large, cluttered and/or unusual background is an important reason why the images in this dataset are difficult to be classified. We validated the hypothesis by reducing the background influence in ImageNet-A examples with object detection techniques. Experiments showed that the object detection models with various classification models as backbones obtained much higher accuracy than their corresponding classification models. A detection model based on the classification model EfficientNet-B7 achieved a top-1 accuracy of 53.95%, surpassing previous state-of-the-art classification models trained on ImageNet, suggesting that accurate localization information can significantly boost the performance of classification models on ImageNet-A. We then manually cropped the objects in images from ImageNet-A and created a new dataset, named ImageNet-A-Plus. A human test on the new dataset showed that the deep learning-based classifiers still performed quite poorly compared with humans. Therefore, the new dataset can be used to study the robustness of classification models to the internal variance of objects without considering the background disturbance., Comment: 12 pages
Published: 2021

90. The MSR-Video to Text Dataset with Clean Annotations

Author: Chen, Haoran, Li, Jianmin, Frintrop, Simone, and Hu, Xiaolin
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning, 68T45, 68T50, I.2.10, I.2.7
Abstract: Video captioning automatically generates short descriptions of the video content, usually in the form of a single sentence. Many methods have been proposed for solving this task. A large dataset called MSR Video to Text (MSR-VTT) is often used as the benchmark dataset for testing the performance of the methods. However, we found that the human annotations, i.e., the descriptions of video contents in the dataset, are quite noisy, e.g., there are many duplicate captions and many captions contain grammatical problems. These problems may pose difficulties to video captioning models for learning underlying patterns. We cleaned the MSR-VTT annotations by removing these problems, then tested several typical video captioning models on the cleaned dataset. Experimental results showed that data cleaning boosted the performances of the models measured by popular quantitative metrics. We recruited subjects to evaluate the results of a model trained on the original and cleaned datasets. The human behavior experiment demonstrated that, when trained on the cleaned dataset, the model generated captions that were more coherent and more relevant to the contents of the video clips., Comment: The paper is under consideration at Computer Vision and Image Understanding
Published: 2021
Full Text: View/download PDF

91. Frame Difference-Based Temporal Loss for Video Stylization

Author: Xu, Jianjin, Xiong, Zheyang, and Hu, Xiaolin
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Neural style transfer models have been used to stylize an ordinary video to specific styles. To ensure temporal consistency between the frames of the stylized video, a common approach is to estimate the optical flow of the pixels in the original video and make the generated pixels match the estimated optical flow. This is achieved by minimizing an optical flow-based (OFB) loss during model training. However, optical flow estimation is itself a challenging task, particularly in complex scenes. In addition, it incurs a high computational cost. We propose a much simpler temporal loss called the frame difference-based (FDB) loss to solve the temporal inconsistency problem. It is defined as the distance between the difference between the stylized frames and the difference between the original frames. The differences between the two frames are measured in both the pixel space and the feature space specified by the convolutional neural networks. A set of human behavior experiments involving 62 subjects with 25,600 votes showed that the performance of the proposed FDB loss matched that of the OFB loss. The performance was measured by subjective evaluation of stability and stylization quality of the generated videos on two typical video stylization models. The results suggest that the proposed FDB loss is a strong alternative to the commonly used OFB loss for video stylization.
Published: 2021

92. Fooling thermal infrared pedestrian detectors in real world using small bulbs

Author: Zhu, Xiaopei, Li, Xiao, Li, Jianmin, Wang, Zheyao, and Hu, Xiaolin
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Thermal infrared detection systems play an important role in many areas such as night security, autonomous driving, and body temperature detection. They have the unique advantages of passive imaging, temperature sensitivity and penetration. But the security of these systems themselves has not been fully explored, which poses risks in applying these systems. We propose a physical attack method with small bulbs on a board against the state-of-the-art pedestrian detectors. Our goal is to make infrared pedestrian detectors unable to detect real-world pedestrians. Towards this goal, we first showed that it is possible to use two kinds of patches to attack the infrared pedestrian detector based on YOLOv3. The average precision (AP) dropped by 64.12% in the digital world, while a blank board with the same size caused the AP to drop by 29.69% only. After that, we designed and manufactured a physical board and successfully attacked YOLOv3 in the real world. In recorded videos, the physical board caused AP of the target detector to drop by 34.48%, while a blank board with the same size caused the AP to drop by 14.91% only. With the ensemble attack techniques, the designed physical board had good transferability to unseen detectors. We also proposed the first physical multispectral (infrared and visible) attack. By using a combination method, we successfully hid from the visible light and infrared object detection systems at the same time., Comment: Documents officially published by AAAI 2021, including the main text and supplementary material
Published: 2021

93. A new Ni/Bi Co-doping rare earth iron garnet crystal with high transmittance, low temperature and wavelength coefficients

Author: Yang, Yanduan, Fan, Xiaoyu, Liu, Haipeng, Wu, Junying, Yang, Yang, Chen, Xin, Hu, Xiaolin, and Zhuang, Naifeng
Published: 2024
Full Text: View/download PDF

94. Solvated Inverse vulcanisation by photopolymerisation

Author: Jia, Jinhong, Yan, Peiyao, Cai, Shanshan Diana, Cui, Yunfei, Xun, Xingwei, Liu, Jingjiang, Wang, Haoran, Dodd, Liam, Hu, Xiaolin, Lester, Daniel, Wang, Xi-Cun, Wu, Xiaofeng, Hasell, Tom, and Quan, Zheng-Jun
Published: 2024
Full Text: View/download PDF

95. Controlled three-dimensional leaf-like NiCoO2@NiCo layered double hydroxide heterostructures for oxygen evolution electrocatalysts in rechargeable Zn–air batteries

Author: Wu, Zhenkun, Hu, Xiaolin, Cai, Chengbin, Wang, Yuru, Li, Xiang, Wen, Jie, Li, Bangxing, and Gong, Hengxiang
Published: 2024
Full Text: View/download PDF

96. In situ production of bacterial nanocellulose-activated carbon composites from pear juice industry wastewater by two new Komagataeibacter intermedius and Komagataeibacter xylinus isolates for heavy metal removal

Author: Yan, Yiran, Chen, Tao, Tan, Ran, Han, Shuai, Zhang, Xinyu, Shen, Yang, Hu, Xiaolin, Zhao, Shukun, Qu, Dehui, Chen, Linxu, Wu, Nan, and Wu, Guochao
Published: 2024
Full Text: View/download PDF

97. Hiding from thermal imaging pedestrian detectors in the physical world

Author: Zhu, Xiaopei, Li, Xiao, Li, Jianmin, Wang, Zheyao, and Hu, Xiaolin
Published: 2024
Full Text: View/download PDF

98. Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection

Author: Li, Xiang, Wang, Wenhai, Hu, Xiaolin, Li, Jun, Tang, Jinhui, and Yang, Jian
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: Localization Quality Estimation (LQE) is crucial and popular in the recent advancement of dense object detectors since it can provide accurate ranking scores that benefit the Non-Maximum Suppression processing and improve detection performance. As a common practice, most existing methods predict LQE scores through vanilla convolutional features shared with object classification or bounding box regression. In this paper, we explore a completely novel and different perspective to perform LQE -- based on the learned distributions of the four parameters of the bounding box. The bounding box distributions are inspired and introduced as "General Distribution" in GFLV1, which describes the uncertainty of the predicted bounding boxes well. Such a property makes the distribution statistics of a bounding box highly correlated to its real localization quality. Specifically, a bounding box distribution with a sharp peak usually corresponds to high localization quality, and vice versa. By leveraging the close correlation between distribution statistics and the real localization quality, we develop a considerably lightweight Distribution-Guided Quality Predictor (DGQP) for reliable LQE based on GFLV1, thus producing GFLV2. To our best knowledge, it is the first attempt in object detection to use a highly relevant, statistical representation to facilitate LQE. Extensive experiments demonstrate the effectiveness of our method. Notably, GFLV2 (ResNet-101) achieves 46.2 AP at 14.6 FPS, surpassing the previous state-of-the-art ATSS baseline (43.6 AP at 14.6 FPS) by absolute 2.6 AP on COCO test-dev, without sacrificing the efficiency both in training and inference. Code will be available at https://github.com/implus/GFocalV2.
Published: 2020

99. Generating Adjacency-Constrained Subgoals in Hierarchical Reinforcement Learning

Author: Zhang, Tianren, Guo, Shangqi, Tan, Tian, Hu, Xiaolin, and Chen, Feng
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: Goal-conditioned hierarchical reinforcement learning (HRL) is a promising approach for scaling up reinforcement learning (RL) techniques. However, it often suffers from training inefficiency as the action space of the high-level, i.e., the goal space, is often large. Searching in a large goal space poses difficulties for both high-level subgoal generation and low-level policy learning. In this paper, we show that this problem can be effectively alleviated by restricting the high-level action space from the whole goal space to a $k$-step adjacent region of the current state using an adjacency constraint. We theoretically prove that the proposed adjacency constraint preserves the optimal hierarchical policy in deterministic MDPs, and show that this constraint can be practically implemented by training an adjacency network that can discriminate between adjacent and non-adjacent subgoals. Experimental results on discrete and continuous control tasks show that incorporating the adjacency constraint improves the performance of state-of-the-art HRL approaches in both deterministic and stochastic environments., Comment: Accepted by NeurIPS 2020
Published: 2020

100. Generalized Focal Loss: Learning Qualified and Distributed Bounding Boxes for Dense Object Detection

Author: Li, Xiang, Wang, Wenhai, Wu, Lijun, Chen, Shuo, Hu, Xiaolin, Li, Jun, Tang, Jinhui, and Yang, Jian
Subjects: Computer Science - Computer Vision and Pattern Recognition
Abstract: One-stage detector basically formulates object detection as dense classification and localization. The classification is usually optimized by Focal Loss and the box location is commonly learned under Dirac delta distribution. A recent trend for one-stage detectors is to introduce an individual prediction branch to estimate the quality of localization, where the predicted quality facilitates the classification to improve detection performance. This paper delves into the representations of the above three fundamental elements: quality estimation, classification and localization. Two problems are discovered in existing practices, including (1) the inconsistent usage of the quality estimation and classification between training and inference and (2) the inflexible Dirac delta distribution for localization when there is ambiguity and uncertainty in complex scenes. To address the problems, we design new representations for these elements. Specifically, we merge the quality estimation into the class prediction vector to form a joint representation of localization quality and classification, and use a vector to represent arbitrary distribution of box locations. The improved representations eliminate the inconsistency risk and accurately depict the flexible distribution in real data, but contain continuous labels, which is beyond the scope of Focal Loss. We then propose Generalized Focal Loss (GFL) that generalizes Focal Loss from its discrete form to the continuous version for successful optimization. On COCO test-dev, GFL achieves 45.0% AP using ResNet-101 backbone, surpassing state-of-the-art SAPD (43.5%) and ATSS (43.6%) with higher or comparable inference speed, under the same backbone and training settings. Notably, our best model can achieve a single-model single-scale AP of 48.2%, at 10 FPS on a single 2080Ti GPU. Code and models are available at https://github.com/implus/GFocal., Comment: 14 pages, 14 figures
Published: 2020
