1,059 results on '"Zhang, Xiao Lei"'
Search Results
2. Eliminating Quantization Errors in Classification-Based Sound Source Localization
- Author
- Feng, Linfeng, Zhang, Xiao-Lei, and Li, Xuelong
- Subjects
- Electrical Engineering and Systems Science - Audio and Speech Processing
- Abstract
Sound Source Localization (SSL) involves estimating the Direction of Arrival (DOA) of sound sources. Since the DOA estimation output space is continuous, regression might be more suitable for DOA, offering higher precision. However, in practice, classification often outperforms regression, exhibiting greater robustness to interference. The drawback of classification, however, is its inherent quantization error. Within the classification paradigm, the DOA output space is discretized into intervals, each treated as a class. These classes show strong inter-class correlations, being inherently ordered, with higher similarity as intervals grow closer. Nevertheless, this correlation has not been fully exploited. To address this, we propose an Unbiased Label Distribution (ULD) to eliminate the quantization error in training targets. Furthermore, we tailor two loss functions for the soft label family: Negative Log Absolute Error (NLAE) and Mean Squared Error without activation (MSE(wo)). Finally, we introduce Weighted Adjacent Decoding (WAD) to overcome the quantization error during model prediction decoding. Experimental results demonstrate that our approach surpasses the quantization limits of classification, achieving state-of-the-art performance. Our code and supplementary materials are available at https://github.com/linfeng-feng/ULD., Comment: 12 pages
- Published
- 2023
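The label-construction and decoding ideas summarized in this abstract can be sketched in a few lines of pure Python. This is a hedged illustration, not the paper's exact formulation: the function names, the uniform DOA grid, and the one-bin decoding radius are all assumptions made for the example.

```python
def unbiased_label_distribution(doa, grid):
    """Spread probability mass over the two grid classes adjacent to the
    true DOA so that the label's expectation equals the DOA exactly,
    removing the quantization error of a one-hot target."""
    label = [0.0] * len(grid)
    if doa <= grid[0]:
        label[0] = 1.0
        return label
    if doa >= grid[-1]:
        label[-1] = 1.0
        return label
    for i in range(len(grid) - 1):
        if grid[i] <= doa <= grid[i + 1]:
            w = (doa - grid[i]) / (grid[i + 1] - grid[i])
            label[i] = 1.0 - w
            label[i + 1] = w
            return label

def weighted_adjacent_decoding(probs, grid, radius=1):
    """Decode a continuous DOA as the probability-weighted average of the
    peak class and its neighbours within `radius` bins, instead of
    returning the quantized bin centre itself."""
    k = max(range(len(probs)), key=lambda i: probs[i])
    lo, hi = max(0, k - radius), min(len(probs), k + radius + 1)
    mass = sum(probs[lo:hi])
    return sum(probs[i] * grid[i] for i in range(lo, hi)) / mass
```

With a 10-degree grid, a true DOA of 12 degrees yields the soft target `[0, 0.8, 0.2, 0]`, whose expectation is exactly 12; the same weighted averaging at decoding time recovers off-grid angles from the predicted probabilities.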
3. Diffusion-Based Adversarial Purification for Speaker Verification
- Author
- Bai, Yibo, Zhang, Xiao-Lei, and Li, Xuelong
- Subjects
- Electrical Engineering and Systems Science - Audio and Speech Processing
- Computer Science - Sound
- Abstract
Automatic speaker verification (ASV) based on deep learning is easily contaminated by adversarial attacks, a recently emerged type of attack that injects imperceptible perturbations into audio signals so as to make ASV produce wrong decisions. This poses a significant threat to the security and reliability of ASV systems. To address this issue, we propose a Diffusion-Based Adversarial Purification (DAP) method that enhances the robustness of ASV systems against such adversarial attacks. Our method leverages a conditional denoising diffusion probabilistic model to effectively purify the adversarial examples and mitigate the impact of perturbations. DAP first introduces controlled noise into adversarial examples, and then performs a reverse denoising process to reconstruct clean audio. Experimental results demonstrate the efficacy of the proposed DAP in enhancing the security of ASV while minimizing the distortion of the purified audio signals., Comment: Accepted by IEEE Signal Processing Letters
- Published
- 2023
- Full Text
- View/download PDF
4. Zeroth- and first-order difference discrimination for unsupervised domain adaptation
- Author
- Wang, Jie, Chen, Xing, and Zhang, Xiao-Lei
- Published
- 2024
- Full Text
- View/download PDF
5. Spatial-temporal Graph Based Multi-channel Speaker Verification With Ad-hoc Microphone Arrays
- Author
- Chen, Yijiang, Liang, Chengdong, and Zhang, Xiao-Lei
- Subjects
- Computer Science - Sound
- Electrical Engineering and Systems Science - Audio and Speech Processing
- Abstract
The performance of speaker verification degrades significantly in adverse acoustic environments with strong reverberation and noise. To address this issue, this paper proposes a spatial-temporal graph convolutional network (GCN) method for multi-channel speaker verification with ad-hoc microphone arrays. It includes a feature aggregation block and a channel selection block, both of which are built on graphs. The feature aggregation block fuses speaker features across different times and channels by a spatial-temporal GCN. The graph-based channel selection block discards the noisy channels that may contribute negatively to the system. The proposed method is flexible in incorporating various kinds of graphs and prior knowledge. We compared the proposed method with six representative methods in both real-world and simulated environments. Experimental results show that the proposed method achieves a relative equal error rate (EER) reduction of 15.39% over the strongest reference method on the simulated datasets, and of 17.70% on the real datasets. Moreover, its performance is robust across different signal-to-noise ratios and reverberation times.
- Published
- 2023
6. Soft Label Coding for End-to-end Sound Source Localization With Ad-hoc Microphone Arrays
- Author
- Feng, Linfeng, Gong, Yijun, and Zhang, Xiao-Lei
- Subjects
- Electrical Engineering and Systems Science - Audio and Speech Processing
- Abstract
Recently, an end-to-end two-dimensional sound source localization algorithm with ad-hoc microphone arrays formulated the sound source localization problem as a classification problem. The algorithm divides the target indoor space into a set of local areas and predicts the local area where the speaker is located. However, the local areas are encoded by one-hot codes, which lose the connections between the local areas and introduce quantization errors. In this paper, we propose two soft label coding methods for classification-based two-dimensional sound source localization with ad-hoc microphone arrays. The core idea is to take the geometric connection between the classes into account in the label coding process. The first method, static soft label coding (SSLC), modifies the one-hot codes into soft codes based on the distances between the local areas. Because SSLC is handcrafted and may not be optimal, the second method, dynamic soft label coding (DSLC), further rectifies SSLC by learning the soft codes according to the statistics of the predictions produced by the classification-based localization model during training. Experimental results show that the proposed methods effectively improve the localization accuracy., Comment: 4 pages, 2 figures, conference
- Published
- 2023
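The distance-based idea behind SSLC can be illustrated with a minimal sketch. The function name, the Euclidean distance, the exponential kernel, and the temperature `tau` are illustrative assumptions here; the paper's exact coding scheme may differ.

```python
import math

def static_soft_label(true_idx, centers, tau=1.0):
    """Distance-aware soft code for a grid of local areas: areas close to
    the true area receive more probability mass than distant ones
    (a softmax over negative distances to the true area's centre)."""
    def dist(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])
    scores = [math.exp(-dist(centers[true_idx], c) / tau) for c in centers]
    z = sum(scores)
    return [s / z for s in scores]
```

Unlike a one-hot target, the resulting code decays smoothly with distance, so a prediction in an adjacent area is penalized less than one far away.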
7. Quantitative separation of CEST effect by Rex-line-fit analysis of Z-spectra
- Author
- Xiao, Gang, Zhang, Xiao-Lei, Wang, Si-Qi, Lai, Shi-Xin, Nie, Ting-Ting, Chen, Yao-Wen, Zhuang, Cai-Yu, Yan, Gen, and Wu, Ren-Hua
- Published
- 2024
- Full Text
- View/download PDF
8. A ResNet mini architecture for brain age prediction
- Author
- Zhang, Xuan, Duan, Si-Yuan, Wang, Si-Qi, Chen, Yao-Wen, Lai, Shi-Xin, Zou, Ji-Sheng, Cheng, Yan, Guan, Ji-Tian, Wu, Ren-Hua, and Zhang, Xiao-Lei
- Published
- 2024
- Full Text
- View/download PDF
9. Experimental and numerical investigations on the mechanical response of full-scale PHC pile foundations for solar power generation
- Author
- Feng, Shi-Jin, Xi, Wang, Zhang, Xiao-Lei, and Sun, Da-Ming
- Published
- 2024
- Full Text
- View/download PDF
10. Paraptosis: a non-classical paradigm of cell death for cancer therapy
- Author
- Xu, Chun-cao, Lin, Yi-fan, Huang, Mu-yang, Zhang, Xiao-lei, Wang, Pei, Huang, Ming-qing, and Lu, Jin-jian
- Published
- 2024
- Full Text
- View/download PDF
11. MSWIFA and cement cooperate in the disposal of soft soil — experimental study on silty sand and silty clay
- Author
- Liu, Zong-Hui, Li, Jia-Qi, Zhang, Xiao-Lei, Li, Hao-Dong, Su, Dong-Po, and Liang, Jia-Wei
- Published
- 2024
- Full Text
- View/download PDF
12. Optimizing Quantum Federated Learning Based on Federated Quantum Natural Gradient Descent
- Author
- Qi, Jun, Zhang, Xiao-Lei, and Tejedor, Javier
- Subjects
- Quantum Physics
- Computer Science - Machine Learning
- Abstract
Quantum federated learning (QFL) is a quantum extension of the classical federated learning model across multiple local quantum devices. An efficient optimization algorithm is always expected to minimize the communication overhead among different quantum participants. In this work, we propose an efficient optimization algorithm, namely federated quantum natural gradient descent (FQNGD), and further apply it to a QFL framework composed of variational quantum circuit (VQC)-based quantum neural networks (QNNs). Compared with stochastic gradient descent methods like Adam and Adagrad, the FQNGD algorithm requires far fewer training iterations for the QFL to converge. Moreover, it can significantly reduce the total communication overhead among local quantum devices. Our experiments on a handwritten digit classification dataset justify the effectiveness of FQNGD for the QFL framework in terms of a faster convergence rate on the training set and higher accuracy on the test set., Comment: Accepted in Proc. ICASSP 2023. arXiv admin note: substantial text overlap with arXiv:2209.00564
- Published
- 2023
13. Interpretable Spectrum Transformation Attacks to Speaker Recognition
- Author
- Yao, Jiadi, Luo, Hong, and Zhang, Xiao-Lei
- Subjects
- Computer Science - Sound
- Computer Science - Artificial Intelligence
- Electrical Engineering and Systems Science - Audio and Speech Processing
- Abstract
The success of adversarial attacks on speaker recognition has mainly been in white-box scenarios. When applying adversarial voices that are generated by attacking white-box surrogate models to black-box victim models, i.e. transfer-based black-box attacks, the transferability of the adversarial voices is not only far from satisfactory, but also lacks an interpretable basis. To address these issues, in this paper, we propose a general framework, named spectral transformation attack based on modified discrete cosine transform (STA-MDCT), to improve the transferability of adversarial voices to a black-box victim model. Specifically, we first apply the MDCT to the input voice. Then, we slightly modify the energy of different frequency bands to capture the salient regions of the adversarial noise in the time-frequency domain that are critical to a successful attack. Unlike existing approaches that operate on voices in the time domain, the proposed framework operates on voices in the time-frequency domain, which improves the interpretability, transferability, and imperceptibility of the attack. Moreover, it can be implemented with any gradient-based attacker. To utilize the advantage of model ensembling, we implement STA-MDCT not only with a single white-box surrogate model, but also with an ensemble of surrogate models. Finally, we visualize the saliency maps of adversarial voices via class activation maps (CAM), which offers an interpretable basis for transfer-based attacks in speaker recognition for the first time. Extensive comparison results with five representative attackers show that the CAM visualization clearly explains the effectiveness of STA-MDCT and the weaknesses of the comparison methods, and that the proposed method outperforms the comparison methods by a large margin.
- Published
- 2023
14. Design and Performance Test of ENT2465 Long-Life Neutron Tube
- Author
- LIU Ze-wei, YUE Ai-zhong, LI Bing, ZHAO Jing-yi, JIANG Li-ming, LIU Jiong, MA Hui-sheng, ZHANG Xiao-lei, LU Ning, and WANG Shu-sheng
- Subjects
- neutron tube, ion source, drive-in target, Nuclear engineering. Atomic power (TK9001-9401), Chemical technology (TP1-1185)
- Abstract
The neutron tube is the core component of the controllable neutron source logging instrument. Its working stability, temperature resistance, neutron yield and other indicators have an important impact on the working performance of the instrument. At present, with the requirements of deep logging, the neutron tube used in logging must improve its neutron yield, temperature resistance, operating life and working stability. In this paper, the structure, materials and manufacturing process of a drive-in target neutron tube are designed and optimized to reduce power consumption and extend operating time. The temperature resistance, operating life, and neutron yield of a sample tube were tested to estimate the performance of the ENT2465, which has an outer diameter of 25 millimeters. The sample tube was placed in the oil tank of the neutron testing platform and connected by cable, and the temperature, cumulative working time, neutron yield, target voltage, target current and anode current were recorded during its operation. The results show that under a target voltage of 80 kV and a target current of less than 60 μA, the accumulated operating life of the sample tube exceeds 1,000 hours, including 23 hours of continuous operation at 175 ℃, more than 500 hours of accumulated operation, and 36 hours of continuous operation at room temperature. The neutron yield decreases by only 5.3% after 1,000 hours under the same target voltage and current.
- Published
- 2024
- Full Text
- View/download PDF
15. Fast-U2++: Fast and Accurate End-to-End Speech Recognition in Joint CTC/Attention Frames
- Author
- Liang, Chengdong, Zhang, Xiao-Lei, Zhang, BinBin, Wu, Di, Li, Shengqiang, Song, Xingchen, Peng, Zhendong, and Pan, Fuping
- Subjects
- Computer Science - Sound
- Electrical Engineering and Systems Science - Audio and Speech Processing
- Abstract
Recently, the unified streaming and non-streaming two-pass (U2/U2++) end-to-end model for speech recognition has shown great performance in terms of streaming capability, accuracy and latency. In this paper, we present Fast-U2++, an enhanced version of U2++ to further reduce partial latency. The core idea of Fast-U2++ is to output partial results from the bottom layers of its encoder with a small chunk, while using a large chunk in the top layers of its encoder to compensate for the performance degradation caused by the small chunk. Moreover, we use a knowledge distillation method to reduce the token emission latency. We present extensive experiments on the AISHELL-1 dataset. Experiments and ablation studies show that compared to U2++, Fast-U2++ reduces model latency from 320 ms to 80 ms, and achieves a character error rate (CER) of 5.06% with a streaming setup., Comment: 5 pages, 3 figures
- Published
- 2022
16. LMD: A Learnable Mask Network to Detect Adversarial Examples for Speaker Verification
- Author
- Chen, Xing, Wang, Jie, Zhang, Xiao-Lei, Zhang, Wei-Qiang, and Yang, Kunde
- Subjects
- Electrical Engineering and Systems Science - Audio and Speech Processing
- Computer Science - Machine Learning
- Computer Science - Sound
- Abstract
Although the security of automatic speaker verification (ASV) is seriously threatened by recently emerged adversarial attacks, there have been some countermeasures to alleviate the threat. However, many defense approaches not only require prior knowledge of the attackers but also possess weak interpretability. To address this issue, in this paper, we propose an attacker-independent and interpretable method, named learnable mask detector (LMD), to separate adversarial examples from genuine ones. It utilizes score variation as an indicator to detect adversarial examples, where the score variation is the absolute discrepancy between the ASV scores of an original audio recording and its transformed audio synthesized from its masked complex spectrogram. A core component of the score variation detector is generating the masked spectrogram with a neural network. The neural network needs only genuine examples for training, which makes it an attacker-independent approach. Its interpretability lies in the fact that the neural network is trained to minimize the score variation of the targeted ASV and to maximize the number of masked spectrogram bins of the genuine training examples. It is founded on the observation that masking out the vast majority of the spectrogram bins with little speaker information will inevitably introduce a large score variation for an adversarial example, and a small score variation for a genuine example. Experimental results with 12 attackers and two representative ASV systems show that our proposed method outperforms five state-of-the-art baselines. The extensive experimental results can also serve as a benchmark for detection-based ASV defenses., Comment: 13 pages, 9 figures
- Published
- 2022
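The score-variation indicator at the heart of LMD can be illustrated with a toy sketch. Everything here is a stand-in: the real method uses a trained ASV network and a learned spectrogram mask, whereas this example uses a mean-of-signal "score" and a hard magnitude mask purely to show the detection logic.

```python
def is_adversarial(asv_score, mask_fn, audio, threshold):
    """Flag an input whose score shifts sharply once the masked signal is
    resynthesised: genuine audio keeps most of its score after masking,
    while adversarial perturbations do not survive it."""
    variation = abs(asv_score(audio) - asv_score(mask_fn(audio)))
    return variation > threshold

# Toy stand-ins (assumptions, not the paper's networks): the "ASV score"
# is the signal mean, and the "learned mask" keeps only large-magnitude
# bins, which here play the role of speaker-informative spectrogram bins.
def toy_score(x):
    return sum(x) / len(x)

def toy_mask(x):
    return [v if abs(v) > 0.5 else 0.0 for v in x]
```

A signal dominated by large-magnitude bins is nearly unchanged by the mask (small variation, accepted), while one carried by small-magnitude bins loses most of its score (large variation, flagged).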
17. Symmetric Saliency-based Adversarial Attack To Speaker Identification
- Author
- Yao, Jiadi, Chen, Xing, Zhang, Xiao-Lei, Zhang, Wei-Qiang, and Yang, Kunde
- Subjects
- Computer Science - Sound
- Computer Science - Cryptography and Security
- Computer Science - Machine Learning
- Electrical Engineering and Systems Science - Audio and Speech Processing
- Abstract
Adversarial attack approaches to speaker identification either need high computational cost or are not very effective. To address this issue, in this paper, we propose a novel generation-network-based approach, called symmetric saliency-based encoder-decoder (SSED), to generate adversarial voice examples against speaker identification. It contains two novel components. First, it uses a novel saliency map decoder to learn the importance of speech samples to the decision of a targeted speaker identification system, so as to make the attacker focus on generating artificial noise for the important samples. Second, it introduces an angular loss function to push the speaker embedding far away from the source speaker. Our experimental results demonstrate that the proposed SSED yields state-of-the-art performance, i.e. a targeted attack success rate of over 97% and a signal-to-noise ratio of over 39 dB on both the open-set and closed-set speaker identification tasks, with a low computational cost.
- Published
- 2022
- Full Text
- View/download PDF
18. WeKws: A production first small-footprint end-to-end Keyword Spotting Toolkit
- Author
- Wang, Jie, Xu, Menglong, Hou, Jingyong, Zhang, Binbin, Zhang, Xiao-Lei, Xie, Lei, and Pan, Fuping
- Subjects
- Electrical Engineering and Systems Science - Audio and Speech Processing
- Computer Science - Sound
- Abstract
Keyword spotting (KWS) enables speech-based user interaction and has gradually become an indispensable component of smart devices. Recently, end-to-end (E2E) methods have become the most popular approach for on-device KWS tasks. However, there is still a gap between the research and deployment of E2E KWS methods. In this paper, we introduce WeKws, a production-quality, easy-to-build, and convenient-to-apply E2E KWS toolkit. WeKws contains implementations of several state-of-the-art backbone networks, achieving highly competitive results on three publicly available datasets. To make WeKws a pure E2E toolkit, we utilize a refined max-pooling loss that makes the model learn the ending position of the keyword by itself, which significantly simplifies the training pipeline and makes WeKws efficient to apply in real-world scenarios. The toolkit is publicly available at https://github.com/wenet-e2e/wekws.
- Published
- 2022
19. Deep Learning Based Stage-wise Two-dimensional Speaker Localization with Large Ad-hoc Microphone Arrays
- Author
- Liu, Shupei, Feng, Linfeng, Gong, Yijun, Liang, Chengdong, Zhang, Chen, Zhang, Xiao-Lei, and Li, Xuelong
- Subjects
- Electrical Engineering and Systems Science - Audio and Speech Processing
- Computer Science - Sound
- Abstract
While deep-learning-based speaker localization has shown advantages in challenging acoustic environments, it often yields only direction-of-arrival (DOA) cues rather than precise two-dimensional (2D) coordinates. To address this, we propose a novel deep-learning-based 2D speaker localization method leveraging ad-hoc microphone arrays, where an ad-hoc microphone array is composed of randomly distributed microphone nodes, each of which is equipped with a traditional array. Specifically, we first employ convolutional neural networks at each node to estimate speaker directions. Then, we integrate these DOA estimates using triangulation and clustering techniques to get 2D speaker locations. To further boost the estimation accuracy, we introduce a node selection algorithm that strategically filters the most reliable nodes. Extensive experiments on both simulated and real-world data demonstrate that our approach significantly outperforms conventional methods. The proposed node selection further refines performance. The real-world dataset used in the experiments, named Libri-adhoc-node10, is a newly recorded dataset described for the first time in this paper and is available online at https://github.com/Liu-sp/Libri-adhoc-nodes10.
- Published
- 2022
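The triangulation step in the stage-wise pipeline can be sketched for a single pair of nodes. The paper additionally clusters intersections from many node pairs; the function name and the two-node setup here are illustrative assumptions.

```python
import math

def triangulate(p1, theta1, p2, theta2):
    """Intersect two bearing lines x = p_i + t_i * (cos t_i, sin t_i),
    one per microphone node, to obtain a 2D speaker location."""
    d1 = (math.cos(theta1), math.sin(theta1))
    d2 = (math.cos(theta2), math.sin(theta2))
    # Solve p1 + t1*d1 = p2 + t2*d2 for t1 via Cramer's rule.
    det = d1[0] * (-d2[1]) - (-d2[0]) * d1[1]
    if abs(det) < 1e-12:
        raise ValueError("parallel bearings, no unique intersection")
    bx, by = p2[0] - p1[0], p2[1] - p1[1]
    t1 = (bx * (-d2[1]) - (-d2[0]) * by) / det
    return (p1[0] + t1 * d1[0], p1[1] + t1 * d1[1])
```

With more than two nodes, one would intersect every reliable node pair and cluster the resulting points, taking a cluster centre as the final estimate.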
20. End-to-end Two-dimensional Sound Source Localization With Ad-hoc Microphone Arrays
- Author
- Gong, Yijun, Liu, Shupei, and Zhang, Xiao-Lei
- Subjects
- Electrical Engineering and Systems Science - Audio and Speech Processing
- Computer Science - Sound
- Abstract
Conventional sound source localization methods are mostly based on a single microphone array that consists of multiple microphones. They are usually formulated as direction-of-arrival (DOA) estimation problems. In this paper, we propose a deep-learning-based end-to-end sound source localization method with ad-hoc microphone arrays, where an ad-hoc microphone array is a set of randomly distributed microphone nodes that collaborate with each other. It can produce two-dimensional locations of speakers with only a single microphone per node. Specifically, we divide a targeted indoor space into multiple local areas. We encode each local area by a one-hot code; therefore, the node and speaker locations can be represented by one-hot codes. Accordingly, the sound source localization problem is formulated as a classification task of recognizing the one-hot code of the speaker given the one-hot codes of the microphone nodes and their speech recordings. An end-to-end spatial-temporal deep model is designed for the classification problem. It utilizes a spatial-temporal attention architecture with a fusion layer inserted in the middle of the architecture, which is able to handle arbitrarily different numbers of microphone nodes during model training and test. Experimental results show that the proposed method yields good performance in highly reverberant and noisy environments., Comment: 6 pages, 4 figures, conference
- Published
- 2022
21. Improving Pseudo Labels With Intra-Class Similarity for Unsupervised Domain Adaptation
- Author
- Wang, Jie and Zhang, Xiao-Lei
- Subjects
- Computer Science - Computer Vision and Pattern Recognition
- Computer Science - Machine Learning
- Abstract
Unsupervised domain adaptation (UDA) transfers knowledge from a label-rich source domain to a different but related fully-unlabeled target domain. To address the problem of domain shift, more and more UDA methods adopt pseudo labels of the target samples to improve the generalization ability on the target domain. However, inaccurate pseudo labels of the target samples may yield suboptimal performance with error accumulation during the optimization process. Moreover, once the pseudo labels are generated, how to remedy them remains largely unexplored. In this paper, we propose a novel approach to improve the accuracy of the pseudo labels in the target domain. It first generates coarse pseudo labels with a conventional UDA method. Then, it iteratively exploits the intra-class similarity of the target samples to improve the generated coarse pseudo labels, and aligns the source and target domains with the improved pseudo labels. The pseudo labels are improved by first deleting dissimilar samples, and then using spanning trees to eliminate the intra-class samples with wrong pseudo labels. We have applied the proposed approach to several conventional UDA methods as an additional term. Experimental results demonstrate that the proposed method can boost the accuracy of the pseudo labels and lead to more discriminative and domain-invariant features than the conventional baselines., Comment: 26 pages, 8 figures
- Published
- 2022
22. A two-step deep learning-based framework for metro tunnel lining defect recognition
- Author
- Feng, Yong, Feng, Shi-Jin, Zhang, Xiao-Lei, Zhang, Dong-Mei, and Zhao, Yong
- Published
- 2024
- Full Text
- View/download PDF
23. New Interpretation of Neonatal Outcomes by Phenotypically Classified Preterm Syndrome: A Retrospective Cohort Study
- Author
- Lv, Dan, Zhang, Yan-ling, Xie, Yin, Ye, Fang, Zhang, Xiao-lei, Xu, He-ze, Sun, Ya-nan, Li, Fan-fan, He, Meng-zhou, Fan, Yao, Li, Wei, Zeng, Wan-jiang, Chen, Su-hua, Feng, Ling, Lin, Xing-guang, and Deng, Dong-rui
- Published
- 2023
- Full Text
- View/download PDF
24. Establishment of a Rat Model of Capillary Leakage Syndrome Induced by Cardiopulmonary Resuscitation After Cardiac Arrest
- Author
- Zhang, Xiao-lei, Cheng, Ye, Xing, Chun-lin, Ying, Jia-yun, Yang, Xue, Cai, Xiao-di, and Lu, Guo-ping
- Published
- 2023
- Full Text
- View/download PDF
25. Multi-Channel Far-Field Speaker Verification with Large-Scale Ad-hoc Microphone Arrays
- Author
- Liang, Chengdong, Chen, Yijiang, Yao, Jiadi, and Zhang, Xiao-Lei
- Subjects
- Computer Science - Sound
- Electrical Engineering and Systems Science - Audio and Speech Processing
- Abstract
Speaker verification based on ad-hoc microphone arrays has the potential of reducing the error significantly in adverse acoustic environments. However, existing approaches extract utterance-level speaker embeddings from each channel of an ad-hoc microphone array, which does not fully consider the spatial-temporal information across the devices. In this paper, we propose to aggregate the multichannel signals of the ad-hoc microphone array at the frame level by exploring the cross-channel information deeply with two attention mechanisms. The first one is a self-attention method. It consists of a cross-frame self-attention layer and a cross-channel self-attention layer applied successively, both working at the frame level. The second one learns the cross-frame and cross-channel information via two graph attention layers. Experimental results demonstrate that the proposed methods reach state-of-the-art performance. Moreover, the graph-attention method is better than the self-attention method in most cases., Comment: 5 pages, 3 figures
- Published
- 2021
26. Robust multilayer bootstrap networks in ensemble for unsupervised representation learning and clustering
- Author
- Zhang, Xiao-Lei and Li, Xuelong
- Published
- 2024
- Full Text
- View/download PDF
27. Zelquistinel acts at an extracellular binding domain to modulate intracellular calcium inactivation of N-methyl-d-aspartate receptors
- Author
- Zhang, Xiao-lei, Li, Yong-Xin, Berglund, Nils, Burgdorf, Jeffrey S., Donello, John E., Moskal, Joseph R., and Stanton, Patric K.
- Published
- 2024
- Full Text
- View/download PDF
28. Conformer-based End-to-end Speech Recognition With Rotary Position Embedding
- Author
- Li, Shengqiang, Xu, Menglong, and Zhang, Xiao-Lei
- Subjects
- Computer Science - Sound
- Computer Science - Computation and Language
- Electrical Engineering and Systems Science - Audio and Speech Processing
- Abstract
Transformer-based end-to-end speech recognition models have received considerable attention in recent years due to their high training speed and ability to model a long-range global context. Position embedding in the transformer architecture is indispensable because it provides supervision for dependency modeling between elements at different positions in the input sequence. To make use of the time order of the input sequence, many works inject some information about the relative or absolute position of the element into the input sequence. In this work, we investigate various position embedding methods in the convolution-augmented transformer (conformer) and adopt a novel implementation named rotary position embedding (RoPE). RoPE encodes absolute positional information into the input sequence by a rotation matrix, and then naturally incorporates explicit relative position information into a self-attention module. To evaluate the effectiveness of the RoPE method, we conducted experiments on AISHELL-1 and LibriSpeech corpora. Results show that the conformer enhanced with RoPE achieves superior performance in the speech recognition task. Specifically, our model achieves a relative word error rate reduction of 8.70% and 7.27% over the conformer on test-clean and test-other sets of the LibriSpeech corpus respectively., Comment: 5 pages, 3 figures
- Published
- 2021
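The rotary position embedding described in this abstract can be sketched in pure Python for a single feature vector. Batch handling and the attention module itself are omitted; `base=10000` follows the common RoPE convention but is an assumption here.

```python
import math

def rope(x, pos, base=10000.0):
    """Rotate consecutive feature pairs of x by a position-dependent
    angle. Because rotations are orthogonal, relative offsets then
    appear as angle differences inside the attention dot product."""
    d = len(x)  # must be even
    out = list(x)
    for i in range(0, d, 2):
        theta = pos / (base ** (i / d))
        c, s = math.cos(theta), math.sin(theta)
        out[i] = x[i] * c - x[i + 1] * s
        out[i + 1] = x[i] * s + x[i + 1] * c
    return out
```

Two properties make this useful in self-attention: the rotation preserves vector norms, and the dot product of a query at position m with a key at position n depends only on the offset m - n.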
29. AUC Optimization for Robust Small-footprint Keyword Spotting with Limited Training Data
- Author
- Xu, Menglong, Li, Shengqiang, Liang, Chengdong, and Zhang, Xiao-Lei
- Subjects
- Electrical Engineering and Systems Science - Audio and Speech Processing
- Computer Science - Sound
- Abstract
Deep neural networks provide effective solutions to small-footprint keyword spotting (KWS). However, if training data is limited, it remains challenging to achieve robust and highly accurate KWS in real-world scenarios where unseen sounds that are out of the training data are frequently encountered. Most conventional methods aim to maximize the classification accuracy on the training set, without taking the unseen sounds into account. To enhance the robustness of the deep neural networks based KWS, in this paper, we introduce a new loss function, named the maximization of the area under the receiver-operating-characteristic curve (AUC). The proposed method not only maximizes the classification accuracy of keywords on the closed training set, but also maximizes the AUC score for optimizing the performance of non-keyword segments detection. Experimental results on the Google Speech Commands dataset v1 and v2 show that our method achieves new state-of-the-art performance in terms of most evaluation metrics., Comment: submitted to ASRU2021
- Published
- 2021
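The quantity being optimized, the empirical AUC over keyword/non-keyword score pairs, can be computed directly. During training the paper maximizes a differentiable surrogate; this sketch only shows the evaluation-time definition of the ranking objective.

```python
def auc(pos_scores, neg_scores):
    """Empirical AUC: the fraction of (keyword, non-keyword) score pairs
    that are correctly ordered, with ties counted as half-correct.
    Maximizing it pushes keyword scores above all non-keyword scores."""
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))
```

An AUC of 1.0 means every keyword segment outscores every non-keyword segment, which is exactly the behaviour the loss encourages on unseen sounds.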
30. fMBN-E: Efficient Unsupervised Network Structure Ensemble and Selection for Clustering
- Author
- Zhang, Xiao-Lei
- Subjects
- Computer Science - Machine Learning
- Abstract
It is known that unsupervised nonlinear dimensionality reduction and clustering are sensitive to the selection of hyperparameters, particularly for deep-learning-based methods, which hinders their practical use. How to select a proper network structure that may be dramatically different across applications is a hard issue for deep models, given little prior knowledge of the data. In this paper, we aim to automatically determine the optimal network structure of a deep model, named multilayer bootstrap networks (MBN), via simple ensemble learning and selection techniques. Specifically, we first propose an MBN ensemble (MBN-E) algorithm which concatenates the sparse outputs of a set of MBN base models with different network structures into a new representation. Then, we take the new representation produced by MBN-E as a reference for selecting the optimal MBN base models. Moreover, we propose a fast version of MBN-E (fMBN-E), which is not only theoretically faster than a single standard MBN but also does not increase the estimation error of MBN-E. Importantly, MBN-E and its ensemble selection techniques maintain the simple formulation of MBN, which is based on one-nearest-neighbor learning. Empirically, compared to a number of advanced deep clustering methods and as many as 20 representative unsupervised ensemble learning and selection methods, the proposed methods reach state-of-the-art performance without manual hyperparameter tuning. fMBN-E is empirically even hundreds of times faster than MBN-E without suffering performance degradation. The applications to image segmentation and graph data mining further demonstrate the advantages of the proposed methods.
- Published
- 2021
31. Myeloid cell deficiency of the inflammatory transcription factor Stat4 protects long-term synaptic plasticity from the effects of a high-fat, high-cholesterol diet
- Author
- Zhang, Xiao-lei, Hollander, Callie M., Khan, Mohammad Yasir, D’silva, Melinee, Ma, Haoqin, Yang, Xinyuan, Bai, Robin, Keeter, Coles K., Galkina, Elena V., Nadler, Jerry L., and Stanton, Patric K.
- Published
- 2023
- Full Text
- View/download PDF
32. Attention-based multi-channel speaker verification with ad-hoc microphone arrays
- Author
-
Liang, Chengdong, Chen, Junqi, Guan, Shanzheng, and Zhang, Xiao-Lei
- Subjects
Computer Science - Sound ,Electrical Engineering and Systems Science - Audio and Speech Processing - Abstract
Recently, ad-hoc microphone arrays have been widely studied. Unlike traditional microphone array settings, the spatial arrangement and number of microphones of an ad-hoc microphone array are not known in advance, which hinders the adaptation of traditional speaker verification technologies to ad-hoc microphone arrays. To overcome this weakness, in this paper, we propose attention-based multi-channel speaker verification with ad-hoc microphone arrays. Specifically, we add an inter-channel processing layer and a global fusion layer after the pooling layer of a single-channel speaker verification system. The inter-channel processing layer applies a so-called residual self-attention along the channel dimension to allocate weights to different microphones. The global fusion layer integrates all channels in a way that is independent of the number of input channels. We further replace the softmax operator in the residual self-attention with sparsemax, which forces the channel weights of very noisy channels to zero. Experimental results with ad-hoc microphone arrays of over 30 channels demonstrate the effectiveness of the proposed methods. For example, the multi-channel speaker verification with sparsemax achieves an equal error rate (EER) over 20% lower than that of the oracle one-best system on semi-real data sets, and over 30% lower on simulated data sets, in test scenarios with both matched and mismatched channel numbers., Comment: Submitted to APSIPA ASC 2021
- Published
- 2021
33. Efficient conformer-based speech recognition with linear attention
- Author
-
Li, Shengqiang, Xu, Menglong, and Zhang, Xiao-Lei
- Subjects
Computer Science - Sound ,Electrical Engineering and Systems Science - Audio and Speech Processing - Abstract
Recently, conformer-based end-to-end automatic speech recognition, which outperforms recurrent-neural-network-based approaches, has received much attention. Although the parallel computation of the conformer is more efficient than that of recurrent neural networks, the computational complexity of its dot-product self-attention is quadratic in the length of the input feature. To reduce this complexity, we propose multi-head linear self-attention for the self-attention layer, which reduces its computational complexity to linear order. In addition, we propose to factorize the feed-forward module of the conformer by low-rank matrix factorization, which reduces the number of parameters by approximately 50% with little performance loss. The proposed model, named linear attention based conformer (LAC), can be trained and decoded jointly with the connectionist temporal classification objective, which further improves its performance. To evaluate the effectiveness of LAC, we conduct experiments on the AISHELL-1 and LibriSpeech corpora. Results show that the proposed LAC achieves better performance than 7 recently proposed speech recognition models and is competitive with the state-of-the-art conformer. Meanwhile, the proposed LAC has only about 50% of the parameters of the conformer and trains faster., Comment: submitted to APSIPA ASC 2021
- Published
- 2021
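The quadratic-to-linear trick behind multi-head linear self-attention can be illustrated with a kernel feature map. This is a generic single-head sketch using the common `elu(x) + 1` map; the exact feature map and multi-head arrangement used by LAC are assumptions here.

```python
import numpy as np

def linear_attention(Q, K, V, eps=1e-6):
    """Kernelized linear attention: softmax(QK^T)V is approximated by
    phi(Q) (phi(K)^T V). phi(x) = elu(x) + 1 keeps features positive
    so the normalizer is well defined."""
    phi = lambda x: np.where(x > 0, x + 1.0, np.exp(x))  # elu(x) + 1
    Qp, Kp = phi(Q), phi(K)
    KV = Kp.T @ V                       # (d, d_v), computed once
    Z = Qp @ Kp.sum(axis=0) + eps       # per-query normalizer
    return (Qp @ KV) / Z[:, None]
```

Computing `Kp.T @ V` once makes the cost O(T d^2) rather than O(T^2 d), which is the point of the linear-attention family.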
34. Preferential adsorption control: Ca-based LDO for regenerating used lubricating oil
- Author
-
Zhang, Xiao-lei, Wu, Jian-zhong, Liu, Yun, Zhang, Zai-wu, Hu, Chao, and Lu, Yong-sheng
- Published
- 2024
- Full Text
- View/download PDF
35. Transformer-based end-to-end speech recognition with residual Gaussian-based self-attention
- Author
-
Liang, Chengdong, Xu, Menglong, and Zhang, Xiao-Lei
- Subjects
Computer Science - Sound ,Computer Science - Computation and Language ,Computer Science - Machine Learning ,Electrical Engineering and Systems Science - Audio and Speech Processing - Abstract
Self-attention (SA), which encodes vector sequences according to their pairwise similarity, is widely used in speech recognition due to its strong context modeling ability. However, when applied to long sequence data, its accuracy is reduced. This is because its weighted-average operator may disperse the attention distribution, causing the relationships between adjacent signals to be ignored. To address this issue, in this paper we introduce relative-position-awareness self-attention (RPSA). It not only maintains the global-range dependency modeling ability of self-attention, but also improves localness modeling. Because the local window length of the original RPSA is fixed and sensitive to different test data, we propose Gaussian-based self-attention (GSA), whose window length is learnable and adapts automatically to the test data. We further generalize GSA to a new residual Gaussian self-attention (resGSA) for further performance improvement. We apply RPSA, GSA, and resGSA to Transformer-based speech recognition, respectively. Experimental results on the AISHELL-1 Mandarin speech recognition corpus demonstrate the effectiveness of the proposed methods. For example, the resGSA-Transformer achieves a character error rate (CER) of 5.86% on the test set, a relative 7.8% reduction compared with the SA-Transformer. Although the performance of the proposed resGSA-Transformer is only slightly better than that of the RPSA-Transformer, it does not require manual tuning of the window length., Comment: There is an error in the description of section 3.2.1
- Published
- 2021
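The Gaussian windowing idea of GSA can be sketched by biasing the dot-product logits with a squared-distance penalty between positions. Here `sigma` is a fixed illustrative value, whereas in the paper the window length is learnable; resGSA's residual connection is not shown.

```python
import numpy as np

def gaussian_self_attention(Q, K, V, sigma=3.0):
    """Self-attention with a Gaussian locality bias: the logits are
    offset by -(i - j)^2 / (2 sigma^2), so nearby frames receive
    larger attention weights."""
    T, d = Q.shape
    logits = Q @ K.T / np.sqrt(d)
    pos = np.arange(T)
    logits -= (pos[:, None] - pos[None, :]) ** 2 / (2.0 * sigma ** 2)
    w = np.exp(logits - logits.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    return w @ V, w
```

With zero queries and keys the weights reduce to a pure Gaussian window, which makes the locality effect easy to inspect.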
36. Scaling sparsemax based channel selection for speech recognition with ad-hoc microphone arrays
- Author
-
Chen, Junqi and Zhang, Xiao-Lei
- Subjects
Electrical Engineering and Systems Science - Audio and Speech Processing ,Computer Science - Machine Learning ,Computer Science - Sound - Abstract
Recently, speech recognition with ad-hoc microphone arrays has received much attention. Channel selection is known to be an important problem for ad-hoc microphone arrays; however, this topic remains largely unexplored in speech recognition, particularly with large-scale ad-hoc microphone arrays. To address this problem, we propose a Scaling Sparsemax algorithm for channel selection in speech recognition with large-scale ad-hoc microphone arrays. Specifically, we first replace the conventional Softmax operator in the stream attention mechanism of a multichannel end-to-end speech recognition system with Sparsemax, which conducts channel selection by forcing the weights of noisy channels to zero. Because Sparsemax harshly pushes the weights of many channels to zero, we propose Scaling Sparsemax, which is milder in that it zeroes out only the weights of very noisy channels. Experimental results with ad-hoc microphone arrays of over 30 channels under the conformer speech recognition architecture show that the proposed Scaling Sparsemax yields a word error rate over 30% lower than Softmax on simulated data sets, and over 20% lower on semi-real data sets, in test scenarios with both matched and mismatched channel numbers.
- Published
- 2021
- Full Text
- View/download PDF
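Sparsemax itself has a closed-form simplex projection, which is what makes exact zero channel weights possible. Below is the standard sparsemax of Martins and Astudillo (2016); the paper's Scaling Sparsemax modifies its threshold so that only very noisy channels are zeroed, a detail not reproduced in this sketch.

```python
import numpy as np

def sparsemax(z):
    """Sparsemax: Euclidean projection of the logits onto the
    probability simplex. Unlike softmax, it can assign exactly zero
    weight to weak (noisy) channels."""
    z = np.asarray(z, dtype=float)
    z_sorted = np.sort(z)[::-1]
    k = np.arange(1, len(z) + 1)
    cssv = np.cumsum(z_sorted)
    support = 1 + k * z_sorted > cssv      # channels kept in the support
    k_max = k[support][-1]
    tau = (cssv[k_max - 1] - 1.0) / k_max  # support-dependent threshold
    return np.maximum(z - tau, 0.0)
```

For the logits [1.0, 0.5, -2.0], sparsemax returns [0.75, 0.25, 0.0]: the weakest channel is dropped entirely, which softmax can never do.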
37. Libri-adhoc40: A dataset collected from synchronized ad-hoc microphone arrays
- Author
-
Guan, Shanzheng, Liu, Shupei, Chen, Junqi, Zhu, Wenbo, Li, Shengqiang, Tan, Xu, Yang, Ziye, Xu, Menglong, Chen, Yijiang, Wang, Jianyu, and Zhang, Xiao-Lei
- Subjects
Electrical Engineering and Systems Science - Audio and Speech Processing - Abstract
Recently, ad-hoc microphone arrays have become a research trend. However, most research has been conducted on simulated data. Although some datasets were collected with a small number of distributed devices, the devices were not synchronized, which hinders fundamental theoretical research on ad-hoc microphone arrays. To address this issue, this paper presents a synchronized speech corpus, named Libri-adhoc40, which collects replayed Librispeech data from loudspeakers with an ad-hoc microphone array of 40 strongly synchronized distributed nodes in a real office environment. Besides, to provide an evaluation target for speech frontend processing and other applications, we also recorded the replayed speech in an anechoic chamber. We trained several multi-device speech recognition systems on both the Libri-adhoc40 dataset and a simulated dataset. Experimental results demonstrate the validity of the proposed corpus, which can be used as a benchmark to reflect the trends and differences of models with different ad-hoc microphone arrays. The dataset is available online at https://github.com/ISmallFish/Libri-adhoc40.
- Published
- 2021
38. Deep NMF Topic Modeling
- Author
-
Wang, JianYu and Zhang, Xiao-Lei
- Subjects
Computer Science - Information Retrieval - Abstract
Nonnegative matrix factorization (NMF) based topic modeling methods rely little on model or data assumptions. However, they are usually formulated as difficult optimization problems, which may suffer from bad local minima and high computational complexity. In this paper, we propose a deep NMF (DNMF) topic modeling framework to alleviate the aforementioned problems. It first applies an unsupervised deep learning method to learn latent hierarchical structures of documents, under the assumption that if we can learn a good representation of documents by, e.g., a deep model, then the topic word discovery problem can be boosted. Then, it takes the output of the deep model to constrain the topic-document distribution for discovering discriminative topic words, which not only improves the efficacy but also reduces the computational complexity over conventional unsupervised NMF methods. We constrain the topic-document distribution in three ways, which take advantage of the three major sub-categories of NMF -- basic NMF, structured NMF, and constrained NMF -- respectively. To overcome the weaknesses of deep neural networks in unsupervised topic modeling, we adopt a non-neural-network deep model -- the multilayer bootstrap network. To our knowledge, this is the first time that a deep NMF model has been used for unsupervised topic modeling. We have compared the proposed method with a number of representative references covering the major branches of topic modeling on a variety of real-world text corpora. Experimental results illustrate the effectiveness of the proposed method under various evaluation metrics.
- Published
- 2021
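For reference, the basic NMF building block that DNMF constrains is the classic Lee-Seung multiplicative update. The sketch below only shows this plain factorization; the deep-representation constraint on the topic-document matrix H is the paper's contribution and is not reproduced here.

```python
import numpy as np

def nmf(V, r, n_iter=200, seed=0, eps=1e-10):
    """Basic NMF via Lee-Seung multiplicative updates, minimizing
    ||V - WH||_F^2. In topic modeling V is a term-document matrix,
    W a topic-word matrix, and H the topic-document distribution."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, r)) + eps
    H = rng.random((r, n)) + eps
    for _ in range(n_iter):
        # multiplicative updates preserve nonnegativity by construction
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H
```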
39. Minimum-volume Multichannel Nonnegative matrix factorization for blind source separation
- Author
-
Wang, Jianyu, Guan, Shanzheng, Liu, Shupei, and Zhang, Xiao-Lei
- Subjects
Computer Science - Sound ,Electrical Engineering and Systems Science - Audio and Speech Processing - Abstract
Multichannel blind audio source separation aims to recover the latent sources from their multichannel mixtures without supervised information. One state-of-the-art method, named independent low-rank matrix analysis (ILRMA), unifies independent vector analysis (IVA) and nonnegative matrix factorization (NMF). However, the basis matrix produced by NMF may not be spectrally compact, and it does not guarantee the identifiability of each source. To address this problem, we propose to enhance the identifiability of the source model with a minimum-volume prior distribution. We further regularize a multichannel NMF (MNMF) and ILRMA, respectively, with the minimum-volume regularizer. The proposed methods maximize the posterior distribution of the separated sources, which ensures the stability of convergence. Experimental results demonstrate the effectiveness of the proposed methods compared with auxiliary-function-based independent vector analysis, MNMF, ILRMA, and its extensions.
- Published
- 2021
40. Deep Ad-hoc Beamforming Based on Speaker Extraction for Target-Dependent Speech Separation
- Author
-
Yang, Ziye, Guan, Shanzheng, and Zhang, Xiao-Lei
- Subjects
Computer Science - Sound ,Computer Science - Computation and Language ,Electrical Engineering and Systems Science - Audio and Speech Processing - Abstract
Recently, research on ad-hoc microphone arrays with deep learning has drawn much attention, especially in speech enhancement and separation. Because an ad-hoc microphone array may cover such a large area that multiple speakers may be located far apart and talk independently, target-dependent speech separation, which aims to extract a target speaker from mixed speech, is important for extracting and tracking a specific speaker in the ad-hoc array. However, this technique has not been explored yet. In this paper, we propose deep ad-hoc beamforming based on speaker extraction, which is, to our knowledge, the first work on target-dependent speech separation based on ad-hoc microphone arrays and deep learning. The algorithm contains three components. First, we propose a supervised channel selection framework based on speaker extraction, where the estimated utterance-level SNRs of the target speech are used as the basis for channel selection. Second, we apply the selected channels to a deep learning based MVDR algorithm, where a single-channel speaker extraction algorithm is applied to each selected channel to estimate the mask of the target speech. We conducted an extensive experiment on a WSJ0-adhoc corpus. Experimental results demonstrate the effectiveness of the proposed method.
- Published
- 2020
41. Speaker Recognition Based on Deep Learning: An Overview
- Author
-
Bai, Zhongxin and Zhang, Xiao-Lei
- Subjects
Electrical Engineering and Systems Science - Audio and Speech Processing - Abstract
Speaker recognition is a task of identifying persons from their voices. Recently, deep learning has dramatically revolutionized speaker recognition. However, there is a lack of comprehensive reviews of this exciting progress. In this paper, we review several major subtasks of speaker recognition, including speaker verification, identification, diarization, and robust speaker recognition, with a focus on deep-learning-based methods. Because the major advantage of deep learning over conventional methods is its representation ability, which can produce highly abstract embedding features from utterances, we first pay close attention to deep-learning-based speaker feature extraction, including the inputs, network structures, temporal pooling strategies, and objective functions, which are the fundamental components of many speaker recognition subtasks. Then, we give an overview of speaker diarization, with an emphasis on recent supervised, end-to-end, and online diarization. Finally, we survey robust speaker recognition from the perspectives of domain adaptation and speech enhancement, which are two major approaches to dealing with domain mismatch and noise problems. Popular and recently released corpora are listed at the end of the paper.
- Published
- 2020
42. A comparison of handcrafted, parameterized, and learnable features for speech separation
- Author
-
Zhu, Wenbo, Wang, Mou, Zhang, Xiao-Lei, and Rahardja, Susanto
- Subjects
Computer Science - Sound - Abstract
The design of acoustic features is important for speech separation. Acoustic features can be roughly categorized into three classes: handcrafted, parameterized, and learnable features. Among them, learnable features, which are trained jointly with separation networks in an end-to-end fashion, have become a new trend of modern speech separation research, e.g., the convolutional time-domain audio separation network (Conv-TasNet), while handcrafted and parameterized features have also been shown to be competitive in very recent studies. However, a systematic comparison across the three kinds of acoustic features has not been conducted yet. In this paper, we compare them in the framework of Conv-TasNet by setting its encoder and decoder with different acoustic features. We also generalize the handcrafted multi-phase gammatone filterbank (MPGTF) to a new parameterized multi-phase gammatone filterbank (ParaMPGTF). Experimental results on the WSJ0-2mix corpus show that (i) if the decoder is learnable, then setting the encoder to STFT, MPGTF, ParaMPGTF, or learnable features leads to similar performance; and (ii) when the pseudo-inverse transforms of STFT, MPGTF, and ParaMPGTF are used as the decoders, the proposed ParaMPGTF performs better than the other two handcrafted features.
- Published
- 2020
43. Transformer-based End-to-End Speech Recognition with Local Dense Synthesizer Attention
- Author
-
Xu, Menglong, Li, Shengqiang, and Zhang, Xiao-Lei
- Subjects
Computer Science - Sound ,Computer Science - Computation and Language ,Electrical Engineering and Systems Science - Audio and Speech Processing ,68T10 - Abstract
Recently, several studies reported that dot-product self-attention (SA) may not be indispensable to state-of-the-art Transformer models. Motivated by the fact that dense synthesizer attention (DSA), which dispenses with dot products and pairwise interactions, achieved competitive results in many language processing tasks, in this paper, we first propose a DSA-based speech recognition model as an alternative to SA. To reduce the computational complexity and improve the performance, we further propose local DSA (LDSA), which restricts the attention scope of DSA to a local range around the current central frame for speech recognition. Finally, we combine LDSA with SA to extract local and global information simultaneously. Experimental results on the AISHELL-1 Mandarin speech recognition corpus show that the proposed LDSA-Transformer achieves a character error rate (CER) of 6.49%, which is slightly better than that of the SA-Transformer. Meanwhile, the LDSA-Transformer requires less computation than the SA-Transformer. The proposed combination method not only achieves a CER of 6.18%, which significantly outperforms the SA-Transformer, but also has roughly the same number of parameters and computational complexity as the latter. The implementation of the multi-head LDSA is available at https://github.com/mlxu995/multihead-LDSA., Comment: 5 pages, 3 figures
- Published
- 2020
- Full Text
- View/download PDF
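Dense synthesizer attention replaces dot products with weights predicted directly from each frame by a feed-forward net; restricting those weights to a local window gives LDSA. The sketch below is a simplified single-head version; the projection sizes, activation, and boundary handling are assumptions, not the released implementation.

```python
import numpy as np

def local_dense_synthesizer_attention(X, W1, W2, window=3):
    """LDSA sketch: a two-layer feed-forward net maps each frame to
    `window` attention logits over its local neighborhood; no pairwise
    dot products are computed. W1: (d, hidden), W2: (hidden, window)."""
    T, d = X.shape
    H = np.maximum(X @ W1, 0.0)           # (T, hidden), ReLU
    logits = H @ W2                       # (T, window) local logits
    w = np.exp(logits - logits.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    out = np.zeros_like(X)
    half = window // 2
    for t in range(T):
        for j in range(window):
            s = t + j - half              # source frame for this slot
            if 0 <= s < T:                # out-of-range slots are dropped
                out[t] += w[t, j] * X[s]
    return out
```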
44. Speech enhancement aided end-to-end multi-task learning for voice activity detection
- Author
-
Tan, Xu and Zhang, Xiao-Lei
- Subjects
Electrical Engineering and Systems Science - Audio and Speech Processing ,Computer Science - Machine Learning ,Computer Science - Sound - Abstract
Robust voice activity detection (VAD) is a challenging task in low signal-to-noise ratio (SNR) environments. Recent studies show that speech enhancement is helpful to VAD, but the performance improvement is limited. To address this issue, we propose a speech enhancement aided end-to-end multi-task model for VAD. The model has two decoders, one for speech enhancement and the other for VAD. The two decoders share the same encoder and speech separation network. Instead of simply adopting two separate objectives for VAD and speech enhancement, we propose a new joint optimization objective -- the VAD-masked scale-invariant source-to-distortion ratio (mSI-SDR). mSI-SDR uses VAD information to mask the output of the speech enhancement decoder during training. This makes the VAD and speech enhancement tasks jointly optimized not only at the shared encoder and separation network, but also at the objective level. It also theoretically satisfies real-time processing requirements. Experimental results show that the multi-task method significantly outperforms its single-task VAD counterpart. Moreover, mSI-SDR outperforms SI-SDR in the same multi-task setting., Comment: Accepted by ICASSP2021
- Published
- 2020
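The mSI-SDR objective builds on plain SI-SDR by masking the enhancement output with VAD information. A minimal sketch, assuming the VAD mask multiplies the estimate and the reference alike (the paper's exact masking may differ):

```python
import numpy as np

def si_sdr(est, ref, eps=1e-8):
    """Scale-invariant source-to-distortion ratio in dB: project the
    estimate onto the reference, then compare target vs. residual."""
    alpha = np.dot(est, ref) / (np.dot(ref, ref) + eps)
    target = alpha * ref
    noise = est - target
    return 10.0 * np.log10((target @ target + eps) / (noise @ noise + eps))

def masked_si_sdr(est, ref, vad_mask):
    """mSI-SDR sketch: score only the frames the VAD labels as speech,
    coupling the VAD and enhancement objectives at the loss level."""
    return si_sdr(est * vad_mask, ref * vad_mask)
```

Note that scaling the estimate does not change the score, which is the scale-invariance the name refers to.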
45. SZC-6, a small-molecule activator of SIRT3, attenuates cardiac hypertrophy in mice
- Author
-
Li, Ze-yu, Lu, Guo-qing, Lu, Jing, Wang, Pan-xia, Zhang, Xiao-lei, Zou, Yong, and Liu, Pei-qing
- Published
- 2023
- Full Text
- View/download PDF
46. Depthwise Separable Convolutional ResNet with Squeeze-and-Excitation Blocks for Small-footprint Keyword Spotting
- Author
-
Xu, Menglong and Zhang, Xiao-Lei
- Subjects
Computer Science - Sound ,Electrical Engineering and Systems Science - Audio and Speech Processing - Abstract
One difficult problem of keyword spotting is how to miniaturize its memory footprint while maintaining high precision. Although convolutional neural networks have been shown to be effective for small-footprint keyword spotting, they still need hundreds of thousands of parameters to achieve good performance. In this paper, we propose an efficient model based on depthwise separable convolution layers and squeeze-and-excitation blocks. Specifically, we replace the standard convolution with the depthwise separable convolution, which reduces the number of parameters of the standard convolution without significant performance degradation. We further improve the performance of the depthwise separable convolution by reweighting the output feature maps of the first convolution layer with a so-called squeeze-and-excitation block. We compared the proposed method with five representative models on two experimental settings of the Google Speech Commands dataset. Experimental results show that the proposed method achieves state-of-the-art performance. For example, it achieves a classification error rate of 3.29% with 72K parameters in the first experiment, which significantly outperforms the comparison methods given a similar model size. It achieves an error rate of 3.97% with 10K parameters, which is also slightly better than the state-of-the-art comparison method of a similar model size.
- Published
- 2020
- Full Text
- View/download PDF
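The parameter saving from depthwise separable convolution is simple arithmetic: a k x k standard convolution needs k*k*C_in*C_out weights, while its separable counterpart needs only k*k*C_in (one depthwise filter per input channel) plus C_in*C_out (the 1 x 1 pointwise projection). A quick check, bias terms omitted:

```python
def conv_params(c_in, c_out, k):
    """Parameter count of a standard k x k convolution (no bias)."""
    return k * k * c_in * c_out

def ds_conv_params(c_in, c_out, k):
    """Depthwise separable convolution: a k x k depthwise filter per
    input channel, then a 1 x 1 pointwise projection (no bias)."""
    return k * k * c_in + c_in * c_out

# Example: a 3x3 convolution with 64 -> 64 channels
std = conv_params(64, 64, 3)      # 36864 parameters
sep = ds_conv_params(64, 64, 3)   # 4672 parameters, roughly 8x fewer
```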
47. Augmented Q Imitation Learning (AQIL)
- Author
-
Zhang, Xiao Lei and Agarwal, Anish
- Subjects
Computer Science - Machine Learning ,Computer Science - Artificial Intelligence - Abstract
The study of unsupervised learning can be generally divided into two categories: imitation learning and reinforcement learning. In imitation learning the machine learns by mimicking the behavior of an expert system, whereas in reinforcement learning it learns from direct environment feedback. Traditional deep reinforcement learning takes significant time before the machine starts to converge to an optimal policy. This paper proposes Augmented Q-Imitation-Learning, a method that accelerates the convergence of deep reinforcement learning by applying Q-imitation-learning as the initial training process in traditional deep Q-learning., Comment: 5 pages
- Published
- 2020
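The abstract does not give AQIL's update rule, so the following tabular sketch only illustrates the underlying idea: an imitation-style update first seeds Q-values toward the expert's actions, and standard Q-learning then takes over. All constants and the imitation target are illustrative assumptions, not the paper's method.

```python
import numpy as np

def q_imitation_update(Q, s, expert_a, lr=0.5):
    """Imitation-style seeding (assumed form): push the Q-value of the
    expert's action in state s above the current maximum, so the greedy
    policy starts by copying the expert."""
    target = Q[s].max() + 1.0
    Q[s, expert_a] += lr * (target - Q[s, expert_a])
    return Q

def q_learning_update(Q, s, a, r, s_next, lr=0.1, gamma=0.99):
    """Standard tabular Q-learning update used after the imitation phase."""
    td_target = r + gamma * Q[s_next].max()
    Q[s, a] += lr * (td_target - Q[s, a])
    return Q
```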
48. Partial AUC optimization based deep speaker embeddings with class-center learning for text-independent speaker verification
- Author
-
Bai, Zhongxin, Zhang, Xiao-Lei, and Chen, Jingdong
- Subjects
Computer Science - Machine Learning ,Computer Science - Sound ,Electrical Engineering and Systems Science - Audio and Speech Processing - Abstract
Deep embedding based text-independent speaker verification has demonstrated superior performance to traditional methods in many challenging scenarios. Its loss functions can be generally categorized into two classes, i.e., verification and identification. Verification loss functions match the pipeline of speaker verification, but they are difficult to implement. Thus, most state-of-the-art deep embedding methods use identification loss functions with softmax output units or their variants. In this paper, we propose a verification loss function that maximizes the partial area under the receiver operating characteristic (ROC) curve (pAUC) for deep embedding based text-independent speaker verification. We also propose a class-center based training trial construction method to improve the training efficiency, which is critical for the proposed loss function to be comparable to the identification loss in performance. Experiments on the Speakers in the Wild (SITW) and NIST SRE 2016 datasets show that the proposed pAUC loss function is highly competitive with state-of-the-art identification loss functions.
- Published
- 2019
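The pAUC metric that the loss targets restricts the ROC area to a false-positive-rate band. The paper optimizes a differentiable surrogate of it; the sketch below only computes the discrete metric itself, using each in-band negative score as a decision threshold.

```python
import numpy as np

def partial_auc(scores_pos, scores_neg, fpr_range=(0.0, 0.1)):
    """Normalized partial AUC over an FPR interval [alpha, beta]:
    average the true-positive rate at the thresholds given by the
    negative scores whose rank falls inside the FPR band."""
    alpha, beta = fpr_range
    neg_sorted = np.sort(scores_neg)[::-1]
    n_neg = len(neg_sorted)
    lo = int(np.floor(alpha * n_neg))
    hi = int(np.ceil(beta * n_neg))
    area = 0.0
    for thr in neg_sorted[lo:hi]:        # negatives inside the FPR band
        area += np.mean(scores_pos > thr)
    return area / max(hi - lo, 1)
```

A perfectly separating scorer reaches pAUC = 1.0 on any band, and a scorer that ranks all positives below the negatives reaches 0.0.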
49. Learning deep representations by multilayer bootstrap networks for speaker diarization
- Author
-
Li, Meng-Zhen and Zhang, Xiao-Lei
- Subjects
Electrical Engineering and Systems Science - Audio and Speech Processing - Abstract
The performance of speaker diarization is strongly affected by its clustering algorithm at the test stage. However, clustering algorithms are known to be sensitive to random noise and small variations, particularly when they suffer from weaknesses such as bad local minima and prior assumptions. To deal with this problem, a compact representation of speech segments with small within-class variances and large between-class distances is usually needed. In this paper, we apply an unsupervised deep model, named multilayer bootstrap network (MBN), to further process the embedding vectors of speech segments. MBN is an unsupervised deep model for nonlinear dimensionality reduction. Unlike traditional neural-network-based deep models, it is a stack of $k$-centroids clustering ensembles, each of which is trained simply by random resampling of data and one-nearest-neighbor optimization. We construct speaker diarization systems by combining MBN with either an i-vector or an x-vector frontend, and evaluate their effectiveness on a simulated NIST diarization dataset, the AMI meeting corpus, and the NIST SRE 2000 CALLHOME database. Experimental results show that the proposed systems are better than or at least comparable to systems that do not use MBN., Comment: 5 pages, 4 figures, conference
- Published
- 2019
50. Deep topic modeling by multilayer bootstrap network and lasso
- Author
-
Wang, Jianyu and Zhang, Xiao-Lei
- Subjects
Computer Science - Machine Learning ,Statistics - Machine Learning - Abstract
Topic modeling is widely studied for the dimensionality reduction and analysis of documents. However, it is formulated as a difficult optimization problem, and current approximate solutions also suffer from inaccurate model or data assumptions. To deal with these problems, we propose a polynomial-time deep topic model with no model or data assumptions. Specifically, we first apply the multilayer bootstrap network (MBN), an unsupervised deep model, to reduce the dimension of documents, and then use the low-dimensional data representations or their clustering results as the target of supervised Lasso for topic word discovery. To our knowledge, this is the first time that MBN and Lasso have been applied to unsupervised topic modeling. Experimental comparisons with five representative topic models on the 20-newsgroups and TDT2 corpora illustrate the effectiveness of the proposed algorithm.
- Published
- 2019