Sparse Wavelet Decomposition and Filter Banks with CNN Deep Learning for Speech Recognition

Authors :: Lizhe Tan
Yaan Zhang
Jean Jiang
Jintao Hou
Jingzhao Dai
Xiewen Wang
Source :: EIT
Publication Year :: 2019
Publisher :: IEEE, 2019.
Abstract: This paper proposes speech recognition algorithms that use CNN deep learning based on sparse discrete wavelet decomposition (SDWD) and bandpass filter banks (BPFB). The proposed algorithms consist of three stages. First, the speech signal is decomposed into sub-band signals according to the Mel filter bank frequency specification using the SDWD or BPFB. The power values from the sub-bands form a feature vector for each speech frame, and cascading the feature vectors of consecutive speech frames constructs a two-dimensional feature image. Second, each feature image is subjected to flipping operations to reduce the edge effect when using the standard CNN. Finally, CNN deep learning is adopted for training and recognition. The experimental results demonstrate that the proposed SDWD-CNN and BPFB-CNN outperform the support vector machine (SVM), K-nearest neighbors (KNN), and random forest (RF) algorithms.
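
The first two stages described in the abstract can be illustrated with a rough sketch. This is not the authors' implementation: it assumes ideal (brick-wall) DFT bands in place of the SDWD or BPFB, the common 2595·log10(1 + f/700) Mel formula, non-overlapping frames, and a 2x2 mirror tiling as one possible reading of the "flipping operations". All function names and parameters below are hypothetical.

```python
import cmath
import math


def hz_to_mel(f):
    # Common Mel-scale formula (an assumption; the record does not state one).
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_band_edges(n_bands, f_low, f_high):
    """Return n_bands + 1 edge frequencies (Hz), equally spaced on the Mel scale."""
    m_low, m_high = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (m_high - m_low) / n_bands
    return [mel_to_hz(m_low + i * step) for i in range(n_bands + 1)]


def band_powers(frame, fs, edges):
    """Power of one frame in each band, summing one-sided DFT bins per band.

    Stand-in for the SDWD/BPFB stage: bins are assigned to ideal bands.
    The bin at exactly the top edge is excluded by the half-open interval.
    """
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        s = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        spectrum.append(abs(s) ** 2 / n)
    return [
        sum(p for k, p in enumerate(spectrum) if lo <= k * fs / n < hi)
        for lo, hi in zip(edges[:-1], edges[1:])
    ]


def feature_image(signal, fs, frame_len, n_bands):
    """Stack per-frame band-power vectors: rows are bands, columns are frames."""
    edges = mel_band_edges(n_bands, 0.0, fs / 2.0)
    starts = range(0, len(signal) - frame_len + 1, frame_len)
    cols = [band_powers(signal[i:i + frame_len], fs, edges) for i in starts]
    return [[col[b] for col in cols] for b in range(n_bands)]


def flip_extend(image):
    """Mirror the image left-right and up-down into a 2x2 tile (assumed flipping scheme)."""
    top = [row + row[::-1] for row in image]
    return top + top[::-1]


# Usage: a 1 kHz tone sampled at 8 kHz, 4 frames of 64 samples, 8 Mel bands.
fs = 8000
tone = [math.sin(2 * math.pi * 1000 * t / fs) for t in range(256)]
image = feature_image(tone, fs, frame_len=64, n_bands=8)
extended = flip_extend(image)
```

Under these assumptions, the extended image has twice as many rows and columns as the original, and its opposite borders are identical. That is one way to soften boundary discontinuities before convolution. The resulting image would then be passed to a CNN, which is outside the scope of this sketch.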

Subjects :: Computer science
business.industry
Deep learning
Feature vector
Speech recognition
Frame (networking)
Filter (signal processing)
Filter bank
Random forest
Support vector machine
030507 speech-language pathology & audiology
03 medical and health sciences
Feature (machine learning)
Artificial intelligence
0305 other medical science
business

Database :: OpenAIRE
Journal :: 2019 IEEE International Conference on Electro Information Technology (EIT)
Accession number :: edsair.doi...........f81d6d054724df2be775df5e91b6935a
Full Text :: https://doi.org/10.1109/eit.2019.8833972
