
A high-performance speech BioHashing retrieval algorithm based on audio segmentation.

Authors :
Huang, Yi-Bo
Chen, De-Huai
Hua, Bo-Run
Zhang, Qiu-Yu
Source :
Computer Speech & Language. Jan2024, Vol. 83, pN.PAG-N.PAG. 1p.
Publication Year :
2024

Abstract

As one of the research hotspots in the field of speech recognition, content-based speech retrieval algorithms can detect speech information with the same content features, which improves computer intelligence while reducing labor costs, and they have therefore been widely used. Although most current speech content retrieval algorithms guarantee excellent retrieval performance for small-scale speech retrieval, their performance is greatly reduced under the constraints of large speech data storage space and high content redundancy. To solve these problems, this paper proposes a high-performance speech BioHashing retrieval algorithm based on audio segmentation. The algorithm is divided into an offline pre-processing phase and an online retrieval phase. The offline pre-processing phase converts the speech data into BioHashing sequences that carry speech content characteristics. In this process, the Power-Normalized Cepstral Coefficients (PNCC) features of the speech data are first extracted, and biometric templates with single mapping keys are constructed from the PNCC features to obtain BioHashing sequences. Then, the original speech is sliced into short-time audio segments by the proposed audio segmentation algorithm, and a hash reconstruction operation is performed on the BioHashing sequences to obtain the reconstructed hash sequences used for online retrieval. The online retrieval phase responds to users' query requests: it finds the hash index that matches the query hash sequence in the BioHashing index table and returns to the user, as the retrieval result, the entry corresponding to the hash index whose standardized editing distance (SED) value is closest to 1. The experimental results show that the reconstructed hash sequences obtained after removing the silent redundant segments have better robustness and discrimination.
Moreover, the algorithm achieves 100% retrieval accuracy for the original speech clips, with an average retrieval time of only 0.0157 s, which shows that it has good retrieval performance and can meet the needs of speech retrieval in various environments.
• The audio segmentation algorithm is applied to speech retrieval.
• Normalized editing distances are used to match hash sequences of different lengths.
• The machine learning classification algorithm is applied to the speech feature data.
• Revocable biometric templates are constructed, and secondary distribution retrieval is implemented.
[ABSTRACT FROM AUTHOR]
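
The matching step described in the abstract, comparing hash sequences of different lengths with a normalized editing distance and returning the entry whose score is closest to 1, can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not define SED, so the normalization used here (one minus the Levenshtein distance divided by the longer sequence length) is an assumption, as are the function names, the representation of hash sequences as strings of "0" and "1", and the toy index table.

```python
from typing import Dict, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two sequences, keeping two rows in memory."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Assumed normalization: 1.0 for identical sequences, lower otherwise."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


def retrieve(query: str, index: Dict[str, str]) -> Tuple[str, float]:
    """Return the id and score of the indexed hash whose score is closest to 1."""
    best_id = max(index, key=lambda clip_id: similarity(query, index[clip_id]))
    return best_id, similarity(query, index[best_id])


# Hypothetical index table mapping clip ids to hash sequences of varying length.
index_table = {
    "clip_a": "10110010",
    "clip_b": "0100111",
    "clip_c": "1111",
}

# A query that differs from clip_a by one missing trailing bit.
best_id, score = retrieve("1011001", index_table)
```

Because the score is normalized by the longer of the two lengths, sequences of different lengths can be ranked on the same 0 to 1 scale. A linear scan like this one is only for illustration and says nothing about how the paper reaches its reported retrieval time.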

Details

Language :
English
ISSN :
0885-2308
Volume :
83
Database :
Academic Search Index
Journal :
Computer Speech & Language
Publication Type :
Academic Journal
Accession number :
171991642
Full Text :
https://doi.org/10.1016/j.csl.2023.101551