1. Scalable algorithms for unsupervised clustering of acoustic data for speech recognition.
- Author
- Rath, Shakti P.
- Subjects
- *ALGORITHMS, *AUTOMATIC speech recognition, *ACOUSTICS, *CLUSTER analysis (Statistics), *ARTIFICIAL neural networks
- Abstract
In this paper, an unsupervised clustering algorithm is developed for acoustic data in the context of speech recognition tasks. One of the key features of the algorithm is its scalability to large data sets. Specifically, given the unlabeled training and test sets, the class-labels of the utterances are obtained in an automatic manner. The extracted labels may correspond to the speakers in the speech corpus if the data is relatively clean. The proposed scheme is attractive from an industrial perspective, as it alleviates the need to record speaker-labels manually, saving a considerable amount of human effort and expense. The core of the algorithm comprises a three-stage architecture that processes the input data in sequence, with each stage designed to perform a well-defined and specific task. In more detail, the first pass involves a bottom-up clustering mechanism, the second pass comprises a cluster-splitting operation and the third pass consists of a cluster-refining process. Each of the stages allows for data parallelization across multiple CPUs, which leads to faster computation. Two alternative forms of the algorithm are presented to facilitate the clustering: the first uses Gaussian distributions and the other i-Vectors. Although the algorithm may find applications in various realms of speech recognition, in this paper the effectiveness of the schemes is evaluated by means of speaker adaptive training (SAT) and speaker-aware training of DNN-HMM acoustic models. In particular, experiments are conducted on the Switchboard task to extract the speaker-labels for the utterances in the training and test sets. It is shown that the SAT DNN-HMM trained using the Gaussian-based scheme yields a 7.2% relative improvement in ASR accuracy over the speaker-independent DNN-HMM, whereas the i-Vector approach provides an additional improvement, amounting to a 10.8% relative gain overall.
The standard SAT DNN-HMM developed using the ground-truth speaker-labels is found to be only 2.7% better, in relative terms, than the proposed scheme. A similar observation is made with speaker-aware training. The analysis of computational complexity, conducted stage by stage, demonstrates the scalability of the proposed algorithms. [ABSTRACT FROM AUTHOR]
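The three-pass architecture described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: plain Euclidean distance stands in for the Gaussian- or i-Vector-based similarity measures, and the function names, thresholds (`thresh`, `max_spread`) and split heuristic are all assumptions made for the example.

```python
import math

def centroid(pts):
    # Component-wise mean of a list of equal-length vectors.
    dim = len(pts[0])
    return [sum(p[d] for p in pts) / len(pts) for d in range(dim)]

def dist(a, b):
    # Euclidean distance (stand-in for the paper's similarity measures).
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def bottom_up(X, thresh):
    # Pass 1: bottom-up (agglomerative) clustering. Start with one
    # cluster per utterance vector and repeatedly merge the closest
    # pair of centroids until the smallest gap exceeds `thresh`
    # (a hypothetical stopping threshold).
    clusters = [[i] for i in range(len(X))]
    while len(clusters) > 1:
        cents = [centroid([X[i] for i in c]) for c in clusters]
        best = None
        for i in range(len(cents)):
            for j in range(i + 1, len(cents)):
                d = dist(cents[i], cents[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        if best[0] > thresh:
            break
        _, i, j = best
        clusters[i].extend(clusters[j])
        del clusters[j]
    return clusters

def split_pass(X, clusters, max_spread):
    # Pass 2: cluster splitting. Any cluster that is too wide along
    # some dimension is divided at the median of that dimension.
    out = []
    for c in clusters:
        dims = range(len(X[0]))
        spreads = [max(X[i][d] for i in c) - min(X[i][d] for i in c)
                   for d in dims]
        widest = max(spreads)
        if len(c) > 1 and widest > max_spread:
            d = spreads.index(widest)
            med = sorted(X[i][d] for i in c)[len(c) // 2]
            left = [i for i in c if X[i][d] < med]
            right = [i for i in c if X[i][d] >= med]
            if left and right:
                out += [left, right]
                continue
        out.append(c)
    return out

def refine_pass(X, clusters, iters=5):
    # Pass 3: cluster refinement. Reassign every vector to the nearest
    # cluster centroid for a few iterations (k-means-style), dropping
    # clusters that end up empty.
    for _ in range(iters):
        cents = [centroid([X[i] for i in c]) for c in clusters]
        assign = [[] for _ in clusters]
        for idx, x in enumerate(X):
            k = min(range(len(cents)), key=lambda j: dist(cents[j], x))
            assign[k].append(idx)
        clusters = [c for c in assign if c]
    return clusters
```

Each pass takes the previous pass's output as input, mirroring the stage-by-stage design; in the paper, each stage is additionally data-parallelized across CPUs, which the loops above omit for clarity.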
- Published
- 2017