
Continuous, speaker-independent, speech recognition for a speech to viseme translator

Authors :
Kelleher, Holly
Publication Year :
1999
Publisher :
University of Surrey, 1999.

Abstract

The work presented in this thesis forms part of a research project which attempts to generate a visualisation of a speaker's mouth from purely acoustic speech signals. The aim is to provide an aid for partially hearing-impaired people in which visual information is presented alongside limited acoustic signals, facilitating easier use of the telephone. The system is essentially a low-level speech recogniser in which phonemic information is extracted from the speech waveform and mapped onto visemes generated on a synthetic facial image. This thesis describes a major part of this project, namely the development of an accurate phoneme discriminator capable of speaker-independent operation on continuous speech. The recognition process is realised in three stages: a pre-processor to convert the speech into a suitable parametric form; a pattern recogniser to identify the possible phoneme classes; and a post-processor to produce the viseme information. The pattern recognition stage uses a self-organising Kohonen network, followed by a Learning Vector Quantiser (LVQ) to further improve the recognition accuracy. The performance of this stage is highly dependent on the choice of pre-processor used at the input to the network, and the design of the pre-processor stage forms a significant part of this work. A novel technique known as the pseudo-cepstrum forms the basis of this pre-processor. Extensive investigations have been conducted into the dependence of performance on a range of parameters, both at the pre-processor stage and within the Kohonen classifier. In particular, a performance comparison of several pre-processor techniques, including the pseudo-cepstrum, has been carried out. Factors affecting both the training and operation of the classifier are also described, with the sensitivity of recognition performance to the input data being a major issue. Overall recognition accuracies of 80% have been achieved.
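
The pattern-recognition and post-processing stages described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the map size, learning rates, decay schedules, the toy two-dimensional vectors standing in for pre-processor output, and the `PHONEME_TO_VISEME` table are all assumptions made for illustration, and the pseudo-cepstrum pre-processor is not reproduced. The sketch trains a small one-dimensional Kohonen map without labels, labels each unit by majority vote, refines the units with the standard LVQ1 update, and maps the recognised phoneme to a viseme class.

```python
import random
from collections import Counter

# Hypothetical many-to-one phoneme-to-viseme table (illustrative only).
PHONEME_TO_VISEME = {"p": "bilabial", "b": "bilabial", "m": "bilabial",
                     "f": "labiodental", "v": "labiodental"}


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(units, x):
    """Index of the codebook unit closest to feature vector x."""
    return min(range(len(units)), key=lambda i: sq_dist(units[i], x))


def train_som(data, n_units=4, epochs=20, seed=0):
    """Unsupervised stage: 1-D Kohonen map with a shrinking neighbourhood."""
    rng = random.Random(seed)
    units = [list(rng.choice(data)) for _ in range(n_units)]
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs            # decays from 1 towards 0
        lr = 0.5 * frac                        # assumed learning-rate schedule
        radius = (n_units / 2) * frac          # assumed neighbourhood schedule
        for x in rng.sample(data, len(data)):  # shuffled pass over the data
            bmu = nearest(units, x)            # best-matching unit
            for i, w in enumerate(units):
                if abs(i - bmu) <= radius:     # update BMU and map neighbours
                    for d in range(len(w)):
                        w[d] += lr * (x[d] - w[d])
    return units


def label_units(units, data, labels):
    """Label each unit with the majority class of the samples it wins."""
    votes = [Counter() for _ in units]
    for x, y in zip(data, labels):
        votes[nearest(units, x)][y] += 1
    unit_labels = []
    for i, v in enumerate(votes):
        if v:
            unit_labels.append(v.most_common(1)[0][0])
        else:
            # Unit won no samples: borrow the label of the closest sample.
            j = min(range(len(data)), key=lambda k: sq_dist(units[i], data[k]))
            unit_labels.append(labels[j])
    return unit_labels


def lvq1(units, unit_labels, data, labels, epochs=10, lr=0.1):
    """Supervised stage: LVQ1 pulls the winning unit towards a sample of its
    own class and pushes it away from a sample of a different class."""
    units = [list(u) for u in units]           # leave the caller's map intact
    for epoch in range(epochs):
        step = lr * (1.0 - epoch / epochs)
        for x, y in zip(data, labels):
            i = nearest(units, x)
            sign = 1.0 if unit_labels[i] == y else -1.0
            for d in range(len(x)):
                units[i][d] += sign * step * (x[d] - units[i][d])
    return units


def classify(units, unit_labels, x):
    """Phoneme label of the unit nearest to feature vector x."""
    return unit_labels[nearest(units, x)]


if __name__ == "__main__":
    # Toy 2-D "feature vectors" standing in for pre-processor output.
    data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
            (5.0, 5.0), (5.0, 6.0), (6.0, 5.0), (6.0, 6.0)]
    labels = ["p"] * 4 + ["f"] * 4
    units = train_som(data)
    unit_labels = label_units(units, data, labels)
    units = lvq1(units, unit_labels, data, labels)
    phoneme = classify(units, unit_labels, (0.5, 0.5))
    print(phoneme, PHONEME_TO_VISEME[phoneme])
```

Running the unsupervised map first and applying LVQ only afterwards mirrors the order given in the abstract: the Kohonen stage places the codebook vectors, and the supervised stage adjusts them near class boundaries.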

Subjects

Subjects :
621.3994
Acoustic

Details

Language :
English
Database :
British Library EThOS
Publication Type :
Dissertation/ Thesis
Accession number :
edsble.298086
Document Type :
Electronic Thesis or Dissertation