Linguistic units and units of speech production

Authors :: Celia Scully
Source :: Speech Communication. 6:77-142
Publication Year :: 1987
Publisher :: Elsevier BV, 1987.
Abstract: Links are needed to bridge the gap between the analysis of speech as a set of discrete, ordered but durationless linguistic units and analyses of the continuously changing acoustic signals, defined along a time axis. Current recognition and synthesis devices do not make good use of the structure imposed by speech production processes on the mapping between an allophone sequence and the many possible associated speech signals. A quantitative, flexible articulatory time framework has been developed as a contribution to the new kinds of phonetic descriptions needed. Units of articulation for allophones of the phonemes of British English and methods for linking adjacent allophones are proposed. Tentative specifications for a sub-set are offered, based on a review of published findings for natural speech. Articulatory schemes are taken to be organised with reference to particular events E. Pairs of events need to be appropriately coordinated in time. The two events may relate to inter-articulator coordination between two different quasi-independent articulators or to the durational extent of a statically maintained state for a single articulator. The coordination between the two events is expressed through the duration D of the time interval between them. Six examples are given of the construction of a complete articulatory time plan for an English sequence. This forms the first stage for a computer-implemented model of the articulatory, aerodynamic and acoustic processes of speech production. The synthetic speech output from the model is given acoustic variations intended to mimic those arising in natural speech due to a speaker's choice of options, including a change in rate of speech. This is achieved in the modelling by altering one or more D values in the articulatory time plan and by dispensing with some optional actions. The variability of multiple repetitions by a real speaker can be introduced into the synthetic speech by perturbing the D values.
The model needs to be matched to specific real speakers in order to assess the extent to which it is realistic in its simulation of the variation and variability of acoustic pattern features for natural speech and the extent to which covariations can be predicted with it.
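
The idea of a time plan built from events E and inter-event durations D can be illustrated with a minimal Python sketch. This is not the paper's implementation: the event names, the D values in milliseconds, and the helper functions `event_times`, `scale_rate` and `perturb` are all hypothetical, chosen only to show how altering, dropping or perturbing D values changes the resulting event times.

```python
import random
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Interval:
    """Duration D (ms) from a reference event to a dependent event."""
    ref: str
    target: str
    d_ms: float            # may be negative if the target precedes the reference
    optional: bool = False  # marks an action that may be dispensed with


def event_times(intervals, anchor, t0=0.0):
    """Resolve absolute event times by chaining D values outward from the anchor.

    Assumes each event is the target of at most one interval.
    """
    times = {anchor: t0}
    pending = list(intervals)
    while pending:
        ready = [iv for iv in pending if iv.ref in times]
        if not ready:
            raise ValueError("some events cannot be reached from the anchor")
        for iv in ready:
            times[iv.target] = times[iv.ref] + iv.d_ms
        pending = [iv for iv in pending if iv not in ready]
    return times


def scale_rate(intervals, factor, drop_optional=False):
    """Mimic a change in speech rate: scale every D, optionally dropping optional actions."""
    return [
        replace(iv, d_ms=iv.d_ms * factor)
        for iv in intervals
        if not (drop_optional and iv.optional)
    ]


def perturb(intervals, sd_ms, rng):
    """Mimic token-to-token variability: add Gaussian noise to every D."""
    return [replace(iv, d_ms=iv.d_ms + rng.gauss(0.0, sd_ms)) for iv in intervals]


# Hypothetical plan; the names and numbers are illustrative, not taken from the paper.
PLAN = [
    Interval("E_closure", "E_release", 80.0),
    Interval("E_release", "E_voice_onset", 60.0),
    Interval("E_release", "E_optional_gesture", 20.0, optional=True),
]

if __name__ == "__main__":
    print(event_times(PLAN, "E_closure"))
    print(event_times(scale_rate(PLAN, 0.5, drop_optional=True), "E_closure"))
    print(event_times(perturb(PLAN, 5.0, random.Random(1)), "E_closure"))
```

Uniform scaling of every D is a simplification made for the sketch; the abstract speaks of altering one or more D values, which would correspond to replacing individual intervals instead.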