Modelling of voice quality correlates
Formant-Tracking

Formant tracking is a well-investigated problem, and a number of solutions have been proposed over the years with varying degrees of success. Although the term formant is reserved for the resonant frequencies of the vocal tract, formant tracking techniques are mainly based on spectral observations of the speech signal. They thereby ignore the fact that the speech signal is the result of the convolution of a non-stationary excitation with the vocal tract response. Pre-emphasising the speech minimises the effect of the glottal excitation to some extent, but is far from eliminating it. The study of the frequency-domain behaviour of synthetic source excitation and speech indicates that formant tracking can be significantly improved by adequate source-vocal tract de-convolution.

The glottal excitation energy is predominantly confined to the low-frequency part of the spectrum, in the range of the first and second formants. The spectral shape of the source excitation can be described as having a resonant peak (the glottal formant) in the region of roughly 0-600 Hz (this range remains to be verified), after which it decreases, on average, monotonically. From the LF model perspective, the glottal formant is the additive result of a sinusoid enveloped by a rising exponential (of duration te) and the glottal signal from then onwards until the onset of the closing phase. Different LF parameter settings produce a glottal formant of varying frequency, magnitude and bandwidth, as well as varying behaviour below the glottal formant and a varying slope above it.

As far as formant tracking is concerned, the most significant feature of the source excitation is the glottal formant, as it is high in energy and lies in the region where the first formant occurs. However, because the glottal energy decreases relatively slowly with increasing frequency (around 6 dB/oct), the estimation of the second formant is also affected.
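To illustrate why pre-emphasis only partly compensates for the source, the sketch below evaluates the gain of a first-order pre-emphasis filter y[n] = x[n] - a*x[n-1]. The coefficient a = 0.97 and the 16 kHz sampling rate are assumptions chosen for illustration; the text does not state which pre-emphasis filter was used. The filter is a smooth tilt that approaches +6 dB/oct over part of the band, which can offset the average roll-off of the source, but it has no resonance of its own and so cannot cancel a peak such as the glottal formant.

```python
import cmath
import math

def preemphasis_gain_db(f, a=0.97, fs=16000.0):
    """Gain in dB at frequency f (Hz) of the filter y[n] = x[n] - a*x[n-1].

    a and fs are illustrative assumptions, not values taken from the text.
    """
    h = 1.0 - a * cmath.exp(-2j * math.pi * f / fs)
    return 20.0 * math.log10(abs(h))

def octave_slope_db(f, a=0.97, fs=16000.0):
    """Change in gain (dB) when moving one octave up, from f to 2*f."""
    return preemphasis_gain_db(2.0 * f, a, fs) - preemphasis_gain_db(f, a, fs)

# Print the gain and the local slope at a few frequencies in the
# region of the glottal formant and the first two formants.
for f in (50.0, 100.0, 250.0, 500.0, 1000.0, 2000.0):
    print(f"{f:6.0f} Hz: gain {preemphasis_gain_db(f):6.1f} dB, "
          f"slope to next octave {octave_slope_db(f):4.1f} dB")
```

The slope is fixed by the single coefficient a, so the same correction is applied regardless of how the glottal formant moves with the LF parameters.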
The experiments have shown that formant estimates based on the observed frequency spectrum of synthetic speech can differ by up to 25% in the frequency of the first formant from the value used to generate the signal. Formants other than the first two are estimated with a high degree of success, as the energy of the glottal excitation at those frequencies is very low in comparison with that of the formants. Studies have shown that the first two formants are very active in terms of their movement across the frequency space compared with the other formants. This correlates well with our study of the effect of glottal excitation on formant estimation, which indicates that this varying behaviour could be at least partially due to the glottal excitation. A formant and LF parameter tracking technique based on glottal excitation-vocal tract de-convolution would yield better estimates of the formants and, most importantly, would significantly reduce the effect of the glottal excitation. This would enable a better study of both formants and glottal excitation, and of the extent to which they carry accent, speaker, and phonetic identity.
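The bias described above can be sketched with a toy frequency-domain model. It is not the experiment reported here and does not attempt to reproduce the 25% figure. As a simplifying assumption, both the glottal formant and the first formant are represented by two-pole resonators, and all centre frequencies, bandwidths and the sampling rate are hypothetical. Because convolution in time is multiplication in frequency, the observed spectrum is the product of the two responses, and its peak is pulled away from the true first formant towards the glottal formant. Dividing the source response back out (an idealised de-convolution with a perfectly known source) restores the peak of the vocal tract response.

```python
import cmath
import math

FS = 16000.0  # sampling rate in Hz (assumed for illustration)

def two_pole_gain(f, centre, bandwidth, fs=FS):
    """Magnitude response at f (Hz) of a two-pole digital resonator."""
    r = math.exp(-math.pi * bandwidth / fs)      # pole radius
    theta = 2.0 * math.pi * centre / fs          # pole angle
    z_inv = cmath.exp(-2j * math.pi * f / fs)
    denom = 1.0 - 2.0 * r * math.cos(theta) * z_inv + r * r * z_inv * z_inv
    return 1.0 / abs(denom)

def peak_frequency(spectrum, lo, hi, step=1.0):
    """Grid search for the frequency in [lo, hi] Hz where spectrum(f) is largest."""
    n = int((hi - lo) / step) + 1
    freqs = [lo + i * step for i in range(n)]
    return max(freqs, key=spectrum)

# Hypothetical parameter values, chosen only to make the effect visible.
F1_TRUE, F1_BW = 500.0, 120.0          # first formant of the "vocal tract"
GLOTTAL_F, GLOTTAL_BW = 200.0, 300.0   # glottal formant of the "source"

def vocal_tract(f):
    return two_pole_gain(f, F1_TRUE, F1_BW)

def source(f):
    return two_pole_gain(f, GLOTTAL_F, GLOTTAL_BW)

def speech(f):
    # Convolution in time corresponds to a product of spectra.
    return source(f) * vocal_tract(f)

# Peak picking on the speech spectrum versus on the de-convolved spectrum.
observed = peak_frequency(speech, 300.0, 1000.0)
deconvolved = peak_frequency(lambda f: speech(f) / source(f), 300.0, 1000.0)
discrepancy = abs(observed - F1_TRUE) / F1_TRUE

print(f"true F1: {F1_TRUE:.0f} Hz")
print(f"peak of speech spectrum: {observed:.0f} Hz ({100 * discrepancy:.1f}% off)")
print(f"peak after removing the source: {deconvolved:.0f} Hz")
```

In practice the source response is not known in advance, which is why the LF parameters would have to be estimated jointly with the formants.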
Any comments, questions or suggestions are very welcome and should be directed to
emir.turajlic@brunel.ac.uk.