Modelling of voice quality correlates
Introduction to Voice Quality

Definition: Voice quality is the perceived characteristic "acoustic coloring" of a voice. It derives from a variety of laryngeal and supralaryngeal features that are not unique to one individual but form clusters of identifiable voice types. Modal, breathy, pressed, creaky, tense, harsh and nasal voice are all examples of different voice types. Voice quality is an effect of vocal tract anatomy, laryngeal anatomy and vocal habits, as the following examples illustrate. The characteristic sound of the voice is brought about by the mode of vibration of the vocal folds (vocal cords). Differences in the degree and manner of glottal closure distinguish modal, breathy and whispery voice. The quality of the voice also depends on the degree of tension in the larynx and pharynx, and on the vertical position of the larynx: a raised larynx produces a thin, tense voice, and a lowered larynx a booming voice.

Perceptual importance: In English, apart from distinguishing voiced and voiceless sounds, voice quality does not make linguistic contrasts, but it conveys information about the speaker. In some languages, such as Gujarati and Mazatec, differences in voice quality or pitch trajectory are used to convey linguistic meaning. Languages and dialects have characteristic voice qualities, and personal voice quality enables a listener to recognize a particular individual. The quality of someone's voice also conveys emotions and attitudes.

Anatomy of human voice: To understand the different voice types, it is helpful to link them to the anatomy that produces the voice. For this purpose I will refer to the myoelastic-aerodynamic theory of vocal fold vibration. Figure 1 shows a model of the human voice production system. It consists of the subglottal area (containing the diaphragm, trachea and lungs), the larynx (containing the vocal folds) and the supraglottal area (the vocal tract).
The vocal folds are made of muscle, tendon and mucous membrane, and have variable mass and elasticity. Between the folds lies the glottis, whose variable geometry controls the airflow into the vocal tract. (For information only: above the vocal folds there is a pair of so-called false vocal folds, but they are rarely used in phonation and hence are not considered in this project.) The vocal tract consists of the oral and nasal cavities, the pharyngeal cavity and the laryngeal tube. The human voice is a series of puffs of air separated by (partial) closures of the vocal folds between puffs. Voice is thereby produced by vibrations of the vocal folds driven by air pressure from the lungs, and it is characterised by the shape and physiology of the vocal folds and larynx.
The main factors affecting the vibrations are
Phonation can thus be defined as a self-sustaining, quasi-periodic oscillation of the vocal folds that arises from the interaction of muscular (myoelastic) and aerodynamic forces at the glottis. For a male speaker the folds complete a cycle of opening and closing at a rate of roughly 100 Hz, i.e. about one cycle every 10 ms. This is too rapid for the human ear to discriminate individual cycles. However, we are sensitive to changes in the overall rate of vibration and perceive them as changes in the pitch of the voice. The laryngeal anatomy is illustrated in Figure 2.
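The link between the rate of vocal fold vibration and perceived pitch can be made concrete with a short sketch. Assuming a sampled, quasi-periodic signal, the code below estimates the fundamental frequency (F0) by picking the lag of maximum autocorrelation within a plausible pitch range. Autocorrelation pitch tracking is a standard technique swapped in here for illustration; the function names and the 50 to 400 Hz search range are hypothetical choices, not part of the original text.

```python
import math


def autocorrelation(x, lag):
    """Unnormalised autocorrelation of x at a given lag (in samples)."""
    return sum(x[n] * x[n + lag] for n in range(len(x) - lag))


def estimate_f0(x, fs_hz, f0_min_hz=50.0, f0_max_hz=400.0):
    """Estimate F0 as fs / (lag of maximum autocorrelation).

    Only lags corresponding to f0_min_hz..f0_max_hz are searched; this range is
    an assumption for adult speech. The unnormalised sum slightly favours short
    lags, which helps avoid picking a multiple of the true period.
    """
    lag_min = max(1, int(fs_hz / f0_max_hz))
    lag_max = min(int(fs_hz / f0_min_hz), len(x) - 1)
    best_lag = max(range(lag_min, lag_max + 1), key=lambda k: autocorrelation(x, k))
    return fs_hz / best_lag


if __name__ == "__main__":
    fs = 8000.0
    # Synthetic test signal: 100 Hz fundamental plus a weaker second harmonic.
    x = [math.sin(2 * math.pi * 100 * n / fs) + 0.5 * math.sin(2 * math.pi * 200 * n / fs)
         for n in range(2000)]
    f0 = estimate_f0(x, fs)
    print(f"F0 ~ {f0:.1f} Hz, period ~ {1000 / f0:.1f} ms")
```

The estimator returns the cycle rate, not the individual cycles, which mirrors the point above: the ear tracks the overall rate of vibration rather than resolving each opening and closing.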
The intrinsic muscles of the larynx may be categorised by function, i.e. by their effect on the shape of the glottis and on the vibratory behaviour of the vocal folds. The basic laryngeal adjustments for the different phonatory settings are: a) abduction or adduction of the vocal folds; b) constriction of the supraglottal structures; c) adjustment of the length, stiffness and thickness of the vocal folds; d) elevation or lowering of the larynx.

The glottal flow excites the vocal tract resonator. The vocal tract provides two paths for the excitation airflow, through the nasal and/or the oral cavity, and the path is controlled by the velum: the lower the velum (i.e. the more open the velopharyngeal port), the more nasal the sound. Together with the other articulators (lips, tongue and jaw), the velum controls the spectral shape of the resonator. Note that the source excitation and the vocal tract are not fully independent systems. Although the interaction between the two is usually assumed to be negligible because of the high glottal impedance, the vocal tract configuration is to a certain degree correlated with the source excitation signal.

Correlates of perceived voice quality: The perceived voice quality depends inherently on the following aspects of speech: glottal pulse shape, pitch, formants, formant bandwidths, formant intensities, nasality and voice settings.

Glottal pulse shape: The glottal pulse shape is the single most important aspect of speech in the way it affects our perception of voice quality. The most common mathematical representation of the derivative of the glottal airflow is the LF (Liljencrants / Fant) glottal pulse model.
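The source-filter view described above (glottal flow exciting a vocal tract resonator whose spectral shape is set by the articulators) can be illustrated with a minimal sketch. Assumptions: the glottal source is crudely approximated by a unit-impulse train rather than a realistic pulse shape, each formant is modelled by a Klatt-style second-order digital resonator, the formants are cascaded, and the formant frequencies and bandwidths in `FORMANTS` are hypothetical values for an open vowel. All function names are hypothetical, and the source-tract interaction mentioned above is ignored.

```python
import cmath
import math


def resonator_coeffs(freq_hz, bw_hz, fs_hz):
    """Second-order resonator coefficients (Klatt-style), unity gain at DC."""
    t = 1.0 / fs_hz
    c = -math.exp(-2.0 * math.pi * bw_hz * t)
    b = 2.0 * math.exp(-math.pi * bw_hz * t) * math.cos(2.0 * math.pi * freq_hz * t)
    a = 1.0 - b - c
    return a, b, c


def resonate(signal, freq_hz, bw_hz, fs_hz):
    """Filter a signal with y[n] = a*x[n] + b*y[n-1] + c*y[n-2]."""
    a, b, c = resonator_coeffs(freq_hz, bw_hz, fs_hz)
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def resonator_gain(f_hz, freq_hz, bw_hz, fs_hz):
    """Magnitude response of one resonator evaluated at frequency f_hz."""
    a, b, c = resonator_coeffs(freq_hz, bw_hz, fs_hz)
    z_inv = cmath.exp(-2j * math.pi * f_hz / fs_hz)
    return abs(a / (1.0 - b * z_inv - c * z_inv ** 2))


def pulse_train(f0_hz, fs_hz, n_samples):
    """Crude glottal excitation: one unit impulse per pitch period."""
    period = round(fs_hz / f0_hz)
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]


# Hypothetical (frequency, bandwidth) pairs in Hz, roughly an open vowel.
FORMANTS = ((700.0, 130.0), (1220.0, 70.0), (2600.0, 160.0))


def synthesize(f0_hz=100.0, fs_hz=10000.0, n_samples=2000, formants=FORMANTS):
    """Pass the pulse-train source through the cascaded formant resonators."""
    y = pulse_train(f0_hz, fs_hz, n_samples)
    for freq, bw in formants:
        y = resonate(y, freq, bw, fs_hz)
    return y
```

Changing the formant list or the bandwidths reshapes the spectral envelope while the pitch stays at `f0_hz`; this is the separation of source and filter that the paragraph relies on.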
[Figure: LF (Liljencrants / Fant) glottal pulse model, with Equations 1 & 2 and legend]
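For reference, the LF model of the glottal flow derivative $E(t)$ is commonly written in two segments, an exponentially growing sinusoid for the open phase and an exponential return phase:

$$E(t) = E_0\, e^{\alpha t} \sin(\omega_g t), \qquad 0 \le t \le t_e$$

$$E(t) = -\frac{E_e}{\varepsilon t_a}\left[e^{-\varepsilon (t - t_e)} - e^{-\varepsilon (t_c - t_e)}\right], \qquad t_e < t \le t_c$$

with $\omega_g = \pi / t_p$ ($t_p$: instant of peak flow), $E_e$ the magnitude of the negative peak at $t_e$, $t_a$ the effective duration of the return phase, $t_c$ the closure instant, $\varepsilon$ given implicitly by $\varepsilon t_a = 1 - e^{-\varepsilon (t_c - t_e)}$, and $\alpha$, $E_0$ fixed by continuity ($E(t_e) = -E_e$) and zero net flow ($\int_0^{t_c} E(t)\,dt = 0$). The Python sketch below solves these constraints numerically. It is an illustration under stated assumptions, not the implementation used in this project: the names `lf_epsilon` and `lf_model` are hypothetical, times are fractions of the fundamental period, and $\varepsilon$ and $\alpha$ are found by fixed-point iteration and bisection respectively.

```python
import math


def lf_epsilon(ta, tb, iters=200):
    """Solve eps * ta = 1 - exp(-eps * tb) for its non-zero root (needs ta < tb).

    Started from eps = 1 / ta, the fixed-point map decreases monotonically to the root.
    """
    if not 0.0 < ta < tb:
        raise ValueError("need 0 < ta < tc - te")
    eps = 1.0 / ta
    for _ in range(iters):
        eps = (1.0 - math.exp(-eps * tb)) / ta
    return eps


def lf_model(tp, te, ta, tc, ee=1.0):
    """Return (alpha, eps, E) for one LF flow-derivative pulse with period T0 = 1.

    E0 is eliminated via continuity E(te) = -ee, so the open phase is written as
    -ee * exp(alpha * (t - te)) * sin(wg * t) / sin(wg * te).
    """
    if not (0.0 < tp < te < tc <= 1.0 and te < 2.0 * tp):
        raise ValueError("need 0 < tp < te < min(tc, 2*tp) and tc <= 1")
    wg = math.pi / tp
    tb = tc - te
    eps = lf_epsilon(ta, tb)
    s, c = math.sin(wg * te), math.cos(wg * te)

    # Closed-form area of the return phase (te..tc); it is negative.
    return_area = -(ee / (eps * ta)) * ((1.0 - math.exp(-eps * tb)) / eps - tb * math.exp(-eps * tb))

    def net_area(alpha):
        # Closed-form open-phase area (0..te) plus the return-phase area.
        open_area = -ee / s * (alpha * s - wg * c + wg * math.exp(-alpha * te)) / (alpha ** 2 + wg ** 2)
        return open_area + return_area

    # net_area -> +inf as alpha -> -inf and -> return_area < 0 as alpha -> +inf,
    # so a sign change can be bracketed and then bisected.
    lo, hi = -1.0, 1.0
    while net_area(lo) < 0.0:
        lo *= 2.0
    while net_area(hi) > 0.0:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if net_area(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    alpha = 0.5 * (lo + hi)

    def flow_derivative(t):
        if t < 0.0 or t > 1.0:
            raise ValueError("t must lie within one period [0, 1]")
        if t <= te:
            return -ee * math.exp(alpha * (t - te)) * math.sin(wg * t) / s
        if t <= tc:
            return -(ee / (eps * ta)) * (math.exp(-eps * (t - te)) - math.exp(-eps * tb))
        return 0.0  # closed phase: no change in flow

    return alpha, eps, flow_derivative
```

For example, `lf_model(0.4, 0.55, 0.03, 0.7)` (hypothetical timing values) yields a pulse whose negative peak is exactly -1 at t = 0.55 and whose samples integrate to approximately zero over the period.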
Laryngeal anatomy, LF model of glottal airflow and its derivative in relation to voice quality:
In the following section, perceived voice quality will be linked to the corresponding laryngeal settings via the LF model, and thereby to the corresponding shape of the glottal airflow. A number of voice types are described to help exemplify and distinguish them. The specific characteristics of each phonation type are expressed relative to modal phonation, following the approach of Zemlin (1988), Stevens (1994), Trask (1996) and Ní Chasaide and Gobl (1997).
Further classification of pathological voice qualities: A wide range of parameters is used to describe "voice roughness", i.e. its fluctuations in the amplitude and time domains; see Baken (1987), Koike (1973), Pinto & Titze (1990), Kitajima et al. (1975), Davis (1978), Gubrynowicz et al. (1980) and Titze & Liang (1993). These studies investigate pitch perturbation over long and short periods of time (pitch jitter), as well as the distribution of pitch-period lengths relative to the normal distribution (Hays, 1988). The autocorrelation function is employed to portray shimmer, the RMS amplitude fluctuation averaged over pitch periods (Davis, 1978). Cepstral measurements are also used to portray pathological voice qualities (Gerull et al., 1992).

The level of noise in the produced speech has a strong effect on perceived voice quality, where it is heard as hoarseness, and various techniques have been developed over the years to measure it. The long-time averaged spectrum (LTAS) (Frokjaer-Jensen & Prytz, 1976; Gauffin & Sundberg, 1977, 1989) distinguishes between certain types of voice in particular frequency bands. The spectral flatness of the residual signal (Markel & Gray, 1978) reveals the dependence of voice quality on the spectral noise level. The Teager Energy Operator (Gavidia-Ceballos et al., 1996; Cairns et al., 1996) enables comparison of the energy generated by voicing (the harmonic component) with that generated by turbulent flow (friction or breath noise). Davis (1978) used the harmonic-to-noise ratio to characterise pathological voices.

Because of their high complexity and multidimensionality, existing parameterisations of pathological voices often fail in practice. The methods above are either invasive, or complex and lacking in robustness. Nor do they describe the open and speed quotients of the glottal waveform, which correlate highly with various types of voice quality. Therefore, these parameters will not be used in isolation, but as a complement to the voice quality parameters described in the previous section, in an attempt to construct a set of parameters that provides a more complete description of voice quality.
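Several of the measures surveyed above reduce to short formulas. The sketch below implements one common form of each: local jitter (mean absolute difference of consecutive pitch periods relative to the mean period), an RMS shimmer (RMS of consecutive peak-amplitude differences relative to the mean amplitude, an assumed normalisation), the discrete Teager Energy Operator psi[n] = x[n]^2 - x[n-1]*x[n+1], and spectral flatness (geometric over arithmetic mean of the power spectrum). The function names are hypothetical, the period and amplitude sequences are assumed to come from a separate pitch-marking step, and none of this is claimed to reproduce the exact procedures of the cited authors; in the Markel & Gray approach, for instance, flatness would be computed on the linear-prediction residual rather than on a raw frame.

```python
import math


def local_jitter(periods):
    """Mean absolute difference of consecutive periods, relative to the mean period."""
    if len(periods) < 2:
        raise ValueError("need at least two periods")
    diffs = [abs(b - a) for a, b in zip(periods, periods[1:])]
    return (sum(diffs) / len(diffs)) / (sum(periods) / len(periods))


def rms_shimmer(amplitudes):
    """RMS of consecutive peak-amplitude differences, relative to the mean amplitude.

    The exact normalisation is an assumption; published shimmer measures vary.
    """
    if len(amplitudes) < 2:
        raise ValueError("need at least two amplitudes")
    sq = [(b - a) ** 2 for a, b in zip(amplitudes, amplitudes[1:])]
    return math.sqrt(sum(sq) / len(sq)) / (sum(amplitudes) / len(amplitudes))


def teager_energy(x):
    """Discrete Teager Energy Operator: psi[n] = x[n]^2 - x[n-1] * x[n+1]."""
    return [x[n] ** 2 - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


def spectral_flatness(frame, floor=1e-12):
    """Geometric mean / arithmetic mean of the (naive) DFT power spectrum.

    Values near 1 indicate a flat, noise-like spectrum; values near 0 a peaky,
    harmonic one. `floor` avoids log(0) for empty bins.
    """
    n = len(frame)
    power = []
    for k in range(n):
        re = sum(frame[m] * math.cos(2 * math.pi * k * m / n) for m in range(n))
        im = -sum(frame[m] * math.sin(2 * math.pi * k * m / n) for m in range(n))
        power.append(max(re * re + im * im, floor))
    geo = math.exp(sum(math.log(p) for p in power) / n)
    return geo / (sum(power) / n)
```

For a pure sinusoid A cos(Wn + phi) the Teager energy is the constant A^2 sin^2(W), so steady voicing gives a flat psi[n] while turbulent noise makes it fluctuate.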
Any comments, questions or suggestions are very welcome and should be directed to
emir.turajlic@brunel.ac.uk.