Modelling of voice quality correlates

Introduction to Voice Quality

Definition: the perceived characteristic ‘acoustic colouring’ of a voice, derived from a variety of laryngeal and supralaryngeal features. These features are not unique to one individual but form clusters of identifiable voice types.

 

Modal Voice, Breathy Voice, Pressed Voice, Creaky Voice, Tense Voice, Harsh Voice, Nasal Voice are all examples of different voice types.

 

Voice quality is an effect of vocal tract anatomy, laryngeal anatomy and vocal habits.

This is illustrated in the following examples.  The characteristic sound of the voice is brought about by the mode of vibration of the vocal folds (vocal cords).

 

Differences in the degree and manner of glottal closure distinguish modal, breathy and whispery voice.  Voice quality also depends on the degree of tension in the larynx and pharynx, and on the vertical position of the larynx: a raised larynx produces a thin, tense voice, and a lowered larynx a booming voice.

 

Perceptual importance: In English, apart from distinguishing voiced from voiceless sounds, voice quality does not make linguistic contrasts but conveys information about the speaker.  In some languages, such as Gujarati and Mazatec, differences in voice quality or pitch trajectory are used to convey linguistic meaning.  Languages and dialects have characteristic voice qualities, and personal voice quality enables a listener to recognise a particular individual.  Furthermore, the quality of someone's voice also conveys emotions and attitudes.

 

Anatomy of human voice: To understand the different voice types, it helps to link them to the anatomy that produces the voice. For this purpose I will refer to the myoelastic-aerodynamic theory of vocal fold vibration.  Figure 1 shows a model of the human voice production system.  It consists of the subglottal area (containing the diaphragm, lungs and trachea), the larynx (containing the vocal folds) and the supraglottal area (the vocal tract).  The vocal folds are made of muscle, tendon and mucous membrane, and are of variable mass and elasticity.  Between the folds lies the glottis, whose variable geometry controls the airflow into the vocal tract.  For information only: above the vocal folds there is a pair of so-called false vocal folds, but they are rarely used and are therefore not considered in this project.  The vocal tract consists of the oral and nasal cavities, the pharyngeal cavity and the laryngeal tube.

The human voice is a series of puffs of air separated by (partial) closures of the vocal folds between puffs.  Voice is thus produced by vibration of the vocal folds, driven by air pressure from the lungs, and is characterised by the shape and physiology of the vocal folds and larynx.


During the phonation cycle, air pressure from the lungs builds up below the closed vocal folds.  The built-up pressure forces the folds apart and air is released.  The folds then close again because of their elasticity and the sudden drop in pressure.  Pressure builds up once more and the cycle repeats.

The main factors affecting the vibrations are:

  • Pressure and airflow: the respiratory system
  • Active muscle contraction and the position of the arytenoid cartilages
  • Mechanical properties of the vocal folds: mass, length and elasticity

 

Phonation can therefore be defined as a self-sustaining, quasi-periodic oscillation of the vocal folds that arises from the interaction of muscular and aerodynamic forces. For a typical male speaker the folds open and close about 100 times per second (100 Hz), so a single cycle lasts about 10 ms.  This is too rapid for the human ear to discriminate individual cycles of oscillation.  However, we are sensitive to changes in the overall rate of vibration, and perceive them as changes in the pitch of the voice.
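Because the ear follows the rate of vibration rather than individual cycles, pitch analysis usually starts by estimating the fundamental frequency (F0) of a short frame. The sketch below is not part of the original project; it shows one simple approach, picking the highest autocorrelation peak within a lag range that corresponds to an assumed 50-400 Hz search range. The function name `estimate_f0_autocorr` and the synthetic test signal are hypothetical illustrations.

```python
import numpy as np


def estimate_f0_autocorr(x, fs, f0_min=50.0, f0_max=400.0):
    """Estimate F0 (Hz) of one voiced frame from its autocorrelation.

    Sketch only: assumes the frame is voiced and spans at least two periods;
    there is no voicing decision and no sub-sample peak interpolation.
    """
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    # Keep non-negative lags only: r[k] is the autocorrelation at lag k.
    r = np.correlate(x, x, mode="full")[len(x) - 1:]
    lag_min = int(fs / f0_max)                    # shortest period searched
    lag_max = min(int(fs / f0_min), len(r) - 1)   # longest period searched
    lag = lag_min + int(np.argmax(r[lag_min:lag_max + 1]))
    return fs / lag


fs = 16000                               # sampling rate (Hz), assumed
f0 = 100.0                               # typical male F0 quoted in the text
t = np.arange(int(0.05 * fs)) / fs       # 50 ms frame
# Crude voiced-like test signal: fundamental plus two weaker harmonics.
x = sum(np.sin(2 * np.pi * k * f0 * t) / k for k in (1, 2, 3))

print(f"period: {1000.0 / f0:.1f} ms")
print(f"estimated F0: {estimate_f0_autocorr(x, fs):.1f} Hz")
```

For the 100 Hz test signal this reports a 10 ms period and an estimate close to 100 Hz. Autocorrelation is chosen here only because it is short and transparent; practical pitch trackers add voicing detection and peak interpolation.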

 

The laryngeal anatomy is illustrated in Figure 2.

The intrinsic muscles of the larynx may be categorised by function, that is, by their effect on the shape of the glottis and on the vibratory behaviour of the vocal folds.  The basic laryngeal adjustments underlying the different phonatory settings are: (a) abduction or adduction of the vocal folds, (b) constriction of supraglottal structures, (c) adjustment of the length, stiffness and thickness of the vocal folds, and (d) raising or lowering of the larynx.

The glottal flow excites the vocal tract resonator.  The vocal tract provides two paths for the excitation airflow, through the nasal and/or the oral cavity, and this is controlled by the velum: the more open the velopharyngeal port, the more nasal the sound.  The velum, together with the other articulators (lips, tongue and jaw), controls the spectral shape of the resonator.  It is important to be aware that the source excitation and the vocal tract are not independent systems.  Because the glottal impedance is high, the interaction between the two is usually assumed to be negligible; nevertheless, the vocal tract configuration is to some degree correlated with the source excitation signal.

 

Correlates of perceived voice quality: Perceived voice quality depends on the following aspects of speech: glottal pulse shape, pitch, formant frequencies, formant bandwidths, formant intensities, nasality and voice settings.
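Formant frequency and bandwidth can be made concrete with the two-pole digital resonator used in Klatt-style cascade formant synthesisers: each formant is a resonance whose pole radius is set by the bandwidth and whose pole angle is set by the frequency. This sketch is an illustration rather than part of the original project; the function names and the formant values in the usage lines (roughly those of a neutral vowel) are assumptions.

```python
import math


def resonator_coeffs(freq_hz, bw_hz, fs):
    """Coefficients of y[n] = A*x[n] + B*y[n-1] + C*y[n-2].

    The pole radius exp(-pi*bw/fs) is set by the bandwidth and the pole
    angle 2*pi*freq/fs by the formant frequency; A gives unity gain at 0 Hz.
    """
    T = 1.0 / fs
    C = -math.exp(-2.0 * math.pi * bw_hz * T)
    B = 2.0 * math.exp(-math.pi * bw_hz * T) * math.cos(2.0 * math.pi * freq_hz * T)
    A = 1.0 - B - C
    return A, B, C


def resonate(x, freq_hz, bw_hz, fs):
    """Filter the sequence x through one formant resonator."""
    A, B, C = resonator_coeffs(freq_hz, bw_hz, fs)
    y1 = y2 = 0.0                        # y[n-1] and y[n-2]
    out = []
    for sample in x:
        y = A * sample + B * y1 + C * y2
        out.append(y)
        y1, y2 = y, y1
    return out


def cascade(x, formants, fs):
    """Pass x through a cascade of (frequency, bandwidth) resonators."""
    for freq_hz, bw_hz in formants:
        x = resonate(x, freq_hz, bw_hz, fs)
    return x


# Usage: a 100 Hz impulse train (a crude stand-in for the glottal source)
# through three resonators with illustrative, assumed formant values.
fs = 10000
source = [1.0 if n % 100 == 0 else 0.0 for n in range(2000)]
formants = [(500.0, 60.0), (1500.0, 90.0), (2500.0, 120.0)]
speech_like = cascade(source, formants, fs)
```

A narrower bandwidth moves the poles closer to the unit circle, giving a sharper, longer-ringing formant; a wider bandwidth damps it. Formant intensity in a cascade arrangement follows from the combined filter rather than from a separate gain per formant.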


Glottal pulse shape: The shape of the glottal pulse is, on its own, the most important aspect of speech affecting our perception of voice quality.

 

The most common mathematical representation of the derivative of the glottal airflow pulse is the LF (Liljencrants / Fant) glottal pulse model.

LF (Liljencrants / Fant) Glottal Pulse Model

 

The accompanying figure shows a typical glottal flow pulse and its derivative as represented by the LF model.
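To make the pulse shape concrete, the sketch below evaluates the flow derivative E(t) = dU(t)/dt over one LF cycle from the model's usual timing parameters: tp (maximum flow), te (maximum negative derivative, where E = -Ee), ta (effective duration of the return phase) and tc (complete closure), with time measured from the opening instant. The parameter names follow common LF notation rather than this page, and the example timings are assumptions. As a simplification, the growth factor `alpha` is passed in directly; in the full model it is solved so that the net flow over the cycle is zero, which this sketch does not do.

```python
import math


def lf_derivative(t, tp, te, ta, tc, Ee=1.0, alpha=0.0):
    """LF glottal flow derivative E(t) for one cycle (t in s, from opening).

    Open phase   (0 <= t <= te): E0 * exp(alpha*t) * sin(pi*t/tp)
    Return phase (te < t <= tc): -(Ee/(eps*ta)) * (exp(-eps*(t-te)) - exp(-eps*(tc-te)))
    Closed phase (t > tc):       0
    Simplified sketch: alpha is an input, not solved for zero net flow.
    """
    if not (0.0 < tp < te < 2.0 * tp) or not (0.0 < ta < tc - te):
        raise ValueError("need 0 < tp < te < 2*tp and 0 < ta < tc - te")
    wg = math.pi / tp
    # Scale the open phase so that it reaches -Ee exactly at t = te.
    E0 = -Ee / (math.exp(alpha * te) * math.sin(wg * te))
    # Return-phase constant: solve eps*ta = 1 - exp(-eps*(tc - te)) by
    # fixed-point iteration (fast when tc - te is several times ta).
    eps = 1.0 / ta
    for _ in range(200):
        eps = (1.0 - math.exp(-eps * (tc - te))) / ta
    if 0.0 <= t <= te:
        return E0 * math.exp(alpha * t) * math.sin(wg * t)
    if te < t <= tc:
        return -(Ee / (eps * ta)) * (math.exp(-eps * (t - te)) - math.exp(-eps * (tc - te)))
    return 0.0


# Usage: one assumed 10 ms (100 Hz) cycle sampled at 16 kHz.
fs = 16000
tp, te, ta, tc = 0.004, 0.0055, 0.0003, 0.010
pulse = [lf_derivative(n / fs, tp, te, ta, tc) for n in range(160)]
```

The derivative is positive while the glottis opens, crosses zero at the flow maximum tp, reaches its sharpest negative peak -Ee at te, and decays back to zero during the return phase; a cumulative sum of `pulse` divided by `fs` approximates the flow U(t).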

Equations 1 & 2: LF model


Legend:

 

  • T0    Pitch period
  • t1    Instant of vocal fold separation and onset of airflow
  • t2    Instant of maximum glottal flow (amplitude AV)
  • t3    Instant of onset of glottal closure and maximum negative rate of change of glottal flow
  • t4    Instant of complete glottal closure, with no airflow through the glottis
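The equations themselves are not reproduced on this page. For reference, the LF model is usually written in the standard published form below, shown here with the legend's instants substituted (t1 opening, t2 maximum flow, t3 maximum negative derivative, t4 complete closure). The symbols E_0, α, E_e, t_a and ε are the model's usual parameters and are not defined in the legend; this is the standard form, not necessarily the exact notation of the missing Equations 1 & 2.

```latex
\begin{aligned}
E(t) = \frac{dU(t)}{dt} &= E_0\, e^{\alpha (t - t_1)} \sin\!\big(\omega_g (t - t_1)\big), && t_1 \le t \le t_3,\\
E(t) &= -\frac{E_e}{\varepsilon t_a}\Big[e^{-\varepsilon (t - t_3)} - e^{-\varepsilon (t_4 - t_3)}\Big], && t_3 < t \le t_4,\\
E(t) &= 0, && t_4 < t \le t_1 + T_0,
\end{aligned}

\omega_g = \frac{\pi}{t_2 - t_1}, \qquad
E_e = -E(t_3), \qquad
\varepsilon t_a = 1 - e^{-\varepsilon (t_4 - t_3)}, \qquad
\int_{t_1}^{t_4} E(t)\,dt = 0 .
```

Here t_a is the effective duration of the return phase, and the last condition (zero net flow over the cycle, so that U(t) returns to zero at closure) fixes α.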

 

Laryngeal anatomy, LF model of glottal airflow and its derivative in relation to voice quality:  In the following section, perceived voice quality is linked to the corresponding laryngeal settings and, via the LF model, to the corresponding glottal airflow shape.  Several voice types are described to help illustrate and distinguish them.  The characteristics of each phonation type are expressed relative to modal phonation, following the approach of Zemlin (1988), Stevens (1994), Trask (1996) and Ní Chasaide and Gobl (1997).
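The voice-type descriptions below are phrased largely in terms of the open quotient (OQ). Using the instants t1 to t4 and the period T0 from the legend above, OQ and the related speed quotient (SQ) can be computed as in this sketch. It assumes the common conventions OQ = (t4 - t1)/T0 (open phase as a fraction of the period) and SQ = (t2 - t1)/(t4 - t2) (opening phase over closing phase); other definitions exist, and the class name and example timings are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class GlottalCycle:
    """Timing of one glottal cycle (seconds), using the legend's instants."""
    t1: float  # vocal fold separation, onset of airflow
    t2: float  # maximum glottal flow
    t3: float  # maximum negative rate of change of flow
    t4: float  # complete glottal closure
    T0: float  # pitch period

    def __post_init__(self):
        if not (self.t1 < self.t2 < self.t3 <= self.t4 <= self.t1 + self.T0):
            raise ValueError("expected t1 < t2 < t3 <= t4 <= t1 + T0")

    def open_quotient(self) -> float:
        """Fraction of the period during which the glottis is open."""
        return (self.t4 - self.t1) / self.T0

    def speed_quotient(self) -> float:
        """Opening-phase duration divided by closing-phase duration."""
        return (self.t2 - self.t1) / (self.t4 - self.t2)


# Hypothetical 100 Hz cycle (T0 = 10 ms) with the glottis open for 6 ms.
cycle = GlottalCycle(t1=0.0, t2=0.004, t3=0.0055, t4=0.006, T0=0.010)
print(f"OQ = {cycle.open_quotient():.0%}, SQ = {cycle.speed_quotient():.1f}")
```

This prints `OQ = 60%, SQ = 2.0`. A higher OQ with a more symmetrical pulse (SQ nearer 1) corresponds to the breathy-like descriptions below, and a low OQ to the creaky and harsh ones.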


Modal Voice

[Figure: glottal flow U(t) and glottal derivative dU(t)/dt]

  • The open quotient (OQ) is 20%-50%
  • All muscular adjustments are at a moderate level
  • The folds open and close in a triangular shape
  • The flow rate is 100-300 cc/s

Breathy Voice

[Figure: glottal flow U(t) and glottal derivative dU(t)/dt]

  • The open quotient (OQ) is high
  • The fall phase is longer than in modal voice
  • The flow cut-off is more gradual
  • The glottal pulse is more symmetrical
  • The peak glottal flow is high
  • The pitch is lower

Whisper Voice

[Figure: glottal flow U(t) and glottal derivative dU(t)/dt]

  • The open quotient (OQ) is high, but lower than for breathy voice
  • The fall phase is longer than in modal voice
  • The flow cut-off is more gradual
  • The glottal pulse is more symmetrical than in modal voice, but more skewed than in breathy voice
  • The peak glottal flow is high, but lower than for breathy voice

Creaky Voice

[Figure: glottal flow U(t) and glottal derivative dU(t)/dt]

  • The open quotient (OQ) is low
  • Impulses have a relatively short rise time, and the pitch is very low
  • Very high adductive tension
  • Low glottal flow, 12-20 cc/s
  • The arytenoid cartilages are held tightly together, so the vocal folds vibrate only at one end

Harsh Voice

[Figure: glottal flow U(t) and glottal derivative dU(t)/dt]

  • The open quotient (OQ) is low
  • The pitch is irregular
  • The pitch is above 100 Hz
  • Some breath noise may be present

Falsetto Voice

[Figure: glottal flow U(t) and glottal derivative dU(t)/dt]

  • The open quotient (OQ) is high
  • The fall phase is longer than in modal voice
  • The flow cut-off is more gradual
  • The glottal pulse is more symmetrical
  • The peak glottal flow is low
  • The pitch is very high
  • The glottis is often slightly open, giving rise to turbulent flow

Further classification of pathological voice qualities: A wide range of parameters is used to describe “voice roughness”, that is, its fluctuations in the amplitude and time domains. See, for example, Baken (1987), Koike (1973), Pinto & Titze (1990), Kitajima et al. (1975), Davis (1978), Gubrynowicz et al. (1980) and Titze & Liang (1993).

These studies investigate pitch perturbation over both short and long time spans (pitch jitter), as well as the distribution of pitch-period lengths relative to the normal distribution (Hays, 1988). The autocorrelation function has been used to characterise shimmer, the RMS amplitude fluctuation averaged over pitch periods (Davis, 1978). Cepstral measures have also been used to characterise pathological voice qualities (Gerull et al., 1992).
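The perturbation measures above can be illustrated with the simple "local" definitions of jitter and shimmer: the mean absolute difference between consecutive pitch periods (or peak amplitudes), divided by their mean. This sketch assumes the period lengths and peak amplitudes have already been extracted cycle by cycle; it is one common formulation, not necessarily the exact one used in the studies cited, and the example values are made up.

```python
import math


def _mean_abs_diff_ratio(values):
    """Mean |v[i] - v[i+1]| divided by the mean of v."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(a - b) for a, b in zip(values, values[1:])]
    return (sum(diffs) / len(diffs)) / (sum(values) / len(values))


def local_jitter(periods):
    """Cycle-to-cycle period perturbation as a fraction of the mean period."""
    return _mean_abs_diff_ratio(periods)


def local_shimmer(amplitudes):
    """Cycle-to-cycle amplitude perturbation as a fraction of the mean amplitude."""
    return _mean_abs_diff_ratio(amplitudes)


def shimmer_db(amplitudes):
    """Mean absolute level difference between consecutive peaks, in dB."""
    if len(amplitudes) < 2:
        raise ValueError("need at least two cycles")
    levels = [abs(20.0 * math.log10(b / a)) for a, b in zip(amplitudes, amplitudes[1:])]
    return sum(levels) / len(levels)


# Made-up example: period lengths (ms) and peak amplitudes of five cycles.
periods = [10.0, 10.2, 9.9, 10.1, 10.0]
amps = [1.00, 0.95, 1.02, 0.97, 1.00]
print(f"jitter  = {local_jitter(periods):.2%}")
print(f"shimmer = {local_shimmer(amps):.2%} ({shimmer_db(amps):.2f} dB)")
```

Because both measures are normalised by the mean, they are independent of the speaker's average pitch and loudness, which makes them comparable across voices.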

The level of noise in speech has a strong effect on perceived voice quality, where it is heard as hoarseness.  Various techniques have been developed over the years to measure hoarseness.  The long-term average spectrum (LTAS) (Frokjaer-Jensen & Prytz, 1976; Gauffin & Sundberg, 1977, 1989) distinguishes between certain types of voice in particular frequency bands. The spectral flatness of the residual signal (Markel & Gray, 1978) reveals how voice quality depends on the spectral noise level. The Teager energy operator (Gavidia-Ceballos et al., 1996; Cairns et al., 1996) enables comparison of the energy generated by voicing (the harmonic component) with that generated by turbulent flow (frication or breath noise).  Davis (1978) used the harmonic-to-noise ratio to characterise pathological voices.
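Three of the noise-related measures mentioned above can be sketched in a few lines each: spectral flatness (the geometric mean of the power spectrum divided by its arithmetic mean, near 0 for a tonal signal and higher for noise), the discrete Teager energy operator psi[n] = x[n]^2 - x[n-1]x[n+1], and a simple harmonics-to-noise ratio computed from the best normalised correlation r at a candidate pitch lag as HNR = 10 log10(r / (1 - r)). These are generic formulations chosen for illustration, not necessarily the exact methods of the cited authors. The text applies flatness to the residual signal, whereas this sketch accepts whatever signal is passed in; function names, search ranges and test signals are assumptions.

```python
import numpy as np


def spectral_flatness(x, n_fft=1024, floor=1e-12):
    """Geometric mean / arithmetic mean of the power spectrum (0 to 1)."""
    power = np.abs(np.fft.rfft(x, n_fft)) ** 2 + floor   # floor avoids log(0)
    return float(np.exp(np.mean(np.log(power))) / np.mean(power))


def teager_energy(x):
    """Discrete Teager energy operator: psi[n] = x[n]**2 - x[n-1]*x[n+1]."""
    x = np.asarray(x, dtype=float)
    return x[1:-1] ** 2 - x[:-2] * x[2:]


def hnr_db(x, fs, f0_min=50.0, f0_max=400.0):
    """HNR (dB) from the best normalised correlation between the frame and
    a copy of itself shifted by a candidate pitch period."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    best = 0.0
    for lag in range(int(fs / f0_max), int(fs / f0_min) + 1):
        a, b = x[:-lag], x[lag:]
        denom = np.sqrt(np.dot(a, a) * np.dot(b, b))
        if denom > 0:
            best = max(best, float(np.dot(a, b) / denom))
    r = min(max(best, 1e-6), 1.0 - 1e-6)   # keep the logarithm finite
    return float(10.0 * np.log10(r / (1.0 - r)))


fs = 16000
t = np.arange(800) / fs                    # 50 ms frame
voiced = sum(np.sin(2 * np.pi * k * 100.0 * t) / k for k in (1, 2, 3))
# Noise power (about 0.68) chosen roughly equal to the harmonic power,
# so the noisy frame should come out near 0 dB.
noise = np.random.default_rng(0).normal(scale=np.sqrt(0.68), size=t.size)
print(f"HNR clean: {hnr_db(voiced, fs):.1f} dB, noisy: {hnr_db(voiced + noise, fs):.1f} dB")
```

For a pure cosine A cos(Ωn) the Teager operator returns the constant A² sin²(Ω), which is why it is read as an energy measure that tracks both amplitude and frequency; added turbulence noise breaks that regularity.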

Because pathological voices are highly complex and multidimensional, existing parameterisations of them often fail in practice.  The methods mentioned above are either invasive, or complex and lacking in robustness. Nor do they describe the open and speed quotients of the glottal waveform, which correlate strongly with various types of voice quality.  Therefore, these parameters will not be used in isolation, but as a complement to the voice quality parameters described in the previous section, in an attempt to build a set of parameters that provides a more complete description of voice quality.

 

Any comments, questions or suggestions are very welcome and should be directed to emir.turajlic@brunel.ac.uk.