MODELING AND ANALYZING THE VOCAL TRACT UNDER NORMAL AND STRESSFUL TALKING CONDITIONS

Ismail Shahin 1 and Nazeih Botros 2

1 Electrical/Electronics and Computer Engineering Department, University of Sharjah, P. O. Box 27272, Sharjah, United Arab Emirates
2 Department of Electrical and Computer Engineering, Southern Illinois University at Carbondale, Carbondale, IL 62901-6603, U.S.A.
1 E-mail: ismail@sharjah.ac.ae
2 E-mail: botrosn@siu.edu

ABSTRACT

In this research, we model and analyze the vocal tract under normal and stressful talking conditions. The analysis explains why the recognition performance of text-dependent speaker identification degrades under stressful talking conditions. The results can be used in future research to improve recognition performance under such conditions.

I. INTRODUCTION: HUMAN SPEECH PRODUCTION MECHANISM

The process of generating speech begins in the lungs. During excitation, muscle contraction forces air out of the lungs through the vocal cords. When the vocal cords remain open, the speech produced is said to be unvoiced, and the initial speech spectrum may be modeled as white noise. On the other hand, when the vocal cords are closed during exhalation, they begin to vibrate, providing an excitation in the form of a periodic train of pulses; the speech produced is then said to be voiced [1, 2]. The spectrum of either of these excitations is modified by the acoustic cavities formed by the vocal tract. The vocal tract begins at the vocal cords and ends at the lips. The shape of the vocal tract changes continuously, which causes the speech sound to be continuously time varying [1, 2]. References [1, 2] give more details about the human speech production mechanism.

Speech sounds are conventionally divided into consonants and vowels. In a vowel sound, the air in the vocal tract vibrates at several frequencies simultaneously. These frequencies are called the formant frequencies of the vocal tract. The formant frequencies and their corresponding bandwidths are functions of the shape of the vocal tract [3].

II.
VOCAL TRACT MODEL UNDER NORMAL TALKING STYLE

Under the normal talking style (no stress), the vocal tract can be modeled as shown in Figure 1a. This model can be approximated as shown in Figure 1b. The vocal tract is divided into a number of cylindrical sections, which is a fairly close approximation to its actual shape. The vocal tract can be represented by an all-pole transfer function given as [1, 2]:

$$H(z) = \frac{G}{1 + \alpha_1 z^{-1} + \cdots + \alpha_p z^{-p}} \qquad (1)$$

where
$G$ is a constant gain, and
$\alpha_i$ is the $i$th prediction coefficient, which can be calculated using the following
formula if the shape of the vocal tract is known [1]:

$$\alpha_i = \frac{A_i - A_{i+1}}{A_i + A_{i+1}} \qquad (2)$$

where
$A_i$ is the $i$th vocal tract area function, and
$A_{i+1}$ is the $(i+1)$th vocal tract area function.

[Fig. 1a: Vocal tract under normal talking style, from the trachea to the lips]
[Fig. 1b: Vocal tract approximation under normal talking style, with adjacent cylindrical sections labeled by their area functions]

The formant frequencies of the vocal tract and their corresponding bandwidths can be calculated using the following two equations, respectively [1]:

$$F_i = \frac{\theta_i f_s}{2\pi} \qquad (3)$$

$$B_i = -\frac{f_s}{\pi} \ln r_i \qquad (4)$$

where
$F_i$ is the $i$th formant frequency,
$\theta_i$ is the angle (in radians) of the $i$th pole,
$f_s$ is the sampling frequency,
$B_i$ is the bandwidth of the $i$th formant frequency, and
$r_i$ is the distance (from the origin) of the $i$th pole.

III. VOCAL TRACT MODEL UNDER LOUD TALKING STYLE

Under the loud talking style, the vocal tract can be modeled as shown in Figure 2a [4-6]. This model can be approximated as shown in Figure 2b. Air exits the glottis like a jet and attaches to the nearest wall of the vocal tract. A cavity is formed in the vocal tract because the pressure of the air inside the vocal tract is increased. Vortices of air are formed as soon as the air passes over the cavity. The bulk of the air continues propagating towards the lips while adhering to the walls of the vocal tract. These vortices produce sound that overlaps with the original sound [4-6]. The $i$th prediction coefficient for the loud talking style can be calculated as:

$$\alpha_i^{Lo} = \frac{A_i^{Lo} - A_{i+1}^{Lo}}{A_i^{Lo} + A_{i+1}^{Lo}} \qquad (5)$$

The vocal tract transfer function becomes:

$$H^{Lo}(z) = \frac{G^{Lo}}{1 + \alpha_1^{Lo} z^{-1} + \cdots + \alpha_p^{Lo} z^{-p}} \qquad (6)$$

The locations of the poles of the transfer function are changed to a large extent, but the poles are still located inside the unit circle. Therefore, the prediction coefficients under the loud talking style differ to a large extent from those under the normal talking style. Consequently, the cepstral coefficients under the loud talking style differ to a large degree from those under the
normal talking style. Therefore, the cepstral coefficients under the loud talking style are contaminated with stress components.

[Fig. 2a: Vocal tract under loud talking style, showing a vortex between the trachea and the lips]
[Fig. 2b: Vocal tract approximation under loud talking style, with adjacent cylindrical sections labeled by their area functions]

Since the formant frequencies of the vocal tract and their corresponding bandwidths are functions of the shape of the vocal tract [3], the formant frequencies and their corresponding bandwidths become:

$$F_i^{Lo} = \frac{\theta_i^{Lo} f_s}{2\pi} \qquad (7)$$

$$B_i^{Lo} = -\frac{f_s}{\pi} \ln r_i^{Lo} \qquad (8)$$

So, the formant frequencies of the vocal tract and their corresponding bandwidths are displaced to a large degree under the loud talking style.

IV. VOCAL TRACT MODEL UNDER SHOUT TALKING STYLE

Under the shout talking style, the pressure of the air is increased to a large extent. This increase produces a large cavity, which increases the vortices inside the vocal tract. Increasing the vortices yields an increase in the production of sound that overlaps with the original sound [4-6]. The vocal tract transfer function becomes:

$$H^{Sh}(z) = \frac{G^{Sh}}{1 + \alpha_1^{Sh} z^{-1} + \cdots + \alpha_p^{Sh} z^{-p}} \qquad (9)$$

The locations of the poles of the transfer function are changed to a large extent, but the poles are still located inside the unit circle. As in the case of the loud talking style, the prediction coefficients under the shout talking style differ to a large extent from those under the normal talking style. Consequently, the cepstral coefficients under the shout talking style differ to a large degree from those under the normal talking style. Therefore, the cepstral coefficients under the shout talking style are largely contaminated with stress components.

It is known that a part of the sound energy is lost within the vocal tract due to viscous friction, heat conduction, and vibration of the vocal tract wall. This energy loss has significant effects on the vocal tract formant frequencies and their corresponding bandwidths [7]. Since the formant frequencies of the vocal tract and their corresponding bandwidths are functions of the shape of the vocal tract [3], the formant frequencies and their corresponding bandwidths become:
$$F_i^{Sh} = \frac{\theta_i^{Sh} f_s}{2\pi} \qquad (10)$$

$$B_i^{Sh} = -\frac{f_s}{\pi} \ln r_i^{Sh} \qquad (11)$$

So, the formant frequencies of the vocal tract and their corresponding bandwidths are displaced to a large degree under the shout talking style.

V. VOCAL TRACT MODEL UNDER SOFT TALKING STYLE

Under the soft talking style, the pressure of the air is decreased to a small extent. The vocal tract transfer function becomes:

$$H^{So}(z) = \frac{G^{So}}{1 + \alpha_1^{So} z^{-1} + \cdots + \alpha_p^{So} z^{-p}} \qquad (12)$$

The locations of the poles of the transfer function are changed to a small extent, but the poles are still located inside the unit circle. Therefore, the prediction coefficients under the soft talking style differ slightly from those under the normal talking style. Consequently, the cepstral coefficients under the soft talking style differ to a small extent from those under the normal talking style. Therefore, the contamination of the cepstral coefficients under the soft talking style is small. Since the formant frequencies of the vocal tract and their corresponding bandwidths are functions of the shape of the vocal tract [3], the formant frequencies and their corresponding bandwidths become:

$$F_i^{So} = \frac{\theta_i^{So} f_s}{2\pi} \qquad (13)$$

$$B_i^{So} = -\frac{f_s}{\pi} \ln r_i^{So} \qquad (14)$$

So, the formant frequencies of the vocal tract and their corresponding bandwidths are displaced to a small degree under the soft talking style.

VI. VOCAL TRACT MODEL UNDER SLOW TALKING STYLE

Under the slow talking style, the pressure of the air is increased to a small extent. This means that the formation of vortices inside the vocal tract is small. These small vortices produce a minor sound that overlaps with the original sound [4-6]. The vocal tract transfer function becomes:

$$H^{Sl}(z) = \frac{G^{Sl}}{1 + \alpha_1^{Sl} z^{-1} + \cdots + \alpha_p^{Sl} z^{-p}} \qquad (15)$$

The poles of the transfer function under the slow talking style lie close to those under the normal talking style and are still located inside the unit circle. Therefore, the prediction coefficients under the slow talking style are close to those under the normal talking style. Consequently, the cepstral coefficients under the slow talking style are close to those under the normal talking style.
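The step from prediction coefficients to cepstral coefficients can be illustrated with the standard recursion that converts one set into the other. The sketch below is not taken from this research: it assumes an all-pole model of the form $H(z) = 1/(1 + \sum_i \alpha_i z^{-i})$ with unit gain, and the function name `lpc_to_cepstrum` and both coefficient sets are hypothetical values chosen only for illustration.

```python
def lpc_to_cepstrum(alpha, n_ceps):
    """Cepstral coefficients c_1..c_n_ceps of H(z) = 1 / (1 + sum_i alpha[i-1] z^-i).

    Assumes unit gain and all poles inside the unit circle.
    """
    p = len(alpha)
    c = []
    for n in range(1, n_ceps + 1):
        # Direct contribution of alpha_n (zero beyond the model order p).
        acc = -alpha[n - 1] if n <= p else 0.0
        # Recursive contribution of the earlier cepstral coefficients.
        for k in range(max(1, n - p), n):
            acc -= (k / n) * c[k - 1] * alpha[n - k - 1]
        c.append(acc)
    return c


# Hypothetical second-order models: a "normal" set and a slightly perturbed set.
alpha_normal = [-0.90, 0.64]
alpha_perturbed = [-0.91, 0.65]

c_normal = lpc_to_cepstrum(alpha_normal, 4)
c_perturbed = lpc_to_cepstrum(alpha_perturbed, 4)

# Largest difference between the two cepstral vectors.
max_shift = max(abs(a - b) for a, b in zip(c_normal, c_perturbed))
print("normal:   ", c_normal)
print("perturbed:", c_perturbed)
print("max shift:", max_shift)
```

In this example the two coefficient sets differ only slightly, and so do the resulting cepstral vectors. Moving the poles further apart would widen the gap in the same way.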
Therefore, the contamination of the cepstral coefficients under the slow talking style is minor. The formant frequencies of the vocal tract and their corresponding bandwidths become:

$$F_i^{Sl} = \frac{\theta_i^{Sl} f_s}{2\pi} \qquad (16)$$
$$B_i^{Sl} = -\frac{f_s}{\pi} \ln r_i^{Sl} \qquad (17)$$

So, the formant frequencies of the vocal tract and their corresponding bandwidths under the slow talking style are close to those under the normal talking style.

VII. SPEECH DATABASE

The experiments and tests conducted in this research were performed at Southern Illinois University at Carbondale. Some talking styles are designed to simulate the speech produced by different speakers under real stressful conditions [8, 9]. The talking styles are: normal, shout, slow, loud, and soft. In this research, the database consists of nine different speakers (three adult males and six adult females) uttering the same word nine times under each talking style.

VIII. RESULTS

An all-pole transfer function of the vocal tract under any talking style is given as:

$$H^{sty}(z) = \frac{G^{sty}}{1 + \alpha_1^{sty} z^{-1} + \cdots + \alpha_p^{sty} z^{-p}} \qquad (18)$$

The prediction coefficients $\alpha_i^{sty}$ ($i = 1, 2, \ldots, p$) have been calculated using the Levinson or Durbin recursion method. Table I shows the recognition performance under normal and stressful talking conditions using the dynamic time warping algorithm [10]. Table II shows the recognition performance under normal and stressful talking conditions using the hidden Markov model algorithm [11]. Figures 3 and 4 show the formant frequencies and their corresponding bandwidths for two speakers only.

IX. DISCUSSION AND CONCLUSIONS

In this research, the following conclusions can be drawn:

1) Comparing the first formant frequencies under the shout, slow, loud, and soft talking styles with the first formant frequencies under the normal talking style, our results show that:
   a. The first formant frequencies are displaced to a large degree under the loud talking style. This result is in agreement with the results reported by Wakita and Schulman [7, 12].
   b. The first formant frequencies are displaced to a large extent under the shout talking style. This result is in agreement with the results reported by Wakita and Summers [7, 12, 13].
   c. The formant frequencies are displaced to a small degree under the soft and slow talking styles.

2) The displacement of the formant frequencies degrades the performance of speaker recognition systems.
The higher the displacement, the higher the degradation of recognition performance, and vice versa. For example, under the shout talking style, the displacement of the formant frequencies is high, which results in high degradation of recognition performance. Another example is that under the slow talking style, the displacement of the formant frequencies is low, which results in low degradation of recognition performance.

3) Our results are in agreement with the results reported by Cummings and Clements [14]. Cummings and Clements
reported an extensive investigation of the variations that occur in the glottal excitation of eleven commonly encountered speech styles. Their results showed that the soft and loud talking styles are drastically different from all other styles. Their results also showed that the slow talking style is rarely confused with other styles. Our results agree with theirs under the soft and slow talking styles, since the recognition performance under these two styles is considerably better in our research. On the other hand, our results do not agree with theirs under the loud talking style, since our results show that the recognition performance under this style is degraded.

4) The highest degradation in recognition performance happens under the shout talking style. It seems that when speech is contaminated under the shout style, the degree of contamination is large. This high degree of contamination is caused by the high degree of displacement of the formant frequencies under the shout style.

5) The method of modeling and analyzing the vocal tract under normal and stressful talking conditions used in this research is constrained by the limited amount of data under different talking styles; a comprehensive assessment of the method requires a larger set of test data.

REFERENCES

[1] S. Furui, "Digital Speech Processing, Synthesis, and Recognition." New York: Marcel Dekker, 1989.
[2] T. W. Parsons, "Voice and Speech Processing." New York: McGraw Hill, 1987.
[3] F. Fallside and W. A. Woods, "Computer Speech Processing." New Jersey: Prentice-Hall, Englewood Cliffs, 1985.
[4] H. M. Teager and S. M. Teager, "The effects of separated air flow on vocalization," in Vocal Fold Physiology: Contemporary Research and Clinical Issues, edited by D. M. Bless and J. H. Abbs, College Hill, San Diego, 1981.
[5] H. M. Teager and S. M. Teager, "A phenomenological model for vowel production in the vocal tract," in Speech Sciences: Recent Advances, edited by R. G. Daniloff, College Hill, pp. 73-109, San Diego, 1983.
[6] H. M. Teager and S. M.
Teager, "Evidence for nonlinear production mechanisms in the vocal tract," in Speech Production and Speech Modeling, NATO Advanced Study Institute Series D, Vol. 55, pp. 241-261, Kluwer, Boston, 1990.
[7] H. Wakita, "Estimation of vocal tract shapes from acoustical analysis of the speech wave: the state of the art," IEEE Trans., Vol. ASSP-27, No. 3, pp. 281-285, June 1979.
[8] Y. Chen, "Cepstral domain talker stress compensation for robust speech recognition," IEEE Trans. on ASSP, Vol. ASSP-36, No. 4, pp. 433-439, April 1988.
[9] Y. Chen, "Cepstral domain talker stress compensation for robust speech recognition," ICASSP '87, pp. 717-720, Dallas, April 1987.
[10] I. Shahin and N. Botros, "Speaker identification using dynamic time warping with stress compensation technique," IEEE SOUTHEASTCON '98 Proceedings, pp. 65-68, Orlando, FL, April 1998.
[11] I. Shahin and N. Botros, "Text-dependent speaker identification using hidden Markov model with stress compensation technique," IEEE SOUTHEASTCON '98 Proceedings, pp. 61-64, Orlando, FL, April 1998.
[12] R. Schulman, "Articulatory dynamics of loud and normal speech," J. Acoust. Soc. Am., Vol. 85, No. 1, pp. 295-312, January 1988.
[13] W. V. Summers, D. B. Pisoni, R. H. Bernacki, R. I. Pedlow, and M. A. Stokes, "Effects of noise on speech production: Acoustic and perceptual analysis," J. Acoust. Soc. Am., Vol. 84, No. 3, pp. 917-928, September 1988.
[14] K. E. Cummings and M. A. Clements, "Analysis of the glottal excitation of emotionally styled and stressed speech," J. Acoust. Soc. Am., Vol. 98, No. 1, pp. 88-98, July 1995.

Table I. Recognition rate using dynamic time warping algorithm

Style               Normal   Shout   Slow   Loud   Soft
Recognition Rate    100%     33%     5%     40%    52%

Table II. Recognition rate using hidden Markov model algorithm

Style               Normal   Shout   Slow   Loud   Soft
Recognition Rate    90%      9%      62%    38%    30%
[Fig. 3: Formant frequencies of speaker 1. Amplitude versus frequency (Hz), 0 to 2000 Hz, for the normal, shout, slow, loud, and soft talking styles]
[Fig. 4: Formant frequencies of speaker 2. Amplitude versus frequency (Hz), 0 to 2500 Hz, for the normal, shout, slow, loud, and soft talking styles]
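The formant frequencies and bandwidths discussed in this paper follow from the pole positions of the all-pole model through equations (3) and (4). As a minimal sketch of that mapping (not the code used in this research), the example below converts a single complex pole into a frequency and a bandwidth. It assumes the bandwidth formula carries a minus sign so that poles inside the unit circle give positive bandwidths. The helper names, the 8000 Hz sampling frequency, and both pole positions are hypothetical and are not measurements from the speech database.

```python
import cmath
import math


def formant_from_pole(pole, fs):
    """Return (frequency_hz, bandwidth_hz) for one complex pole of H(z)."""
    theta = abs(cmath.phase(pole))  # pole angle in radians
    r = abs(pole)                   # distance of the pole from the origin
    if not 0.0 < r < 1.0:
        raise ValueError("pole must lie strictly inside the unit circle")
    frequency = theta * fs / (2.0 * math.pi)  # equation (3)
    bandwidth = -math.log(r) * fs / math.pi   # equation (4), sign assumed
    return frequency, bandwidth


def pole_from_formant(frequency, bandwidth, fs):
    """Inverse mapping, used here only to build illustrative poles."""
    r = math.exp(-math.pi * bandwidth / fs)
    theta = 2.0 * math.pi * frequency / fs
    return cmath.rect(r, theta)


fs = 8000.0  # hypothetical sampling frequency in Hz

# Hypothetical first-formant poles: a reference pole and a displaced pole.
reference_pole = pole_from_formant(500.0, 60.0, fs)
displaced_pole = pole_from_formant(650.0, 90.0, fs)

f_ref, b_ref = formant_from_pole(reference_pole, fs)
f_dis, b_dis = formant_from_pole(displaced_pole, fs)
print(f"reference: F = {f_ref:.1f} Hz, B = {b_ref:.1f} Hz")
print(f"displaced: F = {f_dis:.1f} Hz, B = {b_dis:.1f} Hz")
```

A larger change in pole angle shifts the formant frequency further, and a pole moving toward the origin widens the bandwidth, which is the kind of displacement the conclusions associate with the loud and shout talking styles.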