MODELING AND ANALYZING THE VOCAL TRACT UNDER NORMAL AND STRESSFUL TALKING CONDITIONS


Ismail Shahin (1) and Nazeih Botros (2)

(1) Electrical/Electronics and Computer Engineering Department, University of Sharjah, P. O. Box 27272, Sharjah, United Arab Emirates. E-mail: ismail@sharjah.ac.ae
(2) Department of Electrical and Computer Engineering, Southern Illinois University at Carbondale, Carbondale, IL 62901-6603, U.S.A. E-mail: botrosn@siu.edu

ABSTRACT

In this research, we model and analyze the vocal tract under normal and stressful talking conditions. The analysis explains why the recognition performance of text-dependent speaker identification degrades under stressful talking conditions. Future research can build on it to improve recognition performance under such conditions.

I. INTRODUCTION: HUMAN SPEECH PRODUCTION MECHANISM

The process of generating speech begins in the lungs. During exhalation, muscle contraction forces air out of the lungs through the vocal cords. When the vocal cords remain open, the speech produced is said to be unvoiced, and the initial speech spectrum may be modeled as white noise. When the vocal cords are closed during exhalation, they begin to vibrate and provide an excitation in the form of a periodic train of pulses; the speech produced is then said to be voiced [1, 2]. The spectrum of either excitation is modified by the acoustic cavities formed by the vocal tract, which begins at the vocal cords and ends at the lips. The shape of the vocal tract changes continuously, which makes the speech signal continuously time-varying [1, 2]. References [1, 2] give more details about the human speech production mechanism.

Speech sounds are conventionally divided into consonants and vowels. In a vowel sound, the air in the vocal tract vibrates at several frequencies simultaneously. These frequencies are called the formant frequencies of the vocal tract. The formant frequencies and their corresponding bandwidths are functions of the shape of the vocal tract [3].

II. VOCAL TRACT MODEL UNDER NORMAL TALKING STYLE

Under the normal talking style (no stress), the vocal tract can be modeled as shown in Figure 1a. This model can be approximated as shown in Figure 1b. In that approximation, the vocal tract is divided into a number of cylindrical sections, which closely matches its actual shape. The vocal tract can then be represented by an all-pole transfer function [1, 2]:

$$H(z) = \frac{G}{1 - \alpha_1 z^{-1} - \cdots - \alpha_N z^{-N}} \tag{1}$$

where $G$ is a constant gain and $N$ is the order of the model. $\alpha_i$ is the $i$th prediction coefficient. If the shape of the vocal tract is known, it can be calculated from the following formula [1]:

$$\alpha_i = \frac{A_{i+1} - A_i}{A_{i+1} + A_i} \tag{2}$$

where $A_i$ and $A_{i+1}$ are the $i$th and $(i+1)$th values of the vocal tract area function, that is, the cross-sectional areas of adjacent sections. Strictly speaking, in the lossless-tube model this area ratio is the reflection coefficient at the junction between sections $i$ and $i+1$. The prediction coefficients of Eq. (1) follow from the reflection coefficients by a recursion.

Fig. 1a. Vocal tract under normal talking style (from the trachea to the lips).
Fig. 1b. Vocal tract approximation under normal talking style (cylindrical sections with areas $A_i$, $A_{i+1}$).

The formant frequencies of the vocal tract and their corresponding bandwidths can be calculated using the following two equations, respectively [1]:

$$F_i = \frac{\theta_i f_s}{2\pi} \tag{3}$$

$$B_i = -\frac{f_s}{\pi} \ln r_i \tag{4}$$

where:
- $F_i$ is the $i$th formant frequency;
- $\theta_i$ is the angle (in radians) of the $i$th pole;
- $f_s$ is the sampling frequency;
- $B_i$ is the bandwidth of the $i$th formant frequency;
- $r_i$ is the distance of the $i$th pole from the origin.

III. VOCAL TRACT MODEL UNDER LOUD TALKING STYLE

Under the loud talking style, the vocal tract can be modeled as shown in Figure 2a [4-6]. This model can be approximated as shown in Figure 2b. Air exits the glottis like a jet and attaches to the nearest wall of the vocal tract. Because the air pressure inside the vocal tract is increased, a cavity forms in the vocal tract. Air vortices form as soon as the air passes over the cavity. Meanwhile, the bulk of the air continues propagating towards the lips, adhering to the walls of the vocal tract. These vortices produce sound that overlaps with the original sound [4-6]. The $i$th prediction coefficient for the loud talking style can be calculated as:

$$\alpha_i^{\mathrm{Lo}} = \frac{A_{i+1}^{\mathrm{Lo}} - A_i^{\mathrm{Lo}}}{A_{i+1}^{\mathrm{Lo}} + A_i^{\mathrm{Lo}}} \tag{5}$$

where the superscript Lo denotes the loud talking style. The vocal tract transfer function becomes:

$$H^{\mathrm{Lo}}(z) = \frac{G^{\mathrm{Lo}}}{1 - \alpha_1^{\mathrm{Lo}} z^{-1} - \cdots - \alpha_N^{\mathrm{Lo}} z^{-N}} \tag{6}$$

The locations of the poles of the transfer function change considerably, but the poles are still located inside the unit circle. Therefore, the prediction coefficients under the loud talking style differ considerably from those under the normal talking style. Consequently, the cepstral coefficients under the loud talking style also differ considerably from those under the normal talking style. In other words, they are contaminated with stress components.

Fig. 2a. Vocal tract under loud talking style (from the trachea to the lips, with a vortex).
Fig. 2b. Vocal tract approximation under loud talking style (sections with areas $A_i$, $A_{i+1}$).

The formant frequencies of the vocal tract and their corresponding bandwidths are functions of the shape of the vocal tract [3]. They therefore become:

$$F_i^{\mathrm{Lo}} = \frac{\theta_i^{\mathrm{Lo}} f_s}{2\pi} \tag{7}$$

$$B_i^{\mathrm{Lo}} = -\frac{f_s}{\pi} \ln r_i^{\mathrm{Lo}} \tag{8}$$

Hence, the formant frequencies of the vocal tract and their corresponding bandwidths are displaced to a large degree under the loud talking style.

IV. VOCAL TRACT MODEL UNDER SHOUT TALKING STYLE

Under the shout talking style, the air pressure increases greatly. This increase produces a large cavity, which strengthens the vortices inside the vocal tract. Stronger vortices produce more sound that overlaps with the original sound [4-6]. The vocal tract transfer function becomes:

$$H^{\mathrm{Sh}}(z) = \frac{G^{\mathrm{Sh}}}{1 - \alpha_1^{\mathrm{Sh}} z^{-1} - \cdots - \alpha_N^{\mathrm{Sh}} z^{-N}} \tag{9}$$

where the superscript Sh denotes the shout talking style. The locations of the poles change considerably, but the poles are still located inside the unit circle. As in the case of the loud talking style, the prediction coefficients under the shout talking style differ considerably from those under the normal talking style. Consequently, the cepstral coefficients under the shout talking style differ considerably from those under the normal talking style; that is, they are heavily contaminated with stress components.

Part of the sound energy is known to be lost within the vocal tract through viscous friction, heat conduction, and vibration of the vocal tract wall. This energy loss significantly affects the vocal tract formant frequencies and their corresponding bandwidths [7]. Because these quantities are functions of the shape of the vocal tract [3], the formant frequencies and their corresponding bandwidths become:

$$F_i^{\mathrm{Sh}} = \frac{\theta_i^{\mathrm{Sh}} f_s}{2\pi} \tag{10}$$

$$B_i^{\mathrm{Sh}} = -\frac{f_s}{\pi} \ln r_i^{\mathrm{Sh}} \tag{11}$$

Hence, the formant frequencies of the vocal tract and their corresponding bandwidths are displaced to a large degree under the shout talking style.

V. VOCAL TRACT MODEL UNDER SOFT TALKING STYLE

Under the soft talking style, the air pressure decreases slightly. The vocal tract transfer function becomes:

$$H^{\mathrm{So}}(z) = \frac{G^{\mathrm{So}}}{1 - \alpha_1^{\mathrm{So}} z^{-1} - \cdots - \alpha_N^{\mathrm{So}} z^{-N}} \tag{12}$$

where the superscript So denotes the soft talking style. The locations of the poles change only slightly, and the poles are still located inside the unit circle. Therefore, the prediction coefficients under the soft talking style differ only slightly from those under the normal talking style. Consequently, the cepstral coefficients under the soft talking style also differ only slightly from those under the normal talking style, so their contamination is small. The formant frequencies and their corresponding bandwidths are functions of the shape of the vocal tract [3], so they become:

$$F_i^{\mathrm{So}} = \frac{\theta_i^{\mathrm{So}} f_s}{2\pi} \tag{13}$$

$$B_i^{\mathrm{So}} = -\frac{f_s}{\pi} \ln r_i^{\mathrm{So}} \tag{14}$$

Hence, the formant frequencies of the vocal tract and their corresponding bandwidths are displaced only slightly under the soft talking style.

VI. VOCAL TRACT MODEL UNDER SLOW TALKING STYLE

Under the slow talking style, the air pressure increases slightly. As a result, only small vortices form inside the vocal tract. These small vortices produce a minor sound that overlaps with the original sound [4-6]. The vocal tract transfer function becomes:

$$H^{\mathrm{Sl}}(z) = \frac{G^{\mathrm{Sl}}}{1 - \alpha_1^{\mathrm{Sl}} z^{-1} - \cdots - \alpha_N^{\mathrm{Sl}} z^{-N}} \tag{15}$$

where the superscript Sl denotes the slow talking style. The locations of the poles under the slow talking style are close to those under the normal talking style, and the poles are still located inside the unit circle. Therefore, the prediction coefficients under the slow talking style are close to those under the normal talking style. Consequently, the cepstral coefficients under the slow talking style are close to those under the normal talking style.
Therefore, the contamination of the cepstral coefficients under the slow talking style is minor. The formant frequencies of the vocal tract and their corresponding bandwidths become:

$$F_i^{\mathrm{Sl}} = \frac{\theta_i^{\mathrm{Sl}} f_s}{2\pi} \tag{16}$$

$$B_i^{\mathrm{Sl}} = -\frac{f_s}{\pi} \ln r_i^{\mathrm{Sl}} \tag{17}$$

Hence, the formant frequencies of the vocal tract and their corresponding bandwidths under the slow talking style remain close to those under the normal talking style.

VII. SPEECH DATABASE

The experiments and tests in this research were conducted at Southern Illinois University at Carbondale. Some talking styles are designed to simulate the speech produced by different speakers under real stressful conditions [8, 9]. The talking styles are normal, shout, slow, loud, and soft. The database consists of nine different speakers (three adult males and six adult females). Each speaker utters the same word nine times under each talking style.

VIII. RESULTS

The all-pole transfer function of the vocal tract under any talking style is given as:

$$H^{\mathrm{sty}}(z) = \frac{G^{\mathrm{sty}}}{1 - \alpha_1^{\mathrm{sty}} z^{-1} - \cdots - \alpha_N^{\mathrm{sty}} z^{-N}} \tag{18}$$

where the superscript sty denotes the talking style. The prediction coefficients $\alpha_i$ ($i = 1, 2, \ldots, N$) have been calculated using the Levinson-Durbin recursion. Table I shows the recognition performance under normal and stressful talking conditions using the dynamic time warping algorithm [10]. Table II shows the same comparison using the hidden Markov model algorithm [11]. Figures 3 and 4 show the formant frequencies and their corresponding bandwidths for two of the speakers.

IX. DISCUSSION AND CONCLUSIONS

The following conclusions can be drawn from this research:

1) We compared the first formant frequencies under the shout, slow, loud, and soft talking styles with those under the normal talking style. Our results show that:
a. The first formant frequencies are displaced to a large degree under the loud talking style. This result agrees with the results reported by Wakita and Schulman [7, 12].
b. The first formant frequencies are displaced to a large degree under the shout talking style. This result agrees with the results reported by Wakita and Summers [7, 12, 13].
c. The formant frequencies are displaced only slightly under the soft and slow talking styles.
2) The displacement of the formant frequencies degrades the performance of speaker recognition systems.
The higher the displacement, the greater the degradation of recognition performance, and vice versa. For example, under the shout talking style the displacement of the formant frequencies is high, which results in a large degradation of recognition performance. Under the slow talking style, by contrast, the displacement is low, which results in a small degradation.
3) Our results partly agree with those reported by Cummings and Clements [14]. Cummings and Clements reported an extensive investigation of the variations that occur in the glottal excitation of eleven commonly encountered speech styles. Their results showed that the soft and loud talking styles are drastically different from all other styles. They also showed that the slow talking style is rarely confused with other styles. Our results agree with theirs for the soft and slow talking styles, since recognition performance under these two styles is comparatively better in our research. On the other hand, our results disagree with theirs for the loud talking style, since our results show that recognition performance under this style is degraded.
4) The greatest degradation in recognition performance occurs under the shout talking style. It appears that speech produced in the shout style is heavily contaminated. This high degree of contamination is caused by the large displacement of the formant frequencies under the shout style.
5) The method used in this research to model and analyze the vocal tract under normal and stressful talking conditions is constrained by the limited amount of data under the different talking styles. A comprehensive assessment of the method requires a larger set of test data.

REFERENCES

[1] S. Furui, "Digital Speech Processing, Synthesis, and Recognition." New York: Marcel Dekker, 1989.
[2] T. W. Parsons, "Voice and Speech Processing." New York: McGraw-Hill, 1987.
[3] F. Fallside and W. A. Woods, "Computer Speech Processing." Englewood Cliffs, NJ: Prentice-Hall, 1985.
[4] H. M. Teager and S. M. Teager, "The effects of separated air flow on vocalization," in Vocal Fold Physiology: Contemporary Research and Clinical Issues, edited by D. M. Bless and J. H. Abbs, College Hill, San Diego, 1981.
[5] H. M. Teager and S. M. Teager, "A phenomenological model for vowel production in the vocal tract," in Speech Sciences: Recent Advances, edited by R. G. Daniloff, College Hill, San Diego, pp. 73-109, 1983.
[6] H. M. Teager and S. M. Teager, "Evidence for nonlinear production mechanisms in the vocal tract," in Speech Production and Speech Modelling, NATO Advanced Study Institute Series D, Vol. 55, pp. 241-261, Kluwer, Boston, 1990.
[7] H. Wakita, "Estimation of vocal tract shapes from acoustical analysis of the speech wave: the state of the art," IEEE Trans. ASSP, Vol. ASSP-27, No. 3, pp. 281-285, June 1979.
[8] Y. Chen, "Cepstral domain talker stress compensation for robust speech recognition," IEEE Trans. ASSP, Vol. ASSP-36, No. 4, pp. 433-439, April 1988.
[9] Y. Chen, "Cepstral domain talker stress compensation for robust speech recognition," Proc. ICASSP '87, pp. 717-720, Dallas, April 1987.
[10] I. Shahin and N. Botros, "Speaker identification using dynamic time warping with stress compensation technique," IEEE SOUTHEASTCON '98 Proceedings, pp. 65-68, Orlando, FL, April 1998.

[11] I. Shahin and N. Botros, "Text-dependent speaker identification using hidden Markov model with stress compensation technique," IEEE SOUTHEASTCON '98 Proceedings, pp. 61-64, Orlando, FL, April 1998.
[12] R. Schulman, "Articulatory dynamics of loud and normal speech," J. Acoust. Soc. Am., Vol. 85, No. 1, pp. 295-312, January 1989.
[13] W. V. Summers, D. B. Pisoni, R. H. Bernacki, R. I. Pedlow, and M. A. Stokes, "Effects of noise on speech production: Acoustic and perceptual analyses," J. Acoust. Soc. Am., Vol. 84, No. 3, pp. 917-928, September 1988.
[14] K. E. Cummings and M. A. Clements, "Analysis of the glottal excitation of emotionally styled and stressed speech," J. Acoust. Soc. Am., Vol. 98, No. 1, pp. 88-98, July 1995.

Table I. Recognition rate using the dynamic time warping algorithm

Style              Normal   Shout   Slow   Loud   Soft
Recognition rate   100%     33%     5%     40%    52%

Table II. Recognition rate using the hidden Markov model algorithm

Style              Normal   Shout   Slow   Loud   Soft
Recognition rate   90%      9%      62%    38%    30%
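Table I relies on dynamic time warping (DTW) template matching [10]. The sketch below illustrates plain DTW, closest-template speaker identification, and the recognition-rate bookkeeping behind tables such as Tables I and II. It is not the stress-compensated technique of [10]. The following are all assumptions made for this example:
- the Euclidean local distance;
- the unconstrained step pattern;
- the function names `dtw_distance`, `identify`, and `recognition_rate`;
- the input format, where each utterance is a NumPy array of per-frame feature vectors (for instance, cepstral coefficients).

```python
import numpy as np


def dtw_distance(x: np.ndarray, y: np.ndarray) -> float:
    """Plain DTW distance between two sequences of feature vectors (frames x dims).

    Local cost: Euclidean distance between frames. Allowed steps: (i-1, j),
    (i, j-1), (i-1, j-1), with no slope or band constraint.
    """
    n, m = len(x), len(y)
    acc = np.full((n + 1, m + 1), np.inf)  # accumulated-cost matrix
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = np.linalg.norm(x[i - 1] - y[j - 1])
            acc[i, j] = local + min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
    return float(acc[n, m])


def identify(test: np.ndarray, templates: dict) -> str:
    """Return the speaker label whose reference template is closest under DTW."""
    return min(templates, key=lambda speaker: dtw_distance(test, templates[speaker]))


def recognition_rate(trials) -> float:
    """Percentage of (true_label, predicted_label) pairs that match."""
    trials = list(trials)
    return 100.0 * sum(true == pred for true, pred in trials) / len(trials)
```

Without a global path constraint, each comparison costs O(nm) time. That is acceptable for single-word utterances like those in Section VII. A band constraint is the usual refinement for longer inputs.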

Fig. 3. Formant frequencies of speaker 1 (amplitude versus frequency in Hz, 0 to 2000 Hz; curves for the normal, shout, slow, loud, and soft talking styles).

Fig. 4. Formant frequencies of speaker 2 (amplitude versus frequency in Hz, 0 to 2500 Hz; curves for the normal, shout, slow, loud, and soft talking styles).
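Section VIII states that the prediction coefficients were obtained with the Levinson-Durbin recursion, and Eqs. (3) and (4) convert poles into formant frequencies and bandwidths. The Python/NumPy sketch below chains these steps for one analysis frame. It also converts the prediction coefficients of Eq. (1) to LPC cepstral coefficients, the quantity whose contamination Sections III to VI discuss. It is a minimal sketch under stated assumptions, not the authors' code. The following are hypothetical illustrative choices:
- the autocorrelation method;
- the function names;
- the standard textbook LPC-to-cepstrum recursion;
- skipping real-valued poles;
- the synthetic 500 Hz test resonance in the usage section.

```python
import numpy as np


def autocorrelation(frame: np.ndarray, order: int) -> np.ndarray:
    """Biased autocorrelation r[0..order] of one (windowed) analysis frame."""
    n = len(frame)
    return np.array([np.dot(frame[: n - k], frame[k:]) for k in range(order + 1)])


def levinson_durbin(r: np.ndarray, order: int):
    """Levinson-Durbin recursion for the LPC normal equations.

    Returns (alpha, err): alpha[i - 1] is the ith prediction coefficient of
    H(z) = G / (1 - alpha_1 z^-1 - ... - alpha_N z^-N) and err is the final
    prediction-error power. Assumes r[0] > 0 (a non-silent frame).
    """
    a = np.zeros(order + 1)  # a[i] holds alpha_i; a[0] is unused
    err = float(r[0])
    for i in range(1, order + 1):
        # reflection (PARCOR) coefficient of stage i
        k = (r[i] - np.dot(a[1:i], r[i - 1:0:-1])) / err
        prev = a.copy()
        a[i] = k
        a[1:i] = prev[1:i] - k * prev[i - 1:0:-1]
        err *= 1.0 - k * k
    return a[1:], err


def formants_and_bandwidths(alpha: np.ndarray, fs: float):
    """Apply Eqs. (3) and (4) to each complex pole of the all-pole model."""
    # The poles are the roots of z^N - alpha_1 z^(N-1) - ... - alpha_N.
    poles = np.roots(np.concatenate(([1.0], -np.asarray(alpha, dtype=float))))
    poles = poles[np.imag(poles) > 0]  # one pole per conjugate pair; real poles skipped
    freqs = np.angle(poles) * fs / (2.0 * np.pi)  # Eq. (3)
    bws = -(fs / np.pi) * np.log(np.abs(poles))  # Eq. (4)
    idx = np.argsort(freqs)
    return freqs[idx], bws[idx]


def lpc_to_cepstrum(alpha: np.ndarray, n_ceps: int) -> np.ndarray:
    """Cepstral coefficients c_1..c_n_ceps of 1 / (1 - sum_k alpha_k z^-k).

    Recursion: c_n = alpha_n + sum_{k=1}^{n-1} (k/n) c_k alpha_{n-k}, with
    alpha_m = 0 for m > N. The gain term c_0 = ln G is omitted.
    """
    p = len(alpha)
    c = np.zeros(n_ceps + 1)
    for n in range(1, n_ceps + 1):
        acc = alpha[n - 1] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k] * alpha[n - k - 1]
        c[n] = acc
    return c[1:]


if __name__ == "__main__":
    fs = 8000.0  # hypothetical sampling rate
    rng = np.random.default_rng(0)
    # Hypothetical test frame: white noise through one resonance (500 Hz, 100 Hz wide).
    rad, theta = np.exp(-np.pi * 100.0 / fs), 2.0 * np.pi * 500.0 / fs
    a1, a2 = 2.0 * rad * np.cos(theta), -rad**2
    e = rng.standard_normal(4000)
    x = np.zeros_like(e)
    for n in range(2, len(x)):
        x[n] = e[n] + a1 * x[n - 1] + a2 * x[n - 2]
    alpha, _ = levinson_durbin(autocorrelation(x * np.hamming(len(x)), 2), 2)
    print(formants_and_bandwidths(alpha, fs))
    print(lpc_to_cepstrum(alpha, 4))
```

With real speech, one would use a higher model order and short windowed frames. The per-style formant and bandwidth values then correspond to Eqs. (7)-(8), (10)-(11), (13)-(14), and (16)-(17).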