Speech Recognition for Controlling Movement of the Wheelchair


Proc. of the 2nd International Conf. on Optics and Laser Applications ICOLA'07, September 5-7, Yogyakarta, Indonesia

Thiang
Electrical Engineering Department, Petra Christian University
Siwalankerto 121-131, Surabaya, Indonesia
Email: thiang@petra.ac.id, phone: +62-3-29835

Abstract
A motorized wheelchair usually uses a joystick as the input interface for controlling its movement. Instead of the joystick, speech signals of some words are used to control movement of the wheelchair. To achieve that aim, a speech recognition system has been implemented to recognize the word and then control movement of the wheelchair according to the recognized word. The method used to recognize the speech signal is Linear Predictive Coding (LPC) combined with Euclidean Squared Distance. LPC is used as the feature extraction method and Euclidean Squared Distance is used as the recognition method. This approach works in the time domain. The wheelchair movement is actuated by two DC motors, both controlled by microcontrollers. A simple open-loop control system is implemented to control the speed of the DC motors. Experiments were done to analyze the performance of the designed system, using 1 sample, 3 samples, and 5 samples of training data per word. Experimental results show that the highest average recognition rate that can be achieved was 78.57%.

Keywords: speech recognition, linear predictive coding, Euclidean squared distance, wheelchair

I. INTRODUCTION

Generally, a motorized wheelchair uses a joystick as the input interface for controlling movement of the wheelchair. To move the wheelchair forward or turn it left or right, the user moves the joystick in the same direction. In this project, we tried to substitute the joystick with another input interface. Instead of the joystick, speech signals of some words are used to control movement of the wheelchair. Seven words are used to control movement of the wheelchair: stop, forward, backward, left, right, up, and down. To achieve that aim, a speech recognition system has been implemented to recognize the word and then control movement of the wheelchair according to the recognized word. The speech recognition is implemented on a microcontroller, which also controls the wheelchair. The approach used to recognize the word in the speech signal is linear predictive coding (LPC) and Euclidean squared distance. Two DC motors are used to actuate the movement of the wheelchair.

For more detail, the next chapter describes the mechanism of the wheelchair. Chapter 3 describes linear predictive coding and Euclidean squared distance. Chapter 4 explains the implementation of speech recognition using LPC and Euclidean squared distance. Experimental results are shown in chapter 5, and the last chapter, chapter 6, gives the discussion and conclusion.

II. MECHANISM OF WHEELCHAIR

The wheelchair was built using rectangular hollow steel. The dimensions of the wheelchair are 60 cm x 78 cm x 110 cm. The following figures show the mechanism of the wheelchair.

Fig. 1. Mechanism of Wheelchair (front view)

Fig. 2. Mechanism of Wheelchair (side view)

The wheelchair has four wheels: two rear wheels and two front wheels. The two front wheels are pivot wheels that are set free. No actuator drives the front wheels, so they can move freely in rotation and in the straight direction. Movement of the wheelchair can therefore only be performed by driving the rear wheels. The diameter of the front wheels is 10 cm and the diameter of the rear wheels is 22 cm.

Two DC motors are used as the actuators of the wheelchair. One DC motor drives one rear wheel. The specifications of the DC motors are 20 V, 2 A, and 200 rpm. Each DC motor has its own gearbox, which reduces the speed of the motor in the ratio of 1:15. The speed is then reduced again by a gear and chain system with ratio 1:5. Thus, the speed of the motor is reduced in total with ratio 1:75, while the torque increases by the same factor. Because the maximum gearbox output speed of the DC motor is 200 rpm, the maximum linear speed of the wheelchair can be calculated as:

$$\text{Linear speed} = \frac{200}{5} \times \frac{\pi \times 0.22}{60} = 0.46\ \text{m/s} = 1.66\ \text{km/hr}$$

III. LINEAR PREDICTIVE CODING AND EUCLIDEAN SQUARED DISTANCE

The basic idea behind the LPC model is that a given speech sample at time $n$, $s(n)$, can be approximated as a linear combination of the past $p$ speech samples, such that

$$s(n) \approx a_1 s(n-1) + a_2 s(n-2) + \cdots + a_p s(n-p) \tag{1}$$

where the coefficients $a_1, a_2, \ldots, a_p$ are the LPC coefficients and $p$ is the LPC order. The problem of LPC analysis is to determine the set of LPC coefficients directly from the speech signal. Since the spectral characteristics of speech vary over time, we have to estimate the LPC coefficients from a short segment of the speech signal occurring around time $n$. This problem can be solved by using the LPC processor shown in Figure 3.

Fig. 3. Block Diagram of LPC Processor

The basic steps in the processing of the LPC processor include the following:

1. Preemphasis
The digitized speech signal, $s(n)$, is put through a low-order digital system to spectrally flatten the signal and to make it less susceptible to finite-precision effects later in the signal processing. The output of the preemphasizer network, $\tilde{s}(n)$, is related to the input to the network, $s(n)$, by the difference equation:

$$\tilde{s}(n) = s(n) - \tilde{a}\, s(n-1) \tag{2}$$

The most common value for $\tilde{a}$ is around 0.95.

2. Frame Blocking
The output of the preemphasis step, $\tilde{s}(n)$, is blocked into frames of $N$ samples, with adjacent frames being separated by $M$ samples. If $x_l(n)$ is the $l$-th frame of speech, and there are $L$ frames within the entire speech signal, then

$$x_l(n) = \tilde{s}(Ml + n), \qquad n = 0, 1, \ldots, N-1, \quad l = 0, 1, \ldots, L-1 \tag{3}$$

3. Windowing
After frame blocking, the next step is to window each individual frame so as to minimize the signal discontinuities at the beginning and end of each frame. If we define the window as $w(n)$, $0 \le n \le N-1$, then the result of windowing is the signal:

$$\tilde{x}_l(n) = x_l(n)\, w(n), \qquad 0 \le n \le N-1 \tag{4}$$

A typical window is the Hamming window, which has the form

$$w(n) = 0.54 - 0.46 \cos\!\left(\frac{2\pi n}{N-1}\right), \qquad 0 \le n \le N-1 \tag{5}$$

4. Autocorrelation Analysis
The next step is to autocorrelate each frame of the windowed signal in order to give

$$r_l(m) = \sum_{n=0}^{N-1-m} \tilde{x}_l(n)\, \tilde{x}_l(n+m), \qquad m = 0, 1, \ldots, p \tag{6}$$

where the highest autocorrelation index, $p$, is the order of the LPC analysis.

5. LPC Analysis
The next processing step is the LPC analysis, which converts each frame of $p + 1$ autocorrelations into an LPC parameter set by using Durbin's method. Dropping the frame index $l$, this can formally be given as the following algorithm:

$$E^{(0)} = r(0) \tag{7}$$

$$k_i = \frac{r(i) - \sum_{j=1}^{i-1} \alpha_j^{(i-1)}\, r(|i-j|)}{E^{(i-1)}}, \qquad 1 \le i \le p \tag{8}$$

$$\alpha_i^{(i)} = k_i \tag{9}$$

$$\alpha_j^{(i)} = \alpha_j^{(i-1)} - k_i\, \alpha_{i-j}^{(i-1)}, \qquad 1 \le j \le i-1 \tag{10}$$

$$E^{(i)} = (1 - k_i^2)\, E^{(i-1)} \tag{11}$$

By solving equations 7 to 11 recursively for $i = 1, 2, \ldots, p$, the LPC coefficient $a_m$ is given as

$$a_m = \alpha_m^{(p)}, \qquad 1 \le m \le p \tag{12}$$

6. LPC Parameter Conversion to Cepstral Coefficients
The LPC cepstral coefficients are a very important LPC parameter set, which can be derived directly from the LPC coefficient set. The recursion used is

$$c_m = a_m + \sum_{k=1}^{m-1} \frac{k}{m}\, c_k\, a_{m-k}, \qquad 1 \le m \le p \tag{13}$$

$$c_m = \sum_{k=m-p}^{m-1} \frac{k}{m}\, c_k\, a_{m-k}, \qquad m > p \tag{14}$$

The Euclidean distance, or Euclidean metric, is the ordinary distance between two points that one would measure with a ruler, which can be proven by repeated application of the Pythagorean theorem [4]. For two one-dimensional points, $P = (p_x)$ and $Q = (q_x)$, the Euclidean distance is calculated by using the following equation:

$$ED = \sqrt{(p_x - q_x)^2} = |p_x - q_x| \tag{15}$$

For two two-dimensional points, $P = (p_x, p_y)$ and $Q = (q_x, q_y)$, the Euclidean distance can be calculated by using the following equation:

$$ED = \sqrt{(p_x - q_x)^2 + (p_y - q_y)^2} \tag{16}$$

The general equation of the Euclidean distance between two points, $P = (p_1, p_2, \ldots, p_n)$ and $Q = (q_1, q_2, \ldots, q_n)$, in Euclidean $n$-space is defined as:

$$ED = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2} \tag{17}$$

The Euclidean Squared Distance uses the same principle as the Euclidean distance. The difference is that the Euclidean Squared Distance does not take the square root. The equation of the Euclidean squared distance is defined as:

$$ESD = \sum_{i=1}^{n} (p_i - q_i)^2 \tag{18}$$

IV. IMPLEMENTATION OF SPEECH RECOGNITION FOR CONTROLLING WHEELCHAIR

Fig. 4. Block Diagram of Training System

Fig. 5. Block Diagram of Recognizer System

The approach of speech recognition implemented in this system is Linear Predictive Coding (LPC) combined with the Euclidean Squared Distance method. LPC is used as the feature extraction method and Euclidean Squared Distance is used as the recognition method.
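The LPC processing steps of Section III (equations 2 to 14) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the microcontroller implementation used in the paper: the function names are hypothetical, the first sample is preemphasized with s(-1) = 0, and the test signal is a synthetic tone plus noise standing in for recorded speech. The default parameters (N = 240, M = 80, p = 10, Q = 12) are the values reported in Section V.

```python
import math
import random


def preemphasize(s, a=0.95):
    """Eq. (2): s~(n) = s(n) - a*s(n-1), assuming s(-1) = 0."""
    return [s[n] - a * (s[n - 1] if n > 0 else 0.0) for n in range(len(s))]


def frame_blocks(s, N=240, M=80):
    """Eq. (3): frames of N samples whose start points are M samples apart."""
    return [s[start:start + N] for start in range(0, len(s) - N + 1, M)]


def hamming(frame):
    """Eqs. (4)-(5): multiply a frame by the Hamming window."""
    N = len(frame)
    return [x * (0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1)))
            for n, x in enumerate(frame)]


def autocorrelate(frame, p):
    """Eq. (6): autocorrelation values r(0)..r(p) of one windowed frame."""
    N = len(frame)
    return [sum(frame[n] * frame[n + m] for n in range(N - m))
            for m in range(p + 1)]


def durbin(r, p):
    """Eqs. (7)-(12): Durbin's recursion. Returns a[0..p]; a[0] is unused."""
    E = r[0]
    alpha = [0.0] * (p + 1)
    for i in range(1, p + 1):
        k = (r[i] - sum(alpha[j] * r[i - j] for j in range(1, i))) / E
        new = alpha[:]
        new[i] = k
        for j in range(1, i):
            new[j] = alpha[j] - k * alpha[i - j]
        alpha = new
        E *= 1.0 - k * k
    return alpha


def lpc_to_cepstrum(a, p, Q):
    """Eqs. (13)-(14): cepstral coefficients c[0..Q]; c[0] is unused."""
    c = [0.0] * (Q + 1)
    for m in range(1, Q + 1):
        acc = sum((k / m) * c[k] * a[m - k] for k in range(max(1, m - p), m))
        c[m] = (a[m] if m <= p else 0.0) + acc
    return c


def lpc_cepstra(signal, N=240, M=80, p=10, Q=12):
    """Run steps 1-6 and return one Q-dimensional cepstral vector per frame."""
    feats = []
    for frame in frame_blocks(preemphasize(signal), N, M):
        a = durbin(autocorrelate(hamming(frame), p), p)
        feats.append(lpc_to_cepstrum(a, p, Q)[1:])
    return feats


if __name__ == "__main__":
    rng = random.Random(0)
    # 0.5 s at 8 kHz = 4000 samples (synthetic stand-in for a spoken word)
    signal = [math.sin(0.3 * n) + 0.5 * rng.uniform(-1.0, 1.0)
              for n in range(4000)]
    feats = lpc_cepstra(signal)
    print(len(feats), len(feats[0]))
```

With a 4000-sample recording, frame blocking yields 48 frames of 240 samples, each reduced to a 12-dimensional cepstral vector. A silent frame has r(0) = 0 and would need a guard before the division in `durbin`; the sketch leaves that out for brevity.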

Block diagrams of the LPC and Euclidean Distance training and recognizer systems are shown in Figure 4 and Figure 5, respectively. In the training system, training data are sampled directly from the microphone. Then, each training sample is processed using the LPC processor algorithm (equation 2 up to equation 14), and the result of this process is a set of cepstral coefficients. These cepstral coefficients are used as the reference model. Data are sampled at a rate of 8 kHz and recorded for a duration of 0.5 seconds.

A simple algorithm was implemented to detect the existence of the speech signal. The system reads four consecutive samples and then calculates the average of those four values. If the average value is less than a limit value, there is no speech signal. If the average value is greater than or equal to that limit value, there is a speech signal, and the microcontroller starts to read and record the signal for 0.5 seconds. The limit value is defined by trial-and-error experiment.

In the recognizer system, an unknown speech signal is first processed by the LPC processor too. The result of this process is the set of cepstral coefficients of the unknown speech signal. Then, the Euclidean Squared Distance between the cepstral coefficients of the unknown speech signal and the cepstral coefficients of the reference model is calculated. This calculation is done for each reference model by using equation 18. The reference model which has the minimum distance to the unknown speech signal is the candidate for the recognized word. A limit Euclidean Squared Distance value is determined in order to make the final decision. If the minimum distance is less than the limit value, the unknown speech signal is recognized as the reference model which has the minimum distance. Otherwise, if the minimum distance is greater than or equal to the limit value, no reference model is indicated as the recognized word and the unknown speech signal is indicated as the untrained word.
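The speech detector and the decision rule described above can be sketched as follows. The function names, the toy two-dimensional feature vectors, and the limit values are hypothetical and only illustrate the logic; the paper does not state how several training samples per word are stored, so this sketch assumes each training sample is kept as its own (word, vector) reference.

```python
def speech_present(four_samples, limit):
    """Average four consecutive samples; speech is present if avg >= limit."""
    avg = sum(four_samples) / len(four_samples)
    return avg >= limit


def esd(p, q):
    """Euclidean squared distance: sum of squared coordinate differences."""
    return sum((pi - qi) ** 2 for pi, qi in zip(p, q))


def recognize(unknown, references, limit):
    """Return the word of the nearest reference, or None for an untrained word.

    references is a list of (word, feature_vector) pairs (assumed layout).
    """
    best_word, best_dist = None, float("inf")
    for word, ref in references:
        d = esd(unknown, ref)
        if d < best_dist:
            best_word, best_dist = word, d
    # Accept only if the minimum distance is strictly below the limit value
    return best_word if best_dist < limit else None


if __name__ == "__main__":
    refs = [("stop", [0.0, 0.0]), ("forward", [1.0, 1.0])]
    print(recognize([0.9, 1.2], refs, limit=0.5))  # nearest is "forward"
    print(recognize([5.0, 5.0], refs, limit=0.5))  # too far from every reference
```

Passing `limit=float("inf")` reproduces the system without untrained word, in which the nearest reference is always accepted.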
After the unknown speech signal has been recognized, the wheelchair is controlled according to the recognized word. If the unknown speech signal is recognized as the untrained word, the system does nothing to the wheelchair and still executes the last command.

V. EXPERIMENTAL RESULTS

Some experiments were done in order to test the performance of the designed speech recognition system. Two kinds of speech recognition system were tested. The first is the speech recognition system without untrained word. In this kind of system, the unknown speech signal will definitely be recognized as one of the reference words. The second is the speech recognition system with untrained word. In this system, the unknown speech signal can be recognized as one of the reference words or as the untrained word. For the experiment using the system with untrained word, the words used as the other words are ONE, TWO, THREE, FOUR, FIVE, SIX, SEVEN, EIGHT, NINE, and TEN. In the experiments, in both the training mode and the recognizer mode, someone utters the word directly into the microphone to give the command to the wheelchair. The experiments were done using 1 sample, 3 samples, and 5 samples of training data per word. The values of the LPC analysis parameters used in the experiments are:
1. Number of samples in the analysis frame is 240.
2. Number of samples shift between two adjacent frames is 80.
3. LPC analysis order is 10.
4. Dimension of LPC cepstral vector is 12.

Summaries of the experimental results are shown in Table 1 and Table 2. Table 1 shows the summary of experimental results of the speech recognition system without untrained word. Table 2 shows the summary of experimental results of the speech recognition system with untrained word.

Table 1. Summary of Experimental Results of Speech Recognition System without Untrained Word

Recognition Rate (%), system without untrained word

Word                 1 sample         3 samples        5 samples
                     I       II       I       II       I       II
Stop                 60      40       50      90       80      100
Forward              100     80       100     100      100     90
Backward             20      80       90      100      50      80
Left                 0       60       70      70       20      70
Right                0       90       40      70       60      40
Up                   0       20       90      50       80      70
Down                 80      0        0       70       100     50
Others               -       -        -       -        -       -
Average Rec. Rate    37.14   52.86    62.86   78.57    70      71.43

Table 2. Summary of Experimental Results of Speech Recognition System with Untrained Word

Recognition Rate (%), system with untrained word

Word                 1 sample         3 samples        5 samples
                     I       II       I       II       I       II
Stop                 0       50       90      60       50      30
Forward              100     40       70      10       80      30
Backward             20      70       50      20       70      40
Left                 50      50       10      10       30      10
Right                20      50       70      30       60      20
Up                   70      30       10      10       80      60
Down                 70      20       70      10       70      70
Others               0       30       0       60       0       30
Average Rec. Rate    41.25   42.5     46.25   26.25    55      36.25
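The average recognition rates in Table 1 are plain column means over the seven command words, and they can be recomputed as a quick consistency check. The sketch below is only a worked check of that arithmetic; the variable names are illustrative and the per-word rates are the Table 1 values.

```python
# Per-word recognition rates (%) from Table 1, in column order:
# 1 sample (I, II), 3 samples (I, II), 5 samples (I, II)
table1 = {
    "Stop":     [60, 40, 50, 90, 80, 100],
    "Forward":  [100, 80, 100, 100, 100, 90],
    "Backward": [20, 80, 90, 100, 50, 80],
    "Left":     [0, 60, 70, 70, 20, 70],
    "Right":    [0, 90, 40, 70, 60, 40],
    "Up":       [0, 20, 90, 50, 80, 70],
    "Down":     [80, 0, 0, 70, 100, 50],
}

# Column-wise mean over the seven words, rounded to two decimals
averages = [round(sum(col) / len(col), 2) for col in zip(*table1.values())]
best = max(averages)
error = round(100 - best, 2)

print(averages)     # [37.14, 52.86, 62.86, 78.57, 70.0, 71.43]
print(best, error)  # 78.57 21.43
```

The best column, 3 training samples in run II, gives the 78.57% recognition rate and the corresponding 21.43% error.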

As shown in Table 1 and Table 2, the highest recognition rate that can be achieved is 78.57%. Thus, the error of the system is about 21.43%. This recognition rate resulted from the experiment using 3 training data samples per word and the system without untrained word. The more samples are trained, the higher the average recognition rate tends to be, especially in the system without untrained word. The designed system does not give a good enough result for the words which are not included in the seven recognized words. This is shown in the experimental results of the speech recognition system with untrained word (see Table 2). Compared with the system without untrained word, the system with untrained word does not improve the recognition rate. Moreover, the recognition rate tends to decrease. It can be concluded that the use of a limit value or threshold value in order to recognize other words which are not in the recognized word database as the untrained word does not give an improvement in the recognition rate.

VI. CONCLUSIONS

From the experimental results, it can be concluded as follows:
1. The designed speech recognition system using LPC-Euclidean Squared Distance can control the wheelchair well enough, with an error of 21.43%.
2. The highest recognition rate that can be achieved by the speech recognition system using LPC-Euclidean Squared Distance is 78.57%. This recognition rate resulted from the experiment of the system using 3 training data samples. The system can recognize the word [...]
3. As the number of samples increases, the recognition rate also tends to increase.
4. The use of a limit value or threshold value in order to recognize other words which are not in the recognized word database as the untrained word does not give an improvement in the recognition rate. Moreover, in certain experiments, the system could not recognize the other words at all as the untrained word and always recognized them as one of the trained words.

REFERENCES

[1] Lawrence Rabiner and Biing-Hwang Juang, Fundamentals of Speech Recognition. Prentice Hall, New Jersey, 1993.
[2] Y.M. Lam, M.W. Mak, and P.H.W. Leong, Fixed point implementations of Speech Recognition Systems. Proceedings of the International Signal Processing Conference. Dallas. 2003.
[3] Soshi Iba, Christiaan J. J. Paredis, and Pradeep K. Khosla. Interactive Multimodal Robot Programming. The International Journal of Robotics Research (24), 83-104, 2005.
[4] Wikipedia, the free encyclopedia. Euclidean Distance. 2006. http://en.wikipedia.org/wiki/Euclidean_distance
[5] Improved Outcomes Software. Euclidean and Euclidean Squared Distance. http://www.improvedoutcomes.com/docs/websitedocs/Clustering/Clustering_Parameters/Euclidean_and_Euclidean_squared_distance_metrics.htm
[6] Ethnicity Group. Cepstrum Method. 1998. http://www.owlnet.rice.edu/~elec532/projects98/speech/cepstrum/cepstrum.html