Proc. of the 2nd International Conference on Optics and Laser Applications ICOLA 07, September 5-7, Yogyakarta, Indonesia

Speech Recognition for Controlling Movement of the Wheelchair

Thiang
Electrical Engineering Department, Petra Christian University
Siwalankerto 121-131, Surabaya, Indonesia
Email: thiang@petra.ac.id, phone: +62-3-29835

Abstract
A motorized wheelchair usually uses a joystick as the input interface for controlling its movement. In this work, spoken words are used instead of the joystick to control the movement of the wheelchair. To achieve this, a speech recognition system has been implemented that recognizes the spoken word and then moves the wheelchair according to the recognized word. The method used to recognize the speech signal is Linear Predictive Coding (LPC) combined with the Euclidean Squared Distance. LPC is used as the feature extraction method and the Euclidean Squared Distance is used as the recognition method. This approach works in the time domain. The wheelchair is actuated by two DC motors, both controlled by microcontrollers. A simple open-loop control system is implemented to control the speed of the DC motors. Experiments were done to analyze the performance of the designed system, using 1 sample, 3 samples, and 5 samples of training data per word. Experimental results show that the highest average recognition rate achieved was 78.57%.

Keywords: speech recognition, linear predictive coding, Euclidean squared distance, wheelchair

I. INTRODUCTION
Generally, a motorized wheelchair uses a joystick as the input interface for controlling its movement. To move the wheelchair forward or to turn it left or right, the user moves the joystick in the same direction. In this project, we tried to substitute the joystick with another input interface: speech signals of some words are used to control the movement of the wheelchair. Seven words are used: stop, forward, backward, left, right, up, and down. To achieve this, a speech recognition system has been implemented that recognizes the word and then controls the movement of the wheelchair according to the recognized word. The speech recognition is implemented on a microcontroller, which also controls the wheelchair. The approach used to recognize the word in the speech signal is linear predictive coding (LPC) combined with the Euclidean squared distance. Two DC motors are used to actuate the movement of the wheelchair.

The rest of this paper is organized as follows. Chapter 2 describes the mechanism of the wheelchair. Chapter 3 describes linear predictive coding and the Euclidean squared distance. Chapter 4 explains the implementation of speech recognition using LPC and the Euclidean squared distance. Experimental results are shown in chapter 5 and, finally, chapter 6 gives the discussion and conclusion.

II. MECHANISM OF WHEELCHAIR
The wheelchair was built using rectangular hollow steel. The dimensions of the wheelchair are 60 cm × 78 cm × 0 cm. The following figures show the mechanism of the wheelchair.

Fig 1. Mechanism of Wheelchair (front view)
Fig 2. Mechanism of Wheelchair (side view)

The wheelchair has four wheels: two rear wheels and two front wheels. The two front wheels are free pivot wheels and no actuator drives them, so they can move freely both in rotation and in the straight direction. Therefore, the wheelchair can only be moved by driving the rear wheels. The diameter of the front wheel is 10 cm and the diameter of the rear wheel is 22 cm. Two DC motors are used as the actuators of the wheelchair; each DC motor drives one rear wheel. The specifications of the DC motors are 20 V, 2 A, and 200 rpm. Each DC motor has its own gearbox, which reduces the speed of the motor in the ratio of 1:15. The speed is then reduced again by a gear and chain system with ratio 1:5. Thus, the speed of the motor is reduced by a total ratio of 1:75; conversely, the torque of the motor increases by a factor of 75. Because the maximum gearbox output speed of the DC motor is 200 rpm, the maximum linear speed of the wheelchair can be calculated as:

$$ v = \frac{200}{5} \cdot \frac{\pi \cdot 0.22}{60} \approx 0.46\ \text{m/s} \approx 1.66\ \text{km/hr} $$

III. LINEAR PREDICTIVE CODING AND EUCLIDEAN SQUARED DISTANCE
The basic idea behind the LPC model is that a given speech sample at time n, $s(n)$, can be approximated as a linear combination of the past p speech samples, such that

$$ s(n) \approx a_1 s(n-1) + a_2 s(n-2) + \cdots + a_p s(n-p) \tag{1} $$

where the coefficients $a_1, a_2, \ldots, a_p$ are the LPC coefficients and p is the LPC order. The problem of LPC analysis is to determine the set of LPC coefficients directly from the speech signal. Since the spectral characteristics of speech vary over time, we have to estimate the LPC coefficients from a short segment of the speech signal occurring around time n. This problem can be solved by using the LPC processor shown in figure 3.

Fig 3. Block Diagram of LPC Processor

The basic steps in the processing of the LPC processor include the following:
1. Preemphasis
The digitized speech signal, $s(n)$, is put through a low-order digital system to spectrally flatten the signal and to make it less susceptible to finite precision effects later in the signal processing. The output of the preemphasizer network, $\tilde{s}(n)$, is related to the input to the network, $s(n)$, by the difference equation:

$$ \tilde{s}(n) = s(n) - \tilde{a}\, s(n-1) \tag{2} $$

The most common value for $\tilde{a}$ is around 0.95.

2. Frame Blocking
The output of the preemphasis step, $\tilde{s}(n)$, is blocked into frames of N samples, with adjacent frames separated by M samples. If $x_\ell(n)$ is the $\ell$-th frame of speech and there are L frames within the entire speech signal, then

$$ x_\ell(n) = \tilde{s}(M\ell + n), \quad n = 0, 1, \ldots, N-1, \quad \ell = 0, 1, \ldots, L-1 \tag{3} $$

3. Windowing
After frame blocking, the next step is to window each individual frame so as to minimize the signal discontinuities at the beginning and end of each frame. If we define the window as $w(n)$, $0 \le n \le N-1$, then the result of windowing is the signal:

$$ \tilde{x}_\ell(n) = x_\ell(n)\, w(n), \quad 0 \le n \le N-1 \tag{4} $$

A typical window is the Hamming window, which has the form

$$ w(n) = 0.54 - 0.46 \cos\!\left(\frac{2\pi n}{N-1}\right), \quad 0 \le n \le N-1 \tag{5} $$
4. Autocorrelation Analysis
The next step is to autocorrelate each frame of the windowed signal in order to give

$$ r_\ell(m) = \sum_{n=0}^{N-1-m} \tilde{x}_\ell(n)\, \tilde{x}_\ell(n+m), \quad m = 0, 1, \ldots, p \tag{6} $$

where the highest autocorrelation lag, p, is the order of the LPC analysis.

5. LPC Analysis
The next processing step is the LPC analysis, which converts each frame of p + 1 autocorrelations into an LPC parameter set by using Durbin's method. This can formally be given as the following algorithm (the frame index $\ell$ is dropped for clarity):

$$ E^{(0)} = r(0) \tag{7} $$

$$ k_i = \frac{r(i) - \sum_{j=1}^{i-1} \alpha_j^{(i-1)}\, r(|i-j|)}{E^{(i-1)}}, \quad 1 \le i \le p \tag{8} $$

$$ \alpha_i^{(i)} = k_i \tag{9} $$

$$ \alpha_j^{(i)} = \alpha_j^{(i-1)} - k_i\, \alpha_{i-j}^{(i-1)}, \quad 1 \le j \le i-1 \tag{10} $$

$$ E^{(i)} = \left(1 - k_i^2\right) E^{(i-1)} \tag{11} $$

By solving equations 7 to 11 recursively for i = 1, 2, …, p, the LPC coefficients, $a_m$, are given as

$$ a_m = \alpha_m^{(p)}, \quad 1 \le m \le p \tag{12} $$

6. LPC Parameter Conversion to Cepstral Coefficients
The LPC cepstral coefficients are a very important LPC parameter set, which can be derived directly from the LPC coefficient set. The recursion used is

$$ c_m = a_m + \sum_{k=1}^{m-1} \frac{k}{m}\, c_k\, a_{m-k}, \quad 1 \le m \le p \tag{13} $$

$$ c_m = \sum_{k=m-p}^{m-1} \frac{k}{m}\, c_k\, a_{m-k}, \quad m > p \tag{14} $$

The Euclidean distance, or Euclidean metric, is the ordinary distance between two points that one would measure with a ruler; its formula follows from repeated application of the Pythagorean theorem [4]. For two one-dimensional points, $P = (p_x)$ and $Q = (q_x)$, the Euclidean distance is calculated by using the following equation:

$$ ED = \sqrt{(p_x - q_x)^2} = |p_x - q_x| \tag{15} $$

For two two-dimensional points, $P = (p_x, p_y)$ and $Q = (q_x, q_y)$, the Euclidean distance can be calculated by using the following equation:

$$ ED = \sqrt{(p_x - q_x)^2 + (p_y - q_y)^2} \tag{16} $$

The general equation of the Euclidean distance between two points, $P = (p_1, p_2, \ldots, p_n)$ and $Q = (q_1, q_2, \ldots, q_n)$, in Euclidean n-space is defined as:

$$ ED = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2} \tag{17} $$

The Euclidean Squared Distance uses the same principle as the Euclidean distance. The difference is that the Euclidean Squared Distance does not take the square root. The equation of the Euclidean squared distance is defined as:

$$ ESD = \sum_{i=1}^{n} (p_i - q_i)^2 \tag{18} $$

IV. IMPLEMENTATION OF SPEECH RECOGNITION FOR CONTROLLING WHEELCHAIR

Fig 4. Block Diagram of Training System
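As a reading aid, the LPC processor of figure 3 (equations 2 to 14), which the training system of figure 4 applies to every training sample, can be sketched in pure Python. This is a minimal sketch and not the authors' microcontroller implementation. The function names are hypothetical, passing the first sample through the preemphasis filter unchanged is an assumption, and the default parameters (N = 240, M = 80, p = 10, 12 cepstral coefficients, ã = 0.95) are taken from the text.

```python
import math
from typing import List, Sequence


def preemphasis(s: Sequence[float], a: float = 0.95) -> List[float]:
    """Eq. (2): s~(n) = s(n) - a*s(n-1). The first sample has no predecessor
    and is passed through unchanged (an assumption of this sketch)."""
    return [s[0]] + [s[n] - a * s[n - 1] for n in range(1, len(s))]


def frame_blocks(x: Sequence[float], N: int = 240, M: int = 80) -> List[List[float]]:
    """Eq. (3): frames of N samples, adjacent frames M samples apart.
    Trailing samples that do not fill a whole frame are dropped."""
    L = 1 + (len(x) - N) // M if len(x) >= N else 0
    return [list(x[l * M : l * M + N]) for l in range(L)]


def hamming(N: int) -> List[float]:
    """Eq. (5): w(n) = 0.54 - 0.46*cos(2*pi*n/(N-1)), 0 <= n <= N-1."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]


def autocorrelation(x: Sequence[float], p: int) -> List[float]:
    """Eq. (6): r(m) = sum_{n=0}^{N-1-m} x(n)*x(n+m), for m = 0..p."""
    N = len(x)
    return [sum(x[n] * x[n + m] for n in range(N - m)) for m in range(p + 1)]


def durbin(r: Sequence[float], p: int) -> List[float]:
    """Eqs. (7)-(12): Durbin's recursion; returns [a_1, ..., a_p]."""
    if r[0] <= 0.0:  # silent frame: nothing to predict
        return [0.0] * p
    E = r[0]                    # Eq. (7)
    alpha = [0.0] * (p + 1)     # alpha[j] = alpha_j^(i-1); index 0 unused
    for i in range(1, p + 1):
        k = (r[i] - sum(alpha[j] * r[i - j] for j in range(1, i))) / E  # Eq. (8)
        new = alpha[:]
        new[i] = k                                   # Eq. (9)
        for j in range(1, i):
            new[j] = alpha[j] - k * alpha[i - j]     # Eq. (10)
        alpha = new
        E = (1.0 - k * k) * E                        # Eq. (11)
    return alpha[1:]                                 # Eq. (12)


def lpc_to_cepstrum(a: Sequence[float], Q: int) -> List[float]:
    """Eqs. (13)-(14): cepstral coefficients c_1..c_Q from a_1..a_p."""
    p = len(a)
    c = [0.0] * (Q + 1)         # c[0] unused
    for m in range(1, Q + 1):
        acc = a[m - 1] if m <= p else 0.0
        for k in range(max(1, m - p), m):
            acc += (k / m) * c[k] * a[m - k - 1]     # a[m-k-1] holds a_{m-k}
        c[m] = acc
    return c[1:]


def lpc_cepstra(s: Sequence[float], N: int = 240, M: int = 80, p: int = 10,
                Q: int = 12, a_pre: float = 0.95) -> List[List[float]]:
    """Whole chain of figure 3: one Q-dimensional cepstral vector per frame."""
    w = hamming(N)
    feats = []
    for frame in frame_blocks(preemphasis(s, a_pre), N, M):
        windowed = [xi * wi for xi, wi in zip(frame, w)]  # Eq. (4)
        a = durbin(autocorrelation(windowed, p), p)
        feats.append(lpc_to_cepstrum(a, Q))
    return feats
```

For a 0.5 s recording at 8 kHz (4000 samples), equation 3 gives 1 + (4000 − 240)/80 = 48 frames, so `lpc_cepstra` returns 48 vectors of 12 coefficients each.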
Fig 5. Block Diagram of Recognizer System

The speech recognition approach implemented in this system is Linear Predictive Coding (LPC) combined with the Euclidean Squared Distance method. LPC is used as the feature extraction method and the Euclidean Squared Distance is used as the recognition method. Block diagrams of the LPC and Euclidean distance training and recognizer systems are shown in figure 4 and figure 5, respectively.

In the training system, training data are sampled directly from a microphone. Each training sample is then processed using the LPC processor algorithm (equation 2 up to equation 14), and the result of this process is a set of cepstral coefficients. These cepstral coefficients are used as the reference model. Data sampling is done at a rate of 8 kHz and data is recorded for a duration of 0.5 seconds. A simple algorithm was implemented to detect the presence of a speech signal. The system reads four consecutive samples and calculates their average. If the average value is less than a limit value, there is no speech signal. If the average value is greater than or equal to that limit value, there is a speech signal, and the microcontroller starts to read and record the signal for 0.5 seconds. The limit value is determined by trial-and-error experiments.

In the recognizer system, an unknown speech signal is first processed by the same LPC processor. The result of this process is the set of cepstral coefficients of the unknown speech signal. Then, the Euclidean Squared Distance between the cepstral coefficients of the unknown speech signal and the cepstral coefficients of each reference model is calculated by using equation 18. The reference model which has the minimum distance to the unknown speech signal is the candidate for the recognized word. A limit value of the Euclidean Squared Distance is determined in order to make the final decision. If the minimum distance is less than the limit value, the unknown speech signal is recognized as the word of the reference model which has the minimum distance. Otherwise, if the minimum distance is greater than or equal to the limit value, no reference model is indicated as the recognized word and the unknown speech signal is indicated as an untrained word.
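The nearest-reference decision with a distance limit can be sketched as follows. This is a minimal sketch under stated assumptions. Each utterance is represented by its sequence of frame cepstra. Because every recording lasts 0.5 s, all sequences are assumed to have the same number of frames, so the frames are simply concatenated into one vector before equation 18 is applied. Each training sample is kept as its own reference model. The names `esd`, `flatten`, `recognize`, `next_command` and `limit` are hypothetical. Passing `limit=math.inf` corresponds to the system without untrained word, and `next_command` mirrors the behaviour for untrained words stated in the following paragraph.

```python
import math
from typing import Dict, List, Optional, Sequence

Frames = List[List[float]]  # one cepstral vector per analysis frame


def esd(p: Sequence[float], q: Sequence[float]) -> float:
    """Eq. (18): Euclidean squared distance (no square root)."""
    if len(p) != len(q):
        raise ValueError("feature vectors must have the same length")
    return sum((pi - qi) ** 2 for pi, qi in zip(p, q))


def flatten(frames: Frames) -> List[float]:
    """Concatenate the per-frame cepstra into one feature vector."""
    return [c for frame in frames for c in frame]


def recognize(unknown: Frames, references: Dict[str, List[Frames]],
              limit: float) -> Optional[str]:
    """Return the word of the nearest reference model, or None (untrained
    word) when even the minimum distance is >= limit."""
    u = flatten(unknown)
    best_word, best_dist = None, math.inf
    for word, templates in references.items():
        for template in templates:
            d = esd(u, flatten(template))
            if d < best_dist:
                best_word, best_dist = word, d
    return best_word if best_dist < limit else None


def next_command(recognized: Optional[str], last: str) -> str:
    """An untrained word leaves the wheelchair executing the last command."""
    return last if recognized is None else recognized
```

Taking the minimum over all training samples of a word is one possible design; averaging the samples of each word into a single template would be another, and the text does not say which was used.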
After the unknown speech signal has been recognized, the wheelchair is controlled according to the recognized word. If the unknown speech signal is recognized as an untrained word, the system does nothing new to the wheelchair and keeps executing the last command.

V. EXPERIMENTAL RESULTS
Several experiments were done to test the performance of the designed speech recognition system. Two kinds of speech recognition system were tested. The first is the speech recognition system without untrained word. In this system, the unknown speech signal is always recognized as one of the reference words. The second is the speech recognition system with untrained word. In this system, the unknown speech signal can be recognized either as one of the reference words or as an untrained word. For the experiments using the system with untrained word, the words used as the other words are ONE, TWO, THREE, FOUR, FIVE, SIX, SEVEN, EIGHT, NINE, and TEN.

In the experiments, both in the training mode and in the recognizer mode, someone utters the word directly into the microphone to give the command to the wheelchair. The experiments were done using 1 sample, 3 samples, and 5 samples of training data per word. The values of the LPC analysis parameters used in the experiments are:
1. Number of samples in the analysis frame (N) is 240.
2. Number of samples shifted between two adjacent frames (M) is 80.
3. LPC analysis order (p) is 10.
4. Dimension of the LPC cepstral vector is 12.

Summaries of the experimental results are shown in table 1 and table 2. Table 1 summarizes the experimental results of the speech recognition system without untrained word. Table 2 summarizes the experimental results of the speech recognition system with untrained word.

Table 1. Summary of Experimental Results of Speech Recognition System without Untrained Word

Recognition Rate (%), system without untrained word
Word            1 sample        3 samples       5 samples
                I       II      I       II      I       II
Stop            60      40      50      90      80      100
Forward         100     80      100     100     100     90
Backward        20      80      90      100     50      80
Left            0       60      70      70      20      70
Right           0       90      40      70      60      40
Up              0       20      90      50      80      70
Down            80      0       0       70      100     50
Others          -       -       -       -       -       -
Average Rate    37.14   52.86   62.86   78.57   70      71.43

Table 2. Summary of Experimental Results of Speech Recognition System with Untrained Word

Recognition Rate (%), system with untrained word
Word            1 sample        3 samples       5 samples
                I       II      I       II      I       II
Stop            0       50      90      60      50      30
Forward         100     40      70      10      80      30
Backward        20      70      50      20      70      40
Left            50      50      0       10      30      10
Right           20      50      70      30      60      20
Up              70      30      0       10      80      60
Down            70      20      70      10      70      70
Others          0       30      0       60      0       30
Average Rate    41.25   42.5    46.25   26.25   55      36.25
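The "Average Rate" row of Table 1 matches the plain mean of the seven word rows in each column, and the 21.43% error quoted in the discussion is 100% minus the best of those means. A short arithmetic check follows, with the Table 1 values copied from above. The dictionary layout and the names `TABLE1`, `COLUMNS` and `column_averages` are only for illustration.

```python
# Table 1 (system without untrained word); columns are
# 1 sample I, II | 3 samples I, II | 5 samples I, II
TABLE1 = {
    "Stop":     [60, 40, 50, 90, 80, 100],
    "Forward":  [100, 80, 100, 100, 100, 90],
    "Backward": [20, 80, 90, 100, 50, 80],
    "Left":     [0, 60, 70, 70, 20, 70],
    "Right":    [0, 90, 40, 70, 60, 40],
    "Up":       [0, 20, 90, 50, 80, 70],
    "Down":     [80, 0, 0, 70, 100, 50],
}
COLUMNS = ["1 sample I", "1 sample II", "3 samples I",
           "3 samples II", "5 samples I", "5 samples II"]


def column_averages(table):
    """Mean recognition rate (%) of each column over all word rows."""
    return [round(sum(col) / len(col), 2) for col in zip(*table.values())]


averages = column_averages(TABLE1)
best = max(range(len(averages)), key=averages.__getitem__)
print(averages)                                # [37.14, 52.86, 62.86, 78.57, 70.0, 71.43]
print(COLUMNS[best], round(100 - averages[best], 2))  # 3 samples II 21.43
```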
As shown in table 1 and table 2, the highest recognition rate achieved is 78.57%; thus, the error of the system is about 21.43%. This recognition rate was obtained in the experiment using 3 training data samples per word and the system without untrained word. In general, the more samples are trained, the higher the average recognition rate, especially in the system without untrained word.

The designed system does not give a good enough result for words which are not included in the seven recognized words. This is shown in the experimental results of the speech recognition system with untrained word (see table 2). Compared with the system without untrained word, the system with untrained word does not improve the recognition rate; moreover, the recognition rate tends to decrease. It can be concluded that using a limit (threshold) value to classify words which are not in the recognized word database as untrained words does not improve the recognition rate.

VI. CONCLUSIONS
From the experimental results, the following can be concluded:
1. The designed speech recognition system using LPC-Euclidean Squared Distance can control the wheelchair well enough, with an error of 21.43%.
2. The highest recognition rate achieved by the speech recognition system using LPC-Euclidean Squared Distance is 78.57%. This recognition rate was obtained in the experiment of the system using 3 training data samples. The system can recognize the word
3. As the number of samples increases, the recognition rate also tends to increase.
4. Using a limit (threshold) value to classify words which are not in the recognized word database as untrained words does not improve the recognition rate. Moreover, in certain experiments, the system could not recognize the other words as untrained words at all and always recognized them as one of the trained words.

REFERENCES
[1] Lawrence Rabiner and Biing-Hwang Juang. Fundamentals of Speech Recognition. Prentice Hall, New Jersey, 1993.
[2] Y.M. Lam, M.W. Mak, and P.H.W. Leong. Fixed point implementations of Speech Recognition Systems. Proceedings of the International Signal Processing Conference, Dallas, 2003.
[3] Soshi Iba, Christiaan J. J. Paredis, and Pradeep K. Khosla. Interactive Multimodal Robot Programming. The International Journal of Robotics Research, 24(1), 83-104, 2005.
[4] Wikipedia, the free encyclopedia. Euclidean Distance. 2006. http://en.wikipedia.org/wiki/Euclidean_distance.
[5] Improved Outcomes Software. Euclidean and Euclidean Squared Distance. http://www.improvedoutcomes.com/docs/websitedocs/Clustering/Clustering_Parameters/Euclidean_and_Euclidean_squared_distance_metrics.htm
[6] Ethnicity Group. Cepstrum Method. 1998. http://www.owlnet.rice.edu/~elec532/projects98/speech/cepstrum/cepstrum.html