The Blizzard Challenge 2014

1 Kishore Prahallad, 1 Anandaswarup Vadapalli, 1 Santosh Kesiraju, 2 Hema A. Murthy, 3 Swaran Lata, 4 T. Nagarajan, 5 Mahadeva Prasanna, 6 Hemant Patil, 7 Anil Kumar Sao, 8 Simon King, 9 Alan W. Black and 10 Keiichi Tokuda

1 Speech and Vision Lab, IIIT Hyderabad, India
2 Department of CSE, IIT Madras, India
3 Department of Electronics and Information Technology, Govt. of India
4 Department of IT, SSN College of Engineering, India
5 Department of EEE, IIT Guwahati, India
6 DAIICT, India
7 School of Computing and Electrical Engineering, IIT Mandi, India
8 Centre for Speech Technology Research, University of Edinburgh, UK
9 Language Technologies Institute, Carnegie Mellon University, USA
10 Department of Computer Science, Nagoya Institute of Technology, Japan

Abstract

The Blizzard Challenge 2014 was the tenth annual Blizzard Challenge, organized by the following group of institutions: IIIT Hyderabad, IIT Madras, DAIICT, SSN College of Engineering, IIT Mandi and IIT Guwahati, with support and collaboration from DeitY, Government of India. This paper describes the tasks in the Blizzard Challenge 2014. The tasks consisted of data from six Indian languages: Assamese, Gujarati, Hindi, Rajasthani, Tamil and Telugu. Seven participants from around the world used the speech data provided, as well as the corresponding text transcriptions in UTF-8, to build synthetic voices, which were then evaluated by means of listening tests.

Index Terms: Blizzard Challenge, speech synthesis, evaluation of synthetic speech

1. Introduction

The Blizzard Challenge, originally started by Black and Tokuda [1], is a well-established challenge in the field of speech synthesis. The summary papers [1-11] provide information about the previous challenges. These resources can be found on the Blizzard Challenge website [footnote 1]. This paper is a summary paper describing the tasks in the Blizzard Challenge 2014.

2. Blizzard 2014 tasks

2.1. Database used

Speech and text data for six Indian languages, i) Assamese, ii) Gujarati, iii) Hindi, iv) Rajasthani, v) Tamil and vi) Telugu, were released.
The speech data for each language amounted to 2 hours (sampled at 16 kHz), recorded by professional speakers in a high quality studio environment. Along with the speech data, the corresponding text was provided in UTF-8 format. No other information, such as segment labels, was provided as part of the challenge. However, participants were free to learn or use information such as phonesets or labels from other resources. For the nature of scripts and sounds of Indian languages, please refer to [11].

[footnote 1] http://www.festvox.org/blizzard/

2.2. Tasks

The Blizzard Challenge 2014 consisted of two tasks, a hub task and a spoke task.

Hub task 2014-IH1: Participants were asked to build one voice in each language from the provided data, in accordance with the rules of the challenge. The subtasks were numbered from IH1.1 to IH1.6, corresponding to the six languages: IH1.1 (Assamese [AS]), IH1.2 (Gujarati [GU]), IH1.3 (Hindi [HI]), IH1.4 (Rajasthani [RJ]), IH1.5 (Tamil [TA]) and IH1.6 (Telugu [TE]).

Spoke task 2014-IH2: Participants had to synthesize multilingual sentences containing Indian language text as well as English. The subtasks were numbered from IH2.1 to IH2.6, corresponding to the six languages: IH2.1 (Assamese [AS]), IH2.2 (Gujarati [GU]), IH2.3 (Hindi [HI]), IH2.4 (Rajasthani [RJ]), IH2.5 (Tamil [TA]) and IH2.6 (Telugu [TE]).

For the IH1 task (hub task), the synthetic voices were evaluated through listening tests on the following test data (for each Indian language):
- Read speech (RD): 100 distinct sentences, not part of the training data
- Semantically unpredictable sentences (SUS): 50 distinct sentences, not part of the RD/training data

The SUS sentences were prepared in the following manner. 50 sentences in each language were randomly selected, and POS tagging was performed on these sentences. The words in each sentence were then reordered as Subject Object Verb Conjunction Subject Object Verb to generate the SUS sentence.
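One way to read the reordering step above is as slot filling against a fixed role template. The sketch below is a minimal illustration under that assumption, not the script used for the challenge: the role labels (SUBJ, OBJ, VERB, CONJ), the function name and the English placeholder words are all hypothetical, and the mapping from POS tags to these roles is assumed to have been done beforehand.

```python
from collections import defaultdict, deque

# Template from the text: Subject Object Verb Conjunction Subject Object Verb.
# The short role labels are hypothetical names chosen for this sketch.
SUS_TEMPLATE = ["SUBJ", "OBJ", "VERB", "CONJ", "SUBJ", "OBJ", "VERB"]


def reorder_to_template(tagged_words, template=SUS_TEMPLATE):
    """Reorder (word, role) pairs so that they follow the role template.

    Words are consumed in their original order within each role. Words whose
    role does not appear in the template are dropped, and template slots with
    no remaining word of that role are skipped.
    """
    pools = defaultdict(deque)
    for word, role in tagged_words:
        pools[role].append(word)

    reordered = []
    for slot in template:
        if pools[slot]:
            reordered.append(pools[slot].popleft())
    return reordered


# Hypothetical tagged input (English placeholders, not challenge data).
tagged = [
    ("reads", "VERB"), ("teacher", "SUBJ"), ("and", "CONJ"),
    ("river", "OBJ"), ("stone", "SUBJ"), ("eats", "VERB"),
    ("window", "OBJ"),
]
print(" ".join(reorder_to_template(tagged)))
```

Keeping a separate queue per role preserves the relative order of words that share a role, so the only change to the sentence is the one imposed by the template.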
For the IH2 task (spoke task), the systems were evaluated through listening tests on the following test data (for each Indian language + English combination):
- Multilingual sentences (ML): 50 distinct sentences containing both Indian language and English words.
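Because ML sentences mix an Indian-language script with Latin-script English, the language of each word can be inferred from the Unicode block of its characters. The sketch below is a minimal illustration assuming whitespace-separated words; the function names are hypothetical, while the block ranges are the standard Unicode ranges for the scripts involved. It identifies the script rather than the language (Hindi and Rajasthani share Devanagari, and Assamese is encoded in the Bengali block), which is enough when each test set pairs a single Indian language with English.

```python
# Standard Unicode block ranges for the scripts used by the six languages.
SCRIPT_RANGES = {
    "Devanagari": (0x0900, 0x097F),  # Hindi, Rajasthani
    "Bengali": (0x0980, 0x09FF),     # Assamese is encoded in this block
    "Gujarati": (0x0A80, 0x0AFF),
    "Tamil": (0x0B80, 0x0BFF),
    "Telugu": (0x0C00, 0x0C7F),
}


def script_of(ch):
    """Return the script name of one character, or None if undetermined."""
    code_point = ord(ch)
    for name, (low, high) in SCRIPT_RANGES.items():
        if low <= code_point <= high:
            return name
    if ch.isascii() and ch.isalpha():
        return "Latin"
    return None  # digits, punctuation, whitespace, other scripts


def tag_words(sentence):
    """Label each whitespace-separated word with the script of its characters."""
    tagged = []
    for word in sentence.split():
        scripts = {script_of(ch) for ch in word} - {None}
        if len(scripts) == 1:
            tagged.append((word, scripts.pop()))
        elif not scripts:
            tagged.append((word, "Other"))
        else:
            tagged.append((word, "Mixed"))
    return tagged


# Hypothetical mixed input: a Devanagari word, an English word, a Tamil word.
sample = "\u0939\u093f\u0902\u0926\u0940 school \u0ba4\u0bae\u0bbf\u0bb4\u0bcd"
for word, script in tag_words(sample):
    print(script, word)
```

A front end could then route Latin-tagged words to English letter-to-sound rules and the rest to the rules of the Indian language in question.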
No language tags were provided in the ML sentences. The participants were expected to identify the language from the Unicode code point.

2.3. Participants in the challenge

The participants in the Blizzard Challenge 2014 consisted of the seven participants listed in Table 1. To anonymize the results, the systems are identified using letters, with A denoting natural speech, B denoting the baseline system and C to K denoting the systems submitted by the participants in the challenge. Each participant could submit as many systems as they wished.

Table 1: Participants in Blizzard Challenge 2014

Short name | Details | Synthesis method
NATURAL | Natural speech | -
BASE | Baseline system | HMM
NITECH | Nagoya Institute of Technology | HMM
USTCP | National Engineering Laboratory of Speech & Language Information Processing (Primary system) | Hybrid (IH1.3) / HMM (remaining)
CMU | Carnegie Mellon University | HMM
S4A | Simple4All project consortium | HMM + DNN
ILSP | Institute for Language and Speech Processing / Innoetics | USS
IITMS | IIT Madras (Secondary system) | HMM (IH1.3, IH1.4 and IH1.6) / USS (remaining)
IITMP | IIT Madras (Primary system) | USS (IH1.3, IH1.4 and IH1.6) / HMM (remaining)
MILE-TTS | Dept. of Electrical Engg, Indian Institute of Science | USS
USTCS | National Engineering Laboratory of Speech & Language Information Processing (Secondary system) | HMM

2.4. Baseline systems

Baseline systems were built for each language using the speaker independent HTS-2.2 + STRAIGHT scripts [footnote 2]. The data was labeled at the phone level using the HMM labeling script (EHMM) in FestVox [footnote 3] [12]. For letter-to-sound rules, a set of simple, naive first-order approximations was used for each language.

[footnote 2] http://hts.sp.nitech.ac.jp/?download
[footnote 3] http://www.festvox.org

3. Evaluation

The participants were asked to synthesize the complete test set, out of which a subset was used in the listening tests. The listening tests for IH1.1 - IH1.6 consisted of ten sections, while the listening tests for IH2.1 - IH2.6 consisted of five sections. The different sections of the listening tests are described below.

Listening tests for IH1.1 - IH1.6
1. two sections for similarity (one section using RD and one section using SUS)
2. seven sections for naturalness (four sections using RD and three sections using SUS)
3. one section for intelligibility using SUS

Listening tests for IH2.1 - IH2.6
1. one section for similarity
2. four sections for naturalness

The methodology of scoring in the various sections of the listening tests is described below.

Similarity: The listener plays a few samples of the original speaker and one synthetic sample. The listener then chooses a response that represents how similar the synthetic voice sounded compared to the original speaker's voice, on a scale from 1 (Sounds like a totally different person) to 5 (Sounds exactly like the same person).

Naturalness: The listener listens to a sample of synthetic speech and chooses a score which represents how natural or unnatural the sentence sounded, on a scale of 1 (Completely Unnatural) to 5 (Completely Natural).

Intelligibility: Listeners listen to an utterance and type in what they hear. Word Error Rate (WER) is computed in the same manner as it is computed for speech recognition tasks.

For the list of changes made in the evaluation portal to enable the conduct of listening tests in Indian languages, please refer to [11].

4. Results

The following listener types were used for the listening tests:
- Paid users
- Online volunteers

Apart from these types of listeners, we also experimented with conducting listening tests on Amazon Mechanical Turk (AMT). Table 2 shows the statistics of the different listener types for the tasks.

Table 2: User statistics for the Blizzard 2014 tasks

Task | Paid users | Online volunteers | AMT users
IH1.1 + IH2.1 | 106 | 9 | -
IH1.2 + IH2.2 | 50 | 0 | -
IH1.3 + IH2.3 | 100 | 9 | 54
IH1.4 + IH2.4 | 101 | 9 | -
IH1.5 + IH2.5 | 100 | 9 | 55
IH1.6 + IH2.6 | 100 | 6 | 44

4.1. Results

For the six languages in the IH1 hub task (IH1.1 - IH1.6), Figures 1 to 6 and Figures 7 to 12 show the similarity and naturalness results on RD and SUS respectively. The intelligibility results for the hub task (IH1.1 - IH1.6) are shown in Figures 13 to 18.
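The WER behind the intelligibility figures is, in the usual speech recognition sense, the word-level edit distance between what the listener typed and the reference text, divided by the number of reference words. A minimal sketch of that standard computation follows. It assumes whitespace tokenization and exact string matching; any text normalization applied by the evaluation portal is not specified here, and the function name is illustrative.

```python
def word_error_rate(reference, hypothesis):
    """Return WER in percent: (substitutions + deletions + insertions) / N.

    N is the number of words in the reference. The edit distance is computed
    with the standard dynamic programme, keeping only two rows of the table.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the first i-1 reference words
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return 100.0 * prev[-1] / len(ref)


# Hypothetical example: one substitution and one deletion over six words.
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because insertions count as errors, WER can exceed 100% when the typed response is much longer than the reference.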
For the spoke task (IH2.1 - IH2.6), Figures 19 to 24 show the similarity and naturalness results on ML.
For a detailed discussion of the results, please refer to the papers describing each system submitted by individual participants, available on the Blizzard Challenge website.

5. Conclusions

The conclusions drawn from the results of the Blizzard Challenge 2014 are:
- The high quality audio recordings led to decent performance by all systems.
- All teams performed better than the baseline system. This can be attributed to the fact that open source toolkits typically require sufficient tuning to make them work better for new/arbitrary languages.
- There does not seem to be much utility in computing WER as a measure of intelligibility for Indian languages.
- Some teams performed better on the ML task as compared to RD and SUS.
- Scores obtained from Amazon Mechanical Turk listeners show too much noise and variability. These listeners cannot be used as an alternative to paid listeners.

6. References

[1] A. W. Black and K. Tokuda, "The Blizzard Challenge - 2005: Evaluating corpus-based speech synthesis on common datasets," in Proceedings of Interspeech 2005, Lisbon, 2005.
[2] C. L. Bennett, "Large scale evaluation of corpus-based synthesizers: Results and lessons from the Blizzard Challenge 2005," in Proceedings of Interspeech 2005, 2005.
[3] C. L. Bennett and A. W. Black, "The Blizzard Challenge 2006," in Blizzard Challenge Workshop, Interspeech 2006 - ICSLP satellite event, 2006.
[4] M. Fraser and S. King, "The Blizzard Challenge 2007," in Proceedings Blizzard Workshop 2007 (in Proc. SSW6), 2007.
[5] V. Karaiskos, S. King, R. Clark, and C. Mayo, "The Blizzard Challenge 2008," in Proceedings Blizzard Workshop 2008, 2008.
[6] S. King and V. Karaiskos, "The Blizzard Challenge 2009," in Proceedings Blizzard Workshop 2009, 2009.
[7] S. King and V. Karaiskos, "The Blizzard Challenge 2010," in Proceedings Blizzard Workshop 2010, 2010.
[8] S. King and V. Karaiskos, "The Blizzard Challenge 2011," in Proceedings Blizzard Workshop 2011, 2011.
[9] S. King and V. Karaiskos, "The Blizzard Challenge 2012," in Proceedings Blizzard Workshop 2012, 2012.
[10] S. King and V. Karaiskos, "The Blizzard Challenge 2013," in Proceedings Blizzard Workshop 2013, 2013.
[11] K. Prahallad, A. Vadapalli, N. Elluru, G. Mantena, B. Pulugundla, P. Bhaskararao, H. A. Murthy, S. King, V. Karaiskos, and A. W. Black, "The Blizzard Challenge 2013 - Indian Language Tasks," in Proceedings Blizzard Workshop 2013, 2013.
[12] A. W. Black and K. Lenzo, "Building voices in the Festival speech synthesis system," 2002, available online: http://festvox.org/bsv.
[13] R. Clark, M. Podsiadlo, M. Fraser, C. Mayo, and S. King, "Statistical analysis of the Blizzard Challenge 2007 listening test results," in Proceedings Blizzard Workshop 2007 (in Proceedings SSW6), 2007.
[Figures 1 to 12: plots not reproduced. Each figure has two panels, Mean Opinion Scores (similarity to original speaker) and Mean Opinion Scores (naturalness), from paid listeners. The n values below are the per-system counts printed above each panel; system labels are listed where they survive.]

Figure 1: Similarity and naturalness results on RD for IH1.1 (Assamese). Systems A, B, C, D, E, F, G, I; n = 106 (similarity), 424 (naturalness).
Figure 2: Similarity and naturalness results on RD for IH1.2 (Gujarati). Nine systems; n = 50 (similarity), 200 (naturalness).
Figure 3: Similarity and naturalness results on RD for IH1.3 (Hindi). Ten systems, including K; n = 100 (similarity), 400 (naturalness).
Figure 4: Similarity and naturalness results on RD for IH1.4 (Rajasthani). Nine systems; n = 101 (similarity), 404 (naturalness).
Figure 5: Similarity and naturalness results on RD for IH1.5 (Tamil). Ten systems, including J; n = 100 (similarity), 400 (naturalness).
Figure 6: Similarity and naturalness results on RD for IH1.6 (Telugu). Nine systems; n = 100 (similarity), 400 (naturalness).
Figure 7: Similarity and naturalness results on SUS for IH1.1 (Assamese). Systems A, B, C, D, E, F, G, I; n = 106 (similarity), 318 (naturalness).
Figure 8: Similarity and naturalness results on SUS for IH1.2 (Gujarati). Nine systems; n = 50 (similarity), 150 (naturalness).
Figure 9: Similarity and naturalness results on SUS for IH1.3 (Hindi). Ten systems, including K; n = 100 (similarity), 300 (naturalness).
Figure 10: Similarity and naturalness results on SUS for IH1.4 (Rajasthani). Nine systems; n = 101 (similarity), 303 (naturalness).
Figure 11: Similarity and naturalness results on SUS for IH1.5 (Tamil). Ten systems, including J; n = 100 (similarity), 300 (naturalness).
Figure 12: Similarity and naturalness results on SUS for IH1.6 (Telugu). Nine systems; n = 100 (similarity), 300 (naturalness).
[Figures 13 to 18: plots not reproduced. Each figure shows the SUS word error rate for paid listeners, with a vertical axis WER (%) running from 0 to 100. The n values below are the per-system counts printed above each plot, in the order printed; system labels are listed where they survive.]

Figure 13: Intelligibility results on SUS for IH1.1 (Assamese). Systems A, B, C, D, E, F, G, I; n = 106, 101, 102, 99, 103, 102, 103, 104.
Figure 14: Intelligibility results on SUS for IH1.2 (Gujarati). Nine systems; n = 49, 49, 48, 47, 48, 50, 48, 48, 49.
Figure 15: Intelligibility results on SUS for IH1.3 (Hindi). Ten systems, including K; n = 99, 99, 95, 100, 100, 100, 99, 100, 99, 100.
Figure 16: Intelligibility results on SUS for IH1.4 (Rajasthani). Nine systems; n = 101, 46, 99, 100, 101, 101, 101, 100, 101.
Figure 17: Intelligibility results on SUS for IH1.5 (Tamil). Ten systems, including J; n = 96, 95, 85, 98, 99, 98, 99, 92, 97, 95.
Figure 18: Intelligibility results on SUS for IH1.6 (Telugu). Nine systems; n = 100, 89, 86, 98, 98, 100, 99, 100, 95.
[Figures 19 to 24: plots not reproduced. Each figure has two panels, Mean Opinion Scores (similarity to original speaker) and Mean Opinion Scores (naturalness), from paid listeners. The n values below are the per-system counts printed above each panel.]

Figure 19: Similarity and naturalness results on ML for IH2.1 (Assamese). Systems A, B, C, D, E; n = 106 (similarity), 424 (naturalness).
Figure 20: Similarity and naturalness results on ML for IH2.2 (Gujarati). Systems A, B, C, D, E; n = 50 (similarity), 200 (naturalness).
Figure 21: Similarity and naturalness results on ML for IH2.3 (Hindi). Systems A, B, C, D, E, F, K; n = 100 (similarity), 300 (naturalness).
Figure 22: Similarity and naturalness results on ML for IH2.4 (Rajasthani). Systems A, B, C, D, E, F; n = 101 (similarity), 404 (naturalness).
Figure 23: Similarity and naturalness results on ML for IH2.5 (Tamil). Systems A, B, C, D, E, J; n = 100 (similarity), 400 (naturalness).
Figure 24: Similarity and naturalness results on ML for IH2.6 (Telugu). Systems A, B, C, D, E, F; n = 100 (similarity), 400 (naturalness).