The Blizzard Challenge 2014


Kishore Prahallad (1), Anandaswarup Vadapalli (1), Santosh Kesiraju (1), Hema A. Murthy (2), Swaran Lata (3), T. Nagarajan (4), Mahadeva Prasanna (5), Hemant Patil (6), Anil Kumar Sao (7), Simon King (8), Alan W. Black (9) and Keiichi Tokuda (10)

(1) Speech and Vision Lab, IIIT Hyderabad, India
(2) Department of CSE, IIT Madras, India
(3) Department of Electronics and Information Technology, Govt. of India
(4) Department of IT, SSN College of Engineering, India
(5) Department of EEE, IIT Guwahati, India
(6) DAIICT, India
(7) School of Computing and Electrical Engineering, IIT Mandi, India
(8) Center for Speech Technology Research, University of Edinburgh, UK
(9) Language Technologies Institute, Carnegie Mellon University, USA
(10) Department of Computer Science, Nagoya Institute of Technology, Japan

Abstract

The Blizzard Challenge 2014 was the tenth annual Blizzard Challenge, organized by the following group of institutions: IIIT Hyderabad, IIT Madras, DAIICT, SSN College of Engineering, IIT Mandi and IIT Guwahati, with support and collaboration from DeitY, Government of India. This paper describes the tasks in the Blizzard Challenge 2014. The tasks consisted of data from six Indian languages: Assamese, Gujarati, Hindi, Rajasthani, Tamil and Telugu. Seven participants from around the world used the speech data provided, as well as the corresponding text transcriptions in UTF-8, to build synthetic voices, which were then evaluated by means of listening tests.

Index Terms: Blizzard Challenge, speech synthesis, evaluation of synthetic speech

1. Introduction

The Blizzard Challenge, originally started by Black and Tokuda [1], is a well-established challenge in the field of speech synthesis. The summary papers [1-11] provide information about the previous challenges. These resources can be found on the Blizzard Challenge website (footnote 1). This paper is a summary paper describing the tasks in the Blizzard Challenge 2014.

2. Blizzard 2014 tasks

2.1. Database used

Speech and text data for six Indian languages, i) Assamese, ii) Gujarati, iii) Hindi, iv) Rajasthani, v) Tamil and vi) Telugu, were released.
The speech data for each language was 2 hours long (sampled at 16 kHz), recorded by professional speakers in a high-quality studio environment. Along with the speech data, the corresponding text was provided in UTF-8 format. No other information, such as segment labels, was provided as part of the challenge. However, there was no restriction on the participants learning or using information such as phonesets or labels from other resources. For the nature of the scripts and sounds of Indian languages, please refer to [11].

Footnote 1: http://www.festvox.org/blizzard/

2.2. Tasks

The Blizzard Challenge 2014 consisted of two tasks, a hub task and a spoke task.

Hub task 2014-IH1: Participants were asked to build one voice in each language from the provided data, in accordance with the rules of the challenge. The subtasks were numbered from IH1.1 to IH1.6, corresponding to the six languages: IH1.1 (Assamese [AS]), IH1.2 (Gujarati [GU]), IH1.3 (Hindi [HI]), IH1.4 (Rajasthani [RJ]), IH1.5 (Tamil [TA]) and IH1.6 (Telugu [TE]).

Spoke task 2014-IH2: Participants had to synthesize multilingual sentences containing Indian language text as well as English. The subtasks were numbered from IH2.1 to IH2.6, corresponding to the six languages: IH2.1 (Assamese [AS]), IH2.2 (Gujarati [GU]), IH2.3 (Hindi [HI]), IH2.4 (Rajasthani [RJ]), IH2.5 (Tamil [TA]) and IH2.6 (Telugu [TE]).

For the IH1 task (hub task), the synthetic voices were evaluated through listening tests on the following test data (for each Indian language):
- Read speech (RD) - 100 distinct sentences, not a part of the training data
- Semantically unpredictable sentences (SUS) - 50 distinct sentences, not a part of the RD/training data

The SUS sentences were prepared in the following manner. In each language, 50 sentences were randomly selected, and POS tagging was performed on these sentences. The words in each sentence were then reordered as Subject Object Verb Conjunction Subject Object Verb to generate the SUS sentence.
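The reordering step can be illustrated with a minimal sketch. It assumes each word already carries one of four hypothetical tags (SUBJ, OBJ, VERB, CONJ); the paper does not specify the tag set, the tagger, or how sentences with missing or extra words were handled, so those choices below are illustrative only.

```python
from collections import defaultdict, deque

# Template from the text: Subject Object Verb Conjunction Subject Object Verb.
# The tag names themselves are hypothetical.
SUS_TEMPLATE = ["SUBJ", "OBJ", "VERB", "CONJ", "SUBJ", "OBJ", "VERB"]


def reorder_to_template(tagged_words, template=SUS_TEMPLATE):
    """Reorder (word, tag) pairs so that they follow the template.

    Words keep their original relative order within each tag.
    Assumptions: a template slot with no remaining word is skipped,
    and words whose tag never appears in the template are dropped.
    """
    pools = defaultdict(deque)
    for word, tag in tagged_words:
        pools[tag].append(word)

    reordered = []
    for slot in template:
        if pools[slot]:
            reordered.append(pools[slot].popleft())
    return reordered


# English placeholder words stand in for the Indian language data.
example = [
    ("dog", "SUBJ"), ("eats", "VERB"), ("rice", "OBJ"), ("and", "CONJ"),
    ("cat", "SUBJ"), ("drinks", "VERB"), ("milk", "OBJ"),
]
sus_words = reorder_to_template(example)
```

Here `sus_words` follows the template order, with each verb moved after its object.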
For the IH2 task (spoke task), the systems were evaluated through listening tests by synthesizing the following test data (for each Indian language + English combination):
- Multilingual sentences (ML) - 50 distinct sentences containing both Indian language and English words.
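Because ML sentences mix an Indian script with Latin-script English, the language of each word can in principle be recovered from the Unicode code points of its characters. The sketch below shows one way to do this. The block ranges come from the Unicode Standard; the function names and the whitespace tokenization are assumptions for illustration, not a description of any participant's system.

```python
# Unicode block ranges for the scripts used by the six challenge languages.
SCRIPT_RANGES = {
    "Devanagari": (0x0900, 0x097F),  # Hindi, Rajasthani
    "Bengali": (0x0980, 0x09FF),     # Assamese is encoded in the Bengali block
    "Gujarati": (0x0A80, 0x0AFF),
    "Tamil": (0x0B80, 0x0BFF),
    "Telugu": (0x0C00, 0x0C7F),
}


def script_of_char(ch):
    """Return the script name for one character, or None if unclassified."""
    if ch.isascii() and ch.isalpha():
        return "Latin"
    code_point = ord(ch)
    for name, (low, high) in SCRIPT_RANGES.items():
        if low <= code_point <= high:
            return name
    return None  # digits, punctuation, and other scripts


def script_of_word(word):
    """Return the script of the first classifiable character in the word."""
    for ch in word:
        script = script_of_char(ch)
        if script is not None:
            return script
    return None


def tag_sentence(sentence):
    """Split on whitespace (an assumption) and tag each word with its script."""
    return [(word, script_of_word(word)) for word in sentence.split()]
```

The script alone does not separate Hindi from Rajasthani, since both use Devanagari, but within a single subtask the Indian language is already known, so only the split between that language and English has to be detected.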

No language tags were provided in the ML sentences. The participants were expected to identify the language from the Unicode code point.

2.3. Participants in the challenge

The participants in the Blizzard Challenge 2014 consisted of the seven participants listed in Table 1. To anonymize the results, the systems are identified using letters, with A denoting natural speech, B denoting the baseline system and C to K denoting the systems submitted by the participants in the challenge. Each participant could submit as many systems as they wished.

Table 1: Participants in Blizzard Challenge 2014

Short name | Details | Synthesis method
NATURAL | Natural speech | -
BASE | Baseline system | HMM
NITECH | Nagoya Institute of Technology | HMM
USTCP | National Engineering Laboratory of Speech & Language Information Processing (Primary system) | Hybrid (IH1.3) / HMM (remaining)
CMU | Carnegie Mellon University | HMM
S4A | Simple4All project consortium | HMM + DNN
ILSP | Institute for Language and Speech Processing / Innoetics | USS
IITMS | IIT Madras (Secondary system) | HMM (IH1.3, IH1.4 and IH1.6) / USS (remaining)
IITMP | IIT Madras (Primary system) | USS (IH1.3, IH1.4 and IH1.6) / HMM (remaining)
MILE-TTS | Dept. of Electrical Engg, Indian Institute of Science | USS
USTCS | National Engineering Laboratory of Speech & Language Information Processing (Secondary system) | HMM

2.4. Baseline systems

Baseline systems were built for each language using the speaker-independent HTS-2.2 + STRAIGHT scripts (footnote 2). The data was labeled at the phone level using the HMM labeling script (EHMM) in FestVox (footnote 3) [12]. For letter-to-sound rules, a set of simple, naive first-order approximations was used for each language.

Footnote 2: http://hts.sp.nitech.ac.jp/?download
Footnote 3: http://www.festvox.org

3. Evaluation

The participants were asked to synthesize the complete test set, out of which a subset was used in the listening tests. The listening tests for IH1.1 - IH1.6 consisted of ten sections, while the listening tests for IH2.1 - IH2.6 consisted of five sections. The different sections of the listening tests are described below.

Listening tests for IH1.1 - IH1.6
1. two sections for similarity (one section using RD and one section using SUS)
2. seven sections for naturalness (four sections using RD and three sections using SUS)
3. one section for intelligibility using SUS

Listening tests for IH2.1 - IH2.6
1. one section for similarity
2. four sections for naturalness

The scoring methodology in the various sections of the listening tests is described below.

Similarity: The listener plays a few samples of the original speaker and one synthetic sample. The listener then chooses a response that represents how similar the synthetic voice sounded compared to the original speaker's voice, on a scale from 1 (Sounds like a totally different person) to 5 (Sounds exactly like the same person).

Naturalness: The listener listens to a sample of synthetic speech and chooses a score that represents how natural or unnatural the sentence sounded, on a scale of 1 (Completely Unnatural) to 5 (Completely Natural).

Intelligibility: Listeners listen to an utterance and type in what they hear. Word Error Rate (WER) is computed in the same manner as for speech recognition tasks.

For the list of changes made in the evaluation portal to enable the conduct of listening tests in Indian languages, please refer to [11].

4. Results

The following listener types were used for the listening tests:
- Paid users
- Online volunteers

Apart from these types of listeners, we also experimented with conducting listening tests on Amazon Mechanical Turk (AMT). Table 2 shows the statistics of the different listener types for the tasks.

Table 2: User statistics for the Blizzard 2014 tasks

Task | Paid users | Online volunteers | AMT users
IH1.1 + IH2.1 | 106 | 9 | -
IH1.2 + IH2.2 | 50 | 0 | -
IH1.3 + IH2.3 | 100 | 9 | 54
IH1.4 + IH2.4 | 101 | 9 | -
IH1.5 + IH2.5 | 100 | 9 | 55
IH1.6 + IH2.6 | 100 | 6 | 44

4.1. Results

For the six languages in the IH1 hub task (IH1.1 - IH1.6), Figures 1 to 6 and Figures 7 to 12 show the similarity and naturalness results on RD and SUS respectively. The intelligibility results for the hub task (IH1.1 - IH1.6) are shown in Figures 13 to 18.
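The intelligibility figures report word error rate, computed as in speech recognition: the word-level edit distance between what the listener typed and the reference text, divided by the number of reference words. The sketch below is a minimal illustration of that standard definition. The function name and the whitespace tokenization are assumptions, and any text normalization used by the actual evaluation portal is not described in the paper.

```python
def word_error_rate(reference, hypothesis):
    """Return WER in percent between a reference and a typed response.

    WER = (substitutions + deletions + insertions) / reference length,
    where the error count is the word-level Levenshtein distance.
    """
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # Dynamic programming over one row of the edit-distance table at a time.
    previous = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1      # reference word missing in response
            insertion = current[j - 1] + 1  # extra word in response
            current.append(min(substitution, deletion, insertion))
        previous = current

    return 100.0 * previous[-1] / len(ref_words)
```

Because insertions count as errors, WER can exceed 100% when a response is much longer than the reference.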
For the spoke task (IH2.1 - IH2.6), Figures 19 to 24 show the similarity and naturalness results on ML.
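The similarity and naturalness figures summarize 1 to 5 listener ratings per system letter. As a rough illustration of how such ratings can be aggregated, the sketch below groups ratings by system and reports the count, mean and median. The input format and function name are hypothetical, and the paper does not state which statistics or significance tests were applied.

```python
from collections import defaultdict
from statistics import mean, median


def summarize_scores(ratings):
    """Aggregate (system_letter, score) pairs into per-system statistics.

    Scores must lie on the 1 to 5 scale used in the listening tests.
    """
    by_system = defaultdict(list)
    for system, score in ratings:
        if not 1 <= score <= 5:
            raise ValueError(f"score out of range for system {system}: {score}")
        by_system[system].append(score)

    return {
        system: {"n": len(scores), "mean": mean(scores), "median": median(scores)}
        for system, scores in sorted(by_system.items())
    }


# Made-up ratings for illustration only; these are not challenge results.
toy_ratings = [("A", 5), ("A", 4), ("B", 2), ("B", 3), ("B", 3)]
toy_summary = summarize_scores(toy_ratings)
```

Opinion scores are ordinal, so the median is often reported next to, or instead of, the mean.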

For a detailed discussion of the results, please refer to the papers describing each system submitted by individual participants, available on the Blizzard Challenge website.

5. Conclusions

The conclusions drawn from the results of the Blizzard Challenge 2014 are:
- With the high-quality audio recordings provided, all systems achieved decent performance.
- All teams performed better than the baseline system. This can be attributed to the fact that open source toolkits typically require sufficient tuning to make them work better for new/arbitrary languages.
- There does not seem to be much utility in computing WER as a measure of intelligibility for Indian languages.
- Some teams performed better on the ML task than on RD and SUS.
- Scores obtained from Amazon Mechanical Turk listeners show too much noise and variability. These listeners cannot be used as an alternative to paid listeners.

6. References

[1] A. W. Black and K. Tokuda, "The Blizzard Challenge - 2005: Evaluating corpus-based speech synthesis on common datasets," in Proceedings of Interspeech 2005, Lisbon, 2005.
[2] C. L. Bennett, "Large scale evaluation of corpus-based synthesizers: Results and lessons from the Blizzard Challenge 2005," in Proceedings of Interspeech 2005, 2005.
[3] C. L. Bennett and A. W. Black, "The Blizzard Challenge 2006," in Blizzard Challenge Workshop, Interspeech 2006 - ICSLP satellite event, 2006.
[4] M. Fraser and S. King, "The Blizzard Challenge 2007," in Proceedings Blizzard Workshop 2007 (in Proc. SSW6), 2007.
[5] V. Karaiskos, S. King, R. Clark, and C. Mayo, "The Blizzard Challenge 2008," in Proceedings Blizzard Workshop 2008, 2008.
[6] S. King and V. Karaiskos, "The Blizzard Challenge 2009," in Proceedings Blizzard Workshop 2009, 2009.
[7] S. King and V. Karaiskos, "The Blizzard Challenge 2010," in Proceedings Blizzard Workshop 2010, 2010.
[8] S. King and V. Karaiskos, "The Blizzard Challenge 2011," in Proceedings Blizzard Workshop 2011, 2011.
[9] S. King and V. Karaiskos, "The Blizzard Challenge 2012," in Proceedings Blizzard Workshop 2012, 2012.
[10] S. King and V. Karaiskos, "The Blizzard Challenge 2013," in Proceedings Blizzard Workshop 2013, 2013.
[11] K. Prahallad, A. Vadapalli, N. Elluru, G. Mantena, B. Pulugundla, P. Bhaskararao, H. A. Murthy, S. King, V. Karaiskos, and A. W. Black, "The Blizzard Challenge 2013 Indian Language Tasks," in Proceedings Blizzard Workshop 2013, 2013.
[12] A. W. Black and K. Lenzo, "Building voices in the Festival speech synthesis system," 2002, available online: http://festvox.org/bsv.
[13] R. Clark, M. Podsiadlo, M. Fraser, C. Mayo, and S. King, "Statistical analysis of the Blizzard Challenge 2007 listening test results," in Proceedings Blizzard Workshop 2007 (in Proceedings SSW6), 2007.

[Figures 1 to 24 are plots and are not reproduced here. Figures 1 to 12 and 19 to 24 each have two panels, "Mean Opinion Scores (similarity to original speaker)" and "Mean Opinion Scores (naturalness)". Figures 13 to 18 plot "SUS Word error rate", with WER (%) on the axis. All panels are for paid listeners.]

Figure 1: Similarity and naturalness results on RD for IH1.1 (Assamese)
Figure 2: Similarity and naturalness results on RD for IH1.2 (Gujarati)
Figure 3: Similarity and naturalness results on RD for IH1.3 (Hindi)
Figure 4: Similarity and naturalness results on RD for IH1.4 (Rajasthani)
Figure 5: Similarity and naturalness results on RD for IH1.5 (Tamil)
Figure 6: Similarity and naturalness results on RD for IH1.6 (Telugu)
Figure 7: Similarity and naturalness results on SUS for IH1.1 (Assamese)
Figure 8: Similarity and naturalness results on SUS for IH1.2 (Gujarati)
Figure 9: Similarity and naturalness results on SUS for IH1.3 (Hindi)
Figure 10: Similarity and naturalness results on SUS for IH1.4 (Rajasthani)
Figure 11: Similarity and naturalness results on SUS for IH1.5 (Tamil)
Figure 12: Similarity and naturalness results on SUS for IH1.6 (Telugu)
Figure 13: Intelligibility results on SUS for IH1.1 (Assamese)
Figure 14: Intelligibility results on SUS for IH1.2 (Gujarati)
Figure 15: Intelligibility results on SUS for IH1.3 (Hindi)
Figure 16: Intelligibility results on SUS for IH1.4 (Rajasthani)
Figure 17: Intelligibility results on SUS for IH1.5 (Tamil)
Figure 18: Intelligibility results on SUS for IH1.6 (Telugu)
Figure 19: Similarity and naturalness results on ML for IH2.1 (Assamese)
Figure 20: Similarity and naturalness results on ML for IH2.2 (Gujarati)
Figure 21: Similarity and naturalness results on ML for IH2.3 (Hindi)
Figure 22: Similarity and naturalness results on ML for IH2.4 (Rajasthani)
Figure 23: Similarity and naturalness results on ML for IH2.5 (Tamil)
Figure 24: Similarity and naturalness results on ML for IH2.6 (Telugu)