RBM-PLDA subsystem for the NIST i-vector Challenge


INTERSPEECH 2014

RBM-PLDA subsystem for the NIST i-vector Challenge

Sergey Novoselov 1, Timur Pekhovsky 1,2, Konstantin Simonchik 1,2, Andrey Shulipa 1

1 Department of Speaker Verification and Identification, Speech Technology Center Ltd., St. Petersburg, Russia
2 ITMO University, Russia
{novoselov,tim,simonchik,shulipa}@speechpro.com

Abstract

This paper presents the Speech Technology Center (STC) system submitted to the NIST i-vector challenge. The system includes different subsystems based on TV-PLDA, TV-SVM, and RBM-PLDA. In this paper we focus on examining the third, RBM-PLDA subsystem. Within this subsystem, we present our RBM extractor of the pseudo i-vector. Experiments performed on the NIST-2014 dataset demonstrate that although the RBM-PLDA subsystem is inferior to the former two subsystems in terms of absolute minDCF, during the final fusion it provides a substantial input into the efficiency of the resulting STC system, which reaches 0.241 at the minDCF point.

Index Terms: NIST SRE, i-vector, PLDA, SVM, RBM

1. Introduction

The NIST i-vector challenge [1], like the earlier competitions, deals with the task of speaker detection. At the same time, the current competition differs from the previous ones in three important aspects:

- All input data are represented as i-vectors obtained using an i-vector extractor unknown to the participants.
- Feedback is possible, which means that each time results are submitted, the participant is informed about the minDCF of their system.
- All training data are unlabeled, i.e. they have no ID labels for speaker, gender, channel, language, etc.

The first of the three conditions is motivated by the organizers' desire to interest the Machine Learning community in the speaker detection task. It means that all NIST participants are placed in equal conditions with regard to the ability to generate efficient i-vectors. However, we think that this has strongly limited new work by participants who develop speaker features based on promising new paradigms, such as pseudo i-vectors generated from an MFCC stream using RBM.
The second condition allows participants to endlessly improve their systems (tune parameters, thresholds, etc.), which shifts the emphasis of current research from theoretical novelty of methods towards sophisticated empirical work.

The third condition forces researchers to deal with the problem of clustering the training speaker database of the NIST i-vector challenge as much as with solving the main task: speaker detection. That is due to the fact that today the most successful verification method is the generative PLDA method in i-vector space [2-4]. Training an efficient PLDA model requires a dataset that is labeled with speaker IDs and session IDs. Consequently, the task of efficiently training a PLDA model on an unlabeled training dataset is reduced to the problem of seeking robust methods to label this dataset. Errors in the clustering of the training dataset lead to a weak PLDA model trained on labeling noise. For an SVM system the problems of an unlabeled training dataset are not so critical, because it is possible to use the whole development set, which does not overlap with the evaluation set.

In this paper we present a text-independent STC speaker verification system developed by Speech Technology Center Ltd for participation in the NIST i-vector challenge. The key issue in this paper is the contribution of the RBM i-vector PLDA subsystem to the efficiency of the resulting STC system for the NIST i-vector challenge. We demonstrate that although the RBM-PLDA subsystem itself is not as efficient as the baseline subsystems, fusing it with the SVM subsystem gives us our best result in the NIST i-vector challenge.

This work was partially financially supported by the Government of the Russian Federation, Grant 074-U01.

2. Baseline subsystems

This section describes the baseline subsystems used in our work.

2.1. Baseline system for NIST i-vector Challenge

State-of-the-art speaker verification systems operate in the i-vector space.
Each such vector is extracted using joint factor analysis (TV-JFA) from a whole speaker utterance and is a good representation of the speaker for any subsequent classifiers. The baseline system provided by the organizers of the NIST i-vector challenge uses cosine scoring, which is standard in i-vector technology:

$$\cos(i_{enrol}, i) = \frac{\langle i_{enrol}, i \rangle}{\|i_{enrol}\| \, \|i\|} \qquad (1)$$

where $i$ is the i-vector of the utterance and $i_{enrol}$ is the i-vector of the target speaker from the dataset. Since, according to the conditions of the competition, each target speaker has five model i-vectors, in the baseline system the $i_{enrol}$ vector is obtained by simply averaging those five vectors. We should note that all i-vectors of the set must be whitened.

Copyright 2014 ISCA. 14-18 September 2014, Singapore.
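The cosine scoring of equation (1), together with the averaging of the five model i-vectors and the whitening step, can be sketched in NumPy as follows. This is a minimal illustration rather than the organizers' baseline code: the function names `fit_whitener` and `cosine_score` are hypothetical, and estimating the whitening transform from the development set and applying it before averaging are assumptions.

```python
import numpy as np

def fit_whitener(dev):
    """Estimate a mean and whitening matrix from development i-vectors (rows)."""
    mean = dev.mean(axis=0)
    cov = np.cov(dev - mean, rowvar=False)
    eigval, eigvec = np.linalg.eigh(cov)
    eigval = np.maximum(eigval, 1e-10)      # guard against tiny or negative eigenvalues
    whitener = eigvec / np.sqrt(eigval)     # scale each eigenvector column by 1/sqrt(eigenvalue)
    return mean, whitener

def cosine_score(model_ivectors, test_ivector, mean, whitener):
    """Cosine score between the averaged (whitened) model i-vectors and a test i-vector."""
    enrol = ((model_ivectors - mean) @ whitener).mean(axis=0)   # average of the model vectors
    test = (test_ivector - mean) @ whitener
    return float(enrol @ test / (np.linalg.norm(enrol) * np.linalg.norm(test)))
```

In the challenge setting `model_ivectors` would be a 5 x 600 array and `test_ivector` a 600-dimensional vector; the score lies in [-1, 1] and is compared against a decision threshold.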

2.2. TV-PLDA subsystem and clustering problem

Among state-of-the-art speaker detection systems, the leading positions are occupied by PLDA systems [3,4] operating in the i-vector space. According to P. Kenny [3], for an i-vector set $\{i_1, \ldots, i_R\}$ obtained from $R$ utterances belonging to one speaker $s$, standard Gaussian PLDA analysis assumes the following distribution for these vectors:

$$i_r(s) = m_0 + V y(s) + \varepsilon_r(s), \qquad r = 1, \ldots, R \qquad (2)$$

where $m_0$ is the population mean, the hidden vector $y$ is the hidden speaker factor and has a standard normal prior, and the residual $\varepsilon_r \sim N(0, \Sigma)$ is normally distributed with mean 0 and covariance matrix $\Sigma$, which is a full-covariance matrix in the case of the PLDA system used for clustering.

The PLDA model makes it possible to calculate the marginal likelihoods for the target (tar) and impostor (imp) hypotheses and, correspondingly, the PLDA score:

$$Score_{PLDA} = \ln \frac{P(i_{enrol}, i \mid tar)}{P(i_{enrol} \mid imp) \, P(i \mid imp)} \qquad (3)$$

We also obtained the target speaker i-vector $i_{enrol}$ by simple averaging of the given five model i-vectors. For the PLDA subsystem we used the i-vector normalization proposed in [4].

For automatic i-vector segmentation into speaker clusters we used our own modification of the classic Agglomerative Hierarchical Clustering (AHC) algorithm [5]. AHC has been widely used as a speaker clustering strategy in many speaker diarization systems. Our implementation is a modified version of the classic AHC with a Mean Shift (MS) clustering stage. A detailed description of this AHC modification can be found in [6]. In our work we used AHC stages both with the cosine (COS) metric and with the PLDA metric. Figure 1 shows the iterative scheme that we use for clustering the NIST-2014 devset. In our experiments with segmenting this dataset we used only one iteration.

Figure 1: The proposed scheme for clustering the NIST SRE 2014 dataset.

After the initial COS clustering, which uses thresholds of 0.29 for both AHC stages, the PLDA model trained on this labeled devset yielded minDCF = 0.293 on the NIST-2014 evaluation set. During the next iteration we used the PLDA metric to estimate the distance between the i-vectors and did a second, PLDA-based clustering of the devset using a threshold of 0.20 for the first MS clustering stage. However, the second Bottom-Up stage was again performed with the cosine metric, with a threshold of 0.27. After that, the PLDA model trained on the devset of the NIST i-vector challenge yielded minDCF = 0.288 on the evaluation set [6].

For PLDA clustering and training we decided to use only those development set vectors that were built on speech segments longer than 20 seconds. We also deleted all one-element and two-element clusters and clusters that had over 50 vectors. As demonstrated by our experiments, in this way we labeled half of the NIST-2014 devset, which is clustered into 1745 speakers.

2.3. SVM subsystem

It is well known that using a discriminative SVM method in combination with another, generative method, for example PLDA [7], produces a highly efficient speaker verification system [8,9].

In our verification system SVM was applied to the i-vectors after LDA projection (l-vectors) [6]. The distance from an l-vector $l$ to the SVM hyperplane of the a-th speaker is given below:

$$f(l, l^{(a)}_{enrol}) = \sum_{k=1}^{L} \alpha_k y_k K(l, l_k) \qquad (4)$$

where $l_k$ are the i-vectors after LDA (the $L$ support vectors obtained by training the speaker's SVM hyperplane), $\alpha_k$ are the corresponding support vector weights, and $y_k$ are the target values of two classes: {+1} for the Target class and {-1} for the Impostor class for the given speaker. A linear kernel $K(l, l_k)$ was used.

For the NIST i-vector challenge our SVM system had two distinctive features: we used symmetrization of the SVM scores as well as S-normalization of the resulting SVM scores [6]. As an impostor set for the SVM and as the S-normalization set we used the whole development set. A detailed description of this subsystem can be found in [6].

2.4. Calibration and fusion

Calibration and fusion of different STC subsystems in different configurations was performed according to [6]. Using a quality measure function [6] makes it possible to compensate for the shift in minDCF thresholds for different speech segment durations, which improves the minDCF value of the verification system.

3. RBM-PLDA subsystem

3.1. RBM for b-vector extractor

The recent success of Deep Neural Networks (DNN, [10]) for Automatic Speech Recognition (ASR) prompted the speaker recognition community to try to use Restricted Boltzmann Machines (RBM) for pseudo i-vector generation [11]. We also decided to apply this technology to the NIST i-vector challenge. Figure 2 shows the diagram of our RBM for the pseudo i-vector (b-vector) extractor. We will use this term further on, even though, strictly speaking, we are dealing with a non-linear RBM transformation of the TV i-vector. The first, visible input layer consists of 600 real-valued units. The second, binary hidden layer consists of $H$ units. The softmax layer consists of $S = 1745$ units (the number of target speakers). $W^{(1)}$ and $W^{(2)}$ are the weights of the corresponding links.

The output of our RBM is the posteriors of the softmax layer:

$$P(s_t = 1 \mid h) = \frac{\exp\{z_t\}}{\sum_{t'=1}^{S} \exp\{z_{t'}\}} \qquad (5)$$

where the full input for the softmax unit $t$ is $z_t = c_t + \sum_j W^{(2)}_{jt} h_j$, and $h_j$ are the states of the hidden units. This scheme is similar to Hinton's DNN [12] with only one hidden layer; however, in contrast to [12], our model remains an undirected model.

Figure 2: Diagram of the proposed RBM for the pseudo i-vector extractor.

We do not use a discriminative fine tuning phase for training our extractor, limiting ourselves to generative pretraining of the extractor. This pretraining phase is the standard procedure of generative RBM training using contrastive divergence. Following [12,13] we create a concatenated training set $X = \{(i_k, s_k)\}_{k=1}^{K}$, where $i_k$ is the k-th input i-vector out of $K$ devset vectors that were earlier clustered into $S = 1745$ speaker clusters by means of PLDA (see Section 2.2). The binary vector $s$ that corresponds to the t-th target speaker contains zeros, except for the t-th component, which equals 1. Thus, we implement a supervised scheme for generative RBM training by feeding both the i-vectors and the labels of the target speakers into the hybrid binary-Gaussian input layer $X$. Such an RBM is parametrized by the joint distribution of hidden and observed variables:

$$P(i, s, h) = \frac{1}{Z} \exp\{-E(i, s, h)\} \qquad (6)$$

where the energy function $E$ is:

$$E(i, s, h) = \frac{1}{2} \sum_{d} (i_d - a_d)^2 - \sum_{j} b_j h_j - \sum_{t} c_t s_t - \sum_{d,j} i_d W^{(1)}_{dj} h_j - \sum_{j,t} h_j W^{(2)}_{jt} s_t \qquad (7)$$

Here $Z$ is the partition function, $W^{(1)}$ and $W^{(2)}$ are the weights of the visible-hidden and hidden-softmax links, and $a$, $b$ and $c$ are the shifts for the visible, hidden and softmax layers, respectively. Because of i-vector normalization, we suppose that the standard deviations for the Gaussian visible layer are $\sigma = 1$. The posteriors of the hidden units necessary in the training process, which are defined by the observed vectors and class labels, are:

$$P(h_j = 1 \mid i, s) = \mathrm{sigm}\Big(b_j + \sum_{d} W^{(1)}_{dj} i_d + \sum_{t} W^{(2)}_{jt} s_t\Big)$$

where sigm denotes the sigmoid function. In contrast to the discriminative fine tuning phase, the demonstrated generative pretraining scheme allows us to model the input data structure taking into account target speaker labels.

3.2. PCA transformation

In models with a hidden layer, effects on visible variables are highly correlated, so we can expect that the outputs of our softmax layer will be highly correlated as well. For this reason we apply PCA to the log of the 1745-dimensional vector of the softmax layer output, in order to obtain a low-dimensional pseudo i-vector, which we will refer to as a b-vector.

Figure 3: The choice of the optimal dimension of the pseudo i-vector.

Figure 3 shows the dependency between the spectrum of the eigenvalues λ and the index of the component of the output vector of the softmax layer after the diagonalization of the interclass covariance matrix obtained from training the PCA on the half of the NIST-2014 devset clustered by means of PLDA iterations. The graph shows a sharp fall in these dependencies. If $H > 600$ the fall is observed at component 600, and if $H < 600$ the fall happens at component $H$. Our experiments demonstrate that the best efficiency of the RBM-PLDA subsystem is obtained when both $H$ and the b-vector dimension equal 500.

3.3. RBM-PLDA subsystem

After obtaining the pseudo i-vectors of 1745 speakers from the labeled part of the NIST 2014 devset (see Section 2.2), we used them for ML training of the standard PLDA model [1,14]. For the RBM b-vector PLDA subsystem we took a Gaussian PLDA model in the form (2), where the number of eigenvoices was 400. However, in contrast to the TV i-vector PLDA subsystem, where the noise covariance matrix had a diagonal form, here it has the full covariance form.
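The extraction path described in Sections 3.1 and 3.2 (hidden-unit activations, softmax posteriors, logarithm, PCA) can be sketched in NumPy as below. This is an illustrative sketch under several assumptions, not the authors' implementation: the parameter names `W1`, `b`, `W2`, `c` and all function names are hypothetical; the RBM parameters are assumed to be already trained; at extraction time the label units are assumed unobserved, so the hidden layer is driven by the i-vector alone; and plain PCA on the centered log-posteriors stands in for the diagonalization of the interclass covariance matrix mentioned in the text.

```python
import numpy as np

def sigm(x):
    """Logistic sigmoid."""
    return 1.0 / (1.0 + np.exp(-x))

def log_softmax_posteriors(ivectors, W1, b, W2, c):
    """Log posteriors of the softmax layer for a batch of i-vectors (rows).

    Assumption: labels are unobserved here, so the hidden activations
    depend on the i-vector only (mean-field values instead of samples).
    """
    h = sigm(b + ivectors @ W1)              # (N, H) hidden activations
    z = c + h @ W2                           # (N, S) full inputs of the softmax units
    z = z - z.max(axis=1, keepdims=True)     # stabilise the exponentials
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))

def fit_pca(X, dim):
    """Plain PCA: return the mean and the top `dim` principal directions."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:dim].T

def extract_bvectors(ivectors, params, pca_mean, pca_basis):
    """Project log softmax posteriors onto the PCA basis to get b-vectors."""
    logp = log_softmax_posteriors(ivectors, *params)
    return (logp - pca_mean) @ pca_basis
```

With the sizes reported in the text, `W1` would be 600 x H and `W2` would be H x 1745, and `fit_pca` would be called with `dim=500` on the log-posteriors of the labeled half of the development set before the resulting b-vectors are passed to PLDA training.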

4. Experiments

The final experiments were conducted on the NIST i-vector challenge data. The data available for the NIST i-vector challenge are development data for training systems and a separate evaluation set for the challenge. The speakers used in these datasets are disjoint. The i-vectors were obtained from NIST Speaker Recognition Evaluation (SRE) data from 2004 to 2012. The dimension of the i-vectors is 600. Each vector has meta information, namely the amount of speech (in seconds) used to compute the i-vector. Segment durations were sampled from a log-normal distribution with a mean of 39.58 seconds.

4.1. Development and evaluation sets of the NIST i-vector challenge

Development data contain a very large number of unlabeled i-vectors. These vectors were constructed from telephone recordings of various male and female speaker voices. Evaluation data consist of sets of five i-vectors defining the target speaker models and of single i-vectors representing test segments. The number of target speaker models is 1,306 (comprising 6,530 i-vectors) and the number of test segments is 9,634 (one i-vector each).

4.2. Trials for submission and scoring

The full set of trials for the challenge consists of all possible pairs involving a target speaker model and a single i-vector test segment. Thus the total number of trials is 12,582,004. The trials are divided into two subsets: a progress subset and an evaluation subset. The progress subset comprises 40% of the trials and is used to monitor progress on the scoreboard. The remaining 60% of the trials form the evaluation subset, which is used to generate the official final scores determined at the end of the challenge.

4.3. Results

Tables 1 to 3 demonstrate verification results of separate subsystems as well as their fusion. For subsystem comparison we use the minDCF values on the progress subset. The baseline system achieves minDCF = 0.386 on this subset.

First, the tables show that the strongest subsystem is the SVM verification subsystem, as it is the least vulnerable to clustering problems.
Second, the TV i-vector PLDA subsystem and the RBM b-vector PLDA subsystem show comparable verification results. Table 2 shows that when the two PLDA subsystems are fused, verification error is reduced by approximately 7% relative to the best TV i-vector PLDA subsystem. When the SVM subsystem is fused with the TV i-vector PLDA subsystem (Table 1), minDCF is reduced by 2.7% relative, while when the SVM subsystem is fused with the RBM b-vector PLDA subsystem (Table 3), it is reduced by 7.4% relative.

We argue that the reason for this is the following. The RBM classifier works as a non-linear transformer. It enables the transition to a new b-vector space, in which it is possible to successfully estimate speaker factors and perform verification using PLDA. Systems trained in the b-space are decorrelated with systems trained in the i-vector space, which leads to successful fusion.

Table 1. Experimental results for the SVM and i-vector PLDA subsystems.

| Subsystem | minDCF |
|---|---|
| SVM subsystem | 0.259 |
| TV i-vector PLDA subsystem | 0.282 |
| Fusion | 0.252 |

Table 2. Experimental results for the i-vector PLDA and RBM b-vector PLDA subsystems.

| Subsystem | minDCF |
|---|---|
| TV i-vector PLDA subsystem | 0.282 |
| RBM b-vector PLDA subsystem | 0.289 |
| Fusion | 0.263 |

Table 3. Experimental results for the SVM and RBM b-vector PLDA subsystems.

| Subsystem | minDCF |
|---|---|
| SVM subsystem | 0.259 |
| RBM b-vector PLDA subsystem | 0.289 |
| Fusion | 0.241 |

When all three of our subsystems were fused, we could not improve upon our best minDCF = 0.241.

5. Conclusions

This paper presents the STC system submitted to the NIST i-vector challenge, which includes different subsystems based on TV-PLDA, SVM and RBM-PLDA. In this paper we presented a version of the TV i-vector transformer that was trained generatively using target class labels. Experiments conducted on the NIST i-vector challenge evaluation set show that fusing the SVM and RBM-PLDA subsystems is the best option. It enabled us to achieve minDCF = 0.241.

6. References

[1] The 2013-2014 Speaker Recognition i-vector Machine Learning Challenge, http://www.nist.gov/itl/iad/mig/upload/seivectochallenge_2013-11-18_0.pdf
[2] S. J. D. Prince, "Probabilistic linear discriminant analysis for inferences about identity," in Proc. International Conference on Computer Vision (ICCV), Rio de Janeiro, Brazil, 2007.
[3] P. Kenny, "Bayesian speaker verification with heavy-tailed priors," in Proc. Odyssey 2010 - The Speaker and Language Recognition Workshop, 2010.
[4] D. Garcia-Romero and C. Y. Espy-Wilson, "Analysis of i-vector length normalization in speaker recognition systems," in Proceedings of Interspeech, Florence, Italy, Aug. 2011.
[5] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, 2nd edition, John Wiley & Sons, 2001.
[6] S. Novoselov, T. Pekhovsky, K. Simonchik, "STC Speaker Recognition System for the NIST i-vector Challenge," (to appear) in Proc. Odyssey 2014 - The Speaker and Language Recognition Workshop, 2014.
[7] I. N. Belykh, A. I. Kapustin, A. V. Kozlov, A. I. Lohanova, Yu. N. Matveev, T. S. Pekhovsky, K. K. Simonchik and A. K. Shulipa, "The speaker identification system for the NIST SRE 2010," Informatics and its Applications, 6 (1):24-31, 2012.
[8] N. Dehak et al., "Support Vector Machines versus Fast Scoring in the Low-Dimensional Total Variability Space for Speaker Verification," in Interspeech 2009, Brighton, UK, 2009.
[9] A. Kozlov, O. Kudashev, Yu. Matveev, T. Pekhovsky, K. Simonchik and A. Shulipa, "Speaker recognition system for the NIST SRE 2012," SPIIRAS Proceedings, 2, 2013.
[10] G. Hinton, L. Deng, D. Yu, G. Dahl, A. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, T. Sainath, and B. Kingsbury, "Deep neural networks for acoustic modeling in speech recognition," IEEE Signal Processing Magazine, Vol. 29 (6), pp. 82-97, 2012.
[11] P. Kenny, V. Gupta, T. Stafylakis, P. Ouellet and J. Alam, "Deep Neural Networks for extracting Baum-Welch statistics for Speaker Recognition," (to appear) in Proc. Odyssey 2014 - The Speaker and Language Recognition Workshop, 2014.
[12] G. E. Hinton, S. Osindero, and Y. Teh, "A fast learning algorithm for deep belief nets," Neural Computation, 18, pp. 1527-1554, 2006.
[13] H. Larochelle and Y. Bengio, "Classification using discriminative restricted Boltzmann machines," pp. 536-543, Helsinki, Finland, 2008.
[14] T. Pekhovsky, A. Sizov, "Comparison Supervised and Unsupervised Learning Mixture of PLDA Models for Speaker Verification," Pattern Recognition Letters, 34 (2013), pp. 1307-1313 (Apr. 2013).