Speech Recognition Combining MFCCs and Image Features

1 Speech Recognition Combining MFCCs and Image Features
S. Karlos, Department of Mathematics
N. Fazakis, Department of Electrical and Computer Engineering
K. Karanikola, Department of Mathematics
S. Kotsiantis, Department of Mathematics
K. Sgarbas, Department of Electrical and Computer Engineering
University of Patras, Greece

2 Aim
- Combination of audio signal and image features
- Exploitation of larger frames for speech signals
- Increase of classification accuracy without using complex algorithms

3 Contents
- Speaker Identification problem
- Attributes of speech signals
- Examine Content Based Image Features (CBIR)
- Combination of MFCCs + CBIR
- Experiments
- Conclusion

4 Speaker Identification Problem
- Determines the speaker from a set of registered speakers
  - This is called closed-set identification
  - The result is the best-matching speaker
- What if the speaker is not in the database?
  - This is called open-set identification
  - The result can be a speaker or a no-match result
- Our experiment is a closed-set identification problem
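
The difference between the two settings can be sketched in a few lines. This is only an illustration: the function name `identify`, the score dictionary and the threshold are hypothetical and do not come from the slides.

```python
def identify(scores, threshold=None):
    """Pick a speaker from per-speaker similarity scores (higher is better).

    scores:    dict mapping a registered speaker id to a score (hypothetical input).
    threshold: None for closed-set identification; a number for open-set,
               in which case a weak best score yields a no-match result (None).
    """
    best = max(scores, key=scores.get)
    if threshold is not None and scores[best] < threshold:
        return None  # open set: nobody in the database matches well enough
    return best      # closed set: always return the best-matching speaker


scores = {"spk01": 0.20, "spk02": 0.70, "spk03": 0.45}
print(identify(scores))                 # closed set
print(identify(scores, threshold=0.9))  # open set, no match
```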

5 Extraction of audio characteristics
- Different representations of speech signals:
  1. Mel-Frequency Cepstral Coefficients (MFCC)
  2. Linear Predictive Codes (LPCs)
  3. Perceptual Linear Prediction (PLP)
  4. PLP-Relative Spectra (PLP-RASTA)
- Non-linear behavior of speech
- Need for adapting the signal to the human ear scale
- Most efficient solution: MFCC features
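
Adapting the signal to the human ear scale is usually done with the mel scale. The sketch below uses one common form of the Hz-to-mel mapping (constants 2595 and 700); the slides do not say which variant was used, so treat the formula and the function names as assumptions.

```python
import math


def hz_to_mel(f_hz):
    """Map a frequency in Hz to mels (common 2595 * log10(1 + f/700) variant)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


# Equal steps in mel correspond to growing steps in Hz.
for f in (500.0, 1000.0, 2000.0, 4000.0):
    print(f, round(hz_to_mel(f), 1))
```

With this variant, 1000 Hz maps to roughly 1000 mel, and higher frequencies are progressively compressed.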

6 Extraction of image characteristics
- Spectrogram: time-frequency representation of an audio signal
- Short-Time Fourier Transform (STFT)
- Different approaches of image processing:
  1. Content-Based
  2. Feature-Based
  3. Appearance-Based
- Determine the similarity through distances of feature vectors
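
A spectrogram image can be obtained from the STFT roughly as follows. This is a minimal NumPy sketch, not the authors' code: the 25 ms frame, 10 ms hop, 80 dB dynamic range and the function name `spectrogram_image` are all assumptions chosen for illustration.

```python
import numpy as np


def spectrogram_image(signal, sr, frame_sec=0.025, hop_sec=0.010):
    """Return an 8-bit grayscale spectrogram (rows: frequency, columns: time)."""
    frame_len = int(round(frame_sec * sr))
    hop = int(round(hop_sec * sr))
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hamming(frame_len)
    # Slice the signal into overlapping, windowed frames.
    frames = np.stack([signal[t * hop:t * hop + frame_len] * window
                       for t in range(n_frames)])
    mag = np.abs(np.fft.rfft(frames, axis=1))        # STFT magnitude
    db = 20.0 * np.log10(mag + 1e-10)                # log scale
    db = np.clip(db, db.max() - 80.0, db.max())      # assumed 80 dB range
    lo = db.min()
    span = max(db.max() - lo, 1e-12)
    img = np.round(255.0 * (db - lo) / span).astype(np.uint8)
    return img.T[::-1]                               # low frequencies at the bottom


sr = 8000
t = np.arange(sr) / sr
img = spectrogram_image(np.sin(2 * np.pi * 440.0 * t), sr)
print(img.shape, img.dtype)
```

The resulting array can be saved as an image and passed to any image feature extractor.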

7 Related works
- Content Based Image Processing (CBIR) techniques have been widely used
- Exploitation of color content and texture information
- Most known approaches:
  1. Local gradient features along with PCA + HMMs
  2. Delta MFCCs
  3. 2D Gabor Features + MLP
  4. Feature-Finding Neural Network (FFNN)
  5. Wavelet packet transform + MKL
  6. RANSAC algorithm

8 Proposed Technique: 1st view
- Acquire the first 25 MFCC coefficients (the 0th has been rejected)
- Hamming window has been preferred
- Time duration of each frame equals 0.5 seconds
- Overlap factor equals 50%
- Highest band edge of Mel filters equals 4 kHz
- Use of 40 warped spectral bands
- Logarithmic scale of magnitude spectrum
- Discrete Cosine Transform (DCT)
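
The settings listed above can be put together in a short NumPy sketch. It is an approximation under stated assumptions, not the authors' implementation: the triangular filter shape, the mel formula, the unnormalised DCT-II, the 8 kHz sampling rate of the toy signal and all function names are choices made here for illustration.

```python
import numpy as np


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sr, f_max):
    """Triangular filters equally spaced on the mel scale from 0 Hz to f_max."""
    hz_pts = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(f_max), n_filters + 2))
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    fb = np.zeros((n_filters, freqs.size))
    for i in range(n_filters):
        lo, mid, hi = hz_pts[i], hz_pts[i + 1], hz_pts[i + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def mfcc(signal, sr, frame_sec=0.5, overlap=0.5, n_filters=40,
         f_max=4000.0, n_keep=25):
    """One row of n_keep coefficients per frame; the 0th coefficient is dropped."""
    frame_len = int(round(frame_sec * sr))
    hop = int(round(frame_len * (1.0 - overlap)))
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hamming(frame_len)
    fb = mel_filterbank(n_filters, frame_len, sr, f_max)
    # Unnormalised DCT-II basis: row q is cepstral index q.
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(n, n + 0.5) / n_filters)
    feats = np.empty((n_frames, n_keep))
    for t in range(n_frames):
        frame = signal[t * hop:t * hop + frame_len] * window
        mag = np.abs(np.fft.rfft(frame))          # magnitude spectrum
        log_mel = np.log(fb @ mag + 1e-10)        # 40 log mel-band energies
        feats[t] = (dct @ log_mel)[1:n_keep + 1]  # keep coefficients 1..25
    return feats


sr = 8000
t = np.arange(2 * sr) / sr
feats = mfcc(np.sin(2 * np.pi * 440.0 * t), sr)
print(feats.shape)
```

With 0.5 s frames and 50% overlap, a 2 s signal yields 7 frames, so each utterance contributes relatively few but long-context feature vectors.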

9 Proposed Technique: 2nd view
- Use of AutoColorCorrelogramFilter (autocor)

$$\alpha_c^{(k)}(I) = \gamma_{c,c}^{(k)}(I), \qquad \gamma_{c_i,c_j}^{(k)}(I) = \Pr_{p_1 \in I_{c_i},\; p_2 \in I}\left[\, p_2 \in I_{c_j} \;\middle|\; \mathrm{dist}(p_1, p_2) = k \,\right]$$

- Spatial correlation of colors from each image is distilled
- Not based on purely local properties
- Effective in recognizing large changes of shape
- Efficiently computed
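
A brute-force version of the auto color correlogram helps make the definition concrete. The sketch below assumes the image is already quantised to a small number of colour indices and measures pixel distance with the L-infinity norm; both are assumptions, and the LIRe filter used in the slides may differ in quantisation, distance set and normalisation.

```python
def auto_correlogram(img, n_colors, distances=(1, 2)):
    """For each distance k and colour c, estimate the probability that a pixel
    at distance exactly k from a pixel of colour c also has colour c.

    img: 2-D list of colour indices in range(n_colors) (hypothetical input).
    """
    h, w = len(img), len(img[0])
    result = {}
    for k in distances:
        same = [0] * n_colors
        total = [0] * n_colors
        for y in range(h):
            for x in range(w):
                c = img[y][x]
                # Visit every pixel on the square ring at L-infinity distance k.
                for dy in range(-k, k + 1):
                    for dx in range(-k, k + 1):
                        if max(abs(dy), abs(dx)) != k:
                            continue
                        yy, xx = y + dy, x + dx
                        if 0 <= yy < h and 0 <= xx < w:
                            total[c] += 1
                            if img[yy][xx] == c:
                                same[c] += 1
        result[k] = [s / t if t else 0.0 for s, t in zip(same, total)]
    return result


flat = [[0] * 3 for _ in range(3)]
mixed = [[0, 1], [1, 0]]
print(auto_correlogram(flat, 2, distances=(1,)))
print(auto_correlogram(mixed, 2, distances=(1,)))
```

Concatenating the per-distance lists gives a fixed-length feature vector for the image, regardless of its size.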

10 MFCCs + autocor + SVM

11 Proposed Technique: Learning stage
- Support Vector Machines (SVMs)
- Hyperplanes that separate two classes
- Maximizing the margin to reduce the generalization error
- Can deal with very high dimensional data
- Efficient implementation through the LibSVM library
- Use of polynomial kernel (degree = 3)
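
For reference, LibSVM's polynomial kernel has the form (gamma * u·v + coef0)^degree. The slides give only degree = 3, so the `gamma` and `coef0` values below are placeholders, and the function name is hypothetical.

```python
def poly_kernel(u, v, degree=3, gamma=1.0, coef0=0.0):
    """Polynomial kernel in LibSVM's form: (gamma * <u, v> + coef0) ** degree.

    Only degree=3 comes from the slides; gamma and coef0 are assumed values.
    """
    dot = sum(a * b for a, b in zip(u, v))
    return (gamma * dot + coef0) ** degree


def gram_matrix(samples, **kw):
    """Kernel (Gram) matrix over a list of feature vectors."""
    return [[poly_kernel(a, b, **kw) for b in samples] for a in samples]


X = [[1.0, 2.0], [3.0, 4.0], [0.5, -1.0]]
for row in gram_matrix(X):
    print(row)
```

The SVM never needs the explicit high-dimensional mapping; it only evaluates this kernel between pairs of feature vectors.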

12 Data
- CHAINS Corpus
- Selected mode: Solo speech
- 36 speakers (28 from Eastern Ireland, 8 from UK and USA)
- 19 different sentences out of the 33
- 3 scenarios: 8, 16 and 36 speakers
- Equal numbers of male and female speakers in each scenario

13 Experimental procedure
- Comparison with another 9 image filters
- Supervised classifiers:
  1. SVMs
  2. Multi-Layer Perceptron (MLP)
  3. Logistic Regression (LogReg)
- 10-fold cross-validation technique
- WEKA tool was used along with libraries of Lucene Image Retrieval (LIRe)
- Record computational time (Intel i3 64-bit system, 8 GB RAM)
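
The 10-fold cross-validation protocol can be sketched as an index generator. This plain version shuffles and splits without stratifying by speaker, which is a simplification made here; the function name and seed are hypothetical.

```python
import random


def k_fold_indices(n, k=10, seed=0):
    """Yield (train, test) index lists; each instance is tested exactly once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k nearly equal folds
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test


splits = list(k_fold_indices(25, k=10))
for train, test in splits[:2]:
    print(len(train), len(test))
```

The reported accuracy is then the average over the 10 held-out folds.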

14 Experimental procedure
[Table: CBIR Filters, Initial Number of features, Useful Number of features; rows: autocor, binpyr, clay, edhist, fcth, fuzzy, gabor, jpeg, phog, simplehist; values missing]
- Reduction of dimensionality: remove useless attributes
- Size of the datasets in instances has been reduced dramatically:
  - 8 speakers: about >
  - 16 speakers: about >
  - 36 speakers: about > 5.818
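
One simple reading of "remove useless attributes" is dropping columns that never vary across instances. The sketch below implements only that rule; the exact criterion applied in the slides is not stated, and the function name is hypothetical.

```python
def remove_constant_columns(rows):
    """Drop attributes that take a single value over all instances.

    rows: list of equal-length feature vectors.
    Returns the reduced rows and the indices of the kept columns.
    """
    n_cols = len(rows[0])
    keep = [j for j in range(n_cols) if len({r[j] for r in rows}) > 1]
    return [[r[j] for j in keep] for r in rows], keep


data = [
    [0.0, 1.5, 7.0, 0.0],
    [0.0, 2.5, 7.0, 1.0],
    [0.0, 0.5, 7.0, 0.0],
]
reduced, kept = remove_constant_columns(data)
print(kept)
print(reduced)
```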

15 Results
[Table: results and Time (sec) of SVM, MLP and LogReg for MFCCs versus MFCCs + autocor in the 8-, 16- and 36-speaker scenarios; values missing]

16 Statistical comparison
- Post-hoc test of Nemenyi
- The length of the CD (critical difference) bar depicts the distance needed for a significant difference
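
The critical difference of the Nemenyi test is CD = q_alpha * sqrt(k(k+1) / (6N)), where k is the number of compared methods and N the number of datasets. The sketch below uses the standard critical values for alpha = 0.05; the k and N in the usage line are illustrative only, since the slides do not state them.

```python
import math

# Critical values q_alpha for the Nemenyi test at alpha = 0.05,
# indexed by the number of compared methods k.
Q_ALPHA_005 = {2: 1.960, 3: 2.343, 4: 2.569, 5: 2.728, 6: 2.850,
               7: 2.949, 8: 3.031, 9: 3.102, 10: 3.164}


def nemenyi_cd(k, n_datasets, q_table=Q_ALPHA_005):
    """Critical difference in average rank for k methods over n_datasets."""
    return q_table[k] * math.sqrt(k * (k + 1) / (6.0 * n_datasets))


# Illustrative values only: 4 methods compared over 14 datasets.
print(round(nemenyi_cd(4, 14), 3))
```

Two methods differ significantly when their average ranks are further apart than this value.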

17 Experiments
- A boost of accuracy was recorded for all the tested scenarios
- 11.5%, 7.8% and 9.9% improvement compared with standalone MFCCs
- Building the classification model takes a few seconds
- Fuzzy filtering techniques exhibited fluctuations
- MFCCs+autocor and MFCCs+binpyr achieved the best results
- The proposed technique requires far fewer computational resources

18 Conclusions
- Tackle Automatic Speech Recognition (ASR) tasks
- Increase the feature vector of audio signals
- Reduce the training time
- Methods based on local features performed poorly
- Improved generalization behavior for most of the SI filters

19 Promising points
- Extract more specialized features under the MFCCs + SI features scheme
- Parallel implementation
- Apply multi-view semi-supervised techniques
- Combination of magnitude with phase-related features (Hartley Phase Spectrum)

20 References
- M. Lux and S. A. Chatzichristofis, Lire: Lucene image retrieval, Proceeding 16th ACM Int. Conf. Multimed. - MM 08, p. 1085,
- F. Cummins, M. Grimaldi, T. Leonard, and J. Simko, The CHAINS Speech Corpus: CHAracterizing INdividual Speakers, Proc. SPECOM, pp. 1-6, 2006
- J. Dennis, H. D. Tran, and H. Li, Spectrogram Image Feature for Sound Event Classification in Mismatched Conditions, IEEE Signal Process. Lett., vol. 18, no. 2, pp , Feb
- M. Mayo, ImageFilter WEKA filter that uses LIRE to extract image features, [Online]. Available:
- I. Paraskevas and M. Rangoussi, The Hartley phase spectrum as an assistive feature for classification, Lect. Notes Comput. Sci. (including Subser. Lect. Notes Artif. Intell. Lect. Notes Bioinformatics), vol LNAI, pp , 2010

INTER GENRE SIMILARITY MODELLING FOR AUTOMATIC MUSIC GENRE CLASSIFICATION ULAŞ BAĞCI AND ENGIN ERZIN arxiv:0907.3220v1 [cs.sd] 18 Jul 2009 ABSTRACT. Music genre classification is an essential tool for