On the use of statistical tools for speech and musical audio processing

Mathieu Lagrange, Analyse / Synthèse Team, IRCAM, Mathieu.lagrange@ircam.fr, ATIAM

Outline
1. Introduction
   1. Context and challenges
2. Past and Present
   1. Speech
      1. Model
      2. Applications (coding, speaker recognition, speech recognition)
   2. Audio (Music)
      1. Sound models
      2. Retrieving information within songs
      3. Retrieving information across songs
3. Future
   1. Polyphony handling
   2. Building and using priors
   3. Joint estimation of several musical parameters

M. Lagrange. Statistical Tools for Musical Audio Processing.

Outline
1. Introduction
   1. Context and challenges

Technological Context
«We are drowning in information and starving for knowledge» (R. Roger)
Needs: Measurement, Transmission, Access
Aim of a numerical representation: Precision, Efficiency, Relevance
Means: Mechanical biology, Psycho-acoustics, Cognition

Challenges
«Forty-two! yelled Loonquawl. Is that all you've got to show for seven and a half million years' work?» (D. Adams)
Music is great to study as it is both:
- An object: an arrangement of sounds and silences over time
- A function: a more or less codified form of expression of:
  - Individual feelings (mood)
  - Collective feelings (party, singing, dance)

Audio Processing: Past and Present

Vocabulary? STFT, MFCCs, Chromas, K-means, GMMs, HMMs

Outline
1. Introduction
   1. Context and challenges
2. Past and Present
   1. Speech
      1. Model
      2. Applications (coding, speaker recognition, speech recognition)

Speech signal
The speech signal is produced when the air flow coming from the lungs goes through the vocal cords and the vocal tract.
The size and shape of the vocal tract, as well as the vocal cord excitations, change relatively slowly.
The speech signal can therefore be considered quasi-stationary over short periods of about 20 ms.
Types of speech production:
- Voiced, e.g. <a>, <e>
- Unvoiced, e.g. <s>, <ch>
- Plosives, e.g. <pe>, <ke>

Source / Filter Model
In the case of an idealized voiced speech signal, the vocal cords produce a perfectly periodic harmonic signal.
The influence of the vocal tract can be considered as a filtering with a given frequency response whose maxima are called formants.

Source / Filter Coding
Algorithm:
- Voiced / unvoiced detection
- Voiced case: the source signal is approximated with a Dirac comb. A Dirac comb whose successive Diracs are spaced by T has a spectrum which is a Dirac comb whose successive Diracs are spaced by 1/T. Parameters: T, gain
- Unvoiced case: the source signal is approximated by a stochastic signal. Parameter: gain
- The source signal is then filtered. Parameters: filter coefficients
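The comb-to-comb property used for the voiced source can be checked numerically. The following is a minimal sketch in plain Python, assuming a discrete frame of N samples and a pulse period of T samples with N a multiple of T (both values are illustrative, not taken from the slides). In that discrete setting the DFT is non-zero only every N / T bins, which is the sampled counterpart of the 1/T spacing.

```python
import cmath

def dft_magnitudes(signal):
    """Magnitudes of the discrete Fourier transform (naive O(N^2) version)."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(signal)))
        for k in range(n)
    ]

N = 64  # frame length in samples (illustrative value)
T = 8   # pulse period in samples (illustrative value)

# Discrete "Dirac comb": one unit pulse every T samples.
comb = [1.0 if t % T == 0 else 0.0 for t in range(N)]
spectrum = dft_magnitudes(comb)

# Bins that carry energy: they are expected every N / T bins.
peaks = [k for k, m in enumerate(spectrum) if m > 1e-6]
print(peaks)
```

A larger T (lower pitch) brings the spectral peaks closer together, which is why T and a gain are enough to describe the idealized voiced source.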

«Code-Excited Linear Predictive» (CELP)
For each frame of 20 ms:
- Auto-regressive coefficients are computed such that the prediction error is minimized over the entire duration of the frame.
- Quantized coefficients and an index encoding the error signal are transmitted.
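One common way to obtain auto-regressive coefficients that minimize the prediction error over a frame is the autocorrelation method solved with the Levinson-Durbin recursion. The sketch below assumes that method (the slides do not say which solver is used) and tests it on a synthetic AR(2) signal whose coefficients 1.3 and -0.6 are made up for the example. The frame is much longer than 20 ms of speech so that the estimate is stable.

```python
import random

def autocorrelation(frame, max_lag):
    """Autocorrelation r[0..max_lag] of one frame."""
    n = len(frame)
    return [sum(frame[t] * frame[t - lag] for t in range(lag, n)) for lag in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Return (a, err) where x[t] is predicted as sum_k a[k] * x[t - 1 - k]."""
    a = [0.0] * order
    err = r[0]
    for i in range(order):
        # Reflection coefficient for model order i + 1.
        k = (r[i + 1] - sum(a[j] * r[i - j] for j in range(i))) / err
        new_a = a[:]
        new_a[i] = k
        for j in range(i):
            new_a[j] = a[j] - k * a[i - 1 - j]
        a = new_a
        err *= 1.0 - k * k  # remaining prediction error energy
    return a, err

# Synthetic AR(2) signal driven by white noise (coefficients are illustrative).
rng = random.Random(0)
x = [0.0, 0.0]
for _ in range(2000):
    x.append(1.3 * x[-1] - 0.6 * x[-2] + rng.gauss(0.0, 1.0))
frame = x[2:]

r = autocorrelation(frame, 2)
coeffs, err = levinson_durbin(r, 2)
print(coeffs, err)
```

In a coder, the coefficients would then be quantized and the residual (the part of the frame the predictor cannot explain) would be encoded separately.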

«Code-Excited Linear Predictive» (CELP) (Fig.: Signal, AR coefficients, Residual index)

Vector Quantization
- Works by dividing a large set of points (vectors in the feature space) into groups
- Groups are represented by their centroid
- Standard k-means algorithms are used to jointly determine the groups and the centroids
- The set of centroids forms the codebook
- At the encoding stage:
  - For each residual frame, the closest centroid is determined
  - The index is transmitted
- At the decoding stage:
  - The centroid is retrieved using the index value
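The encoding and decoding stages can be sketched as follows. The codebook and frames are hypothetical 2-D values chosen for the example; in practice the codebook would be learned with k-means and the vectors would be residual frames.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def encode(vector, codebook):
    """Encoding stage: index of the closest centroid."""
    return min(range(len(codebook)), key=lambda i: squared_distance(vector, codebook[i]))

def decode(index, codebook):
    """Decoding stage: retrieve the centroid from its index."""
    return codebook[index]

# Hypothetical codebook and frames, for illustration only.
codebook = [(0.0, 0.0), (1.0, 1.0), (-1.0, 2.0)]
frames = [(0.1, -0.2), (0.9, 1.2), (-0.8, 1.7)]

indices = [encode(f, codebook) for f in frames]      # what is transmitted
reconstructed = [decode(i, codebook) for i in indices]
print(indices, reconstructed)
```

Only the indices travel over the channel, so the bit rate depends on the codebook size rather than on the dimension of the vectors.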

K-means
- Place k centroids at random
- Iterate until stabilisation:
  - Determine assignment
  - Compute centroids
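The loop above can be written in a few lines. This is a minimal sketch, assuming the random initial centroids are drawn from the data points themselves and that an emptied cluster simply keeps its previous centroid (both are choices made for the example).

```python
import random

def kmeans(points, k, iterations=100, seed=0):
    """Plain k-means on a list of equal-length tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # place k centroids at random (on data points)
    assignment = None
    for _ in range(iterations):
        # Determine assignment: each point goes to its closest centroid.
        new_assignment = [
            min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        if new_assignment == assignment:  # stabilisation reached
            break
        assignment = new_assignment
        # Compute centroids: mean of the points assigned to each group.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, assignment

points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, assignment = kmeans(points, k=2)
print(sorted(centroids))
```

On well separated data such as this toy set, the result does not depend on the initial draw; on real features, several restarts are usually worth the cost.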

K-means (Figs. and video: Wikipedia)

Speaker Recognition
Classical pattern recognition problem
Specific problems:
- Open set / closed set: rejection problem
- Identification / verification
- Text dependency
Method:
- Feature extraction: model each speech signal with Mel-Frequency Cepstral Coefficients (MFCCs) and their derivatives
- Classification:
  - Text independent: Vector Quantization codebooks or Gaussian Mixture Models (GMMs)
  - Text dependent: Dynamic Time Warping (DTW) or Hidden Markov Models (HMMs)
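Dynamic Time Warping, mentioned for the text-dependent case, aligns two sequences that say the same thing at different speeds. The sketch below uses scalar sequences and an absolute-difference cost to stay short; with real MFCC vectors the local cost would be a vector distance. The sequences are made up for the example.

```python
def dtw_distance(a, b):
    """Cost of the best monotonic alignment between sequences a and b."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor paths.
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

reference = [1.0, 2.0, 3.0]
slow_repeat = [1.0, 1.0, 2.0, 2.0, 3.0, 3.0]  # same content, spoken twice as slowly
other = [1.0, 2.0, 4.0]

print(dtw_distance(reference, slow_repeat), dtw_distance(reference, other))
```

The slowed-down repetition aligns at zero cost, which a frame-by-frame comparison could not achieve.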

Classification scheme

Features
The decision system is usually not fed directly with the sound signal.
Infer a reduced set of features:
- Smaller "feature space" (fewer dimensions)
- Simpler models (fewer parameters)
- Less training data needed
Expert knowledge is necessary for efficient inference of meaningful descriptors or features. Meaningful means here that:
- The features extract from the signal the properties that are interesting for the task at hand
- Invariance under irrelevant modifications has to be handled well by the decision system

MFCCs Rules?
1. Take the Fourier transform of (a windowed excerpt of) a signal.
2. Map the powers of the spectrum obtained above onto the mel scale, using triangular overlapping windows.
3. Take the logs of the powers at each of the mel frequencies.
4. Take the discrete cosine transform (DCT) of the mel log powers.
5. The MFCCs are the amplitudes of the resulting spectrum.
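The five steps can be sketched for a single frame in plain Python. Several details are assumptions made for this example and are not given in the slides: a Hann window, the 2595 * log10(1 + f / 700) mel formula, 12 filters, 6 coefficients, an unnormalized DCT-II, and a small floor added before the log. A real implementation would use an FFT instead of the naive transform below.

```python
import cmath
import math

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def power_spectrum(frame):
    """Step 1: windowed Fourier transform, kept as powers for bins 0..N/2."""
    n = len(frame)
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * t / (n - 1))) for t, x in enumerate(frame)]
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(windowed))) ** 2
        for k in range(n // 2 + 1)
    ]

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Step 2: triangular overlapping windows, equally spaced on the mel scale."""
    top = hz_to_mel(sample_rate / 2.0)
    edges = [mel_to_hz(top * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(n_fft // 2 + 1)]
    bank = []
    for i in range(1, n_filters + 1):
        lo, mid, hi = edges[i - 1], edges[i], edges[i + 1]
        bank.append([max(0.0, min((f - lo) / (mid - lo), (hi - f) / (hi - mid))) for f in bin_hz])
    return bank

def mfcc(frame, sample_rate, n_filters=12, n_coeffs=6):
    spectrum = power_spectrum(frame)
    bank = mel_filterbank(n_filters, len(frame), sample_rate)
    # Step 3: log of the power collected by each mel filter (floored to avoid log(0)).
    log_energies = [math.log(sum(w * p for w, p in zip(filt, spectrum)) + 1e-10) for filt in bank]
    # Steps 4 and 5: DCT-II of the log energies; its amplitudes are the MFCCs.
    return [
        sum(e * math.cos(math.pi * c * (m + 0.5) / n_filters) for m, e in enumerate(log_energies))
        for c in range(n_coeffs)
    ]

# One 256-sample frame of a 440 Hz sine sampled at 8 kHz (illustrative input).
frame = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(256)]
coeffs = mfcc(frame, 8000)
print(coeffs)
```

With this unnormalized DCT, a change of gain shifts every log energy by the same constant, so only coefficient 0 moves; the higher coefficients describe the shape of the spectral envelope.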

For most classification tasks, we put the focus on the spectral envelope
- Speech: formants
- Music: genre

Example (audio)

Potentials of the DCT step
- Observation of Pols: the main components capture most of the variance using a few smooth basis functions, smoothing away the pitch ripples
- Principal components of vowel spectra on a warped frequency scale aren't so far from the cosine basis functions
- Decorrelates the features. This is important because the MFCCs are in most cases modelled by Gaussians with diagonal covariance matrices

Issues
- The mel frequency warping: highly criticized from a perceptual point of view (Greenwood)
- Conceptually: periodicity analysis over data that are not periodic anymore (Camacho)
- The cepstral coefficients are COSINE coefficients: they cannot shift with speaker size to capture the shift in formant frequencies that occurs as children grow up and their vocal tracts get longer
- Not a sound representation: no way to provide enhancements such as speaker and channel adaptation, background noise suppression, source separation

Decision System
From hard assignment (K-means) to soft assignment

Gaussian Mixture Models (GMMs)
- The data is modeled as a weighted sum of Gaussians
- Estimation of the weights, means and variances of the Gaussians can be done by maximizing the log-likelihood
- Usually done with the E-M algorithm:
  - E-step: expectation
  - M-step: maximisation
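For one-dimensional data the two E-M steps fit in a short function. This is a minimal sketch: the data are synthetic (two Gaussian clusters around 0 and 6) and the initial parameters are arbitrary values chosen for the example. A practical version would work on log-probabilities and on multi-dimensional MFCC vectors.

```python
import math
import random

def gaussian_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def em_gmm_1d(data, means, variances, weights, iterations=50):
    """Fit a 1-D Gaussian mixture with E-M, starting from the given parameters."""
    means, variances, weights = list(means), list(variances), list(weights)
    k = len(means)
    for _ in range(iterations):
        # E-step: soft assignment (responsibility) of each point to each Gaussian.
        resp = []
        for x in data:
            p = [weights[j] * gaussian_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(p)
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weights, means and variances from the soft counts.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            variances[j] = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
    return means, variances, weights

rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(200)] + [rng.gauss(6.0, 1.0) for _ in range(200)]
means, variances, weights = em_gmm_1d(data, [1.0, 4.0], [1.0, 1.0], [0.5, 0.5])
print(means, variances, weights)
```

Compared with k-means, each point contributes to every Gaussian in proportion to its responsibility, which is the soft assignment.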

E-M example: start, then 1st, 2nd, 3rd, 4th, 5th, 6th and 20th iterations (Figs. from A. Moore's Tutorial)

Density Estimation (Fig. Wikipedia)

GMMs for the speaker recognition task
- Given a probability density estimated for each speaker, search for the one that best explains the observed features
- Recent systems are more complex than this:
  - Universal Background Model (UBM)
  - Nuisance Attribute Projection (NAP)

Speech recognition
The aim of an Automatic Speech Recognizer (ASR) is to output the spoken words using the speech signal only. (Fig. from HTK documentation)

Speech recognition
An Automatic Speech Recognition system is typically decomposed into:
- Feature extraction: MFCCs
- Acoustic models: HMMs trained for a set of phones. Each phone is modelled with 3 states
- Pronunciation dictionary: converts a series of phones into a word
- Language model: predicts the likelihood of specific words occurring one after another with n-grams
(Fig. from HTK documentation)
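Decoding with an HMM amounts to finding the most likely state sequence, which the Viterbi algorithm does by dynamic programming. The sketch below uses a hypothetical 3-state left-to-right phone model with discrete observation symbols "a", "b" and "c"; all probabilities are made up for the example, and real acoustic models emit continuous MFCC vectors instead.

```python
import math

def safe_log(p):
    return math.log(p) if p > 0 else float("-inf")

def viterbi(observations, start, trans, emit):
    """Most likely state sequence, computed in the log domain."""
    n_states = len(start)
    score = [safe_log(start[s]) + safe_log(emit[s][observations[0]]) for s in range(n_states)]
    back = []
    for obs in observations[1:]:
        prev, score, pointers = score, [], []
        for s in range(n_states):
            # Best predecessor of state s at this time step.
            best = max(range(n_states), key=lambda q: prev[q] + safe_log(trans[q][s]))
            pointers.append(best)
            score.append(prev[best] + safe_log(trans[best][s]) + safe_log(emit[s][obs]))
        back.append(pointers)
    # Backtrack from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]

# Hypothetical left-to-right model: each state can only loop or move to the next one.
start = [1.0, 0.0, 0.0]
trans = [[0.6, 0.4, 0.0], [0.0, 0.6, 0.4], [0.0, 0.0, 1.0]]
emit = [
    {"a": 0.8, "b": 0.1, "c": 0.1},
    {"a": 0.1, "b": 0.8, "c": 0.1},
    {"a": 0.1, "b": 0.1, "c": 0.8},
]

path = viterbi("aabbcc", start, trans, emit)
print(path)
```

The zero entries of the transition matrix are what encode the sequential structure that a GMM alone cannot represent.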

Summary
- A convenient way of modeling sound is to split the sampled signal into overlapping frames within which the signal is considered as stationary
- Speech encoding:
  - Reduced set of parameters that is necessary to synthesize a perceptually similar signal
  - Non parametric: Vector Quantization (K-means)
- Speaker recognition:
  - Need to abstract the signal into a meaningful set of parameters: MFCCs
- Speech recognition:
  - Sequentiality is important: from GMMs to HMMs

Outline
1. Introduction
   1. Context and challenges
2. Past and Present
   1. Speech
      1. Model
      2. Applications (coding, speaker recognition, speech recognition)
   2. Audio (Music)
      1. Sound models
      2. Retrieving information within songs
      3. Retrieving information across songs

Retrieving information within songs
- Harmony: pitch tracking, melody estimation, multi-F0 estimation, chord estimation
- Rhythm: onset detection, tempo estimation, beat tracking
- Orchestration: instrument recognition

Multi-F0 Estimation
From an observed spectrum, we want to estimate the fundamental frequency (f0) of each note.
Most algorithms perform an iterative search:
- Estimate the dominant f0
- Remove its contribution
(Fig. from Klapuri 2004) (Fig. from Yeh 2010)

Multi-F0 Estimation (Fig. from Yeh 2010)

Chord Estimation
Relies on a Chroma representation (Fig. from Papadopoulos 2007)

Chord Estimation
Matches observed chromas to chord templates
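Template matching on chroma vectors can be sketched as follows, assuming binary templates for the 24 major and minor triads and a plain dot product as the matching score (both are simplifications chosen for this example). The chroma vector is invented: strong energy on C, E and G plus a little noise elsewhere.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def chord_templates():
    """Binary 12-bin templates for the 12 major and 12 minor triads."""
    templates = {}
    for root in range(12):
        for suffix, third in (("", 4), ("m", 3)):
            template = [0.0] * 12
            for interval in (0, third, 7):  # root, third, fifth
                template[(root + interval) % 12] = 1.0
            templates[NOTE_NAMES[root] + suffix] = template
    return templates

def estimate_chord(chroma, templates):
    """Name of the template with the highest dot product with the chroma."""
    def score(template):
        return sum(c * w for c, w in zip(chroma, template))
    return max(templates, key=lambda name: score(templates[name]))

templates = chord_templates()

# Hypothetical observed chroma: C, E and G dominate, other bins hold noise.
chroma = [0.1] * 12
chroma[0], chroma[4], chroma[7] = 1.0, 0.8, 0.9

print(estimate_chord(chroma, templates))
```

All templates have the same norm, so ranking by dot product is equivalent to ranking by cosine similarity here.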

Issues with template models
- World is dirty, so are our models
- Our models are clean, let us clean the world

Structure Analysis
Aims at estimating the musical structure. Ex.: Intro, Verse, Chorus, Verse, Chorus, Chorus, Outro
Method:
- Compute features
- Compute the similarity between every pair of feature frames
- Perform segmentation based on:
  - Novelty
  - Homogeneity
  - Repetition
[Paulus 10] Paulus J., Müller M. and Klapuri A. Audio-Based Music Structure Analysis. 11th International Society for Music Information Retrieval Conference (ISMIR 2010)

Structure Analysis: Feature extraction
- Timbre: MFCCs
- Harmony: Chromas
- Rhythm
(Fig. from Paulus 2010)

Structure Analysis: Self-Similarity Matrix
- For the 3 features
- At different granularities
(Fig. from Paulus 2010)
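A self-similarity matrix simply compares every feature frame with every other one. The sketch below assumes cosine similarity as the measure and uses five invented 2-D feature frames following an A A B B A pattern; real input would be MFCC or chroma frames.

```python
import math

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

def self_similarity(features):
    """Matrix whose entry (i, j) is the similarity between frames i and j."""
    return [[cosine_similarity(u, v) for v in features] for u in features]

# Hypothetical feature frames: section A, A, B, B, then A again.
features = [(1.0, 0.0), (1.0, 0.1), (0.0, 1.0), (0.1, 1.0), (1.0, 0.0)]
ssm = self_similarity(features)

for row in ssm:
    print(" ".join(f"{value:.2f}" for value in row))
```

Homogeneous sections appear as bright blocks on the diagonal, and the return of section A shows up as a high value far from the diagonal, which is the cue used for repetition-based segmentation.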

Structure Analysis: segmentation based on Novelty, Homogeneity, Repetition (Fig. from Paulus 2010)

Onset Detection
What is an onset? How do we extract it?
[Bello et al.] J. P. Bello, L. Daudet, S. Abdallah, C. Duxbury, M. Davies, and M. B. Sandler, A tutorial on onset detection in music signals, IEEE Transactions on Speech and Audio Processing, vol. 13, no. 5, pp. 1035-1047, Sep. 2005.

Onset detection

Tempo estimation

Beat tracking
From an estimate of the tempo:
- Infer the beat positions
- Possibly the downbeat

Retrieving information across songs
- Classification: genre recognition, tag inference
- Similarity: music similarity, cover detection

Classification
Method [Tzanetakis 02]:
- Agree on a mutually exclusive set of tags (the ontology)
- Extract features from audio (MFCCs and variations)
- Train statistical models: due to the high dimensionality of the feature vectors, discriminative approaches are preferred (SVMs)
- Segmentation
- Smooth the decisions using dynamic programming (DP)
[Tzanetakis 02] Tzanetakis, G. and Cook, P. Musical Genre Classification of Audio Signals. IEEE Transactions on Speech and Audio Processing, 2002
(Fig. from [Ramona07])

From parametric to non-parametric
K-Nearest Neighbors: a simple but effective non-parametric approach to classification
Assumes we are given:
- A metric that defines the similarity of 2 items
- Some labeled items
(Fig. panels: k=15, k=1)
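The two ingredients listed above, a metric and some labeled items, are all a k-nearest-neighbors classifier needs. The items, labels and the Euclidean metric below are invented for the example.

```python
import math
from collections import Counter

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def knn_classify(query, items, labels, k, distance=euclidean):
    """Majority label among the k labeled items closest to the query."""
    ranked = sorted(range(len(items)), key=lambda i: distance(query, items[i]))
    votes = Counter(labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical 2-D features with genre labels.
items = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
labels = ["rock", "rock", "rock", "jazz", "jazz", "jazz"]

print(knn_classify((0.5, 0.5), items, labels, k=3))
print(knn_classify((5.2, 5.1), items, labels, k=1))
```

There is no training step: every query is compared with all stored items, which is why the cost grows with the size of the collection.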

Issues with k-NN
- Does not scale to large problems (lots of items)
- In high dimensions (lots of features), due to the curse of dimensionality, items tend to be equally far from each other: neighbors tend to be non-local and meaningless

Support Vector Machines (SVMs)
- Discriminative approach towards classification
- In the linear case, SVMs aim at maximizing the distance between the margin hyperplanes (dashed lines), called the margin (Fig. from [Ramona PhD])
- This allows minimizing the structural risk by jointly minimizing:
  - The empirical risk
  - The Vapnik-Chervonenkis (VC) dimension

Kernel-based SVMs
Data is usually not linearly separable, which leads to the use of a kernel function to project the data into a higher-dimensional space

Kernel-based SVMs
Let us consider a polynomial kernel (video) between two points (x1, y1) and (x2, y2), such as K([x1, y1], [x2, y2]) = x1x2 + y1y2 + (x1^2 + y1^2)(x2^2 + y2^2)
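Reading this kernel as acting on two 2-D points (x1, y1) and (x2, y2), which is an assumption about the slide's notation, it equals an ordinary dot product after mapping each point (x, y) to (x, y, x^2 + y^2). The sketch below checks that identity and shows that two concentric rings, which no line separates in 2-D, are split by a plane in the mapped space. The ring radii and the threshold are made-up values.

```python
import math

def kernel(p, q):
    """Kernel evaluated directly in the original 2-D space."""
    (x1, y1), (x2, y2) = p, q
    return x1 * x2 + y1 * y2 + (x1 ** 2 + y1 ** 2) * (x2 ** 2 + y2 ** 2)

def feature_map(p):
    """Explicit projection into 3-D: the third axis is the squared radius."""
    x, y = p
    return (x, y, x * x + y * y)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

p, q = (1.0, 2.0), (3.0, -1.0)
print(kernel(p, q), dot(feature_map(p), feature_map(q)))

# Two concentric rings (radii 1 and 3): not linearly separable in 2-D.
angles = [2 * math.pi * i / 8 for i in range(8)]
inner = [(math.cos(a), math.sin(a)) for a in angles]
outer = [(3 * math.cos(a), 3 * math.sin(a)) for a in angles]

# In the mapped space, the plane "third coordinate = 5" separates them.
inner_side = [feature_map(pt)[2] < 5.0 for pt in inner]
outer_side = [feature_map(pt)[2] > 5.0 for pt in outer]
print(all(inner_side), all(outer_side))
```

The classifier never has to build the 3-D vectors: evaluating the kernel in 2-D gives the same dot products, which is the point of the kernel trick.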

Multi-Class Discriminative Classification
Usually performed by combining binary classifiers. Two approaches:
- One-vs-all:
  - For each class, build a classifier for that class versus the rest
  - Often very imbalanced classifiers (use asymmetric regularization)
- All-vs-all:
  - Build a classifier for each pair of classes
  - A priori a large number of classifiers to build, but the pairwise classifications are faster and balanced (easier to find the best regularization)

Multi-Label Discriminative Classification
Each object may be tagged using several labels
Computational approaches:
- Power sets
- Binary relevance (equivalent to one-vs-all)
Multiple criteria: «flattening» the ontology
Research trend: considering the ontology structure to benefit from the co-occurrence of labels of different semantic criteria

Music Similarity
Question to solve: «Given a seed song, provide us with the entries of the database which are the most similar»
Annotation type: Artist / Album
Method: songs are modeled as Gaussian models of MFCCs, and the proximity of the models is taken as the similarity measure
- Diagonal covariance GMMs [Aucouturier 04]:
  - Likelihood (requires access to the MFCCs)
  - Monte Carlo sampling
- Full covariance Gaussian: KL divergence
[Aucouturier 04] J.-J. Aucouturier and F. Pachet. Improving Timbre Similarity: How High is the Sky? Journal of Negative Results in Speech and Audio Sciences, 1(1), 2004.
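For single Gaussians the KL divergence has a closed form, which is what makes it attractive as a song-to-song distance. To stay dependency-free, the sketch below handles the diagonal-covariance case only (the full-covariance formula has the same structure but needs a matrix inverse and a determinant), and it symmetrizes the divergence by summing both directions, which is one common choice rather than something stated in the slides. The "songs" are invented (mean, variance) pairs.

```python
import math

def kl_diag_gaussians(mean_p, var_p, mean_q, var_q):
    """KL(p || q) for Gaussians with diagonal covariance, summed over dimensions."""
    return 0.5 * sum(
        math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
        for mp, vp, mq, vq in zip(mean_p, var_p, mean_q, var_q)
    )

def symmetric_kl(p, q):
    """Symmetrized divergence between two (mean, variance) models."""
    return kl_diag_gaussians(*p, *q) + kl_diag_gaussians(*q, *p)

# Hypothetical song models over 2 MFCC dimensions: (means, variances).
seed_song = ([0.0, 1.0], [1.0, 2.0])
close_song = ([0.1, 1.2], [1.1, 1.9])
far_song = ([3.0, -2.0], [0.5, 4.0])

print(symmetric_kl(seed_song, close_song), symmetric_kl(seed_song, far_song))
```

Ranking the database by this value with respect to the seed song gives the list of most similar entries without going back to the MFCC frames.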

Cover Version Detection
Question to solve: «Given a seed song, provide us with the entries of the database which are cover versions»
Annotation: canonical song
Method [Serra 08]:
- Songs are modeled as a time series of Chromas
- Computation of the similarity matrix between the two time series
- Similarity is measured using Dynamic Programming Local Alignment
[Serra 08] Chroma Binary Similarity and Local Alignment Applied to Cover Song Identification, IEEE Transactions on Audio, Speech, and Language Processing, 2008
(Fig. from [Serra 08])
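Local alignment by dynamic programming can be illustrated with the classic Smith-Waterman recursion applied to a binary similarity matrix. This is a generic sketch, not the exact recursion of [Serra 08]: the scores for match, mismatch and gap are arbitrary, and each chroma frame is reduced to a single pitch-class index so that "similar" simply means "equal". The three sequences are invented.

```python
def binary_similarity(a, b):
    """sim[i][j] is True when frame i of a and frame j of b are considered similar."""
    return [[x == y for y in b] for x in a]

def local_alignment_score(sim, match=1.0, mismatch=-1.0, gap=-0.5):
    """Score of the best locally aligned pair of subsequences (Smith-Waterman)."""
    n, m = len(sim), len(sim[0])
    h = [[0.0] * (m + 1) for _ in range(n + 1)]
    best = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = match if sim[i - 1][j - 1] else mismatch
            # Restart at 0, extend diagonally, or open a gap in either song.
            h[i][j] = max(0.0, h[i - 1][j - 1] + step, h[i - 1][j] + gap, h[i][j - 1] + gap)
            best = max(best, h[i][j])
    return best

# Hypothetical sequences of dominant pitch classes, one value per frame.
song = [0, 4, 7, 5, 9, 0, 2, 7]
cover = [11, 11, 7, 5, 9, 0, 2, 3]   # shares the segment 7, 5, 9, 0, 2
unrelated = [1, 3, 6, 8, 10, 1, 3, 6]

score_cover = local_alignment_score(binary_similarity(song, cover))
score_unrelated = local_alignment_score(binary_similarity(song, unrelated))
print(score_cover, score_unrelated)
```

Because the alignment is local, a cover that only shares one section with the original, or that adds an introduction, can still obtain a high score.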

Cover Song Detection
(Fig.: Chromagram of Day Tripper. Chroma similarity of Day Tripper with a cover (left), not a cover (right))

Future
1. Issues
   1. Description of audio and music
      1. Polyphonic
      2. Multiple shapes varying in various ways
   2. Statistical modeling
      1. Curse of dimensionality
      2. Sense of structure relevant at multiple levels of temporality
2. Research trends
   1. Polyphony handling by source separation
   2. Building and using priors by performing Auditory Scene Analysis (ASA)
   3. Joint estimation of several musical parameters

Vocabulary? STFT, MFCCs, Chromas, K-means, GMMs, HMMs

References
- Music Information Retrieval (MIR) is an emerging field
- Browse the ISMIR proceedings using Google Scholar and use the citation index
- Stay tuned via the Music-IR mailing list
- Go to the MIREX webpage to see how things work and are evaluated

Live coding in Matlab
- You can find the source here: http://recherche.ircam.fr/equipes/analyse-synthese/lagrange/teaching/atiam11/ cursatiam2011intr.m
- You will need some external dependencies; web locations are provided in the code
- The code uses cell mode; please look at the Matlab documentation for usage