Beat-Synchronous Chroma Representations for Music Analysis
Dan Ellis
Laboratory for Recognition and Organization of Speech and Audio
Dept. Electrical Eng., Columbia Univ., NY USA
dpwe@ee.columbia.edu http://labrosa.ee.columbia.edu/

1. Chroma Features
2. Beat Tracking
3. Matching Cover Songs
4. Artist Identification

Beat-Chroma Representations - Ellis 7-5-3 - 1/23

Beyond MFCCs...
- MFCCs have been useful in Audio Music IR
  - timbral similarity, artist ID, segmentation, thumbnailing, singing...
- Separate tradition of Symbolic MIR
  - melody matching, chord detection, meter analysis
- It's time to bring them together...
  - with robust audio mid-level representations...
  - that capture tonal (melodic-harmonic) content
  = beat-synchronous chroma features
[Figure: spectrograms (freq / kHz vs. time / sec) of "Let It Be / Beatles / verse 1" and "Let It Be / Nick Cave / verse 1"]

1. Chroma Features
- Chroma features map spectral energy into one canonical octave
  - i.e. 12 semitone bins
- Can resynthesize as "Shepard Tones"
  - all octaves at once
[Figure: piano chromatic scale as a spectrogram (freq / kHz vs. time / sec) and as chroma (chroma bin vs. time / frames); Shepard tone spectra (level / dB vs. freq / Hz); Shepard tone resynthesis (freq / kHz vs. time / sec)]

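
The folding of spectral energy into 12 semitone bins can be sketched as follows. This is a minimal illustration, not the code used for the slides: the 440 Hz tuning reference, the convention that bin 0 is C, and the names `chroma_bin` and `chroma_vector` are all assumptions made for the example.

```python
import math

A4_HZ = 440.0  # assumed tuning reference; the slides do not state one


def chroma_bin(freq_hz):
    """Return the semitone class 0..11 (0 = C, 9 = A) nearest to freq_hz."""
    midi = 69 + 12 * math.log2(freq_hz / A4_HZ)
    return int(round(midi)) % 12


def chroma_vector(peaks):
    """Fold (freq_hz, energy) pairs into one 12-bin chroma vector."""
    chroma = [0.0] * 12
    for freq_hz, energy in peaks:
        if freq_hz > 0:
            chroma[chroma_bin(freq_hz)] += energy
    return chroma


# Three octaves of A and one E collapse into just two chroma bins.
example = chroma_vector([(110.0, 1.0), (220.0, 0.5), (440.0, 0.25), (329.63, 2.0)])
print(example)
```

Because every octave of a pitch lands in the same bin, the vector describes which pitch classes are sounding but not their register, which is the property that Shepard tone resynthesis makes audible.
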
Calculating Chroma Features
- Method 1: Map every STFT bin
  - blurs non-tonal energy
- Method 2: Map only STFT peaks
  - still blurry at low frequencies
- Method 3: Instantaneous Frequency (time derivative of phase, $\partial\phi/\partial t$)
  - escapes frequency resolution limit
[Figure: for each method, the spectrogram (freq / kHz vs. time / sec), the mapping over fft bin, and the resulting chroma (chroma bin vs. time / frame)]

2. Beat Tracking (1)
- Goal: One feature vector per beat (tatum)
  - for tempo normalization, efficiency
- Onset Strength Envelope
  - $O(t) = \sum_f \max(0, \mathrm{diff}_t(\log |X(t, f)|))$
- Autocorr. + window → global tempo estimate
[Figure: mel spectrogram (freq / mel vs. time / sec); onset strength envelope; windowed autocorrelation over lag, with the tempo peak labelled 168.5 BPM]

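
A rough sketch of the onset strength envelope and an autocorrelation-based tempo estimate is given below. It assumes the log-magnitude spectrogram is already available as a list of frames, and it omits the weighting window applied to the autocorrelation on the slide, so the lag search range has to be supplied by hand. The function names and the 100 Hz frame rate are illustrative assumptions.

```python
def onset_strength(log_spec):
    """log_spec[t][f] holds log|X(t, f)|; returns O(t) for t = 1..T-1.

    Each value is the sum over frequency of the positive part of the
    frame-to-frame difference."""
    env = []
    for prev, cur in zip(log_spec, log_spec[1:]):
        env.append(sum(max(0.0, c - p) for p, c in zip(prev, cur)))
    return env


def autocorr_peak_lag(env, min_lag, max_lag):
    """Lag in [min_lag, max_lag] with the largest raw autocorrelation."""
    def autocorr(lag):
        return sum(env[i] * env[i + lag] for i in range(len(env) - lag))
    return max(range(min_lag, max_lag + 1), key=autocorr)


def tempo_bpm(env, frame_rate_hz, min_lag, max_lag):
    """Convert the best autocorrelation lag (in frames) to beats per minute."""
    return 60.0 * frame_rate_hz / autocorr_peak_lag(env, min_lag, max_lag)


# Synthetic input: a two-band burst every 50 frames at an assumed 100 Hz frame rate.
log_spec = [[1.0, 1.0] if t % 50 == 0 else [0.0, 0.0] for t in range(401)]
env = onset_strength(log_spec)
print(tempo_bpm(env, frame_rate_hz=100.0, min_lag=30, max_lag=80))
```

Restricting the lag range stands in for the window on the slide: without some preference for plausible tempi, multiples and sub-multiples of the beat period score equally well.
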
Beat Tracking (2)
- Dynamic Programming finds beat times $\{t_i\}$
  - optimizes $\sum_i O(t_i) + \sum_i W((t_{i+1} - t_i - \tau_p)/\beta)$
  - where $O(t)$ is onset strength envelope (local score),
    $W(t)$ is a log-Gaussian window (transition cost),
    $\tau_p$ is the default beat period per measured tempo
  - incrementally find best predecessor at every time
  - backtrace from largest final score to get beats

$C^*(t) = \gamma\, O(t) + (1 - \gamma) \max_\tau \{ W((\tau - \tau_p)/\beta)\, C^*(\tau) \}$
$P(t) = \arg\max_\tau \{ W((\tau - \tau_p)/\beta)\, C^*(\tau) \}$
[Figure: $C^*(t)$ and $O(t)$ against time, showing the predecessor search over $\tau$ before $t$]

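
The recursion and backtrace can be sketched as below. This is one possible reading rather than the implementation behind the slides: it treats the candidate predecessor as an earlier frame and applies the window to the interval between it and the current frame, it assumes a specific log-Gaussian form for the window, and the defaults for `gamma` and `beta` and the search range of half to twice the period are illustrative choices.

```python
import math


def transition_weight(interval, period, beta):
    """Assumed log-Gaussian window: 1.0 when interval == period."""
    return math.exp(-0.5 * (math.log(interval / period) / beta) ** 2)


def track_beats(onset_env, period, gamma=0.5, beta=0.2):
    """Return beat frame indices; period is in frames and must be >= 2."""
    n = len(onset_env)
    score = [0.0] * n   # best cumulative score C*(t) of a path ending at t
    pred = [-1] * n     # best predecessor P(t); -1 means the path starts here
    for t in range(n):
        best, best_tau = 0.0, -1
        lo = max(0, t - 2 * period)
        hi = t - period // 2
        for tau in range(lo, hi + 1):  # candidate previous beat times
            cand = transition_weight(t - tau, period, beta) * score[tau]
            if cand > best:
                best, best_tau = cand, tau
        score[t] = gamma * onset_env[t] + (1 - gamma) * best
        pred[t] = best_tau
    # Backtrace from the largest final score.
    t = max(range(n), key=lambda i: score[i])
    beats = []
    while t >= 0:
        beats.append(t)
        t = pred[t]
    return beats[::-1]


# Synthetic envelope with an onset every 10 frames, starting at frame 5.
env = [1.0 if t % 10 == 5 else 0.0 for t in range(100)]
print(track_beats(env, period=10))
```

Because each frame stores only its best predecessor, the search is linear in the number of frames times the width of the predecessor range, and a best path always exists even through stretches with no onsets.
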
Beat Tracking Results
- DP will bridge gaps (non-causal)
  - there is always a best path...
- 2nd place in MIREX 06 Beat Tracking
  - compared to McKinney & Moelants human data
[Figure: "Alanis Morissette - All I Want - gap + beats" (freq / Bark band vs. time / sec); "test (Bragg) - McKinney + Moelants Subject data" (Subject # vs. time / s)]

Beat-Synchronous Chroma Features
- Beat + chroma features / 3ms frames
  → average within each beat
  - compact; sufficient?

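
Averaging frame-level chroma within each beat might look like the sketch below. It assumes beat times have already been converted to frame indices; `beat_sync_chroma` is a hypothetical name, and frames after the last beat are simply dropped.

```python
def beat_sync_chroma(frames, beat_frames):
    """Average chroma frames between consecutive beats.

    frames: list of 12-bin chroma vectors, one per analysis frame.
    beat_frames: sorted frame indices at which beats occur.
    Returns one averaged 12-bin vector per inter-beat interval."""
    beats = []
    for start, end in zip(beat_frames, beat_frames[1:]):
        segment = frames[start:end]
        if segment:  # skip empty intervals (repeated beat indices)
            beats.append([sum(col) / len(segment) for col in zip(*segment)])
    return beats


# Six frames whose bins all equal the frame index, with beats at frames 0, 2 and 6.
frames = [[float(t)] * 12 for t in range(6)]
synced = beat_sync_chroma(frames, [0, 2, 6])
print(len(synced), synced[0][0], synced[1][0])
```

The output has one column per beat regardless of tempo, which is what makes two performances at different speeds directly comparable.
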
3. Cover Song Detection (with Graham Poliner)
- "Cover Songs" = reinterpretation of a piece
  - different instrumentation, character
  - no match with "timbral" features
- Need a different representation!
  - beat-synchronous chroma features
[Figure: spectrograms (freq / kHz vs. time / sec) and beat-sync chroma features (chroma bins vs. beats) for "Let It Be - The Beatles" and "Let It Be - Nick Cave", verse 1]

Matching (1): Little Fragments
- Cover versions may change song structure
  - multiple local matches at different alignments
- Match query and target as many small pieces?
  - how big are the pieces?
  - how do we combine individual scores?
  - do we have all day?
[Figure: fragments extracted from the Query beat-chroma matrix (chroma bins vs. beats) are cross-correlated against the Candidate]

Matching (2): Global Correlation
- Cross-correlate entire beat-chroma matrices...
  - at all possible transpositions
  - implicit combination of match quality and duration
- One good matching fragment is sufficient...?
[Figure: beat-chroma matrices (chroma bins vs. beats) for "Elliott Smith - Between the Bars" and "Glen Phillips - Between the Bars"; their cross-correlation (skew / semitones vs. skew / beats)]

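
A brute-force sketch of cross-correlating two beat-chroma matrices over every transposition and beat skew follows. It is written for clarity rather than speed, uses the raw unnormalized sum of products, and all function names and the `max_skew` parameter are assumptions made for the example.

```python
def rotate(vec, k):
    """Circularly shift a 12-bin chroma vector up by k semitones."""
    k %= len(vec)
    return vec[-k:] + vec[:-k]


def cross_correlate(query, target, max_skew):
    """Return {(transposition, beat_skew): score} for two beat-chroma matrices.

    Each matrix is a list of 12-bin vectors, one per beat."""
    scores = {}
    for k in range(12):
        rotated = [rotate(v, k) for v in query]
        for skew in range(-max_skew, max_skew + 1):
            total = 0.0
            for b, qv in enumerate(rotated):
                tb = b + skew
                if 0 <= tb < len(target):
                    total += sum(x * y for x, y in zip(qv, target[tb]))
            scores[(k, skew)] = total
    return scores


def one_hot(bin_index):
    v = [0.0] * 12
    v[bin_index] = 1.0
    return v


# The target is the query transposed up 2 semitones and delayed by 3 beats.
query = [one_hot(b) for b in (0, 4, 7, 0, 5, 9)]
silence = [0.0] * 12
target = [silence] * 3 + [rotate(v, 2) for v in query] + [silence] * 2
scores = cross_correlate(query, target, max_skew=5)
best = max(scores, key=scores.get)
print(best, scores[best])
```

Since the score is a plain sum over overlapping beats, a long stretch of moderate agreement and a short stretch of strong agreement can produce similar totals, which is the implicit combination of quality and duration noted above.
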
Filtered Cross-Correlation
- Raw correlation not as important as precise local match
  - looking for large contrast at ±1 beat skew
  - i.e. high-pass filter
[Figure: cross-correlation (skew / semitones vs. skew / beats); a slice of the cross-correlation at one semitone skew, raw and filtered (vs. skew / beats)]

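
One simple way to emphasize contrast at ±1 beat skew is to subtract the mean of the two neighbouring skews from each value. The three-tap kernel below is an assumed stand-in for whatever high-pass filter was actually used, and `high_pass` is a hypothetical name.

```python
def high_pass(xcorr_row):
    """Subtract the mean of the +/-1 beat neighbours from each skew value.

    Edge values are handled by repeating the first and last entries."""
    n = len(xcorr_row)
    out = []
    for i, x in enumerate(xcorr_row):
        left = xcorr_row[max(i - 1, 0)]
        right = xcorr_row[min(i + 1, n - 1)]
        out.append(x - 0.5 * (left + right))
    return out


# A broad hump (indices 0-4) and a sharp peak (index 7) along the skew axis.
raw = [5.0, 6.0, 7.0, 6.0, 5.0, 1.0, 1.0, 6.5, 1.0, 1.0]
filtered = high_pass(raw)
print(raw.index(max(raw)), filtered.index(max(filtered)))
```

In this toy row the raw maximum sits on the broad hump, while the filtered maximum moves to the sharp peak, which is the behaviour wanted when a precise beat-level alignment matters more than general similarity.
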
Results (1): Ellis 23 set
- 23 pairs of cover songs from uspop2002 +...
  - one correct match per query
- Queries:
  Take_Me_To_The_River/annie_lennox
  Let_It_Be/nick_cave
  I_Love_You/faith_hill
  I_Can_t_Get_No_Satisfaction/rolling_stones
  Hush/milli_vanilli
  Grand_illusion/styx
  Gold_Dust_woman/sheryl_crow
  God_only_knows/brian_wilson
  Faith/limp_bizkit
  Enjoy_The_Silence/tori_amos
  Day_tripper/cheap_trick
  Come_together/beatles
  Cocaine/nazareth
  Claudette/roy_orbison
  Cecilia/simon_and_garfunkel
  Caroline_no/brian_wilson
  Blue_Collar_Man/styx
  Between_The_Bars/glen_phillips
  Before_You_Accuse_Me/eric_clapton
  America/simon_and_garfunkel
  All_Along_the_watchtower/dave_matthews_band
  Addicted_to_love/tina_turner
  Abracadabra/sugar_ray
[Figure: "Cover Songs - dpwe23": match matrix of Query vs. Test tracks with correct matches marked]

Results (2): MIREX 06
- Cover song contest
  - 30 songs x 11 versions of each (!) (data has not been disclosed)
  - # true covers in top 10
  - 8 systems compared (4 cover song + 4 similarity)
- Found 761/3300 = 23% recall
  - next best: 11%
  - guess: 3%
[Figure: "MIREX 06 Cover Song Results: # Covers retrieved per song per system"; rows are song-sets (each row is one query song), columns are the systems CS, DE, KL1, KL2, KWL, KWT, LR, TP, grouped into cover song systems and similarity systems; the scale shows correct matches retrieved]

Where are the matches?
- Look inside global cross-correlation to find matching fragments...
  - $\mathrm{xcorr} = \sum_t \sum_f (C_1(t, f)\, C_2(t, f))$ - view along time
[Figure: aligned beat-chroma excerpts of "Let It Be / Beatles" and "Let It Be / Nick Cave" with the per-beat contribution to the cross-correlation below (time / beats)]

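
Viewing the cross-correlation along time amounts to keeping the per-beat term of the double sum instead of adding everything up. The sketch below assumes the two matrices have already been aligned (transposition and skew applied); the function names are hypothetical, and 2-bin vectors stand in for 12-bin chroma to keep the example short.

```python
def per_beat_contribution(c1, c2):
    """Inner sum over chroma bins for each beat of two aligned matrices."""
    return [sum(a * b for a, b in zip(f1, f2)) for f1, f2 in zip(c1, c2)]


def best_window(contrib, width):
    """Return (start, end) of the width-beat window with the largest total."""
    sums = [sum(contrib[i:i + width]) for i in range(len(contrib) - width + 1)]
    start = max(range(len(sums)), key=sums.__getitem__)
    return start, start + width


# Toy aligned matrices that agree only on beats 1 to 3.
c1 = [[1, 0], [1, 0], [0, 1], [1, 0], [0, 1]]
c2 = [[0, 1], [1, 0], [0, 1], [1, 0], [1, 0]]
contrib = per_beat_contribution(c1, c2)
print(contrib, sum(contrib), best_window(contrib, 3))
```

Summing `contrib` recovers the global cross-correlation value at that alignment, so the per-beat view shows which beats are responsible for a high score.
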
What are the mistakes?
- False reject - missed true match
  - cover version is too different, beat tracking wrong...
- False alarm - invalid match
  - Cocaine (Clapton) vs. Satisfaction (Stones)
[Figure: beat-chroma excerpts of "Eric Clapton - Cocaine" and "Rolling Stones - Satisfaction" at the falsely matched beats, with their per-beat match below]

4. Artist Identification (ID)
- Baseline system: "Bag of (timbral) frames"
  - MFCC frames, model as Gaussian or GMM
  - distance by likelihood or KL
- Dataset: [Mandel et al. 06]
  - 18 artists x 5 or 6 albums each
  - 18x3 albums for training, 18x2 for test, 1x1 dev
[Figure: per-track results and confusion matrix (true vs. recog) over the 18 artists: aerosmith, beatles, bryan_adams, creedence_clearwater_revival, dave_matthews_band, depeche_mode, fleetwood_mac, garth_brooks, genesis, green_day, madonna, metallica, pink_floyd, queen, rolling_stones, roxette, tina_turner, u2]

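
A bag-of-frames classifier with a single Gaussian per artist and a KL-based distance could be sketched as below. To stay short and dependency-free it uses diagonal covariances and a symmetrized KL divergence, both of which are simplifying assumptions (the slides also mention full-covariance and GMM models); the function names and toy 2-dimensional frames are illustrative.

```python
import math


def fit_diag_gaussian(frames, floor=1e-6):
    """Return (mean, variance) per dimension, ignoring frame order."""
    n = len(frames)
    dims = list(zip(*frames))
    mean = [sum(d) / n for d in dims]
    var = [max(sum((x - m) ** 2 for x in d) / n, floor) for d, m in zip(dims, mean)]
    return mean, var


def kl_diag(p, q):
    """KL divergence KL(p || q) between two diagonal Gaussians."""
    (mp, vp), (mq, vq) = p, q
    return 0.5 * sum(math.log(b / a) + (a + (m1 - m2) ** 2) / b - 1.0
                     for m1, a, m2, b in zip(mp, vp, mq, vq))


def symmetric_kl(p, q):
    return kl_diag(p, q) + kl_diag(q, p)


def classify(test_frames, artist_models):
    """Pick the artist whose model is closest to a model of the test track."""
    test_model = fit_diag_gaussian(test_frames)
    return min(artist_models, key=lambda name: symmetric_kl(test_model, artist_models[name]))


# Toy 2-dimensional "timbral" frames for two artists.
models = {
    "artist_a": fit_diag_gaussian([[0, 0], [1, 0], [0, 1], [1, 1]]),
    "artist_b": fit_diag_gaussian([[5, 5], [6, 5], [5, 6], [6, 6]]),
}
print(classify([[0.2, 0.1], [0.8, 0.9], [0.1, 0.7], [0.9, 0.3]], models))
```

Comparing models rather than scoring every test frame keeps the cost independent of track length once each track has been summarized.
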
Beat Chroma Features for ID?
- Artists may use tonality in particular ways...
  - density, variety
  - particular chords
  - (influence of instruments on chroma features)
- Try "bag-of-frames" on beat-chroma representation
  - use several consecutive beats?
  - key-normalization of each piece?
[Figure: beat-chroma excerpts of "Northern Lad (1998)" and "Cars and Guitars (2005)"]

Key Normalization
- Could try matching at all possible rotations....
- or just transpose every piece initially
  - single Gaussian model of one piece
  - find ML rotation of other pieces
  - model all transposed pieces
  - iterate until convergence
[Figure: aligned chroma for Taxman, Eleanor Rigby, I'm Only Sleeping, Love You To, Yellow Submarine, She Said She Said, Good Day Sunshine, And Your Bird Can Sing, together with the aligned global model]

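
The iterative alignment can be sketched as follows: fit a Gaussian to one piece, rotate every piece to the transposition with the highest likelihood, refit on all rotated pieces, and repeat. A diagonal covariance with a variance floor is assumed here for brevity, and `key_normalize`, `max_iter`, and the toy data are illustrative rather than taken from the original system.

```python
import math


def rotate(vec, k):
    """Circularly shift a 12-bin chroma vector up by k semitones."""
    k %= len(vec)
    return vec[-k:] + vec[:-k]


def fit(frames, floor=1e-3):
    """Diagonal Gaussian (mean, variance) over a list of chroma frames."""
    n = len(frames)
    cols = list(zip(*frames))
    mean = [sum(c) / n for c in cols]
    var = [max(sum((x - m) ** 2 for x in c) / n, floor) for c, m in zip(cols, mean)]
    return mean, var


def log_lik(frames, model):
    mean, var = model
    return sum(-0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
               for f in frames for x, m, v in zip(f, mean, var))


def best_rotation(frames, model):
    """Maximum-likelihood transposition of a piece under the model."""
    return max(range(12), key=lambda k: log_lik([rotate(f, k) for f in frames], model))


def key_normalize(pieces, max_iter=10):
    """Return one rotation per piece that aligns all pieces to a common key."""
    rotations = [0] * len(pieces)
    model = fit(pieces[0])  # start from a model of the first piece only
    for _ in range(max_iter):
        new = [best_rotation(p, model) for p in pieces]
        model = fit([rotate(f, k) for p, k in zip(pieces, new) for f in p])
        if new == rotations:
            break
        rotations = new
    return rotations


def make_frame(c, g, e):
    v = [0.0] * 12
    v[0], v[7], v[4] = c, g, e
    return v


piece_a = [make_frame(1.0, 0.5, 0.0), make_frame(0.8, 0.6, 0.1), make_frame(0.9, 0.4, 0.0)]
piece_b = [rotate(f, 3) for f in piece_a]  # same material, 3 semitones higher
print(key_normalize([piece_a, piece_b]))
```

Transposing each piece once up front avoids having to test all 12 rotations every time two pieces are compared later.
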
Timbre+Chroma ID
- Preliminary Mandel18 Artist ID accuracy:

Feature         Model     T win   Acc    Exec. time
MFCC            FullCov   1       8%     1 s
MFCC            6GMM      1       33%    195 s
Chroma          FullCov   1       15%    6 s
Chroma          FullCov           1%     117 s
Chroma          6GMM      1              85 s
Chroma          6GMM              15%
Chromakn        FullCov   1       17%    11 s
Chromakn        FullCov           1%     58 s
Chromakn        6GMM      1       5%     533 s
Chromakn        6GMM              16%    583 s
MFCC + Chroma   fusion            5%

Artist Fragments (with Courtenay Cotton)
- Idea: Find the most discriminant beat-chroma fragments per artist
  - k-means cluster 16 beat fragments within piece
  - keep fragments with largest ratio (avg. similarity to same artist)/(avg. sim. to others)
  - classify test pieces by ID of best-scoring fragment

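
The ratio used to rank candidate fragments could be computed as in the sketch below. Cosine similarity between flattened fragments is an assumption (the slides do not say which similarity measure was used), the k-means clustering step is omitted, and the function names and toy 3-dimensional fragments are illustrative.

```python
import math


def cosine(a, b):
    """Cosine similarity between two flattened fragments."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def discriminability(fragment, same_artist, other_artists):
    """(avg. similarity to same artist) / (avg. similarity to others)."""
    same = sum(cosine(fragment, f) for f in same_artist) / len(same_artist)
    other = sum(cosine(fragment, f) for f in other_artists) / len(other_artists)
    return same / other if other > 0 else float("inf")


def rank_fragments(candidates, same_artist, other_artists):
    """Sort candidate fragments from most to least discriminant."""
    return sorted(candidates,
                  key=lambda f: discriminability(f, same_artist, other_artists),
                  reverse=True)


# Toy fragments: the first candidate resembles only the artist's own material.
candidates = [[1.0, 0.0, 0.0], [1.0, 1.0, 1.0]]
same_artist = [[1.0, 0.1, 0.0]]
other_artists = [[0.1, 1.0, 1.0]]
ranked = rank_fragments(candidates, same_artist, other_artists)
print(ranked[0])
```

A fragment that is common across all artists scores near 1 and falls down the ranking, while a fragment typical of one artist only rises to the top.
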
Artist Fragment Results
- Preliminary, 5 way artist ID, ~3% correct
  - need to search more fragments
  - way to choose phrase beginnings?
  - a basis set for all tonal content?!

Conclusions and Future Work
- Beat-synchronous chroma features are successful for matching cover songs
  - captures melody-harmony, not instruments
- Further uses: Beat-chroma fragments as musical building blocks
  - e.g. VQ over large body of music
  - find recurrent motifs
  - artist identification?
- Code available!
  - Google "matlab chroma features"