ELEN E4896 MUSIC SIGNAL PROCESSING
Lecture 11: Chroma and Chords
1. Features for Music Audio
2. Chroma Features
3. Chord Recognition
Dan Ellis
Dept. Electrical Engineering, Columbia University
dpwe@ee.columbia.edu  http://www.ee.columbia.edu/~dpwe/e4896/
E4896 Music Signal Processing (Dan Ellis), 2013-04-08
1. Features for Music Audio
Challenges of large music databases: how to find what we want...
Euclidean metaphor: music tracks as points in space. What are the dimensions?
  sound: timbre, instruments (MFCC)
  melody, chords (Chroma)
  rhythm, tempo (Rhythmic bases)
MFCCs
The standard feature for speech recognition (Logan 2000)
Pipeline: Sound → FFT X[k] → Mel-scale frequency warp → log |X[k]| → IFFT → Truncate → MFCCs
[Figure: a short waveform excerpt (time / s) and its successive representations: spectra (freq / Hz), auditory spectrum "audspec" (freq / Mel), cepstra (quefrency)]
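A minimal NumPy sketch of the MFCC pipeline above for a single frame. Assumptions not taken from the slide: the HTK-style mel formula, 40 triangular mel filters, a Hann window, and 13 retained coefficients. The final transform is written as an orthonormal DCT-II, the common substitute for the IFFT step when the input is a real log spectrum.

```python
import numpy as np


def hz_to_mel(f):
    # HTK-style mel formula (one of several variants in use; an assumption)
    return 2595.0 * np.log10(1.0 + np.asarray(f) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m) / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sr):
    """Triangular filters spaced evenly on the mel axis: shape (n_mels, n_fft//2 + 1)."""
    bin_hz = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    edges_hz = mel_to_hz(np.linspace(0.0, hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, bin_hz.size))
    for m in range(n_mels):
        lo, mid, hi = edges_hz[m], edges_hz[m + 1], edges_hz[m + 2]
        rising = (bin_hz - lo) / (mid - lo)
        falling = (hi - bin_hz) / (hi - mid)
        fb[m] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def dct2_matrix(n):
    """Orthonormal DCT-II basis; row k is the k-th cosine."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    basis = np.sqrt(2.0 / n) * np.cos(np.pi * k * (2 * i + 1) / (2 * n))
    basis[0] /= np.sqrt(2.0)
    return basis


def mfcc_frame(frame, sr, n_mels=40, n_ceps=13):
    n_fft = len(frame)
    mag = np.abs(np.fft.rfft(frame * np.hanning(n_fft)))  # |X[k]|
    mel = mel_filterbank(n_mels, n_fft, sr) @ mag          # mel-scale frequency warp
    log_mel = np.log(mel + 1e-10)                          # log compression (floor avoids log 0)
    return (dct2_matrix(n_mels) @ log_mel)[:n_ceps]        # cepstral transform, then truncate


sr = 16000
t = np.arange(512) / sr
coeffs = mfcc_frame(np.sin(2 * np.pi * 440 * t), sr)
print(coeffs.shape)  # (13,)
```

Truncation keeps only the low-quefrency coefficients, i.e. the smooth spectral envelope, which is why MFCCs describe timbre rather than individual notes.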
MFCC Example
Resynthesize by imposing the MFCC spectral envelope on noise: MFCCs capture instruments, not notes
[Figure: Let It Be, log-frequency spectrogram (LIB-1); its MFCCs (coefficient vs. time); noise-excited MFCC resynthesis (LIB-2); axes freq / Hz vs. time / sec]
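Noise-excited resynthesis can be sketched as overlap-add of random-phase noise frames, each shaped by a target magnitude envelope. Assumption: the per-frame envelopes are already given on the FFT-bin axis (getting them from MFCCs would mean inverting the truncated cepstral transform and the mel warp, which is not shown). Frame size, hop and the Hann synthesis window are illustrative choices.

```python
import numpy as np


def noise_excited_resynthesis(envelopes, n_fft=512, hop=256, seed=0):
    """Overlap-add noise frames whose magnitude spectra follow `envelopes`.

    envelopes: array (n_frames, n_fft // 2 + 1) of non-negative magnitudes.
    """
    rng = np.random.default_rng(seed)
    n_frames, n_bins = envelopes.shape
    assert n_bins == n_fft // 2 + 1
    window = np.hanning(n_fft)
    out = np.zeros(hop * (n_frames - 1) + n_fft)
    for t in range(n_frames):
        # Unit-magnitude spectrum with random phase, i.e. a white-noise frame
        phase = np.exp(2j * np.pi * rng.random(n_bins))
        # Impose the envelope, return to the time domain, window and overlap-add
        frame = np.fft.irfft(envelopes[t] * phase, n=n_fft)
        out[t * hop : t * hop + n_fft] += window * frame
    return out


# Example: a crude low-pass envelope passing only the lowest 64 of 257 bins
env = np.zeros((20, 257))
env[:, :64] = 1.0
y = noise_excited_resynthesis(env)
```

Because the noise phase is random, any pitch structure in the original is lost; only the spectral envelope survives, which is what makes the resynthesis sound like "instruments, not notes".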
MFCC Artist Classification
20 artists x 6 albums each; train models on 5 albums, classify tracks from the last (Ellis 2007)
Model as MFCC mean + covariance per artist
  single Gaussian model: 20 (mean) + 20 × 21 / 2 = 210 (covariance) parameters
55% correct (guessing: ~5%)
[Figure: confusion matrix for MFCCs (acc 55.13%), true vs. recognized artist over aerosmith, beatles, creedence_c_r, cure, dave_matthews_b, depeche_mode, fleetwood_mac, garth_brooks, green_day, led_zeppelin, madonna, metallica, prince, queen, radiohead, roxette, steely_dan, suzanne_vega, tori_amos, u2]
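A minimal sketch of the single-Gaussian classifier: fit one full-covariance Gaussian to each artist's MFCC frames, then give a test track the artist whose model yields the highest total log-likelihood over the track's frames. The toy data are synthetic (not the 20-artist set), and the small diagonal regularizer is an assumption added for numerical stability.

```python
import numpy as np


def fit_gaussian(frames, reg=1e-6):
    """Mean and full covariance of (n_frames, dim) feature frames."""
    mu = frames.mean(axis=0)
    cov = np.cov(frames, rowvar=False) + reg * np.eye(frames.shape[1])
    return mu, cov


def total_log_likelihood(frames, mu, cov):
    """Sum over frames of log N(x; mu, cov)."""
    dim = frames.shape[1]
    _, logdet = np.linalg.slogdet(cov)
    diff = frames - mu
    # Mahalanobis term via a linear solve rather than an explicit inverse
    maha = np.einsum("ij,ij->i", diff, np.linalg.solve(cov, diff.T).T)
    return -0.5 * np.sum(maha + logdet + dim * np.log(2 * np.pi))


def classify_track(frames, models):
    """models: dict artist -> (mu, cov). Returns the best-scoring artist."""
    scores = {a: total_log_likelihood(frames, mu, cov) for a, (mu, cov) in models.items()}
    return max(scores, key=scores.get)


# Toy usage with 20-dimensional synthetic "MFCC" frames for two artists
rng = np.random.default_rng(0)
train = {"artist_a": rng.normal(0.0, 1.0, (500, 20)),
         "artist_b": rng.normal(0.5, 2.0, (500, 20))}
models = {a: fit_gaussian(x) for a, x in train.items()}
test_track = rng.normal(0.5, 2.0, (200, 20))
print(classify_track(test_track, models))  # artist_b
```

Summing log-likelihoods treats frames as independent, so the model ignores temporal order entirely; it only asks which artist's overall timbre distribution the track resembles.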
2. Chroma Features
What about modeling tonal content (notes)? e.g. melody spotting, chord recognition, cover songs...
MFCCs exclude tonal content
Polyphonic transcription is too hard, e.g. sinusoidal tracking is confused by harmonics
Chroma features as a solution...
[Figure: transcription example, MIDI note number vs. time, recognized vs. true notes]
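A short worked example of why harmonics cause trouble: the h-th harmonic of a note with MIDI number m lies 12 log2 h semitones above it. Folding onto 12 pitch classes sends harmonics 2 and 4 back to the note's own bin, but harmonics 3 and 6 add energy a fifth above and harmonic 5 a major third above, so chroma reduces the problem without removing it.

```latex
m_h = m + 12\log_2 h
\qquad
\begin{array}{c|ccccc}
h & 2 & 3 & 4 & 5 & 6 \\ \hline
m_h - m & 12 & 19.02 & 24 & 27.86 & 31.02 \\
(m_h - m) \bmod 12 & 0 & \approx 7 & 0 & \approx 4 & \approx 7
\end{array}
```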
Chroma Features
Idea: project all energy onto 12 semitones regardless of octave (Fujishima 1999)
  maintains the main musical distinction
  invariant to musical (octave) equivalence
  no need to worry about harmonics?
$$C(b) = \sum_{k=0}^{N_M} B\big(12\log_2(k/k_0) - b\big)\, W(k)\, |X[k]|$$
W(k) is a weighting; B(·) is nonzero only when its argument is close to 0 mod 12, so chroma bin b collects every FFT bin whose pitch falls near b in any octave
[Figure: spectrogram (freq / kHz vs. time / sec), the mapping from FFT bin to chroma bin, and the resulting chroma (chroma bin vs. time / frame)]
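A minimal NumPy sketch of the projection C(b) for one FFT frame. Assumptions: B(·) is implemented by rounding each bin's pitch to the nearest semitone and keeping its pitch class mod 12; k_0 corresponds to A = 440 Hz, with chroma bin 0 = C; W(k) is a plain pass-band that keeps bins between 50 Hz and 5 kHz.

```python
import numpy as np

PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def chroma_frame(frame, sr, fmin=50.0, fmax=5000.0, f_ref=440.0):
    """Fold one frame's FFT magnitudes onto 12 pitch classes (0 = C)."""
    n_fft = len(frame)
    mag = np.abs(np.fft.rfft(frame * np.hanning(n_fft)))  # |X[k]|
    freqs = np.arange(mag.size) * sr / n_fft              # bin centre frequencies
    keep = (freqs >= fmin) & (freqs <= fmax)              # W(k): 1 inside band, else 0
    # 12*log2(f/f_ref) = semitones from A440; +9 shifts so that C maps to 0
    semitones = np.round(12.0 * np.log2(freqs[keep] / f_ref)).astype(int) + 9
    chroma = np.zeros(12)
    np.add.at(chroma, semitones % 12, mag[keep])          # B(.): each bin lands in one class
    return chroma


sr, n_fft = 16000, 4096
t = np.arange(n_fft) / sr
c = chroma_frame(np.sin(2 * np.pi * 440.0 * t), sr)
print(PITCH_CLASSES[int(np.argmax(c))])  # A
```

The hard rounding is exactly what causes the edge blurring discussed next: a bin whose frequency sits near a semitone boundary is assigned wholly to one class.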
Better Chroma
Problems:
  blurring of bins close to semitone edges
  limitation of FFT bin resolution
Solutions:
  peak picking: only keep energy at the center of spectral peaks
  Instantaneous Frequency: high-resolution frequency estimates
  adapt the tuning center based on a histogram of pitches
[Figure: spectrogram (freq / kHz vs. time / sec) and the improved chroma (chroma bin vs. time / frame)]
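One way to get high-resolution frequency estimates is the phase-vocoder form of instantaneous frequency (IF): compare the STFT phase of two frames a short hop apart, subtract the advance expected at each bin's centre frequency, and convert the wrapped remainder into a frequency offset. This is a sketch under assumed parameters (2048-point Hann frames, 64-sample hop); it is not necessarily the estimator behind the slide's figure.

```python
import numpy as np


def instantaneous_frequency(x, sr, n_fft=2048, hop=64, start=0):
    """Return (magnitude, IF in Hz) for each bin of the frame starting at `start`."""
    win = np.hanning(n_fft)
    X0 = np.fft.rfft(win * x[start : start + n_fft])
    X1 = np.fft.rfft(win * x[start + hop : start + hop + n_fft])
    k = np.arange(X0.size)
    expected = 2 * np.pi * k * hop / n_fft        # phase advance at each bin centre
    dev = np.angle(X1) - np.angle(X0) - expected  # extra advance from an off-centre frequency
    dev = np.mod(dev + np.pi, 2 * np.pi) - np.pi  # wrap into [-pi, pi)
    if_hz = (k + dev * n_fft / (2 * np.pi * hop)) * sr / n_fft
    return np.abs(X0), if_hz


sr = 16000
x = np.cos(2 * np.pi * 443.0 * np.arange(sr) / sr)
mag, if_hz = instantaneous_frequency(x, sr)
peak = int(np.argmax(mag))
print(peak * sr / 2048, if_hz[peak])  # FFT bin centre vs. refined estimate
```

The bin centre is about 2.3 Hz off, while the IF estimate recovers 443 Hz closely. Converting such estimates at spectral peaks into cents away from the nearest semitone, and taking the peak of their histogram, gives the tuning correction listed among the solutions.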
Chroma Resynthesis
Chroma describes the notes in an octave... but not the octave
Can resynthesize by presenting all octaves... with a smooth envelope
  Shepard tones: octave is ambiguous; endless sequence illusion
$$y_b(t) = \sum_{o=1}^{M} W\!\left(o + \tfrac{b}{12}\right) \cos\!\left(2^{\,o + b/12}\, \omega_0 t\right)$$
[Figure: Shepard tone spectra (level / dB vs. freq / Hz) and Shepard tone resynthesis (freq / kHz vs. time / sec) (Ellis & Poliner 2007)]
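A minimal sketch of y_b(t) for a single chroma bin b. Assumptions: ω_0 = 2π·16.35 Hz (C0, so b = 0 is C and b = 9 is A), M = 8 octaves, and W(·) is a Gaussian bump over log-frequency centred 4.5 octaves above ω_0 with a width of 1.5 octaves; the slide only requires W to be a smooth envelope.

```python
import numpy as np

F0 = 16.3516  # Hz, C0, so chroma bin 0 is C (an assumption)


def shepard_tone(b, dur=1.0, sr=16000, n_oct=8, centre=4.5, width=1.5):
    """y_b(t) = sum_{o=1}^{M} W(o + b/12) cos(2^(o + b/12) * w0 * t), peak-normalized."""
    t = np.arange(int(dur * sr)) / sr
    y = np.zeros_like(t)
    for o in range(1, n_oct + 1):
        pos = o + b / 12.0                                     # octaves above F0
        weight = np.exp(-0.5 * ((pos - centre) / width) ** 2)  # smooth envelope W
        y += weight * np.cos(2 * np.pi * F0 * 2.0 ** pos * t)
    return y / np.max(np.abs(y))


y = shepard_tone(9)                  # chroma bin 9 = A
spec = np.abs(np.fft.rfft(y))        # 1 s at 16 kHz: bins are 1 Hz apart
print(np.argmax(spec))               # 440
```

Because W is fixed in log-frequency while the components move with b, stepping b through 0..11 repeatedly gives the endless-rising illusion: each step sounds higher, yet the spectrum returns to where it started.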
Chroma Example
Simple Shepard tone resynthesis; can also reimpose the broad spectrum from MFCCs
[Figure: Let It Be, log-frequency spectrogram (LIB-1); chroma features (chroma bin vs. time); Shepard tone resynthesis of chroma (LIB-3); MFCC-filtered Shepard tones (LIB-4); axes freq / Hz vs. time / sec]
Beat-Synchronous Chroma
Drastically reduce data size by recording one chroma frame per beat (Bartsch & Wakefield 2001)
[Figure: Let It Be, log-frequency spectrogram (LIB-1); onset envelope + beat times; beat-synchronous chroma; beat-synchronous chroma + Shepard resynthesis (LIB-6); axes freq / Hz and chroma bin vs. time / sec]
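Beat-synchronous reduction can be sketched as: for each inter-beat interval, average the chroma frames whose times fall inside it. Assumptions: frame times and beat times are given in seconds (the beat tracker itself is not shown), and the mean is used as the per-beat summary, though other choices such as the median are possible.

```python
import numpy as np


def beat_sync(chroma, frame_times, beat_times):
    """chroma: (12, n_frames). Returns (12, n_beats - 1), one column per beat interval."""
    # Index of the beat interval each frame belongs to (-1 = before the first beat)
    interval = np.searchsorted(beat_times, frame_times, side="right") - 1
    n_intervals = len(beat_times) - 1
    out = np.zeros((chroma.shape[0], n_intervals))
    for i in range(n_intervals):
        cols = interval == i
        if np.any(cols):
            out[:, i] = chroma[:, cols].mean(axis=1)
    return out


# 100 frames at a 10 ms hop, beats every 0.25 s: about 25 frames per beat
frame_times = np.arange(100) * 0.01
beats = np.arange(0.0, 1.01, 0.25)
chroma = np.random.default_rng(0).random((12, 100))
bs = beat_sync(chroma, frame_times, beats)
print(chroma.shape, "->", bs.shape)  # (12, 100) -> (12, 4)
```

At typical tempos this cuts the frame rate from around 100 per second to 2 or 3 per second, and it also makes sequences from performances at different tempos directly comparable beat by beat.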
3. Chord Recognition
Beat-synchronous chroma look like chords... can we transcribe them?
[Figure: beat-synchronous chroma (chroma bin vs. time / sec), annotated with the notes of each chord]
Two approaches:
  manual templates (prior knowledge)
  learned models (from training data)
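The manual-template approach can be sketched in a few lines: build binary chroma templates for the 24 major and minor triads, then label each beat with the template of highest cosine similarity to its chroma vector. The binary templates and the cosine score are illustrative assumptions, not a specific published system.

```python
import numpy as np

NOTES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def triad_templates():
    """24 binary templates: rows 0-11 major (root, +4, +7), rows 12-23 minor (root, +3, +7)."""
    names, rows = [], []
    for quality, third in (("maj", 4), ("min", 3)):
        for root in range(12):
            t = np.zeros(12)
            t[[root, (root + third) % 12, (root + 7) % 12]] = 1.0
            rows.append(t)
            names.append(f"{NOTES[root]}:{quality}")
    return names, np.array(rows)


def label_chords(beat_chroma):
    """beat_chroma: (12, n_beats). Returns one chord label per beat."""
    names, T = triad_templates()
    T = T / np.linalg.norm(T, axis=1, keepdims=True)
    C = beat_chroma / (np.linalg.norm(beat_chroma, axis=0, keepdims=True) + 1e-12)
    scores = T @ C  # cosine similarity, shape (24, n_beats)
    return [names[i] for i in np.argmax(scores, axis=0)]


# Two beats: C-E-G and A-C-E, plus a little background energy in every bin
beats = np.full((12, 2), 0.05)
beats[[0, 4, 7], 0] += 1.0
beats[[9, 0, 4], 1] += 1.0
print(label_chords(beats))  # ['C:maj', 'A:min']
```

Templates need no training data, but they treat each beat in isolation; the learned-model approach adds per-chord feature statistics and, through an HMM, a model of which chords tend to follow which.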
Chord Recognition System
Analogous to speech recognition (Sheh & Ellis 2003)
  Gaussian models of features for each chord
  Hidden Markov Models for chord transitions
Block diagram: Audio → Beat track; 100-1600 Hz BPF → Chroma and 25-400 Hz BPF → Chroma → beat-synchronous chroma features
  test: HMM Viterbi → chord labels
  train: Root normalize → Gaussian → Unnormalize → 24 Gauss models (A maj ... G min); Labels → Resample → Count transitions → 24x24 transition matrix
[Figure: the learned 24x24 chord transition matrix over major (A-G) and minor (a-g) chords]
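The training branch of the diagram (root normalize → Gaussian → unnormalize, and count transitions) can be sketched as follows. Assumptions, none taken from Sheh & Ellis 2003: chord labels are integers 0-23 (quality * 12 + root, with roots C..B), chroma is (n_beats, 12), training is supervised, one full-covariance Gaussian is pooled per chord quality, and add-one smoothing is applied to the transition counts.

```python
import numpy as np


def root_normalized_models(chroma, labels, reg=1e-3):
    """chroma: (n, 12); labels: (n,) ints 0-23. Returns means (24, 12), covs (24, 12, 12)."""
    roots, qualities = labels % 12, labels // 12
    means, covs = np.zeros((24, 12)), np.zeros((24, 12, 12))
    for q in (0, 1):
        sel = qualities == q
        # Root normalize: rotate each frame so its chord root lands in bin 0
        rotated = np.array([np.roll(c, -r) for c, r in zip(chroma[sel], roots[sel])])
        mu = rotated.mean(axis=0)
        cov = np.cov(rotated, rowvar=False) + reg * np.eye(12)
        for r in range(12):
            # Unnormalize: rotate the pooled model back to every possible root
            perm = (np.arange(12) - r) % 12
            means[q * 12 + r] = mu[perm]
            covs[q * 12 + r] = cov[np.ix_(perm, perm)]
    return means, covs


def transition_matrix(labels, n_states=24, smoothing=1.0):
    """Count label-to-label transitions and normalize each row to sum to 1."""
    counts = np.full((n_states, n_states), smoothing)
    for a, b in zip(labels[:-1], labels[1:]):
        counts[a, b] += 1
    return counts / counts.sum(axis=1, keepdims=True)


# Synthetic training data: noisy triads with random roots and qualities
rng = np.random.default_rng(0)
labels = rng.integers(0, 24, size=400)
chroma = np.zeros((400, 12))
for i, lab in enumerate(labels):
    root, third = lab % 12, (4 if lab < 12 else 3)
    chroma[i, [root, (root + third) % 12, (root + 7) % 12]] = 1.0
chroma += 0.05 * rng.random((400, 12))
means, covs = root_normalized_models(chroma, labels)
A = transition_matrix(labels)
print(sorted(np.argsort(means[7])[-3:].tolist()))  # [2, 7, 11]  (G:maj -> D, G, B)
```

Pooling across roots means every major chord shares the statistics of all major training examples, so rare chords such as F# major still get a usable model.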
HMMs
Hidden Markov Models are good for inferring hidden states
  underlying Markov "generative model"
  each state has an emission distribution
  observations tell us something about the state...
  infer a smoothed state sequence
Transition probabilities p(q_{n+1} | q_n) for the example model:
  q_n \ q_{n+1}   S    A    B    C    E
  S               0    1    0    0    0
  A               0   .8   .1   .1    0
  B               0   .1   .8   .1    0
  C               0   .1   .1   .7   .1
  E               0    0    0    0    1
[Figure: state diagram with start state S, states A, B, C and end state E; emission distributions p(x|q) for q = A, B, C over observation x; a sampled state sequence and its observation sequence x_n vs. time step n]
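The generative view can be made concrete by sampling from the example model. The transition probabilities below are read from this slide's example (S → A with probability 1; A: .8/.1/.1 to A/B/C; B: .1/.8/.1; C: .1/.1/.7 plus .1 to the end state E). The Gaussian emission means (1, 2, 3) and the shared standard deviation (0.5) are hypothetical stand-ins for the plotted emission distributions.

```python
import numpy as np

STATES = ["A", "B", "C"]
START = np.array([1.0, 0.0, 0.0])        # S -> A with probability 1
TRANS = np.array([[0.8, 0.1, 0.1, 0.0],  # rows: from A, B, C
                  [0.1, 0.8, 0.1, 0.0],  # columns: to A, B, C, E (end)
                  [0.1, 0.1, 0.7, 0.1]])
MEANS = np.array([1.0, 2.0, 3.0])        # hypothetical emission means
SD = 0.5                                 # hypothetical emission std. dev.


def sample_hmm(max_steps=100, seed=0):
    """Draw a state sequence and Gaussian observations until E (or max_steps)."""
    rng = np.random.default_rng(seed)
    q = rng.choice(3, p=START)
    states, obs = [], []
    for _ in range(max_steps):
        states.append(STATES[q])
        obs.append(rng.normal(MEANS[q], SD))
        nxt = rng.choice(4, p=TRANS[q])
        if nxt == 3:                      # reached the end state E
            break
        q = nxt
    return "".join(states), np.array(obs)


states, obs = sample_hmm()
print(states)
```

The large self-transition probabilities (.8, .8, .7) are what produce long runs of the same state; inference runs this in reverse, using the observations plus that persistence to recover a smoothed state sequence.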
HMM Inference
An HMM defines emission distributions p(x|q) and transition probabilities p(q_n | q_{n-1})
Joint likelihood of the observations and a state sequence:
$$p(\{x_n\}, \{q_n\} \mid M) = \prod_n p(x_n \mid q_n)\, p(q_n \mid q_{n-1})$$
Example: model M1 with states S, A, B, E and observations x_1, x_2, x_3
  Transitions: S → A .9, S → B .1; A → A .7, A → B .2, A → E .1; B → B .8, B → E .2
  Observation likelihoods p(x|q):
             x_1   x_2   x_3
    q = A    2.5   0.2   0.1
    q = B    0.1   2.2   2.3
All possible 3-emission paths Q_k from S to E:
  Path Q_k   p(Q|M) = Π_n p(q_n|q_{n-1})   p(X|Q,M) = Π_n p(x_n|q_n)   p(X,Q|M)
  A A A      .9 × .7 × .7 × .1 = .0441      2.5 × .2 × .1 = .05          .0022
  A A B      .9 × .7 × .2 × .2 = .0252      2.5 × .2 × 2.3 = 1.15        .0290
  A B B      .9 × .2 × .8 × .2 = .0288      2.5 × 2.2 × 2.3 = 12.65      .3643
  B B B      .1 × .8 × .8 × .2 = .0128      .1 × 2.2 × 2.3 = .506        .0065
             Σ = .1109                                                   Σ = p(X|M) = .4020
By dynamic programming, we can also identify the best state sequence given just the observations (here A B B, with p(X,Q|M) = .3643)
[Figure: model diagram and trellis of states A, B vs. time n = 1..4]
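The numbers in the worked example can be checked with the forward algorithm (summing over all paths) and the Viterbi algorithm (keeping only the best path). The model and observation likelihoods are copied from the slide; the code is a generic sketch of both recursions.

```python
import numpy as np

STATES = ["A", "B"]
start = np.array([0.9, 0.1])       # p(q_1 | S)
trans = np.array([[0.7, 0.2],      # p(q_n | q_{n-1}) between emitting states
                  [0.0, 0.8]])
end = np.array([0.1, 0.2])         # p(E | q_N)
lik = np.array([[2.5, 0.2, 0.1],   # p(x_n | A) for x_1, x_2, x_3
                [0.1, 2.2, 2.3]])  # p(x_n | B)


def forward(start, trans, end, lik):
    """p(X | M): sum over all state paths, by dynamic programming."""
    alpha = start * lik[:, 0]
    for n in range(1, lik.shape[1]):
        alpha = (alpha @ trans) * lik[:, n]
    return float(alpha @ end)


def viterbi(start, trans, end, lik):
    """Best single path and its joint probability p(X, Q* | M)."""
    delta = start * lik[:, 0]
    back = []
    for n in range(1, lik.shape[1]):
        scores = delta[:, None] * trans       # scores[i, j]: best path ending i -> j
        back.append(np.argmax(scores, axis=0))
        delta = scores.max(axis=0) * lik[:, n]
    final = delta * end
    q = int(np.argmax(final))
    path = [q]
    for bp in reversed(back):                 # trace the best predecessors backwards
        q = int(bp[q])
        path.append(q)
    return [STATES[i] for i in reversed(path)], float(final.max())


print(round(forward(start, trans, end, lik), 4))  # 0.402
print(viterbi(start, trans, end, lik))            # path ['A', 'B', 'B'], probability ≈ 0.3643
```

Both recursions cost O(N·S²) for N observations and S states, instead of enumerating all S^N paths as the table does.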
Key Normalization
Chord transitions depend on the key of the piece: dominant, relative minor, etc...
Chord transition probabilities should be key-relative:
  estimate the main key of the piece
  rotate all chroma features
  learn models on the aligned chroma
[Figure: chroma of Taxman, Eleanor Rigby, I'm Only Sleeping, Love You To, Yellow Submarine, She Said She Said, Good Day Sunshine and And Your Bird Can Sing, each aligned to a global model]
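The key-normalization steps can be sketched as: average the chroma over the whole piece, find the rotation that best matches a key template, then rotate every chroma frame by that amount. Assumption: a binary major-scale template serves as the key model (so a major key and its relative minor are not distinguished); this is not necessarily the method behind the slide's figure, which shows songs aligned to a global model.

```python
import numpy as np

MAJOR_SCALE = np.zeros(12)
MAJOR_SCALE[[0, 2, 4, 5, 7, 9, 11]] = 1.0  # C major pitch classes (assumed key template)


def estimate_key(chroma):
    """chroma: (12, n_frames). Return the tonic shift 0-11 that best fits the template."""
    profile = chroma.mean(axis=1)
    scores = [np.dot(np.roll(MAJOR_SCALE, k), profile) for k in range(12)]
    return int(np.argmax(scores))


def key_normalize(chroma):
    """Rotate all frames so the estimated tonic lands in bin 0 (C)."""
    k = estimate_key(chroma)
    return np.roll(chroma, -k, axis=0), k


# A piece that uses only the G major scale (G A B C D E F#)
rng = np.random.default_rng(0)
g_major = np.roll(MAJOR_SCALE, 7)
chroma = g_major[:, None] * rng.random((12, 50))
aligned, k = key_normalize(chroma)
print(k)  # 7
```

After rotation, a G to D change in this piece is counted as the same I → V transition as C to G in a C-major piece, which is what lets transition statistics pool across songs.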
Chord Recognition
Often works:
[Figure: Let It Be/06-Let It Be: audio spectrogram (freq / Hz), ground-truth chords, beat-synchronous chroma and recognized chords (labels such as A:min, A:min/b7, F:maj7, F:maj6) vs. time / sec]
But only about 60% of the time
Summary
Music Audio Features: capture information useful for classification
Chroma Features: 12 bins to robustly summarize notes
Chord Recognition: sometimes easy, sometimes subtle
References
B. Logan, "Mel frequency cepstral coefficients for music modeling," in Proc. Int. Symp. Music Inf. Retrieval ISMIR, Plymouth, September 2000.
D. Ellis, "Classifying Music Audio with Timbral and Chroma Features," in Proc. Int. Symp. Music Inf. Retrieval ISMIR-07, pp. 339-340, Vienna, October 2007.
T. Fujishima, "Realtime chord recognition of musical sound: A system using Common Lisp Music," in Proc. Int. Comp. Music Conf., pp. 464-467, Beijing, 1999.
D. Ellis and G. Poliner, "Identifying Cover Songs With Chroma Features and Dynamic Programming Beat Tracking," in Proc. ICASSP-07, pp. IV-1429-1432, Hawai'i, April 2007.
M. A. Bartsch and G. H. Wakefield, "To catch a chorus: Using chroma-based representations for audio thumbnailing," in Proc. IEEE WASPAA, Mohonk, October 2001.
A. Sheh and D. Ellis, "Chord Segmentation and Recognition using EM-Trained Hidden Markov Models," in Proc. Int. Symp. Music Inf. Retrieval ISMIR-03, pp. 185-191, Baltimore, October 2003.