Data Driven Music Understanding

Dan Ellis
Laboratory for Recognition and Organization of Speech and Audio
Dept. Electrical Engineering, Columbia University, NY USA
http://labrosa.ee.columbia.edu/
1. Motivation: What is Music?
2. Eigenrhythms
3. Melodic-Harmonic Fragments
4. Example Applications
Data-Driven Music Understanding - Dan Ellis 2013-05-15 1/31

LabROSA: Machine Listening
Extracting useful information from sound... like (we) animals do
Task \ Domain | Environmental Sound   | Speech  | Music
Describe      | Automatic Narration   | Emotion | Music Recommendation
Classify      | Environment Awareness | ASR     | Music Transcription
Detect        | Sound Intelligence    | VAD     | Speech/Music

1. Motivation: What is music?
What does music evoke in a listener's mind?
Which are the things that we call music?

Oodles of Music
Bertin-Mahieux et al. 09
What can you do with a million tracks?

Re-use in Music
Serrà et al. 2012 (www.nature.com/scientificreports)
What are the most popular chord progressions in pop music?

Potential Applications
Compression
Classification
Manipulation

2. Eigenrhythms: Drum Track Structure
To first order, all pop music has the same beat.
Ellis & Arroyo 04
Can we learn this from examples?

Basis Sets
Combine a few basic patterns to make a larger dataset:
X = W H, where X is the data, W the weights, and H the patterns.

Drum Pattern Data
Tempo normalization + downbeat alignment

NMF Eigenrhythms
[Figure: Posirhythms 1 to 6, each with hi-hat (HH), snare (SN), and bass drum (BD) rows; time axis in samples and in beats (1 2 3 4, 1 2 3 4).]
Nonnegative: only add beat-weight
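The slides do not say which NMF algorithm produced the posirhythms. As a minimal sketch of the idea, the block below factors a nonnegative data matrix X into weights W and patterns H using the standard multiplicative-update rule for squared error. The update rule, the function names, and the toy drum grid are all assumptions made for illustration, not the method used in the talk.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def transpose(a):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]

def nmf(x, rank, n_iter=500, seed=0, eps=1e-9):
    """Approximate nonnegative x (n x m) as w (n x rank) times h (rank x m).

    Uses multiplicative updates, so w and h stay nonnegative throughout.
    """
    rng = random.Random(seed)
    n, m = len(x), len(x[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(n_iter):
        # h <- h * (w^T x) / (w^T w h), element-wise
        wt = transpose(w)
        num, den = matmul(wt, x), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        # w <- w * (x h^T) / (w h h^T), element-wise
        ht = transpose(h)
        num, den = matmul(x, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n)]
    return w, h

def squared_error(x, w, h):
    """Sum of squared differences between x and its reconstruction w h."""
    approx = matmul(w, h)
    return sum((a - b) ** 2 for ra, rb in zip(x, approx) for a, b in zip(ra, rb))

# Hypothetical toy data: two 8-step patterns mixed with nonnegative weights.
patterns = [[1, 0, 0, 0, 1, 0, 0, 0],   # kick-like pattern
            [0, 0, 1, 0, 0, 0, 1, 0]]   # backbeat-like pattern
weights = [[1, 0], [0, 1], [1, 1], [2, 0.5]]
x = matmul(weights, patterns)
w, h = nmf(x, rank=2)
print("reconstruction error:", squared_error(x, w, h))
```

Because every update multiplies by a nonnegative ratio, patterns can only be added together, never cancelled, which is the "only add beat-weight" property noted on the slide.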

Eigenrhythm BeatBox
Resynthesize rhythms from eigen-space

3. Melodic-Harmonic Fragments
How similar are two pieces?
Can we find all the pop-music clichés?

MFCC Features
Used in speech recognition
[Figure: Let It Be (LIB-1), log-freq specgram; MFCCs by coefficient; noise-excited MFCC resynthesis (LIB-2). Axes: freq / Hz, time / sec.]

Chroma Features
Idea: Project onto 12 semitones regardless of octave
Fujishima 1999
C = M X, where C is the chroma, M the mapping, and X the signal.
[Figure: spectrogram (freq / kHz vs. time / sec), mapping matrix (chroma vs. FFT bin), chromagram (chroma vs. time / frame).]
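To make the mapping concrete, here is a minimal sketch that builds a 12-row matrix M assigning each FFT bin to its nearest semitone, folded to one octave, and applies it to a magnitude spectrum. The hard 0/1 assignment, the reference frequency `f_ref` (middle C, so row 0 is C), the low-frequency cutoff `f_min`, and the function names are assumptions for illustration; practical chroma front ends usually use smoother weighting.

```python
import math

def chroma_matrix(n_fft, sample_rate, f_ref=261.63, f_min=50.0):
    """Return a 12 x (n_fft // 2 + 1) matrix mapping FFT bins to pitch classes.

    Row 0 is the pitch class of f_ref. Bins below f_min are left unassigned.
    """
    n_bins = n_fft // 2 + 1
    m = [[0.0] * n_bins for _ in range(12)]
    for k in range(n_bins):
        freq = k * sample_rate / n_fft
        if freq < f_min:
            continue  # skip DC and very low bins
        semitones = 12.0 * math.log2(freq / f_ref)
        m[int(round(semitones)) % 12][k] = 1.0  # fold all octaves together
    return m

def chroma(m, spectrum):
    """Project one magnitude spectrum onto the 12 pitch classes (C = M X)."""
    return [sum(wt * s for wt, s in zip(row, spectrum)) for row in m]

# Energy near 440 Hz and 880 Hz lands in the same pitch class (A).
n_fft, sr = 4096, 8000
spectrum = [0.0] * (n_fft // 2 + 1)
spectrum[round(440 * n_fft / sr)] = 1.0
spectrum[round(880 * n_fft / sr)] = 2.0
print(chroma(chroma_matrix(n_fft, sr), spectrum))
```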

Chroma Features
To capture musical content
[Figure: Let It Be, log-freq specgram (LIB-1); chroma features; Shepard tone resynthesis of chroma (LIB-3); MFCC-filtered Shepard tones (LIB-4). Axes: freq / Hz, chroma bin, time / sec.]

Beat-Synchronous Chroma
Compact representation of harmonies
[Figure: Let It Be, log-freq specgram (LIB-1); onset envelope + beat times; beat-synchronous chroma; beat-synchronous chroma + Shepard resynthesis (LIB-6). Axes: freq / Hz, chroma bin, time / sec.]
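One simple way to obtain a beat-synchronous representation is to average the chroma frames that fall between consecutive beat times. The sketch below assumes that frame times and beat times are already available and that plain averaging is used; the function name and the toy values are hypothetical.

```python
def beat_sync(frames, frame_times, beat_times):
    """Average feature frames within each inter-beat interval.

    frames: list of feature vectors, one per analysis frame.
    frame_times: time in seconds of each frame.
    beat_times: sorted beat times in seconds.
    Returns one vector per interval [beat_i, beat_{i+1}).
    """
    dim = len(frames[0])
    out = []
    for start, end in zip(beat_times[:-1], beat_times[1:]):
        segment = [f for f, t in zip(frames, frame_times) if start <= t < end]
        if segment:
            out.append([sum(col) / len(segment) for col in zip(*segment)])
        else:
            out.append([0.0] * dim)  # no frames fell inside this beat
    return out

# Toy example with 2-dimensional features and two beats.
frames = [[1, 0], [3, 0], [0, 2], [0, 4], [0, 6]]
frame_times = [0.0, 0.1, 0.2, 0.3, 0.4]
beat_times = [0.0, 0.2, 0.5]
print(beat_sync(frames, frame_times, beat_times))
```

The result has one column per beat rather than per frame, which is what makes the representation compact and comparable across tempos.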

Chord Recognition
Beat-synchronous chroma look like chords: C-E-G, B-D-G, A-C-E, A-C-D-F...
Can we transcribe them?
Two approaches:
manual templates (prior knowledge)
learned models (from training data)
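The manual-template approach can be sketched as follows: build a binary 12-bin template for each major and minor triad and label a chroma vector with the best-matching template. The cosine similarity measure, the 24-chord vocabulary, the label spelling, and the function names are assumptions for illustration.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def chord_templates():
    """Binary triad templates for the 12 major and 12 minor chords."""
    templates = {}
    for root in range(12):
        for quality, third in (("maj", 4), ("min", 3)):
            t = [0.0] * 12
            for interval in (0, third, 7):  # root, third, fifth
                t[(root + interval) % 12] = 1.0
            templates[f"{NOTE_NAMES[root]}:{quality}"] = t
    return templates

def cosine(a, b):
    """Cosine similarity between two vectors (0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def classify(chroma, templates):
    """Return the chord label whose template is most similar to chroma."""
    return max(templates, key=lambda name: cosine(chroma, templates[name]))

templates = chord_templates()
c_e_g = [1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0]   # energy on C, E, G
a_c_e = [1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0]   # energy on A, C, E
print(classify(c_e_g, templates), classify(a_c_e, templates))
```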

Chord Recognition System
Analogous to speech recognition:
Gaussian models of features for each chord
Hidden Markov Models for chord transitions
Sheh & Ellis 2003, Ellis & Weller 2010
[Block diagram. Features: audio, beat track (100-1600 Hz BPF, chroma), 25-400 Hz BPF chroma, beat-synchronous chroma features. Train: labels, resample, root normalize, Gaussian, unnormalize, 24 Gaussian models; count transitions, 24x24 transition matrix. Test: HMM Viterbi, chord labels.]
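The "count transitions" training step can be sketched as below: tally how often each chord follows each other chord in beat-level label sequences, then normalize each row into probabilities. The integer label coding (0 to 23), the add-one smoothing, and the function name are assumptions; the slides do not state how the counts were smoothed.

```python
def transition_matrix(label_sequences, n_states=24, smoothing=1.0):
    """Estimate a row-stochastic transition matrix from label sequences.

    label_sequences: list of sequences of integer chord labels in [0, n_states).
    smoothing: pseudo-count added to every cell (an assumption, not from the talk).
    """
    counts = [[smoothing] * n_states for _ in range(n_states)]
    for seq in label_sequences:
        for prev, nxt in zip(seq[:-1], seq[1:]):
            counts[prev][nxt] += 1.0
    # Normalize each row so it sums to one.
    return [[c / sum(row) for c in row] for row in counts]

# Toy example: one song that alternates between chord 0 and chord 7.
trans = transition_matrix([[0, 0, 0, 7, 7, 0]])
print(trans[0][0], trans[0][7], trans[7][0])
```

A matrix estimated this way would supply the transition probabilities that the Viterbi decoder combines with the per-chord Gaussian likelihoods.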

Chord Recognition
Often works:
[Figure: Let It Be/06-Let It Be. Panels: audio (freq / Hz), ground truth chord, beat-synchronous chroma, recognized chord.]
But only 60-80% of the time

What did the models learn?
Chord model centers (means) indicate chord templates:
[Figure: PCP_ROT family model means (train18); curves for DIM, DOM7, MAJ, MIN, MIN7 (for C-root chords).]

Finding Cover Songs
Ellis & Poliner 07, Ravuri & Ellis 10
Little similarity in surface audio...
[Figure: spectrograms of Let It Be / Beatles / verse 1 and Let It Be / Nick Cave / verse 1. Axes: freq / kHz, time / sec.]
...but appears in beat-chroma
[Figure: beat-sync chroma features for both versions. Axes: chroma, beats.]

Large-Scale Cover Recognition
2D Fourier Transform Magnitude (2DFTM): fixed-size feature to capture essence of chromagram
Bertin-Mahieux & Ellis 12
First results on finding covers in 1M songs:
Method        | Average rank | Mean AP
random        | 500,000      | 0.000
jumpcodes 2   | 308,369      | 0.002
2DFTM (50 PC) | 137,117      | 0.020
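One reason a 2D Fourier transform magnitude is attractive for cover detection is that it is unchanged by circular shifts along either axis, so a transposition (rotation of the chroma axis) or a shift in beat position leaves the feature the same. The sketch below demonstrates only this property on a random patch, using a direct 2D DFT. The patch size and function names are assumptions, and any further processing behind the table (such as the reduction to 50 principal components) is not shown.

```python
import cmath
import random

def fft2_magnitude(patch):
    """Magnitude of the 2D DFT of a patch (list of rows), computed directly."""
    rows, cols = len(patch), len(patch[0])
    out = [[0.0] * cols for _ in range(rows)]
    for u in range(rows):
        for v in range(cols):
            total = 0j
            for r in range(rows):
                for c in range(cols):
                    angle = -2.0 * cmath.pi * (u * r / rows + v * c / cols)
                    total += patch[r][c] * cmath.exp(1j * angle)
            out[u][v] = abs(total)
    return out

def rotate_rows(patch, k):
    """Circularly shift the chroma axis by k bins (a transposition)."""
    return patch[k:] + patch[:k]

def rotate_cols(patch, k):
    """Circularly shift the time axis by k beats."""
    return [row[k:] + row[:k] for row in patch]

rng = random.Random(0)
patch = [[rng.random() for _ in range(8)] for _ in range(12)]  # 12 chroma x 8 beats
shifted = rotate_cols(rotate_rows(patch, 3), 2)
a, b = fft2_magnitude(patch), fft2_magnitude(shifted)
print(max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb)))
```

The printed difference should be at floating-point rounding level, since a circular shift changes only the phase of each DFT coefficient.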

Finding Common Fragments
Cluster beat-synchronous chroma patches
[Figure: example clusters (chroma bins vs. time / beats): #32273, 13 instances; #51917, 13 instances; #65512, 10 instances; #9667, 9 instances; #10929, 7 instances; #61202, 6 instances; #55881, 5 instances; #68445, 5 instances.]

Clustered Fragments
[Figure: chroma patches (chroma bins vs. time / beats) from one cluster:
depeche mode 13-Ice Machine 199.5-204.6s
roxette 03-Fireworks 107.9-114.8s
roxette 04-Waiting For The Rain 80.1-93.5s
tori amos 11-Playboy Mommy 157.8-171.3s]
... for a dictionary of common themes?

4. Example Applications: Music Discovery
Berenzweig & Ellis 03
Connecting listeners to musicians

Playlist Generation
Mandel, Poliner, Ellis 06
Incremental learning of listeners' preferences

MajorMiner: Music Tagging
Describe music using words
Mandel & Ellis 07, 08

Classification Results
Classifiers trained from top 50 tags
[Figure: 01 Soul Eyes. Spectrogram (freq / Hz) above per-tag classifier outputs over time / s. Tags: _90s, club, trance, end, drum_bass, singing, horns, punk, samples, silence, quiet, noise, solo, strings, indie, house, alternative, r_b, funk, soft, ambient, british, distortion, drum_machine, country, keyboard, saxophone, fast, instrumental, electronica, 80s, voice, beat, slow, rap, hip_hop, jazz, piano, techno, dance, female, bass, vocal, pop, electronic, rock, synth, male, guitar, drum.]

Music Transcription
Poliner & Ellis 05, 06, 07
Feature representation (feature vector). Training data and features: MIDI, multi-track recordings, playback piano, and resampled audio (less than 28 mins of training audio). Normalized magnitude STFT.
Classification (classification posteriors). N binary SVMs (one for each note). Independent frame-level classification on a 10 ms grid. Distance to the class boundary is used as the posterior.
Temporal smoothing (HMM smoothing). Two-state (on/off) independent HMM for each note. Parameters learned from training data. Find the Viterbi sequence for each note.
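The temporal smoothing stage can be illustrated with a two-state Viterbi decoder run independently on one note's frame-level posteriors. In this sketch the self-transition probability `p_stay` and the prior `p_on_prior` are hypothetical fixed values (the slide says the parameters are learned from training data), and the posteriors are used directly as observation scores, which is a simplification.

```python
import math

def smooth_note(posteriors, p_stay=0.9, p_on_prior=0.5):
    """Viterbi-decode a two-state (0 = off, 1 = on) HMM for a single note.

    posteriors: per-frame probability that the note is on.
    Returns the most likely on/off sequence as a list of 0s and 1s.
    """
    eps = 1e-12
    log_trans = [[math.log(p_stay), math.log(1.0 - p_stay)],
                 [math.log(1.0 - p_stay), math.log(p_stay)]]

    def log_obs(p):
        # Log score of the frame under the "off" and "on" states.
        return [math.log(max(1.0 - p, eps)), math.log(max(p, eps))]

    first = log_obs(posteriors[0])
    score = [math.log(1.0 - p_on_prior) + first[0],
             math.log(p_on_prior) + first[1]]
    back = []
    for p in posteriors[1:]:
        obs = log_obs(p)
        new_score, pointers = [], []
        for s in (0, 1):
            best_prev = max((0, 1), key=lambda q: score[q] + log_trans[q][s])
            pointers.append(best_prev)
            new_score.append(score[best_prev] + log_trans[best_prev][s] + obs[s])
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]

# A one-frame dip inside a note is filled in; an isolated blip is removed.
print(smooth_note([0.1, 0.2, 0.9, 0.8, 0.3, 0.9, 0.85, 0.1, 0.1]))
print(smooth_note([0.1, 0.1, 0.6, 0.1, 0.1]))
```

Compared with thresholding each frame at 0.5, the transition penalty discourages single-frame on/off flips, which is the purpose of the smoothing step.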

MEAPsoft
Music Engineering Art Projects: collaboration between EE and the Computer Music Center
with Douglas Repetto, Ron Weiss, and the rest of the MEAP team

Conclusions
[Diagram: Music audio; Low-level features; Classification and Similarity (browsing, discovery, production); Melody and notes, Key and chords, Tempo and beat; Music Structure Discovery (modeling, generation, curiosity).]
Lots of data + noisy transcription + weak clustering: musical insights?