Music Information Retrieval for Jazz
Dan Ellis
Laboratory for Recognition and Organization of Speech and Audio
Dept. Electrical Eng., Columbia Univ., NY USA
{dpwe,thierry}@ee.columbia.edu
http://labrosa.ee.columbia.edu/

1. Music Information Retrieval
2. Automatic Tagging
3. Musical Content
4. Future Work

MIR for Jazz - Dan Ellis - 2012-11-15
Machine Listening
Extracting useful information from sound... like (we) animals do
Tasks, from specific to general:
  Detect: voice activity detection (VAD), speech/music discrimination
  Classify: environment awareness, music recommendation
  Transcribe: ASR, music transcription ("sound intelligence")
  Describe: automatic narration, emotion
Domains: speech, music, environmental sound
1. The Problem
We have a lot of music - can computers help?
Applications:
  archive organization
  musicological insight?
  music recommendation
Music Information Retrieval (MIR)
A small field that has grown since ~2000:
  musicologists, engineers, librarians
  significant commercial interest
MIR as the musical analog of text IR: find stuff in large archives
Popular tasks:
  genre classification
  chord, melody, and full transcription
  music recommendation
Annual evaluations; standard test corpora - mostly pop
2. Automatic Tagging
Statistical pattern recognition: finding matches to training examples
Pipeline: signal -> sensor -> pre-processing/segmentation -> feature extraction -> feature vector -> classification -> post-processing -> class
Needs: feature design, labeled training examples
Applications: genre, instrumentation, artist, studio...
Features: MFCC
Mel-Frequency Cepstral Coefficients - the standard features from speech recognition
Processing chain: FFT -> |X[k]| -> mel-scale frequency warp -> log -> IFFT -> truncate
(sound -> spectra -> auditory spectra -> cepstra -> MFCCs)
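The chain above (FFT -> mel warp -> log -> inverse transform -> truncate) can be sketched directly in numpy. This is a minimal illustration, not the exact implementation behind the slides: the filterbank size (26), cepstral count (13), and 25 ms frame are conventional assumptions.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters spaced evenly on the mel scale."""
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        for k in range(l, c):
            fb[i - 1, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):
            fb[i - 1, k] = (r - k) / max(r - c, 1)
    return fb

def mfcc_frame(frame, sr, n_filters=26, n_ceps=13):
    """MFCCs for one windowed frame: FFT -> mel -> log -> DCT -> truncate."""
    spec = np.abs(np.fft.rfft(frame))                  # |X[k]|
    fb = mel_filterbank(n_filters, len(frame), sr)
    logmel = np.log(fb @ spec + 1e-10)                 # log mel spectrum
    # DCT-II written out directly (the "IFFT" step on the slide)
    n = np.arange(n_filters)
    ceps = np.array([np.sum(logmel * np.cos(np.pi * q * (n + 0.5) / n_filters))
                     for q in range(n_ceps)])
    return ceps                                        # truncated cepstrum

sr = 8000
t = np.arange(int(0.025 * sr)) / sr                    # one 25 ms frame
frame = np.hanning(len(t)) * np.sin(2 * np.pi * 440 * t)
c = mfcc_frame(frame, sr)
```

In practice a library routine (e.g. a dedicated audio package) would replace this, but the sketch makes the slide's five-box pipeline concrete.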
Representing Audio
MFCCs are short-time features (~25 ms frames)
A sound is a trajectory in MFCC space
Describe a whole track by its statistics, e.g. the mean and covariance of its MFCC frames
(figure: spectrogram and MFCC features of "VTS_04_1", and its MFCC covariance matrix)
MFCCs for Music
Can resynthesize MFCCs by shaping noise - gives an idea of the information retained
Example: "Freddie Freeloader" - original, MFCCs, resynthesis
Ground Truth
MajorMiner: free-text tags for 10 s clips (Mandel & Ellis '08)
~400 users, ~7,500 unique tags, ~70,000 taggings
Example tag set: drum, bass, piano, jazz, slow, instrumental, saxophone, soft, quiet, club, ballad, smooth, soulful, easy_listening, swing, improvisation, 60s, cool, light
Classification
Pipeline: sound -> chop into 10 s blocks -> MFCCs (20 dims) -> mean µ + covariance Σ -> standardize across the train set -> one-vs-all SVM (tuning C, γ) -> average precision against ground truth
MFCC features + human ground truth + standard machine-learning tools
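The tagging pipeline can be sketched end-to-end: summarize each clip by the mean and covariance of its MFCC frames, standardize over the training set, and train one binary classifier per tag. As an assumption for self-containment, a regularized least-squares linear classifier stands in for the RBF SVM of the talk, and the clips and tags are synthetic toy data.

```python
import numpy as np

rng = np.random.default_rng(0)

def clip_features(mfcc_frames):
    """Mean + upper-triangle covariance of an (n_frames, n_dims) MFCC array."""
    mu = mfcc_frames.mean(axis=0)
    cov = np.cov(mfcc_frames, rowvar=False)
    iu = np.triu_indices(cov.shape[0])
    return np.concatenate([mu, cov[iu]])

# Toy data: 40 clips, 100 frames of 5-dim "MFCCs", two hypothetical tags.
clips = [rng.normal(loc=(i % 2), size=(100, 5)) for i in range(40)]
X = np.array([clip_features(c) for c in clips])
Y = np.array([[i % 2, 1 - i % 2] for i in range(40)])   # one column per tag

# Standardize each feature dimension across the training set.
X = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-10)

# One-vs-all: ridge-regression weights per tag (stand-in for the SVM).
Xb = np.hstack([X, np.ones((len(X), 1))])
W = np.linalg.solve(Xb.T @ Xb + 1e-3 * np.eye(Xb.shape[1]),
                    Xb.T @ (2 * Y - 1))
pred = (Xb @ W > 0).astype(int)
accuracy = (pred == Y).mean()
```

With real data one would rank clips by classifier score per tag and report average precision, as on the slide.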
Classification Results
Classifiers trained for the top 50 tags
Example: "Soul Eyes" - spectrogram plus per-tag classifier outputs over time for tags including jazz, piano, saxophone, slow, drum, bass, guitar, vocal, pop, rock, electronic, ...
3. Musical Content
MFCCs (and speech recognizers) don't respect pitch
but pitch is important - and visible in the spectrogram
Pitch-related tasks:
  note transcription
  chord transcription
  matching by musical content ("cover songs")
Note Transcription (Poliner & Ellis '05, '06, '07)
Training data and features: MIDI, multi-track recordings, playback piano, and resampled audio (less than 28 min of training audio); normalized magnitude STFT.
Classification: N binary SVMs (one for each note); independent frame-level classification on a 10 ms grid; distance to the class boundary used as a posterior.
Temporal smoothing: a two-state (on/off) independent HMM for each note, with parameters learned from the training data; find the Viterbi sequence for each note.
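The temporal-smoothing step above can be sketched as a two-state Viterbi decode of one note's frame posteriors. The transition probability here (0.9 self-loop) is an illustrative assumption, not a learned value.

```python
import numpy as np

def viterbi_on_off(p_on, p_stay=0.9):
    """Smooth a sequence of P(note on) posteriors with a 2-state (off/on) HMM."""
    n = len(p_on)
    obs = np.vstack([1 - np.asarray(p_on), np.asarray(p_on)])   # (2, n)
    trans = np.log(np.array([[p_stay, 1 - p_stay],
                             [1 - p_stay, p_stay]]))
    logobs = np.log(obs + 1e-12)
    delta = np.zeros((2, n))            # best log-prob per state per frame
    psi = np.zeros((2, n), dtype=int)   # traceback pointers
    delta[:, 0] = np.log(0.5) + logobs[:, 0]
    for t in range(1, n):
        cand = delta[:, t - 1][:, None] + trans    # (from-state, to-state)
        psi[:, t] = np.argmax(cand, axis=0)
        delta[:, t] = np.max(cand, axis=0) + logobs[:, t]
    path = np.zeros(n, dtype=int)
    path[-1] = np.argmax(delta[:, -1])
    for t in range(n - 2, -1, -1):
        path[t] = psi[path[t + 1], t + 1]
    return path

# A noisy posterior track: an isolated blip, then a sustained "on" run.
p = np.array([0.1, 0.1, 0.9, 0.1, 0.1, 0.8, 0.9, 0.85, 0.9, 0.1, 0.1])
smoothed = viterbi_on_off(p)
```

The sticky transitions suppress the single-frame blip but keep the sustained run, which is exactly the effect the HMM smoothing stage is after.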
Polyphonic Transcription
Real music excerpts + ground truth (MIREX 2007)
Frame-level transcription: estimate the fundamental frequency of all notes present on a 10 ms grid
(metrics: precision, recall, accuracy, Etot, Esubs, Emiss, Efa)
Note-level transcription: group frame-level predictions into notes by estimating onsets/offsets
(metrics: precision, recall, average F-measure, average overlap)
Chroma Features (Fujishima 1999; Warren et al. 2003)
Idea: project the spectrum onto 12 semitones regardless of octave
  maintains the main musical distinctions
  invariant to octave equivalence
  no need to worry about harmonics?

  C(b) = Σ_k B(12 log2(k/k0) − b) · W(k) · |X[k]|

where W(k) is a weighting function and B(·) selects the FFT bins whose pitch class matches b (i.e. ≡ b mod 12).
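The projection above can be sketched in a few lines: map each FFT bin to a pitch class via round(12·log2(f/f_ref)) mod 12 and sum the magnitudes. The reference frequency (C4) and the flat weighting W(k)=1 are illustrative simplifications.

```python
import numpy as np

def chroma(spec, sr, f_ref=261.63):        # f_ref ~ C4
    """Fold an rfft magnitude spectrum onto 12 semitone (chroma) bins."""
    n_fft = 2 * (len(spec) - 1)
    freqs = np.arange(len(spec)) * sr / n_fft
    c = np.zeros(12)
    for k in range(1, len(spec)):          # skip DC
        b = int(round(12 * np.log2(freqs[k] / f_ref))) % 12
        c[b] += spec[k]
    return c

sr = 8000
t = np.arange(2048) / sr
x = np.sin(2 * np.pi * 440 * t)            # A4: pitch class A = 9 from C
spec = np.abs(np.fft.rfft(x * np.hanning(len(x))))
c = chroma(spec, sr)
strongest = int(np.argmax(c))
```

A pure 440 Hz tone lands in chroma bin 9 (A), and its spectral-leakage neighbors fold into the same bin, which is the point of the mod-12 projection.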
Chroma Resynthesis (Ellis & Poliner 2007)
Chroma describes the notes in an octave... but not the octave
Can resynthesize by presenting all octaves with a smooth spectral envelope
Shepard tones - the octave is ambiguous (the endless-sequence illusion)

  y_b(t) = Σ_{o=1}^{M} W(o + b/12) cos(2^{o + b/12} ω0 t)
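A Shepard tone per the formula above is easy to synthesize: one sinusoid per octave at the chroma's pitch class, weighted by a smooth envelope over log-frequency. The raised-cosine envelope, six octaves, and C1 base frequency are illustrative choices, not the slide's exact parameters.

```python
import numpy as np

def shepard_tone(b, dur=0.5, sr=8000, n_oct=6, f0=32.7):   # f0 ~ C1
    """Synthesize pitch class b (0..11) as a Shepard tone."""
    t = np.arange(int(dur * sr)) / sr
    y = np.zeros_like(t)
    for o in range(n_oct):
        f = f0 * 2.0 ** (o + b / 12.0)
        if f >= sr / 2:                    # stay below Nyquist
            continue
        # smooth envelope over log-frequency, tapering at both ends
        w = 0.5 - 0.5 * np.cos(2 * np.pi * (o + b / 12.0) / n_oct)
        y += w * np.cos(2 * np.pi * f * t)
    return y / (np.max(np.abs(y)) + 1e-12)

y = shepard_tone(0)        # pitch class C, all octaves at once
```

Because the envelope tapers symmetrically at the low and high ends, no single octave dominates, which is what makes the perceived octave ambiguous.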
Chroma Example
Simple Shepard-tone resynthesis; can also reimpose the broad spectral shape from the MFCCs
Example: "Freddie" - spectrogram, chroma, and resynthesis
Onset Detection (Bello et al. 2005)
The simplest onset strength is the energy envelope:

  e(n0) = Σ_{n=−W/2}^{W/2} w[n] |x(n0 + n)|^2

Better: put emphasis on the high frequencies, e.g. Σ_f f·|X(f,t)|
Examples: "Harnoncourt", "Maracatu"
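The windowed-energy formula can be sketched as a sliding sum, with a half-wave-rectified first difference to emphasize energy increases (onsets). Window length and hop are illustrative assumptions; the toy signal is a note starting at 0.5 s.

```python
import numpy as np

def energy_envelope(x, win=256, hop=128):
    """e(n0) = sum_n w[n] |x(n0 + n)|^2 on a sliding grid."""
    w = np.hanning(win)
    return np.array([np.sum(w * x[i:i + win] ** 2)
                     for i in range(0, len(x) - win, hop)])

def onset_strength(x, win=256, hop=128):
    e = energy_envelope(x, win, hop)
    return np.maximum(0, np.diff(e))      # keep only energy increases

sr = 8000
x = np.zeros(sr)
x[4000:] = np.sin(2 * np.pi * 440 * np.arange(4000) / sr)  # note at 0.5 s
o = onset_strength(x)
peak_frame = int(np.argmax(o))
peak_time = (peak_frame + 1) * 128 / sr   # +1 accounts for the diff offset
```

A spectral-flux detector (per-band differences of |X(f,t)|, optionally weighted by f as the slide suggests) follows the same pattern with an STFT in place of the raw energy.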
Tempo Estimation
Beat tracking (may) need a global tempo period τ
  otherwise the problem lacks optimal substructure
Pick the peak in the onset-envelope autocorrelation after applying a human-preference window
Check for sub-beat structure: a secondary tempo period alongside the primary
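The peak-picking step can be sketched as: autocorrelate the onset envelope, weight lags by a log-Gaussian "preference" window centered near a typical tapping rate, and take the best lag. The 120 BPM center, one-octave width, and 100 fps envelope rate are illustrative assumptions.

```python
import numpy as np

def estimate_tempo(onset_env, fps, center_bpm=120.0, width_oct=1.0):
    n = len(onset_env)
    env = onset_env - onset_env.mean()
    ac = np.correlate(env, env, mode='full')[n - 1:]      # lags 0..n-1
    lags = np.arange(1, n)
    bpm = 60.0 * fps / lags
    # log-Gaussian human-preference window over tempo
    pref = np.exp(-0.5 * (np.log2(bpm / center_bpm) / width_oct) ** 2)
    best_lag = lags[np.argmax(ac[1:] * pref)]
    return 60.0 * fps / best_lag

fps = 100                       # onset-envelope frames per second
env = np.zeros(1000)
env[::50] = 1.0                 # impulse "onsets" every 0.5 s -> 120 BPM
tempo = estimate_tempo(env, fps)
```

The preference window is what resolves the octave ambiguity of autocorrelation: the lag at twice the period also peaks, but is scored down as an implausibly slow tempo.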
Beat Tracking by Dynamic Programming
To optimize

  C({t_i}) = Σ_{i=1}^{N} O(t_i) + α Σ_{i=2}^{N} F(t_i − t_{i−1}, τ_p)

define C*(t) as the best score up to time t, then build up recursively (with traceback P(t)):

  C*(t) = O(t) + max_τ { αF(t − τ, τ_p) + C*(τ) }
  P(t) = argmax_τ { αF(t − τ, τ_p) + C*(τ) }

The final beat sequence {t_i} is the best C* plus back-trace.
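The recursion above can be sketched directly. As assumptions for a self-contained example: F is a log-squared deviation of the inter-beat interval from τ_p, α=100, the predecessor search window spans roughly half to double the period, and the input is a toy impulse-train onset envelope.

```python
import numpy as np

def beat_track(onset_env, tau_p, alpha=100.0):
    n = len(onset_env)
    C = onset_env.astype(float).copy()     # C*(t), initialized to O(t)
    P = -np.ones(n, dtype=int)             # traceback pointers P(t)
    for t in range(n):
        lo = max(0, t - 2 * tau_p)         # candidate predecessors tau
        hi = t - tau_p // 2
        if hi <= lo:
            continue
        taus = np.arange(lo, hi)
        # F: log-squared deviation of the interval from the target period
        F = -(np.log((t - taus) / tau_p) ** 2)
        scores = alpha * F + C[taus]
        j = int(np.argmax(scores))
        if scores[j] > 0:                  # only link when it helps
            C[t] += scores[j]
            P[t] = taus[j]
    beats = [int(np.argmax(C))]            # back-trace from the best score
    while P[beats[-1]] >= 0:
        beats.append(int(P[beats[-1]]))
    return beats[::-1]

env = np.zeros(200)
env[::20] = 1.0                 # onsets every 20 frames
beats = beat_track(env, tau_p=20)
```

Fixing τ_p is what gives the problem optimal substructure: each C*(t) depends only on earlier C* values, so one forward pass plus a back-trace finds the globally best beat sequence.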
Beat Tracking Results
Prefers drums and steady tempo
Example: "Soul Eyes" - onset envelope with tracked beats, and period estimates over time (in 4 ms samples)
Beat-Synchronous Chroma
Record one chroma vector per beat - a compact representation of the harmonies
Example: "Freddie" - spectrogram, beat-synchronous chroma, resynthesis, and resynthesis + MFCC envelope
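Recording one chroma vector per beat reduces to averaging the frame-level chroma between consecutive beat times. A minimal sketch on toy data (frames and beat indices are illustrative):

```python
import numpy as np

def beat_sync(chroma_frames, beat_frames):
    """Average (n_frames, 12) chroma into one vector per inter-beat span."""
    out = []
    for a, b in zip(beat_frames[:-1], beat_frames[1:]):
        out.append(chroma_frames[a:b].mean(axis=0))
    return np.array(out)

# Toy input: 100 frames, a C present throughout, an E added in beat span 2.
frames = np.zeros((100, 12))
frames[:, 0] = 1.0                       # pitch class C everywhere
frames[25:50, 4] = 1.0                   # pitch class E in frames 25..49
beats = [0, 25, 50, 75, 100]
bs = beat_sync(frames, beats)
```

The result is one 12-dimensional vector per beat, which is the representation the chord-recognition and matching stages below operate on.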
Chord Recognition
Beat-synchronous chroma look like chords (C-E-G, B-D-G, A-C-E, A-C-D-F, ...)
... can we transcribe them?
Two approaches:
  manual templates (prior knowledge)
  learned models (from training data)
Chord Recognition System (Sheh & Ellis 2003)
Analogous to speech recognition:
  Gaussian models of the features for each chord
  hidden Markov models for the chord transitions
Test: audio -> beat track -> chroma -> beat-synchronous chroma features -> HMM Viterbi -> chord labels
Train: resample the chord labels to the beat grid; root-normalize the chroma, fit a Gaussian, unnormalize into 24 per-chord Gaussian models (12 maj + 12 min); count label transitions to build a 24x24 transition matrix
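The decode side of such a system can be sketched with 24 chord states over beat-chroma. As assumptions: spherical Gaussians whose means are binary triad templates (rather than trained models), an illustrative variance, a uniform sticky transition matrix, and a four-beat toy input.

```python
import numpy as np

PITCH = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B']

def chord_template(root, minor=False):
    """Unit-norm binary template for a major or minor triad."""
    t = np.zeros(12)
    t[[root, (root + (3 if minor else 4)) % 12, (root + 7) % 12]] = 1.0
    return t / np.linalg.norm(t)

names = [p + ":maj" for p in PITCH] + [p + ":min" for p in PITCH]
templates = np.array([chord_template(r) for r in range(12)] +
                     [chord_template(r, minor=True) for r in range(12)])

def decode(chroma_seq, self_prob=0.9, sigma2=0.05):
    n_states, n = len(templates), len(chroma_seq)
    trans = np.full((n_states, n_states), (1 - self_prob) / (n_states - 1))
    np.fill_diagonal(trans, self_prob)
    ltrans = np.log(trans)
    X = chroma_seq / (np.linalg.norm(chroma_seq, axis=1, keepdims=True) + 1e-9)
    # spherical-Gaussian log-likelihood: -||x - mu||^2 / (2 sigma^2) + const
    ll = -np.array([[np.sum((x - m) ** 2) for m in templates]
                    for x in X]) / (2 * sigma2)
    delta = ll[0].copy()
    psi = np.zeros((n, n_states), dtype=int)
    for t in range(1, n):
        cand = delta[:, None] + ltrans         # (from-state, to-state)
        psi[t] = np.argmax(cand, axis=0)
        delta = np.max(cand, axis=0) + ll[t]
    path = [int(np.argmax(delta))]
    for t in range(n - 1, 0, -1):
        path.append(int(psi[t][path[-1]]))
    return [names[s] for s in path[::-1]]

# Two beats of C major followed by two beats of A minor.
seq = np.array([chord_template(0)] * 2 + [chord_template(9, minor=True)] * 2)
labels = decode(seq)
```

The sticky self-transitions play the role of the learned transition matrix: they smooth over noisy beats while still allowing a chord change when the evidence is strong enough.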
Chord Recognition Results
Often works. Example: "Let It Be" - audio, beat-synchronous chroma, ground-truth chords (C, G, A:min, A:min/b7, F:maj7, F:maj6, C, G, F, C, ...) against the recognized sequence
But only right about 60% of the time
What did the models learn?
Chord model centers (means) indicate chord templates
(figure: PCP_ROT family model means, for C-root chords: DIM, DOM7, MAJ, MIN, MIN7)
Chords for Jazz
How many chord types do we need?
Example: "Freddie" - log-frequency spectrogram, beat-synchronous chroma, chord likelihoods + Viterbi path, and chord-based chroma reconstruction
Future Work
Matching items:
  cover songs / standards
  similar instruments, styles
Example: "Between the Bars" (Elliott Smith vs. Glenn Phillips) - beat-chroma matrices (pwr 0.25), aligned with a 17-beat skew and a transposition of 2, compared by pointwise product
Analyzing musical content:
  solo transcription & modeling
  musical structure
And so much more...
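The cover-song comparison in the example can be sketched as: compress the beat-chroma magnitudes (the "pwr 0.25" of the figure), then search over all 12 circular transpositions and a range of beat offsets for the best normalized correlation. The skew range, minimum overlap, and synthetic "cover" (the same matrix transposed and trimmed) are assumptions for the sketch.

```python
import numpy as np

def match_score(A, B, max_skew=20):
    """Best normalized correlation between (n_beats, 12) beat-chroma arrays."""
    A = A ** 0.25                          # magnitude compression
    B = B ** 0.25
    best = (-np.inf, 0, 0)                 # (score, transposition, offset)
    for trsp in range(12):                 # try all 12 transpositions
        Bt = np.roll(B, trsp, axis=1)
        for off in range(-max_skew, max_skew + 1):
            lo_a, lo_b = max(0, off), max(0, -off)
            n = min(len(A) - lo_a, len(Bt) - lo_b)
            if n < 8:                      # require a minimum overlap
                continue
            a = A[lo_a:lo_a + n].ravel()
            b = Bt[lo_b:lo_b + n].ravel()
            score = a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)
            if score > best[0]:
                best = (score, trsp, off)
    return best

rng = np.random.default_rng(1)
A = rng.random((60, 12))
B = np.roll(A, 2, axis=1)[5:]   # same piece, up 2 semitones, starting 5 beats in
score, trsp, off = match_score(A, B)
```

A real system would use a proper cross-correlation over long skews and report the peak value as the match score, but the search over transposition and offset is the essential mechanism.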
Summary
Finding musical similarity at large scale:
  music audio -> low-level features, melody and notes, key and chords, tempo and beat
  -> classification and similarity (browsing, discovery, production)
  -> music structure discovery (modeling, generation, curiosity)