Music Information Retrieval for Jazz

1 Music Information Retrieval for Jazz
Dan Ellis
Laboratory for Recognition and Organization of Speech and Audio
Dept. Electrical Eng., Columbia Univ., NY USA
1. Music Information Retrieval
2. Automatic Tagging
3. Musical Content
4. Future Work
MIR for Jazz - Dan Ellis /29

2 Machine Listening
Extracting useful information from sound... like (we) animals do
[Figure: grid of machine-listening tasks. Task axis: Detect, Classify, Describe. Domain axis: Environmental Sound, Speech, Music. Entries: VAD, Speech/Music, ASR, Music Transcription, Environment Awareness, Sound Intelligence, Automatic Narration, Emotion, Music Recommendation]

3 1. The Problem
We have a lot of music. Can computers help?
Applications:
- archive organization
- musicological insight?
- music recommendation

4 Music Information Retrieval (MIR)
Small field that has grown since ~2
- musicologists, engineers, librarians
- significant commercial interest
MIR as musical analog of text IR
- find stuff in large archives
Popular tasks
- genre classification
- chord, melody, full transcription
- music recommendation
Annual evaluations
- Standard test corpora - pop

5 2. Automatic Tagging
Statistical Pattern Recognition: finding matches to training examples
Pipeline: Sensor -> (signal) -> Pre-processing/segmentation -> (segment) -> Feature extraction -> (feature vector) -> Classification -> (class) -> Post-processing
Need:
- Feature design
- Labeled training examples
Applications:
- Genre
- Instrumentation
- Artist
- Studio...

6 Features: MFCC
Mel-Frequency Cepstral Coefficients: the standard features from speech recognition
Pipeline: Sound -> FFT X[k] -> spectra -> Mel scale freq. warp -> audspec -> log |X[k]| -> IFFT -> cepstra -> Truncate -> MFCCs
[Figure: spectra (freq / Hz), audspec (freq / Mel), cepstra and MFCCs (quefrency), each against time / s]
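The MFCC chain on this slide can be sketched in a few lines of NumPy. This is an illustrative sketch, not the code behind the slides: the frame size, hop, number of mel bands, the 2595/700 mel formula, and the use of a DCT-II in place of the slide's IFFT step (a common real-valued equivalent) are all assumptions, and the function names are hypothetical.

```python
import numpy as np

def hz_to_mel(f):
    # One common mel-scale formula (an assumption; other variants exist).
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    # Triangular filters whose edges are equally spaced on the mel scale.
    edges_hz = mel_to_hz(np.linspace(0.0, hz_to_mel(sr / 2.0), n_mels + 2))
    fft_hz = np.arange(n_fft // 2 + 1) * sr / n_fft
    fb = np.zeros((n_mels, fft_hz.size))
    for m in range(n_mels):
        lo, mid, hi = edges_hz[m], edges_hz[m + 1], edges_hz[m + 2]
        rising = (fft_hz - lo) / (mid - lo)
        falling = (hi - fft_hz) / (hi - mid)
        fb[m] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def mfcc(x, sr, n_fft=512, hop=256, n_mels=40, n_ceps=20):
    # Slice the signal into overlapping, Hann-windowed frames.
    n_frames = 1 + (len(x) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = x[idx] * np.hanning(n_fft)
    spec = np.abs(np.fft.rfft(frames, axis=1))             # |X[k]|
    audspec = spec @ mel_filterbank(n_mels, n_fft, sr).T   # mel frequency warp
    logspec = np.log(audspec + 1e-10)                      # log compression
    # DCT-II basis; keeping only the first n_ceps rows is the "Truncate" step.
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), n + 0.5) / n_mels)
    return logspec @ dct.T                                 # (n_frames, n_ceps)
```

Calling `mfcc(x, sr)` on a mono signal returns one row of cepstral coefficients per frame; the truncation to a few coefficients is what discards fine pitch detail.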

7 Representing Audio
- MFCCs are short-time features (25 ms)
- Sound is a trajectory in MFCC space
- Describe whole track by its statistics
[Figure: "VTS_4_1" spectrogram (freq / kHz vs. time / sec, level / dB); MFCC features (MFCC bin vs. time / sec); MFCC covariance matrix (MFCC dimension vs. MFCC dimension)]
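Describing a whole track by its statistics can be illustrated as follows. This is a minimal sketch assuming the features arrive as an `(n_frames, n_dims)` array; the function names and the choice to keep only the upper triangle of the covariance are illustrative assumptions, not details taken from the slides.

```python
import numpy as np

def track_statistics(frames):
    """Summarize a (n_frames, n_dims) feature trajectory by mean and covariance."""
    mu = frames.mean(axis=0)
    sigma = np.cov(frames, rowvar=False)   # (n_dims, n_dims), symmetric
    return mu, sigma

def flatten_statistics(mu, sigma):
    """Stack the mean and the unique covariance entries into one vector."""
    iu = np.triu_indices(sigma.shape[0])   # upper triangle, diagonal included
    return np.concatenate([mu, sigma[iu]])
```

For 20-dimensional features this yields a fixed-length vector of 20 + 210 = 230 values, regardless of how long the track is.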

8 MFCCs for Music
Can resynthesize MFCCs by shaping noise
- gives an idea about the information retained
[Figure: "Freddy Freeloader": original (freq / Hz), MFCCs (mel quefrency), resynthesis (freq / Hz), each against time / s]

9 Ground Truth
MajorMiner: free-text tags for 1s clips (Mandel & Ellis 8)
- 4 users, 75 unique tags, 7, taggings
Example: drum, bass, piano, jazz, slow, instrumental, saxophone, soft, quiet, club, ballad, smooth, soulful, easy_listening, swing, improvisation, 6s, cool, light

10 Classification
MFCC features + human ground truth + standard machine learning tools
Pipeline: Sound -> Chop into 1 sec blocks -> MFCC (2 dims) -> Mean / Covariance: µ (6 dims), Σ (399 samples) -> Standardize across train set -> One-vs-all SVM (C, γ), trained on ground truth -> Average Prec.
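Two steps of this pipeline are easy to make concrete: standardizing with training-set statistics, and scoring a tag classifier by average precision. The sketch below assumes per-clip feature vectors and real-valued classifier scores; the SVM itself is omitted, and the function names are hypothetical.

```python
import numpy as np

def standardize(train, test):
    """Scale both sets using the mean and std of the training set only."""
    mu = train.mean(axis=0)
    sd = train.std(axis=0) + 1e-12   # guard against constant features
    return (train - mu) / sd, (test - mu) / sd

def average_precision(scores, labels):
    """Mean of the precision values at the rank of each relevant clip."""
    order = np.argsort(-scores, kind="stable")   # highest score first
    hits = labels[order].astype(bool)
    if not hits.any():
        return 0.0
    precision_at_rank = np.cumsum(hits) / np.arange(1, len(hits) + 1)
    return float(precision_at_rank[hits].mean())
```

For a one-vs-all tagger, `average_precision` would be evaluated once per tag, with `labels` marking the clips that humans gave that tag.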

11 Classification Results
Classifiers trained from top 5 tags
[Figure: "Soul Eyes" spectrogram (freq / Hz vs. time / s) above per-tag classifier outputs over time. Tags shown: _9s, club, trance, end, drum_bass, singing, horns, punk, samples, silence, quiet, noise, solo, strings, indie, house, alternative, r_b, funk, soft, ambient, british, distortion, drum_machine, country, keyboard, saxophone, fast, instrumental, electronica, 8s, voice, beat, slow, rap, hip_hop, jazz, piano, techno, dance, female, bass, vocal, pop, electronic, rock, synth, male, guitar, drum]

12 3. Musical Content
MFCCs (and speech recognizers) don't respect pitch
- pitch is important
- visible in spectrogram
Pitch-related tasks
- note transcription
- chord transcription
- matching by musical content ("cover songs")
[Figure: spectrogram, freq / Hz vs. time / s]

13 Note Transcription (Poliner & Ellis 5, 6, 7)
Pipeline: feature representation -> feature vector -> classification -> posteriors -> HMM smoothing
Training data and features:
- MIDI, multi-track recordings, playback piano, & resampled audio (less than 28 mins of train audio)
- Normalized magnitude STFT
Classification:
- N binary SVMs (one for each note)
- Independent frame-level classification on 1 ms grid
- Distance to class boundary used as posterior
Temporal smoothing:
- Two-state (on/off) independent HMM for each note
- Parameters learned from training data
- Find Viterbi sequence for each note

14 Polyphonic Transcription
Real music excerpts + ground truth
Frame-level transcription
- Estimate the fundamental frequency of all notes present on a 1 ms grid
- [Table: Precision, Recall, Acc, Etot, Esubs, Emiss, Efa]
Note-level transcription
- Group frame-level predictions into note-level transcriptions by estimating onset/offset
- [Table: Precision, Recall, Ave. F-measure, Ave. Overlap]
MIREX 27

15 Chroma Features (Fujishima 1999; Warren et al. 23)
Idea: Project onto 12 semitones regardless of octave
- maintains main musical distinction
- invariant to musical equivalence
- no need to worry about harmonics?
C(b) = \sum_{k=0}^{N_M} B(12 \log_2(k/k_0) - b) \, W(k) \, |X[k]|
W(k) is weighting, B(b) selects every ~ mod12
[Figure: spectrogram (freq / kHz vs. time / sec) and chroma (chroma bins A to G vs. time / frame), linked by an FFT-bin-to-chroma mapping]
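The projection onto 12 semitone classes can be sketched for a single magnitude spectrum as below. This is an assumption-laden illustration: B is taken to be a hard assignment of each FFT bin to its nearest semitone, W(k) is a simple band-limiting weight, the reference frequency is middle C (so bin 0 is C), and the function name is hypothetical.

```python
import numpy as np

def chroma_from_spectrum(mag, sr, n_fft, f_ref=261.63, f_lo=50.0, f_hi=4000.0):
    """Fold one magnitude spectrum (length n_fft//2 + 1) into 12 chroma bins."""
    freqs = np.arange(1, len(mag)) * sr / n_fft          # skip the DC bin
    semitones = 12.0 * np.log2(freqs / f_ref)            # distance from C in semitones
    pitch_class = np.round(semitones).astype(int) % 12   # nearest semitone, octave removed
    weight = ((freqs >= f_lo) & (freqs <= f_hi)).astype(float)  # W(k): band limit
    chroma = np.zeros(12)
    np.add.at(chroma, pitch_class, weight * mag[1:])     # accumulate |X[k]| per class
    return chroma
```

A pure tone at 440 Hz and one at 880 Hz both land in bin 9 (A), which is the octave invariance the slide describes.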

16 Chroma Resynthesis (Ellis & Poliner)
Chroma describes the notes in an octave... but not the octave
Can resynthesize by presenting all octaves... with a smooth envelope
- Shepard tones: octave is ambiguous
- endless sequence illusion
y_b(t) = \sum_{o=1}^{M} W(o + b/12) \cos(2^{o + b/12} w_0 t)
[Figure: Shepard tone spectra (level / dB vs. freq / Hz); Shepard tone resynth (freq / kHz vs. time / sec)]
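A Shepard tone for one chroma bin can be synthesized roughly as follows. This is a sketch under stated assumptions: the base frequency (C0, about 16.35 Hz), the number of octaves, and the Gaussian shape of the envelope W over log-frequency are illustrative choices, not values from the slides.

```python
import numpy as np

def shepard_tone(b, dur=1.0, sr=16000, f_base=16.3516, n_oct=7,
                 center=4.0, width=1.5):
    """Sum octave-spaced sinusoids for chroma bin b under a smooth envelope."""
    t = np.arange(int(dur * sr)) / sr
    y = np.zeros_like(t)
    for o in range(1, n_oct + 1):
        pos = o + b / 12.0                        # position in octaves above f_base
        weight = np.exp(-0.5 * ((pos - center) / width) ** 2)   # envelope W
        y += weight * np.cos(2.0 * np.pi * f_base * 2.0 ** pos * t)
    return y / np.max(np.abs(y))                  # normalize to unit peak
```

Resynthesizing a chroma frame would then amount to summing `shepard_tone(b)` over the 12 bins, each scaled by its chroma value.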

17 Chroma Example
Simple Shepard tone resynthesis
- can also reimpose broad spectrum from MFCCs
[Figure: "Freddie" spectrogram (freq / Hz), chroma (bins C to A), and resynthesis (freq / Hz), against time / s]

18 Onset detection (Bello et al. 25)
Simplest thing is energy envelope:
e(n_0) = \sum_{n=-W/2}^{W/2} w[n] \, |x(n + n_0)|^2
Emphasis on high frequencies?
[Figure: examples "Harnoncourt" and "Maracatu": spectrogram (freq / kHz vs. time / sec, level / dB) with onset functions \sum_f |X(f,t)| and \sum_f f \, |X(f,t)|]
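The windowed energy envelope on this slide, and one way of turning it into an onset strength curve, can be sketched as below. The window length and hop are assumptions, and the second step (half-wave rectified difference of log energy) is a common choice swapped in for illustration rather than something the slide specifies.

```python
import numpy as np

def energy_envelope(x, win=512, hop=128):
    """e(n0): Hann-weighted sum of squared samples, evaluated every `hop` samples."""
    w = np.hanning(win)
    n_frames = 1 + (len(x) - win) // hop
    idx = np.arange(win)[None, :] + hop * np.arange(n_frames)[:, None]
    return (w * x[idx] ** 2).sum(axis=1)

def onset_strength(env, floor=1e-6):
    """Keep only increases in log energy (assumed onset function)."""
    log_env = np.log(env + floor)
    diff = np.diff(log_env, prepend=log_env[0])   # first value is 0 by construction
    return np.maximum(diff, 0.0)
```

A sudden burst of sound after silence produces a single large peak in `onset_strength` at the first frame that overlaps the burst.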

19 Tempo Estimation
Beat tracking (may) need global tempo period τ
- otherwise the problem lacks optimal substructure
Pick peak in onset envelope autocorrelation
- after applying human preference window
- check for subbeat
[Figure: Onset Strength Envelope (part) vs. time / s; Raw Autocorrelation and Windowed Autocorrelation vs. lag / s, with Primary and Secondary Tempo Period marked]
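Picking the tempo period from the windowed autocorrelation might look like the sketch below. The log-Gaussian shape of the "human preference" window, its 0.5 s center (120 BPM) and its one-octave width are assumptions; the subbeat check is left out.

```python
import numpy as np

def estimate_tempo_period(onset_env, frame_rate, center_s=0.5, width_oct=1.0):
    """Return the beat period in seconds from an onset strength envelope."""
    env = onset_env - onset_env.mean()
    ac = np.correlate(env, env, mode="full")[len(env) - 1:]   # lags 0, 1, 2, ...
    lags_s = np.arange(1, len(ac)) / frame_rate               # skip lag 0
    # Preference window: Gaussian on a log-lag axis (assumed shape).
    window = np.exp(-0.5 * (np.log2(lags_s / center_s) / width_oct) ** 2)
    best_lag = int(np.argmax(ac[1:] * window)) + 1
    return best_lag / frame_rate
```

The window is what stops the estimator from latching onto half or double the period when those lags have comparable raw autocorrelation.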

20 Beat Tracking by Dynamic Programming
To optimize
C(\{t_i\}) = \sum_{i=1}^{N} O(t_i) + \alpha \sum_{i=2}^{N} F(t_i - t_{i-1}, \tau_p)
define C*(t) as best score up to time t, then build up recursively (with traceback P(t)):
C^*(t) = O(t) + \max_{\tau} \{ \alpha F(t - \tau, \tau_p) + C^*(\tau) \}
P(t) = \arg\max_{\tau} \{ \alpha F(t - \tau, \tau_p) + C^*(\tau) \}
Final beat sequence {t_i} is best C* + back-trace
[Figure: O(t) and C*(t) against t, with candidate predecessor τ]
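The recursion for C*(t) and the back-trace through P(t) can be written out as the following sketch. Several details are assumptions: F is taken to be a negative squared log ratio of the beat gap to the target period, predecessors are searched between half and twice the period, a beat may start a new chain when no predecessor helps, and the value of alpha is arbitrary.

```python
import numpy as np

def beat_track(onset, period, alpha=100.0):
    """Dynamic-programming beat tracker; onset is O(t), period is in frames."""
    n = len(onset)
    score = np.asarray(onset, dtype=float).copy()   # C*(t), initialised to O(t)
    backlink = -np.ones(n, dtype=int)               # P(t); -1 means chain start
    lo = max(1, int(round(period / 2)))
    hi = int(round(2 * period))
    gaps = np.arange(lo, hi + 1)                    # candidate t - tau values
    penalty = -alpha * np.log(gaps / period) ** 2   # alpha * F(t - tau, tau_p)
    for t in range(n):
        prev = t - gaps
        valid = prev >= 0
        if not valid.any():
            continue
        cand = score[prev[valid]] + penalty[valid]
        best = int(np.argmax(cand))
        if cand[best] > 0:                          # otherwise start a new chain
            score[t] = onset[t] + cand[best]
            backlink[t] = prev[valid][best]
    # Back-trace from the best-scoring time.
    t = int(np.argmax(score))
    beats = [t]
    while backlink[t] >= 0:
        t = int(backlink[t])
        beats.append(t)
    return np.array(beats[::-1])
```

Because each C*(t) depends only on earlier values, the whole search is a single left-to-right pass followed by the back-trace.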

21 Beat Tracking Results
Prefers drums & steady tempo
[Figure: "Soul Eyes" beat tracking against time / s; period / 4 ms samp]

22 Beat-Synchronous Chroma
Record one chroma vector per beat
- compact representation of harmonies
[Figure: "Freddy" spectrogram (freq / Hz vs. time / s); beat-sync chroma (bins C to B vs. time / beat); Resynth; Resynth + MFCC]
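Recording one chroma vector per beat can be sketched as an average over the frames between consecutive beats. This assumes frame-level chroma in an `(n_frames, 12)` array and beat positions given as increasing frame indices; averaging (rather than, say, taking a median) and the function name are illustrative choices.

```python
import numpy as np

def beat_sync(features, beat_frames):
    """Average feature frames within each inter-beat interval.

    Frames before the first beat are dropped; the last segment runs
    from the final beat to the end of the track.
    """
    bounds = np.concatenate([beat_frames, [len(features)]])
    return np.stack([features[a:b].mean(axis=0)
                     for a, b in zip(bounds[:-1], bounds[1:])])
```

The output has one row per beat, so tracks at different tempos become directly comparable on a beat axis.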

23 Chord Recognition
Beat-synchronous chroma look like chords
- C-E-G, B-D-G, A-C-E, A-C-D-F...
- can we transcribe them?
Two approaches
- manual templates (prior knowledge)
- learned models (from training data)
[Figure: beat-synchronous chroma, chroma bin (C to A) vs. time / sec]

24 Chord Recognition System (Sheh & Ellis 23)
Analogous to speech recognition
- Gaussian models of features for each chord
- Hidden Markov Models for chord transitions
[Block diagram, test and train paths. Components: Audio; 1-16 Hz BPF -> Beat track; 25-4 Hz BPF -> Chroma; beat-synchronous chroma features; Labels -> Resample; Root normalize -> Gaussian -> Unnormalize -> 24 Gauss models (e.g. C maj, c min); Count transitions -> 24x24 transition matrix; HMM Viterbi -> chord labels]
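The decoding stage of such a system can be illustrated with a small Viterbi decoder over 24 major and minor triads. To keep the sketch self-contained it swaps in hand-built binary triad templates for the trained Gaussian models, and a fixed self-transition probability for the counted 24x24 transition matrix; both substitutions, the `sharpness` scaling, and all function names are assumptions.

```python
import numpy as np

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def triad_templates():
    """Unit-norm chroma templates for 12 major and 12 minor triads."""
    major = np.zeros(12); major[[0, 4, 7]] = 1.0
    minor = np.zeros(12); minor[[0, 3, 7]] = 1.0
    temps = [np.roll(major, r) for r in range(12)] + [np.roll(minor, r) for r in range(12)]
    names = [n + ":maj" for n in NOTE_NAMES] + [n + ":min" for n in NOTE_NAMES]
    temps = np.array(temps)
    return temps / np.linalg.norm(temps, axis=1, keepdims=True), names

def viterbi(log_obs, log_trans, log_init):
    """Most likely state path; log_trans is indexed [from_state, to_state]."""
    n_frames, n_states = log_obs.shape
    delta = log_init + log_obs[0]
    psi = np.zeros((n_frames, n_states), dtype=int)
    for t in range(1, n_frames):
        scores = delta[:, None] + log_trans
        psi[t] = np.argmax(scores, axis=0)
        delta = scores.max(axis=0) + log_obs[t]
    path = np.zeros(n_frames, dtype=int)
    path[-1] = int(np.argmax(delta))
    for t in range(n_frames - 1, 0, -1):
        path[t - 1] = psi[t, path[t]]
    return path

def recognize_chords(chroma, self_prob=0.9, sharpness=10.0):
    """Label each beat-synchronous chroma frame (n_beats, 12) with a triad name."""
    templates, names = triad_templates()
    unit = chroma / (np.linalg.norm(chroma, axis=1, keepdims=True) + 1e-10)
    log_obs = sharpness * (unit @ templates.T)   # cosine similarity as pseudo log-likelihood
    n = len(names)
    trans = np.full((n, n), (1.0 - self_prob) / (n - 1))
    np.fill_diagonal(trans, self_prob)
    path = viterbi(log_obs, np.log(trans), np.full(n, -np.log(n)))
    return [names[i] for i in path]
```

In a trained system, `log_obs` would come from the per-chord Gaussian likelihoods and `trans` from transition counts in labeled data; the Viterbi step stays the same.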

25 Chord Recognition
Often works:
[Figure: "Let It Be/6-Let It Be": audio spectrogram (freq / Hz), beat-synchronous chroma, ground truth chord labels and recognized chord labels, including C, G, A:min, A:min/b7, F:maj7, F:maj6]
But only about 6% of the time

26 What did the models learn?
Chord model centers (means) indicate chord templates:
[Figure: PCP_ROT family model means (train18); curves DIM, DOM7, MAJ, MIN, MIN; x-axis C D E F G A B C (for C-root chords)]

27 Chords for Jazz
How many types?
[Figure: four panels for "Freddy": logf sgram (freq / Hz vs. time / sec); beat sync chroma (chroma C to B); chord likelihoods + Viterbi path (chords); chord based chroma reconstruction (chroma C to B vs. time / beats)]

28 Future Work
Matching items
- cover songs / standards
- similar instruments, styles
Analyzing musical content
- solo transcription & modeling
- musical structure
And so much more...
[Figure: beat-chroma (chroma bin vs. time / beats) for "Between the Bars" by Elliot Smith and by Glenn Phillips, pwr; alignment over beats and trsp; pointwise product (sum(12x3) = 19.34)]
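Matching two versions of a tune by their beat-chroma, searching over beat offset and transposition, could be sketched as below. This is an unnormalized cross-correlation written for illustration; any normalization or filtering used for the figure on this slide is not reproduced, and the function name is hypothetical.

```python
import numpy as np

def best_match(a, b):
    """Cross-correlate two (n_beats, 12) beat-chroma arrays.

    Returns (score, trsp, lag) such that a[n + lag] best matches
    b[n] rotated upward by trsp chroma bins.
    """
    best = (-np.inf, 0, 0)
    for trsp in range(12):
        b_rot = np.roll(b, trsp, axis=1)            # circular transposition
        # Sum the 1-D time cross-correlations over the 12 chroma bins.
        xc = sum(np.correlate(a[:, c], b_rot[:, c], mode="full") for c in range(12))
        k = int(np.argmax(xc))
        if xc[k] > best[0]:
            best = (float(xc[k]), trsp, k - (len(b) - 1))
    return best
```

Because chroma is circular, a cover played in a different key shows up as a peak at a non-zero `trsp` rather than being missed.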

29 Summary
Finding Musical Similarity at Large Scale
[Diagram elements: Music audio; Low-level features; Classification and Similarity (browsing, discovery, production); Melody and notes; Key and chords; Tempo and beat; Music Structure Discovery (modeling, generation, curiosity)]
