Music Information Retrieval for Jazz

Music Information Retrieval for Jazz
Dan Ellis
Laboratory for Recognition and Organization of Speech and Audio
Dept. Electrical Eng., Columbia Univ., NY USA
{dpwe,thierry}@ee.columbia.edu
http://labrosa.ee.columbia.edu/
Outline: 1. Music Information Retrieval; 2. Automatic Tagging; 3. Musical Content; 4. Future Work
MIR for Jazz - Dan Ellis - 2012-11-15

Machine Listening
Extracting useful information from sound... like (we) animals do.
[Diagram: machine-listening tasks by type (describe, classify, detect) and domain (speech, music, environmental sound): automatic narration, emotion, music recommendation, environment awareness, ASR, music transcription, sound intelligence, VAD, speech/music discrimination.]

1. The Problem
We have a lot of music. Can computers help?
Applications: archive organization, musicological insight?, music recommendation.

Music Information Retrieval (MIR)
A small field that has grown since ~2000: musicologists, engineers, librarians; significant commercial interest.
MIR as the musical analog of text IR: find stuff in large archives.
Popular tasks: genre classification; chord, melody, and full transcription; music recommendation.
Annual evaluations; standard test corpora (pop).

2. Automatic Tagging
Statistical pattern recognition: finding matches to training examples.
[Diagram: signal segment → sensor → pre-processing/segmentation → feature extraction → feature vector → classification → post-processing → class label, with labeled training examples feeding the classifier.]
Need: feature design, labeled training examples.
Applications: genre, instrumentation, artist, studio, ...

Features: MFCC
Mel-Frequency Cepstral Coefficients: the standard features from speech recognition.
Chain: sound → FFT |X[k]| → Mel-scale frequency warp (audspec) → log|X[k]| → IFFT → truncate (cepstra → MFCCs).
[Figure: waveform, linear spectrum, Mel spectrum, and truncated cepstra for an example frame.]
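The FFT → Mel warp → log → truncate chain can be sketched in a few lines of numpy. This is a simplified illustration only: the triangular filterbank, the DCT written as an explicit cosine matrix, and the 40-band / 13-coefficient sizes are assumptions, not necessarily the settings used in the talk.

```python
import numpy as np

def mfcc_frame(frame, sr=16000, n_mel=40, n_ceps=13):
    """Sketch of the MFCC chain on one windowed frame:
    FFT -> Mel-scale warp -> log -> DCT (stand-in for IFFT) -> truncate."""
    spectrum = np.abs(np.fft.rfft(frame))            # |X[k]|
    n_bins = len(spectrum)
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_points = np.linspace(mel(0), mel(sr / 2), n_mel + 2)
    hz_points = 700.0 * (10.0 ** (mel_points / 2595.0) - 1.0)
    bins = np.floor(hz_points / (sr / 2) * (n_bins - 1)).astype(int)
    # Triangular Mel filterbank (simplified)
    fbank = np.zeros((n_mel, n_bins))
    for m in range(1, n_mel + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        fbank[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    logmel = np.log(fbank @ spectrum + 1e-10)        # "audspec"
    # DCT-II of the log-Mel spectrum, truncated to the first coefficients
    n = np.arange(n_mel)
    dct = np.cos(np.pi / n_mel * (n[None, :] + 0.5) * np.arange(n_ceps)[:, None])
    return dct @ logmel                              # MFCCs

frame = np.hanning(400) * np.random.randn(400)       # one 25 ms frame at 16 kHz
coeffs = mfcc_frame(frame)
print(coeffs.shape)                                  # (13,)
```

In practice a library routine would be used; the point is that each arrow in the slide's chain is one line of the function.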

Representing Audio
MFCCs are short-time features (25 ms frames).
A sound is a trajectory in MFCC space; describe a whole track by the statistics of that trajectory.
[Figure: spectrogram, MFCC features over time, and the track's MFCC covariance matrix.]

MFCCs for Music
Can resynthesize MFCCs by shaping noise, which gives an idea of the information retained.
[Figure: "Freddie Freeloader": original spectrogram, MFCCs, and noise-shaped resynthesis.]

Ground Truth
MajorMiner: free-text tags for 10 s clips (Mandel & Ellis '08); 400 users, 7500 unique tags, 70,000 taggings.
Example tags for one clip: drum, bass, piano, jazz, slow, instrumental, saxophone, soft, quiet, club, ballad, smooth, soulful, easy_listening, swing, improvisation, 60s, cool, light.

Classification
Pipeline: sound → chop into 10 sec blocks → MFCCs (20 dims) → per-block mean µ and covariance Σ (over 399 frames) → standardize across the training set → one-vs-all SVMs (tuning C, γ) → average precision, against ground-truth tags.
MFCC features + human ground truth + standard machine-learning tools.
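The per-block summary step, describing each block by the mean and covariance of its MFCC trajectory, can be sketched as below. The 20-dim MFCCs and 399-frame block length are illustrative; the standardization and SVM stages are omitted.

```python
import numpy as np

def track_features(mfccs):
    """Summarize an MFCC trajectory (n_frames x n_dims) by its statistics:
    the mean vector and the upper triangle of the covariance matrix."""
    mu = mfccs.mean(axis=0)
    sigma = np.cov(mfccs, rowvar=False)
    iu = np.triu_indices(mfccs.shape[1])
    return np.concatenate([mu, sigma[iu]])

rng = np.random.default_rng(0)
mfccs = rng.standard_normal((399, 20))   # e.g. 399 frames of 20-dim MFCCs
feat = track_features(mfccs)
print(feat.shape)                        # 20 + 210 = (230,)
```

Each block thus becomes one fixed-length vector, which is what a standard classifier such as a one-vs-all SVM expects.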

Classification Results
Classifiers trained for the top 50 tags.
[Figure: spectrogram of "Soul Eyes" and per-tag classifier outputs over time for the 50 tags (club, trance, drum_bass, singing, horns, punk, ..., pop, electronic, rock, synth, male, guitar, drum).]

3. Musical Content
MFCCs (and speech recognizers) don't respect pitch, but pitch is important, and visible in the spectrogram.
[Figure: spectrogram excerpt, 46-54 s.]
Pitch-related tasks: note transcription, chord transcription, matching by musical content ("cover songs").

Note Transcription (Poliner & Ellis '05, '06, '07)
Feature representation: training data and features from MIDI, multi-track recordings, playback piano, and resampled audio (less than 28 mins of training audio); normalized magnitude STFT.
Classification posteriors: N binary SVMs (one for each note); independent frame-level classification on a 10 ms grid; distance to the class boundary as posterior.
HMM smoothing: two-state (on/off) independent HMM for each note; parameters learned from training data; find the Viterbi sequence for each note.

Polyphonic Transcription
Real music excerpts + ground truth.
Frame-level transcription: estimate the fundamental frequency of all notes present on a 10 ms grid.
Note-level transcription: group frame-level predictions into note-level transcriptions by estimating onset/offset.
[Figure: MIREX 2007 results: frame-level precision, recall, accuracy, Etot, Esubs, Emiss, Efa; note-level precision, recall, average F-measure, average overlap.]

Chroma Features (Fujishima 1999; Warren et al. 2003)
Idea: project energy onto 12 semitones regardless of octave; maintains the main musical distinctions while staying invariant to octave equivalence; no need to worry about harmonics?
C(b) = Σ_k B(12 log2(k/k0) − b) · W(k) · |X[k]|
where W(k) is a weighting and B(·) selects the FFT bins whose pitch class matches b (mod 12).
[Figure: spectrogram and the resulting chromagram.]
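The projection C(b) translates almost directly into numpy. A minimal sketch, assuming a magnitude spectrum from `np.fft.rfft`, uniform weighting W(k), and A440 as the reference for pitch class 0:

```python
import numpy as np

def chroma(spectrum, sr, f_ref=440.0):
    """Fold an FFT magnitude spectrum onto 12 semitone classes,
    discarding octave: b = round(12*log2(f/f_ref)) mod 12."""
    n_bins = len(spectrum)
    freqs = np.arange(n_bins) * sr / (2.0 * (n_bins - 1))
    c = np.zeros(12)
    for k in range(1, n_bins):            # skip the DC bin
        b = int(round(12 * np.log2(freqs[k] / f_ref))) % 12
        c[b] += spectrum[k]
    return c

# A pure 440 Hz tone should land in chroma class 0 (A)
sr, n = 8000, 4096
t = np.arange(n) / sr
spec = np.abs(np.fft.rfft(np.sin(2 * np.pi * 440 * t) * np.hanning(n)))
print(int(np.argmax(chroma(spec, sr))))   # 0
```

Real chroma front-ends add a perceptual weighting W(k) and instantaneous-frequency or constant-Q analysis, but the fold-by-mod-12 core is this loop.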

Chroma Resynthesis (Ellis & Poliner 2007)
Chroma describes the notes in an octave... but not the octave.
Can resynthesize by presenting all octaves with a smooth envelope: Shepard tones, for which the octave is ambiguous (the "endless sequence" illusion).
y_b(t) = Σ_{o=1}^{M} W(o + b/12) cos(2^(o + b/12) ω0 t)
[Figure: Shepard tone spectra and resynthesized spectrogram.]
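The Shepard-tone construction, the same pitch class in every octave under a smooth spectral envelope, can be sketched as follows. The Gaussian log-frequency envelope, its 1.5-octave width, and the 8-octave range are illustrative assumptions:

```python
import numpy as np

def shepard_tone(chroma_class, sr=8000, dur=0.5, f0=27.5, n_oct=8):
    """Synthesize one Shepard tone: the same pitch class in every octave,
    weighted by a smooth Gaussian envelope so the octave is ambiguous."""
    t = np.arange(int(sr * dur)) / sr
    y = np.zeros_like(t)
    center = np.log2(f0 * 2 ** (n_oct / 2))          # envelope peak (log-freq)
    for o in range(n_oct):
        f = f0 * 2.0 ** (o + chroma_class / 12.0)
        if f >= sr / 2:                              # stay below Nyquist
            break
        w = np.exp(-0.5 * ((np.log2(f) - center) / 1.5) ** 2)
        y += w * np.cos(2 * np.pi * f * t)
    return y / np.max(np.abs(y))

tone = shepard_tone(9)    # pitch class A in all octaves
print(tone.shape)
```

Summing one such tone per chroma bin, scaled by the bin's value, turns a chromagram back into audible (octave-ambiguous) sound.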

Chroma Example
Simple Shepard-tone resynthesis; can also reimpose the broad spectrum from MFCCs.
[Figure: "Freddie Freeloader" spectrogram, chroma, and resynthesized spectrogram.]

Onset Detection (Bello et al. 2005)
The simplest thing is the energy envelope:
e(n0) = Σ_{n=−W/2}^{W/2} w[n] · |x(n + n0)|²
perhaps with emphasis on high frequencies?
[Figure: spectrograms and onset envelopes for Harnoncourt and Maracatu examples.]
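The windowed-energy definition of e(n0) maps directly to code. A sketch, with a half-wave-rectified log-difference added on top (a common refinement for turning the envelope into onset strength, not necessarily the exact detail used in the talk):

```python
import numpy as np

def energy_envelope(x, win=256, hop=128):
    """Windowed short-time energy: e(n0) = sum_n w[n] * x(n0 + n)^2."""
    w = np.hanning(win)
    n_frames = 1 + (len(x) - win) // hop
    return np.array([np.sum(w * x[i * hop:i * hop + win] ** 2)
                     for i in range(n_frames)])

def onset_strength(x, win=256, hop=128):
    """Half-wave rectified first difference of the log energy envelope:
    rises in energy count as onset evidence, falls do not."""
    e = np.log(energy_envelope(x, win, hop) + 1e-10)
    return np.maximum(np.diff(e), 0.0)

# A click train produces a burst of onset strength at each click
sr = 8000
x = np.zeros(sr)
x[::2000] = 1.0
print(onset_strength(x).shape)   # (60,)
```

This onset-strength signal is the input to the tempo and beat-tracking stages on the next slides.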

Tempo Estimation
Beat tracking (may) need a global tempo period τ; otherwise the problem lacks optimal substructure.
Pick the peak in the onset-envelope autocorrelation after applying a human-preference window; check for sub-beat periods (secondary vs. primary tempo period).
[Figure: onset strength envelope, raw autocorrelation, and windowed autocorrelation with primary and secondary tempo peaks.]
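Peak-picking the windowed autocorrelation can be sketched as below. The log-Gaussian preference window centred near 0.5 s (about 120 BPM) and its width are illustrative choices:

```python
import numpy as np

def tempo_period(onset_env, frame_rate, bias_period=0.5, spread=0.9):
    """Pick the tempo period as the biggest peak of the onset-envelope
    autocorrelation, after a log-Gaussian human-preference window
    centred near bias_period seconds."""
    n = len(onset_env)
    ac = np.correlate(onset_env, onset_env, mode='full')[n - 1:]
    lags = np.arange(n) / frame_rate
    with np.errstate(divide='ignore'):
        w = np.exp(-0.5 * (np.log2(lags / bias_period) / spread) ** 2)
    w[0] = 0.0                     # never pick the zero-lag peak
    return lags[np.argmax(ac * w)]

# An impulse train with 0.5 s spacing gives an estimated period of 0.5 s
frame_rate = 100.0
env = np.zeros(1000)
env[::50] = 1.0
print(tempo_period(env, frame_rate))   # 0.5
```

The window is what suppresses octave errors: the 1.0 s harmonic of the autocorrelation is down-weighted relative to the 0.5 s peak.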

Beat Tracking by Dynamic Programming
To optimize
C({t_i}) = Σ_{i=1}^{N} O(t_i) + Σ_{i=2}^{N} F(t_i − t_{i−1}, τ_p)
define C*(t) as the best score up to time t, then build up recursively (with traceback P(t)):
C*(t) = O(t) + max_τ { α F(t − τ, τ_p) + C*(τ) }
P(t) = argmax_τ { α F(t − τ, τ_p) + C*(τ) }
The final beat sequence {t_i} is the best C* plus back-trace.
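The recurrence for C*(t) and P(t) maps onto a short dynamic program. A sketch in the spirit of the slide; the squared-log form of the transition penalty F and the search range of half to twice the period are assumptions:

```python
import numpy as np

def beat_track(onset, period, alpha=100.0):
    """DP beat tracker: C*(t) = O(t) + max_tau [alpha*F(t-tau) + C*(tau)],
    with F a squared-log penalty for deviating from the ideal spacing."""
    n = len(onset)
    score = onset.astype(float)
    backlink = -np.ones(n, dtype=int)
    for t in range(n):
        lo, hi = max(0, t - 2 * period), max(0, t - period // 2)
        if hi <= lo:
            continue
        taus = np.arange(lo, hi)
        trans = -alpha * np.log(np.maximum(t - taus, 1) / period) ** 2
        best = int(np.argmax(trans + score[taus]))
        score[t] += trans[best] + score[taus[best]]
        backlink[t] = taus[best]
    # trace back from the best-scoring end point
    beats = [int(np.argmax(score))]
    while backlink[beats[-1]] >= 0:
        beats.append(int(backlink[beats[-1]]))
    return beats[::-1]

onset = np.zeros(200)
onset[::20] = 1.0                  # strong onsets every 20 frames
beats = beat_track(onset, period=20)
print(np.diff(beats))              # spacing equals the 20-frame period
```

Because each C*(t) only looks back over a bounded window of τ, the whole search is linear in the track length.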

Beat Tracking Results
Prefers drums and steady tempo.
[Figure: beat tracking on "Soul Eyes": onset envelope and per-frame period estimates.]

Beat-Synchronous Chroma
Record one chroma vector per beat: a compact representation of the harmonies.
[Figure: "Freddie Freeloader" spectrogram, beat-synchronous chromagram, resynthesis, and resynthesis + MFCC envelope.]
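Given frame-level chroma and the beat times from the tracker, the one-vector-per-beat reduction is a simple average over each inter-beat segment:

```python
import numpy as np

def beat_sync(chroma_frames, beat_frames):
    """Average frame-level chroma (n_frames x 12) over each inter-beat
    interval, giving one 12-dim vector per beat."""
    segments = zip(beat_frames[:-1], beat_frames[1:])
    return np.array([chroma_frames[a:b].mean(axis=0) for a, b in segments])

frames = np.tile(np.eye(12), (5, 1))     # 60 frames cycling through classes
beats = [0, 12, 24, 36, 48, 60]
bs = beat_sync(frames, beats)
print(bs.shape)                          # (5, 12)
```

This is where the compactness comes from: a few hundred beat vectors stand in for tens of thousands of frames, and the representation becomes tempo-invariant.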

Chord Recognition
Beat-synchronous chroma look like chords (C-E-G, B-D-G, A-C-E, A-C-D-F, ...): can we transcribe them?
Two approaches: manual templates (prior knowledge) or learned models (from training data).

Chord Recognition System (Sheh & Ellis 2003)
Analogous to speech recognition: Gaussian models of the features for each chord, plus hidden Markov models for the chord transitions.
Test: audio → beat track → band-pass filter → chroma → beat-synchronous chroma features → HMM Viterbi → chord labels.
Train: band-pass chroma → root normalize → 24 Gaussian models (major and minor on each root) → unnormalize; labels resampled to the beat grid; count transitions → 24×24 transition matrix.
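The decode stage is ordinary Viterbi over per-beat chord log-likelihoods. A self-contained sketch with a toy two-chord model; the real system uses 24 Gaussian chord models and a learned 24×24 transition matrix:

```python
import numpy as np

def viterbi(log_obs, log_trans, log_init):
    """Viterbi decode: log_obs is (n_beats, n_chords), log_trans is
    (n_chords, n_chords) with [i, j] = log P(chord j | previous chord i)."""
    n, k = log_obs.shape
    delta = log_init + log_obs[0]
    psi = np.zeros((n, k), dtype=int)
    for t in range(1, n):
        cand = delta[:, None] + log_trans      # previous chord -> next chord
        psi[t] = np.argmax(cand, axis=0)
        delta = cand[psi[t], np.arange(k)] + log_obs[t]
    path = [int(np.argmax(delta))]
    for t in range(n - 1, 0, -1):              # trace back the best path
        path.append(int(psi[t][path[-1]]))
    return path[::-1]

# Toy model: two chords, self-transitions favoured ("sticky" chords)
log_trans = np.log(np.array([[0.9, 0.1], [0.1, 0.9]]))
log_init = np.log(np.array([0.5, 0.5]))
log_obs = np.log(np.array([[0.9, 0.1], [0.9, 0.1], [0.1, 0.9],
                           [0.1, 0.9], [0.1, 0.9]]))
print(viterbi(log_obs, log_trans, log_init))   # [0, 0, 1, 1, 1]
```

The sticky transition prior is what smooths over noisy per-beat likelihoods: a single outlier beat is not enough to pay the cost of two chord changes.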

Chord Recognition Results
Often works, e.g. on "Let It Be": ground truth C, G, A:min, A:min/b7, F:maj7, F:maj6, C, G, F, C ... against the recognized sequence.
But only about 60% of the time.
[Figure: spectrogram, ground-truth chord labels, beat-synchronous chroma, and recognized chords for "Let It Be".]

What Did the Models Learn?
Chord model centers (means) indicate chord templates.
[Figure: PCP_ROT family model means for C-root chords of each family: dim, dom7, maj, min, min7.]

Chords for Jazz
How many chord types are needed?
[Figure: "Freddie Freeloader" log-frequency spectrogram, beat-synchronous chroma, chord likelihoods with Viterbi path, and chord-based chroma reconstruction.]

Future Work
Matching items: cover songs / standards; similar instruments and styles.
[Figure: beat-chroma matrices (power 0.25) of "Between the Bars" by Elliott Smith and by Glenn Phillips, transposed 2 semitones and beat-aligned; their pointwise product scores the match.]
Analyzing musical content: solo transcription and modeling; musical structure.
And so much more...
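The beat-chroma matching idea, power-compress, try every transposition and beat offset, and score by the pointwise product, can be sketched as below. The sparse binary chromagram is synthetic stand-in data, and the exhaustive search is an illustrative simplification:

```python
import numpy as np

def cover_match(a, b, power=0.25):
    """Compare two beat-synchronous chroma matrices (n_beats x 12):
    compress dynamics, try all 12 circular transpositions and all beat
    offsets, and score each alignment by the summed pointwise product."""
    a = a ** power
    b = b ** power
    best, best_args = -np.inf, (0, 0)
    for trsp in range(12):
        bt = np.roll(b, trsp, axis=1)          # circular pitch transposition
        for off in range(-len(b) + 1, len(a)):
            lo, hi = max(0, off), min(len(a), off + len(b))
            score = float(np.sum(a[lo:hi] * bt[lo - off:hi - off]))
            if score > best:
                best, best_args = score, (trsp, off)
    return best, best_args

rng = np.random.default_rng(1)
a = (rng.random((40, 12)) > 0.7).astype(float)  # sparse binary "beat chroma"
b = np.roll(a[5:35], 3, axis=1)                 # same passage, up 3 semitones
score, (trsp, off) = cover_match(a, b)
print(trsp, off)                                # 9 5  (9 is -3 mod 12)
```

A real cover-song system replaces the inner loops with cross-correlation for speed, but the score being maximized is this same product.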

Summary: Finding Musical Similarity at Large Scale
Low-level features → classification and similarity: browsing, discovery, production.
Music audio → melody and notes, key and chords, tempo and beat → music structure discovery: modeling, generation, curiosity.