Extracting Information from Music Audio


Dan Ellis
Laboratory for Recognition and Organization of Speech and Audio (LabROSA)
Dept. Electrical Engineering, Columbia University, NY USA
http://labrosa.ee.columbia.edu/

1. Motivation: Learning Music
2. Notes Extraction
3. Drum Pattern Modeling
4. Music Similarity

Music Information Extraction - Ellis 2006-05-22 p. 1 /35

LabROSA Overview
[Diagram: Information Extraction, Machine Learning and Signal Processing; Recognition, Separation and Retrieval; across Music, Speech and Environment.]

1. Learning from Music
A lot of music data is available: e.g. 60G of MP3, about 1000 hours of audio, 15k tracks. What can we do with it? It gives an implicit definition of music.
Quality vs. quantity. The speech recognition lesson: 10x data with 1/10th the annotation is twice as useful.
Motivating applications: music similarity (recommendation, playlists); computer (assisted) music generation; insight into music.
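As a sanity check on the "60G of MP3, about 1000 hours, 15k tracks" figures, the arithmetic below assumes a constant 128 kbit/s bitrate and reads "60G" as 60 decimal gigabytes; neither assumption is stated on the slide.

```python
# Assumptions (not from the slide): 128 kbit/s MP3, "60G" = 60e9 bytes.
BITRATE_BPS = 128_000
COLLECTION_BYTES = 60e9
N_TRACKS = 15_000

seconds = COLLECTION_BYTES * 8 / BITRATE_BPS   # total playing time
hours = seconds / 3600
minutes_per_track = seconds / 60 / N_TRACKS

print(f"{hours:.0f} hours, {minutes_per_track:.1f} min/track")  # prints "1042 hours, 4.2 min/track"
```

Both numbers are consistent with the slide: roughly 1000 hours, and about 4 minutes per pop track.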

Ground Truth Data
A lot of unlabeled music data is available, but manual annotation is expensive and rare. Unsupervised structure discovery is possible, but labels help to indicate what you want.
Weak annotation sources: artist-level descriptions; symbol sequences without timing (MIDI); errorful transcripts.
Evaluation requires ground truth: is it the limiting factor in Music IR evaluations?
[Figure: annotated spectrogram of aimee.wav, frequency 0 to 7000 Hz vs. time 0:00 to 0:18, with "vox"/"mus" segment labels.]

Talk Roadmap
[Diagram with nodes: Music audio; Melody extraction; Drums extraction; Event extraction?; Fragment clustering; Eigenrhythms; Semantic bases; Anchor models; Similarity/recommendation; Synthesis/generation. Numbers 1 to 4 mark where the talk's sections fall.]

2. Notes Extraction (with Graham Poliner)
Audio → Score is very desirable, for data compression, searching, and learning. A full solution is elusive: it requires signal separation of overlapping voices, and music is constructed to frustrate this! Maybe simplify the problem: find the dominant melody at each time frame.
[Figure: spectrogram, frequency 0 to 4000 Hz vs. time 0 to 5 s.]

Conventional Transcription
Pitched notes have harmonic spectra, so transcribe by searching for harmonics, e.g. sinusoid modeling + grouping. This relies on explicit, expert-derived knowledge.
[Figure: spectrogram, frequency 0 to 3000 Hz vs. time 0 to 4 s.]
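To make "search for harmonics" concrete, here is a minimal harmonic-sum pitch estimator for a single frame. It is a generic textbook technique standing in for the expert-designed systems the slide refers to, not any particular system's method; the function name, the 5-harmonic limit and the semitone candidate grid are all assumptions.

```python
import numpy as np

def harmonic_sum_pitch(frame, sr, f0_candidates, n_harmonics=5):
    """Return the candidate f0 whose harmonics carry the most spectral magnitude."""
    n = len(frame)
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(n)))
    scores = []
    for f0 in f0_candidates:
        # Sum magnitude at the bins nearest to f0, 2*f0, ..., n_harmonics*f0 (below Nyquist)
        harmonics = f0 * np.arange(1, n_harmonics + 1)
        harmonics = harmonics[harmonics < sr / 2]
        bins = np.round(harmonics * n / sr).astype(int)
        scores.append(spectrum[bins].sum())
    return f0_candidates[int(np.argmax(scores))]

# Usage: a 220 Hz tone with 5 harmonics (amplitude 1/h), searched over a semitone grid (MIDI 36..84)
sr = 16000
t = np.arange(4096) / sr
tone = sum(np.sin(2 * np.pi * 220.0 * h * t) / h for h in range(1, 6))
grid = 440.0 * 2.0 ** ((np.arange(36, 85) - 69) / 12.0)
estimate = harmonic_sum_pitch(tone, sr, grid)
```

Summing over several harmonics is what protects against picking the octave below, whose harmonics only partly overlap the true ones.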

Transcription as Classification
Signal models are typically used for transcription (harmonic spectrum, superposition). But we can trade domain knowledge for data, and treat transcription as a pure classification problem: Audio → trained classifier → p("c0" | Audio), p("c#0" | Audio), p("d0" | Audio), p("d#0" | Audio), p("e0" | Audio), p("f0" | Audio), ...
Use a single N-way discrimination for melody, and per-note classifiers for polyphonic transcription.
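A small sketch of the two decision rules above. Reference pitches are quantized to note-class labels, melody takes the single most probable class, and polyphony thresholds each note detector independently. The MIDI octave-numbering convention (69 = a4 = 440 Hz), the 0.5 threshold and all function names are assumptions for illustration.

```python
import math

PITCH_CLASSES = ["c", "c#", "d", "d#", "e", "f", "f#", "g", "g#", "a", "a#", "b"]

def freq_to_label(f_hz):
    """Quantize an f0 to the nearest semitone and name it (assumed MIDI convention: 69 = a4 = 440 Hz)."""
    midi = int(round(69 + 12 * math.log2(f_hz / 440.0)))
    return PITCH_CLASSES[midi % 12] + str(midi // 12 - 1)

def melody_decision(posteriors):
    """Melody: one N-way decision, i.e. the single most probable note class in this frame."""
    return max(posteriors, key=posteriors.get)

def polyphonic_decision(posteriors, threshold=0.5):
    """Polyphony: each per-note detector decides on its own, so several notes may be active."""
    return sorted(label for label, p in posteriors.items() if p > threshold)

frame_posteriors = {"c0": 0.1, "c#0": 0.7, "d0": 0.2}   # hypothetical classifier outputs
print(freq_to_label(440.0), melody_decision(frame_posteriors))  # prints "a4 c#0"
```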

Melody Transcription Features
Short-time Fourier transform magnitude (spectrogram), standardized over a 50-point frequency window.
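One plausible reading of "standardize over a 50 pt frequency window" is to subtract a sliding local mean and divide by a sliding local standard deviation along frequency, separately for each frame. The sketch below does that. The FFT size, hop and Hann window are assumptions, and zero padding at the spectrum edges slightly biases the statistics there.

```python
import numpy as np

def stft_magnitude(x, n_fft=1024, hop=512):
    """Magnitude spectrogram, shape (n_bins, n_frames), using a Hann window."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)).T

def local_standardize(S, width=50, eps=1e-8):
    """Normalize each frame by the mean and std over a sliding `width`-bin frequency window."""
    kernel = np.ones(width) / width
    out = np.empty_like(S)
    for t in range(S.shape[1]):
        col = S[:, t]
        mean = np.convolve(col, kernel, mode="same")        # local mean per bin
        sq = np.convolve(col ** 2, kernel, mode="same")     # local mean of squares
        std = np.sqrt(np.maximum(sq - mean ** 2, 0.0)) + eps
        out[:, t] = (col - mean) / std
    return out

sr = 16000
x = np.sin(2 * np.pi * 440.0 * np.arange(sr) / sr)   # 1 s test tone
S = stft_magnitude(x)
Z = local_standardize(S)
```

Local standardization removes slowly varying spectral tilt and overall level, so the classifier sees relative peak structure rather than loudness.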

Training Data
Need {data, label} pairs for classifier training. Sources: pre-mixing multitrack recordings + hand-labeling? Synthetic music (MIDI) + forced alignment?
[Figure: spectrograms, frequency 0 to 2 kHz vs. time 0 to 3.5 s.]
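The slide only raises forced alignment as a question. One common way to align a MIDI synthesis to a real recording is dynamic time warping (DTW) over per-frame features. Once aligned, the MIDI note labels can be transferred to the recording's frames. This is a plain DTW sketch with Euclidean frame distances; the feature choice is left open and the names are hypothetical.

```python
import numpy as np

def dtw_path(X, Y):
    """Align feature sequences X (n, d) and Y (m, d); return the optimal list of (i, j) pairs."""
    n, m = len(X), len(Y)
    cost = np.sqrt(((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2))
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i, j] = cost[i - 1, j - 1] + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    # Backtrack from the end, always stepping to the cheapest predecessor
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        i, j = min([(i - 1, j - 1), (i - 1, j), (i, j - 1)], key=lambda s: D[s])
        path.append((i - 1, j - 1))
    return path[::-1]

# Usage: the "recording" is the MIDI feature sequence played at half speed
midi_feats = np.array([[0.0], [1.0], [2.0], [3.0]])
audio_feats = np.repeat(midi_feats, 2, axis=0)
path = dtw_path(midi_feats, audio_feats)
```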

Melody Transcription Results
Trained on 17 examples, plus transpositions out to ±6 semitones. All-pairs SVMs (Weka). Tested on the ISMIR MIREX 2005 set, which includes foreground/background detection.

Rank  Participant  Overall Accuracy  Voicing d'  Raw Pitch  Raw Chroma  Runtime / s
1     Dressler     71.4%             1.85        68.1%      71.4%       32
2     Ryynänen     64.3%             1.56        68.6%      74.1%       10970
3     Poliner      61.1%             1.56        67.3%      73.4%       5471
3     Paiva 2      61.1%             1.22        58.5%      62.0%       45618
5     Marolt       59.5%             1.06        60.1%      67.1%       12461
6     Paiva 1      57.8%             0.83        62.7%      66.7%       44312
7     Goto         49.9%*            0.59*       65.8%      71.8%       211
8     Vincent 1    47.9%*            0.23*       59.8%      67.6%       ?
9     Vincent 2    46.4%*            0.86*       59.6%      71.1%       251
10    Brossier     3.2%*             0.14*       3.9%       8.1%        41

Example...
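The columns above can be approximated as follows. Raw pitch counts voiced reference frames whose estimate lies within 50 cents. Raw chroma also forgives octave errors. Overall accuracy counts every frame, voiced or not. Voicing d' is Z(hit rate) minus Z(false-alarm rate). This is a simplified sketch: it marks unvoiced frames with 0 Hz and clips rates away from 0 and 1, and the official MIREX scoring differs in details (for example, how pitch guesses on frames judged unvoiced are handled).

```python
import math
from statistics import NormalDist

def melody_scores(ref, est, tol_cents=50.0, eps=1e-3):
    """ref, est: per-frame f0 in Hz, with 0 meaning 'unvoiced'. Assumes both classes occur in ref."""
    n_voiced = sum(1 for r in ref if r > 0)
    n_unvoiced = len(ref) - n_voiced
    pitch_ok = chroma_ok = overall_ok = hits = false_alarms = 0
    for r, e in zip(ref, est):
        if r > 0:
            if e > 0:
                hits += 1
                cents = 1200 * math.log2(e / r)
                if abs(cents) <= tol_cents:
                    pitch_ok += 1
                    overall_ok += 1
                # Chroma: distance to the nearest whole number of octaves (1200 cents)
                if abs(cents - 1200 * round(cents / 1200)) <= tol_cents:
                    chroma_ok += 1
        elif e > 0:
            false_alarms += 1
        else:
            overall_ok += 1
    z = NormalDist().inv_cdf
    clip = lambda p: min(max(p, eps), 1 - eps)   # inv_cdf needs 0 < p < 1
    d_prime = z(clip(hits / n_voiced)) - z(clip(false_alarms / n_unvoiced))
    return {"raw_pitch": pitch_ok / n_voiced, "raw_chroma": chroma_ok / n_voiced,
            "overall": overall_ok / len(ref), "voicing_d": d_prime}

# Frame 2 is an octave error; frame 4 is a voicing false alarm
scores = melody_scores([0, 220, 220, 440, 0], [0, 221, 440, 440, 100])
```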

Polyphonic Transcription
Train SVM detectors for every piano note: same features and classifier, but different labels. 88 separate detectors, with independent smoothing. Use MIDI syntheses and player-piano recordings; about 30 min of training data.
[Figure: Bach 847 on Disklavier; piano roll, pitch A1 to A6 vs. time 0 to 9 s, level in dB.]
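"Independent smoothing" means each note's detector output is cleaned up on its own. In the sketch below, a per-note median filter over thresholded outputs stands in for whatever smoothing the original system used. The threshold and the 5-frame window are assumptions.

```python
import numpy as np

N_NOTES = 88  # one detector per piano key

def smooth_piano_roll(posteriors, threshold=0.5, win=5):
    """posteriors: (88, n_frames) detector outputs in [0, 1].
    Each note row is thresholded, then median-filtered along time independently of the others."""
    assert posteriors.shape[0] == N_NOTES and win % 2 == 1
    active = (posteriors > threshold).astype(float)
    half = win // 2
    padded = np.pad(active, ((0, 0), (half, half)), mode="edge")
    n = active.shape[1]
    # Stack the win shifted copies so the median runs over a centered window for every note at once
    windows = np.stack([padded[:, i:i + n] for i in range(win)], axis=0)
    return np.median(windows, axis=0) > 0.5

post = np.zeros((N_NOTES, 100))
post[40, 10:30] = 0.9   # a sustained note
post[5, 50] = 0.9       # a one-frame spurious detection
roll = smooth_piano_roll(post)
```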

Piano Transcription Results
Significant improvement from the classifier. Frame-level accuracy results:

Algorithm          Errs    False Pos  False Neg  d'
SVM                43.3%   27.9%      15.4%      3.44
Klapuri&Ryynänen   66.6%   28.1%      38.5%      2.71
Marolt             84.6%   36.5%      48.1%      2.35

[Figure: breakdown by frame type: classification error (%), split into false negatives and false positives, vs. number of notes present (0 to 8).]
http://labrosa.ee.columbia.edu/projects/melody/
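In every row of the table, Errs equals False Pos + False Neg (e.g. 27.9% + 15.4% = 43.3%), which suggests both rates share one denominator. The sketch below assumes that denominator is the number of active note-frames in the reference piano roll. That is an inference from the table, not something the slide states.

```python
import numpy as np

def frame_errors(ref, est):
    """ref, est: boolean piano rolls (notes x frames). Rates are per active reference note-frame."""
    ref = np.asarray(ref, dtype=bool)
    est = np.asarray(est, dtype=bool)
    n_ref = ref.sum()
    false_pos = np.logical_and(est, ~ref).sum() / n_ref   # notes reported but not present
    false_neg = np.logical_and(ref, ~est).sum() / n_ref   # notes present but missed
    return false_pos + false_neg, false_pos, false_neg

ref = [[1, 1, 0, 0], [0, 1, 1, 0]]
est = [[1, 0, 0, 1], [0, 1, 1, 0]]
errs, fp, fn = frame_errors(ref, est)   # one miss and one extra note out of 4 reference note-frames
```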

Melody Clustering
Goal: find fragments that recur in melodies, across a large music database; trade data for model sophistication.
Pipeline: training data → melody extraction → 5-second fragments → VQ clustering → top clusters.
Data sources: pitch tracker, or MIDI training data.
Melody fragment representation: DCT(1:20), which removes the average and smooths detail.
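A sketch of the fragment representation and the VQ step. Following the slide's note that the representation "removes average", it keeps 20 low-order DCT-II coefficients but excludes the DC term, which makes fragments transposition-invariant. Plain k-means stands in for the VQ. The frame rate, contours and function names are hypothetical.

```python
import numpy as np

def fragment_features(contour, n_coefs=20):
    """contour: pitch track in semitones for one fragment. Returns orthonormal DCT-II coefs 1..n_coefs."""
    n = len(contour)
    k = np.arange(1, n_coefs + 1)[:, None]   # skip k = 0, the DC (average pitch) term
    j = np.arange(n)[None, :]
    basis = np.sqrt(2.0 / n) * np.cos(np.pi * k * (j + 0.5) / n)
    return basis @ contour

def vq_cluster(X, k, n_iter=20):
    """Plain k-means; farthest-point initialization keeps it deterministic."""
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        labels = ((X[:, None, :] - centers[None]) ** 2).sum(axis=2).argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = X[labels == c].mean(axis=0)
    return centers, labels

frames = 250                            # hypothetical: 5 s at 50 frames/s
rising = np.linspace(60.0, 67.0, frames)
falling = rising[::-1]
X = np.array([fragment_features(c + shift) for c in (rising, falling) for shift in (0, 3, 7)])
centers, labels = vq_cluster(X, k=2)
```

Transposed copies of the same contour map to identical features, so clusters group melodic shapes rather than keys.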

Melody Clustering Results
Clusters match the underlying contour. Some interesting matches: e.g. Pink + Nsync.

3. Eigenrhythms: Drum Pattern Space (with John Arroyo)
Pop songs are built on a repeating drum loop, with variations on a few bass, snare, and hi-hat patterns. Eigen-analysis (or ...) to capture the variations? By analyzing lots of (MIDI) data, or from audio.
Applications: music categorization; "beat box" synthesis; insight.

Aligning the Data
Patterns must be aligned prior to modeling: tempo (stretch), by inferring BPM and normalizing; downbeat (shift), by correlating against a mean template.
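A minimal sketch of both alignment steps for one drum's onset envelope. Tempo is normalized by cutting one bar at the inferred BPM and resampling it to a fixed length. The downbeat is found as the circular shift that maximizes correlation with the mean template. The 4-beat bar and 400-sample length are assumptions; 3 drums × 400 samples would give the 1200 dimensions quoted for the PCA.

```python
import numpy as np

def normalize_tempo(envelope, frame_rate, bpm, beats=4, n_out=400):
    """Cut `beats` beats (at the inferred bpm) from an onset envelope and resample to n_out samples."""
    frames_per_beat = frame_rate * 60.0 / bpm
    segment = envelope[:int(round(beats * frames_per_beat))]
    return np.interp(np.linspace(0, len(segment) - 1, n_out), np.arange(len(segment)), segment)

def align_downbeat(pattern, template):
    """Circularly shift `pattern` to best correlate with `template`; return (aligned, shift)."""
    n = len(template)
    scores = [np.dot(np.roll(pattern, -s), template) for s in range(n)]
    best = int(np.argmax(scores))
    return np.roll(pattern, -best), best

bar = normalize_tempo(np.arange(1000.0), frame_rate=100, bpm=120)   # 4 beats = 200 frames -> 400 samples

rng = np.random.default_rng(0)
template = rng.random(400)                 # hypothetical mean template
shifted = np.roll(template, 37)            # same pattern, downbeat displaced by 37 samples
aligned, shift = align_downbeat(shifted, template)
```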

Eigenrhythms (PCA)
Need 20+ eigenvectors for good coverage of 100 training patterns (1200 dims). Eigenrhythms both add and subtract.
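Eigenrhythms as principal components can be computed with an SVD of the mean-removed pattern matrix (100 patterns × 1200 dims). The cumulative explained variance shows how many eigenvectors are needed for a given coverage. The data here is synthetic and hypothetical, so the component count it yields says nothing about real drum patterns.

```python
import numpy as np

def eigenrhythms(patterns, n_components):
    """patterns: (n_patterns, n_dims). Returns mean, top eigenvectors (rows), cumulative explained variance."""
    mean = patterns.mean(axis=0)
    _, s, vt = np.linalg.svd(patterns - mean, full_matrices=False)
    explained = np.cumsum(s ** 2) / np.sum(s ** 2)
    return mean, vt[:n_components], explained

def n_components_for(explained, fraction):
    """Smallest number of eigenvectors reaching the requested fraction of variance."""
    return int(np.searchsorted(explained, fraction)) + 1

rng = np.random.default_rng(1)
patterns = rng.standard_normal((100, 5)) @ rng.standard_normal((5, 1200))
patterns += 0.01 * rng.standard_normal((100, 1200))
mean, eig, explained = eigenrhythms(patterns, 20)
coeffs = (patterns - mean) @ eig.T          # each pattern as 20 eigenrhythm weights
```

The eigenvectors have entries of both signs, which is why eigenrhythms "both add and subtract" beat-weight.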

Posirhythms (NMF)
[Figure: Posirhythms 1 to 6, each showing hi-hat (HH), snare (SN) and bass drum (BD) weights over 0 to 400 samples (beats 1 to 4).]
Nonnegative: posirhythms only add beat-weight. Capturing some structure.
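Non-negative matrix factorization can be sketched with the standard Lee-Seung multiplicative updates for squared error. Each row of H is then a "posirhythm" that can only add beat-weight. The 6 components mirror the six panels in the figure. The random input and the 3 × 400 reshaping are assumptions.

```python
import numpy as np

def nmf(V, r, n_iter=200, seed=0, eps=1e-9):
    """Lee-Seung multiplicative updates minimizing ||V - W H||_F^2 with V, W, H >= 0."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], r)) + eps
    H = rng.random((r, V.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # updates stay non-negative by construction
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

rng = np.random.default_rng(2)
patterns = rng.random((100, 1200))        # hypothetical non-negative patterns, 3 drums x 400 samples
W_init, H_init = nmf(patterns, r=6, n_iter=1)
W, H = nmf(patterns, r=6, n_iter=200)
posirhythms = H.reshape(6, 3, 400)        # one (drum x time) pattern per component
err_init = np.linalg.norm(patterns - W_init @ H_init)
err_final = np.linalg.norm(patterns - W @ H)
```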

Eigenrhythms for Classification
[Figure: projections in eigenspace and LDA space for the genres blues, country, disco, hiphop, house, newwave, rock, pop, punk and rnb; PCA(1,2) projection (16% corr), LDA(1,2) projection (33% corr).]
10-way genre classification (nearest neighbor): PCA3: 20% correct; LDA4: 36% correct.
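The evaluation pattern is to project patterns onto a few directions (3 PCA or 4 LDA dimensions on the slide), then label each test pattern with the genre of its nearest training pattern. A minimal sketch, with the projection basis passed in; the toy data and names are hypothetical.

```python
import numpy as np

def nearest_neighbor_accuracy(train_X, train_y, test_X, test_y, basis, mean):
    """Project onto the rows of `basis` (e.g. leading PCA or LDA directions), then 1-NN classify."""
    p_train = (train_X - mean) @ basis.T
    p_test = (test_X - mean) @ basis.T
    d = ((p_test[:, None, :] - p_train[None, :, :]) ** 2).sum(axis=2)
    predicted = train_y[d.argmin(axis=1)]
    return float((predicted == test_y).mean())

train_X = np.array([[0.0, 0.0, 0.0], [10.0, 10.0, 0.0]])
train_y = np.array([0, 1])
test_X = np.array([[1.0, 1.0, 50.0], [9.0, 9.0, -50.0]])
test_y = np.array([0, 1])
accuracy = nearest_neighbor_accuracy(train_X, train_y, test_X, test_y, np.eye(3)[:2], np.zeros(3))
```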

Eigenrhythm BeatBox
Resynthesize rhythms from eigen-space.

4. Music Similarity (with Mike Mandel and Adam Berenzweig)
Can we predict which songs sound alike to a listener, based on the audio waveforms? There are many aspects to subjective similarity.
Applications: query-by-example; automatic playlist generation; discovering new music.
Problems: the right representation; modeling individual similarity.

Music Similarity Features
Need timbral features: Mel-Frequency Cepstral Coefficients (MFCCs): auditory-like frequency warping; log-domain; discrete cosine transform orthogonalization.
[Figure: spectrum, mel-frequency spectrum, and mel-frequency cepstral coefficients.]
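The three MFCC steps named above map directly onto code: a mel-spaced triangular filterbank (auditory-like warping), a log, and a DCT that decorrelates the bands. This is a generic NumPy sketch, not the feature code used in the talk. The FFT size, hop, 20 filters and 13 coefficients are common but assumed choices.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters equally spaced on the mel scale."""
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):
            fb[i - 1, k] = (k - left) / (center - left)
        for k in range(center, right):
            fb[i - 1, k] = (right - k) / (right - center)
    return fb

def mfcc(x, sr, n_fft=512, hop=256, n_filters=20, n_ceps=13):
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    log_mel = np.log(power @ mel_filterbank(n_filters, n_fft, sr).T + 1e-10)   # log domain
    k = np.arange(n_ceps)[:, None]
    j = np.arange(n_filters)[None, :]
    dct = np.cos(np.pi * k * (j + 0.5) / n_filters)   # DCT-II basis
    return log_mel @ dct.T                            # (n_frames, n_ceps)

rng = np.random.default_rng(0)
x = rng.standard_normal(16000)            # 1 s of noise at 16 kHz
C = mfcc(x, 16000)
C_louder = mfcc(2.0 * x, 16000)
```

Because of the log and DCT, a gain change only shifts coefficient 0. That is one reason c0 is often dropped for timbre similarity.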

Timbral Music Similarity
Measure the similarity of feature distributions, i.e. collapse across time to get a density p(x_i), and compare, e.g. by KL divergence.
E.g. artist identification: learn an artist model p(x_i | artist X) (e.g. as a GMM), then classify an unknown song to the closest model.
[Diagram: training MFCCs → GMMs for Artist 1, Artist 2, ...; the test song is compared to each model by KL; the minimum gives the artist.]
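KL divergence between GMMs has no closed form, so this sketch swaps in a single full-covariance Gaussian per artist and per song, where it does. Symmetrized KL then picks the closest artist model. The regularization constant, the synthetic "MFCC" frames and the artist names are hypothetical.

```python
import numpy as np

def fit_gaussian(frames, reg=1e-3):
    """Collapse a (T, d) feature sequence across time into a mean and a regularized covariance."""
    mu = frames.mean(axis=0)
    cov = np.cov(frames, rowvar=False) + reg * np.eye(frames.shape[1])
    return mu, cov

def kl_gauss(p, q):
    """KL(p || q) for multivariate Gaussians given as (mean, covariance)."""
    (mu_p, cov_p), (mu_q, cov_q) = p, q
    inv_q = np.linalg.inv(cov_q)
    diff = mu_q - mu_p
    logdet_p = np.linalg.slogdet(cov_p)[1]
    logdet_q = np.linalg.slogdet(cov_q)[1]
    return 0.5 * (np.trace(inv_q @ cov_p) + diff @ inv_q @ diff - len(mu_p) + logdet_q - logdet_p)

def identify_artist(song_frames, artist_models):
    song = fit_gaussian(song_frames)
    return min(artist_models, key=lambda a: kl_gauss(song, artist_models[a]) + kl_gauss(artist_models[a], song))

rng = np.random.default_rng(0)
artist_models = {"artist1": fit_gaussian(rng.normal(0.0, 1.0, (2000, 13))),
                 "artist2": fit_gaussian(rng.normal(3.0, 1.0, (2000, 13)))}
predicted = identify_artist(rng.normal(3.0, 1.0, (500, 13)), artist_models)
```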

Anchor Space
Acoustic features describe each song, but from a signal, not a perceptual, perspective, and not the differences between songs. Use genre classifiers to define a new space: prototype genres are anchors.
[Diagram: audio input (class i) and audio input (class j) → conversion to anchor space, where n anchor classifiers give an n-dimensional vector [p(a_1|x), p(a_2|x), ..., p(a_n|x)] → GMM modeling → similarity computation (KL-d, EMD, etc.).]
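Converting to anchor space means replacing each feature frame x with the vector of anchor posteriors p(a_k | x). The sketch models each anchor as a diagonal Gaussian with equal priors. For brevity it compares songs by the distance between their mean anchor vectors, a simplification of the slide's GMM + KL/EMD stage. All anchor models and song data are hypothetical.

```python
import numpy as np

def anchor_posteriors(frames, means, variances):
    """frames: (T, d); means, variances: (n_anchors, d) diagonal Gaussians.
    Returns (T, n_anchors) posteriors p(a_k | x), assuming equal anchor priors."""
    diff = frames[:, None, :] - means[None, :, :]
    loglik = -0.5 * (np.log(2 * np.pi * variances)[None] + diff ** 2 / variances[None]).sum(axis=2)
    loglik -= loglik.max(axis=1, keepdims=True)   # stabilize before exponentiating
    post = np.exp(loglik)
    return post / post.sum(axis=1, keepdims=True)

means = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])   # three hypothetical anchor classes
variances = np.ones_like(means)
song_a = np.array([[0.1, -0.2], [0.3, 0.2]])
song_b = np.array([[4.8, 5.1], [5.2, 4.9]])
pa = anchor_posteriors(song_a, means, variances)
pb = anchor_posteriors(song_b, means, variances)
distance = np.linalg.norm(pa.mean(axis=0) - pb.mean(axis=0))
```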

Anchor Space
Frame-by-frame high-level categorizations. Compare to raw features? Properties in distributions? Dynamics?
[Figure: frames from Madonna and Bowie shown in cepstral features (third vs. fifth cepstral coefficient) and in anchor-space features (Electronica vs. Country).]

Playola Similarity Browser

Ground-truth Data
It is hard to evaluate Playola's accuracy: user tests... ground truth?
Musicseer online survey: ran for 9 months in 2002; > 1,000 users, > 20k judgments.
http://labrosa.ee.columbia.edu/projects/musicsim/

Evaluation
Compare classifier measures against Musicseer subjective results:
Triplet agreement percentage.
Top-N ranking agreement score: $s_i = \sum_{r=1}^{N} \alpha_r^{\,r}\,\alpha_c^{\,k_r}$, with $\alpha_r = (1/2)^{1/3}$ and $\alpha_c = \alpha_r^2$, where $k_r$ is the rank the measure gives to the r-th item of the subjective ranking. (Average Dynamic Recall? (Typke et al.))
First-place agreement percentage, with a simple significance test.
[Figure: bar chart of agreement (%) for the measures cei, cmb, erd, e3d, opn, kn2, rnd and ANK, against four subjective data sets: SrvKnw 4789x3.58, SrvAll 6178x8.93, GamKnw 7410x3.96, GamAll 7421x8.92.]
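The two agreement measures can be sketched as below. With perfect agreement (k_r = r), each term is $(\alpha_r^3)^r = 2^{-r}$, so the top-N score approaches 1 as N grows. Triplets are assumed to be (target, chosen, other), where the listener judged `chosen` closer to the target. Items missing from the measure's ranking are assumed to contribute nothing.

```python
ALPHA_R = 0.5 ** (1.0 / 3.0)
ALPHA_C = ALPHA_R ** 2

def top_n_agreement(reference_ranking, measure_ranking, n=10):
    """s_i = sum over r = 1..N of ALPHA_R**r * ALPHA_C**k_r, where k_r is the measure's
    rank of the r-th item in the reference (subjective) ranking."""
    position = {item: k for k, item in enumerate(measure_ranking, start=1)}
    score = 0.0
    for r, item in enumerate(reference_ranking[:n], start=1):
        if item in position:   # assumption: unranked items add nothing
            score += ALPHA_R ** r * ALPHA_C ** position[item]
    return score

def triplet_agreement(triplets, dist):
    """Fraction of (target, chosen, other) judgments where the measure also puts `chosen` closer."""
    agree = sum(1 for t, c, o in triplets if dist(t, c) < dist(t, o))
    return agree / len(triplets)

perfect = top_n_agreement(list("abc"), list("abc"), n=3)    # 1/2 + 1/4 + 1/8
reversed_score = top_n_agreement(list("abc"), list("cba"), n=3)
triplets = triplet_agreement([(0, 1, 5), (0, 4, 2)], lambda a, b: abs(a - b))
```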

Using SVMs for Artist ID
Support Vector Machines (SVMs) find hyperplanes in a high-dimensional space. They rely only on the matrix of distances between points, and are much smarter than nearest-neighbor/overlap. Want diversity of reference vectors...
[Figure: separating hyperplane $(w \cdot x) + b = 0$ between the margin hyperplanes $(w \cdot x) + b = +1$ (class $y_i = +1$, point $x_1$) and $(w \cdot x) + b = -1$ (class $y_i = -1$, point $x_2$).]
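A linear SVM can be trained by Pegasos-style subgradient descent on the regularized hinge loss. This swaps in a simple solver for the quadratic-programming solvers real SVM packages use, and it ignores kernels. A constant feature is appended so its weight acts as the bias b. The two-class blob data is hypothetical.

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, n_epochs=200, seed=0):
    """Minimize lam/2 ||w||^2 + mean hinge loss by stochastic subgradient steps; y in {-1, +1}."""
    Xb = np.hstack([X, np.ones((len(X), 1))])   # last weight plays the role of b
    rng = np.random.default_rng(seed)
    w = np.zeros(Xb.shape[1])
    t = 0
    for _ in range(n_epochs):
        for i in rng.permutation(len(Xb)):
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * (Xb[i] @ w)
            w *= 1.0 - eta * lam                 # shrink: gradient of the regularizer
            if margin < 1:                       # inside the margin: hinge subgradient step
                w += eta * y[i] * Xb[i]
    return w[:-1], w[-1]

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(3.0, 0.5, (20, 2)), rng.normal(-3.0, 0.5, (20, 2))])
y = np.array([1] * 20 + [-1] * 20)
w, b = train_linear_svm(X, y)
accuracy = float((np.sign(X @ w + b) == y).mean())
margin_width = 2.0 / np.linalg.norm(w)           # distance between the +1 and -1 hyperplanes
```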

Song-Level SVM Artist ID
Instead of one model per artist/genre, use every training song as an anchor; then the SVM finds the best support for each artist.
[Diagram: training MFCCs for Artist 1, Artist 2 → song features; test song → distances D to each training song → DAG SVM → artist.]
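The diagram's DAG SVM combines one-vs-one classifiers in a decision DAG. The first and last remaining classes are compared, the loser is eliminated, and this repeats until one class is left, so only k - 1 pairwise decisions are needed for k artists. In this sketch the pairwise "classifiers" are hypothetical nearest-centroid rules standing in for trained SVMs.

```python
def dag_predict(x, classes, decide):
    """Decision DAG over pairwise classifiers.
    decide(a, b, x) returns whichever of a, b the (a, b) classifier prefers."""
    remaining = list(classes)
    while len(remaining) > 1:
        a, b = remaining[0], remaining[-1]
        loser = b if decide(a, b, x) == a else a
        remaining.remove(loser)
    return remaining[0]

# Hypothetical pairwise rules: prefer the artist whose 1-D centroid is nearer
centroids = {"artist1": 0.0, "artist2": 5.0, "artist3": 10.0}

def nearer(a, b, x):
    return a if abs(x - centroids[a]) <= abs(x - centroids[b]) else b

predicted = dag_predict(6.0, list(centroids), nearer)
```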

Artist ID Results
ISMIR/MIREX 2005 also evaluated artist ID: 148 artists, 1800 files (split train/test) from uspop2002. The song-level SVM clearly dominates, using only MFCCs!

MIREX 05 Audio Artist (USPOP2002)
Rank  Participant  Raw Accuracy  Normalized  Runtime / s
1     Mandel       68.3%         68.0%       10240
2     Bergstra     59.9%         60.9%       86400
3     Pampalk      56.2%         56.0%       4321
4     West         41.0%         41.0%       26871
5     Tzanetakis   28.6%         28.5%       2443
6     Logan        14.8%         14.8%       ?
7     Lidy         Did not complete

Playlist Generation
SVMs are well suited to active learning: solicit labels on the items closest to the current boundary. An automatic player with "skip" doubles as ground-truth data collection: active-SVM automatic playlist generation.
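"Solicit labels on items closest to the current boundary" is uncertainty sampling. Rank the unlabeled tracks by |w·x + b| and query the smallest. The skip-to-label mapping below (skip = negative, played through = positive) is a hypothetical reading of the slide, as are the decision values.

```python
import numpy as np

def select_queries(decision_values, n_queries=2, already_labeled=()):
    """Pick unlabeled items with the smallest |w.x + b|, i.e. those nearest the SVM boundary."""
    labeled = set(already_labeled)
    order = np.argsort(np.abs(decision_values))
    return [int(i) for i in order if int(i) not in labeled][:n_queries]

def label_from_playback(skipped):
    """Hypothetical mapping: a skipped track is a negative example, a track played through is positive."""
    return -1 if skipped else +1

scores = np.array([2.5, -0.1, 0.4, -3.0, 0.05])   # hypothetical SVM outputs for 5 candidate tracks
first = select_queries(scores)
second = select_queries(scores, already_labeled={4})
```

After each queried track is played, its label is added to the training set and the SVM is retrained, which moves the boundary and changes the next queries.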

5. Artistic Application (with Douglas Repetto, Ron Weiss, and the rest of the MEAP team)
Compositional applications of automatic music analysis: music reformulation; an automatic mashup generator.

Conclusions
[Roadmap diagram with nodes: Music audio; Melody extraction; Drums extraction; Event extraction; Fragment clustering; Eigenrhythms; Semantic bases; Anchor models; Similarity/recommendation; Synthesis/generation.]
Lots of data + noisy transcription + weak clustering → musical insights?