Computational Models of Music Similarity. Elias Pampalk National Institute for Advanced Industrial Science and Technology (AIST)

Abstract
The perceived similarity of two pieces of music is multi-dimensional, subjective, and context-dependent. This talk focuses on simplified computational models of similarity based on audio signal analysis. Such models can help users discover, organize, and enjoy the contents of large music collections. The talk covers an introduction to the topic, a review of related work, a review of current state-of-the-art technologies, a discussion of evaluation procedures, a demonstration of applications (including playlist generation and the organization of music collections), and finally a discussion of limitations, opportunities, and future directions.
2005/10/27, Osaka, SIGMUS

Outline
1. Introduction
   - Context
   - Definition of similarity
   - Playlist generation demonstration
   - Alternative approaches
   - Related research, history
2. Techniques
3. Evaluation
4. Application (MusicRainbow)

Context

Abundance of (Digital) Music
- new commercial music released every week
- back-catalogues
- creative commons (garage bands etc.)
- library music, ...

Technological Possibilities
- storage: practically unlimited size of music collections
- bandwidth: music can be accessed via Internet, mobile phones, portable music players etc.; music is always present
- CPU: complex computations are feasible
- algorithms (many years of related research, e.g. MFCCs)

GOAL: use existing and develop new technologies to make music more accessible for active exploration as well as passive consumption

Perception of Music Similarity
1. subjective
2. context-dependent
3. multi-dimensional
E.g.: Timbre, Instrumentation, Structure, Complexity, Melody, Harmony, Rhythm, Tempo, Sociocultural Background, Lyrics, Mood

Music Similarity: Definition
Song A and song B are similar if
- Playlist generation: users think A and B fit into the same playlist.
- Recommendation: users who like A also like B.
- Organization: users would expect to find A in the same category as B.
User-centered view. Problem: difficult to evaluate.

Music Similarity: Definition
Example: playlist generation

Specific Scenario
- Music: private collection (< 20,000 songs)
- Hardware: e.g. mobile audio player
- User: minimal interaction ("lazy")

Basic Idea
Use audio-based similarity and user feedback to create a playlist.
(Demonstration uses a state-of-the-art similarity measure.)

Music Similarity: Definition
Demonstration: Simple Playlist Generator [Pampalk & Gasser, ISMIR 2006]

Alternatives to Audio-based Music Similarity
Specific case of playlist generation (personalized internet radio):
- Experts (e.g. http://pandora.com). BUT: expensive! (human: 20-30 minutes per song)
- Communities (e.g. http://last.fm). BUT: many problems with collaborative approaches
Ideal Solution: Combination with audio-based approaches

Advantages of Audio-based Similarity
- Fast & cheap. On this laptop (Centrino 2GHz):
  - < 2 seconds to analyze one song
  - ~ 0.1 milliseconds to compare two songs
  - can be applied to huge music collections
- Objective & consistent

Audio-based Similarity: Related Fields
- Audio (signal processing): self-similarity, segmentation, summarization, extracting semantic descriptors (rhythm, harmony, melody, ...), genre classification, ...
- Web (collaborative filtering, web-crawling, ...): artist similarity, lyrics similarity, describing music with words, ...
- Symbolic (MIDI etc.): melodic similarity, genre classification, ...
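To make the "fast & cheap" claim concrete, the timings quoted above (under 2 seconds to analyze a song, about 0.1 milliseconds per comparison) can be turned into a rough cost estimate for a whole collection. The following Python sketch is only back-of-the-envelope arithmetic: the function name is hypothetical, the 2 seconds is treated as an upper bound, and it assumes every unordered pair of songs is compared exactly once.

```python
def collection_cost(num_songs, analyze_s=2.0, compare_ms=0.1):
    """Rough cost of analyzing a collection and comparing all song pairs.

    analyze_s and compare_ms default to the laptop timings quoted on the
    slide; treating them as constants is an assumption of this sketch.
    """
    extraction_s = num_songs * analyze_s              # one pass per song
    num_pairs = num_songs * (num_songs - 1) // 2      # unordered pairs
    comparison_s = num_pairs * compare_ms / 1000.0    # milliseconds -> seconds
    return extraction_s, num_pairs, comparison_s


# Example: a private collection at the 20,000-song limit of the playlist scenario.
extraction_s, num_pairs, comparison_s = collection_cost(20000)
print(f"feature extraction: at most {extraction_s / 3600:.1f} hours")
print(f"{num_pairs} pairs, about {comparison_s / 3600:.1f} hours to compare")
```

Feature extraction grows linearly with the collection size, while a full distance matrix grows quadratically, so for very large collections the pairwise comparisons dominate.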

Audio-based Similarity: Brief History

Genre classification
- 1996: audio classification (Wold et al.)
- 2001: music classification (Tzanetakis & Cook)
- 2004: first genre classification contest (ISMIR)

Music similarity
- 1999: retrieval (Foote)
- 2001: organization (Frühwirth; Pampalk), playlist generation (Logan & Salomon)
- 2004: glass ceiling (Aucouturier & Pachet)
- 2006: first music similarity contest (MIREX)

Young research field, BUT: no major quality improvements since 2004!

Outline
1. Introduction
2. Techniques
   - Basics
   - Zero Crossing Rate (ZCR) walkthrough
   - Spectral similarity
   - Fluctuation patterns
   - Combination of different similarity measures
3. Evaluation
4. Application

Music Similarity: Schema
- Music similarity: Audio 1 (PCM) -> Feature Extraction -> Features 1 (Various); Audio 2 (PCM) -> Feature Extraction -> Features 2 (Various); both feature sets -> Distance Computation (e.g. Euclidean) -> Distance (Float)
- Genre classification: Audio (PCM) -> Feature Extraction -> Features (Various) -> Black Box (e.g. SVM) -> Genre Label; specific to training set (requires training data)

Audio Features: Type and Scope
Type
- single numerical value (e.g. ZCR)
- vector (e.g. MFCCs)
- matrix or n-dimensional histograms (e.g. fluctuation patterns)
- multivariate probability distribution (e.g. spectral similarity)
- anything else (e.g. sequence of chords)
Scope
- frame (e.g. 20ms, usually: 10ms-100ms)
- segment (e.g. note, bar, phrase, chorus, ...)
- song
- set of songs (e.g. album, artist, collection, ...)

Distance Computation
- Features: numerical, vector, matrix -> Euclidean, cosine, Minkowski, ...
- Features: probability distributions -> Earth Mover's distance, Monte Carlo sampling, Kullback-Leibler divergence, ...
- Alternatives (e.g.):
  - use genre classification results to compute similarity
  - use any form of combination

Audio Features in this Talk
- Zero Crossing Rate (ZCR): simple walkthrough, illustrates problem of generalization
- Timbre related: introduction to MFCCs, spectral similarity
- State of the Art
- Rhythm related: fluctuation patterns
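For numerical, vector, or matrix features, the distance computation slide above names the Euclidean, cosine, and Minkowski distances. A minimal Python sketch of the three, using only the standard library, could look as follows. The function names are illustrative, features are assumed to be flat lists of floats of equal length, and the cosine variant is written as one minus the cosine similarity, which is one common convention rather than something the slides specify.

```python
import math


def minkowski(a, b, p):
    """Minkowski distance of order p between two equal-length vectors."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def euclidean(a, b):
    """Euclidean distance, i.e. the Minkowski distance with p = 2."""
    return minkowski(a, b, 2)


def cosine_distance(a, b):
    """One minus the cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)
```

The cosine distance ignores the overall scale of the feature vectors, whereas the Minkowski family does not, so the choice depends on whether absolute magnitudes are meaningful for the feature at hand.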

Audio-based Music Similarity: Walkthrough
Zero Crossing Rate (ZCR)
[Figure: waveform, amplitude over time (0 to 5 ms), with 15 zero crossings in 5 ms, i.e. ZCR = 15 / 5ms = 3/ms]

[Figure: example pieces with ZCR values 2.10, 2.66, 2.82, 3.87, 5.31, 7.34]
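The walkthrough above counts how often the waveform crosses zero per millisecond (15 crossings in 5 ms gives 3/ms). A minimal Python sketch of that count is shown below. It assumes the signal is a list of mono PCM sample values, and it treats a sample of exactly zero as non-negative, which is a convention chosen here and not taken from the slides.

```python
def zero_crossing_rate(samples, sample_rate_hz):
    """Return the number of zero crossings per millisecond."""
    if len(samples) < 2:
        return 0.0
    # A crossing occurs when two consecutive samples differ in sign.
    crossings = sum(
        1
        for prev, cur in zip(samples, samples[1:])
        if (prev >= 0) != (cur >= 0)
    )
    duration_ms = 1000.0 * len(samples) / sample_rate_hz
    return crossings / duration_ms


# Toy example: four alternating samples at 4 kHz span 1 ms.
print(zero_crossing_rate([1, -1, 1, -1], 4000))
```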

Similarity = Feature Extraction + Distance Computation

Typical schema in feature extraction research (generalization problem):
1. find a feature that works well on the current set of music (e.g. 4 pieces)
2. later on, find out that there are other pieces where the feature fails (go back to step 1)

ZCR (and many other low-level audio statistics, incl. e.g. RMS)
+ simple
+ can create interesting results sometimes
- only weakly connected (if at all) to human perception of audio
- generally not really musically meaningful (noise/pitch?)

Meaningful descriptors require higher-level analysis. One typical intermediate representation is the spectrogram (time domain -> frequency domain).

Spectral Similarity (Timbre Related)
[Figure: spectrum]
References:
- Logan & Salomon, ICME 2001 (+ Patent)
- Aucouturier & Pachet, ISMIR 2002
- Mandel & Ellis, ISMIR 2005

Mel Frequency Cepstrum Coefficients (MFCCs)
MFCCs are one of the most common representations used for spectra in MIR.
Given an audio signal (e.g. 23 milliseconds, 22kHz mono):
1. apply window function
2. compute power spectrum (with FFT)

    %% step 1: window function (e.g. Hann)
    w = hann(512);
    wwav = wav.*w;
    %% step 2: power spectrum (with FFT)
    X = fft(wwav);
    Y = X(1:512/2+1);   %% keep bins 1..257
    P = abs(Y).^2;

[Figure: wav (e.g. 23ms window at 22kHz input, 512 samples); window function w (e.g. Hann); windowed signal wwav; log10(P) in dB over the FFT bins (1st bin: 0Hz, 257th bin: 22kHz/2)]

Mel Frequency Cepstrum Coefficients (MFCCs)
3. apply Mel filter bank
4. apply Discrete Cosine Transform (DCT) -> MFCCs

    %% step 3: Mel filter bank
    mel = melfb * P;            %% size(melfb) == [36 257]
    %% step 4: DCT of the log Mel spectrum
    mfcc = DCT * log10(mel);    %% size(DCT) == [20 36]

[Figure: log10(P) in dB; Mel spectrum mel (36 bands); mfcc (20 coefficients); Mel filter bank weights (melfb); DCT matrix (20 x 36)]
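The snippets above take the matrix DCT as given. As a rough illustration of step 4, the following Python sketch builds a 20 x 36 matrix and applies it to a log Mel spectrum. It assumes an orthonormal DCT-II, which is a common choice but not confirmed by the slides, and the Mel values are made-up numbers for a single frame, not real data. The helper names dct_matrix and mat_vec are hypothetical.

```python
import math


def dct_matrix(num_coeffs, num_bands):
    """Orthonormal DCT-II matrix with num_coeffs rows and num_bands columns."""
    rows = []
    for k in range(num_coeffs):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / num_bands)
        rows.append([
            scale * math.cos(math.pi * k * (2 * n + 1) / (2 * num_bands))
            for n in range(num_bands)
        ])
    return rows


def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


# Hypothetical Mel power values for one frame (36 bands), for illustration only.
mel = [1.0 + 0.5 * math.sin(0.3 * n) for n in range(36)]
DCT = dct_matrix(20, 36)
mfcc = mat_vec(DCT, [math.log10(m) for m in mel])
print(len(mfcc))
```

Keeping only the first 20 of 36 possible coefficients discards the fastest variations across Mel bands, which is where the spectral smoothing effect of MFCCs comes from.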

Mel Frequency Cepstrum Coefficients (MFCCs)

Advantages
- simple and fast (compared to other auditory models)
- well tested, many implementations available (speech processing)
- compressed representation, yet easy to handle (e.g. Euclidean distance can be used on MFCCs)

Important characteristics
- non-linear loudness (usually dB)
- non-linear filter bank (Mel scale)
- spectral smoothing (DCT; depends on number of coefficients used): simple approximation of psychoacoustic spectral masking effects

    %% map the MFCCs back to a smoothed log Mel spectrum
    %% DCT' is the transpose, size(DCT') == [36 20]; assumes orthonormal rows in DCT
    mel_reconstructed = DCT' * mfcc;

[Figure: mfcc (20 coefficients) mapped back through the DCT to mel_reconstructed (36 bands), a smoothed version of mel]

Spectral Similarity (Timbre related)
[Figure: spectrograms]

Spectral Similarity (Timbre related)
Spectrograms -> Typical Spectra
Summarize spectra: k-means, GMM-EM, or mean (and covariance)
[Figure: three typical spectra of one piece with weights 64.1%, 18.4%, 17.6%]

[Figure: typical spectra of further pieces with weights 54.7% / 32.0% / 13.4%, 41.7% / 29.3% / 29.0%, 49.1% / 27.8% / 23.1%, 55.8% / 34.5% / 9.7%, 42.6% / 30.0% / 27.4%]
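Of the three summarization options listed above, the mean (and covariance) is the simplest: all frames of a piece are reduced to a single Gaussian. A small Python sketch of that step follows. It assumes each frame is a list of floats (for instance one MFCC vector per frame), that at least two frames are available, and it uses the unbiased sample covariance, which is a choice made here and not stated on the slides.

```python
def mean_and_covariance(frames):
    """Summarize a list of equal-length feature frames by mean and covariance."""
    n = len(frames)
    dim = len(frames[0])
    mean = [sum(frame[d] for frame in frames) / n for d in range(dim)]
    # Unbiased sample covariance (divides by n - 1), so n must be at least 2.
    cov = [
        [
            sum((f[i] - mean[i]) * (f[j] - mean[j]) for f in frames) / (n - 1)
            for j in range(dim)
        ]
        for i in range(dim)
    ]
    return mean, cov


# Toy example with three two-dimensional frames.
mean, cov = mean_and_covariance([[1.0, 2.0], [3.0, 6.0], [5.0, 10.0]])
print(mean, cov)
```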

Computing Distances between Typical Spectra
1. Earth Mover's Distance + Kullback-Leibler Divergence (k-means clustering, diagonal covariance): Logan & Salomon, ICME 01
2. Monte Carlo sampling (GMM-EM, diagonal covariance): Aucouturier & Pachet, ISMIR 02
3. Kullback-Leibler Divergence (mean, full covariance): Mandel & Ellis, ISMIR 05
[Figure: typical spectra of two pieces (weights 64.1% / 18.4% / 17.6% and 54.7% / 32.0% / 13.4%) with the distance between them to be computed]

Recommended article: Aucouturier & Pachet: Improving timbre similarity: How high is the sky? Journal of Negative Results in Speech and Audio Sciences, 1(1), 2004.

Spectral Similarity, Distance Matrix
[Figure: distance matrix for pieces 1 to 6]
Problem: the beats don't seem to have enough impact on the similarity measure
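The third option listed above compares two pieces through the Kullback-Leibler divergence between single Gaussians. The Python sketch below illustrates the idea under two simplifying assumptions that are not taken from the slides: it uses diagonal covariances (per-dimension variances) instead of the full covariance named for Mandel & Ellis, to keep the code short, and it makes the divergence symmetric by summing both directions.

```python
import math


def gaussian_kl_diag(mean_p, var_p, mean_q, var_q):
    """KL(p || q) for two Gaussians with diagonal covariance."""
    total = 0.0
    for mp, vp, mq, vq in zip(mean_p, var_p, mean_q, var_q):
        total += vp / vq + (mq - mp) ** 2 / vq - 1.0 + math.log(vq / vp)
    return 0.5 * total


def symmetric_kl(mean_p, var_p, mean_q, var_q):
    """Symmetrized divergence: KL(p || q) + KL(q || p)."""
    return (gaussian_kl_diag(mean_p, var_p, mean_q, var_q)
            + gaussian_kl_diag(mean_q, var_q, mean_p, var_p))


# Toy example: two one-dimensional Gaussians with unit variance, means 0 and 1.
print(symmetric_kl([0.0], [1.0], [1.0], [1.0]))
```

Because the expression is closed-form, comparing two songs costs only a handful of arithmetic operations per dimension, in contrast to Monte Carlo sampling.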

Fluctuation Patterns (Rhythm Related)
[Figure: Mel/dB spectrogram (frequency bands 1 to 20 over 0 to 10 seconds) and the loudness amplitude in one frequency band over time]

Fluctuation Patterns (Rhythm Related)
For each frequency band: analyze periodicities of the loudness and remove phase information, e.g. with FFT (or autocorrelation, or comb-filter).
[Figure: FP, frequency bands (1 to 20) over modulation frequency (0 to 10 Hz)]
References: Frühwirth, 2001; Pampalk, 2001; Pampalk et al., 2002
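The periodicity analysis described above can be illustrated for a single frequency band: take the loudness curve of that band, transform it, and keep only the magnitudes so that the phase is discarded. The Python sketch below does this with a plain discrete Fourier transform. It covers only this one step, the function name and the 10 Hz cutoff parameter are illustrative, and the loudness curve is synthetic (a 2 Hz modulation sampled at an assumed 50 frames per second).

```python
import cmath
import math


def modulation_spectrum(loudness, frame_rate_hz, max_mod_hz=10.0):
    """Return (modulation frequency in Hz, magnitude) pairs for one band."""
    n = len(loudness)
    mean = sum(loudness) / n
    centered = [x - mean for x in loudness]  # drop the constant component
    result = []
    k = 1
    while k <= n // 2 and k * frame_rate_hz / n <= max_mod_hz:
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(centered)
        )
        result.append((k * frame_rate_hz / n, abs(coeff)))  # abs() removes phase
        k += 1
    return result


# Synthetic loudness curve: 2 seconds at 50 frames per second, modulated at 2 Hz.
loudness = [10.0 + math.sin(2 * math.pi * 2.0 * t / 50.0) for t in range(100)]
spectrum = modulation_spectrum(loudness, 50.0)
peak_hz = max(spectrum, key=lambda pair: pair[1])[0]
print(peak_hz)
```

Repeating this for every frequency band and stacking the results gives a matrix of frequency band versus modulation frequency, which has the same layout as the FP shown on the slide.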

Fluctuation Patterns: Demonstration

Fluctuation Patterns (Rhythm Related)
[Figure: FP]

Fluctuation Patterns (Rhythm Related)
Distance computation: FP1, FP2 -> distance?
Euclidean distance (L2 norm):

    d = sqrt(sum((fp1(:)-fp2(:)).^2));
    %% e.g. size(fp1) == [24 60]
    %% size(fp1(:)) == [1440 1]

Fluctuation Patterns (Rhythm Related)
[Figure: distance matrix for pieces 1 to 6]
-> combine with spectral similarity

Features Extracted from FPs
- FP.B: Modulations in bass frequency bands (e.g. <200Hz)
- FP.G: Center of Gravity on the horizontal axis (related to perceived tempo)
- Max, mean, variance, ...
[Pampalk 2001; Pampalk et al. 2005; Lidy & Rauber 2005; Pampalk 2006]

Linearly Combined Distances
[Figure: for Song A and Song B, four distances are computed and summed with weights still to be determined (?): S (Kullback-Leibler Divergence), and FP, FP.B, FP.G (Euclidean, computationally very cheap)]

Outline
1. Introduction
2. Techniques
3. Evaluation (and Optimization)
   - Different types of evaluations
   - Genre-based evaluation
   - Listening tests, MIREX 06
4. Application

4 Basic Evaluation Types
Evaluation within context of application
- only way to find out about acceptance
- very specific (results cannot be generalized to other applications)
- very difficult to evaluate a large number of similarity measures
Listening test: full similarity matrix
- seems infeasible for larger numbers of songs
- once similarity matrix is defined: fast & cheap evaluation and measuring perceptual significance of differences
Listening test: based on rankings by algorithms
- allows measuring perceptual significance of differences
- difficult to evaluate a large number of similarity measures
Genre-based
- fast & cheap
- can be used to evaluate very large parameter spaces
- DANGER: very easy to overfit & not so easy to measure performance correctly

Genre-based Evaluation
Assumption: similar pieces belong to the same genre. Seems to hold in general! [Pampalk 2006; Novello et al. 2006; MIREX 2006]
Basic Procedure (e.g.):
1. Given a query song:
2. Count number of pieces from the same genre within top N results
Typical genres used include rock, classic, jazz, blues, rap, pop, electronic, heavy metal, ...

Genre-based Evaluation
Advantages (+)
- genre labels easy to collect, cheap, fast
- possible to evaluate large parameter spaces!
- should always be the first sanity check of a similarity measure (before using listening tests!)
- if done correctly, good approximation of results from listening test! [Pampalk 2006; MIREX 2006]
Problems (-)
- danger of overfitting!!
- genre taxonomies are inconsistent, ...
- similarity is not measured directly (assumption does not always hold)
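The basic procedure of the genre-based evaluation (given a query, count the pieces from the same genre within the top N results) can be sketched in a few lines of Python. The data layout is an assumption made for illustration: distances maps each song identifier to its distance from the query, and genres maps each song identifier to a genre label. The song names and values are invented.

```python
def same_genre_in_top_n(query, distances, genres, n):
    """Count how many of the n nearest songs share the query's genre."""
    # Rank all songs except the query itself by increasing distance.
    ranked = sorted(
        (song for song in distances if song != query),
        key=lambda song: distances[song],
    )
    return sum(1 for song in ranked[:n] if genres[song] == genres[query])


# Invented toy collection.
genres = {"q": "jazz", "a": "jazz", "b": "rock", "c": "jazz", "d": "pop"}
distances = {"q": 0.0, "a": 0.2, "b": 0.3, "c": 0.9, "d": 0.5}
print(same_genre_in_top_n("q", distances, genres, 3))
```

Averaging this count over many queries gives a single score per similarity measure, which is what makes large parameter sweeps feasible.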

Genre-based Evaluation: Avoiding Overfitting Problems
- Artist filter: test set and training set must not contain pieces from the same artist. Otherwise artist identification performance is measured (focus on singer's voice etc.). In addition, production effects (recording studio etc.) might have unwanted effects on the evaluation.
- Different music collections (3 or more) from different sources: the performance of a similarity measure can change a lot depending on the collection used. At least 2 collections should be used for development, and at least 1 for final conclusions (to test generalization).
[Pampalk et al. 2005; Pampalk 2006]

Linearly Combined Distances
[Figure: for Song A and Song B, four distances are computed and summed with weights still to be determined (?): S (Kullback-Leibler Divergence), and FP, FP.B, FP.G (Euclidean, computationally very cheap)]
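In a retrieval-style evaluation, the artist filter described above amounts to removing every piece by the query's artist from the candidate list before ranking. The following Python sketch shows one way to do that. The dictionaries and song names are invented for illustration, and the function name is hypothetical.

```python
def ranked_with_artist_filter(query, distances, artists):
    """Rank candidates by distance, skipping all songs by the query's artist."""
    candidates = [
        song for song in distances
        if artists[song] != artists[query]  # also drops the query itself
    ]
    return sorted(candidates, key=lambda song: distances[song])


# Invented toy collection: song "a" is by the same artist as the query.
artists = {"q": "X", "a": "X", "b": "Y", "c": "Z"}
distances = {"q": 0.0, "a": 0.1, "b": 0.4, "c": 0.3}
print(ranked_with_artist_filter("q", distances, artists))
```

Without the filter, song "a" would be ranked first simply because it shares a singer and production with the query, which is the effect the slide warns about.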

Linearly Combined Distances (G1C)
[Figure: for Song A and Song B, the distances are summed with weights S (Kullback-Leibler Divergence) 70%, FP 10%, FP.B 10%, FP.G 10% (Euclidean, computationally very cheap)]
State of the art: highest score at MIREX 06 audio-based similarity evaluation
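The linear combination above is a weighted sum of four distances. The Python sketch below uses the weights shown on the slide (70% for S and 10% each for FP, FP.B, and FP.G). It assumes the four distances have already been brought to comparable ranges, since the slide does not say how they are scaled, and the dictionary keys and function name are illustrative.

```python
# Weights as shown on the G1C slide.
G1C_WEIGHTS = {"S": 0.7, "FP": 0.1, "FP.B": 0.1, "FP.G": 0.1}


def combined_distance(distances, weights=G1C_WEIGHTS):
    """Weighted sum of per-feature distances between two songs."""
    return sum(weights[name] * distances[name] for name in weights)


# Invented distances between two songs, assumed to be on comparable scales.
example = {"S": 2.0, "FP": 1.0, "FP.B": 0.0, "FP.G": 4.0}
print(combined_distance(example))
```

Only the S term requires the Kullback-Leibler computation. The three FP-based terms are Euclidean distances, so adding them costs very little.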

Listening Tests
Allow measuring the perceptual significance of differences.
- Select query song
- Ask algorithms to retrieve most similar songs
- Ask human listeners to rate similarity of these given the query
Assumption: Different people rate similarity of songs consistently. Seems to hold in general! [Logan & Salomon 2001; Pampalk 2006; Novello et al. 2006; MIREX 2006]
- What scale should be used to rate similarity?
- What about the context of the question?
- Which songs should be selected? (Stimuli)

Listening Test: G1 vs. G1C
- 100 queries
- 2 algorithms (G1, G1C)
- for each query, each algorithm retrieves the most similar song from the music collection (using artist filter)
- given 3 songs (query Q, A, B), listeners are asked to rate the similarity of Q-A and Q-B on a scale from 1 to 9 (1 = terrible, 9 = perfect)
- 3 listeners per song pair (to measure consistency)
[Pampalk 2006]

Listening test result
- G1C average rating: 6.37
- G1 average rating: 5.73
On a scale from 1 to 9 the difference is only about 0.6!

Listening Test: MIREX 06
- 60 queries
- 6 algorithms (4 different research groups)
- for each query, each algorithm retrieved the 5 most similar songs (using artist filter)
- given 31 songs (query + 6 x 5 candidates), listeners are asked to rate the similarity of each query/candidate pair on a scale from 0 to 10 (0 = terrible, 10 = perfect)
- 3 listeners per query/candidate pair

[Figure: results for G1C, G1* and FP*; computation time for feature extraction (5000 songs) and distance computation (5000x5000)]

Outline
1. Introduction
   - Playlist generation
2. Techniques
3. Evaluation
4. Application
   - MusicRainbow

MusicRainbow
Use audio-based similarity measure to compute artist similarity. [Pampalk & Goto, ISMIR 2006]

Artist Similarity and Organization
[Figure: songs from Artist X and songs from Artist Y in the G1C similarity space; projection; artist similarity; shortest path]

Conclusions
Current Situation:
- Low-level features are not enough
- Slow progress in the last years: glass ceiling since 2004
- however, computational complexity has been reduced by several orders of magnitude (factor 1000 faster!)
- Many unexplored questions [Novello et al., ISMIR 2006]

Similarity: Future Directions
- Improve linear combination model
- Use higher level semantic descriptors: rhythm, harmony, ...
- Context-dependent similarity: different parameters for different types of music and different users
- Combine audio-based similarity with other sources (e.g. collaborative filtering), e.g. [Yoshii et al., ISMIR 2006]
- Explore applications which can deal with erroneous similarity measures (e.g. playlist generation)

References: Starting Points
- ISMIR Proceedings
- MIREX 2006 webpages
- J.-J. Aucouturier: Ten Experiments on the Modelling of Polyphonic Timbre, PhD Thesis, 2006
- E. Pampalk: Computational Models of Music Similarity and their Application in Music Information Retrieval, PhD Thesis, 2006