Music Similarity and Cover Song Identification: The Case of Jazz

Similar documents
Computational Modelling of Harmony

MUSI-6201 Computational Music Analysis

Outline. Why do we classify? Audio Classification

Music Information Retrieval

The song remains the same: identifying versions of the same piece using tonal descriptors

Music Representations. Beethoven, Bach, and Billions of Bytes. Music. Research Goals. Piano Roll Representation. Player Piano (1900)

Introductions to Music Information Retrieval

Music Genre Classification and Variance Comparison on Number of Genres

Effects of acoustic degradations on cover song recognition

Singer Traits Identification using Deep Neural Network

WHAT MAKES FOR A HIT POP SONG? WHAT MAKES FOR A POP SONG?

The Million Song Dataset

Robert Alexandru Dobre, Cristian Negrescu

Chroma Binary Similarity and Local Alignment Applied to Cover Song Identification

Automatic Music Clustering using Audio Attributes

Music Emotion Recognition. Jaesung Lee. Chung-Ang University

Subjective Similarity of Music: Data Collection for Individuality Analysis

DAY 1. Intelligent Audio Systems: A review of the foundations and applications of semantic audio analysis and music information retrieval

arxiv: v1 [cs.lg] 15 Jun 2016

Music Synchronization. Music Synchronization. Music Data. Music Data. General Goals. Music Information Retrieval (MIR)

Automatic Rhythmic Notation from Single Voice Audio Sources

Curriculum Standard One: The student will listen to and analyze music critically, using vocabulary and language of music.

CSC475 Music Information Retrieval

CTP431- Music and Audio Computing Music Information Retrieval. Graduate School of Culture Technology KAIST Juhan Nam

Analysing Musical Pieces Using harmony-analyser.org Tools

Homework 2 Key-finding algorithm

A PERPLEXITY BASED COVER SONG MATCHING SYSTEM FOR SHORT LENGTH QUERIES

Automatic characterization of ornamentation from bassoon recordings for expressive synthesis

Curriculum Standard One: The student will listen to and analyze music critically, using the vocabulary and language of music.

Tempo and Beat Analysis

A CHROMA-BASED SALIENCE FUNCTION FOR MELODY AND BASS LINE ESTIMATION FROM MUSIC AUDIO SIGNALS

Improvised Duet Interaction: Learning Improvisation Techniques for Automatic Accompaniment

Beethoven, Bach, and Billions of Bytes

Audio. Meinard Müller. Beethoven, Bach, and Billions of Bytes. International Audio Laboratories Erlangen. International Audio Laboratories Erlangen

Chord Classification of an Audio Signal using Artificial Neural Network

Statistical Modeling and Retrieval of Polyphonic Music

Computational Models of Music Similarity. Elias Pampalk National Institute for Advanced Industrial Science and Technology (AIST)

TOWARD AN INTELLIGENT EDITOR FOR JAZZ MUSIC

International Journal of Advance Engineering and Research Development MUSICAL INSTRUMENT IDENTIFICATION AND STATUS FINDING WITH MFCC

Audio Structure Analysis

A Survey of Audio-Based Music Classification and Annotation

IMPROVING RHYTHMIC SIMILARITY COMPUTATION BY BEAT HISTOGRAM TRANSFORMATIONS

Music Information Retrieval (MIR)

Pitfalls and Windfalls in Corpus Studies of Pop/Rock Music

Piano Transcription MUMT611 Presentation III 1 March, Hankinson, 1/15

Voice & Music Pattern Extraction: A Review

EE391 Special Report (Spring 2005) Automatic Chord Recognition Using A Summary Autocorrelation Function

Methods for the automatic structural analysis of music. Jordan B. L. Smith CIRMMT Workshop on Structural Analysis of Music 26 March 2010

Year 7 Curriculum Overview Subject: Music

Melody Extraction from Generic Audio Clips Thaminda Edirisooriya, Hansohl Kim, Connie Zeng

Week 14 Query-by-Humming and Music Fingerprinting. Roger B. Dannenberg Professor of Computer Science, Art and Music Carnegie Mellon University

Audio Structure Analysis

Chord Label Personalization through Deep Learning of Integrated Harmonic Interval-based Representations

Music Genre Classification

Music Information Retrieval

ABSOLUTE OR RELATIVE? A NEW APPROACH TO BUILDING FEATURE VECTORS FOR EMOTION TRACKING IN MUSIC

Pitch Perception and Grouping. HST.723 Neural Coding and Perception of Sound

CSC475 Music Information Retrieval

Composer Identification of Digital Audio Modeling Content Specific Features Through Markov Models

POST-PROCESSING FIDDLE : A REAL-TIME MULTI-PITCH TRACKING TECHNIQUE USING HARMONIC PARTIAL SUBTRACTION FOR USE WITHIN LIVE PERFORMANCE SYSTEMS

Searching for Similar Phrases in Music Audio

MUSIC CURRICULM MAP: KEY STAGE THREE:

Creating a Feature Vector to Identify Similarity between MIDI Files

Data Driven Music Understanding

GROUPING RECORDED MUSIC BY STRUCTURAL SIMILARITY

Rhythm related MIR tasks

Audio Structure Analysis

Musical Data Bases Semantic-oriented Comparison of Symbolic Music Documents

Audio Feature Extraction for Corpus Analysis

THE importance of music content analysis for musical

Music Information Retrieval. Juan Pablo Bello MPATE-GE 2623 Music Information Retrieval New York University

Greeley-Evans School District 6 High School Vocal Music Curriculum Guide Unit: Men's and Women's Choir Year 1 Enduring Concept: Expression of Music

Skip Length and Inter-Starvation Distance as a Combined Metric to Assess the Quality of Transmitted Video

Detecting Musical Key with Supervised Learning

Automatic Piano Music Transcription

OBJECTIVE EVALUATION OF A MELODY EXTRACTOR FOR NORTH INDIAN CLASSICAL VOCAL PERFORMANCES

Autumn. A: Plan, develop and deliver a music product B: Promote a music product C: Review the management of a music product

SAMPLE ASSESSMENT TASKS MUSIC CONTEMPORARY ATAR YEAR 11

PERFORMING ARTS. Head of Music: Cinzia Cursaro. Year 7 MUSIC Core Component 1 Term

Unit summary. Year 9 Unit 6 Arrangements

Music Curriculum Map

SAMPLE ASSESSMENT TASKS MUSIC GENERAL YEAR 12

MorpheuS: constraining structure in automatic music generation

Sample assessment task. Task details. Content description. Task preparation. Year level 9

Recognising Cello Performers using Timbre Models

Music Structure Analysis

CS 591 S1 Computational Audio

Compressed-Sensing-Enabled Video Streaming for Wireless Multimedia Sensor Networks Abstract:

Music Assessment Key Stage 3. Moving towards next step: A (creating and evaluating) Developing at that step: C (remembering and understanding)

The Effect of DJs Social Network on Music Popularity

Transcription of the Singing Melody in Polyphonic Music

Unit title: Music First Study: Composition (SCQF level 7)

Music Mood Classification - an SVM based approach. Sebastian Napiorkowski

University of Bristol - Explore Bristol Research. Peer reviewed version. Link to published version (if available): /ISCAS.2005.

Musical Developmental Levels Self Study Guide

Tempo and Beat Tracking

Jazz Melody Generation and Recognition

The Human Features of Music.

Music Curriculum. Rationale. Grades 1 8

Week 14 Music Understanding and Classification

Transcription:

Music Similarity and Cover Song Identification: The Case of Jazz
Simon Dixon and Peter Foster
s.e.dixon@qmul.ac.uk
Centre for Digital Music, School of Electronic Engineering and Computer Science, Queen Mary University of London

Outline
- Musical similarity and cover song detection
- Standard approaches to cover song detection
- Information-theoretic measures of similarity (Foster, Dixon and Klapuri, IEEE/ACM Trans. ASLP 2015)
- Concluding thoughts
Simon Dixon (C4DM) Jazz and Cover Songs 2 / 18

MIR and Similarity
- The music industry assumes that you will buy music similar to music you have bought in the past
- This has inspired the Music Information Retrieval (MIR) community (computer scientists, music psychologists, librarians and experts, engineers, etc.) to invest considerable effort in investigating similarity in music
- Assessing the similarity of pairs of audio recordings is of particular interest, as it avoids the cold-start problem (lack of data concerning new or unknown music items)
- For music recommendation and playlist generation, the task is often expressed as: given a seed song, return the song or songs that are most similar
- What is music similarity? What makes two recordings similar?

Defining Similarity
- Music has many dimensions or aspects: melody, rhythm, harmony, instrumentation, timbre, lyrics, genre, style, mood
- Assessing similarity along any one dimension is subjective
- Reducing similarity to a single scalar value makes the problem extremely ill-posed
- Casey et al. (Proc. IEEE, 2008) describe a spectrum of specificity in MIR tasks, from high to low:
  - Highly specific: identification of specific recordings (fingerprinting) for copyright monitoring and royalty assignment
  - Remixes, versions and imitations
  - Performances of the same piece
  - Pieces by the same artist or composer, pieces that sound similar, or pieces that match the same user's listening profile
  - Low specificity: similar mood, genre, instrumentation; influence
- The context or application defines what is meant by similarity

Cover Songs
- Most pop songs have a canonical original recorded version
- Other musicians who perform the song create a cover version, or simply a "cover"
- Motivation may be tribute, parody, artistic expression, or gaining recognition
- Covers range from almost indistinguishable from the original to unrecognisable
- Some aspects of the original are preserved, others are modified
- When looking for covers, we don't know in advance which aspects will be preserved
- Genre dependence: within pop, harmony and (to a lesser extent) melody are likely to be preserved, along with lyrics

Standards and Covers
- In jazz and many traditional/folk music styles (e.g. flamenco, Irish), there is a shared repertoire of commonly performed works, allowing musicians who have never met to perform together
- These may be passed on orally or captured in notation (usually no more than melody, chords and lyrics, i.e. the lead sheet)
- Collections of lead sheets appear in real books or fake books
- Performances of such standards can be considered cover versions, even where there is no definitive original version
- Jazz allows, and expects, ornamentation and transformation of the melody as well as substitution of chords in the harmony
- A good MIR task: the ground truth is relatively easy to determine

Example

Standard Approaches in MIR
- Bag-of-features approaches are suitable mainly for low-specificity tasks (e.g. genre, mood)
- Temporal features can represent time-varying tonal content:
  - frequency, pitch or chroma
  - predominant melody or harmony (pitches or chords)
- Adapt features to allow for variation in key or tempo:
  - circular shift of chroma vectors
  - beat-synchronous features
- Perform pairwise sequence matching:
  - dynamic programming (edit distance) on the similarity matrix
  - correlation
  - local alignment, since there is no guarantee that different versions share the same structure
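The local-alignment step can be sketched as a Smith-Waterman recurrence over a binarised cross-similarity matrix of two chroma sequences. This is a minimal illustration, not the method of any particular system: the cosine threshold (0.8) and the match, mismatch and gap scores are arbitrary assumptions, and a real system would also apply key invariance (e.g. a circular chroma shift) before comparing frames.

```python
import math
from typing import Sequence

Frame = Sequence[float]


def cosine(a: Frame, b: Frame) -> float:
    """Cosine similarity of two chroma vectors (0.0 if either is all zeros)."""
    dot = sum(p * q for p, q in zip(a, b))
    na = math.sqrt(sum(p * p for p in a))
    nb = math.sqrt(sum(q * q for q in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def local_alignment_score(x: Sequence[Frame], y: Sequence[Frame], threshold: float = 0.8,
                          match: float = 1.0, mismatch: float = -1.0, gap: float = -0.5) -> float:
    """Smith-Waterman score over a binarised cross-similarity matrix.

    Cell (i, j) counts as a match if the cosine similarity of x[i-1] and y[j-1]
    reaches `threshold`. Scores are clipped at 0, so the best-scoring aligned
    segment may start and end anywhere in either sequence.
    """
    h = [[0.0] * (len(y) + 1) for _ in range(len(x) + 1)]
    best = 0.0
    for i in range(1, len(x) + 1):
        for j in range(1, len(y) + 1):
            s = match if cosine(x[i - 1], y[j - 1]) >= threshold else mismatch
            h[i][j] = max(0.0, h[i - 1][j - 1] + s, h[i - 1][j] + gap, h[i][j - 1] + gap)
            best = max(best, h[i][j])
    return best
```

Clipping at zero is what makes the alignment local: a cover that adds an intro or drops a chorus can still score highly on the sections it shares with the other version.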

Information-Theoretic Approach to Similarity
- Similarity can be viewed as predictability: given information about one piece, how predictable does the other piece become?
- If the underlying composition is the same, this information should increase the predictability of the second piece
- Music psychologists have considered modelling musical prediction with information theory
- We compare various approaches: discrete-valued vs continuous-valued data; compression vs prediction; correlation and bag-of-features baselines
- Our approach represents pieces by sequences of features encoding the harmonic content (chroma)
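One way to make "predictability" concrete is conditional entropy: if x tells us something about y, then H(Y|X) < H(Y), and the reduction H(Y) - H(Y|X) is the mutual information I(X;Y). The sketch below estimates the cross-entropy of a discrete sequence y under a first-order Markov model trained on x, with add-one smoothing. It is a deliberately simplified stand-in, not the PPMC or LZ78 predictors the authors list; the symbols are assumed to be codeword indices (e.g. from a k-means codebook over chroma), and the function names are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Sequence

Model = Dict[int, Counter]


def train_first_order(seq: Sequence[int]) -> Model:
    """Count symbol-to-symbol transitions in the 'given' sequence x."""
    counts: Model = defaultdict(Counter)
    for prev, nxt in zip(seq, seq[1:]):
        counts[prev][nxt] += 1
    return counts


def cross_entropy_bits(model: Model, seq: Sequence[int], alphabet_size: int) -> float:
    """Average -log2 P(seq[t] | seq[t-1]) under `model`, with add-one smoothing.

    Lower values mean the second sequence is more predictable given the first.
    """
    total = 0.0
    for prev, nxt in zip(seq, seq[1:]):
        row = model.get(prev, Counter())  # unseen context: uniform after smoothing
        p = (row[nxt] + 1) / (sum(row.values()) + alphabet_size)
        total -= math.log2(p)
    return total / (len(seq) - 1)
```

With `alphabet_size = 4`, a uniform guess costs log2(4) = 2 bits per symbol, so values well below 2 indicate that the model of x makes y more predictable.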

Discrete-Valued Approaches
- In information-theoretic terms, predictability means redundancy
- Predictability can be estimated by data compression: an optimal compression algorithm removes all redundancy, leaving only the information content
- Joint compressibility quantifies the similarity between pairs of sequences
- The normalised compression distance (NCD) uses compressed length as an approximation of algorithmic information content (Kolmogorov complexity). With C(·) the compressed length and xy the concatenation of x and y:
  $$\mathrm{NCD}(x, y) = \frac{\max\{C(xy) - C(x),\; C(yx) - C(y)\}}{\max\{C(x),\; C(y)\}}$$
- Extension: interleave the sequences rather than concatenating them, using a circular shift to maximise the correlation of the sequences
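A minimal NCD sketch using Python's standard `zlib` (DEFLATE, an LZ77-based compressor) and `bz2` (Burrows-Wheeler) modules. Sequences are assumed to be codeword indices stored one per byte (so codebooks of at most 256 entries); the `interleave` helper is a hypothetical illustration of interleaving rather than concatenation and omits the circular-shift alignment step.

```python
import bz2
import zlib
from typing import Callable


def compressed_size(data: bytes, method: str = "zlib") -> int:
    """C(.): length in bytes of `data` after compression."""
    if method == "zlib":  # DEFLATE (LZ77 + Huffman coding)
        return len(zlib.compress(data, 9))
    if method == "bz2":  # Burrows-Wheeler transform, block-based
        return len(bz2.compress(data, 9))
    raise ValueError(f"unknown method: {method}")


def concatenate(x: bytes, y: bytes) -> bytes:
    return x + y


def interleave(x: bytes, y: bytes) -> bytes:
    """Alternate symbols of x and y; the tail of the longer sequence is appended."""
    out = bytearray()
    for a, b in zip(x, y):
        out += bytes((a, b))
    n = min(len(x), len(y))
    return bytes(out) + x[n:] + y[n:]


def ncd(x: bytes, y: bytes, method: str = "zlib",
        join: Callable[[bytes, bytes], bytes] = concatenate) -> float:
    """max{C(xy) - C(x), C(yx) - C(y)} / max{C(x), C(y)}, with `join` forming xy."""
    cx, cy = compressed_size(x, method), compressed_size(y, method)
    cxy = compressed_size(join(x, y), method)
    cyx = compressed_size(join(y, x), method)
    return max(cxy - cx, cyx - cy) / max(cx, cy)
```

For very short sequences the compressors' fixed header overhead dominates C(·), so the distances are only meaningful for sequences of reasonable length.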

Continuous-Valued Approaches
- Prediction is based on the sequence's own previous context (self-prediction), on a model of the other sequence (cross-prediction), or on both (conditional self-prediction)
- Temporal context is encoded with time-delay embedding, and prediction is performed using the nearest neighbour
- The sequence of prediction errors is normalised, and statistics of that sequence are computed
- This approach can also be applied to discrete-valued sequences
- Alternative approach: use conditional entropy (predictability) instead of compressibility
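Cross-prediction with time-delay embedding can be sketched as follows: each context of y is matched to its nearest context in x, and the frame that followed that context in x serves as the prediction. The embedding dimension `m`, delay `tau`, Euclidean distance and plain mean of the errors are illustrative assumptions; the error normalisation and statistics used by the authors are not reproduced.

```python
import math
from typing import List, Sequence

Frame = Sequence[float]


def delay_embed(seq: Sequence[Frame], m: int, tau: int) -> List[List[float]]:
    """Row r concatenates frames t, t - tau, ..., t - (m - 1) * tau, with t = (m - 1) * tau + r."""
    start = (m - 1) * tau
    rows: List[List[float]] = []
    for t in range(start, len(seq)):
        row: List[float] = []
        for k in range(m):
            row.extend(seq[t - k * tau])
        rows.append(row)
    return rows


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((p - q) ** 2 for p, q in zip(a, b))


def cross_prediction_error(x: Sequence[Frame], y: Sequence[Frame], m: int = 2, tau: int = 1) -> float:
    """Mean Euclidean error when predicting each next frame of y from x."""
    start = (m - 1) * tau
    ex = delay_embed(x, m, tau)[:-1]  # the last context of x has no successor frame
    ey = delay_embed(y, m, tau)[:-1]  # the last context of y has nothing to predict
    errors = []
    for i, ctx in enumerate(ey):
        j = min(range(len(ex)), key=lambda r: sq_dist(ex[r], ctx))  # nearest neighbour in x
        predicted, actual = x[start + j + 1], y[start + i + 1]
        errors.append(math.sqrt(sq_dist(predicted, actual)))
    return sum(errors) / len(errors)
```

A low error means the other recording's harmonic progression is a good predictor of this one, which is the notion of similarity-as-predictability applied to continuous features.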

Data
- Jazz box sets from zweitausendeins.de, with metadata in a CSV file ()
- 300 recordings of 97 pieces
- Relaxed definition of "cover": we do not attempt to distinguish the original, nor the artist (self-covers are allowed)
- Two recordings are a cover pair if and only if their titles are identical
- Further (large-scale) experiments were performed on the Million Song Dataset: see our paper for details

Features
- 12-dimensional beat-synchronous chroma (preferred beat rate: 240 BPM)
- Pitch adjusted within ±0.5 semitones to allow for reference frequencies other than A4 = 440 Hz
- One sequence is transposed to maximise the inner product between the global average chroma vectors of the pair
- k-means clustering applied (with various codebook sizes, up to 48) for the discrete methods
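The beat-synchronisation, transposition and quantisation steps can be sketched in plain Python. Assumptions: frame-level 12-bin chroma is already computed and tuning-corrected, beat positions are given as strictly increasing frame indices (no beat tracking is done here), and the codebook has already been learned with k-means. All helper names are hypothetical.

```python
from typing import List, Sequence

Chroma = List[float]


def beat_sync(frames: Sequence[Chroma], beats: Sequence[int]) -> List[Chroma]:
    """Average frame-level chroma over each inter-beat interval [beats[i], beats[i+1])."""
    out = []
    for start, end in zip(beats[:-1], beats[1:]):  # assumes end > start
        seg = frames[start:end]
        out.append([sum(f[k] for f in seg) / len(seg) for k in range(12)])
    return out


def mean_chroma(seq: Sequence[Chroma]) -> Chroma:
    return [sum(f[k] for f in seq) / len(seq) for k in range(12)]


def rotate(v: Chroma, shift: int) -> Chroma:
    """Circularly shift a chroma vector up by `shift` semitones (bin i moves to i + shift)."""
    return [v[(k - shift) % 12] for k in range(12)]


def best_transposition(a: Sequence[Chroma], b: Sequence[Chroma]) -> int:
    """Shift of b that maximises the inner product of the two global mean chroma vectors."""
    ma, mb = mean_chroma(a), mean_chroma(b)
    scores = [sum(p * q for p, q in zip(ma, rotate(mb, s))) for s in range(12)]
    return max(range(12), key=scores.__getitem__)


def quantise(seq: Sequence[Chroma], codebook: Sequence[Chroma]) -> List[int]:
    """Map each chroma vector to the index of its nearest codeword (squared Euclidean)."""
    def sq(a: Chroma, b: Chroma) -> float:
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return [min(range(len(codebook)), key=lambda c: sq(v, codebook[c])) for v in seq]
```

Applying `rotate(frame, best_transposition(a, b))` to every frame of b puts both recordings in a common key before any sequence comparison.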

Methods
- Compression: off-the-shelf standard algorithms (LZ: Lempel-Ziv, BW: Burrows-Wheeler, PPM: prediction by partial matching)
- Prediction: PPMC, LZ78
- Continuous prediction: various parameterisations of time-delay embedding
- Normalisation to remove hubs (items that appear similar to disproportionately many others)
- For larger-scale experiments, a filter-and-refine approach was used: a fast histogram-based method first, then the temporal sequence methods on the best matches
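A filter-and-refine ranking can be sketched like this: a cheap, order-free histogram distance over codeword sequences ranks the whole collection, and only the top `shortlist` items are re-ranked by an expensive temporal measure passed in as a function. The L1 histogram distance, the default codebook size and the shortlist size are assumptions, not the configuration used in the paper.

```python
from typing import Callable, Dict, List, Sequence

Seq = Sequence[int]


def histogram(seq: Seq, size: int) -> List[float]:
    """Normalised codeword histogram (a bag-of-features summary; order is ignored)."""
    counts = [0.0] * size
    for s in seq:
        counts[s] += 1
    return [c / len(seq) for c in counts]


def histogram_distance(x: Seq, y: Seq, size: int = 48) -> float:
    """L1 distance between normalised histograms: the cheap filter stage."""
    return sum(abs(p - q) for p, q in zip(histogram(x, size), histogram(y, size)))


def filter_and_refine(query: Seq, collection: Dict[str, Seq],
                      expensive: Callable[[Seq, Seq], float],
                      shortlist: int = 10) -> List[str]:
    """Rank all items by histogram distance, then re-rank the top `shortlist` items
    with the expensive measure; the remaining items keep their filter order."""
    ranked = sorted(collection, key=lambda k: histogram_distance(query, collection[k]))
    refined = sorted(ranked[:shortlist], key=lambda k: expensive(query, collection[k]))
    return refined + ranked[shortlist:]
```

In practice `expensive` would be one of the temporal sequence measures, such as NCD or a cross-prediction error; the cost of the refine stage then grows with the shortlist size rather than with the collection size.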

Results: Discrete-Valued Approaches
- Evaluated in terms of mean average precision (MAP)
- Numbers are in the paper; they are not comparable across datasets
- Compression-based: interleaving of sequences helped, except for block-based compression algorithms, but...
- The histogram (bag-of-features) baseline outperformed compression-based techniques (but not on the larger dataset)
- Discrete cross-prediction was even better in most cases
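MAP with the title-based ground truth from the Data slide can be computed from a pairwise distance matrix as sketched below. Each recording serves as a query and all others are ranked by ascending distance; skipping queries that have no cover in the collection is an assumption of this sketch.

```python
from typing import List, Sequence


def average_precision(ranked_relevance: Sequence[bool]) -> float:
    """AP for one query: mean of precision@k over the ranks k of the relevant items."""
    hits, total = 0, 0.0
    for k, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            total += hits / k
    return total / hits if hits else 0.0


def mean_average_precision(titles: Sequence[str], dist: Sequence[Sequence[float]]) -> float:
    """MAP over all queries, treating two recordings as covers iff their titles match."""
    aps: List[float] = []
    for i, title in enumerate(titles):
        others = sorted((j for j in range(len(titles)) if j != i), key=lambda j: dist[i][j])
        relevance = [titles[j] == title for j in others]
        if any(relevance):  # skip queries with no cover in the collection
            aps.append(average_precision(relevance))
    return sum(aps) / len(aps) if aps else 0.0
```

Because AP rewards placing every cover near the top of the ranking, MAP depends on the size and cover density of the collection, which is one reason scores are not comparable across datasets.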

Results: Continuous-Valued Approaches
- Continuous cross-prediction was better than conditional self-prediction and better than all discrete approaches
- The baseline cross-prediction method performed equally well
- Combining our approach with the baseline gave a significant improvement
- The baseline cross-correlation approach also performed well on the jazz set, but not on the extended set
- State-of-the-art results were obtained on the Million Song Dataset

Conclusions and Future Work
- Information-theoretic measures do capture some aspects of medium-specificity music similarity
- The codebook for the discrete-valued approach is not musically motivated
- Future work: compare with a more musical representation, namely automatically generated chord symbols
- The continuous-valued vectors are also uninterpreted in these approaches
- Extend experiments to the complete jazz dataset (10000 recordings, 1-31 covers)
- The full methods are slow: test filter-and-refine for jazz

Any Questions?
Acknowledgements / references
- Peter Foster, Simon Dixon and Anssi Klapuri: "Identifying Cover Songs Using Information-Theoretic Measures of Similarity", IEEE/ACM Transactions on Audio, Speech, and Language Processing, Vol. 23, No. 6, 2015, pp. 993-1005
- Peter Foster: "Information-Theoretic Measures of Predictability for Music Content Analysis", PhD Thesis, Queen Mary University of London, School of Electronic Engineering and Computer Science, 2015