Lecture 12: Alignment and Matching

ELEN E4896 MUSIC SIGNAL PROCESSING
Lecture 12: Alignment and Matching

1. Music Alignment
2. Cover Song Detection
3. Echo Nest Analyze

Dan Ellis
Dept. Electrical Engineering, Columbia University
dpwe@ee.columbia.edu
http://www.ee.columbia.edu/~dpwe/e4896/

E4896 Music Signal Processing (Dan Ellis) 2013-04-15

1. Music Alignment
- Often have versions of the same music with unmatched time axes
  - different performances
  - performance vs. score
- Various applications for aligning them (Kurth et al., 2007)
  - synchronizing different tracks (with TSM)
  - synchronized score display
  - ground truth transcriptions

The Similarity Matrix
- Point-to-point comparison of sequences (Foote 1999)
- e.g. Euclidean distance
  $d_{euc}(i,j) = \sqrt{\sum_k \left( x_i(k) - y_j(k) \right)^2}$
- or normalized inner product (cosine distance)
  $d_{cos}(i,j) = 1 - \frac{\sum_k x_i(k)\, y_j(k)}{\sqrt{\sum_k x_i(k)^2 \, \sum_k y_j(k)^2}}$
[Figure: Euclidean distance d_ij between chroma sequences of "Let It Be" by The Beatles (index j) and by Nick Cave (index i); both axes in time / beats.]
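
The two distances above can be sketched with NumPy as follows. This is a minimal illustration, not the lecture's own code: the function names and the layout (features in rows, frames or beats in columns) are assumptions made for the example.

```python
import numpy as np

def euclidean_distance_matrix(X, Y):
    """d_euc(i, j) between column i of X and column j of Y.

    X and Y are (features x frames) arrays; this layout is an assumption.
    """
    diff = X[:, :, None] - Y[:, None, :]      # shape (K, I, J)
    return np.sqrt(np.sum(diff ** 2, axis=0))

def cosine_distance_matrix(X, Y, eps=1e-12):
    """d_cos(i, j) = 1 - normalized inner product of columns i and j."""
    inner = X.T @ Y
    norms = np.linalg.norm(X, axis=0)[:, None] * np.linalg.norm(Y, axis=0)[None, :]
    return 1.0 - inner / np.maximum(norms, eps)  # eps guards all-zero frames

# Toy data: two frames in X, three frames in Y, two features each.
X = np.array([[1.0, 0.0],
              [0.0, 1.0]])
Y = np.array([[1.0, 0.0, 3.0],
              [0.0, 2.0, 4.0]])
D_euc = euclidean_distance_matrix(X, Y)
D_cos = cosine_distance_matrix(X, Y)
```

Cosine distance ignores the overall level of each frame, which is one reason to prefer it when the two versions differ in loudness.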

Dynamic Programming
- Find best path combining local + transition costs (Bellman 1957)
  - works for any kind of similarity matrix
- Allowable transitions from {i_{k-1}, j_{k-1}} to {i_k, j_k}:
  T(1,0) = 0.1, T(1,1) = 0.0, T(0,1) = 0.1
- Finds path {i_k, j_k} to minimize cost...
  $C_{i_{max}, j_{max}} = \sum_k \left[ d(i_k, j_k) + T(i_k - i_{k-1},\, j_k - j_{k-1}) \right]$
- ... recursively
  $C_{i,j} = \min_{(x,y) \in \{(1,1),(1,0),(0,1)\}} \left[ d(i,j) + T(x,y) + C_{i-x,\,j-y} \right]$
[Figure: local costs d_ij (color scale 0.1 to 1), cumulative costs C*, and paths on a grid with i, j = 1..8. Values printed in the grid cells, as recovered from the extracted text:]

  j \ i    1    2    3    4    5    6    7    8
    8    2.7  2.8  2.5  2.3  2.0  1.7  1.6  1.5
    7    2.1  2.2  2.0  1.8  1.5  1.3  1.2  1.6
    6    1.4  1.5  1.3  1.2  1.0  0.9  1.3  1.5
    5    1.2  1.2  1.0  1.0  0.7  0.9  1.2  1.5
    4    0.8  0.8  0.7  0.6  0.7  1.2  1.8  2.4
    3    0.5  0.5  0.4  0.5  0.9  1.5  2.2  2.7
    2    0.3  0.3  0.4  0.6  1.0  1.5  2.1  2.6
    1    0.1  0.4  0.6  0.8  1.1  1.6  2.3  2.9
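
The recursion above can be sketched in Python as below. The transition costs are the ones given on the slide; the function name `dp_align`, the 0-based indexing, and the choice to start the path at cell (0, 0) are assumptions of this example.

```python
import numpy as np

# Transition costs T(x, y) from the slide: diagonal steps are free.
TRANSITIONS = {(1, 1): 0.0, (1, 0): 0.1, (0, 1): 0.1}

def dp_align(D, transitions=TRANSITIONS):
    """Cheapest path through local-cost matrix D from (0, 0) to the last cell.

    Implements C[i, j] = min over (x, y) of D[i, j] + T(x, y) + C[i-x, j-y].
    Returns (total cost, list of (i, j) cells on the path).
    """
    n_i, n_j = D.shape
    C = np.full((n_i, n_j), np.inf)
    C[0, 0] = D[0, 0]
    back = {}                                  # best predecessor of each cell
    for i in range(n_i):
        for j in range(n_j):
            for (x, y), t in transitions.items():
                pi, pj = i - x, j - y
                if pi < 0 or pj < 0:
                    continue                   # step would leave the matrix
                cand = D[i, j] + t + C[pi, pj]
                if cand < C[i, j]:
                    C[i, j] = cand
                    back[(i, j)] = (pi, pj)
    # Trace back from the final cell to the start.
    path = [(n_i - 1, n_j - 1)]
    while path[-1] != (0, 0):
        path.append(back[path[-1]])
    return float(C[-1, -1]), path[::-1]

# Toy example: low costs on the diagonal, high costs elsewhere.
D_toy = np.array([[0.1, 0.9, 0.9],
                  [0.9, 0.2, 0.9],
                  [0.9, 0.9, 0.3]])
cost, path = dp_align(D_toy)
```

The double loop costs O(I x J) time and memory, which is why the later slides look for cheaper alternatives when matching against a large collection.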

Audio-to-Audio Alignment
- Dynamic programming to get time mapping
- + phase vocoder time scaling
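
As a rough illustration of the "time mapping" step, the sketch below turns an alignment path (a list of matched frame pairs) into a function from time in one recording to time in the other. The name `path_to_time_map`, the hop-size parameters, and the averaging of repeated frames are assumptions of this example; the phase vocoder stage that would actually stretch the audio is not shown.

```python
import numpy as np

def path_to_time_map(path, hop_a, hop_b):
    """Build t_b = f(t_a) from an alignment path of (i, j) frame pairs.

    hop_a and hop_b are the frame hops of the two recordings in seconds.
    When one frame i is paired with several frames j, their mean is used
    so that the interpolation grid is strictly increasing.
    """
    path = np.asarray(path, dtype=float)
    i_vals, inverse = np.unique(path[:, 0], return_inverse=True)
    j_mean = np.bincount(inverse, weights=path[:, 1]) / np.bincount(inverse)
    t_a = i_vals * hop_a
    t_b = j_mean * hop_b
    return lambda t: np.interp(t, t_a, t_b)

# Toy path: frame 1 of recording A is matched to frames 1 and 2 of B.
warp = path_to_time_map([(0, 0), (1, 1), (1, 2), (2, 3)], hop_a=0.5, hop_b=0.5)
t_mapped = warp(0.25)
```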

Audio-Score Alignment
- Aligning a score representation (e.g. MIDI) is a proxy for polyphonic transcription
[Figure: "Let It Be + aligned MIDI labels"; freq / Hz vs. time / sec.]

Peak Structure Distance
- How do we match spectra to score notes?
  - synthesize audio from MIDI & compare audio?
- Peak Structure distance: is energy where we expect? (Orio & Schwarz 2001)
  - Predicted spectrum = mask M[k]
  - Peak Structure = energy below mask
  $d_{psd} = 1 - \frac{\sum_k M[k]\, X[k]}{\sum_k X[k]}$
[Figure panels: MIDI piano roll (notes C2 to C6); synthesized audio (freq / kHz); predicted spectrum mask (freq / bins); time axes in frames and sec.]
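
A minimal sketch of the distance formula above, for a single frame. The formula is from the slide; the helper `harmonic_mask`, its parameters, and the way the mask is built from note fundamentals are hypothetical choices made only to have something to test against.

```python
import numpy as np

def peak_structure_distance(mask, spectrum, eps=1e-12):
    """d_psd = 1 - sum_k M[k] X[k] / sum_k X[k] for one spectral frame."""
    mask = np.asarray(mask, dtype=float)
    spectrum = np.asarray(spectrum, dtype=float)
    return 1.0 - np.sum(mask * spectrum) / max(np.sum(spectrum), eps)

def harmonic_mask(f0_hz, n_bins, bin_hz, n_harmonics=4, half_width=1):
    """Hypothetical binary mask: 1 around each harmonic of each expected note."""
    mask = np.zeros(n_bins)
    for f0 in f0_hz:
        for h in range(1, n_harmonics + 1):
            k = int(round(h * f0 / bin_hz))
            if k < n_bins:
                mask[max(k - half_width, 0):k + half_width + 1] = 1.0
    return mask

# Toy frame: one expected note at 100 Hz, 10 Hz per bin.
M = harmonic_mask([100.0], n_bins=64, bin_hz=10.0)
X = np.zeros(64)
X[[10, 20, 30]] = 1.0   # energy on the first three harmonics (expected)
X[50] = 1.0             # energy where the score predicts none
d_partial = peak_structure_distance(M, X)

X_match = np.zeros(64)
X_match[[10, 20, 30, 40]] = 1.0
d_match = peak_structure_distance(M, X_match)
```

A distance near 0 means nearly all observed energy falls where the score predicts it; a distance near 1 means almost none does.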

2. Cover Song Detection
- Musicians are fond of cover versions
  - usually alter melody, harmony, instrumentation, rhythm, style
  - can be hard to spot even for a human!
- Can try to match via alignment... with some threshold on best alignment cost?

Smith-Waterman Local Alignment
- Cover version may have different form
  - different number, ordering of verse/chorus/bridge
  - want to find any large aligned regions
- Local alignment measure
  $S_{i,j} = \max_{x,y} \max\left\{ 0,\; s(i,j) - P(x,y) + S_{i-x,\,j-y} \right\}$
  - want largest score S*
  - similarity s(i, j) must exceed penalty P(x,y) on avg. (e.g. 0.96 for diagonal, 1.2 for off-diagonal)
[Figure: Smith-Waterman scores for Beatles vs. Carol Woods, cosine distance, panel label "cd/2,.96.1.2"; both axes in time / beats.]
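
The local alignment recursion above can be sketched as follows. The penalties 0.96 and 1.2 are the example values from the slide; the function name, the zero-padded score table, and the toy similarity (2 for a match, 0 otherwise) are assumptions of this example and not the similarity measure used in the lecture.

```python
import numpy as np

def smith_waterman(sim, p_diag=0.96, p_off=1.2):
    """Best local alignment score S* and the cell where it ends.

    sim[i, j] is the local similarity s(i, j). The score table is padded
    with a zero row and column so that S[i-x, j-y] always exists.
    """
    n_i, n_j = sim.shape
    S = np.zeros((n_i + 1, n_j + 1))
    steps = (((1, 1), p_diag), ((1, 0), p_off), ((0, 1), p_off))
    for i in range(1, n_i + 1):
        for j in range(1, n_j + 1):
            best = 0.0                         # scores never drop below zero
            for (x, y), penalty in steps:
                best = max(best, sim[i - 1, j - 1] - penalty + S[i - x, j - y])
            S[i, j] = best
    end = np.unravel_index(np.argmax(S), S.shape)
    return float(S.max()), (int(end[0]) - 1, int(end[1]) - 1)

# Toy sequences sharing the fragment 1, 2, 3.
a = np.array([0, 1, 2, 3])
b = np.array([9, 1, 2, 3, 9])
sim_toy = 2.0 * (a[:, None] == b[None, :])
score, end_cell = smith_waterman(sim_toy)
```

Because the score is clamped at zero, a poorly matching stretch resets the alignment instead of dragging down a later good region, which is what lets the method cope with reordered or missing sections.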

Local Alignment Cover Detection (Serrà & Gómez, 2008)
- Smith-Waterman needs predictable values
  - use binary similarity based on best transposition
[Figure panels: Euclidean; Binary; Non-cover.]

Cross-correlation Covers System
- DP is good for time-warping, but expensive
  - beat-timing is tempo independent (if it works)
- simply cross-correlate beat-chroma patches?
  - how big are the pieces?
  - how do we combine individual scores?
  - also expensive
[Figure: a patch is extracted from the Query beat-chroma and cross-correlated with the Candidate beat-chroma; chroma bins vs. beats.]

Global Cross-Correlation (Ellis & Poliner, 2007)
- Cross-correlate entire beat-chroma matrices...
  - at all possible transpositions (circular)
  - implicit combination of match quality and duration
- One good matching fragment is sufficient...?
[Figure: beat-chroma of Elliott Smith - Between the Bars (beats @281 BPM) and Glen Phillips - Between the Bars (chroma bins vs. beats); cross-correlation shown as skew / semitones vs. skew / beats.]
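
A minimal sketch of cross-correlating two whole beat-chroma matrices at every beat skew and every circular transposition. The function name `chroma_xcorr` and the 12 x beats layout are assumptions of this example, and any normalization the original system applies is omitted.

```python
import numpy as np

def chroma_xcorr(A, B):
    """Cross-correlation of beat-chroma matrices A and B (12 x beats).

    Row s of the result holds the correlation, at every beat skew, between
    A and B rotated upward by s semitones. Zero skew is at column
    B.shape[1] - 1.
    """
    n_a, n_b = A.shape[1], B.shape[1]
    out = np.zeros((12, n_a + n_b - 1))
    for shift in range(12):
        B_rot = np.roll(B, shift, axis=0)      # circular transposition
        for row in range(12):
            out[shift] += np.correlate(A[row], B_rot[row], mode="full")
    return out

# Toy data: B is a one-hot scale; A is B transposed up 3 semitones
# and delayed by 2 beats.
pitches = [0, 2, 4, 5, 7, 9, 11, 0]
B_toy = np.zeros((12, len(pitches)))
B_toy[pitches, np.arange(len(pitches))] = 1.0
A_toy = np.zeros((12, len(pitches) + 2))
A_toy[:, 2:] = np.roll(B_toy, 3, axis=0)

xc = chroma_xcorr(A_toy, B_toy)
best_shift, best_col = np.unravel_index(np.argmax(xc), xc.shape)
best_skew = int(best_col) - (B_toy.shape[1] - 1)
```

In this toy case the peak recovers both the transposition and the delay; on real recordings the peak height mixes how well the fragments match with how long the matching region is.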

Filtered Cross-Correlation
- Raw correlation not as important as precise local match
  - looking for large contrast at ±1 beat skew
  - i.e. high-pass filter
[Figure: cross-correlation (skew / semitones vs. skew / beats); cross-correlation @ skew = +2 semitones, raw and filtered, vs. skew / beats.]
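
The slide does not specify the filter, so the sketch below uses a simple three-tap kernel that compares each skew with the mean of its ±1 beat neighbours. The kernel and the function name are assumptions chosen to illustrate the idea of high-pass filtering along the skew axis.

```python
import numpy as np

def highpass_skew(xc_row, kernel=(-0.5, 1.0, -0.5)):
    """High-pass filter one row of the cross-correlation along beat skew.

    Each output value is the raw value minus the mean of its two
    neighbours, so sharp peaks survive while broad bumps are suppressed.
    """
    return np.convolve(xc_row, kernel, mode="same")

# Toy row: a broad bump peaking at index 4 plus a sharp peak at index 6.
raw = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 3.0, 3.5, 1.0, 0.0])
filtered = highpass_skew(raw)
raw_peak = int(np.argmax(raw))
filtered_peak = int(np.argmax(filtered))
```

Here the raw maximum sits on the broad bump, while the filtered maximum moves to the sharp peak, which is the kind of precise local match the slide is after.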

Cover Song Results
- 23 covers found in the 8700-song uspop2002 collection
- Failure notes: popular decoys; normalization issues
[Figure: "Cover Songs - dpwe23 - 12/23 correct"; Query vs. Test. Query labels:]
  Abracadabra/sugar_ray
  Addicted_To_Love/tina_turner
  All_Along_The_Watchtower/dave_matthews_band
  America/simon_and_garfunkel
  Before_You_Accuse_Me/eric_clapton
  Between_The_Bars/glen_phillips
  Blue_Collar_Man/styx
  Caroline_No/brian_wilson
  Cecilia/simon_and_garfunkel
  Claudette/roy_orbison
  Cocaine/nazareth
  Come_Together/beatles
  Day_Tripper/cheap_trick
  Enjoy_The_Silence/tori_amos
  Faith/limp_bizkit
  God_Only_Knows/brian_wilson
  Gold_Dust_Woman/sheryl_crow
  Grand_Illusion/styx
  Hush/milli_vanilli
  I_Can_t_Get_No_Satisfaction/rolling_stones
  I_Love_You/faith_hill
  Let_It_Be/nick_cave
  Take_Me_To_The_River/annie_lennox

Analyzing Cover Song Correlation
- Look inside global cross-correlation to find matching fragments...
  $xcorr = \sum_t \sum_f C_1(t, f)\, C_2(t, f)$
  - view along time
[Figure: chroma of Let It Be / Beatles (beats 11-441) and Let It Be / Nick Cave (beats 13-443), and their per-beat correlation, all vs. time / beats.]
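
"View along time" can be read as keeping the sum over chroma bins f but not the sum over beats t. A minimal sketch under that reading, assuming the two matrices are already aligned and transposed to match (function name and bins x beats layout are assumptions):

```python
import numpy as np

def per_beat_correlation(C1, C2):
    """c(t) = sum_f C1(t, f) * C2(t, f) for aligned (bins x beats) matrices.

    Summing c(t) over t gives the single cross-correlation value at this skew.
    """
    return np.sum(C1 * C2, axis=0)

# Toy data with 2 bins and 3 beats (real chroma would have 12 bins).
C1_toy = np.array([[1.0, 0.0, 1.0],
                   [0.0, 1.0, 1.0]])
C2_toy = np.array([[1.0, 1.0, 0.5],
                   [0.0, 0.0, 0.5]])
c_t = per_beat_correlation(C1_toy, C2_toy)
total = float(c_t.sum())
```

Beats where c(t) stays high mark the fragments that contribute most to the match.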

Cover Song False Alarm
- Correlation can be weak
  - Cocaine (Clapton) vs. Satisfaction (Stones)
[Figure: chroma of Eric Clapton - Cocaine (beats 17:1027) and Rolling Stones - Satisfaction (beats 1:1011), and their per-beat correlation, vs. beats.]

3. Echo Nest Analyze
- Web service to provide beat, chroma, ... analysis (and much more)
- register for free API key: http://developer.echonest.com/account/register/
- upload MP3, get back XML with analysis data
[Figure: TRKUYPW128F92E1FC0 - Tori Amos - Smells Like Teen Spirit; panels Original, EN Features, Resynth; freq / Hz and chroma vs. time / sec.]

EN Analyze Usage
- Matlab wrapper function

Million Song Dataset (MSD) (Thierry Bertin-Mahieux)
- Commercial-scale dataset available to MIR researchers
  - 1M pop songs
  - 250 GB of features (6 years of listening)
- EN Analyze features +... Lyrics, Tags, Covers, Listeners...
- http://labrosa.ee.columbia.edu/millionsong

MSD Metadata

EN Metadata
  artist: 'Tori Amos'
  release: 'LIVE AT MONTREUX'
  title: 'Smells Like Teen Spirit'
  id: 'TRKUYPW128F92E1FC0'
  key: 5
  mode: 0
  loudness: -16.6780
  tempo: 87.2330
  time_signature: 4
  duration: 216.4502
  sample_rate: 22050
  audio_md5: '8'
  7digitalid: 5764727
  familiarity: 0.8500
  year: 1992

SHS Covers
  %5489,4468, Smells Like Teen Spirit
  TRTUOVJ128E078EE10 Nirvana
  TRFZJOZ128F4263BE3 Weird Al Yankovic
  TRJHCKN12903CDD274 Pleasure Beach
  TRELTOJ128F42748B7 The Flying Pickets
  TRJKBXL128F92F994D Rhythms Del Mundo feat. Shanade
  TRIHLAW128F429BBF8 The Bad Plus
  TRKUYPW128F92E1FC0 Tori Amos

Last.fm Tags
  100.0 cover; 57.0 covers; 43.0 female vocalists; 42.0 piano; 34.0 alternative;
  14.0 singer-songwriter; 11.0 acoustic; 8.0 tori amos; 7.0 beautiful; 6.0 rock;
  6.0 pop; 6.0 Nirvana; 6.0 female vocalist; 6.0 90s; 5.0 out of genre covers;
  5.0 cover songs; 4.0 soft rock; 4.0 nirvana cover; 4.0 Mellow; 4.0 alternative rock;
  3.0 chick rock; 3.0 Ballad; 3.0 Awesome Covers; 2.0 melancholic; 2.0 k00l ch1x;
  2.0 indie; 2.0 female vocalistist; 2.0 female; 2.0 cover song; 2.0 american

MxM Lyric Bag-of-Words
  12 hello; 11 i; 10 a; 9 and; 7 it; 6 are; 6 we; 6 now; 6 here; 6 us; 6 entertain;
  4 the; 4 feel; 4 yeah; 3 to; 3 my; 3 is; 3 with; 3 oh; 3 out; 3 an; 3 light;
  3 less; 3 danger

Summary
- Music Alignment: Dynamic Programming finds correspondence
- Cover Songs: DP, or cross-correlation for efficiency
- EN Analyze: Web service to analyze audio

References
- R. Bellman, Dynamic Programming, Princeton University Press, 1957.
- D. Ellis and G. Poliner, Identifying Cover Songs With Chroma Features and Dynamic Programming Beat Tracking, Proc. ICASSP-07, Hawai'i, pp. IV-1429-1432, 2007.
- J. Foote, Visualizing Music and Audio using Self-Similarity, Proc. ACM Multimedia, Orlando, pp. 77-80, 1999.
- Frank Kurth, Meinard Müller, Christian Fremerey, Yoon ha Chang, and Michael Clausen, Automated synchronization of scanned sheet music with audio recordings, Proc. 8th International Conference on Music Information Retrieval (ISMIR), Vienna, pp. 261-266, 2007.
- N. Orio and D. Schwarz, Alignment of monophonic and polyphonic music to a score, Proc. Int. Comp. Music Conf., Havana, pp. 155-158, 2001.
- J. Serrà, E. Gómez, P. Herrera, and X. Serra, Chroma Binary Similarity and Local Alignment Applied to Cover Song Identification, IEEE Trans. on Audio, Speech and Lang. Proc., 16(6), pp. 1138-1151, 2008.