Grouping Recorded Music by Structural Similarity
Juan Pablo Bello, New York University
ISMIR 09, Kobe, October 2009
MARL (Music and Audio Research Lab)

Sequence-based analysis

Structure discovery
  Cooper, M. & Foote, J. (2002), Automatic Music Summarization via Similarity Analysis
  Foote, J. et al. (2002), Audio Retrieval by Rhythmic Similarity
  Peeters, G. (2007), Sequence Representation of Music Structure Using Higher-Order Similarity Matrix and Maximum-Likelihood Approach
  Bartsch, M. & Wakefield, G.H. (2001), To Catch a Chorus: Using Chroma-Based Representations for Audio Thumbnailing
  Dannenberg, R. & Hu, N. (2002), Pattern Discovery Techniques for Music Audio
  Peeters, G., La Burthe, A. & Rodet, X. (2002), Toward Automatic Music Audio Summary Generation from Signal Analysis
  Wellhausen, J. & Crysandt, H. (2003), Temporal Audio Segmentation Using MPEG-7 Descriptors
  Goto, M. (2003), A Chorus-Section Detecting Method for Musical Audio Signals
  Chai, W. & Vercoe, B. (2003), Structural Analysis of Musical Signals for Indexing and Thumbnailing
  Lu, L., Li, S., Liu, W.-Y. & Zhang, H.-J. (2002), Audio Textures
  Casey, M.: musicstructure.com
  Lukashevich, H. (2008), Towards Quantitative Measures of Evaluating Song Segmentation
  Jensen, K. et al. (2005), Rhythm-Based Segmentation of Popular Chinese Music
  Müller, M. & Clausen, M. (2007), Transposition-Invariant Self-Similarity Matrices
  Paulus, J. & Klapuri, A. (2008), Music Structure Analysis Using a Probabilistic Fitness Measure and an Integrated Musicological Model
  and others

Self-similarity
  Self-similarity for music retrieval (Izumitani and Kashino, ICME-07/ISMIR-08; Martin et al., ISMIR-09)

Structural similarity
  Music signals -> trajectories in an N-dimensional feature space
  Trajectories are fully characterized by the self-similarity matrix (rotation and time-invariance)
  Structural similarity -> distances between matrices
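As a small illustration of the rotation-invariance claim, the sketch below builds a Euclidean self-similarity matrix for a toy 2-D trajectory and checks that it is unchanged when the trajectory is rotated. The trajectory values and the helper names `self_similarity` and `rotate2d` are hypothetical and chosen only for this example; real features would be MFCC or chroma vectors in a higher-dimensional space.

```python
import math

def self_similarity(traj):
    """Pairwise Euclidean distances between all frames of a trajectory."""
    return [[math.dist(a, b) for b in traj] for a in traj]

def rotate2d(traj, angle):
    """Rotate every 2-D point of the trajectory by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y) for x, y in traj]

# Toy 2-D "feature trajectory" (assumed values): frames 0-1 repeat as frames 3-4.
trajectory = [(0.0, 0.0), (1.0, 0.0), (1.0, 2.0), (0.0, 0.0), (1.0, 0.0)]

ssm = self_similarity(trajectory)
ssm_rotated = self_similarity(rotate2d(trajectory, math.pi / 3))

# Largest entry-wise difference between the two matrices (floating-point noise only).
max_diff = max(
    abs(a - b)
    for row_a, row_b in zip(ssm, ssm_rotated)
    for a, b in zip(row_a, row_b)
)
print(max_diff)
```

Because only pairwise distances enter the matrix, any rigid rotation of the feature space leaves it unchanged, which is why distances between matrices can stand in for distances between trajectories.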

Representation
  track i -> feature extraction -> self-similarity -> quantization -> object i
  Feature extraction: MFCC or chroma, averaged between beats, low-pass filtered (LPF) and standardized
  Self-similarity: Euclidean or cosine distance, normalized
  Quantization: uniform, with 1, 2, 3 or 4 bits
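The quantization step can be sketched as follows. This is a minimal example under stated assumptions: the slide does not say how the matrix is normalized or how the quantized symbols are serialized, so here the matrix is divided by its maximum and each symbol is stored as one byte. The function names `quantize_uniform` and `to_object` are hypothetical.

```python
def quantize_uniform(matrix, bits):
    """Uniformly quantize a non-negative matrix to 2**bits levels.

    Assumption: normalization is division by the largest entry.
    """
    levels = 2 ** bits
    peak = max(max(row) for row in matrix) or 1.0
    return [
        [min(int(v / peak * levels), levels - 1) for v in row]
        for row in matrix
    ]

def to_object(qmatrix):
    """Serialize the quantized matrix row by row, one byte per symbol (assumed layout)."""
    return bytes(v for row in qmatrix for v in row)

# Toy normalized distance matrix (assumed values).
distances = [
    [0.0, 0.5, 1.0],
    [0.5, 0.0, 0.25],
    [1.0, 0.25, 0.0],
]
quantized = quantize_uniform(distances, bits=2)
obj = to_object(quantized)
print(quantized)  # [[0, 2, 3], [2, 0, 1], [3, 1, 0]]
```

Fewer bits give a coarser alphabet, which discards small performance-specific differences in distance and makes the resulting byte string easier for a compressor to model.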

Similarity
  We use the Normalized Compression Distance (NCD: Li and Vitanyi 1997; Cilibrasi, Vitanyi and de Wolf 2004; Cilibrasi and Vitanyi 2005):

  $$\mathrm{NCD}(o_1, o_2) = \frac{C(o_1 o_2) - \min\{C(o_1), C(o_2)\}}{\max\{C(o_1), C(o_2)\}}$$

  $C(\cdot)$ is the size of a compressed object using a standard algorithm.
  $o_1 o_2$ is the concatenation of objects $o_1$ and $o_2$.
  Why NCD?
  It is (quasi) universal
  It is topic/parameter free (other than the choice of compressor: gzip, bzip2, PPMz)
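A minimal sketch of the NCD, assuming bzip2 (via Python's standard `bz2` module) as the compressor: the compressed size of the concatenation, minus the smaller of the two individual compressed sizes, divided by the larger. The random 4-symbol byte strings stand in for 2-bit quantized matrices and are illustrative data only, not the objects used in the experiments.

```python
import bz2
import random

def compressed_size(data: bytes) -> int:
    """C(.): length in bytes of the bzip2-compressed object."""
    return len(bz2.compress(data))

def ncd(o1: bytes, o2: bytes) -> float:
    """Normalized Compression Distance between two byte strings."""
    c1, c2 = compressed_size(o1), compressed_size(o2)
    c12 = compressed_size(o1 + o2)  # C of the concatenation o1 o2
    return (c12 - min(c1, c2)) / max(c1, c2)

# Illustrative objects: two unrelated random sequences over a 4-symbol alphabet.
rng = random.Random(0)
x = bytes(rng.randrange(4) for _ in range(4000))
y = bytes(rng.randrange(4) for _ in range(4000))

same = ncd(x, x)       # an object compared with itself
different = ncd(x, y)  # two unrelated objects
print(same, different)
```

With a real compressor the distance of an object to itself is small but generally not exactly zero, and values slightly above 1 can occur, so NCD is best read as a relative score for ranking or clustering.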

Experimental setup
  P56 (Widmer et al. 2003):
    56 piano music recordings, 25 pianists (1946-1998)
    8 works, 3 composers (Beethoven, Mozart, Chopin); each work has 3-13 renditions
  S67:
    67 recordings of symphonic music, 34 conductors (1948-2008)
    11 works, 7 composers (Beethoven, Berlioz, Brahms, Mahler, Mendelssohn, Mozart, Tchaikovsky); each work has 6-7 renditions
  Goal: cluster performances of a given work together

Example 1: beats

Example 2: length
  Mozart_kv282_1 (7'16"-8'04"): shortest (2'41"), 2nd shortest (5'42"), 3rd shortest (5'51")

Example 2: length
  Mozart_kv279_3 (3'14"-3'32"): shortest (1'54"), 2nd longest (4'35"), longest (5'37")

Example 2: length
  Chopin_op15_1: 4'07", 4'14", 4'31", 5'07", 5'41"; shortest (3'39")

Best results
  Eliminate beat-tracking and re-sample the feature matrix to a fixed length
  Moving average filter across diagonals (Müller and Kurth 07) and binary encoding
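The three steps named above can be sketched roughly as follows. This is not the processing of Müller and Kurth: a plain forward moving average along each diagonal is swapped in as a stand-in, the re-sampling is nearest-neighbour, and the window width and binarization threshold are assumed values. All function names are hypothetical.

```python
def resample_fixed(frames, length):
    """Pick `length` frames at evenly spaced positions (nearest-neighbour re-sampling)."""
    n = len(frames)
    return [frames[min(n - 1, (i * n) // length)] for i in range(length)]

def smooth_diagonals(matrix, width):
    """Average each cell with the next `width - 1` cells along its diagonal."""
    n = len(matrix)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            vals = [matrix[i + k][j + k]
                    for k in range(width) if i + k < n and j + k < n]
            out[i][j] = sum(vals) / len(vals)
    return out

def binarize(matrix, threshold):
    """Binary encoding: 1 where the smoothed distance is at most `threshold`."""
    return [[1 if v <= threshold else 0 for v in row] for row in matrix]

# Re-sampling ten frame indices down to five.
picked = resample_fixed(list(range(10)), 5)

# Toy distance matrix with one spurious low-distance pair at (0, 1) and (1, 0).
noisy = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [1, 1, 0, 1],
    [1, 1, 1, 0],
]
binary = binarize(smooth_diagonals(noisy, width=2), threshold=0.25)
print(picked)
print(binary)
```

In this toy case the isolated low-distance cell is averaged with its diagonal neighbour and falls above the threshold, while the main diagonal survives. That is the intended effect: sustained diagonal stripes (repetitions) are kept and isolated matches are suppressed.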

Beethoven_op67_1, Mahler_sym4_2, Brahms_op98_3, Mozart_k550_3, Mahler_sym1_2, Berlioz_op14_4, Mendelssohn_op90_4, Mozart_k385_4, Brahms_op68_3, Tchaikovsky_op74_2, Beethoven_op68_4

NCD limitations

Concluding remarks
  Contributions:
    Rotation and time-invariant representation of music structure
    Parameterization, incl. quantization strategy, to aid generalization
    A simple method for computing similarity
    Proof-of-concept (small-scale) evaluation on expressive music
  Results suggest that:
    Intermediate processes (e.g. beat tracking) degrade performance (Serra et al., 2008)
    Global similarity limits robustness to structural changes (e.g. Berlioz)
  Current and future work:
    Scalability and noise-robustness
    Characterization of local structures is desirable (LSH, DP, MCMO solutions)

Thanks a lot!
  E-mail: jpbello@nyu.edu
  Web: homepages.nyu.edu/~jb2843
  This work is made possible by grants from the U.S. Institute of Museum and Library Services and the National Science Foundation.
  Also many thanks to: Ernest Li for his ideas; Gerhard Widmer and Werner Goebl for the P56 dataset; Craig Sapp for the CHARM dataset; Dan Ellis and the CompLearn team for making their code available.