MidiFind: Fast and Effective Similarity Searching in Large MIDI Databases

1 MidiFind: Fast and Effective Similarity Searching in Large MIDI Databases
Gus Xia, Tongbo Huang, Yifei Ma, Roger B. Dannenberg, Christos Faloutsos
School of Computer Science, Carnegie Mellon University

2 Introduction: Background
What is MIDI? Musical Instrument Digital Interface. A MIDI file doesn't carry the actual sound but rather the control information. For piano pieces: pitch, velocity, start time, and ending time.
What are similar MIDI files? Different performance versions of the same composition, including the pure quantized version.
Why find similar MIDI files? They are important for musicians and music amateurs, and they are widely distributed online.

3 Introduction: Goal
Context: MIDI files are difficult to search by metadata due to careless or casual labeling.
Idea: content-based retrieval.
Our goal: Given a query MIDI file, find all different performance versions (including the pure quantized version) of the same composition. The search should be effective and fast enough to deal with 1 million MIDI files.

4 Introduction: General approach

5 Outline
Introduction, Search Quality, Search Scalability, Build MidiFind System, Experiments, Demo, Conclusion

6 Search Quality
Goal: Design features and corresponding measurements to reveal the similarity between different MIDI files.
General methods:
Euclidean distance for the bag-of-words feature
Modified Levenshtein distance for the melody string feature

7 Search Quality: ED for BOW feature
Bag-of-words feature:
Word: a note, where its octave and duration are ignored.
Word count: the normalized number of times a note appears.
BOW feature: a 12-dim vector, an empirical distribution over the pitch classes, for example
(C, C#, D, D#, E, F, F#, G, G#, A, A#, B)
(0.3, 0, 0.1, 0.05, 0.1, 0.1, 0.02, 0.2, 0, 0.02, 0.01, 0.1)
Euclidean distance for two vectors:
$$ED(a, b) = \sqrt{\sum_{i=1}^{12} (a_i - b_i)^2}$$
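The bag-of-words feature and its Euclidean distance can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and it assumes notes arrive as integer MIDI pitch numbers, so that `pitch % 12` gives the pitch class (0 = C).

```python
import math

PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def bow_feature(pitches):
    """Return a 12-dim empirical distribution over pitch classes.

    `pitches` is assumed to be an iterable of integer MIDI pitch numbers.
    """
    counts = [0] * 12
    for p in pitches:
        counts[p % 12] += 1  # the octave is discarded, only the pitch class counts
    total = sum(counts)
    if total == 0:
        return [0.0] * 12
    return [c / total for c in counts]

def euclidean_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```

For example, `bow_feature([60, 72, 64, 67])` puts weight 0.5 on C and 0.25 each on E and G, because the two C notes in different octaves fall into the same pitch class.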

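8 Search Quality: modified Levenshtein distance for melody string feature
Melody string feature:
The melody is a distinctive element that helps people tell different pieces of music apart.
We simply use the highest pitch at any given time as the melody, where the durations are ignored.
Levenshtein distance for two strings: the minimum number of single-element insertions, deletions, and substitutions needed to turn one string into the other.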
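A minimal Python sketch of the melody string and the plain Levenshtein distance follows. The note layout `(pitch, start_time)` and the rule "keep the highest pitch among notes sharing an onset" are assumptions made for illustration; the slides do not specify how simultaneous notes are detected.

```python
def melody_string(notes):
    """Reduce notes to a melody: the highest pitch at each onset time.

    `notes` is assumed to be an iterable of (pitch, start_time) pairs.
    Durations are ignored, as on the slide.
    """
    highest = {}
    for pitch, start in notes:
        if start not in highest or pitch > highest[start]:
            highest[start] = pitch
    return [highest[t] for t in sorted(highest)]

def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two sequences."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]
```

The distance fills an (m+1) by (n+1) table row by row, which is why its cost grows with the product of the two melody lengths.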

9 Search Quality: cons of Levenshtein
Problem: The distance correlates with the melody length.
The distribution over the length of melody strings follows a power law, with a mean of 1303 and a standard deviation of 1240.
[Figure: histogram of melody string lengths; x-axis: length of melody string, y-axis: count]

10 Search Quality: Lev-400
Solution: Turn melody strings into equal length.
Chop and concatenate the first and last 200 notes.
Don't modify strings that are shorter than 400, but scale up their Levenshtein distance.
Insights:
A unified length leads to a unified threshold.
Similar melodies tend to agree more at the beginning and the ending parts.
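The clipping step can be sketched as below. The slide does not state how the distance is scaled up for short strings, so the factor `size / longest` used here is purely an assumption for illustration, and all function names are hypothetical.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]

def clip_melody(melody, size=400):
    """Keep the first and last size/2 notes; leave shorter melodies untouched."""
    half = size // 2
    if len(melody) <= size:
        return list(melody)
    return list(melody[:half]) + list(melody[-half:])

def lev400(a, b, size=400):
    """Levenshtein distance on clipped melodies, scaled for short inputs."""
    ca, cb = clip_melody(a, size), clip_melody(b, size)
    longest = max(len(ca), len(cb))
    if longest == 0:
        return 0.0
    # ASSUMPTION: scale by size / longest so that distances between short
    # melodies are comparable to distances between full 400-note strings.
    # When both clipped melodies have 400 notes the factor is exactly 1.
    return levenshtein(ca, cb) * size / longest
```

Because every comparison now works on at most 400 by 400 cells, a single threshold can be applied to all pairs regardless of the original melody lengths.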

11 Search Quality: Lev-400SC
Observation: For similar melody strings, the string editing path of smallest distance stays close to the diagonal.
Idea: We don't need to fill up the whole matrix.
Solution: Use a diagonal Sakoe-Chiba band.
Sakoe, H. & Chiba, S. (1978). Dynamic programming algorithm optimization for spoken word recognition.
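A banded edit distance along these lines might look as follows. The parameter `half_width` (cells with |i - j| <= half_width are filled) is a hypothetical name, and how it maps to the bandwidth used in MidiFind is an assumption. For simplicity the sketch still allocates full rows; a real implementation would store only the band.

```python
import math

def banded_levenshtein(a, b, half_width):
    """Edit distance restricted to cells with |i - j| <= half_width.

    Returns math.inf when no edit path fits inside the band. The result is
    an upper bound on the true Levenshtein distance, and equals it whenever
    an optimal path stays inside the band.
    """
    m, n = len(a), len(b)
    if abs(m - n) > half_width:
        return math.inf
    inf = math.inf
    # Row 0: only columns inside the band are reachable.
    prev = [j if j <= half_width else inf for j in range(n + 1)]
    for i in range(1, m + 1):
        curr = [inf] * (n + 1)
        if i <= half_width:
            curr[0] = i
        lo = max(1, i - half_width)
        hi = min(n, i + half_width)
        for j in range(lo, hi + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            curr[j] = min(prev[j] + 1,          # deletion
                          curr[j - 1] + 1,      # insertion
                          prev[j - 1] + cost)   # substitution or match
        prev = curr
    return prev[n]
```

Only about `2 * half_width + 1` cells per row are evaluated, so the work grows linearly with the melody length instead of quadratically.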

13 Search Scalability
Goal: Speed up the searching process, since naïve linear scanning is very slow.
General methods:
Combine different similarity measurements
Use M-tree indexing

14 Search Scalability: MF-Q
Idea: Combine ED and Lev-400.
First do a linear scan for ED, filtering out most candidates.
Then do a linear scan for Lev-400 on the surviving candidates.
Speed-up factor:
BOW filtering: if a fraction $p$ remains, we speed up by $1/p$.
Clipped melody representation: $\frac{mn}{400^2}$, for melody strings of lengths $m$ and $n$.
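The two-stage scan can be sketched as a filter-and-refine loop. The database layout (a list of `(bow_vector, melody)` pairs) and the function names are hypothetical, and the sketch uses the plain Levenshtein distance in the second stage to stay short; MF-Q applies the clipped Lev-400 variant there.

```python
import math

def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def levenshtein(a, b):
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]

def two_stage_search(query, database, eps_ed, eps_lev):
    """Return indices of database entries that pass both filters.

    `query` and each database entry are assumed to be (bow_vector, melody).
    """
    q_bow, q_melody = query
    # Stage 1: cheap Euclidean filter on the 12-dim BOW vectors.
    survivors = [i for i, (bow, _) in enumerate(database)
                 if euclidean_distance(q_bow, bow) < eps_ed]
    # Stage 2: expensive edit distance, only on the survivors.
    return [i for i in survivors
            if levenshtein(q_melody, database[i][1]) < eps_lev]
```

If the first stage keeps only a fraction p of the database, the costly second stage runs on p times as many candidates, which is where the 1/p speed-up comes from.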

15 Search Scalability: MF-SC
Idea: Combine ED and Lev-400SC.
BOW filtering works the same.
Use a diagonal Sakoe-Chiba band. Set the bandwidth: $b = \max\{10\% \cdot \min\{m, n, 400\},\ 20\}$
Speed-up factor: Most melody strings are longer than 400, so $b = 40$, which gives a factor of 10.
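The bandwidth rule and the resulting factor can be checked with a few lines of arithmetic. The function names are hypothetical, the integer rounding is an assumption, and the speed-up estimate assumes that b counts the total width of the band, so that roughly `length * b` cells are filled instead of `length * length`.

```python
def band_width(m, n, clip=400, percent=10, floor=20):
    """Bandwidth b = max(10% of min(m, n, 400), 20), rounded down to an int."""
    return max(min(m, n, clip) * percent // 100, floor)

def approx_band_speedup(length, b):
    """Rough ratio of full-matrix cells to band cells for equal-length strings.

    ASSUMPTION: b is the total band width, so the band holds about
    length * b cells versus length * length for the full matrix.
    """
    return length / b
```

For two melodies longer than 400 notes, `band_width` gives 40 and `approx_band_speedup(400, 40)` gives 10.0, matching the factor quoted on the slide.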

16 Search Scalability: MF
Idea: Further speed up the ED computation.
Use M-tree indexing for range queries.
Speed-up factor: if a fraction $q$ is searched, we speed up the ED step by $1/q$.
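The pruning idea behind a metric index can be illustrated with a flat, one-level structure. This is not an M-tree: it is a simplified stand-in in which each group has a pivot and a covering radius, and a whole group is skipped when the triangle inequality shows that no member can lie within the query range. All names are hypothetical.

```python
import math

def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def build_balls(points, pivots):
    """Assign each point to its nearest pivot and track the covering radius."""
    balls = [{"pivot": p, "radius": 0.0, "members": []} for p in pivots]
    for x in points:
        ball = min(balls, key=lambda bl: euclidean_distance(x, bl["pivot"]))
        ball["members"].append(x)
        ball["radius"] = max(ball["radius"], euclidean_distance(x, ball["pivot"]))
    return balls

def range_query(balls, q, eps):
    """Return all stored points x with distance(q, x) < eps."""
    hits = []
    for ball in balls:
        # Triangle inequality: every member x satisfies
        # d(q, x) >= d(q, pivot) - radius, so if that bound exceeds eps
        # the whole ball can be skipped without touching its members.
        if euclidean_distance(q, ball["pivot"]) > ball["radius"] + eps:
            continue
        hits.extend(x for x in ball["members"] if euclidean_distance(q, x) < eps)
    return hits
```

An M-tree applies the same test recursively at every node of a balanced tree, so only a fraction of the stored vectors ever has its distance to the query computed.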

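18 Build MidiFind System
Goal: Set the thresholds, considering both search quality and search scalability.
Pipeline: whole set $S$ -> ED ($< \varepsilon_{ED}$) -> $S_{ED}$ -> Lev-400SC ($< \varepsilon_{Lev}$) -> $S_{Lev}$
Method: with $S_{Lev}$ the set returned after both filters and $S_{true}$ the set of files truly similar to the query,
$$\text{precision} = \frac{|S_{Lev} \cap S_{true}|}{|S_{Lev}|}, \qquad \text{recall} = \frac{|S_{Lev} \cap S_{true}|}{|S_{true}|}, \qquad F\text{-value} = \left(\frac{1}{\text{precision}} + \frac{1}{\text{recall}}\right)^{-1}$$
Compute precision, recall, and F-value as functions of the thresholds.
Choose $\varepsilon_{Lev} = 306$, which leads to the largest F-value.
Choose $\varepsilon_{ED} = 0.1$, which balances a large recall against a small size of $S_{ED}$.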
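Threshold selection by F-value can be sketched as a simple sweep. The F-value here follows the form on the slide, (1/precision + 1/recall)^(-1), which is half of the usual F1 score and therefore peaks at the same threshold. The function names and the dictionary layout are hypothetical.

```python
def precision_recall_f(retrieved, relevant):
    """Compute precision, recall, and F-value for two sets of item ids."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision == 0.0 or recall == 0.0:
        return precision, recall, 0.0
    # Slide's form; the standard F1 score is exactly twice this value.
    f_value = 1.0 / (1.0 / precision + 1.0 / recall)
    return precision, recall, f_value

def best_threshold(distances, relevant, candidates):
    """Pick the candidate threshold whose retrieved set maximizes the F-value.

    `distances` is assumed to map item id -> distance to the query.
    """
    def f_at(eps):
        retrieved = {item for item, d in distances.items() if d < eps}
        return precision_recall_f(retrieved, relevant)[2]
    return max(candidates, key=f_at)
```

A small threshold returns few items with high precision and low recall, a large one does the opposite, and the F-value rewards the point where both are high.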

20 Experiment: Dataset and machine
Small labeled dataset: 325 different MIDI files, 79 unique compositions, 2289 similar pairs of MIDI files.
Much bigger unlabeled dataset: 12484 MIDI files, freely downloaded from websites.
Machine: 3.06 GHz, 2-core (Intel Core i3) iMac with 4 GB memory.

21 Experiment: Search quality
[Figure: four panels: (a) ED, (b) Lev-400SC, (c) Standard-Lev, (d) MF; axis labels: $\varepsilon_{ED}$, $\varepsilon_{Lev}$]

22 Best thresholds and their qualities

23 Experiment: Search scalability
ED threshold ($\varepsilon_{ED}$) vs. speed-ups
[Figure, two panels: (1) fraction of surviving candidates vs. ED threshold ($\varepsilon_{ED}$); (2) ratio to linear scan vs. ED threshold ($\varepsilon_{ED}$), with curves for the maximum lower bound approach and the minimum sum of radii approach]

24 Experiment: Search scalability
A comparison of the searching time of different methods
[Figure: average query time (sec) vs. the size of the MIDI dataset, with curves for MF, MF-SC, MF-Q, Lev-400 linear scan, and Lev linear scan]

26 Demo www.cmumidifind.com

27 Conclusion
We present MidiFind, a MIDI query system for effective and fast searching of MIDI databases.
It is effective: It achieves 99.5% precision and 89.8% recall, compared to the pure Levenshtein distance measurement, which achieves 95.6% precision and 56.3% recall.
It is fast: By using the clipped melody representation, bag-of-words filtering, the Sakoe-Chiba band, and the M-tree, we achieve speed-ups of factors of 10, 40, 10, and 1.05, respectively, which together lead to a speed-up of about 4000 ($10 \times 40 \times 10 \times 1.05 = 4200$).

28 Thanks! Q&A