MidiFind: Fast and Effective Similarity Searching in Large MIDI Databases



Gus Xia, Tongbo Huang, Yifei Ma, Roger B. Dannenberg, Christos Faloutsos
School of Computer Science, Carnegie Mellon University


Introduction: Background
What is MIDI? Musical Instrument Digital Interface. A MIDI file doesn't carry the actual sound but rather the control information. For piano pieces: pitch, velocity, start time, and ending time.
What are similar MIDI files? Different performance versions of the same composition, including the pure quantized version.
Why find similar MIDI files? They are important to musicians and music amateurs, and they are widely distributed online.

Introduction: Goal
Context: MIDI files are difficult to search by metadata because of careless or casual labeling.
Idea: Content-based retrieval.
Our goal: Given a query MIDI file, find all different performance versions (including the pure quantized version) of the same composition. The search should be effective and fast enough to handle 1 million MIDI files.

Introduction: General approach

Outline
- Introduction
- Search Quality
- Search Scalability
- Build MidiFind System
- Experiments
- Demo
- Conclusion

Search Quality
Goal: Design features and corresponding measurements to reveal the similarity between different MIDI files.
General methods:
- Euclidean distance for the bag-of-words feature
- Modified Levenshtein distance for the melody string feature

Search Quality: ED for BOW feature
Bag-of-words feature:
- Word: a note, where its octave and duration are ignored.
- Word count: the normalized number of appearances of a note.
- BOW feature: a 12-dimensional vector, an empirical distribution over the pitch classes. Example:
  Pitch class: C, C#, D, D#, E, F, F#, G, G#, A, A#, B
  Value: 0.3, 0, 0.1, 0.05, 0.1, 0.1, 0.02, 0.2, 0, 0.02, 0.01, 0.1
Euclidean distance for two vectors:
$$ED(a, b) = \sqrt{\sum_{i=1}^{12} (a_i - b_i)^2}$$
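A minimal sketch of the bag-of-words feature and its Euclidean distance, assuming notes are available as MIDI pitch numbers (so `pitch % 12` gives the pitch class, with 0 = C). The names `bow_feature` and `euclidean_distance` are illustrative and not taken from MidiFind itself.

```python
import math
from collections import Counter

def bow_feature(midi_pitches):
    """Return a 12-dim normalized pitch-class histogram.

    Octave is dropped by taking pitch % 12; durations are never looked at.
    """
    total = len(midi_pitches)
    if total == 0:
        return [0.0] * 12
    counts = Counter(p % 12 for p in midi_pitches)
    return [counts.get(pc, 0) / total for pc in range(12)]

def euclidean_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Two toy "pieces": C-E-G-C (two octaves) versus C-E-G.
a = bow_feature([60, 64, 67, 72])
b = bow_feature([60, 64, 67])
d = euclidean_distance(a, b)
```

Because the feature is a distribution, two performances of the same piece with different lengths or tempi can still land close together, which is what makes it usable as a cheap first filter.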

Search Quality: modified Levenshtein distance for melody string feature
Melody string feature: the melody is a distinctive element that helps people tell different pieces of music apart. We simply use the highest pitch at any given time as the melody, ignoring durations.
Distance measure: Levenshtein distance between two melody strings.
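The following sketch illustrates the two ingredients named above, under stated assumptions. Notes are assumed to be `(pitch, start_time)` pairs, and "highest pitch at any given time" is simplified to "highest pitch among notes sharing an onset"; sustained notes are ignored. The distance is the standard Levenshtein distance, not the modified variant, whose details are not given here. Both function names are hypothetical.

```python
def melody_string(notes):
    """Keep the highest pitch per onset time, ordered by time.

    notes: iterable of (pitch, start_time) pairs (an assumed input format).
    """
    highest = {}
    for pitch, start in notes:
        if start not in highest or pitch > highest[start]:
            highest[start] = pitch
    return [highest[t] for t in sorted(highest)]

def levenshtein(s, t):
    """Standard Levenshtein distance using two rows of the edit matrix."""
    prev = list(range(len(t) + 1))
    for i, a in enumerate(s, start=1):
        cur = [i]
        for j, b in enumerate(t, start=1):
            cost = 0 if a == b else 1
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + cost))  # match or substitution
        prev = cur
    return prev[-1]

melody = melody_string([(60, 0.0), (72, 0.0), (64, 1.0), (55, 1.0)])
```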

Search Quality: cons of Levenshtein
Problem: The distance correlates with the melody length.
The distribution of melody string lengths follows a power law, with a mean of 1303 and a standard deviation of [value missing in source].
[Figure: histogram of melody string lengths; x-axis: length of melody string, y-axis: count.]

Search Quality: Lev-400
Solution: Turn melody strings into equal length.
- Chop each melody string and concatenate its first 200 and last 200 notes.
- Don't modify strings shorter than 400, but scale up the Levenshtein distance.
Insights:
- A unified length leads to a unified threshold.
- Similar melodies tend to agree more at the beginning and the ending parts.
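The clipping step can be sketched as below. This is an illustration only: `clip_melody` is a hypothetical name, and the rule for scaling up the distance of strings shorter than 400 is not specified in the slides, so it is deliberately left out rather than guessed.

```python
def clip_melody(melody, target_len=400):
    """Return the first and last target_len/2 notes, concatenated.

    Melodies no longer than target_len are returned unchanged.
    """
    if len(melody) <= target_len:
        return list(melody)
    half = target_len // 2
    return list(melody[:half]) + list(melody[-half:])

long_melody = list(range(1000))
clipped = clip_melody(long_melody)
```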

Search Quality: Lev-400SC
Observation: For similar melody strings, the minimum-distance string editing path stays close to the diagonal.
Idea: We don't need to fill in the whole matrix.
Solution: Use a diagonal Sakoe-Chiba band.
Reference: Sakoe, H. & Chiba, S. (1978). Dynamic programming algorithm optimization for spoken word recognition.
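A sketch of a band-constrained Levenshtein distance, assuming the band is the set of matrix cells with `|i - j| <= band` and that cells outside it are treated as unreachable. The function name `banded_levenshtein` is illustrative. The result is an upper bound on the true distance and equals it whenever the optimal editing path stays inside the band.

```python
import math

def banded_levenshtein(s, t, band):
    """Levenshtein distance restricted to cells with |i - j| <= band."""
    m, n = len(s), len(t)
    if abs(m - n) > band:
        return math.inf  # the end cell (m, n) lies outside the band
    inf = math.inf
    # Row 0: only columns within the band are reachable.
    prev = [j if j <= band else inf for j in range(n + 1)]
    for i in range(1, m + 1):
        cur = [inf] * (n + 1)
        if i <= band:
            cur[0] = i
        lo = max(1, i - band)
        hi = min(n, i + band)
        for j in range(lo, hi + 1):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            cur[j] = min(prev[j] + 1,          # deletion
                         cur[j - 1] + 1,       # insertion
                         prev[j - 1] + cost)   # match or substitution
        prev = cur
    return prev[n]
```

Only about `2 * band + 1` cells are filled per row instead of `n`, which is where the saving comes from.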


Search Scalability
Goal: Speed up the search process, since naïve linear scanning is very slow.
General methods:
- Combine different similarity measurements
- Use M-tree indexing

Search Scalability: MF-Q
Idea: Combine ED and Lev-400.
- First do a linear scan with ED, filtering out most candidates.
- Then do a linear scan with Lev-400 on the surviving candidates.
Speed-up factors:
- BOW filtering: if a fraction $p$ of the candidates remains, the speed-up is $1/p$.
- Clipped melody representation: $\frac{m \cdot n}{400^2}$, where $m$ and $n$ are the lengths of the two melody strings.
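The filter-then-refine idea can be sketched as follows. The database layout (a list of `(name, bow_vector, melody)` tuples), the helper names, and the toy thresholds are all assumptions made for illustration; melodies are assumed to be clipped already.

```python
import math

def euclid(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def lev(s, t):
    """Standard Levenshtein distance (two-row dynamic programming)."""
    prev = list(range(len(t) + 1))
    for i, a in enumerate(s, start=1):
        cur = [i]
        for j, b in enumerate(t, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1,
                           prev[j - 1] + (a != b)))
        prev = cur
    return prev[-1]

def two_stage_search(query, database, eps_ed, eps_lev):
    """Cheap ED filter first, then the costlier edit distance on survivors."""
    q_bow, q_melody = query
    survivors = [item for item in database
                 if euclid(q_bow, item[1]) < eps_ed]
    return [name for name, _, melody in survivors
            if lev(q_melody, melody) < eps_lev]

# Toy data with 2-dim vectors and short strings, for readability only.
db = [("a", [1.0, 0.0], "abcd"),
      ("b", [0.0, 1.0], "abcd"),
      ("c", [0.95, 0.05], "wxyz")]
hits = two_stage_search(([1.0, 0.0], "abce"), db, eps_ed=0.1, eps_lev=2)
```

Here "b" is rejected by the first stage without any edit-distance computation, while "c" passes the filter but fails the refinement.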

Search Scalability: MF-SC
Idea: Combine ED and Lev-400SC.
- BOW filtering works the same way.
- Use a diagonal Sakoe-Chiba band with bandwidth $b = \max\{10\% \cdot \min\{m, n, 400\},\ 20\}$.
Speed-up factor: most melody strings are longer than 400, so $b = 40$, giving a speed-up factor of about 10.
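As a small worked example of the bandwidth rule, the sketch below evaluates it for a few length pairs. The function name `band_width` is hypothetical, and rounding 10% down to an integer is an assumption, since the slide does not say how fractional values are handled.

```python
def band_width(m, n, clip_len=400):
    """b = max(10% of min(m, n, clip_len), 20), using integer division."""
    return max(min(m, n, clip_len) // 10, 20)

typical = band_width(1303, 900)   # both strings longer than 400
short = band_width(150, 500)      # a short string hits the floor of 20
medium = band_width(300, 1000)
```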


Search Scalability: MF
Idea: Further speed up the ED computation.
- Use M-tree indexing for the range query.
Speed-up factor: if a fraction $q$ of the database is searched, the ED step speeds up by $1/q$.


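Build MidiFind System
Goal: Set the thresholds, considering both search quality and search scalability.
Pipeline: whole set $S$ → $S_{ED}$ (ED $< \varepsilon_{ED}$) → $S_{Lev}$ (Lev-400SC $< \varepsilon_{Lev}$).
Method: Compute precision, recall, and F-value as functions of the thresholds. With $S_{ret}$ the retrieved set and $S_{sim}$ the set of truly similar files (these two subscripts are labels supplied here, using the standard definitions):
$$\text{precision} = \frac{|S_{ret} \cap S_{sim}|}{|S_{ret}|}, \qquad \text{recall} = \frac{|S_{ret} \cap S_{sim}|}{|S_{sim}|}, \qquad F\text{-value} = \left(\frac{1}{\text{precision}} + \frac{1}{\text{recall}}\right)^{-1}$$
- Choose $\varepsilon_{Lev} = 306$, which gives the largest F-value.
- Choose $\varepsilon_{ED} = 0.1$, which balances a large recall against a small $S_{ED}$.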
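Threshold selection by F-value can be sketched as below, assuming a dictionary of precomputed pair distances and a labeled set of truly similar pairs. All names and the toy numbers are illustrative. The F-value follows the slide's form, (1/precision + 1/recall)^-1, which is half the usual F1 score and therefore ranks thresholds identically.

```python
def precision_recall_f(retrieved, relevant):
    """Return (precision, recall, F-value) for two sets of items."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision == 0.0 or recall == 0.0:
        return precision, recall, 0.0
    return precision, recall, 1.0 / (1.0 / precision + 1.0 / recall)

def best_threshold(distances, relevant, candidates):
    """Pick the candidate threshold whose retrieved set maximizes F-value."""
    def f_at(eps):
        retrieved = {key for key, dist in distances.items() if dist < eps}
        return precision_recall_f(retrieved, relevant)[2]
    return max(candidates, key=f_at)

# Toy labeled data: three truly similar pairs and one dissimilar pair.
distances = {"p1": 1, "p2": 2, "p3": 5, "p4": 9}
relevant = {"p1", "p2", "p3"}
eps = best_threshold(distances, relevant, candidates=[2, 3, 6, 10])
```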


Experiment: Dataset and machine
Small labeled dataset: 325 different MIDI files, 79 unique compositions, 2289 similar pairs of MIDI files.
Much bigger unlabeled dataset: MIDI files freely downloaded from websites.
Machine: 3.06 GHz dual-core Intel Core i3 iMac with 4 GB of memory.

Experiment: Search quality
[Figure: search quality as a function of the thresholds $\varepsilon_{ED}$ and $\varepsilon_{Lev}$. Panels: (a) ED, (b) Lev-400SC, (c) standard Lev, (d) MF.]

Best thresholds and their qualities

Experiment: Search scalability
ED threshold ($\varepsilon_{ED}$) vs. speed-ups
[Figure: two plots against the ED threshold $\varepsilon_{ED}$: the fraction of surviving candidates, and the ratio to linear scan for the "maximum lower bound" and "minimum sum of radii" approaches.]

Experiment: Search scalability
A comparison of the search time of different methods.
[Figure: average query time (sec) versus the size of the MIDI dataset for MF, MF-SC, MF-Q, Lev-400 linear scan, and Lev linear scan.]


Demo

Conclusion
We present MidiFind, a MIDI query system for effective and fast searching of MIDI databases.
It is effective: it achieves 99.5% precision and 89.8% recall, compared with the pure Levenshtein distance measurement, which achieves 95.6% precision and 56.3% recall.
It is fast: by using the clipped melody representation, bag-of-words filtering, the Sakoe-Chiba band, and M-tree indexing, we achieve speed-ups by factors of 10, 40, 10, and 1.05, respectively, for an overall speed-up of about 4000.
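As a quick arithmetic check, assuming the four factors compound multiplicatively (the slide does not state this explicitly), their product is consistent with the quoted overall figure:

```latex
\[
10 \times 40 \times 10 \times 1.05 = 4200 \approx 4000
\]
```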

Thanks! Q&A
