Music Information Retrieval
Opportunities for Digital Musicology
Joren Six, IPEM, Ghent University
October 30, 2015

Overview
- Introduction: MIR introduction, tasks, musical information, tools, methods
- Tone scale analysis with Tarsos: introduction, demo, pitch class histogram construction, confusing concepts, relating timbre and scale, conclusion
- Acoustic fingerprinting with Panako: why audio fingerprinting?, demo, system design, opportunities for digital musicology (musical structure analysis, synchronization of audio streams, analysis of repertoire and techniques used in DJ-sets), practical audio fingerprinting
- Bibliography

Introduction
Goal: give an overview of the Music Information Retrieval research field while focusing on the opportunities for digital musicology. Two MIR projects are covered in more detail:
(i) Tarsos: tone scale extraction and analysis.
(ii) Panako: acoustic fingerprinting.
MIR Introduction
Definition: Music Information Retrieval (MIR) is the interdisciplinary science of extracting and processing information from music. MIR combines insights from musicology, computer science, library science, psychology, machine learning and cognitive science.

MIR tasks process musical information. Musical information can be categorized into signals and symbols.
Definition: signals are representations of analog manifestations and replicate perception; symbols are discretized, limited, and replicate content.
Example: transcribing a lecture converts a signal into the symbolic domain. An audio recording serves as input; a text is the output. The symbolic representation is easy to index but lacks nuance.

Tasks - Transcription (signal → symbolic)
- Transcription
- Source separation
- Instrument recognition
- Polyphonic pitch estimation and chord detection
- Tempo and rhythm extraction
Fig: Music transcription

Tasks - Structure analysis (signal → symbolic)
Fig: Structural analysis
Tasks - Music recommendation
Music recommendation and automatic playlist generation:
- Content-based: signal → symbolic.
- Based on (listening) behavior: symbolic → symbolic.
Fig: Spotify automatically generates playlists based on listening behavior.

Tasks - Other
- Score following: automatic score page turning, or triggering effects based on musical content.
- Emotion recognition: labeling audio according to emotional content.
- Automatic cover song identification.
- Optical music recognition: converting images of scores to digital scores.
- Symbolic music retrieval.
- Automatic genre recognition.

MIR Tasks
Most tasks make it possible to browse, categorize, query and discover music in large databases.

Musical Information
Signals:
- Recorded musical performances: video, audio, MIDI, motion capture
- Scans of scores
Symbols:
- Meta-data: artist, title, album name, label, composer, instrumentation
- Lyrics
- Tags, reviews, ratings
- Digitized scores

Musical Information - Examples
Digital representations of Liszt's Liebestraum No. 3:
- Scanned score (Fig: Scanned score of Liszt's Liebestraum No. 3.)
- MusicXML score
- MIDI synthesis
- MIDI performance
- Audio recordings of performances: Arthur Rubinstein, Daniel Barenboim
Musical Information
Scores can be seen as a model of a performance. Models aim to reduce dimensions and complexity, and to improve understanding and readability.
Quote: "Essentially, all models are wrong, but some are useful." - George E. P. Box

Solved MIR Tasks
- Monophonic pitch estimation [4, 9, 12]
- Content-based audio search [18]
- Automatic genre classification

Challenging Tasks
Un-mix the mix: decomposing a mixed audio signal is very hard. Masking and overlapping partials make e.g. polyphonic pitch detection difficult.
Fig: How to unmix the mix?

Tools - Sonic Visualiser
Sonic Visualiser offers a plugin system with:
- Beat tracking
- Onset detection
- Pitch tracking
- Melody detection
- Chord estimation
Fig: Sonic Visualiser, an application for viewing and analysing the contents of music audio files. sonicvisualiser.org
Tools - Tartini
Specialized tool for pitch analysis:
- Pitch contour
- Vibrato analysis
- Transcription
Fig: Tartini, an application for pitch analysis. http://miracle.otago.ac.nz/tartini

Tools - music21
Programming environment for symbolic music analysis. Symbolic music queries:
- Rhythmic features
- Melodic contours
- Chord progressions, ...
http://web.mit.edu/music21/
Fig: music21, a programming environment for symbolic music analysis

Tools - Tarsos
Extracting and analysing tone scales from music:
- Tone scale extraction
- Tone scale analysis
- Transcription of ethnic music
http://0110.be/software
Fig: Tarsos: tone scale extraction and analysis

MIR Methods
Fig: Input → feature(s) → feature processing → output.
MIR Methods
A bag-of-features approach can represent e.g. a musical genre. Sometimes more than 100 features are used [8]:
- MFCCs, timbral characteristics
- Spectral centroid
- Spectral moments
- Zero crossing rate
- Number of low-energy frames
- Autocorrelation lag frequency
- ...

Methodological Problems
MIR research is often limited by (over?)simplification:
- It focuses mainly on classical Western art music or popular music, with ethnocentric terminology: scores, chords, tone scales, chromagrams, instrumentation, rhythmical structures.
- It is mainly goal-oriented and pragmatic (MIREX) without explaining processes [1]. More engineering than science? It is unclear which features correlate with which cognitive processes.
- It is mainly concerned with a limited, disembodied view of music: disregarding social interaction, movement, dance, the body, and individual or cultural preferences.

Quote: "Essentially, all MIR research is wrong, but some is useful." - Me
What follows are two examples of what aims to be useful MIR research.

Tarsos - Introduction
Tarsos [14, 15] is a tool to extract, analyze and document tone scales and tone scale diversity. It is mainly useful for analyzing music with an undocumented tone scale, which is the case for a lot of ethnic music.
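Two of the listed features are straightforward to compute directly. A minimal sketch, assuming a mono signal held in a NumPy array; the frame length and test tone are illustrative, not taken from any particular MIR system:

```python
import numpy as np

def spectral_centroid(frame, sample_rate):
    """Magnitude-weighted mean frequency of one audio frame, in Hz."""
    magnitudes = np.abs(np.fft.rfft(frame))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    if magnitudes.sum() == 0:
        return 0.0
    return float(np.sum(freqs * magnitudes) / np.sum(magnitudes))

def zero_crossing_rate(frame):
    """Fraction of consecutive sample pairs whose sign differs."""
    signs = np.sign(frame)
    return float(np.mean(signs[1:] != signs[:-1]))

# Sanity check on a pure 440 Hz tone: the centroid sits at 440 Hz and the
# zero crossing rate near 2 * 440 / sample_rate.
sr = 8000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440 * t)
print(spectral_centroid(tone, sr), zero_crossing_rate(tone))
```

In a bag-of-features pipeline, values like these are computed per frame and aggregated (mean, variance) into one feature vector per recording.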
Introduction
Tarsos was developed to analyze the dataset of the Museum for Central Africa, Tervuren:
- 30,000 digitized sound recordings
- 3,000 hours of music
- A meta-data database with contextual data
Fig: Locations of recordings

Demo
Fig: Tarsos live demonstration

Pitch Class Histogram Construction
Fig: Tarsos block diagram. Input audio is fed to a pitch detector (YIN, MAMI, VAMP, ...). The resulting pitch estimations are selected and filtered (timespan, pitch range, keeping estimations near pitch classes, steady-state filter) and collected in a pitch histogram (PH); peak detection then yields a pitch class histogram (PCH). Output is produced in the signal domain (CSV estimations, resynthesized annotations, PH and PCH graphics and CSV) and in the symbolic domain (pitch interval table, Scala files, CSV pitch classes, MIDI tuning dump, MIDI out).
Fig: Step 1, pitch estimation: pitch (in cents) over time.
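The folding step in the block diagram, from pitch estimations to a pitch class histogram, can be sketched in a few lines: estimations (here assumed to be frequencies in Hz) are converted to cents and folded into a single octave. The reference frequency and bin width are illustrative assumptions, not Tarsos defaults:

```python
import math
from collections import Counter

REF_HZ = 8.176  # C-1 as the 0-cent reference; an assumption for this sketch

def hz_to_cents(frequency, ref=REF_HZ):
    """Convert a frequency to absolute cents above the reference."""
    return 1200.0 * math.log2(frequency / ref)

def pitch_class_histogram(frequencies, bin_width=6):
    """Fold pitch estimations into one octave (0-1200 cents) and bin them."""
    histogram = Counter()
    for f in frequencies:
        pitch_class = hz_to_cents(f) % 1200.0
        histogram[int(pitch_class // bin_width) * bin_width] += 1
    return histogram

# 440 Hz (A4) and 880 Hz (A5) are an octave apart, so they land in the
# same pitch class bin.
h = pitch_class_histogram([440.0, 880.0])
print(h)
```

Because octave information is discarded, the PCH characterizes the tone scale itself rather than the register in which it is played.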
Pitch Class Histogram Construction
Fig: Step 2, pitch histogram creation (number of estimations per pitch, in cents).
Fig: Step 3, pitch class histogram creation (number of estimations per pitch class, in cents).

Examples
Fig: An unequally divided pentatonic tone scale with a near-perfect fifth consisting of a pure minor and a pure major third.

Concept of Tone Scale
Fig: Pitch steps shift upwards during a Finnish joik (pitch in absolute cents over time).
Concept of Tone
Fig: Tonal center of Western vibrato (pitch over time, with a histogram of occurrences in absolute cents).

Concept of Tone II
Fig: Pitch gesture in an Indian raga (pitch over time, with a histogram of occurrences in absolute cents).

Concept of Tuning
Fig: Detuning of a monochord during a performance.

Relating Timbre and Scale
Question: why are some tone scales or pitch intervals much more popular than others? Why are instruments tuned the way they are?
There is a theory [13, 10] that relates scale and timbre. The theory identifies points of maximum consonance that can be used to construct a scale that is optimal in terms of consonance.
Relating Timbre and Scale
Fig: Dissonance curve for an idealized harmonic instrument: sensory dissonance as a function of frequency ratio (0-1200 cents), with minima at the perfect fourth, perfect fifth and octave.
Fig: Screenshot of automatic timbre-scale mapping.

Relating Timbre and Scale - Conclusion
The consonance theory is currently not well supported by measurements. The dataset of African music has a large diversity in instrumentation and tone scales and offers an opportunity to support the theory.

Question: Tarsos offers opportunities to answer basic musicological questions:
- Is there a change in tone scale use over time?
- Is the 100-cent interval used more in recent years? Is there an acculturation effect?
- Is there a systematic relation between timbre and scale?
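The dissonance curve can be reproduced with the Plomp-Levelt-based model popularized by Sethares [13, 10]. A minimal sketch: the curve-shape constants follow Sethares' published fit, while the number of partials and the 0.88-per-partial amplitude roll-off are illustrative assumptions:

```python
import numpy as np

def dissonance(f1, f2, a1=1.0, a2=1.0):
    """Sensory dissonance of two partials (Sethares' fit of Plomp-Levelt)."""
    b1, b2 = 3.5, 5.75                     # curve-shape constants
    d_star, s1, s2 = 0.24, 0.0207, 18.96   # critical-bandwidth scaling
    s = d_star / (s1 * min(f1, f2) + s2)
    x = s * abs(f2 - f1)
    return a1 * a2 * (np.exp(-b1 * x) - np.exp(-b2 * x))

def total_dissonance(ratio, fundamental=261.63, partials=6):
    """Summed dissonance between two harmonic tones a given ratio apart."""
    freqs = [fundamental * (i + 1) for i in range(partials)]
    amps = [0.88 ** i for i in range(partials)]  # illustrative roll-off
    return sum(dissonance(fa, ratio * fb, aa, ab)
               for fa, aa in zip(freqs, amps)
               for fb, ab in zip(freqs, amps))

# The octave (ratio 2.0) is a deep minimum of the curve, while a ratio
# near a minor second (about 1.07) lies close to the dissonance maximum.
print(total_dissonance(2.0), total_dissonance(1.07))
```

Evaluating `total_dissonance` over ratios from 1.0 to 2.0 traces out the curve in the figure; changing the partials to an inharmonic spectrum moves the minima, which is exactly the timbre-scale link the theory proposes.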
What is Acoustic Fingerprinting?
Fig: A generalized audio fingerprinter scheme (audio → feature extraction → fingerprint construction → matching against reference fingerprints → identified audio).
1. Audio is fed into the system.
2. Features are extracted and fingerprints are constructed.
3. The fingerprints are compared with a database containing fingerprints of reference audio.
4. The audio is either identified or, if no match is found, labeled as unknown.

Why Audio Fingerprinting?
Identifying short audio fragments enables:
- Duplicate detection in large digital music archives
- Digital rights management applications (SABAM)
- Music structure analysis
- Analysis of techniques and repertoire in DJ-sets
- Synchronization of audio (and video) streams
- Alignment of extracted features with audio [17]
Fig: Shazam music recognition service

Demo
Panako [16]
Fig: Spectrogram of Aphex Twin's Windowlicker

System Design
Current audio fingerprinting systems use fingerprints based on:
- Spectral peaks [18, 16, 6]
- Onsets in spectral bands [5]
- Other features [2, 7, 11, 3]
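Steps 3 and 4 of the scheme, matching against a reference database, are commonly implemented as a voting procedure over (song, time-offset) pairs: hashes that agree on a consistent offset indicate a true match. A minimal sketch, not Panako's actual implementation; fingerprints are assumed to be (hash, anchor time) pairs:

```python
from collections import defaultdict

def build_index(reference_songs):
    """Map each fingerprint hash to the (song_id, anchor_time) pairs it occurs in."""
    index = defaultdict(list)
    for song_id, prints in reference_songs.items():
        for h, t in prints:
            index[h].append((song_id, t))
    return index

def identify(query_prints, index, min_votes=2):
    """Vote on (song_id, time_offset); a consistent offset signals a true match."""
    votes = defaultdict(int)
    for h, t_query in query_prints:
        for song_id, t_ref in index.get(h, []):
            votes[(song_id, t_ref - t_query)] += 1
    if not votes:
        return None  # step 4: no match found, label as unknown
    (song_id, _offset), count = max(votes.items(), key=lambda kv: kv[1])
    return song_id if count >= min_votes else None

# Two query hashes match "song_a" with the same time offset of 7 frames.
index = build_index({"song_a": [("h1", 10), ("h2", 15), ("h3", 20)]})
print(identify([("h1", 3), ("h2", 8)], index))  # identified
print(identify([("hx", 0)], index))             # unknown
```

The offset consistency check is what makes the approach robust: a few accidental hash collisions scatter across offsets, while a genuine match concentrates its votes on one.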
System Design
Fig: Step 1, extracting spectral peaks.
Fig: Step 2, creating fingerprints by combining spectral peaks.

Opportunities for Digital Musicology
Acoustic fingerprinting can provide opportunities for digital musicology:
1. Analysis of repetition within songs
2. Comparison of versions/edits
3. Audio and audio-feature alignment to share datasets
4. DJ-set analysis
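Steps 1 and 2 of the system design, extracting spectral peaks and pairing them into fingerprints, can be sketched as follows. This is a simplified illustration, not Panako's implementation; the neighborhood size, fan-out and time window are arbitrary choices:

```python
import numpy as np

def spectral_peaks(spectrogram, neighborhood=3):
    """Step 1: local maxima of a (time x frequency) magnitude spectrogram."""
    peaks = []
    t_max, f_max = spectrogram.shape
    for t in range(neighborhood, t_max - neighborhood):
        for f in range(neighborhood, f_max - neighborhood):
            patch = spectrogram[t - neighborhood:t + neighborhood + 1,
                                f - neighborhood:f + neighborhood + 1]
            if spectrogram[t, f] > 0 and spectrogram[t, f] == patch.max():
                peaks.append((t, f))
    return peaks

def fingerprints(peaks, max_dt=20, fan_out=3):
    """Step 2: pair nearby peaks into (f1, f2, dt) hashes plus an anchor time."""
    prints = []
    for i, (t1, f1) in enumerate(peaks):
        for t2, f2 in peaks[i + 1:i + 1 + fan_out]:
            if 0 < t2 - t1 <= max_dt:
                prints.append(((f1, f2, t2 - t1), t1))
    return prints

# A toy spectrogram with two peaks yields one peak-pair fingerprint.
spec = np.zeros((30, 30))
spec[5, 10] = 1.0
spec[10, 12] = 2.0
print(spectral_peaks(spec))
print(fingerprints(spectral_peaks(spec)))
```

Peak pairs are attractive because they survive noise and compression well, and, with some care in how the hash is formed, can be made robust to pitch-shifting and time-stretching [16].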
Musical Structure Analysis
Fig: Repetition in "Ribs Out" by Fuck Buttons (unfortunately the best example I could find).

Radio Edit vs. Original
Fig: Radio edit vs. original version of Daft Punk's "Get Lucky".

Exact Repetition Over Time
Fig: How much cut-and-paste is used, on average, for a set of 20,000 recordings.

Synchronization of Audio Streams
Fig: Two similar audio streams out of sync
Audio synchronization can be used for:
- Aligning unsynchronized audio streams from several microphones
- Aligning video footage by using audio
- Aligning audio and extracted features
- Aligning audio and data [17]
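Sub-millisecond alignment accuracy matters because sound itself travels slowly: at a speed of sound of 340.29 m/s, every metre between microphones adds roughly 3 ms of acoustic delay. A one-function sketch:

```python
SPEED_OF_SOUND = 340.29  # m/s

def delay_ms(distance_m):
    """Acoustic delay in milliseconds over a given microphone distance."""
    return 1000.0 * distance_m / SPEED_OF_SOUND

# Roughly 3 ms of delay per metre of microphone spacing.
for d in (1, 2, 3):
    print(d, "m ->", round(delay_ms(d), 1), "ms")
```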
Synchronization of Audio Streams
Fig: Microphone placement for a symphonic orchestra, and synchronization
Audio synchronization using acoustic fingerprinting is sub-millisecond accurate. Microphone placement can span several meters, and with the speed of sound being 340.29 m/s:

Distance (m) | Delay (ms)
1 | 3
2 | 6
3 | 9

Analysis of Repertoire and Techniques Used in DJ-Sets
Fig: a DJ
An extension of the spectral peak fingerprinting method handles time-stretching, pitch-shifting and tempo changes [16]. Given a DJ-set and reference audio, the following can be extracted automatically (tracklists of DJ-sets can be found on http://www.1001tracklists.com/):
- Which parts of which songs were played, and for how long
- Which modifications were applied (percentage modification of time and frequency)

Practical Audio Fingerprinting
Panako [16] was used to generate the example data; it is an open-source audio fingerprinting system available at http://panako.be. Note that some methods implemented within Panako are patented (US6990453). These subapplications of Panako were used:
- monitor during the live demo (monitor can also be used for DJ-set analysis)
- compare for the comparison and the structure analysis
Other usable fingerprinters are audfprint and echoprint.

Bibliography I
- Pedro Cano, Eloi Batlle, Ton Kalker, and Jaap Haitsma. A Review of Audio Fingerprinting. The Journal of VLSI Signal Processing, 41:271-284, 2005.
- Michele Covell and Shumeet Baluja. Known-Audio Detection Using Waveprint: Spectrogram Fingerprinting by Wavelet Hashing. In Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP 2007), 2007.
Bibliography II
- Alain de Cheveigné and Hideki Kawahara. YIN, a Fundamental Frequency Estimator for Speech and Music. The Journal of the Acoustical Society of America, 111(4):1917-1930, 2002.
- Dan Ellis, Brian Whitman, and Alastair Porter. Echoprint - An Open Music Identification Service. In Proceedings of the 12th International Symposium on Music Information Retrieval (ISMIR 2011), 2011.

Bibliography III
- Sébastien Fenet, Gaël Richard, and Yves Grenier. A Scalable Audio Fingerprint Method with Robustness to Pitch-Shifting. In Proceedings of the 12th International Symposium on Music Information Retrieval (ISMIR 2011), pages 121-126, 2011.
- Jaap Haitsma and Ton Kalker. A Highly Robust Audio Fingerprinting System. In Proceedings of the 3rd International Symposium on Music Information Retrieval (ISMIR 2002), 2002.

Bibliography IV
- Marc Leman, Dirk Moelants, Matthias Varewyck, Frederik Styns, Leon van Noorden, and Jean-Pierre Martens. Activating and Relaxing Music Entrains the Speed of Beat-Synchronized Walking. PLoS ONE, 8(7):e67932, 2013.
- Phillip McLeod and Geoff Wyvill. A Smarter Way to Find Pitch. In Proceedings of the International Computer Music Conference (ICMC 2005), 2005.

Bibliography V
- R. Plomp and W. J. Levelt. Tonal Consonance and Critical Bandwidth. Journal of the Acoustical Society of America, 38:548-560, 1965.
- M. Ramona and G. Peeters. AudioPrint: An Efficient Audio Fingerprint System Based on a Novel Cost-Less Synchronization Scheme. In Proceedings of the 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2013), pages 818-822, 2013.
Bibliography VI
- M. J. Ross, H. L. Shaffer, A. Cohen, R. Freudberg, and H. J. Manley. Average Magnitude Difference Function Pitch Extractor. IEEE Transactions on Acoustics, Speech, and Signal Processing, 22(5):353-362, October 1974.
- William A. Sethares. Tuning, Timbre, Spectrum, Scale. Springer, 2nd edition, 2005.

Bibliography VII
- Joren Six and Olmo Cornelis. Tarsos - a Platform to Explore Pitch Scales in Non-Western and Western Music. In Proceedings of the 12th International Symposium on Music Information Retrieval (ISMIR 2011), 2011.
- Joren Six, Olmo Cornelis, and Marc Leman. Tarsos, a Modular Platform for Precise Pitch Analysis of Western and Non-Western Music. Journal of New Music Research, 42(2):113-129, 2013.

Bibliography VIII
- Joren Six and Marc Leman. Panako - A Scalable Acoustic Fingerprinting System Handling Time-Scale and Pitch Modification. In Proceedings of the 15th ISMIR Conference (ISMIR 2014), 2014.
- Joren Six and Marc Leman. Synchronizing Multimodal Recordings Using Audio-To-Audio Alignment. Journal of Multimodal User Interfaces, 9(3):223-229, 2015.

Bibliography IX
- Avery L. Wang. An Industrial-Strength Audio Search Algorithm. In Proceedings of the 4th International Symposium on Music Information Retrieval (ISMIR 2003), pages 7-13, 2003.