
Music Information Retrieval
Opportunities for digital musicology

Joren Six
IPEM, University Ghent
October 30, 2015

Overview
- Introduction: MIR introduction, tasks, musical information, tools, methods
- Tone Scale analysis: Tarsos
  - Introduction
  - Demo
  - Pitch Class Histogram construction
  - Confusing Concepts
  - Relating Timbre and Scale
  - Conclusion
- Acoustic Fingerprinting: Panako
  - Why Audio Fingerprinting?
  - Demo
  - System Design
- Opportunities for digital musicology
  - Musical structure analysis
  - Synchronization of audio streams
  - Analysis of repertoire and techniques used in DJ-Sets
- Practical Audio Fingerprinting
- Bibliography

Introduction

Goal: give an overview of the Music Information Retrieval research field while focusing on the opportunities for digital musicology. Two MIR projects are treated in more detail: (i) Tarsos, for tone scale extraction and analysis, and (ii) Panako, for acoustic fingerprinting.

MIR introduction

Definition: Music Information Retrieval (MIR) is the interdisciplinary science of extracting and processing information from music. MIR combines insights from musicology, computer science, library science, psychology, machine learning and the cognitive sciences.

MIR tasks process musical information, which can be categorized into signals and symbols.

Definition: signals are representations of analog manifestations and replicate perception. Symbols are discretized, limited and replicate content.

Example: transcribing a lecture converts a signal into the symbolic domain. An audio recording serves as input, a text is the output. The symbolic representation is easy to index but lacks nuance.

Tasks - Transcription (signal → symbolic)
- Source separation
- Instrument recognition
- Polyphonic pitch estimation and chord detection
- Tempo and rhythm extraction
Fig: Music transcription.

Tasks - Structure analysis (signal → symbolic)
Fig: Structural analysis.

Tasks - Music recommendation
Music recommendation and automatic playlist generation:
- Content based: signal → symbolic.
- Based on (listening) behavior: symbolic → symbolic.
Fig: Spotify automatically generates playlists based on listening behavior.

Tasks - Other
- Score following: automatic score page turning, or triggering effects based on musical content.
- Emotion recognition: labeling audio according to emotional content.
- Automatic cover song identification.
- Optical music recognition: converting images of scores to digital scores.
- Symbolic music retrieval.
- Automatic genre recognition.

MIR Tasks
Most tasks make it possible to browse, categorize, query and discover music in large databases.

Musical Information
Signals:
- Recorded musical performances: video, audio, MIDI, motion capture
- Scans of scores
Symbols:
- Meta-data: artist, title, album name, label, composer, instrumentation, lyrics
- Tags, reviews, ratings
- Digitized scores

Musical Information - Examples
Digital representations of Liszt's Liebestraum No. 3:
- Scanned score (Fig: Scanned score of Liszt's Liebestraum No. 3.)
- MusicXML score
- MIDI synthesis
- MIDI performance
- Audio recordings of performances (Arthur Rubinstein, Daniel Barenboim)

Musical Information
Scores can be seen as a model of a performance. Models aim to reduce dimensions and complexity, and to improve understanding and readability.

Quote: "Essentially, all models are wrong, but some are useful." - George E. P. Box

Solved MIR Tasks
- Monophonic pitch estimation [4, 9, 12]
- Content based audio search [18]
- Automatic genre classification

Challenging Tasks
Un-mixing the mix: decomposing a mixed audio signal is very hard. Masking and overlapping partials make, for example, polyphonic pitch detection difficult.
Fig: How to unmix the mix?

Tools - Sonic Visualiser
Sonic Visualiser offers a plugin system with:
- Beat tracking
- Onset detection
- Pitch tracking
- Melody detection
- Chord estimation
Fig: Sonic Visualiser, an application for viewing and analysing the contents of music audio files. sonicvisualiser.org
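To make the "solved" monophonic pitch estimation task above concrete: a toy autocorrelation-based estimator fits in a dozen lines. This is a minimal sketch in the spirit of the cited methods [4, 9, 12], not a reimplementation of any of them; real trackers add normalization (e.g. YIN's cumulative mean difference), thresholds and post-processing.

```python
import numpy as np

def estimate_pitch(frame, sample_rate, fmin=50.0, fmax=1000.0):
    """Return a fundamental frequency estimate (Hz) for one audio frame."""
    frame = frame - frame.mean()                    # remove DC offset
    corr = np.correlate(frame, frame, mode="full")  # autocorrelation
    corr = corr[len(corr) // 2:]                    # keep non-negative lags
    lag_min = int(sample_rate / fmax)               # shortest allowed period
    lag_max = int(sample_rate / fmin)               # longest allowed period
    best_lag = lag_min + int(np.argmax(corr[lag_min:lag_max]))
    return sample_rate / best_lag

# Sanity check on a synthetic 440 Hz tone.
sr = 44100
t = np.arange(2048) / sr
print(estimate_pitch(np.sin(2 * np.pi * 440 * t), sr))  # ~441 Hz (integer-lag resolution)
```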

Tools - Tartini
A specialized tool for pitch analysis:
- Pitch contour
- Vibrato analysis
- Transcription
Fig: Tartini, an application for pitch analysis. http://miracle.otago.ac.nz/tartini

Tools - Music21
Symbolic music queries:
- Query rhythmic features
- Melodic contours
- Chord progressions, ...
http://web.mit.edu/music21/
Fig: music21, a programming environment for symbolic music analysis.

Tools - Tarsos
Extracting and analysing tone scales from music:
- Tone scale extraction
- Tone scale analysis
- Transcription of ethnic music
http://0110.be/software
Fig: Tarsos: tone scale extraction and analysis.

MIR Methods
Fig: Input feature(s) → feature processing → output.
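As a minimal sketch of that input → feature(s) → output pipeline, the snippet below computes two of the classic frame-level descriptors from the bag-of-features list that follows: spectral centroid and zero crossing rate. The frame length and test signals are arbitrary choices for illustration.

```python
import numpy as np

def spectral_centroid(frame, sample_rate):
    """Magnitude-weighted mean frequency of a frame: a timbral 'brightness' cue."""
    magnitudes = np.abs(np.fft.rfft(frame))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    return float(np.sum(freqs * magnitudes) / (np.sum(magnitudes) + 1e-12))

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs where the signal changes sign."""
    return float(np.mean(np.abs(np.diff(np.sign(frame))) > 0))

# A pure low tone scores low on both features; white noise scores high.
sr = 22050
t = np.arange(1024) / sr
tone, noise = np.sin(2 * np.pi * 220 * t), np.random.randn(1024)
print(spectral_centroid(tone, sr), zero_crossing_rate(tone))
print(spectral_centroid(noise, sr), zero_crossing_rate(noise))
```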

MIR Methods
A bag-of-features approach is often used to represent, for example, a musical genre. Sometimes more than 100 features are used [8]:
- MFCCs, timbral characteristics
- Spectral centroid
- Spectral moments
- Zero crossing rate
- Number of low energy frames
- Autocorrelation lag
- Frequency
- ...

Methodological problems
MIR research is often limited by (over?)simplification:
- It focuses mainly on classical Western art music or popular music, with ethnocentric terminology: scores, chords, tone scales, chromagrams, instrumentation, rhythmical structures.
- It is mainly goal-oriented and pragmatic (MIREX) without explaining processes [1]: more engineering than science? It is unclear which features correlate with which cognitive processes.
- It is mainly concerned with a limited, disembodied view of music, disregarding social interaction, movement, dance, the body, and individual or cultural preferences.

Quote: "Essentially, all MIR research is wrong, but some is useful." - Me

Tarsos
What follows are two examples of what aims to be useful MIR research. Tarsos [14, 15] is a tool to extract, analyze and document tone scales and tone scale diversity. It is mainly useful for analyzing music with an undocumented tone scale, which is the case for a lot of ethnic music.

Introduction
Tarsos was developed to analyze the dataset of the Royal Museum for Central Africa, Tervuren:
- 30,000 digitized sound recordings
- 3,000 hours of music
- A meta-data database with contextual information
Fig: Locations of recordings.

Demo
Fig: Tarsos live demonstration.

Fig: Tarsos block diagram. Input audio passes through a pitch detector (YIN, MAMI, VAMP plugins, ...). The estimations are filtered (selection on timespan and pitch range, keeping only estimations near pitch classes, a steady-state filter) and aggregated into a pitch histogram (PH) and a pitch class histogram (PCH). Peak detection on the PCH yields the pitch classes and a pitch interval table. Outputs include Scala files, CSV (estimations, PH, PCH, pitch classes), graphics, resynthesized annotations, a MIDI tuning dump and MIDI out.

Fig: Step 1, pitch estimation: a pitch contour in absolute cents over time.
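A minimal sketch of how the block diagram's first stages fit together: convert frame-wise pitch estimates (Hz) to absolute cents and histogram them. The 8.176 Hz reference (MIDI note 0) and the 6-cent bin width are common conventions, assumed here for illustration rather than taken from Tarsos.

```python
import numpy as np

REF_HZ = 8.176  # 0 cents = MIDI note 0; a common convention (assumption here)

def hz_to_cents(freqs_hz):
    return 1200.0 * np.log2(np.asarray(freqs_hz) / REF_HZ)

def pitch_histogram(freqs_hz, bin_width=6.0, range_cents=9600.0):
    """Step 2: histogram of the pitch estimates over the absolute cent range."""
    counts, edges = np.histogram(
        hz_to_cents(freqs_hz),
        bins=np.arange(0.0, range_cents + bin_width, bin_width))
    return counts, edges

# Frame-wise estimates as produced by a pitch tracker (YIN [4], MAMI, ...).
estimates = [220.0, 220.5, 329.6, 440.0, 441.0]
counts, edges = pitch_histogram(estimates)
print(edges[counts.argmax()])  # 6900.0 cents, i.e. the bin at A4
```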

Pitch Class Histogram construction
Fig: Step 2, pitch histogram creation: number of estimations per pitch, in cents.
Fig: Step 3, pitch class histogram creation: the estimations folded into a single octave.

Examples
Fig: An unequally divided pentatonic tone scale with a near-perfect fifth (696 cents) consisting of a pure minor and a pure major third (pitch classes near 168, 453, 771, 939 and 1149 cents).

Concept of tone scale
Fig: Pitch steps shift upwards during a Finnish joik (pitch in absolute cents over time).
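Steps 2 and 3 can be sketched as follows: fold the absolute cents into one octave and pick local maxima of the resulting histogram as pitch classes. The toy input mimics the pentatonic example above; the bin width, smoothing and peak threshold are illustrative assumptions (Tarsos' own filters and peak detection are more involved).

```python
import numpy as np

def pitch_class_histogram(cents, bin_width=6.0):
    """Step 3: fold absolute cents into [0, 1200) and histogram them."""
    folded = np.asarray(cents) % 1200.0
    bins = np.arange(0.0, 1200.0 + bin_width, bin_width)
    counts, _ = np.histogram(folded, bins=bins)
    return counts.astype(float), bins[:-1]

def histogram_peaks(counts, centers, min_height=0.3, width=5):
    """Smooth the (circular) histogram, keep local maxima above a threshold."""
    kernel = np.ones(width) / width
    tripled = np.convolve(np.tile(counts, 3), kernel, mode="same")
    smooth = tripled[len(counts):2 * len(counts)]   # undo the circular padding
    return [centers[i] for i in range(len(smooth))
            if smooth[i] > smooth[i - 1]
            and smooth[i] >= smooth[(i + 1) % len(smooth)]
            and smooth[i] >= min_height * smooth.max()]

# Toy input mimicking the pentatonic example: pitch classes near 168, 453,
# 771, 939 and 1149 cents, spread over several octaves with pitch jitter.
rng = np.random.default_rng(0)
cents = np.concatenate([pc + 15 * rng.standard_normal(200) + 1200 * rng.integers(3, 6, 200)
                        for pc in (168, 453, 771, 939, 1149)])
counts, centers = pitch_class_histogram(cents)
print(histogram_peaks(counts, centers))  # ~5 pitch classes, in cents
```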

Concept of Tone
Fig: Tonal center of Western vibrato: pitch trace and histogram of a single vibrato note.

Concept of Tone II
Fig: Pitch gesture in an Indian raga: pitch trace and histogram.

Concept of Tuning
Fig: Detuning of a mono-chord during a performance.

Relating Timbre and Scale
Question: why are some tone scales or pitch intervals much more popular than others? Why are instruments tuned the way they are? There is a theory [13, 10] that relates scale and timbre: it identifies points of maximum consonance that can be used to construct a scale that is optimal in terms of consonance.
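The theory can be sketched numerically: sum the pairwise sensory dissonance of all partials of two tones, following Plomp and Levelt [10] with the parameterization popularized by Sethares [13], and look for minima as the interval ratio varies. The six-partial timbre and its amplitude roll-off below are illustrative assumptions.

```python
import numpy as np

D_STAR, S1, S2, A, B = 0.24, 0.0207, 18.96, 3.51, 5.75  # Sethares' constants

def pair_dissonance(f1, f2, a1, a2):
    """Sensory dissonance of two pure partials (frequencies in Hz)."""
    s = D_STAR / (S1 * min(f1, f2) + S2)   # critical-bandwidth scaling
    x = abs(f2 - f1)
    return min(a1, a2) * (np.exp(-A * s * x) - np.exp(-B * s * x))

def total_dissonance(freqs, amps):
    return sum(pair_dissonance(freqs[i], freqs[j], amps[i], amps[j])
               for i in range(len(freqs)) for j in range(i + 1, len(freqs)))

# Idealized harmonic timbre: six partials with decaying amplitudes.
f0, harmonics = 261.63, np.arange(1, 7)
amps = 0.88 ** harmonics
ratios = np.linspace(1.0, 2.0, 801)
curve = np.array([total_dissonance(np.concatenate([f0 * harmonics, r * f0 * harmonics]),
                                   np.concatenate([amps, amps]))
                  for r in ratios])
dips = (curve[1:-1] < curve[:-2]) & (curve[1:-1] < curve[2:])
print(ratios[1:-1][dips])  # minima near simple ratios such as 5/4, 4/3, 3/2, 5/3, 2/1
```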

Relating Timbre and Scale
Fig: Dissonance curve for an idealized harmonic instrument: sensory dissonance against frequency ratio, with minima at the perfect fourth, the perfect fifth and the octave.
Fig: Screenshot of automatic timbre-scale mapping.

Conclusion
The consonance theory is currently not well supported by measurements. The dataset of African music, with its large diversity in instrumentation and tone scales, offers an opportunity to support the theory.

Tarsos offers opportunities to answer basic musicological questions:
- Is there a change in tone scale use over time? Is the 100 cent interval used more in recent years? Is there an acculturation effect?
- Is there a systematic relation between timbre and scale?

What is Acoustic Fingerprinting?
Fig: A generalized audio fingerprinter scheme: audio → feature extraction → fingerprint construction → matching against fingerprints of reference audio → identified audio.
1. Audio is fed into the system.
2. Features are extracted and fingerprints are constructed.
3. The fingerprints are compared with a database containing fingerprints of reference audio.
4. The audio is either identified or, if no match is found, labeled as unknown.

Why Audio Fingerprinting?
Identifying short audio fragments enables:
- Duplicate detection in large digital music archives
- Digital rights management applications (SABAM)
- Music structure analysis
- Analysis of techniques and repertoire in DJ-sets
- Synchronization of audio (and video) streams
- Alignment of extracted features with audio [17]
Fig: Shazam music recognition service.

Demo: Panako [16]
Fig: Spectrogram of Aphex Twin's Windowlicker.

System Design
Current audio fingerprinting systems use fingerprints based on:
- Spectral peaks [18, 16, 6]
- Onsets in spectral bands [5]
- Other features [2, 7, 11, 3]

System Design
Fig: Step 1, extracting spectral peaks.
Fig: Step 2, creating fingerprints by combining spectral peaks.
A code sketch of these two steps follows after the list below.

Opportunities for digital musicology
Acoustic fingerprinting can provide opportunities for digital musicology:
1. Analysis of repetition within songs
2. Comparison of versions/edits
3. Audio and audio feature alignment to share datasets
4. DJ-set analysis
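A sketch of steps 1 and 2 in the spirit of the spectral-peak systems cited above [18, 16, 6]: pick prominent spectrogram peaks, then hash pairs of nearby peaks. All constants and the hash layout are illustrative assumptions, not the actual values used by Panako or Shazam.

```python
import numpy as np
from scipy.ndimage import maximum_filter
from scipy.signal import stft

def spectral_peaks(audio, sr, n_fft=1024, neighborhood=15):
    """Step 1: local maxima of the magnitude spectrogram."""
    _, _, spec = stft(audio, fs=sr, nperseg=n_fft)
    mag = np.abs(spec)
    local_max = mag == maximum_filter(mag, size=neighborhood)
    strong = mag > 5 * np.median(mag)            # discard weak bins
    f_bins, t_bins = np.nonzero(local_max & strong)
    return sorted(zip(t_bins.tolist(), f_bins.tolist()))  # (frame, freq bin)

def fingerprints(peaks, fan_out=5, max_dt=64):
    """Step 2: combine each anchor peak with a few later peaks into hashes."""
    prints = []
    for i, (t1, f1) in enumerate(peaks):
        for t2, f2 in peaks[i + 1:i + 1 + fan_out]:
            dt = t2 - t1
            if 0 < dt <= max_dt:
                prints.append(((f1 << 20) | (f2 << 8) | dt, t1))
    return prints  # (hash, anchor time) pairs

# Toy usage on noise; real input would be a decoded audio file.
sr = 8000
audio = np.random.randn(5 * sr)
print(len(fingerprints(spectral_peaks(audio, sr))))
```

Matching (steps 3 and 4) then looks each hash up in a database of reference fingerprints and counts, per reference track, how often the offset t_query - t_reference is the same; a large, consistent vote identifies the audio.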

Musical structure analysis
Fig: Repetition in Ribs Out by Fuck Buttons (unfortunately the best example I could find).

Radio Edit vs. Original
Fig: Radio edit vs. original version of Daft Punk's Get Lucky.

Exact Repetition Over Time
Fig: How much cut-and-paste is used on average, for a set of 20,000 recordings.

Synchronization of audio streams
Fig: Two similar audio streams out of sync.
Audio synchronization can be used for:
- Aligning unsynchronized audio streams from several microphones
- Aligning video footage by using audio
- Aligning audio and extracted features
- Aligning audio and other data [17]
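The alignment itself is done with fingerprint matching in [17]. As a self-contained illustration, the sketch below estimates the offset between two streams with plain FFT-based cross-correlation, a far less robust baseline than the fingerprinting approach.

```python
import numpy as np

def estimate_offset(ref, other, sr):
    """Seconds by which `other` lags behind `ref` (negative: it leads)."""
    size = 1 << int(np.ceil(np.log2(len(ref) + len(other) - 1)))
    # FFT-based cross-correlation, zero-padded to avoid circular wrap-around.
    corr = np.fft.irfft(np.conj(np.fft.rfft(ref, size)) * np.fft.rfft(other, size), size)
    lag = int(np.argmax(corr))
    if lag > size // 2:
        lag -= size
    return lag / sr

# `delayed` is the same event captured 250 ms later.
sr = 8000
signal = np.random.randn(4 * sr)
delayed = np.concatenate([np.zeros(int(0.25 * sr)), signal])
print(estimate_offset(signal, delayed, sr))  # ~0.25
```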

Synchronization of audio streams
Fig: Microphone placement for a symphonic orchestra, and synchronization.
Audio synchronization using acoustic fingerprinting is sub-millisecond accurate. This precision matters when microphone placement spans several meters: with the speed of sound at 340.29 m/s, each meter of distance adds about 3 ms of delay.

Distance (m)   Delay (ms)
1              3
2              6
3              9

Analysis of repertoire and techniques used in DJ-Sets
Fig: A DJ.
An extension of the spectral peak fingerprinting method allows for time-stretching, pitch-shifting and tempo changes [16]. Given a DJ-set and the reference audio (tracklists of DJ-sets can be found on http://www.1001tracklists.com/), the following can be extracted automatically:
- Which parts of which songs were played, and for how long
- Which modifications were applied (percentage modification of time and frequency)

Practical Audio Fingerprinting
Panako [16], an open source audio fingerprinting system available at http://panako.be, was used to generate the example data. Note that some methods implemented within Panako are patented (US6990453). These subapplications of Panako were used:
- monitor, during the live demo; monitor can also be used for DJ-set analysis.
- compare, for the version comparison and the structure analysis.
Other usable fingerprinters are audfprint and echoprint.

Bibliography

Pedro Cano, Eloi Batlle, Ton Kalker, and Jaap Haitsma. A review of audio fingerprinting. The Journal of VLSI Signal Processing, 41:271-284, 2005.

Michele Covell and Shumeet Baluja. Known-audio detection using Waveprint: spectrogram fingerprinting by wavelet hashing. In Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP 2007), 2007.

Alain de Cheveigné and Hideki Kawahara. YIN, a fundamental frequency estimator for speech and music. The Journal of the Acoustical Society of America, 111(4):1917-1930, 2002.

Dan Ellis, Brian Whitman, and Alastair Porter. Echoprint - an open music identification service. In Proceedings of the 12th International Symposium on Music Information Retrieval (ISMIR 2011), 2011.

Sébastien Fenet, Gaël Richard, and Yves Grenier. A scalable audio fingerprint method with robustness to pitch-shifting. In Proceedings of the 12th International Symposium on Music Information Retrieval (ISMIR 2011), pages 121-126, 2011.

Jaap Haitsma and Ton Kalker. A highly robust audio fingerprinting system. In Proceedings of the 3rd International Symposium on Music Information Retrieval (ISMIR 2002), 2002.

Marc Leman, Dirk Moelants, Matthias Varewyck, Frederik Styns, Leon van Noorden, and Jean-Pierre Martens. Activating and relaxing music entrains the speed of beat synchronized walking. PLoS ONE, 8(7):e67932, 2013.

Phillip McLeod and Geoff Wyvill. A smarter way to find pitch. In Proceedings of the International Computer Music Conference (ICMC 2005), 2005.

R. Plomp and W. J. Levelt. Tonal consonance and critical bandwidth. Journal of the Acoustical Society of America, 38:548-560, 1965.

M. Ramona and G. Peeters. AudioPrint: an efficient audio fingerprint system based on a novel cost-less synchronization scheme. In Proceedings of the 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2013), pages 818-822, 2013.

M. J. Ross, H. L. Shaffer, A. Cohen, R. Freudberg, and H. J. Manley. Average magnitude difference function pitch extractor. IEEE Transactions on Acoustics, Speech, and Signal Processing, 22(5):353-362, October 1974.

William A. Sethares. Tuning, Timbre, Spectrum, Scale. Springer, 2nd edition, 2005.

Joren Six and Olmo Cornelis. Tarsos - a platform to explore pitch scales in non-western and western music. In Proceedings of the 12th International Symposium on Music Information Retrieval (ISMIR 2011), 2011.

Joren Six, Olmo Cornelis, and Marc Leman. Tarsos, a modular platform for precise pitch analysis of western and non-western music. Journal of New Music Research, 42(2):113-129, 2013.

Joren Six and Marc Leman. Panako - a scalable acoustic fingerprinting system handling time-scale and pitch modification. In Proceedings of the 15th International Symposium on Music Information Retrieval (ISMIR 2014), 2014.

Joren Six and Marc Leman. Synchronizing multimodal recordings using audio-to-audio alignment. Journal of Multimodal User Interfaces, 9(3):223-229, 2015.

Avery L. Wang. An industrial-strength audio search algorithm. In Proceedings of the 4th International Symposium on Music Information Retrieval (ISMIR 2003), pages 7-13, 2003.