Music Segmentation Using Markov Chain Methods

Paul Finkelstein

March 8, 2011

Abstract

This paper will present just how far the use of Markov chains has spread in the 21st century. We will explain some of the techniques being used to segment audio, including the Wolff-Gibbs algorithm, hidden Markov models, and prior distributions. Finally, we will discuss the possibilities for improving segmentation and the various uses for such a process.

1 Introduction

Music segmentation is a process by which specialists in digital music extract data from sound waves and use the base units of sound (pitch, rhythm, and volume) to separate the segments of a song from one another. There are many different types of segments, as exhibited in Figure 1.

Fig. 1 Examples of different types of segmentation

The level of segmentation we will be looking for is of the highest order, as shown in the top line of the figure. The smaller chunks, though, inform the separation of the song into its largest chunks. Our goal will be to separate a popular song into its introduction, verse(s), chorus(es), bridge, transitions, and conclusion.

2 Spectrogram Creation

Previous attempts at music segmentation involved segmenting by spectral shape, by harmony, and by pitch and rhythm. While these methods exhibited some success, they generally resulted in over-segmentation. The first step of every processing chain is to take the sinusoidal waveform of the sample audio and convert it into a spectrogram, from which information can be extracted more easily. We will use a constant-Q transform to map the frequencies of Western harmony, which are geometrically spaced. A constant-Q transform, as opposed to a discrete Fourier transform, provides a log-frequency representation of the spectral components and makes it easier to recognize differences in timbre, instrument, and speech. The constant Q is determined by the sampling rate (usually in frames per second) and the number of pitches desired within a single octave. A constant-Q, logarithmically banded spectrogram with 1/12-octave-wide bands provides the best representation of pitch, since Western harmony is based on twelve equally spaced tones within every octave.
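As a concrete illustration, here is a minimal sketch of how such a spectrogram might be computed in Python. The paper does not prescribe an implementation; the use of librosa, the file name, and the parameter values are assumptions made for illustration.

    import numpy as np
    import librosa

    # Load the audio (the file name is hypothetical).
    y, sr = librosa.load("song.wav", sr=22050)

    # Constant-Q transform with 12 bins per octave, matching the
    # 1/12-octave-wide bands of Western harmony described above.
    C = librosa.cqt(y, sr=sr, hop_length=512,
                    fmin=librosa.note_to_hz("C1"),
                    n_bins=84, bins_per_octave=12)

    # Log-power spectrogram: one row per log-spaced frequency band,
    # one column per short-term frame.
    log_Z = np.log(np.abs(C) ** 2 + 1e-10)  # small offset avoids log(0)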

Let $z_m(n)$ denote the $m$th band of the $n$th short-term log-frequency power spectrum in the sequence, with $M$ bands in total,

$$w(n) = \left( \sum_{m=1}^{M} \bigl( \log z_m(n) \bigr)^2 \right)^{1/2} \qquad \text{and} \qquad u_m(n) = \frac{\log z_m(n)}{w(n)}.$$

The $M$ sequences $u_m(n)$ are then collected into an array $X$ after subtracting the mean of each band, to make sure that each row sums to zero. This is why this method is called mean-based clustering.
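Continuing from the log_Z array in the sketch above, this normalization is a few lines of NumPy (again an assumed implementation, not the paper's):

    # log_Z has shape (M, N): M bands, N frames.
    w = np.sqrt(np.sum(log_Z ** 2, axis=0))    # per-frame norm w(n)
    U = log_Z / w                              # normalized bands u_m(n)
    X = U - U.mean(axis=1, keepdims=True)      # subtract each band's mean
    # Each row of X now sums to zero, as required.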

Fig. 2 Example of the logarithmically banded constant-Q spectrogram

3 Hidden Markov Models

The next step is to construct a song-specific hidden Markov model (HMM) to perform the actual segmentation. An HMM is a Markov chain whose state space Ω is hidden (in the case of segmentation, Ω is the set of segment types), but which produces a string of known observations as the chain progresses (the audio data retrieved from the spectrogram). Using the string of observations, the HMM can train itself with Baum-Welch training to decide the most probable state sequence for the given observations. Left to its own devices, the HMM will segment a song into clusters as in Figure 3.

Fig. 3 Segmentation by clustering

Each different colored rectangle in Figure 3 is a segment. While the song is certainly starting to take shape, the result exhibits over-segmentation, because the HMM is not trained to cluster on a large time scale. Given a sequence {a,a,a,a,b,b,c,d,c,d,c,d,c,d}, the HMM will segment it as (4a, 2b, c, d, c, d, c, d, c, d). It is clear, though, that the repeated c,d section is probably one 4cd segment rather than eight distinct segments. Therefore, we must find a way to reduce the amount of fragmentation.

4 Temporal Coherence

A common tool in Bayesian statistics is the prior distribution. A prior can be chosen arbitrarily or through an informed process. Our prior for segment duration needs to favor longer segments so as to avoid over-segmentation (Fig. 3), a method first described by Abdallah, Rhodes, Sandler, and Casey in 2005. This can be modeled fairly well by a discretized inverse-Gamma duration model,

$$p_D(d) = \frac{p_{\mathrm{IG}}\left(\tau^{-1} d;\, \gamma\right)}{\sum_{l=1}^{L} p_{\mathrm{IG}}\left(\tau^{-1} l;\, \gamma\right)},$$

where $p_{\mathrm{IG}}(\cdot\,;\gamma)$ is the inverse-Gamma density with shape parameter $\gamma$, $L$ is the longest duration considered, and $\tau$ is a scale factor representing the most likely segment length, which was determined to be 20 seconds in a popular song.
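The shape of this prior is easy to explore numerically. Here is a minimal sketch using SciPy's inverse-Gamma density; the shape value gamma = 2.0 and the exact parameterization are assumptions, since the paper does not give them:

    import numpy as np
    from scipy.stats import invgamma

    def duration_prior(L, tau=20.0, gamma=2.0):
        """Discretized inverse-Gamma prior over segment durations 1..L (seconds)."""
        d = np.arange(1, L + 1, dtype=float)
        # Scale chosen so the density's mode falls at tau (mode = scale / (a + 1)).
        weights = invgamma.pdf(d, a=gamma, scale=tau * (gamma + 1))
        return weights / weights.sum()   # normalize over d = 1..L

    p = duration_prior(L=60)  # prior peaked at 20 s, as in Fig. 4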

Fig. 4 The resulting prior distribution, τ = 20

Now, using a Gibbs sampler informed by the duration prior, we can update and expand the segments in Fig. 3. To do this, we will use a Wolff-Gibbs algorithm with block updates. This algorithm can simulate Ising, Potts, and XY systems while avoiding the critical slowing down of an ordinary Gibbs sampler. We converge upon the results we are seeking with the following steps, sketched in code afterward:

1.) Choose a seed site, based on temperature T and the current configuration, so that the proposed step will be accepted with probability 1.
2.) Expand the block left and right into a band of contiguous sites.
3.) Stop when a boundary is reached, or with probability α after each step (one frame of the audio), based on the duration prior.
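As a toy illustration of one such block update on a sequence of frame labels, consider the following sketch. It is a reconstruction under stated assumptions (a uniformly chosen seed, a fixed stopping probability standing in for the duration-prior-based α, and a uniform label proposal), not the authors' implementation:

    import numpy as np

    rng = np.random.default_rng(0)

    def block_update(labels, stop_prob, n_states):
        """One Wolff-style block update; labels is a NumPy int array,
        one segment label per audio frame."""
        n = len(labels)
        seed = rng.integers(n)  # choose a seed site
        left, right = seed, seed
        # Expand left and right into a band of contiguous sites, stopping
        # at a segment boundary or, after each step, with probability stop_prob.
        while (left > 0 and labels[left - 1] == labels[seed]
               and rng.random() > stop_prob):
            left -= 1
        while (right < n - 1 and labels[right + 1] == labels[seed]
               and rng.random() > stop_prob):
            right += 1
        # Relabel the whole block at once (the real sampler accepts such
        # moves according to the model posterior and the duration prior).
        labels[left:right + 1] = rng.integers(n_states)
        return labels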

5 Results

After running the algorithm, the resulting segmentation proved to be much more accurate.

Fig. 5 (a) The final machine segmentation; (b) the ground-truth annotation, as determined by expert listeners

We compare the experimental results to the ground truth by computing the overlap between the segments in the two annotations. This is known as the directional Hamming distance (DHD). For every section $S_i^M$ in the machine segmentation, find the section of maximum overlap $S_k^G$ in the ground-truth segmentation, and sum the remaining overlaps over every section:

$$d_{GM} = \sum_{S_i^M} \; \sum_{S_j^G \neq S_k^G} \left| S_i^M \cap S_j^G \right|.$$

We can use the DHD from ground truth to machine and from machine to ground truth to derive metrics for precision P and recall R. The results of the experiment over many songs are shown in Fig. 6, with the better results tending towards (1,1).

Fig. 6 Success of the temporal coherence model
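For concreteness, here is a small sketch of the DHD and the derived scores on frame-label arrays. Treating each distinct label as one section, and mapping P = 1 − d_GM/N and R = 1 − d_MG/N, is one common convention and an assumption here, not a detail taken from the paper:

    import numpy as np

    def directional_hamming(machine, truth):
        """Frames of each machine section lying outside its best-matching
        ground-truth section, summed over all machine sections."""
        d = 0
        for seg in np.unique(machine):
            frames = truth[machine == seg]   # truth labels under this section
            d += frames.size - np.bincount(frames).max()
        return d

    def precision_recall(machine, truth):
        n = len(machine)
        p = 1.0 - directional_hamming(machine, truth) / n
        r = 1.0 - directional_hamming(truth, machine) / n
        return p, r

    # The toy sequence from Section 3 (0=a, 1=b, 2=c, 3=d), against a
    # ground truth that merges the repeated c,d section into one segment:
    machine = np.array([0, 0, 0, 0, 1, 1, 2, 3, 2, 3, 2, 3, 2, 3])
    truth   = np.array([0, 0, 0, 0, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2])
    print(precision_recall(machine, truth))  # over-segmentation lowers recall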

6 Conclusions and Implications

While Abdallah et al. have found a way to segment popular music with great accuracy, the real challenge and academic interest lie in the analysis of classical music. Consider a database of hundreds of recordings of Grieg's First Piano Concerto. If we could machine-segment those recordings, we could actually quantify the significance of particular recordings and performers in setting standards of tempo and expression. This tool could be of great use to musicologists. One could also track trends over the history of music. For instance, we could track average segment length from classical music up to the music of today and demonstrate a definite diminishing of the attention span for musical ideas over the past 100 years.

To make these applications possible, it will be necessary to develop a different model for segmentation. Temporal coherence won't suffice, because of the great variation in durational expectations in classical music. It will become necessary to better hone the duration prior, but there may be too many variables in classical music for this to be possible.

References

[1] Abdallah, S. (2006). "Using duration models to reduce fragmentation in audio segmentation". Machine Learning, 65(2-3), p. 485.

[2] Aucouturier, J.-J. (2005). "'The way it Sounds': timbre models for analysis and retrieval of music signals". IEEE Transactions on Multimedia, 7(6), p. 1028.

[3] Häggström, O. (2008). Finite Markov Chains and Algorithmic Applications. Cambridge: Cambridge University Press.

[4] Rhodes, C. (2006). "A Markov-Chain Monte-Carlo Approach to Musical Audio Segmentation". IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Vol. 5.