POST-PROCESSING FIDDLE: A REAL-TIME MULTI-PITCH TRACKING TECHNIQUE USING HARMONIC PARTIAL SUBTRACTION FOR USE WITHIN LIVE PERFORMANCE SYSTEMS

Andrew N. Robertson, Mark D. Plumbley
Centre for Digital Music
School of Electronic Engineering and Computer Science
Queen Mary University of London
London, England
andrew.robertson@elec.qmul.ac.uk

ABSTRACT

We present a method for real-time pitch-tracking which estimates, for each detected note, the amplitudes of the partials relative to the fundamental. We then employ a subtraction method, whereby lower fundamentals in the spectrum are accounted for when looking at higher fundamental notes. By tracking notes which are playing, we look for note-off events and continually update our expected partial weightings for each note. The resulting algorithm makes use of these relative partial weightings within its decision process. We have evaluated the system against a data set and compared it with specialised offline pitch-trackers.

1. INTRODUCTION

Polyphonic or multiple pitch-tracking is a difficult problem in signal processing. Most existing work in multi-pitch tracking is designed for Music Information Retrieval, which takes place offline on large data sets. A method for multiple frequency estimation by summing partial amplitudes within the frequency domain was presented by Klapuri [5], who uses an iterative procedure to subsequently subtract partials within a pitch detection algorithm. Pertusa and Inesta [7] list potential fundamental frequency candidates in order of the sum of their harmonic amplitudes. Existing real-time algorithms for pitch detection include fiddle, a Max/MSP object by Puckette et al. [8] based on a Fourier transform which employs peak picking. Jehan [4] adapted the algorithm to analyse timbral qualities of a signal. In the time domain, de Cheveigné and Kawahara's Yin [1] is a widely-known algorithm which uses auto-correlation on the time-domain signal to calculate the most prominent frequency. However, these algorithms are more suited to monophonic signals, and they are not reliable enough to generate a MIDI transcription of audio from a polyphonic instrument.

We proceed from the observation that any given pitch will also create peaks at frequencies corresponding to its partials. In our approach, we iteratively subtract partials within the frequency domain in order to aid a real-time pitch detector. A learning method is employed to optimise the expected amplitudes of the partials of each detected note by continually updating the weights whenever a note is detected. In addition, we model the variations within the amplitude and summed partial amplitudes of detected notes. The weightings for each partial derived from observations are used within the decision-making process.

Our motivation for this method is its use within live performance, to generate information about new notes played by an instrument. This can then be used to provide accompaniment, either directly, or by aligning the information with an expected part. Previous research into pitch tracking for interactive music has highlighted the importance of minimal latency and accuracy within noisy conditions [2]. Since our algorithm is employed for real-time audio-to-MIDI conversion within a performance system, we require fast detection of notes and fast computation time.

2. METHOD
2.1. Implementation and Pre-Processing

Our algorithm has been implemented in Java within a Max/MSP patch, making use of the fiddle object [8] in the pre-processing stage. Ordinarily, fiddle provides its own fundamental frequency estimation, but it also gives the raw data of the top N frequencies from the peak-picking process, together with their respective amplitudes above a suitable threshold. Since fiddle has been optimised for fast processing within a real-time environment, it is well suited to providing the efficient FFT and noise reduction process used to supply the data for our partial-removal system. We use a frame of 2048 samples with a hop size of 1024, so that our detection of notes is as fast as possible whilst still detecting notes as low as 80 Hz.
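As a concrete illustration of this front end, a minimal Java sketch of how a peak frequency reported by fiddle might be quantised to the nearest MIDI note (as used by the amplitude update of Section 2.2) is given below; the class and its constants are hypothetical helpers of our own, not part of fiddle or of the implementation described here.

    // Hypothetical helper, not part of fiddle: quantise a peak frequency (Hz)
    // to the nearest MIDI note number, with A4 = 440 Hz = MIDI note 69.
    public final class PeakToMidi {
        public static int toMidiNote(double frequencyHz) {
            double semitonesFromA4 = 12.0 * Math.log(frequencyHz / 440.0) / Math.log(2.0);
            return (int) Math.round(69.0 + semitonesFromA4);
        }

        public static void main(String[] args) {
            System.out.println(toMidiNote(82.4));   // low E on a guitar -> MIDI note 40
        }
    }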

2.2. Update the Amplitudes

The input to the algorithm is the list of the top N frequencies (typically 8 to 20) and their amplitudes from fiddle. The algorithm continually tracks the amplitude of all MIDI notes. First we calculate the MIDI notes corresponding to the incoming peak frequencies and update their respective amplitudes. The amplitudes of all MIDI notes not present in the peak-frequency list are decreased by 20% for each input frame (every 23 ms). This allows for errors if notes are accidentally skipped in this top-N procedure. It is quite common, during the course of a note, for one of the peak frequencies to shift to an adjacent note for a frame, and this decay prevents the original note's amplitude from dropping to zero and triggering a note-off.

2.3. Track New and Existing Notes

We begin with the lowest note and work up the range of frequencies. For every note present in the incoming peak-frequency list (potentially a new note-on) and every note already playing (potentially a note-off event), we calculate the power of the note, P(m), by summing the product of the amplitudes of the respective partials with our weighting matrix W_m(k) for that note. This is given by

    P(m) = \sum_{k=1}^{L} W_m(k) \, A(m + h[k])                (1)

where A(m) is the amplitude of MIDI note m, L is the number of partials summed (we chose L = 6), k is the partial number (the partial's frequency is an integer multiple k of the fundamental frequency f_0), h[k] is the interval in semitones between f_0 and k f_0, and W_m(k) is the weight vector derived from the observed signal, giving the amplitude of the k-th partial relative to the amplitude of the fundamental, specific to each individual note in the spectrum.

2.4. Notes On and Notes Off

For currently playing notes, we look for a note-off event: if P(m) < \theta_- \bar{P}(m), then we output a MIDI note-off for pitch m, where \theta_- is a threshold and \bar{P}(m) is an estimate of the median of the power of a positively detected note. Figure 1 shows how this quantity varies over the range of notes.

Figure 1. Median power over the range of piano notes. The power of played notes varies dramatically with pitch, so learning the median value used for triggering plays an important role.

For non-playing notes, we calculate the change in power as the ratio between the current frame and the previous frame:

    r(m) = P_t(m) / P_{t-1}(m)                                 (2)

We check that the MIDI note has at least one partial, m + h[k] with k <= 4, among the top N peaks, and that, if the only partial present is k = 3 (19 semitones), then (m + 7), the fifth, is not also a peak. Then: if P(m) > \theta_+ \bar{P}(m) and r(m) > \theta_r and A(m) > \theta_a \bar{A}(m), we output a MIDI note-on for pitch m, where \theta_+, \theta_a and \theta_r are thresholds for power, amplitude and the ratio r(m), respectively. This ensures a significant measure of summed harmonic amplitudes and a significant increase in this measure since the last observed frame. In practice, values for the ratio threshold \theta_r tend to be between 1.4 and 3, depending on the level of response required. The higher the ratio, the less likely the algorithm is to trigger a false positive.
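The following Java sketch illustrates one possible reading of this per-frame logic; the array layout (A indexed by MIDI note number), the h[k] table of partial intervals and the threshold names are assumptions of our own rather than the authors' code.

    // Illustrative sketch only; A[m] is assumed to hold the tracked amplitude of MIDI note m.
    public final class NoteDecision {
        static final int L = 6;                           // number of partials summed
        static final int[] H = {0, 12, 19, 24, 28, 31};   // h[k]: semitone offsets of partials k = 1..6

        // Section 2.2: amplitudes of notes absent from the current peak list decay by 20% per frame.
        static void decayAbsentNotes(double[] A, boolean[] inPeakList) {
            for (int m = 0; m < A.length; m++) {
                if (!inPeakList[m]) A[m] *= 0.8;
            }
        }

        // Eq. (1): weighted sum of partial amplitudes for MIDI note m.
        static double power(double[] A, double[][] W, int m) {
            double p = 0.0;
            for (int k = 1; k <= L; k++) {
                int partial = m + H[k - 1];
                if (partial < A.length) p += W[m][k - 1] * A[partial];
            }
            return p;
        }

        // Note-off test for a playing note: power falls below a fraction of its learned median.
        static boolean isNoteOff(double p, double medianP, double thetaMinus) {
            return p < thetaMinus * medianP;
        }

        // Note-on test for a non-playing note: large power, a jump in power since the
        // previous frame, and a significant fundamental amplitude.
        static boolean isNoteOn(double p, double previousP, double a,
                                double medianP, double medianA,
                                double thetaPlus, double thetaR, double thetaA) {
            double r = previousP > 0.0 ? p / previousP : Double.POSITIVE_INFINITY;  // Eq. (2)
            return p > thetaPlus * medianP && r > thetaR && a > thetaA * medianA;
        }
    }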
In the case of a note-on, or if the note-on occurred within the last three frames, we adapt our weights W_m(k). The current observation suggests

    W'(k) = A(m + h[k]) / A(m)                                 (3)

We track how many observations have been made in the past and adapt W_m(k) so that it is the average of these and the new observation W'(k). We perform this update for all notes within 5 semitones of the played note: some notes are played less frequently, yet we can reasonably assume that the tone and timbre, with respect to partial weightings, are approximately the same as those of the surrounding notes. By including notes close to the observed note, we adapt the weights more quickly towards a useful approximation. We also update our estimate for the median of the amplitude and power of a note at that MIDI pitch using an exponential moving average:

    \bar{A}(m) = (1 - \alpha) \, \bar{A}(m) + \alpha \, A(m)   (4)

where \alpha (typically 0.2) defines the response of the median estimate to new data.
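A sketch of these updates, written as further methods of the hypothetical class above (reusing L, H and the A/W arrays), might look as follows; the per-note observation counter and the incremental running average are our own illustrative choices.

    // Section 2.4: adapt the weights of the played note m and its neighbours (+/- 5 semitones)
    // towards the running average of observed partial-to-fundamental ratios.
    static void adaptWeights(double[] A, double[][] W, int[] obsCount, int m) {
        if (A[m] <= 0.0) return;
        for (int note = Math.max(0, m - 5); note <= Math.min(127, m + 5); note++) {
            obsCount[note]++;
            for (int k = 1; k <= L; k++) {
                int partial = m + H[k - 1];
                if (partial >= A.length) continue;
                double observed = A[partial] / A[m];                             // Eq. (3): W'(k)
                W[note][k - 1] += (observed - W[note][k - 1]) / obsCount[note];  // running average
            }
        }
    }

    // Eq. (4): exponential moving average used for the median amplitude estimate;
    // the same update is applied to the power estimate.
    static double ema(double oldEstimate, double newValue, double alpha) {
        return (1.0 - alpha) * oldEstimate + alpha * newValue;                   // alpha typically 0.2
    }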

2.5. Partial Subtraction

Having evaluated the current note's strength, if the note is either playing or a new note, we subtract its contribution from the amplitudes of its partials higher in the frequency range. These higher frequencies can have considerable amplitude due to the lower fundamental, so the subtraction process helps to prevent false positives from partials. Hence, we use the following update rule:

    A(m + h[k]) = A(m + h[k]) - W_m(k) \, A(m)                 (5)

for 1 <= k <= L. We aim to optimise these weights by introducing some feedback at this stage (see the sketch at the end of this subsection). If the subtraction process results in A(m + h[k]) becoming less than zero, then we decrease W_m(k); if it is greater than zero, then we increase the weight. Hence, all playing notes help to optimise the average weighting of their respective pitch class (the note and the surrounding notes). There is an assumption here that for the majority of notes the instrument is effectively monophonic: the weights are adjusted on the basis that if a fundamental is playing, the partial is not also playing as part of a polyphonic chord. Whilst this may not be strictly true (as when an octave is played), it is true for the most part, so that when an octave does play, the residual power in the first partial after the subtraction process should still be substantial enough to trigger the recognition of the octave note.

We have used the algorithm in live performances on an acoustic guitar, using it to create a texture of synthesized sounds behind the guitar. By filtering notes to an appropriate scale, we can help to avoid dissonance from false detections. Experimentation with a MIDI-triggered electric piano sound suggests a detection latency of between 60 and 90 ms. This is still quite considerable for use within a live context when fast passages are played. By comparison, Miller Puckette's bonk onset detector has a latency of approximately 10 to 30 ms for the same notes. However, the onset detector is able to make use of a frame size of 256 samples (or 5.8 ms), whereas for the Fourier analysis involved in adequate pitch detection we require a frame size of 2048 samples (or 46 ms).
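The subtraction and weight-feedback step of Eq. (5) could be sketched as below, again reusing L, H, A and W from the earlier sketches; the fixed adjustment step delta and the clamping of negative amplitudes to zero are assumptions on our part, since the text does not specify them.

    // Section 2.5: subtract the expected partial contribution of a sounding note m from
    // the amplitudes of its partials, and nudge the weights based on the residual.
    static void subtractPartials(double[] A, double[][] W, int m, double delta) {
        for (int k = 1; k <= L; k++) {
            int partial = m + H[k - 1];
            if (partial >= A.length) continue;
            A[partial] -= W[m][k - 1] * A[m];                     // Eq. (5)
            if (A[partial] < 0.0) {
                W[m][k - 1] = Math.max(0.0, W[m][k - 1] - delta); // overshot: weight too large
                A[partial] = 0.0;                                 // assumption: clamp residual to zero
            } else {
                W[m][k - 1] += delta;                             // residual remains: weight may be too small
            }
        }
    }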
3. EVALUATION

When used within a performance, the system provides good subjective results. To our knowledge, there is no existing polyphonic real-time Max/MSP object available for direct comparison. The fiddle and yin objects are monophonic and were not designed for polyphonic pitch detection, and their use in this context gives subjectively poor results in comparison. We would like to provide an objective measure of success within a performance application; however, we can so far only compare with offline systems. To this end, we tested the tracker on several synthesized harpsichord recordings of Bach's Well-Tempered Clavier. By sending MIDI files to a Yamaha Stage Piano and testing the pitch-tracker on the corresponding synthesized audio, we can simulate the task of audio-to-MIDI conversion for a polyphonic instrument whilst having ground truth for the notes actually triggered.

A representation of the MIDI ground truth and the corresponding output from the detector is shown in Figure 2. The results are shown in Table 1, and the average latency measured between 70 and 90 ms.

Figure 2. Ground-truth MIDI from Bach's Well-Tempered Clavier (top) and the MIDI output from the corresponding synthesized audio as input to the pitch-tracker (bottom).

Table 1. Detection rates against synthesized harpsichord audio from Bach's Well-Tempered Clavier.

    Piece     Correct (%)   False positives (%)   Number of notes
    WTC1f     80.0          31.3                  1075
    WTC1p     71.0          39.3                   833
    WTC2f     76.4          36.8                   647
    WTC2p     78.4          27.3                  1408
    WTC8f     79.0          41.0                  1014

The percussive, distinctive nature of the harpsichord sound seems to be an optimal input for the pitch-tracker, resulting in high performance statistics of approximately 80% correct detections. The precision currently appears to be comparable with some offline trackers. The MIREX 2007 [3] competition results rate offline trackers with a precision of between approximately 40 and 70%, and the equivalent precision for our real-time pitch-tracker here would be over 50%; it is important to note, however, that the MIREX competition uses a wide database of varied sounds, and hence the result on the Bach pieces may be artificially high.

Marolt [6] has developed an offline pitch-tracker, Sonic, specialised for piano input, which uses adaptive oscillators and neural networks. We also tested our tracker on his data set, using a selection of three synthesized audio samples and three samples from performances on a real piano. The synthesized pieces were: J. S. Bach, Partita no. 4, BWV 828; A. Dvorak, Humoresque Op. 101 no. 7; and W. A. Mozart, Sonata no. 15 in C major, K. 545, third movement. The real pieces were: J. S. Bach, English Suite no. 5, BWV 810; F. Chopin, Nocturne Op. 9 no. 2; and S. Joplin, The Entertainer.

Table 2. Detection rates against the piano data set used to test Sonic.

    Piece                 Correct (%)   False positives (%)   Number of notes
    Synthetic:
      Partita no. 4       46.0          38.0                   496
      Humoresque          42.1          49.6                   545
      Sonata no. 15       61.1          22.4                   651
    Real:
      Suite no. 5         50.1          43.7                   652
      Nocturne no. 2      38.1          37.6                   252
      Entertainer         45.1          42.7                   567

Sonic obtains a success rate of approximately 90% on this data set, with a false detection rate of approximately 9%, whereas our real-time tracker only succeeded in detecting between 40 and 50% of the notes, with considerably less precision (approximately 40% false positives). A large proportion of the false positives are due to high-frequency content from harmonics present within the original signal, which are more tolerable in a live context than inharmonic and lower-frequency errors.

Our proposed system therefore gives offline figures that are significantly worse than those of specialised systems designed for polyphonic transcription tasks. However, our subjective observation is that it is very successful in a live performance context. This raises the issue of how we can perform an evaluation that fairly reflects success in such performances. In [9] we used subjective evaluation to assess the success of a real-time beat tracker, and this is one of the directions for future work. In addition, preliminary analysis indicates that many of the errors made by the system are harmonics misidentified as notes. When the system is used to create a texture for live performance, these errors are less significant than random errors. This may go some way towards explaining the large difference between objective error and the high perceived success in the performance application, and it would be interesting to explore an error measure designed to distinguish between these different types of errors.

4. CONCLUSION

We have presented a system for real-time polyphonic pitch tracking for live performance applications, based on iterative subtraction of estimated partial amplitudes from the frequency representation. Our approach uses a fast deductive procedure based on the existence of partials for any given note. By continually updating estimates for the weights of the partials relative to the fundamental, and the median values for the amplitude and power of all notes, our algorithm is capable of performing moderately well on databases designed for offline multiple pitch-tracking algorithms. Although the algorithm does not perform as well in objective offline tests as algorithms designed for offline use, our approach does give subjectively high success in a live performance application. In future work, we will further explore this evaluation issue, including investigating the relative importance of various types of misidentification and the use of subjective testing methodologies.

5. ACKNOWLEDGEMENTS

This work was supported in part by the EPSRC project OMRAS2 (Online Music Recognition and Search), EP/E017614/1. AR is supported by a studentship from the EPSRC.

6. REFERENCES

[1] A. de Cheveigné and H. Kawahara, "Yin, a fundamental frequency estimator for speech and music," The Journal of the Acoustical Society of America, vol. 111, pp. 1917-1930, 2002.

[2] P. de la Cuadra, A. Master, and C. Sapp, "Efficient pitch detection techniques for interactive music," in Proc. International Computer Music Conference, 2001, pp. 403-406.
[3] J. S. Downie, "Music Information Retrieval Evaluation eXchange (MIREX)." [Online]. Available: http://www.music-ir.org/mirex2007

[4] T. Jehan and B. Schoner, "An audio-driven perceptually meaningful timbre synthesizer," in Proc. International Computer Music Conference, 2001, pp. 381-388.

[5] A. Klapuri, "Multiple fundamental frequency estimation by harmonicity and spectral smoothness," IEEE Trans. Speech and Audio Processing, vol. 11, pp. 804-816, 2003.

[6] M. Marolt, "A connectionist model of finding partial groups in music recordings with application to music transcription," in Proc. Int. Conf. on Adaptive and Natural Computing Algorithms, Ribeiro et al., Eds., 2005, pp. 494-497.

[7] A. Pertusa and J. M. Inesta, "Multiple fundamental frequency estimation using Gaussian smoothness," in Proc. International Conference on Acoustics, Speech and Signal Processing, 2008, pp. 105-108.

[8] M. Puckette, T. Apel, and D. Zicarelli, "Real-time audio analysis tools for Pd and MSP," in Proc. International Computer Music Conference, 1998, pp. 109-112.

[9] A. N. Robertson and M. D. Plumbley, "A Turing Test for B-Keeper: Evaluating a real-time beat tracker," in Proc. International Conference on New Interfaces for Musical Expression (NIME), 2008, pp. 319-324.