Prior Subspace Analysis for Drum Transcription

Audio Engineering Society Convention Paper
Presented at the 114th Convention
2003 March 22-25 Amsterdam, The Netherlands

This convention paper has been reproduced from the author's advance manuscript, without editing, corrections, or consideration by the Review Board. The AES takes no responsibility for the contents. Additional papers may be obtained by sending request and remittance to Audio Engineering Society, 60 East 42nd Street, New York, New York 10165-2520, USA; also see www.aes.org. All rights reserved. Reproduction of this paper, or any portion thereof, is not permitted without direct permission from the Journal of the Audio Engineering Society.

Derry FitzGerald¹, Bob Lawlor², and Eugene Coyle³
¹ Music Technology Center, Dublin Institute of Technology, Rathmines Rd., Dublin, Ireland
² Department of Electronic Engineering, National University of Ireland, Maynooth, Ireland
³ Department of Electronic Engineering, Dublin Institute of Technology, Kevin St., Dublin, Ireland

ABSTRACT

This paper introduces the technique of Prior Subspace Analysis (PSA) as an alternative to Independent Subspace Analysis (ISA) for cases where prior knowledge about the sources to be separated is available. Using prior knowledge overcomes some of the problems associated with ISA. In particular, it removes the need to estimate the amount of information required for separation, which makes drum transcription more robust. Prior knowledge is incorporated through a set of prior frequency subspaces that characterise features of the sources to be extracted. A simple drum transcription algorithm demonstrates the effectiveness and robustness of PSA.

1. INDEPENDENT SUBSPACE ANALYSIS

Independent Subspace Analysis (ISA) provides a means of attempting sound source separation from single channel mixtures [1]. It is based on redundancy reduction techniques and represents sound sources as low dimensional independent subspaces in the time-frequency plane.
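Both ISA and PSA start from a magnitude spectrogram of the single channel mixture. The NumPy sketch below shows that front end under stated assumptions. The function name `magnitude_spectrogram` is hypothetical. The Hann window, FFT size `n_fft=1024` and hop size `hop=256` are illustrative choices, because the paper does not state its analysis parameters.

```python
import numpy as np

def magnitude_spectrogram(x, n_fft=1024, hop=256):
    """Magnitude STFT of a mono signal, shape (n_fft // 2 + 1, n_frames).

    Rows are frequency bins and columns are time frames. This is the
    frequency x time layout of the spectrogram Y used by ISA and PSA.
    Window type, n_fft and hop are illustrative assumptions.
    """
    if len(x) < n_fft:
        raise ValueError("signal shorter than one analysis frame")
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    # One windowed frame per column.
    frames = np.stack([x[i * hop:i * hop + n_fft] * window for i in range(n_frames)], axis=1)
    return np.abs(np.fft.rfft(frames, axis=0))

sr = 8000
x = np.sin(2 * np.pi * 781.25 * np.arange(sr) / sr)  # 1 s tone centred on FFT bin 100
Y = magnitude_spectrogram(x)
print(Y.shape)  # (513, 28)
```

Each frame spans `hop / sr` seconds, so the hop size sets the time resolution of every later stage. This is the source of the onset-timing limits discussed in the results.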
To carry out ISA, the single channel mixture signal is converted to a time-frequency representation such as a spectrogram. The overall spectrogram $Y$ is then assumed to be the superposition of a number of unknown, statistically independent spectrograms $Y_j$:

$$Y = \sum_j Y_j \qquad (1)$$

Each independent spectrogram is further assumed to be the outer product of an invariant frequency basis function $f_j$ and a corresponding invariant amplitude basis function $t_j$:

$$Y_j = f_j t_j \qquad (2)$$

The frequency basis function describes the relative strengths of the frequencies present in the independent spectrogram. The amplitude basis function describes how the amplitude of that frequency basis function varies over time. Figure 1 illustrates this with the frequency and amplitude basis functions of a snare drum. Multiplied together, they produce a spectrogram that is a reasonable approximation to the original snare drum spectrogram.

Figure 1. Basis functions and spectrogram of a snare drum.

Summing the $Y_j$ yields:

$$Y = \sum_j f_j t_j \qquad (3)$$

The basis functions represent features of the individual sound sources, and each source is composed of a number of these basis functions. Together they form a low dimensional subspace that represents the individual sounds in the time-frequency plane.

Principal Component Analysis (PCA) provides a means of decomposing a spectrogram into a set of outer product basis functions, and also a means of redundancy reduction. PCA linearly transforms a set of correlated variables into a number of uncorrelated variables called principal components. The components are ordered by decreasing variance of the original data they contain, so redundancy can be reduced by discarding the low-variance components.

However, the principal components obtained from PCA are not statistically independent. Independent Component Analysis (ICA) is therefore used to obtain a set of independent basis functions from the principal components retained after the PCA step. ICA takes a set of observed signals that are linear mixtures of a number of independent non-Gaussian sources and attempts to separate them into signals that each contain one independent source [2].

The separation obtained by ICA is not perfect, so in some cases the independent basis functions will still contain artefacts related to other sound sources. These artefacts are, however, much reduced compared with those present before separation. Once obtained, the independent basis functions can be used to generate the independent spectrograms. Phase information for resynthesis can be taken from the original phase of the Short Time Fourier Transform used to generate the spectrogram. Alternatively, it can be estimated with a phase estimation method such as that described by Griffin and Lim [3].

ISA nevertheless has a number of problems.

Firstly, because the basis functions are invariant, no pitch changes are allowed in the sound source spectrograms. This is a problem for most musical instruments. For drums, however, the pitch does not change from one occurrence of a given drum to another, so the approximation is valid. This makes ISA-type approaches well suited to drum transcription.

Secondly, all ICA algorithms are indeterminate with regard to the ordering of the recovered components. After ISA has been completed, each source must therefore be identified by some other means, such as its frequency characteristics or amplitude envelope.

Thirdly, the quality of separation depends on the length of the input signal. For instance, a signal containing just one hi-hat and one snare played simultaneously will not separate correctly. Hi-hat/snare separation typically requires 2-4 events, depending on the frequency and amplitude characteristics of the drums used.

Finally, estimating how many components to retain from the PCA stage is a considerable difficulty. The number of components needed for correct separation varies with the frequency and amplitude characteristics of the source sounds. There is also a trade-off between the number of components retained and how recognisable the resulting basis functions are:

- Keeping a large number of components gives basis functions that each cover only small regions of the frequency spectrum.
- Keeping a small number gives basis functions that contain recognisable features of the source sounds, with support across the entire frequency spectrum.

Because of this trade-off, ISA works best on signals with fewer than five sources. It also means that the number of retained components must be chosen carefully to achieve optimal separation. Thresholding methods have not proved effective in finding the correct number of components, so an observer is needed to determine it. Methods such as sub-band ISA have been proposed to overcome this indeterminacy for drum transcription [4].

2. PRIOR SUBSPACE ANALYSIS

As noted above, the ISA method has a number of inherent problems, in particular estimating the correct number of components to retain from the PCA step. Methods such as sub-band ISA go some way towards overcoming these problems. A more efficient approach, however, is to use prior knowledge about the sources to be separated.

ISA grew out of attempts to create a signal representation that could characterise, and allow further manipulation of, individual everyday sounds such as a coin hitting the floor [5]. That method looked for invariants that characterised sounds. It performed PCA followed by ICA on the spectrogram of a sound, in a similar way to ISA. The technique was later used for generalised sound classification and incorporated into the MPEG-7 specification [6]. Applying the same technique to a mixture of sounds resulted in ISA.

The success of this method in generalised sound classification suggests that it can be adapted to create a set of prior subspaces that characterise a given sound source, such as a snare drum. These prior subspaces can then be used to carry out an initial analysis of a mixture signal.
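The ISA pipeline of Section 1 (PCA on the spectrogram, then ICA on the retained components) can be sketched in NumPy as below. This is an illustration under stated assumptions, not the authors' implementation:

- PCA is done via the SVD of the spectrogram, without mean removal.
- ICA is a compact symmetric FastICA with a tanh nonlinearity, applied to the amplitude (time) basis functions.
- The synthetic "kick" and "hi-hat" sources are invented for the demo.

All function names are hypothetical.

```python
import numpy as np

def fast_ica_unmixing(X, n_iter=200, seed=0):
    """Symmetric FastICA (tanh nonlinearity) on the rows of X.

    X has shape (n_components, n_samples). Rows are centred and whitened
    internally. The returned matrix is meant to be applied to the uncentred
    rows, so the recovered amplitude basis functions keep their offsets."""
    Xc = X - X.mean(axis=1, keepdims=True)
    d, E = np.linalg.eigh(Xc @ Xc.T / Xc.shape[1])
    whitening = E @ np.diag(1.0 / np.sqrt(d)) @ E.T
    Z = whitening @ Xc
    W = np.random.default_rng(seed).standard_normal((X.shape[0], X.shape[0]))
    for _ in range(n_iter):
        G = np.tanh(W @ Z)
        # Fixed-point update E{z g(w^T z)} - E{g'(w^T z)} w, for all rows at once.
        W = G @ Z.T / Z.shape[1] - np.diag((1.0 - G ** 2).mean(axis=1)) @ W
        # Symmetric decorrelation W <- (W W^T)^(-1/2) W, computed via the SVD.
        u, _, vt = np.linalg.svd(W)
        W = u @ vt
    return W @ whitening

def isa(Y, n_components):
    """ISA sketch on a (n_freq, n_frames) spectrogram Y.

    Returns F (n_freq, n_components) and T (n_components, n_frames) with
    Y approximately equal to F @ T, i.e. a sum of outer products f_j t_j."""
    _, _, Vt = np.linalg.svd(Y, full_matrices=False)  # PCA step via the SVD
    T_pca = Vt[:n_components]                         # retained components
    T = fast_ica_unmixing(T_pca) @ T_pca              # independent amplitude basis functions
    F = Y @ np.linalg.pinv(T)                         # matching frequency basis functions
    return F, T

# Synthetic two-source mixture (hypothetical kick-like and hi-hat-like sources).
rng = np.random.default_rng(1)
n_freq, n_frames = 64, 1000
decay = np.exp(-np.arange(8) / 2.0)
t_true = np.vstack([np.convolve((rng.random(n_frames) < p).astype(float), decay)[:n_frames]
                    for p in (0.03, 0.1)])
bins = np.arange(n_freq)
f_true = np.column_stack([np.exp(-bins / 4.0), 0.2 + 0.8 * bins / n_freq])
Y = f_true @ t_true
F, T = isa(Y, n_components=2)
```

Note that `n_components` is supplied by hand. That is exactly the difficulty with ISA described in Section 1: on real mixtures, the right value is not known in advance.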
This Prior Subspace Analysis (PSA) has the same underlying assumptions as ISA:

- the overall mixture spectrogram is the sum of a number of independent spectrograms; and
- each independent spectrogram can be represented as the outer product of a frequency basis function and an amplitude basis function.

PSA then assumes that there exist known prior frequency subspaces, or basis functions, $f_{pj}$ that are good initial approximations to the actual subspaces. Substituting these prior subspaces for the $f_j$ in equation (3) yields:

$$Y \approx \sum_j f_{pj} t_j \qquad (4)$$

Multiplying the overall spectrogram by the pseudoinverse of a prior frequency subspace therefore gives an estimate of the corresponding amplitude basis function, $\hat{t}_j = f_{pj}^{+} Y$. These estimated amplitude basis functions are not independent, so ICA is applied to them to obtain independent basis functions $\hat{t}_{ij}$. These in turn are used to obtain improved estimates of the frequency basis functions, $\hat{f}_{ij}$. The independent spectrograms can then be estimated from:

$$\hat{Y}_j = \hat{f}_{ij} \hat{t}_{ij} \qquad (5)$$

Resynthesis can then be carried out in the same way as for ISA.

Prior subspaces for use with PSA are obtained by analysing large numbers of samples of each sound source of interest. An ISA-type approach generates a prior frequency subspace for each sample of a particular source:

1. PCA is carried out on the spectrogram of the sample.
2. The first three principal components are retained and analysed with ICA to yield independent frequency subspaces.
3. The independent subspace with the largest projected variance is chosen as the prior frequency subspace for that sample.

K-means clustering is then carried out on the prior subspaces of a given sound source to yield a single prior frequency subspace that characterises that source. Figure 2 shows the prior subspaces obtained for snare, kick drum and hi-hat respectively.
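The PSA steps of equations (4) and (5) can be sketched the same way. This is an illustration, not the authors' code, and it rests on three assumptions:

- The prior subspaces are supplied as the columns of a matrix.
- The ICA step is a compact symmetric FastICA, repeated so that the block is self-contained.
- The improved frequency basis functions are computed with a pseudoinverse of the independent amplitude basis functions. This is one plausible reading of "improved estimates", not a method the paper specifies.

The priors in the demo are deliberately perturbed copies of the true spectra. Function names are hypothetical.

```python
import numpy as np

def fast_ica_unmixing(X, n_iter=200, seed=0):
    """Symmetric FastICA (tanh nonlinearity) on the rows of X; returns an unmixing matrix."""
    Xc = X - X.mean(axis=1, keepdims=True)
    d, E = np.linalg.eigh(Xc @ Xc.T / Xc.shape[1])
    whitening = E @ np.diag(1.0 / np.sqrt(d)) @ E.T
    Z = whitening @ Xc
    W = np.random.default_rng(seed).standard_normal((X.shape[0], X.shape[0]))
    for _ in range(n_iter):
        G = np.tanh(W @ Z)
        W = G @ Z.T / Z.shape[1] - np.diag((1.0 - G ** 2).mean(axis=1)) @ W
        u, _, vt = np.linalg.svd(W)  # symmetric decorrelation
        W = u @ vt
    return W @ whitening

def psa(Y, priors):
    """PSA sketch. Y: (n_freq, n_frames) spectrogram; priors: (n_freq, n_sources),
    one prior frequency subspace f_p per column."""
    # Amplitude estimates t_j = pinv(f_pj) @ Y, one per prior (no PCA step).
    T_hat = np.vstack([np.linalg.pinv(priors[:, [j]]) @ Y for j in range(priors.shape[1])])
    # ICA makes the estimated amplitude basis functions independent.
    T_ind = fast_ica_unmixing(T_hat) @ T_hat
    # Improved frequency basis functions (assumed pseudoinverse step).
    F_ind = Y @ np.linalg.pinv(T_ind)
    # Independent spectrograms as outer products, equation (5).
    sources = [np.outer(F_ind[:, j], T_ind[j]) for j in range(T_ind.shape[0])]
    return F_ind, T_ind, sources

# Synthetic demo: two drum-like sources and imperfect (noisy) priors.
rng = np.random.default_rng(2)
n_freq, n_frames = 64, 1000
decay = np.exp(-np.arange(8) / 2.0)
t_true = np.vstack([np.convolve((rng.random(n_frames) < p).astype(float), decay)[:n_frames]
                    for p in (0.03, 0.1)])
bins = np.arange(n_freq)
f_true = np.column_stack([np.exp(-bins / 4.0), 0.2 + 0.8 * bins / n_freq])
Y = f_true @ t_true
priors = np.abs(f_true + 0.05 * rng.standard_normal(f_true.shape))  # hypothetical priors
F, T, sources = psa(Y, priors)
```

There is no PCA step and no component count to choose: the number of sources is simply the number of priors supplied.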
The prior subspaces capture the general frequency characteristics of each drum type well:

- Kick drums have the vast majority of their energy in the lowest part of the spectrum, with very little energy outside this region.
- Snares also contain most of their energy in the lower part of the spectrum, but their main resonance lies at higher frequencies than that of kick drums. Snares also have some energy spread across a large portion of the spectrum.
- Hi-hats have energy across the entire range of the spectrum.
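The last stage of prior-subspace training described in Section 2 collapses the per-sample frequency subspaces of one drum type into a single prior with k-means. It might look like the sketch below. The paper does not state the number of clusters or how a single centroid is selected, so this sketch assumes:

- `k=2`;
- magnitude vectors normalised to unit length; and
- the centroid of the most populated cluster is taken as the prior.

The per-sample PCA/ICA stage is not repeated here, the demo data are synthetic, and the function names are hypothetical.

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means on the rows of X; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for c in range(k):
            members = X[labels == c]
            if len(members):  # leave a centroid alone if its cluster empties
                centroids[c] = members.mean(axis=0)
    return centroids, labels

def prior_subspace(sample_subspaces, k=2, seed=0):
    """Collapse per-sample frequency subspaces (one per row) into one prior.

    Assumptions (not from the paper): magnitudes fold the ICA sign ambiguity,
    rows are unit-normalised, and the centroid of the most populated cluster
    is returned."""
    X = np.abs(sample_subspaces)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    centroids, labels = kmeans(X, k, seed=seed)
    return centroids[np.bincount(labels, minlength=k).argmax()]

# Synthetic demo: 20 kick-like sample subspaces plus 4 atypical ones.
rng = np.random.default_rng(3)
bins = np.arange(64)
template = np.exp(-bins / 4.0)
samples = np.vstack([template + 0.02 * rng.standard_normal((20, 64)),
                     rng.random((4, 64))])
prior = prior_subspace(samples)
```

Taking the largest cluster, rather than the mean of all samples, keeps a few atypical samples from pulling the prior away from the typical spectrum.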

Figure 2. Prior subspaces for snare, kick drum and hi-hat (x-axis: Frequency (Hz)).

Figure 3 shows the independent amplitude basis functions obtained by carrying out PSA on a drum loop. The events are well separated: kick drums show up only as tiny peaks in the snare basis function, and vice versa.

Figure 3. Drum loop separation using PSA (panels: hi-hats, kick drum, snare; x-axis: Time (s)).

PSA needs prior knowledge of which sources are present, and a prior frequency subspace for each of them. In exchange, it looks only for as many sources as are known to be present, and it finds them efficiently. PSA is also faster than ISA or sub-band ISA because it does not need PCA for data reduction.

3. DRUM TRANSCRIPTION USING PSA

A simple drum transcription algorithm was implemented to demonstrate the utility of the PSA method. To allow direct comparison with sub-band ISA, PSA was tested on the same drum loops that were used to test sub-band ISA. The 5 drum loops contained hi-hats, snares and kick drums, chosen because they are the most commonly occurring drums in popular music. The patterns were examples of patterns commonly found in rock and pop music, together with variations on them. The drums were taken from various sample CDs and chosen to cover the wide variation in sound within each drum type. A range of tempos was used, as were different meters such as 4/4, 3/4 and 12/8. To make the tests as realistic as possible, the relative amplitudes of the drums were varied over a range of up to 24 dB.

To overcome the source ordering problem inherent in ICA, a number of assumptions were made to allow the sources to be identified. Firstly, it is assumed that hi-hats occur more frequently than the other drums present.
This assumption holds for most popular-music drum patterns that contain hi-hats. Secondly, it is assumed that the kick drum has a lower spectral centroid than the snare drum. This is a reasonable assumption because snare drums are perceptually brighter than kick drums, and the brightness of sounds has been found to correlate well with the spectral centroid [8].

Because separation at the ICA stage is imperfect, the recovered independent amplitude basis functions are normalised, and every peak above a set threshold is taken as an occurrence of the corresponding drum. The same threshold was used for all test signals in both PSA and sub-band ISA, so that the results can be compared directly. Onset times were calculated using a variation on the onset detection algorithm developed by Klapuri [7].

4. DRUM TRANSCRIPTION RESULTS

Table 1 summarises the transcription results obtained using PSA. Table 2 shows the results obtained using sub-band ISA for comparison. The percentage correct was calculated as:

$$\text{correct}\ (\%) = \frac{\text{total} - \text{missing} - \text{incorrect}}{\text{total}} \times 100$$

Type      Total   Missing   Incorrect   Correct (%)
Snare        21         0           2          90.5
Kick         33         0           0           100
Hats         79         2           6          89.9
Overall     133         2           8          92.5

Table 1: Drum transcription results using PSA
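The back end described in Section 3 works in three steps: normalise each amplitude basis function, take peaks above a threshold as drum events, and label the sources using the two assumptions about hi-hat event count and spectral centroid. It can be sketched as below. The following are assumptions, not the paper's settings:

- the threshold of 0.5;
- the simple local-maximum peak picker; and
- the function names.

The paper's onset times come from a variant of Klapuri's algorithm [7], which this sketch does not reproduce. The sketch also assumes exactly three separated sources.

```python
import numpy as np

def detect_events(t, threshold=0.5):
    """Frame indices of local peaks above `threshold` in a max-normalised
    amplitude basis function (abs() folds the ICA sign ambiguity)."""
    t = np.abs(t) / np.max(np.abs(t))
    centre = t[1:-1]
    is_peak = (centre > t[:-2]) & (centre >= t[2:]) & (centre > threshold)
    return np.flatnonzero(is_peak) + 1

def spectral_centroid(f):
    """Centroid of a frequency basis function, in FFT bins."""
    mag = np.abs(f)
    return float((np.arange(len(mag)) * mag).sum() / mag.sum())

def label_sources(F, T, threshold=0.5):
    """Assign 'hi-hat', 'kick' and 'snare' to three separated sources.

    F: (n_freq, 3) frequency basis functions; T: (3, n_frames) amplitude ones.
    Assumption 1: the hi-hat source has the most detected events.
    Assumption 2: the kick has a lower spectral centroid than the snare."""
    events = [detect_events(t, threshold) for t in T]
    hat = int(np.argmax([len(e) for e in events]))
    kick, snare = sorted((j for j in range(len(T)) if j != hat),
                         key=lambda j: spectral_centroid(F[:, j]))
    return {"hi-hat": events[hat], "kick": events[kick], "snare": events[snare]}

def envelope(event_frames, n_frames=64):
    """Synthetic decaying impulses at the given frames (demo only)."""
    t = np.zeros(n_frames)
    for e in event_frames:
        for k, a in enumerate((1.0, 0.5, 0.25)):
            if e + k < n_frames:
                t[e + k] += a
    return t

bins = np.arange(32)
F = np.column_stack([np.exp(-((bins - 12) / 3.0) ** 2),  # snare-like: mid-band bump
                     0.1 + bins / 32.0,                  # hi-hat-like: broadband, rising
                     np.exp(-bins / 2.0)])               # kick-like: lowest bins
T = np.vstack([envelope([18, 50]), envelope(range(2, 63, 4)), envelope([2, 34])])
labels = label_sources(F, T)
```

Raising `threshold` trades false detections for missed low-amplitude events. This is the same trade-off that the results discussion describes for hi-hat/snare separation.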

Type      Total   Missing   Incorrect   Correct (%)
Snare        21         0           2          90.5
Kick         33         0           0           100
Hats         79         6           6          84.8
Overall     133         6           8          89.5

Table 2: Drum transcription results using sub-band ISA

As the tables show, the results for snares and kicks are identical for the two methods. Note, however, that the extra snares detected using PSA were caused by amplitude modulation. With sub-band ISA, by contrast, they were kick drums identified as snares. Changing the PSA transcription algorithm to take amplitude modulation into account could eliminate these errors.

PSA correctly detected more of the hi-hats than sub-band ISA, which suggests that prior subspaces are a better means of detecting hi-hats than the blind separation used in sub-band ISA. In both methods, the undetected hats were separated correctly but fell below the detection threshold. Both PSA and sub-band ISA also identified a number of snares as hi-hats. This is because snare drums contain high frequency energy, which can make it difficult to separate snares from hats.

Setting the threshold involves a trade-off between detecting low amplitude occurrences of a drum and avoiding false detections caused by imperfect separation. For hi-hat/snare separation, too low a threshold results in snares also being detected as hi-hats, while too high a threshold increases the number of undetected hi-hats. The threshold used was found to give a good balance between the two.

The average onset detection error was the same for PSA and sub-band ISA. This error is mainly due to smearing of the onsets by the overlapping windows used to calculate the spectrogram, and to the limited time resolution of the STFT.

Drum transcription using PSA is considerably faster than using sub-band ISA. With both algorithms implemented in Matlab, PSA is approximately ten times faster than sub-band ISA.
This speed-up comes from eliminating the PCA step. In addition, sub-band ISA needs two passes through the data, so ISA is performed twice, whereas PSA needs only a single pass. PSA is therefore more efficient than sub-band ISA.

5. CONCLUSIONS

This paper has introduced Prior Subspace Analysis as a tool for drum transcription and sound source separation. PSA has proved to be a viable method for transcribing drums and overcomes some of the problems associated with ISA. It has also proved more effective and efficient at transcribing drums than sub-band ISA. Future work includes extending PSA to transcribe drums in the presence of pitched instruments and to handle larger numbers of drums.

6. REFERENCES

[1] Casey, M. A. & Westner, A., "Separation of Mixed Audio Sources by Independent Subspace Analysis", in Proc. of ICMC 2000, pp. 154-161, Berlin, Germany.
[2] Hyvärinen, A. & Oja, E., "Independent Component Analysis: Algorithms and Applications", Neural Networks, 13(4-5), pp. 411-430, 2000.
[3] Griffin, D. & Lim, J. S., "Signal Estimation from Modified Short-Time Fourier Transform", IEEE Transactions on Acoustics, Speech, and Signal Processing, Vol. ASSP-32, pp. 236-243, 1984.
[4] FitzGerald, D., Coyle, E. & Lawlor, B., "Sub-band Independent Subspace Analysis for Drum Transcription", Proceedings of the Digital Audio Effects Conference (DAFX02), Hamburg, pp. 65-69, 2002.
[5] Casey, M., "Auditory Group Theory: with Applications to Statistical Basis Methods for Structured Audio", Ph.D. Thesis, MIT Media Lab, February 1998.
[6] Casey, M., "Generalized Sound Classification and Similarity in MPEG-7", Organised Sound, 6:2, 2002.
[7] Klapuri, A., "Sound Onset Detection by Applying Psychoacoustic Knowledge", IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 1999.
[8] Gordon, J. & Grey, J. M., "Perceptual Effects of Spectral Modifications on Orchestral Instrument Tones", Computer Music Journal, Vol. 2, No. 1, pp. 24-31, 1978.