Experiments on musical instrument separation using multiple-cause models

J Klingseisen and M D Plumbley*
Department of Electronic Engineering, King's College London
* - Corresponding Author - mark.plumbley@kcl.ac.uk

Abstract

Over the last few years, interest has been growing in neural network circles in the separation of independent sources, using techniques such as blind source separation and independent component analysis (ICA). A related technique is the 'multiple-cause model' of Saund [Neural Computation, 7, 51-71, 1995]. In this technique, a neural network is trained to model the observed pattern as a composition of several underlying 'causes', in contrast to the more traditional 'winner-takes-all' neural networks, which can handle only a single 'cause'. In this paper, we report on experiments which use a simple multiple-cause model to separate different instruments and notes from audio spectral representations such as perceptually scaled power spectra and wavelets. We will consider the implications of this approach for audio music analysis and compression.

1. Introduction

Human perception of sounds is much more advanced than any technical system created so far. A human listener is able to distinguish different tones in a complex sound structure, such as a number of different human voices or musical instruments. In this paper, we report on an approach which tries to learn patterns of sounds. Different tones, such as a violin playing the note A, are presented in the form of an audio signal: the goal of the system is to recognize these tones without any prior knowledge.

A technique that has proved successful at recognizing patterns like these is that of neural networks. In particular, we shall use a neural network which uses feedback connections, the multiple-cause model (Saund, 1995). This model searches for representations of the underlying causes of the input signal by attempting to take account of all of these underlying causes. The audio signal is pre-processed using a suitable transform (e.g. FFT or wavelets) before being passed to the multiple-cause model (Figure 1).

Figure 1: Analysis of an audio signal: audio signal -> pre-processing (FFT, wavelets) -> neural network (multiple-cause model) -> analysis

Ultimately, this analysis of an audio signal into its underlying causes could be used for compression, since good compression would be achieved when the separate parts of the signal are separated and compressed individually, or for audio transcription, where the separate components could be recognized (perhaps by another neural network).

2. The Multiple Cause Model

Learning in neural networks can be either supervised or unsupervised (Haykin, 1994). For supervised learning, a teacher is provided which supplies a target output for the neural network, and the network is trained to minimize the error between the actual output and the target output. For unsupervised learning, no teacher is provided, and the neural network learns from the input data alone. A well-known example of a neural network that can be trained using unsupervised learning is the Kohonen self-organizing map (SOM) (Kohonen, 1990). Here, any input to the neural network causes one output unit (the 'winner') in the SOM to become active: this is called a winner-take-all network. The SOM is very useful for extracting a low-dimensional representation of a single cause within a high-dimensional data space.
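To make the distinction concrete, the following toy sketch (illustrative Python/NumPy, not from the paper; the prototypes and input are invented) contrasts the two coding styles on an input containing two simultaneous causes. A winner-take-all code can commit to only one prototype, while a multiple-cause code activates several units at once and reconstructs the mixture almost exactly:

```python
import numpy as np

# Two hypothetical "causes" (e.g. basic spectra), and an input mixing both.
prototypes = np.array([[1.0, 1.0, 0.0, 0.0],   # cause A
                       [0.0, 0.0, 1.0, 1.0]])  # cause B
x = prototypes[0] + prototypes[1]              # both causes active at once

# Winner-take-all (SOM-style): only the single best-matching unit is active.
winner = np.argmin(((prototypes - x) ** 2).sum(axis=1))
wta_reconstruction = prototypes[winner]

# Multiple-cause: every unit may be active, so the code can explain the mixture.
activations, *_ = np.linalg.lstsq(prototypes.T, x, rcond=None)
mc_reconstruction = activations @ prototypes

print("winner-take-all error:", ((x - wta_reconstruction) ** 2).sum())  # large
print("multiple-cause error: ", ((x - mc_reconstruction) ** 2).sum())   # ~0
```

It is this ability to explain an input as a combination of several simultaneously active causes that the multiple-cause model exploits.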

Saund (1995) introduced an alternative type of network, the multiple-cause model (Figure 2). This type of network is designed to cope with input data composed of several causes active at the same time. The network does not operate in a simple feed-forward manner: rather, the encoding layer and connections are adjusted until the encoding forms a good reconstruction of the observed data. For details of the algorithms used, see Klingseisen (1999).

Figure 2: Multiple-cause model architecture and algorithm. The data layer holds the observed data d_j (j = 1...J) and the predicted data r_j; the encoding layer holds the measurements m_k (k = 1...K), connected to the data layer by the weight matrix c. After initialisation, each new training pattern starts from random measurements m; the algorithm then repeatedly computes the prediction and the adjustments Δm and Δc, applies the updates m = m + p_m Δm and c = c + p_c Δc, and keeps m and c within the interval [0, 1].

A simple demonstration of the capability of the multiple-cause model is the 'Bars' problem. Here an image is composed of a white (0) background with horizontal and vertical black (1) bars, each of which may appear with some probability P. Where two bars overlap, the result is black (1): this is a non-linear, OR-type 'write-black' imaging model (Figure 3).

Figure 3: The Bars problem. (a) Data sets consisting of horizontal and vertical bars, shown for probabilities P = 0.1, 0.3, 0.5 and 0.7 of each basic pattern appearing. Because of this probability, it may happen that no black pixel appears, or that the whole pattern is black. (b) The basic patterns underlying the data in (a). (c) The translation of the black and white pixels into numbers (black represented by 1, white by 0), showing how each pattern is composed of its basic patterns. The mixing function is the logical OR.
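To make the imaging model concrete, a Bars data set can be generated in a few lines (an illustrative Python sketch, not code from the paper; the grid size and probability are arbitrary choices):

```python
import numpy as np

def make_bars_batch(n_patterns, size=6, p=0.3, rng=np.random.default_rng(0)):
    """Generate Bars-problem images: each of the 2*size horizontal and
    vertical bars appears independently with probability p, and overlapping
    bars combine with logical OR (the 'write-black' imaging model)."""
    data = np.zeros((n_patterns, size, size), dtype=np.uint8)
    for n in range(n_patterns):
        rows = rng.random(size) < p          # which horizontal bars are on
        cols = rng.random(size) < p          # which vertical bars are on
        data[n, rows, :] = 1                 # write-black: OR in the rows...
        data[n, :, cols] = 1                 # ...and OR in the columns
    return data.reshape(n_patterns, size * size)

patterns = make_bars_batch(1000)
print(patterns.shape)          # (1000, 36): flattened images for the model
print(patterns[0].reshape(6, 6))
```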

3. Dealing with non-binary data

In the Bars problem, the data consisted of binary basic patterns, present in binary amounts (1 or 0), combined with an OR mixing function to produce a binary image. However, we wish to use the multiple-cause model on basic patterns which have continuously variable (non-binary) levels (e.g. power spectra), and which are also present in continuous amounts (e.g. volumes).

Initially we introduced only one of these non-binary elements, constructing a dataset with grey-level basic patterns (in the interval [0, 1]), but with the images created with binary amounts (each pattern present with probability P), added with a linear mixing function. (This might correspond to the spectrum of a musical instrument which is either played at a constant volume, or not at all.) With a simple modification to the multiple-cause model architecture, separation of 8 patterns with overlap of up to about 50% is possible. Above this, some patterns come to be recognized as partial activations of other patterns, leaving a small error that is insufficient to drive the learning algorithm (Klingseisen, 1999).

In the multiple-cause model, since the reconstruction is the product of the measurements m and the weights c, there is some redundancy between these parameters. If the probability of occurrence P is above 0.5, we found it helpful to constrain the interval of the mean value of the weights c for each basic pattern. For example, if we know that the smallest basic pattern has mean 0.04 and the largest has mean 0.6, we could constrain the mean to lie in the interval [0.04, 0.3], and re-scale whenever the weights of a learned basic pattern fall outside this range. We found that this made learning more successful.

Typical learning time for these grey patterns, with 8 patterns of 16 components each, is 40 minutes to an error level of 0.1, using Matlab on a Pentium II (350 MHz). Varying the probability of appearance of each pattern had a significant effect on the learning time for each dataset. For a given error threshold, we observed that both high and low probabilities of occurrence gave rise to longer learning times than probabilities around 0.4-0.6.

So far the measurements m (volumes) have all been binary: for real music signals we would need to relax this constraint to allow continuously varying volumes. To this end, we released some of the measurements m (33%, 50% and 100%) so that they could vary between 0 and 1, and the c values were also constrained to lie in the interval [0, 1]. We found that the more measurements were allowed to vary, the longer the patterns took to learn, so that the fastest learning is obtained when as many measurements as possible are fixed at 0 or 1 only, and the probability of occurrence is in the approximate interval [0.3, 0.6]. For more details, see Klingseisen (1999).

4. Music-based signals

To work towards audio signals, we next applied the multiple-cause model to artificial spectra produced from synthesized sounds. In the first instance, we trained the model on spectra, downsampled to 30 bins, of a clarinet playing one of 8 notes (G3, C4, A3, D4, F4, G4, A4, E4). The training set was composed of linear additions of these basic spectra (NB: not mixed in the time domain), and about 8,000 presentations of training patterns were needed for successful learning. Separation of patterns composed from spectra of different instruments playing the same note also worked. For six instruments, about 6,000 presentations were needed, and about 30,000 presentations for 10 instruments (Figure 4).
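The construction of such training sets can be sketched as follows (illustrative Python, not the paper's Matlab code; the basic patterns here are random stand-ins for the grey-level patterns and downsampled basic spectra described above):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the basic patterns: K grey-level "spectra",
# each with J components in [0, 1] (the paper used e.g. 8 clarinet-note
# spectra downsampled to a small number of bins).
K, J, P = 8, 30, 0.4
basic_spectra = rng.random((K, J))

def make_training_pattern():
    """Binary amounts (each cause on with probability P), linear mixing."""
    amounts = (rng.random(K) < P).astype(float)   # m in {0, 1}
    return amounts @ basic_spectra, amounts       # linear addition of causes

pattern, amounts = make_training_pattern()
print("active causes:", amounts)
print("mixed pattern:", np.round(pattern, 2))
```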
Combining these two approaches, separation of patterns composed from the spectra of three instruments (Clarinet, Oboe, Trumpet) playing each of three notes was also possible: this was successful after about 8,000 presentations of the training patterns.

Figure 4: The basic patterns for 10 different instruments (Clarinet, Guitar, Harpsichord, Harp, Horn, Oboe, Piano, Trombone, Trumpet, Vibraphone)
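For reference, a minimal linear variant of the learning loop of Figure 2, as we read it, might look like the sketch below; the exact update rules, learning rates p_m and p_c, and stopping criteria are given in Klingseisen (1999), so everything here should be taken as an assumption-laden reconstruction. The prediction is the product of the measurements m and the weights c, and both parameter sets are adjusted by gradient steps on the squared reconstruction error and clipped to [0, 1], as in the figure:

```python
import numpy as np

def multiple_cause_linear(data, K, p_m=0.1, p_c=0.01, inner_steps=50,
                          rng=np.random.default_rng(0)):
    """Illustrative linear multiple-cause learner: prediction r = m @ c,
    gradient steps on the squared error ||d - m @ c||^2, and both
    parameter sets clipped to [0, 1] after each step."""
    N, J = data.shape
    c = rng.random((K, J))                       # weight matrix (basic patterns)
    for d in data:                               # one training pattern at a time
        m = rng.random(K)                        # initial random measurements
        for _ in range(inner_steps):             # "learn m" loop
            err = d - m @ c                      # prediction error
            m = np.clip(m + p_m * (err @ c.T), 0.0, 1.0)
        err = d - m @ c                          # "learn c" step
        c = np.clip(c + p_c * np.outer(m, err), 0.0, 1.0)
    return c

# Toy usage with random data; with structured mixtures the rows of c
# should come to resemble the underlying basic patterns.
data = np.random.default_rng(1).random((500, 16))
learned = multiple_cause_linear(data, K=4)
print(learned.shape)   # (4, 16)
```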

5. Real sounds

In the experiments reported so far, we analysed synthesized sounds, with artificial spectra composed by linear addition of underlying spectra. We also assumed that the spectra are essentially unchanged by volume or tone changes. For the sounds of real instruments (Iowa, 1999), the situation is more complicated (Figure 5): volume and tone can both change the spectra, so that simple shifting or scaling is not sufficient.

Figure 5: Fourier spectra of notes played on a clarinet. (a) Three notes: C4, F4 and B4. (b) The note C4 at different volumes (pp, mf, ff). The first column shows that the spectral envelope does not stay the same for different notes: the notes illustrated all belong to the fourth octave, and so lie quite close together, yet their Fourier spectra do not resemble each other. The second column shows that the spectrum also changes with volume: a tone played pianissimo (pp) is much purer than a loud note, so the higher harmonics do not appear with large amplitude, whereas for a note played fortissimo (ff) the higher harmonics are present with large amplitude. (The amplitude scale of the Fourier spectra shown is logarithmic, in order to illustrate the values of the higher harmonics better.)

For this experiment, we constructed an audio signal composed of the addition of pulses of notes played on different instruments (Figure 6).

Figure 6: Waveforms of nine notes played on different instruments, over a time axis of 0-9 seconds: (1) Clarinet C4, (2) Clarinet Gb4, (3) Clarinet Bb4, (4) Oboe Db4, (5) Oboe F4, (6) Bassoon D4, (7) Bassoon Ab4, (8) Flute Eb4, (9) Flute B4
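The pre-processing applied to this signal, described next, can be sketched as follows (illustrative Python; the 0.1 s frame length follows the paper, while the sample rate and the resulting spectrum size are assumptions). As in the paper, no window function is applied, so frames that straddle note onsets and offsets have disrupted spectra:

```python
import numpy as np

def frame_spectra(signal, fs=16000, frame_sec=0.1):
    """Cut the signal into non-overlapping 0.1 s frames (no window applied)
    and return one magnitude spectrum per frame as input to the model."""
    frame_len = int(fs * frame_sec)
    n_frames = len(signal) // frame_len
    frames = signal[:n_frames * frame_len].reshape(n_frames, frame_len)
    return np.abs(np.fft.rfft(frames, axis=1))   # one spectrum per 0.1 s

# Toy usage: two sinusoidal "notes" pulsed one after the other.
fs = 16000
t = np.arange(fs) / fs
sig = np.concatenate([np.sin(2 * np.pi * 440 * t), np.sin(2 * np.pi * 554 * t)])
spectra = frame_spectra(sig, fs)
print(spectra.shape)   # (20, 801): 20 frames of 0.1 s, 801 frequency bins
```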

The algorithm was adapted to use a large number of input units (a spectrum of 800 inputs), with 0.1 seconds of signal used to generate each spectrum. No attempt was made at windowing, so patterns at the onsets and offsets of notes will have disrupted spectra. The algorithm found the nine underlying patterns after about 30,000 presentations of the training patterns, equivalent to 4 hours' learning.

6. Conclusions

In this paper, we have reported on initial work investigating the use of Saund's multiple-cause model neural network applied to audio signal separation. While there is still a long way to go, our initial results are promising, and we feel that future work in this direction will be fruitful.

References

Haykin, S. (1994) Neural Networks: A Comprehensive Foundation. Macmillan College Publishing Company.

Iowa (1999) Instrument Samples of Real Sounds. University of Iowa Musical Instrument Samples internet page. URL http://theremin.music.uiowa.edu/~web/sound/

Klingseisen, J. (1999) Audio Analysis using Multiple Cause Neural Networks. Project Report, Audio & Music Technology Lab, Department of Electronic Engineering, King's College London.

Kohonen, T. (1990) The Self-Organizing Map. Proceedings of the IEEE, 78 (9), 1464-1480.

Saund, E. (1995) A Multiple Cause Mixture Model for Unsupervised Learning. Neural Computation, 7, 51-71.