The Use of the Attack Transient Envelope in Instrument Recognition

Benedict Tan & Deep Sen
School of Electrical Engineering & Telecommunications
University of New South Wales, Sydney, Australia

Proceedings of the 11th Australian International Conference on Speech Science & Technology, ed. Paul Warren & Catherine I. Watson. ISBN 0 9581946 2 9. University of Auckland, New Zealand. December 6-8, 2006. Copyright, Australian Speech Science & Technology Association Inc.

Abstract

The transient components of instrument signals are known to contain a large amount of information about the instrument. Much of this information remains unexplored, yet the attack is known to be essential for recognising the instrument. This paper investigates the attack transient and identifies one of the features that enable recognition of the instrument. The envelope of the attack transient is used as that feature, and experiments are carried out that show the potential of the envelope.

1. Introduction

There are hundreds of different types of musical instruments in the world today, and each has its own characteristics that distinguish it from the others. Instrument recognition has interested many researchers and engineers, and many have succeeded in defining features that differentiate one instrument from another. Although many such features have been found, a great deal of information remains to be discovered.

From a temporal perspective, an instrument signal has four main sections: the attack, decay, sustain and release. Work on instrument recognition has focused primarily on the decay and sustain portions of the signal, which are also known as the steady-state. The signal is stable or pseudo-stable in these regions, so the steady-state is preferred over other sections because established steady-state analysis techniques make the data easier to analyse. Brown (2001), for example, studied the steady-state using features such as cepstral coefficients, together with various statistical methods, to distinguish instruments.

The other components of the instrument signal can be classified as transient. A number of studies have shown that the transient sections contain a large amount of information that enables people to recognise the instrument. The onset of the signal plays a big part in characterising the instrument, and with this knowledge it is possible to use the attack transient alone to identify an instrument. Despite this, the transient remains relatively unexplored. It is non-stationary and its boundaries are not exactly defined, which makes it difficult to analyse. Keeler (1972) successfully differentiated between various wind instruments using temporal features such as the transient duration, delay, overshoot and the instability of the signal, and was able to distinguish between the different families of wind instruments. Keeler's paper showed that instruments could be categorised and identified even with simple perceptual features.

The purpose of this paper is to find a distinguishing feature in the attack transient that enables one instrument to be recognised against another. Using this feature, systematic tests are performed to show that it is feasible as an attribute in instrument recognition.

In this paper the attack transient is defined as the non-periodic segments of the attack. Following from this definition of a transient, the level of harmonic content was measured and used as a gauge of the "transientness" of the signal. This gave a systematic way of obtaining the attack transient while allowing the flexibility to experiment with different thresholds on the level of harmonic content in the data. Using these thresholds, the attack transient could be extracted from the signal and used in the following experiments.

The preliminary investigations involved analysing several attack transients from a few instruments, and revealed certain perceptual attributes that recurred in the attack transients. Figure 1 shows one of the attack transients from a violin, with the most distinguishing features circled. Figure 2 also shows an attack transient, with the most noticeable characteristics marked. Comparing the two figures, they are not completely identical, but there is a large similarity between the two samples.

Figure 1: Attack transient of note E4 of violin
Figure 2: Attack transient of note G4 of violin

Further investigation showed that the recurring features could be pointed out in the majority of the attack transients, which led to the possibility of using the envelope as a form of identification.

Using the attack transient envelope therefore became a feasible approach to identification, but before any tests could be conducted a few problems with the envelope as a feature had to be dealt with. The attack transients had different lengths: although most of the features were present in each attack transient, their positions in time varied, so point-to-point pattern matching was not feasible. A method was found that alleviates this problem and enables pattern matching between the transient signals.

2. Dynamic Time Warping

The technique chosen to analyse the instrument signals was dynamic time warping (DTW). The method was primarily developed and used as a speech processing technique for pattern matching speech samples; it allows two speech samples that have time discrepancies to be matched correctly to one another. Using this technique it is possible to create a template of the attack transient and pattern match it against instrument samples to try to identify the correct instrument.

Fig. 3 compares Euclidean pattern matching with the DTW method. Euclidean pattern matching is a point-to-point comparison of two signals. As Fig. 3 shows, DTW is able to align neighbouring points in the sample so that the best match between the sample and template is obtained. Through this technique the warped signal can obtain a correct match to the template.

Figure 3: Comparison of Euclidean and DTW pattern matching

The DTW process, as used in the tests, is as follows. Start with a template $P$ and a sample signal $Q$ to be matched, of length $m$ and $n$ respectively:

$$P = p_1, p_2, p_3, \ldots, p_m$$
$$Q = q_1, q_2, q_3, \ldots, q_n$$

An $m \times n$ matrix $d$ is then formed, in which the value $d(i, j)$ is the distance between the elements $p_i$ and $q_j$, giving the matrix $d$ with values formed by Eqn 1:

$$d(i, j) = (p_i - q_j)^2, \quad 1 \le i \le m, \ 1 \le j \le n \qquad (1)$$

From the local distance matrix $d$ the global distance matrix $D$ can then be computed. Each cell $D(i, j)$ is calculated as the sum of the local distance $d(i, j)$ and the smallest global distance among the neighbouring cells of $D(i, j)$. The neighbouring cells are chosen by a stepping pattern, which is discussed later in this section. The result is the matrix $D$, whose values are the minimised global distances of the sequences. A path can then be chosen by the stepping pattern, which results in the optimum mapping of one signal to the other. The best possible path is the straight diagonal between the corners of the matrix, which would mean that the two signals are exactly the same; the more the path deviates from the diagonal, the more distortion and warping is needed to manipulate the sample to reflect the template. The values of matrix $D$ are calculated by Eqn 2:

$$D(i, j) = d(i, j) + \min[D(i-1, j),\ D(i-1, j-1),\ D(i, j-1)] \qquad (2)$$

The warping path that maps the sample to the template is the path that results in the least distortion. Starting from $D(1, 1)$, the next element in the warping path is the neighbour with the smallest value. The warping path has a minimum length of $\max(m, n)$ and a maximum length of $m + n - 1$.
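The local and global distance computations of Eqn 1 and Eqn 2 can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function name `dtw_distance` is hypothetical, the squared difference is assumed as the local distance, and the warping path is recovered by backtracking from $D(m, n)$ towards $D(1, 1)$, which is a common implementation choice and not something the paper specifies.

```python
def dtw_distance(template, sample):
    """Return (D(m, n), warping_path) for a template P and a sample Q.

    Assumptions (not taken from the paper): squared difference as the
    local distance, and backtracking from (m, n) to recover the path.
    Path entries are 1-based (i, j) pairs, matching the text's notation.
    """
    m, n = len(template), len(sample)
    inf = float("inf")

    # Global distance matrix with an extra border row and column of
    # infinities, so the first row and column need no special cases.
    D = [[inf] * (n + 1) for _ in range(m + 1)]
    D[0][0] = 0.0

    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d = (template[i - 1] - sample[j - 1]) ** 2  # local distance, Eqn 1
            # Stepping pattern of Eqn 2: the three neighbouring cells.
            D[i][j] = d + min(D[i - 1][j], D[i - 1][j - 1], D[i][j - 1])

    # Backtrack from (m, n) to (1, 1), always moving to the neighbour
    # with the smallest global distance.
    i, j = m, n
    path = [(i, j)]
    while (i, j) != (1, 1):
        _, i, j = min(
            (D[i - 1][j - 1], i - 1, j - 1),
            (D[i - 1][j], i - 1, j),
            (D[i][j - 1], i, j - 1),
        )
        path.append((i, j))
    path.reverse()
    return D[m][n], path


if __name__ == "__main__":
    # A time-stretched copy of the template: the lengths differ, so a
    # point-to-point comparison is not possible, but DTW can align them.
    template = [0, 1, 2, 1, 0]
    sample = [0, 0, 1, 2, 2, 1, 0]
    distance, path = dtw_distance(template, sample)
    print(distance, path)
```

For two identical sequences such as `[1, 2, 3]` the function returns a distance of `0.0` and the diagonal path `[(1, 1), (2, 2), (3, 3)]`, which corresponds to the "straight diagonal" case described in the text.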
The final value $D(m, n)$ is the overall measure of the distortion between the signals: the smaller the value, the closer the match between the template and sample and the less distortion there is in the mapping of the signals. The higher the value, the more warping is needed to match the signals.

The warping path is subject to the following constraints:

Boundary conditions: the warping path must start at $D(1, 1)$ and end at $D(m, n)$.
Continuity: the warping path can only advance by 1 point at a time; this makes sure that all points in the signal are used in the mapping.
Monotonicity: the warping path cannot go backwards in time; this ensures that a point that has previously been mapped will not be mapped again.

There are various equations available to replace Eqn 2, each with different advantages and disadvantages. The stepping patterns chosen in this paper were the original stepping pattern shown in Eqn 2 and the Itakura algorithm in Eqn 3:

$$D(i, j) = d(i, j) + \min[D(i-1, j),\ D(i-1, j-1),\ D(i-1, j-2)] \qquad (3)$$

The advantage that Eqn 3 has over Eqn 2 is that every point on the template is mapped, which alleviates the problem of monotonicity. It also extends the range of neighbouring cells, giving the stepping pattern a larger range over which to compare distances. The difference between the stepping patterns can be seen in the results in the following section. Each stepping pattern has its advantages and disadvantages, and no single stepping pattern is best overall. DTW involves a number of parameters, but it is a powerful yet quite simple technique for pattern matching and, as the experiments show, has proven useful in the identification of instruments.

3. Results and Discussion

The tests were conducted with two instruments, the violin and the cello, both from the family of string instruments.
In total four tests were conducted, using 94 cello samples and 68 violin samples covering the third, fourth and fifth octaves. The first test used the stepping pattern described by Eqn 2, and the second test used the same stepping pattern with a different template for the cello. The third and fourth tests were the same as the previous tests but with a different stepping pattern. The results of each test are broken down by octave and also given as a whole. Each entry gives the number of correct identifications out of the number of samples for that instrument, together with the corresponding percentage; the totals for each instrument are also included.

Table 1: Results of experiment 1

Octave   Cello            Violin
3        7/37     19%     5/5      100%
4        13/42    31%     23/25    92%
5        4/15     27%     38/38    100%
Total    24/94    26%     66/68    97%

Table 1 shows the results of the first test. The recognition rate for the violin is excellent, but the results for the cello are poor. The first experiment indicates that DTW is a suitable analysis technique for identification. However, since the total recognition rate for the violin is 97% while that for the cello is only 26%, there appears to be a bias towards the violin at this stage. Changing the parameters leaves considerable room for improvement, as the following results show.

In the next experiment the template for the cello was changed. Table 2 shows the results after the change: the identification rate for the cello improved by 11 percentage points over the first test. This shows that the recognition rate depends on the template chosen to represent the instrument. Conversely, an inadequate template will reduce the recognition rate. A good template is one that contains the various characteristics of the attack transients of that instrument. Finding the most suitable template can therefore require rigorous testing, and there may be more than one suitable candidate.
Table 2: Results of experiment 2

Octave   Cello            Violin
3        10/37    27%     5/5      100%
4        21/42    50%     23/25    92%
5        4/15     27%     38/38    100%
Total    35/94    37%     66/68    97%

The next two experiments were executed with a changed stepping pattern, which improved the cello results further. The stepping pattern used was that of Eqn 3, commonly described as the Itakura algorithm. This stepping pattern has the advantage that the mapping of the template to the sample keeps moving forward and a point on the template can only be mapped once. This is desirable because mapping each template point only once gives a more accurate match for the instrument.

Table 3 presents the results of the test performed with the first template from the first test and the Itakura stepping pattern of Eqn 3. The cello recognition rate improved again, by a further 21 percentage points, although the recognition rate for the violin decreased dramatically. These results show how strongly the stepping pattern influences the results and the ability to match the instruments. One problem is that each stepping pattern has its advantages and disadvantages and no single best pattern is available. An important point is that the stepping pattern chosen cannot be too stringent or too lenient. A stringent stepping pattern results in the template matching only those samples that are almost identical to it, while a lenient stepping pattern allows all samples to match the template. Finding the right middle ground is crucial to obtaining correct results, although a stringent stepping pattern is favoured over a lenient one. For the experiments in this paper, the Itakura algorithm and the original stepping pattern were suitable for obtaining results that show the attack envelope can be used as a feature for instrument recognition.
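The effect of the stricter stepping pattern can be illustrated with a short Python sketch. It assumes that Eqn 3 takes the form $D(i, j) = d(i, j) + \min[D(i-1, j), D(i-1, j-1), D(i-1, j-2)]$, so that the template index $i$ advances by exactly one at every step, and it again assumes a squared-difference local distance. The function name `dtw_distance_eqn3` is hypothetical, and the full Itakura constraint as usually described in the speech literature has further restrictions that are not modelled here.

```python
def dtw_distance_eqn3(template, sample):
    """Global distance D(m, n) under the assumed Eqn 3 stepping pattern.

    Every step advances the template index by one, so each template point
    is mapped exactly once; the sample index may advance by 0, 1 or 2.
    Returns infinity when no admissible path reaches (m, n).
    """
    m, n = len(template), len(sample)
    inf = float("inf")

    D = [[inf] * n for _ in range(m)]
    # Boundary condition: the path must start at the first point of both signals.
    D[0][0] = (template[0] - sample[0]) ** 2

    for i in range(1, m):
        for j in range(n):
            d = (template[i] - sample[j]) ** 2  # squared-difference local distance
            best = D[i - 1][j]
            if j >= 1:
                best = min(best, D[i - 1][j - 1])
            if j >= 2:
                best = min(best, D[i - 1][j - 2])
            D[i][j] = d + best

    return D[m - 1][n - 1]


if __name__ == "__main__":
    template = [0, 1, 2, 1, 0]
    stretched = [0, 0, 1, 2, 2, 1, 0]
    print(dtw_distance_eqn3(template, stretched))

    # A sample more than about twice as long as the template cannot be
    # reached at all, so the distance stays infinite.
    print(dtw_distance_eqn3([0, 1], [0, 0, 0, 1]))
```

Under these assumptions the pattern is more stringent than Eqn 2: a sample longer than $2m - 1$ points has no admissible path and is rejected outright, which is one way a change of stepping pattern can alter which samples match a given template.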

Table 3: Results of experiment 3

Octave   Cello            Violin
3        17/37    46%     5/5      100%
4        26/42    62%     16/25    64%
5        12/15    80%     10/38    26%
Total    55/94    58%     31/68    46%

Table 4: Results of experiment 4

Octave   Cello            Violin
3        28/37    76%     4/5      80%
4        33/42    79%     16/25    64%
5        15/15    100%    24/38    63%
Total    76/94    80%     44/68    65%

The final test was carried out with the improved template and the Itakura stepping pattern; the outcome is shown in Table 4. This test gave the largest overall improvement, with recognition rates rising by 22 and 19 percentage points for the cello and violin respectively relative to the third test. Both recognition rates have risen to more acceptable percentages, showing that the envelope of the attack can be used as an identifying feature for the instrument. With the right set of parameters and templates, the attack envelope can be a powerful identification feature. Further work in this area will involve adding more instruments to the tests and experimenting with more stepping patterns to increase the recognition percentages. There are plenty of avenues from this point; further research will hopefully improve the results and provide a more robust means of instrument identification.

4. Conclusion

The results discussed in this paper indicate that we have identified at least one feature in the attack transient that can be used to distinguish between musical instruments: among the features investigated, it is the envelope of the attack transient. More improvement would of course be possible by looking beyond the attack transient alone. Using the dynamic time warping technique, the envelope can be pattern matched to identify the instrument, and the experimental results show that at least two instruments can be identified using this method.

5. References

Brown, J.C. (2001). Feature dependence in the automatic identification of musical woodwind instruments. J. Acoust. Soc. Am., Vol. 109, No. 3.
Keeler, J.S. (1972). The Attack of Some Organ Pipes. IEEE Trans. on Audio and Electroacoustics, Vol. 20, No. 5, pp. 378-391.
Keogh, E.J. and Pazzani, M.J. (2001). Derivative Dynamic Time Warping. Department of Information and Computer Science, University of California, Irvine, California, USA.
Tan, B. (2006). The investigation of transient components in single instrument music signals. Thesis, School of Electrical Engineering and Telecommunications, UNSW.
Saldanha, E.L. and Corso, J.F. (1964). Timbre cues and the identification of instruments. Journal of the Acoustical Society of America.
Wrigley, S.N. Speech Recognition by Dynamic Time Warping. http://www.dcs.shef.ac.uk/~stu/com326/index.html