Automatic Identification of Instrument Type in Music Signal using Wavelet and MFCC

Arijit Ghosal (CSE Dept., Institute of Technology and Marine Engg., 24 Parganas (South), West Bengal, India; ghosal.arijit@yahoo.com), Rudrasis Chakraborty (Indian Statistical Institute, Kolkata, India; rudrasischa@gmail.com), Bibhas Chandra Dhara (IT Dept., Jadavpur University, Kolkata, India; bcdhara@gmail.com), and Sanjoy Kumar Saha (CSE Dept., Jadavpur University, Kolkata, India; sks_ju@yahoo.co.in)

Abstract. In this work, we present a simple but novel scheme for automatic identification of the type of instrument present in a music signal. A hierarchical approach has been devised by observing the characteristics of different types of instruments, and suitable features are deployed at each stage. In the first stage, wavelet based features are used to divide the instruments into two groups, which are then classified using MFCC based features at the second stage. RANSAC has been used to classify the data. Thus, the proposed system, unlike previous ones, relies on features of very low dimension.

Key words: Audio Classification, Instrument Identification, MFCC, Music Retrieval, RANSAC, Wavelet Feature

1 Introduction

An efficient audio classification system can serve as the foundation for various applications like audio indexing, content based audio retrieval and music genre classification. In the context of a music retrieval system, at the first level it is necessary to classify signals as music without voice (i.e. instrumental) and music with voice (i.e. song). A few works [1, 2] have been reported in this direction. At subsequent stages, further sub-classification can be carried out. Automatic recognition of an instrument or its type (string, woodwind, keyboard, etc.) is an important issue in dealing with instrument signals. In several works like [3], isolated musical notes have been considered as input to the system. But in the signal arising out of a performance, the notes are not separated [4]. On the other hand, recognition of musical instruments in polyphonic, multi-instrumental music is a difficult challenge, and a successful recognition system for single instrument music may help in addressing that case [4].

A comprehensive study by Deng et al. [5] indicates that a wide variety of features and classification schemes have been reported by researchers. Mel Frequency Cepstral Coefficients (MFCC) have been used in different manners in a number of systems. Brown et al. [7] have relied on MFCC, spectral centroid and autocorrelation coefficients, and adopted Bayes decision rules for classification. Agostini et al. [8] have dealt with timbre classification based on spectral features. A set of 62-dimensional temporal, spectral, harmonic and perceptual features is used by Livshin et al. [4], and k-NN classification is tried for recognition. Kaminskyj et al. [9] have initially considered 710 features including MFCC, RMS energy, spectral centroid and amplitude envelope, with dimensionality reduced by PCA; finally, a k-NN classifier is used. The branch and bound search technique and non-negative matrix factorization have been tried by Benetos et al. [6] for feature selection and classification respectively.

Past study reveals that different schemes have been tried with various combinations of high-dimensional features and classification techniques. Still, the task of instrument recognition, even for a single instrument signal, remains an open issue. In this work, we have classified instrumental signals based on the instrument type.

The paper is organized as follows. The brief introduction is followed by the description of the proposed methodology in section 2. Experimental results and concluding remarks are put in sections 3 and 4 respectively.

2 Proposed Methodology

The proposed scheme deals with recorded signals of a single instrument. A hierarchical framework is presented to classify the signal according to the type of instrument used in generating the music. Instruments are commonly categorized as String (violin, guitar, etc.), Woodwind (flute, saxophone, etc.), Percussion (drum, tabla, etc.) and Keyboard (piano, organ, etc.). Sound produced by different instruments bears different acoustics. The sound envelope produced by a note may carry the signature of the instrument. The shape of the envelope is determined by the variation in the sound level of the note and represents its timbral characteristics. The envelope includes attack (the time from silence to peak), sustain (the time for which the amplitude level is maintained) and decay (the time over which the sound fades to silence). As it is difficult to isolate a note in a continuous signal, higher level features are designed that can exploit the underlying characteristics.

In our effort, we try to deal with a small number of features and rely on the basic perception of the sound generated by the instruments. As we perceive, sound generated by a string or percussion instrument persists longer, fading away gradually, whereas this is not so for a conventional keyboard or woodwind type instrument. This observation has motivated us to classify the signals into two groups at the first stage: the first group consists of keyboard and woodwind, whereas the second group consists of string and percussion. At the subsequent level, we take up the task of classifying within each group, as sketched below. In the following subsections we discuss the features and the classification technique that we have used.
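The overall flow of the hierarchy can be summarized in a short sketch. The feature callables and trained stage classifiers below are hypothetical placeholders assumed only for illustration; the actual feature and classifier choices are detailed in the following subsections.

```python
def identify_instrument_type(signal, wavelet_feats, mfcc_feats,
                             stage1_clf, key_wood_clf, string_perc_clf):
    """Two-stage hierarchy (illustrative sketch, not the authors' code).

    wavelet_feats / mfcc_feats: callables mapping a signal to a
    feature vector; the three classifiers are assumed pre-trained.
    Returns 'keyboard', 'woodwind', 'string' or 'percussion'.
    """
    # Stage 1: low-dimensional wavelet/STE features split the signal
    # into one of two broad groups.
    group = stage1_clf.predict([wavelet_feats(signal)])[0]

    # Stage 2: mean-MFCC features resolve the instrument type
    # within the group selected at stage 1.
    mfcc = mfcc_feats(signal)
    if group == 'keyboard_or_woodwind':
        return key_wood_clf.predict([mfcc])[0]     # 'keyboard' / 'woodwind'
    return string_perc_clf.predict([mfcc])[0]      # 'string' / 'percussion'
```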

2.1 Extraction of Features

At the first level of classification we have opted for features that can reflect the difference in the sound envelopes of the two groups of instruments discussed earlier. The envelope is essentially formed by the variation in amplitude, which has motivated us to look for wavelet based features. The audio signal is decomposed following the Haar wavelet transform [10], as shown schematically in Fig. 1: the signal is first decomposed into low (L1) and high (H1) bands, and the low band is successively decomposed, giving rise to L2 and H2, and so on. In general, the high band contains the variation details at each level.

Fig. 1. Schematic diagram for wavelet decomposition.

Wavelet decomposed signals (after the 3rd level of decomposition) for different types of instruments are shown in Fig. 2. The sustain phase of the audio envelope is mostly reflected in the low band. On the other hand, amplitude variation during attack and decay has substantial impact on the high bands: a fast attack or decay gives rise to a sharp change of amplitude in the high band, whereas a steady rise or fall is reflected as uniform amplitude there. As Fig. 2 shows, the high bands exhibit discriminating characteristics for the two groups of instruments. There is a uniform variation of amplitude for the first group, while for the second group a noticeable phase of uniform amplitude without much variation is observed.

Fig. 2. (a) Signals of keyboard, woodwind, string and percussion instruments; (b) corresponding signals after wavelet decomposition.
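As a concrete illustration, a 3-level Haar decomposition of this kind can be obtained with the PyWavelets library; `pywt.wavedec` returns the final approximation (low) band followed by the detail (high) bands from the deepest level up. This is a minimal sketch of the transform described above, not the authors' implementation.

```python
import numpy as np
import pywt  # PyWavelets

def haar_bands(signal, levels=3):
    """3-level Haar decomposition: returns the L3, H3, H2, H1 bands.

    pywt.wavedec yields [cA3, cD3, cD2, cD1]: the deepest
    approximation (low) band, then detail (high) bands from the
    deepest level up.
    """
    cA3, cD3, cD2, cD1 = pywt.wavedec(np.asarray(signal, dtype=float),
                                      'haar', level=levels)
    return {'L3': cA3, 'H3': cD3, 'H2': cD2, 'H1': cD1}
```

For a 22050 Hz signal, H1 nominally covers the upper half-band (roughly 5.5-11 kHz), H2 the next quarter-band, and so on, so rapid attack and decay transients land in the high bands, as the text describes.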

Features are computed based on the short time energy (STE) of the decomposed signals in the H1, H2, H3 and L3 bands. For each band, the signal is first divided into frames of 400 samples and the STE of each frame is computed. Finally, the average and standard deviation of the STE over all frames in each band are taken, forming an 8-dimensional feature vector.

For the second stage, in order to discriminate the instrument types within the groups, we have considered Mel Frequency Cepstral Coefficients (MFCC) as the features. As the instruments within each group differ in terms of the distribution of spectral power, we have considered 13-dimensional MFCC features. The steps for computing the features are the same as elaborated in [11]. Features are obtained by taking the average of the first 13 coefficients obtained for each frame. Plots of the MFCC coefficients for the different signals are shown in Fig. 3. They clearly show that the plots for a keyboard and a woodwind are quite distinctive, and the same is observed for a string and a percussion instrument.

Fig. 3. MFCC plots for the different instrument signals shown in Fig. 2: (a) Keyboard, (b) Woodwind, (c) String and (d) Percussion.

2.2 Classification

The variety in the audio database under consideration makes the task of classification critical. The variation even within a class poses problems for NN based classification. For SVM, the tuning of parameters for optimal performance is very critical. This has motivated us to look for a robust estimator capable of handling the diversity of the data while still modelling it satisfactorily. RANdom SAmple Consensus (RANSAC) appears as a suitable alternative to fulfill this requirement.

RANSAC [12] is an iterative method to estimate the parameters of a model from a set of data contaminated by a large number of outliers. The major strength of RANSAC over other estimators lies in the fact that the estimation is made based on the inliers, i.e. the points whose distribution can be explained by the model parameters. It can produce a reasonably good model provided the data set contains a sizable proportion of inliers; it may be noted that RANSAC can work satisfactorily even with outliers amounting to 50% of the entire data set [13]. Classically, RANSAC is an estimator for the parameters of a model from a given data set. In this work, the evolved model has been used for classification.
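To make the feature pipeline of Section 2.1 concrete, the two extractors might look as follows, reusing `haar_bands` from the earlier sketch. The 400-sample framing and the mean/standard-deviation statistics follow the text; `librosa.feature.mfcc` is used here as a stand-in for the MFCC computation of [11], whose exact framing parameters are an assumption.

```python
import numpy as np
import librosa

def wavelet_ste_features(signal, frame_len=400):
    """8-D stage-1 feature: mean and std of short time energy (STE)
    per band, over the H1, H2, H3 and L3 Haar bands."""
    bands = haar_bands(signal)  # from the earlier sketch
    feats = []
    for name in ('H1', 'H2', 'H3', 'L3'):
        band = bands[name]
        # Trim to a whole number of frames, then per-frame energy
        # (sum of squares; a 1/N normalization would only rescale).
        n = (len(band) // frame_len) * frame_len
        frames = band[:n].reshape(-1, frame_len)
        ste = np.sum(frames ** 2, axis=1)
        feats.extend([ste.mean(), ste.std()])
    return np.array(feats)                      # shape (8,)

def mean_mfcc(signal, sr=22050, n_mfcc=13):
    """13-D stage-2 feature: average of the first 13 MFCCs over frames."""
    m = librosa.feature.mfcc(y=np.asarray(signal, dtype=float),
                             sr=sr, n_mfcc=n_mfcc)   # (13, n_frames)
    return m.mean(axis=1)                       # shape (13,)
```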

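The paper does not spell out how the evolved RANSAC model yields a class decision. One plausible reading, sketched below purely under that assumption, is to fit one robust linear model per class with scikit-learn's `RANSACRegressor` (regressing one feature component on the others) and to assign a test sample to the class whose model explains it with the smallest residual. This is an illustrative guess, not the authors' method.

```python
import numpy as np
from sklearn.linear_model import RANSACRegressor

class RansacClassifier:
    """Hypothetical RANSAC-based classifier: one robust linear model
    per class; a sample is assigned to the class whose model yields
    the smallest residual. Expects numpy arrays X (n, d) and y (n,)."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.models_ = {}
        for c in self.classes_:
            Xc = X[y == c]
            # Robustly fit the last feature from the rest; RANSAC
            # estimates from inliers only, ignoring outliers.
            self.models_[c] = RANSACRegressor().fit(Xc[:, :-1], Xc[:, -1])
        return self

    def predict(self, X):
        # Residual of each class model on each sample -> (n, n_classes).
        res = np.stack([np.abs(self.models_[c].predict(X[:, :-1]) - X[:, -1])
                        for c in self.classes_], axis=1)
        return self.classes_[np.argmin(res, axis=1)]
```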
3 Experimental Result

In order to carry out the experiment, we have prepared a database consisting of 334 instrumental files. 86 files correspond to keyboard instruments like piano and organ, and 82 files to woodwind instruments like flute and saxophone. String instruments like guitar, violin and sitar contribute 84 files, and the remaining 82 files represent percussion instruments like drum and tabla. The database thus reflects appreciable variety in each class of instrument. Each file contains around 40-45 seconds of audio, sampled at 22050 Hz with 16-bit mono samples.

Table 1. Classification Accuracy (in %) at First Stage

  Classification Scheme   Keyboard and Woodwind   String and Percussion
  MLP                     81.95                   85.94
  SVM                     88.40                   85.54
  RANSAC                  91.50                   92.67

Table 2. Classification Accuracy (in %) at Second Stage

  Classification Scheme   Keyboard   Woodwind   String   Percussion
  MLP                     81.40      76.74      71.43    75.61
  SVM                     82.55      79.26      73.80    90.69
  RANSAC                  87.21      85.37      84.52    89.02

Tables 1 and 2 show the performance of the proposed scheme at the two stages. We have used 50% of the data of each class as the training set and the remaining data for testing; the experiment is then repeated with the training and test sets reversed, and the average accuracy is reported in the tables. For the MLP, there are 8 and 13 nodes in the input layer at the first and second stage respectively, and 2 output nodes; we have considered a single hidden layer with 6 and 8 internal nodes at the first and second stage respectively. For the SVM we have considered an RBF kernel. The tables clearly show that the performance of RANSAC based classification (with default parameter settings) is better.
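The two-fold protocol (train on one half, test on the other, then swap) can be reproduced with scikit-learn as sketched below. The MLP and SVM settings mirror those stated above; the stratified 50/50 split with a fixed seed is an assumption for reproducibility, and `X1`, `y1` stand for the stage-1 feature matrix and group labels.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

def two_fold_accuracy(X, y, make_clf, seed=0):
    """Train on one half, test on the other, swap, return mean accuracy."""
    Xa, Xb, ya, yb = train_test_split(X, y, test_size=0.5,
                                      stratify=y, random_state=seed)
    accs = []
    for Xtr, ytr, Xte, yte in [(Xa, ya, Xb, yb), (Xb, yb, Xa, ya)]:
        clf = make_clf().fit(Xtr, ytr)
        accs.append(clf.score(Xte, yte))
    return float(np.mean(accs))

# Stage-1 baselines on the 8-D wavelet/STE features (X1, y1 assumed):
# two_fold_accuracy(X1, y1, lambda: MLPClassifier(hidden_layer_sizes=(6,)))
# two_fold_accuracy(X1, y1, lambda: SVC(kernel='rbf'))
```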

4 Conclusion

We have presented a hierarchical scheme for automatic identification of the instrument type in a music signal. Unlike other systems, the proposed system works with features which are simple and of very low dimension. Wavelet based features categorize the instruments into two groups and, finally, MFCC based features classify the individual instrument classes in each group. RANSAC has been utilized as a classification tool which is quite robust in handling the variety of the data. Experimental results also indicate the effectiveness of this simple but novel scheme.

Acknowledgment

The work is partially supported by the facilities created under the DST-PURSE program in the Computer Science and Engineering Department of Jadavpur University.

References

1. Zhang, T., Kuo, C.C.J.: Content-based Audio Classification and Retrieval for Audiovisual Data Parsing. Kluwer Academic (2001)
2. Ghosal, A., Chakraborty, R., Dhara, B.C., Saha, S.K.: Instrumental/song classification of music signal using RANSAC. In: 3rd Intl. Conf. on Electronic Computer Technology, India, IEEE CS Press (2011)
3. Herrera, P., Peeters, G., Dubnov, S.: Automatic classification of musical instrument sounds. Journal of New Music Research (2000)
4. Livshin, A.A., Rodet, X.: Musical instrument identification in continuous recordings. In: Intl. Conf. on Digital Audio Effects (2004) 222-226
5. Deng, J.D., Simmermacher, C., Cranefield, S.: A study on feature analysis for musical instrument classification. IEEE Trans. on Systems, Man and Cybernetics, Part B 38 (2008) 429-438
6. Benetos, E., Kotti, M., Kotropoulos, C.: Musical instrument classification using non-negative matrix factorization algorithms and subset feature selection. In: ICASSP (2006)
7. Brown, J.C., Houix, O., McAdams, S.: Feature dependence in the automatic identification of musical woodwind instruments. Journal of the Acoustical Society of America 109 (2001) 1064-1072
8. Agostini, G., Longari, M., Pollastri, E.: Musical instrument timbres classification with spectral features. EURASIP Journal on Applied Signal Processing (2003) 5-14
9. Kaminskyj, I., Czaszejko, T.: Automatic recognition of isolated monophonic musical instrument sounds using kNNC. J. Intell. Inf. Syst. 24 (2005) 199-221
10. Gonzalez, R.C., Woods, R.E.: Digital Image Processing (3rd Edition). Prentice-Hall Inc., NJ, USA (2006)
11. Rabiner, L.R., Juang, B.H.: Fundamentals of Speech Recognition. Prentice-Hall (1993)
12. Fischler, M.A., Bolles, R.C.: Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM 24 (1981) 381-395
13. Zuliani, M., Kenney, C.S., Manjunath, B.S.: The multiRANSAC algorithm and its application to detect planar homographies. In: IEEE Intl. Conf. on Image Processing (2005)