Semi-supervised Musical Instrument Recognition
Master's Thesis Presentation
Aleksandr Diment, Tampere University of Technology, Finland
Supervisors: Adj. Prof. Tuomas Virtanen, MSc Toni Heittola
17 May 2013
Outline
Semi-supervised instrument recognition, A. Diment, 17.5.2013
Musical instrument recognition
Semi-supervised learning
Objectives and main results
Introduction: Musical instrument recognition
Music information retrieval: obtaining information of various kinds from music, e.g. situationally tailored playlisting, personalised radio.
Instrument recognition enables:
automatic music database annotation;
automatic music transcription;
musical genre classification.
Introduction: Semi-supervised learning
Traditional learning paradigms:
unsupervised: no additional knowledge about the data samples;
supervised: a label is assigned to each data sample.
Supervised learning requires large amounts of annotated training data.
SSL: only part of the training data needs to be annotated.
SSL has not yet been applied to instrument recognition.
Introduction: Objectives and main results
Objectives:
studying techniques for musical instrument recognition;
studying various SSL schemes.
Main results:
a pattern recognition system for instrument recognition with SSL was developed;
two SSL algorithms were implemented.
System description
Building blocks
Feature extraction
Training algorithms
Labelled data weighting
One-class-at-a-time training
System description: Building blocks
Figure 1: A block diagram of a typical pattern classification system (input data, preprocessing, feature extraction into training and testing sets, training of models, classification, decision).
System description: Feature extraction
The relevant information lies in the timbre, unique to a particular group of instruments: the set of tonal qualities which characterise a particular musical sound, i.e. everything except pitch, loudness and duration.
System description: Feature extraction
Figure 2: A block diagram of MFCC calculation: input signal, pre-emphasis, frame blocking and windowing, |FFT|², log, DCT (static coefficients), d/dt (delta coefficients).
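The MFCC pipeline in the diagram can be sketched for a single audio frame as follows. This is a minimal illustrative sketch, not the thesis implementation: the mel filterbank stage (standard in MFCC computation, though not spelled out in the diagram text) is included explicitly, and the pre-emphasis coefficient, number of filters, and number of coefficients are assumed defaults.

```python
import numpy as np

def mfcc_frame(frame, sr=44100, n_mels=26, n_ceps=13):
    """MFCCs for one frame: pre-emphasis -> window -> |FFT|^2
    -> mel filterbank -> log -> DCT. Illustrative sketch only."""
    # Pre-emphasis: first-order high-pass to boost high frequencies
    frame = np.append(frame[0], frame[1:] - 0.97 * frame[:-1])
    # Hamming window and power spectrum
    spectrum = np.abs(np.fft.rfft(frame * np.hamming(len(frame)))) ** 2
    # Triangular filters spaced evenly on the mel scale
    mel_max = 2595 * np.log10(1 + (sr / 2) / 700)
    mel_pts = 700 * (10 ** (np.linspace(0, mel_max, n_mels + 2) / 2595) - 1)
    bins = np.floor((len(frame) + 1) * mel_pts / sr).astype(int)
    fbank = np.zeros((n_mels, len(spectrum)))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        fbank[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    # Log filterbank energies, then DCT-II to decorrelate them
    log_energies = np.log(fbank @ spectrum + 1e-10)
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), 2 * n + 1) / (2 * n_mels))
    return dct @ log_energies
```

Delta coefficients would then be obtained by differentiating the static coefficients across consecutive frames.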
System description: Training algorithms
The EM algorithm: MLEs of the model parameters when treating the observations as incomplete data. Used when obtaining direct equations for the model parameters is impossible.
Extending EM for SSL: the iterative and incremental EM-based algorithms¹.
¹ P. J. Moreno and S. Agarwal, "An experimental study of EM-based algorithms for semi-supervised learning in audio classification," in Proc. of the ICML-2003 Workshop on the Continuum from Labeled to Unlabeled Data, 2003.
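The iterative EM-based scheme can be sketched as follows: class models are first trained on the labelled data only, all unlabelled samples are then classified with the current models, and the models are retrained on the union of labelled and self-labelled data; this repeats for a fixed number of iterations. A sketch under simplifying assumptions: single Gaussians with diagonal covariance stand in for the GMMs used in the thesis, and `n_iter` is an illustrative parameter.

```python
import numpy as np

def iterative_ssl(X_lab, y_lab, X_unlab, n_classes, n_iter=3):
    """Iterative EM-based SSL sketch (after Moreno & Agarwal)."""
    def fit(X, y):
        # Per-class mean and diagonal variance
        return [(X[y == c].mean(0), X[y == c].var(0) + 1e-6)
                for c in range(n_classes)]

    def log_lik(X, models):
        # Log-likelihood of every sample under every class model
        return np.stack([
            -0.5 * (((X - mu) ** 2 / var) + np.log(2 * np.pi * var)).sum(1)
            for mu, var in models], axis=1)

    models = fit(X_lab, y_lab)                      # initial supervised models
    for _ in range(n_iter):
        y_hat = log_lik(X_unlab, models).argmax(1)  # classify unlabelled data
        models = fit(np.vstack([X_lab, X_unlab]),   # retrain on the union
                     np.concatenate([y_lab, y_hat]))
    return models, y_hat
```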
Figure 3: Schematic and feature-space representation of the training stage with three iterations of the iterative EM-based algorithm (labelled and unlabelled piano and guitar samples; at each iteration the unlabelled samples are classified and added to the training set, and the models are retrained).
Figure 4: Schematic and feature-space representation of the training stage with three iterations of the incremental EM-based algorithm (labelled and unlabelled piano and guitar samples; unlabelled samples are classified and incorporated into the training set incrementally).
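The incremental variant differs from the iterative one in that, at each step, only the most confidently classified unlabelled samples are moved into the labelled pool, rather than relabelling the entire unlabelled set at once. A sketch under the same simplifying assumptions as before (single diagonal-covariance Gaussians instead of GMMs; the `batch` size is an assumed parameter, not a value from the thesis):

```python
import numpy as np

def incremental_ssl(X_lab, y_lab, X_unlab, n_classes, batch=5):
    """Incremental EM-based SSL sketch: grow the labelled pool by the
    `batch` most confident unlabelled samples per iteration."""
    X_lab, y_lab, X_unlab = X_lab.copy(), y_lab.copy(), X_unlab.copy()
    while len(X_unlab):
        # Per-class models from the current labelled pool
        models = [(X_lab[y_lab == c].mean(0), X_lab[y_lab == c].var(0) + 1e-6)
                  for c in range(n_classes)]
        ll = np.stack([-0.5 * (((X_unlab - mu) ** 2 / var)
                               + np.log(2 * np.pi * var)).sum(1)
                       for mu, var in models], axis=1)
        y_hat, conf = ll.argmax(1), ll.max(1)
        pick = np.argsort(conf)[-batch:]            # most confident samples
        X_lab = np.vstack([X_lab, X_unlab[pick]])   # promote them to labelled
        y_lab = np.concatenate([y_lab, y_hat[pick]])
        X_unlab = np.delete(X_unlab, pick, axis=0)
    return X_lab, y_lab
```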
System description: Labelled data weighting
The algorithms improve the performance when the initial labelled dataset is relatively small, but there is little difference between adding a large or a small amount of unlabelled data.
Solution: de-weight the contribution of the unlabelled data. Simply, the labelled set's contribution is scaled by an iteration-dependent weight: S_l' = ω(t) S_l.
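The weighting idea can be sketched as a weighted model update for one class: the labelled samples contribute with weight ω(t) while the self-labelled (unlabelled) samples contribute with weight 1, so abundant unlabelled data cannot swamp the few trusted labels. An illustrative sketch, not the thesis formulation; `w` plays the role of ω(t):

```python
import numpy as np

def weighted_class_model(X_lab, X_unlab_c, w):
    """Re-estimate one class's mean/variance with labelled samples
    weighted by w and self-labelled samples weighted by 1. Sketch only."""
    weights = np.concatenate([np.full(len(X_lab), w),
                              np.ones(len(X_unlab_c))])
    X = np.vstack([X_lab, X_unlab_c])
    mu = np.average(X, axis=0, weights=weights)
    var = np.average((X - mu) ** 2, axis=0, weights=weights) + 1e-6
    return mu, var
```

With a large weight the estimate stays close to the labelled data; with weight 1 the (here deliberately mislabelled) unlabelled mass dominates.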
System description: One-class-at-a-time training
Another issue of the iterative algorithm: the resulting classification accuracy oscillates along the iteration axis.
[Figure: accuracy (%) versus iteration (0 to 20), illustrating the oscillation in the 80 to 100 % range.]
System description: One-class-at-a-time training
One-class-at-a-time approach: at each iteration only one class is retrained:
the models of one class are unaffected by training another one;
fewer accuracy peaks;
safer to choose an arbitrary iteration index for termination.
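A minimal sketch of one (macro)iteration of the one-class-at-a-time scheme: only the model of the class selected at step t is re-estimated, from its labelled samples plus the unlabelled samples currently classified into it; all other class models are returned untouched. Single diagonal-covariance Gaussians again stand in for GMMs, and the round-robin class selection is an assumption for illustration.

```python
import numpy as np

def one_class_at_a_time(models, X_lab, y_lab, X_unlab, t):
    """Retrain only class (t mod n_classes); leave the rest unchanged."""
    c = t % len(models)
    # Classify the unlabelled data with the current models
    ll = np.stack([-0.5 * (((X_unlab - mu) ** 2 / var)
                           + np.log(2 * np.pi * var)).sum(1)
                   for mu, var in models], axis=1)
    # Pool class c's labelled samples with unlabelled ones assigned to it
    Xc = np.vstack([X_lab[y_lab == c], X_unlab[ll.argmax(1) == c]])
    models = list(models)
    models[c] = (Xc.mean(0), Xc.var(0) + 1e-6)  # only class c is retrained
    return models
```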
Evaluation
Datasets
Baseline results
Results with the iterative EM-based algorithm
Evaluation: Datasets
RWC Music Database². Three variations per instrument. Separate notes covering the whole range with a step of a semitone. 44.1 kHz, 16 bit.
² M. Goto et al., "RWC music database: music genre database and musical instrument sound database," in Proc. of the 4th Int. Conf. on Music Information Retrieval (ISMIR), 2003, pp. 229-230.
Evaluation: Datasets
Table 1: List of instruments and number of recordings of the notes used in the smaller and larger sets, respectively.

Smaller set:
  Acoustic Guitar   702
  Electric Guitar   702
  Tuba              270
  Bassoon           360
  Total           2 212

Larger set:
  Pianoforte        792
  Classic Guitar    702
  Electric Guitar   702
  Electric Bass     507
  Trombone          278
  Tuba              270
  Horn              288
  Bassoon           360
  Clarinet          360
  Banjo             941
  Total           5 200
Evaluation: Baseline results
Baseline scenario: treating all available data as labelled. This gives an upper bound for the possible SSL performance.
Table 2: Average classification accuracy across all classes in the fully-supervised case.
  Instrument set size    Recognition accuracy, %
  4 instruments          92.1
  10 instruments         82.9
Evaluation: Results with the iterative EM-based algorithm
Table 3: Classification results with the iterative EM-based SSL algorithm when incorporating both modifications (recognition accuracy, %).
  Instrument set size   initial   maximum   absolute gain   relative gain
  4 instruments         89.19     95.30     6.11            6.85
  10 instruments        58.73     68.43     9.70            14.18
Figure 5: Comparison of the modifications to the iterative algorithm (one class at a time, labelled data weighting, and both combined) with the initial version, smaller instrument set; accuracy (%) versus (macro)iteration (0 to 20).
Figure 6: Comparison of the accuracies of the initial models and the final iteration of the incremental EM-based SSL algorithm as a function of the relative labelled dataset size (labelled/total ratio, 5 to 100 %), smaller instrument set.
Conclusions
SSL for instrument recognition works:
accuracy gain of 9.7 percentage points;
as little as 7 % of the data needs to be annotated;
the proposed extensions simplify termination and increase accuracy.
Future work:
more complex scenarios (more instruments, noise, reverberation... );
neighbouring problems.