2016 IEEE INTERNATIONAL WORKSHOP ON MACHINE LEARNING FOR SIGNAL PROCESSING, SEPT. 2016, SALERNO, ITALY

A FULLY CONVOLUTIONAL DEEP AUDITORY MODEL FOR MUSICAL CHORD RECOGNITION

Filip Korzeniowski and Gerhard Widmer
Johannes Kepler University, Linz, Austria, Department of Computational Perception

ABSTRACT

Chord recognition systems depend on robust feature extraction pipelines. While these pipelines are traditionally hand-crafted, recent advances in end-to-end machine learning have begun to inspire researchers to explore data-driven methods for such tasks. In this paper, we present a chord recognition system that uses a fully convolutional deep auditory model for feature extraction. The extracted features are processed by a Conditional Random Field that decodes the final chord sequence. Both processing stages are trained automatically and do not require expert knowledge for optimising parameters. We show that the learned auditory system extracts musically interpretable features, and that the proposed chord recognition system achieves results on par with or better than state-of-the-art algorithms.

Index Terms: chord recognition, convolutional neural networks, conditional random fields

1. INTRODUCTION

Chord recognition is a long-standing topic of interest in the music information research (MIR) community. It is concerned with recognising (and transcribing) chords in audio recordings of music, a labor-intensive task that requires extensive musical training if done manually. Chords are a highly descriptive feature of music and useful e.g. for creating lead sheets for musicians or as part of higher-level tasks such as cover song identification.

A chord can be defined as multiple notes perceived simultaneously in harmony. This does not require the notes to be played simultaneously: a melody or a chord arpeggiation can imply the perception of a chord, even if intertwined with out-of-chord notes. Through this perceptual process, the identification of a chord is sometimes subject to interpretation even among trained experts. This inherent subjectivity is evidenced by diverse ground-truth annotations for the same songs and discussions about proper evaluation metrics [1].

This work is supported by the European Research Council (ERC) under the EU's Horizon 2020 Framework Programme (ERC Grant Agreement number 670035, project "Con Espressione"). The Tesla K40 used for this research was donated by the NVIDIA Corporation.

Typical chord recognition pipelines comprise three stages: feature extraction, pattern matching, and chord sequence decoding. Feature extraction transforms audio signals into representations which emphasise content related to harmony. Pattern matching assigns chord labels to such representations, but works on single frames or local context only. Chord sequence decoding puts the local detections into global context by predicting a chord sequence for the complete audio. Originally hand-crafted [2], all three stages have seen attempts to be replaced by data-driven methods. For feature extraction, linear regression [3], feed-forward neural networks [4] and convolutional neural networks [5] were explored; these approaches fit a transformation from a general time-frequency representation to a manually defined one that is specifically useful for chord recognition, such as chroma vectors or a Tonnetz representation.
Pattern matching often uses Gaussian mixture models [6], but has seen work on chord classification directly from a time-frequency representation using convolutional neural networks [7]. For sequence decoding, hidden Markov models [8], conditional random fields [9] and recurrent neural networks [10] are natural choices; however, the vast majority of chord recognition systems still rely on hidden Markov models (HMMs): only one approach used conditional random fields (CRFs) in combination with simple chroma features for this task [9], with limited success. This warrants further exploration of this model class, since it has proven to outperform HMMs in other domains.

In this paper, we present a novel end-to-end chord recognition system that combines a fully convolutional neural network (CNN) for feature extraction with a CRF for chord sequence decoding. Fully convolutional neural networks replace the stack of dense layers traditionally used in CNNs for classification with global average pooling (GAP) [11], which reduces the number of trainable parameters and improves generalisation. Similarly to [7], we train the CNN to directly predict chord labels for each audio frame, but instead of using these predictions directly, we use the hidden representation computed by the CNN as features for the subsequent pattern matching and chord sequence decoding stage. We call the feature-extracting part of the CNN the auditory model.

For pattern matching and chord sequence decoding, we connect a CRF to the auditory model. Combining neural networks with CRFs gives a fully differentiable model that can be learned jointly, as shown in [12, 13]. For the task at hand, however, we found it advantageous to train both parts separately, both in terms of convergence time and performance.

2. FEATURE EXTRACTION

Feature extraction is a two-phase process. First, we convert the signal into a time-frequency representation in the pre-processing stage. Then, we feed this representation to a CNN and train it to classify chords. We take the activations of a hidden layer in the network as a high-level feature representation, which we then use to decode the final chord sequence.

2.1. Pre-processing

The first stage of our feature extraction pipeline transforms the input audio into a time-frequency representation suitable as input to a CNN. As described in Sec. 2.2, CNNs consist of fixed-size filters that capture local structure, which requires the spatial relations to be similarly distributed in each area of the input. To achieve this, we compute the magnitude spectrogram of the audio and apply a filterbank with logarithmically spaced triangular filters. This gives us a time-frequency representation in which distances between notes (and their harmonics) are equal in all areas of the input. Finally, we logarithmise the filtered magnitudes to compress the value range. Mathematically, the resulting time-frequency representation L of an audio recording is defined as

    L = log(1 + B_Log |S|),

where S is the short-time Fourier transform (STFT) of the audio, and B_Log is the logarithmically spaced triangular filterbank. To be concise, we will refer to L as the spectrogram in the remainder of this paper.

We feed the network spectrogram frames with context, i.e. the input to the network is not a single column l_i of L, but a matrix

    X_i = [l_{i-C}, ..., l_i, ..., l_{i+C}],

where i is the index of the target frame, and C is the context size. We chose the parameter values based on our previous study on data-driven feature extraction for chord recognition [4] and a number of preliminary experiments. We use a frame size of 8192 with a hop size of 4410 at a sample rate of 44100 Hz for the STFT. The filterbank comprises 24 filters per octave between 65 Hz and 2100 Hz. The context size is C = 7; each input X_i thus spans 15 spectrogram frames, representing 1.5 sec. of audio.
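To make the pre-processing concrete, the following is a minimal numpy sketch of the pipeline just described. It is illustrative only: details such as the window function and the merging of triangular filters that collapse onto the same FFT bins at low frequencies are our assumptions, not a description of the implementation used for the paper.

```python
import numpy as np

SR, FRAME, HOP = 44100, 8192, 4410          # sample rate, frame size, hop size
FMIN, FMAX, BANDS, C = 65.0, 2100.0, 24, 7  # filterbank range, bands/octave, context

def magnitude_stft(y):
    """Magnitude STFT of a mono float signal `y`."""
    n = 1 + (len(y) - FRAME) // HOP
    idx = np.arange(FRAME)[None, :] + HOP * np.arange(n)[:, None]
    return np.abs(np.fft.rfft(y[idx] * np.hanning(FRAME), axis=1)).T  # (bins, frames)

def log_filterbank():
    """Triangular filters with logarithmically spaced centre frequencies."""
    n_centers = int(np.log2(FMAX / FMIN) * BANDS) + 2
    centers = FMIN * 2.0 ** (np.arange(n_centers) / BANDS)
    bins = np.unique(np.round(centers / SR * FRAME).astype(int))  # merge collisions
    fb = np.zeros((FRAME // 2 + 1, len(bins) - 2))
    for i, (l, c, r) in enumerate(zip(bins, bins[1:], bins[2:])):
        fb[l:c, i] = np.linspace(0, 1, c - l, endpoint=False)   # rising flank
        fb[c:r, i] = np.linspace(1, 0, r - c, endpoint=False)   # falling flank
    return fb                                                   # (bins, filters)

def spectrogram(y):
    """L = log(1 + B_Log |S|), shape (filters, frames)."""
    return np.log(1 + log_filterbank().T @ magnitude_stft(y))

def context_windows(L):
    """X_i = [l_{i-C}, ..., l_i, ..., l_{i+C}] for every frame i (edges padded)."""
    Lp = np.pad(L, ((0, 0), (C, C)), mode='edge')
    return np.stack([Lp[:, i:i + 2 * C + 1] for i in range(L.shape[1])])
```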
2.2. Auditory Model

To extract discriminative features from the input, in [4] we used a simple deep neural network (DNN) to compute chromagrams, concise descriptors of harmonic content. From these chromagrams, we used a simple classifier to predict chords in a frame-wise manner. Despite the network being conceptually simple, the dense connections between its layers meant that the model had 1.9 million parameters.

In this paper, we use a CNN for feature extraction. CNNs differ from traditional deep neural networks by including two additional types of computational layers: convolutional layers compute a 2-dimensional convolution of their input with a set of fixed-size, trainable kernels per feature map, followed by a (usually non-linear) activation function; pooling layers sub-sample the input by aggregating over a local neighbourhood (e.g. the maximum of a 2×2 patch). The former can be reformulated as a dense layer using a sparse weight matrix with tied weights. This interpretation indicates the advantages of convolutional layers: fewer parameters and better generalisation.

CNNs typically consist of convolutional lower layers that act as feature extractors, followed by fully connected layers for classification. Such layers are prone to over-fitting and come with a large number of parameters. We thus follow [11] and use global average pooling (GAP) to replace them. To further prevent over-fitting, we apply dropout [14], and use batch normalisation [15] to speed up training convergence. Table 1 details our model architecture, which consists of roughly 900k parameters, about 50% of the original DNN. Inspired by the architecture presented in [16], we opted for multiple lower convolutional layers with small 3×3 kernels, followed by a layer computing 128 feature maps using 12×9 kernels. The intuition is that these bigger kernels can aggregate harmonic information for the classification part of the network. We will denote the output of this layer as F_i, the features extracted from input X_i.

We target a reduced chord alphabet in this work (major and minor chords for 12 semitones), resulting in 24 classes plus a no-chord class. This is a common restriction used in the literature on chord recognition [18]. The GAP construct thus learns a weighted average of the 128 feature maps for each of the 25 classes, using the 1×1 convolution and average pooling layer. Applying the softmax function then ensures that the output sums to 1 and can be interpreted as a probability distribution over class labels given the input.

Following [19], the activations of the network's hidden layers can be interpreted as hierarchical feature representations of the input data. We will thus use F_i as a feature representation for the subsequent parts of our chord recognition pipeline.
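As a concrete illustration of such a GAP architecture, the following PyTorch sketch mirrors the structure described above: 3×3 convolutions with batch normalisation, max pooling and dropout, a 12×9 convolution producing the 128 feature maps F_i, and a 1×1 linear convolution followed by global average pooling. The framework, the channel counts of the lower layers, and the exact placement of pooling and dropout are our assumptions for illustration; Table 1 below lists the layer sequence actually used.

```python
import torch.nn as nn

class AuditoryModel(nn.Module):
    """GAP-style fully convolutional chord classifier (illustrative sketch)."""
    def __init__(self, n_classes=25):
        super().__init__()
        def conv_bn(cin, cout, kernel, pad):
            return nn.Sequential(nn.Conv2d(cin, cout, kernel, padding=pad),
                                 nn.BatchNorm2d(cout), nn.ReLU())
        self.features = nn.Sequential(
            conv_bn(1, 32, 3, 1), conv_bn(32, 32, 3, 1),
            nn.MaxPool2d((2, 1)), nn.Dropout(0.5),
            conv_bn(32, 64, 3, 1), conv_bn(64, 64, 3, 1),
            nn.MaxPool2d((2, 1)), nn.Dropout(0.5),
            conv_bn(64, 128, (12, 9), 0),   # wide kernels aggregate harmonics -> F_i
        )
        # GAP head: 1x1 linear convolution, then global average pooling.
        self.classifier = nn.Sequential(
            nn.Conv2d(128, n_classes, 1), nn.BatchNorm2d(n_classes),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())

    def forward(self, x):          # x: (batch, 1, frequency bins, 15 frames)
        f = self.features(x)       # hidden representation F_i
        return self.classifier(f)  # class logits; softmax yields probabilities
```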

Layer Type      Parameters   Padding   Output Size
Input
Pool-Max
----------------------------------------------------
Conv-Rectify                 no
Conv-Rectify                 no
Pool-Max
----------------------------------------------------
Conv-Rectify                 no
Conv-Linear                  no
Pool-Avg
Softmax                                25

Table 1. Proposed CNN architecture. Batch normalisation is performed after each convolution layer. Dropout with probability 0.5 is applied at the horizontal rules in the table. All convolution layers use rectifier units [17], except the last, which is linear. The bottom three layers represent the GAP, replacing fully connected layers for classification.

2.3. Training and Data Augmentation

We train the auditory model in a supervised manner using the Adam optimisation method [20] with standard parameters, minimising the categorical cross-entropy between true targets y_i and network output ỹ_i. Including a regularisation term, the loss is defined as

    L = -(1/D) Σ_{i=1}^{D} y_i^T log(ỹ_i) + λ ‖θ‖_2,

where D is the number of frames in the training data, λ = 10^-7 is the l2 regularisation factor, and θ are the network parameters. We process the training set in mini-batches of size 512, and stop training if the validation accuracy does not improve for 5 epochs.

We apply two types of data manipulation to increase the variety of the training data and prevent model over-fitting. Both exploit the fact that the frequency axis of our input representation is linear in pitch, which facilitates the emulation of pitch-shifting operations. The first operation, as explored in [7], shifts the spectrogram up or down in discrete semitone steps by a maximum of 4 semitones. This manipulation does not preserve the label, which we thus adjust accordingly. The second operation emulates a slight detuning by shifting the spectrogram by fractions of up to 0.4 of a semitone; here, the label remains unchanged. We process each data point in a mini-batch with randomly selected shift distances, so the network almost never sees exactly the same input during training. We found these data augmentation operations to be crucial for preventing over-fitting.
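The two augmentation operations can be sketched as follows, assuming 24 bands per octave (so one semitone spans two frequency bins) and a label encoding with 0-11 for major roots, 12-23 for minor roots, and 24 for no-chord; both the encoding and the interpolation scheme are our assumptions.

```python
import numpy as np

BINS_PER_SEMITONE = 2   # 24 triangular filters per octave

def shift_semitones(X, y, k):
    """Shift spectrogram X (bins x frames) by k semitones; adjust label y."""
    X = np.roll(X, k * BINS_PER_SEMITONE, axis=0)
    if k > 0:
        X[:k * BINS_PER_SEMITONE] = 0      # zero the wrapped-around bins
    elif k < 0:
        X[k * BINS_PER_SEMITONE:] = 0
    y = y if y == 24 else (y // 12) * 12 + (y + k) % 12  # transpose root, keep no-chord
    return X, y

def detune(X, frac):
    """Shift X by a fraction of a semitone via linear interpolation; label unchanged."""
    pos = np.arange(X.shape[0]) - frac * BINS_PER_SEMITONE
    lo = np.clip(np.floor(pos).astype(int), 0, X.shape[0] - 1)
    hi = np.clip(lo + 1, 0, X.shape[0] - 1)
    w = (pos - np.floor(pos))[:, None]
    return (1 - w) * X[lo] + w * X[hi]
```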
3. CHORD SEQUENCE DECODING

Using the predictions of the pattern matching stage directly (in our case, the predictions of the CNN) often gives good results in terms of frame-wise accuracy. However, chord sequences obtained this way are often fragmented. The main purpose of chord sequence decoding is thus to smooth the reported sequence. Here, we use a linear-chain CRF [21] to introduce inter-frame dependencies and find the optimal state sequence using Viterbi decoding.

3.1. Conditional Random Fields

Conditional random fields are probabilistic energy-based models for structured classification. They model the conditional probability distribution

    p(Y | X) = exp[E(Y, X)] / Σ_{Y'} exp[E(Y', X)],    (1)

where Y is the label vector sequence [y_0, ..., y_N], and X is the feature vector sequence of the same length. We assume each y_i to be the target label in one-hot encoding. The energy function is defined as

    E(Y, X) = Σ_{n=1}^{N} [ y_{n-1}^T A y_n + y_n^T c + x_n^T W y_n ] + y_0^T π + y_N^T τ,    (2)

where A models the inter-frame potentials, W the frame-input potentials, c the label bias, π the potential of the first label, and τ the potential of the last label. This form of energy function defines a linear-chain CRF.

From Eqs. 1 and 2 it follows that a CRF can be seen as generalised logistic regression: they become equivalent if we set A, π and τ to 0. Further, logistic regression is equivalent to a softmax output layer of a neural network. We thus argue that a CRF whose input is computed by a neural network can be interpreted as a generalised softmax output layer that allows for dependencies between individual predictions. This makes CRFs a natural choice for incorporating dependencies between the predictions of neural networks.

3.2. Model Definition and Training

Our model has 25 states (12 semitones × {major, minor}, plus a no-chord class). These states are connected to the observed features through the weight matrix W, which computes a weighted sum of the features for each class. This corresponds to what the global-average-pooling part of the CNN does. We will thus use the input to the GAP part, F_i, averaged for each of the 128 feature maps, as input to the CRF. We can pull the averaging operation from the last layer to right after the feature-extraction layer, because the operations in between (linear convolution, batch normalisation) are linear, and no dropout is performed at test time. Formally, we will denote the input sequence as F ∈ R^{128×N}, where each column f_i is the averaged feature output of the CNN for a given input X_i. Our CRF thus models p(Y | F).
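Given trained parameters, the most probable chord sequence under Eq. 2 is found with the standard Viterbi recursion over these potentials. A minimal numpy sketch (illustrative, not the authors' implementation):

```python
import numpy as np

def viterbi(F, A, W, c, pi, tau):
    """Most probable label sequence for averaged CNN features F (128 x N)."""
    unary = F.T @ W + c                        # x_n^T W y_n + y_n^T c, shape (N, 25)
    N, K = unary.shape
    delta = pi + unary[0]                      # best score ending in each state
    back = np.zeros((N, K), dtype=int)         # backpointers
    for n in range(1, N):
        scores = delta[:, None] + A + unary[n]     # predecessor + transition + emission
        back[n] = scores.argmax(axis=0)
        delta = scores.max(axis=0)
    delta += tau                               # potential of the last label
    path = [int(delta.argmax())]
    for n in range(N - 1, 0, -1):              # trace the best sequence backwards
        path.append(back[n, path[-1]])
    return path[::-1]
```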

As with the CNN, we train the CRF using Adam, but set a higher learning rate of 0.01. The mini-batches consist of 32 sequences with a length of 1024 frames (102.4 sec.) each. As optimisation criterion, we use the l1-regularised negative log-likelihood of all sequences in the data set:

    L = -(1/S) Σ_{i=1}^{S} log p(Y_i | F_i) + λ ‖ξ‖_1,

where S is the number of sequences in the data set, λ = 10^-4 is the l1 regularisation factor, and ξ are the CRF parameters. We stop training when the validation accuracy does not increase for 5 epochs.

4. EXPERIMENTS

We evaluate the proposed system using 8-fold cross-validation on a compound dataset that comprises the following subsets:

- Isophonics: 180 songs by The Beatles, 19 songs by Queen, and 18 songs by Zweieck, totalling 10:21 hours of audio.
- RWC Popular [22]: 100 songs in the style of American and Japanese pop music, originally recorded for this data set, totalling 6:46 hours of audio.
- Robbie Williams [23]: 65 songs by Robbie Williams, totalling 4:30 hours of audio.

As evaluation measure, we compute the Weighted Chord Symbol Recall (WCSR), often called Weighted Average Overlap Ratio (WAOR), of major and minor chords, as implemented in the mir_eval library [24]: R = t_c / t_a, where t_c is the total time during which the prediction corresponds to the annotation, and t_a is the total duration of annotations of the respective chord classes (major and minor chords, in our case).

We compare our results to the three best-performing algorithms in the MIREX competition in 2013 (no superior algorithm has been submitted to MIREX since then): CB3, based on [6]; KO1 [25]; and NMSD2 [26].

4.1. Results

Table 2. Weighted Chord Symbol Recall of major and minor chords achieved by different algorithms (CB3, KO1, NMSD2, and the proposed method) on the Isophonics, Robbie Williams, and RWC datasets. The results of NMSD2 are statistically significantly worse than the others, according to a Wilcoxon signed-rank test. Note that train and test data overlap for CB3, KO1 and NMSD2, while the results of our method are determined by 8-fold cross-validation.

The results presented for the reference algorithms differ from those found on the MIREX website because of minor differences in the implementation of the evaluation libraries. To ensure a fairer comparison, we obtained the predictions of the compared algorithms and ran the same evaluation code for all approaches. Note, however, that for the reference algorithms there is a known overlap between train and test set, and the obtained results might therefore be optimistic.

Table 2 shows the results of our method compared to the three state-of-the-art algorithms. We can see that the proposed method performs slightly better (though not statistically significantly so), even though the train set of the reference methods overlaps with the test set.
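The WCSR above can be computed with mir_eval following that library's documented chord-evaluation recipe; a minimal sketch (file names are placeholders):

```python
import mir_eval

# Load reference and estimated annotations (placeholder file names).
ref_int, ref_labels = mir_eval.io.load_labeled_intervals('ref.lab')
est_int, est_labels = mir_eval.io.load_labeled_intervals('est.lab')

# Clip/pad the estimate to the annotated time span, then merge the intervals.
est_int, est_labels = mir_eval.util.adjust_intervals(
    est_int, est_labels, ref_int.min(), ref_int.max(),
    mir_eval.chord.NO_CHORD, mir_eval.chord.NO_CHORD)
intervals, ref_labels, est_labels = mir_eval.util.merge_labeled_intervals(
    ref_int, ref_labels, est_int, est_labels)

# Compare against the major/minor vocabulary and weight by duration,
# which yields R = t_c / t_a.
durations = mir_eval.util.intervals_to_durations(intervals)
comparisons = mir_eval.chord.majmin(ref_labels, est_labels)
score = mir_eval.chord.weighted_accuracy(comparisons, durations)
```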
5. AUDITORY MODEL ANALYSIS

Following [11], the final feature maps of a GAP network can be interpreted as category confidence maps. Such confidence maps will have a high average value if the network is confident that the input is of the respective category. In our architecture, the average activation of a confidence map can be expressed as a weighted average over the (batch-normalised) feature maps of the preceding layer. We thus have 128 weights for each of the 25 categories (chord classes).
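These per-class weight vectors can be read directly out of the trained network. A sketch, assuming the illustrative PyTorch model from Sec. 2.2 (where the class weights live in the 1×1 convolution of the GAP head), that collects them into a matrix and computes the pairwise correlations between chord classes analysed below:

```python
import numpy as np

def class_weight_correlation(model):
    """Correlate the 25 class-weight vectors of the GAP head (Fig. 1 analysis)."""
    Wc = model.classifier[0].weight.detach().cpu().numpy().reshape(25, 128)
    return Wc, np.corrcoef(Wc)    # Wc: (25, 128); entry [i, j] = corr(w_i, w_j)
```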

Fig. 1. Correlation between weight vectors of chord classes. Rows and columns represent chords; major chords are denoted by upper-case letters, minor chords by lower-case letters. The order of chords within a chord quality is determined by the circle of fifths. We observe that weight vectors of chords close in the circle of fifths (such as C, F, and G) correlate positively. The same applies to chords that share notes (such as C and a, or C and c).

We wanted to see whether the penultimate feature maps F_i can be interpreted in a musically meaningful way. To this end, we first analysed the similarity of the weight vectors for each chord class by computing their correlation. The result is shown in Fig. 1. We see a systematic correlation between the weight vectors of chords that share notes or are close to each other in the circle of fifths. The patterns within minor chords are less clear. This might be because minor chords are under-represented in the data, and the network could not learn systematic patterns from this limited amount.

Furthermore, we wanted to see if the network learned to distinguish major and minor modes independently of the root note. To this end, we selected the four feature maps with the highest connection weights to major and minor chords respectively, and plotted their contribution to each chord class in Fig. 2. Here, an interesting pattern emerges: feature maps with high average weights to minor chords have negative connections to all major chords. High activations in these feature maps thus make all major chords less likely. However, they tend to be specific about which minor chords they favour. We observe a zig-zag pattern that discriminates between chords that are next to each other in the circle of fifths. This means that although the weight vectors of harmonically close chords correlate, the network learned features to discriminate between them.

Fig. 2. Connection weights of selected feature maps to chord classes. Chord classes are ordered according to the circle of fifths, such that harmonically close chords are close to each other. The left plot shows feature maps with a high average contribution to minor chords, the right plot those with a high contribution to major chords. Feature maps with high average weights to minor chords show negative connections to all major chords. Within minor chords, we observe that two of them (10 and 11) discriminate between chords that are harmonically close (zig-zag pattern). We observe a similar pattern in the right plot.

Finally, we investigated whether there are feature maps that indicate the presence of individual pitch classes. To this end, we multiplied the weight vectors of all chords containing a pitch class, in order to isolate its influence. For example, when computing the weight vector for pitch class c, we multiplied the weight vectors of the C, F, A♭, c, f, and a chords; their only commonality is the presence of the c pitch class. Fig. 3 shows the results. We can observe that some feature maps seem to specialise in detecting certain pitch classes and intervals, and some in discriminating between pitch classes.

Fig. 3. Contribution of feature maps to pitch classes. Although these results are noisy, we observe that some feature maps seem to specialise in detecting the presence (or absence) of pitch classes. For example, feature maps 10, 25, and 57 detect single pitch classes, while feature maps 22, 46, and 85 contribute to pairs of related pitch classes: a perfect fifth between g and d in the 22nd, a minor third between d and f in the 46th, and a major third between a♭ and c in the 85th feature map. Note that the 10th feature map also slightly discriminates d and a from f, which together would form a d-minor triad. Other feature maps that discriminate between pitch classes include the 11th (a vs. e, perfect fifth) and the 95th (f vs. a, major third).
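This pitch-class isolation can be sketched using the weight matrix Wc from the previous snippet; the class ordering (0-11 for major roots, 12-23 for minor roots) is again our assumption:

```python
import numpy as np

def pitch_class_profile(Wc, pc):
    """Element-wise product of the weight vectors of all six major/minor
    chords that contain pitch class `pc` as root, third, or fifth."""
    majors = [(pc - i) % 12 for i in (0, 4, 7)]        # e.g. c -> C, Ab, F
    minors = [12 + (pc - i) % 12 for i in (0, 3, 7)]   # e.g. c -> c, a, f
    profile = np.ones(Wc.shape[1])
    for cls in majors + minors:
        profile *= Wc[cls]        # keep only what all six chords share
    return profile                # one score per feature map
```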

6. CONCLUSION

We presented a novel method for chord recognition based on a fully convolutional neural network in combination with a CRF. The method automatically learns musically interpretable features from the spectrogram, and performs at least as well as state-of-the-art systems. For future work, we aim to create a model that distinguishes more chord qualities than major and minor, independently of the root note of a chord.

7. REFERENCES

[1] E. J. Humphrey and J. P. Bello, "Four timely insights on automatic chord estimation," in Proc. of the 16th ISMIR, Málaga, Spain, 2015.
[2] T. Fujishima, "Realtime chord recognition of musical sound: a system using Common Lisp Music," in Proc. of the ICMC, Beijing, China, 1999.
[3] R. Chen, W. Shen, A. Srinivasamurthy, and P. Chordia, "Chord recognition using duration-explicit hidden Markov models," in Proc. of the 13th ISMIR, Porto, Portugal, 2012.
[4] F. Korzeniowski and G. Widmer, "Feature learning for chord recognition: the deep chroma extractor," in Proc. of the 17th ISMIR, New York, USA, 2016.
[5] E. J. Humphrey, T. Cho, and J. P. Bello, "Learning a robust Tonnetz-space transform for automatic chord recognition," in Proc. of ICASSP, Kyoto, Japan, 2012.
[6] T. Cho, Improved Techniques for Automatic Chord Recognition from Music Audio Signals, Dissertation, New York University, New York, 2014.
[7] E. J. Humphrey and J. P. Bello, "Rethinking automatic chord recognition with convolutional neural networks," in Proc. of the 11th ICMLA, Boca Raton, USA, Dec. 2012.
[8] A. Sheh and D. P. W. Ellis, "Chord segmentation and recognition using EM-trained hidden Markov models," in Proc. of the 4th ISMIR, Washington, USA, 2003.
[9] J. A. Burgoyne, L. Pugin, C. Kereliuk, and I. Fujinaga, "A cross-validated study of modelling strategies for automatic chord recognition," in Proc. of the 8th ISMIR, Vienna, Austria, 2007.
[10] N. Boulanger-Lewandowski, Y. Bengio, and P. Vincent, "Audio chord recognition with recurrent neural networks," in Proc. of the 14th ISMIR, Curitiba, Brazil, 2013.
[11] M. Lin, Q. Chen, and S. Yan, "Network in network," arXiv preprint arXiv:1312.4400, 2013.
[12] J. Peng, L. Bo, and J. Xu, "Conditional neural fields," in Proc. of NIPS, Vancouver, Canada, 2009.
[13] T. Do and T. Artières, "Neural conditional random fields," in Proc. of the 13th AISTATS, Chia Laguna, Italy, 2010.
[14] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, "Dropout: a simple way to prevent neural networks from overfitting," The Journal of Machine Learning Research, vol. 15, no. 1, pp. 1929-1958, 2014.
[15] S. Ioffe and C. Szegedy, "Batch normalization: accelerating deep network training by reducing internal covariate shift," arXiv preprint arXiv:1502.03167, 2015.
[16] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," arXiv preprint arXiv:1409.1556, 2014.
[17] X. Glorot, A. Bordes, and Y. Bengio, "Deep sparse rectifier neural networks," in Proc. of the 14th AISTATS, Fort Lauderdale, USA, 2011.
[18] M. McVicar, R. Santos-Rodríguez, Y. Ni, and T. De Bie, "Automatic chord estimation from audio: a review of the state of the art," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 22, no. 2, pp. 556-575, Feb. 2014.
[19] Y. Bengio, A. Courville, and P. Vincent, "Representation learning: a review and new perspectives," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 8, pp. 1798-1828, Aug. 2013.
[20] D. Kingma and J. Ba, "Adam: a method for stochastic optimization," arXiv preprint arXiv:1412.6980, 2014.
[21] J. D. Lafferty, A. McCallum, and F. C. N. Pereira, "Conditional random fields: probabilistic models for segmenting and labeling sequence data," in Proc. of the 18th ICML, Williamstown, USA, 2001.
[22] M. Goto, H. Hashiguchi, T. Nishimura, and R. Oka, "RWC Music Database: popular, classical and jazz music databases," in Proc. of the 3rd ISMIR, Paris, France, 2002.
[23] B. Di Giorgi, M. Zanoni, A. Sarti, and S. Tubaro, "Automatic chord recognition based on the probabilistic modeling of diatonic modal harmony," in Proc. of the 8th Int. Workshop on Multidimensional Systems, Erlangen, Germany, 2013.
[24] C. Raffel, B. McFee, E. J. Humphrey, J. Salamon, O. Nieto, D. Liang, and D. P. W. Ellis, "mir_eval: a transparent implementation of common MIR metrics," in Proc. of the 15th ISMIR, Taipei, Taiwan, 2014.
[25] M. Khadkevich and M. Omologo, "Time-frequency reassigned features for automatic chord recognition," in Proc. of ICASSP, Prague, Czech Republic, 2011.
[26] Y. Ni, M. McVicar, R. Santos-Rodríguez, and T. De Bie, "Using hyper-genre training to explore genre information for automatic chord estimation," in Proc. of the 13th ISMIR, Porto, Portugal, 2012.
