Towards instrument segmentation for music content description: a critical review of instrument classification techniques
Perfecto Herrera, Xavier Amatriain, Eloi Batlle, Xavier Serra
Audiovisual Institute, Pompeu Fabra University
Rambla 31, Barcelona, Spain
{perfecto.herrera, xavier.amatriain, eloi.batlle, xavier.serra}@iua.upf.es

A system capable of describing the musical content of any kind of sound file or sound stream, as is expected of MPEG-7-compliant applications, should provide an account of the different moments where a certain instrument can be heard. In this paper we review the different techniques that have so far been proposed for the automatic classification of musical instruments. As most of the techniques discussed are usable only on "solo" performances, we evaluate their applicability to the more complex case of describing sound mixes. We conclude the survey by discussing the need to develop new strategies for classifying sound mixes without a priori separation of the sound sources.

Keywords: classification, timbre models, segmentation, music content processing, multimedia content description, MPEG-7

Introduction

The need for automatically classifying sounds (note 1) arises in contexts as different as bioacoustics and military surveillance. Our focus, however, is multimedia content description, where segmentation of musical audio streams can be done in terms of the instruments that can be heard (for example, in order to locate a solo in the middle of a song).
Two main objectives can be envisioned in this context: segmentation according to the played instrument, where culturally accepted labels for all the classes have to be associated with certain feature vectors (hence a clear example of a supervised learning problem); and segmentation according to perceptual features, where there are no universal labels for classifying segments, but rather similarity distance functions derived from psychoacoustical studies of what humans understand as timbral similarity [1;2;3;4]. The first is the subject of this paper, whereas the second has been partially pursued in one of our recent contributions to the MPEG-7 process [5]. Although a blind or completely bottom-up approach could be feasible for tackling the problem, we can assume that some additional meta-information (e.g. title of the piece, composer, players) will be available at the moment of performing the classification, because these and other metadata are expected to be part of the MPEG-7 standard, due to be approved by the end of 2001 [6]. Descriptions compliant with that standard will include, alongside all those textual metadata, other structural, semantic and temporal data about the instruments or sound sources being played at a specific moment, the notes/chords/scales they are playing, or the types of expressive musical resources (e.g. vibrato, sforzando) used by the players. Extracting all those non-textual data by hand is an overwhelming task,

Note 1: The construction of a classification procedure from a set of data for which the true classes are known has also been variously termed pattern recognition, discrimination, or supervised learning (to distinguish it from unsupervised learning or clustering, in which the classes are inferred from the data) [55]. The aim of supervised learning is to derive, from correctly classified cases, a rule whereby we can classify a new observation into one of the existing classes.
and therefore automatic procedures have to be found to perform what has been called the signal-to-symbol transformation [7]. Instrument segmentation of complex mixtures of signals is still far from solved (but see [8], [9], [10] for different approaches). Therefore, one preliminary way of sidestepping the troublesome stage of separating components is to reduce the scope of the classification systems to deal only with isolated sounds. There is an obvious tradeoff in endorsing this strategy: we gain simplicity and tractability, but we lose contextual and time-dependent cues that could be exploited as relevant features for classifying the sounds. As this has been the preferred strategy in the current literature on instrument classification, this paper will concentrate on such studies. A review of those studies would not be complete without a discussion of the features used for classification, but space constraints have prevented us from including one here.

Classification of monophonic sounds

K-Nearest Neighbors

The K-Nearest Neighbors algorithm is one of the most popular algorithms for instance-based learning. It first stores the feature vectors of all the training examples and then, to classify a new instance, finds (usually using a Euclidean distance) the set of k nearest training examples in the feature space and assigns the new example to the class most represented within that set. Although it is an easy algorithm to implement, K-NN has several drawbacks: as a lazy algorithm [11] it provides no generalization mechanism (it is based only on local information), it requires keeping all the training instances in memory, it is highly sensitive to irrelevant features (as they can dominate the distance metric), and it may require a significant amount of computation each time a new query is made. A K-NN algorithm classified 4 instruments almost with complete accuracy in [12].
Unfortunately, they used a small database (with the note range restricted to one octave, although including different dynamics), so their conclusions should be taken with caution, especially in light of the following more thorough works. Martin and Kim [13] (but see also [14]) developed a classification system that used K-NN with 31 features extracted from cochleagrams. The system also used a hierarchical procedure consisting of first discriminating pizzicati from continuous notes, then discriminating between families (sustained sounds being further divided into strings, woodwind and brass), and finally classifying sounds into specific instrument categories. With a database of 1023 sounds they achieved 87% successful classification at the family level and 61% at the instrument level when no hierarchy was used. Using the hierarchical procedure increased the accuracy at the instrument level to 79%, but it degraded the performance at the family level (to 79%). Without the hierarchical procedure, performance figures were lower than the ones they obtained with a Bayesian classifier (see below). [15] used a combination of a Gaussian classifier (note 2) and K-NN for classifying 1498 samples into instrumental families or specific instrument labels. Using an architecture very similar to Martin and Kim's hierarchy (sounds are first classified into broad categories and the classification is then refined inside each category), they reported a success rate of 75% in individual instrument classification (and 94% in family classification). Additionally, they report a small accuracy improvement (to 80%) when using only the best features for each instrument and no hierarchy at all.

Note 2: The Gaussian classifier was only used for rough discrimination between pizzicati and sustained sounds.
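As a concrete illustration, the basic K-NN procedure described above can be sketched in a few lines of Python. The feature values and instrument labels below are purely hypothetical toy data, not taken from any of the cited studies:

```python
from collections import Counter
import math

def knn_classify(train, query, k=3):
    """Classify `query` by majority vote among the k nearest
    training examples (Euclidean distance in feature space)."""
    # Sort training examples by distance to the query vector.
    neighbours = sorted(train, key=lambda ex: math.dist(ex[0], query))[:k]
    # Majority vote over the labels of the k nearest neighbours.
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Hypothetical 2-D feature vectors (e.g. spectral centroid, attack
# time), labelled by instrument -- purely illustrative values.
train = [
    ((0.1, 0.9), "piano"), ((0.2, 0.8), "piano"),
    ((0.9, 0.1), "flute"), ((0.8, 0.2), "flute"),
]
print(knn_classify(train, (0.15, 0.85)))  # nearest examples are piano
```

The sketch also makes the drawbacks listed above visible: all of `train` must stay in memory, and every query re-sorts the whole training set.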
A possible enhancement of the K-NN technique, consisting of weighting each feature according to its relevance for the task, has been used by Fujinaga's team (note 3) [16;17;18;19]. In a series of three experiments using over 1200 notes from 39 different timbres taken from the McGill Master Samples CD library, the success rate of 50% observed when only the spectral shape of steady-state notes was used increased to 68% when tristimulus, attack position and features of the dynamically changing spectral envelope, such as the rate of change of the centroid, were added. In the most recent paper, a real-time version of this system was reported. The fact that the best accuracy figures are around 80%, and that Martin and Fujinaga have settled at similar figures, can be interpreted as an estimate of the limitations of the K-NN algorithm (provided that feature selection has been optimized with genetic or other techniques). Therefore, more powerful techniques should be explored.

Naive Bayesian Classifiers

This method (note 4) involves a learning step in which the probabilities of the classes, and the conditional probabilities of each feature value given a class, are estimated based on their frequencies over the training data. The set of these estimates corresponds to the learned hypothesis, which is formed without searching, simply by counting the frequency of various data combinations within the training examples, and can then be used to classify each new instance. This technique was used with 18 Mel-cepstrum coefficients in [20]. After clustering the feature vectors with a K-means algorithm, a Gaussian mixture model was built from their means and variances. This model was used to estimate the probabilities for a Bayesian classifier, which then classified 30 short sounds of oboe and sax with an accuracy rate of 85%.
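A minimal sketch of a Gaussian naive Bayesian classifier in this spirit follows. It is a simplification (one Gaussian per class per feature, rather than the K-means-derived mixture of [20]), and the toy oboe/sax feature values are hypothetical:

```python
import math
from collections import defaultdict

def fit_gaussian_nb(data):
    """Estimate class priors plus per-class, per-feature mean and
    variance from (feature_vector, label) pairs."""
    by_class = defaultdict(list)
    for features, label in data:
        by_class[label].append(features)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        variances = [
            max(sum((x - m) ** 2 for x in col) / n, 1e-9)  # floor avoids /0
            for col, m in zip(zip(*rows), means)
        ]
        model[label] = (n / len(data), means, variances)
    return model

def nb_classify(model, features):
    """Pick the class maximizing log prior + summed log Gaussian
    likelihoods (independence lets us sum one term per feature)."""
    def log_posterior(label):
        prior, means, variances = model[label]
        lp = math.log(prior)
        for x, m, v in zip(features, means, variances):
            lp += -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
        return lp
    return max(model, key=log_posterior)

# Hypothetical 2-D cepstral-style features for two instruments.
data = [
    ((1.0, 2.0), "oboe"), ((1.1, 1.9), "oboe"),
    ((3.0, 0.5), "sax"), ((3.1, 0.6), "sax"),
]
model = fit_gaussian_nb(data)
print(nb_classify(model, (1.05, 1.95)))  # query near the oboe cluster
```

Note how learning really is just counting and averaging, with no search, exactly as described above.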
Martin [14] enhanced a similar Bayesian classifier with context-dependent feature selection procedures, rule-one-out category decisions, beam search, and Fisher discriminant analysis for estimating the maximum a posteriori probabilities. In [13] the performance of this system was better than that of a K-NN algorithm at the instrument level (71% accuracy) and equivalent to it at the family level (85% accuracy).

Discriminant Analysis

Classification using categories or labels that have been previously defined can be done with the help of discriminant analysis, a technique related to multivariate analysis of variance and multiple regression. Discriminant analysis attempts to minimize the ratio of within-class scatter to between-class scatter and builds a definite decision region between the classes. It provides linear, quadratic or logistic functions of the variables that "best" separate cases into two or more predefined groups, but it is also useful for determining which features are the most discriminative and which groups are the most alike or different. One possible drawback of the technique is its reduced generalization power, although jackknife tests (cross-validating with leave-one-case-out) can protect against overfitting to the observed data. Surprisingly, the only study using this technique, and not thoroughly at that, has been the one by Martin and Kim. They only used LDA to estimate the mean and variance of the Gaussians of each class, to be fed to an enhanced naive Bayesian classifier. Perhaps it is commonly assumed that the classification problem is much more complex than one of quadratic estimation, but that means taking for granted something that has not been experimentally verified, and maybe it should be.
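For the two-class case, the core of the technique can be sketched as a Fisher-style linear discriminant. This sketch assumes a diagonal pooled covariance (a simplification of full LDA that avoids matrix inversion), and the feature values and family labels are hypothetical:

```python
def fit_lda_2class(class_a, class_b):
    """Fisher-style linear discriminant for two classes, assuming a
    diagonal pooled covariance (a simplification of full LDA)."""
    def mean(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]
    m_a, m_b = mean(class_a), mean(class_b)
    dims = range(len(m_a))
    # Pooled per-feature within-class variance.
    pooled = []
    for i in dims:
        s = sum((r[i] - m_a[i]) ** 2 for r in class_a)
        s += sum((r[i] - m_b[i]) ** 2 for r in class_b)
        pooled.append(max(s / (len(class_a) + len(class_b) - 2), 1e-9))
    # Discriminant direction; threshold midway between projected means.
    w = [(m_a[i] - m_b[i]) / pooled[i] for i in dims]
    thresh = sum(w[i] * (m_a[i] + m_b[i]) / 2 for i in dims)
    return w, thresh

def lda_classify(w, thresh, x, labels=("a", "b")):
    """Project x onto w and compare against the threshold."""
    proj = sum(wi * xi for wi, xi in zip(w, x))
    return labels[0] if proj > thresh else labels[1]

# Hypothetical 2-D features for two instrument families.
class_a = [(1.0, 1.0), (1.2, 0.8)]   # e.g. "strings"
class_b = [(3.0, 3.0), (2.8, 3.2)]   # e.g. "brass"
w, thresh = fit_lda_2class(class_a, class_b)
print(lda_classify(w, thresh, (1.1, 0.9), ("strings", "brass")))
```

The direction `w` weights each feature by the ratio of between-class separation to within-class scatter, which is also why the method doubles as a measure of feature discriminability.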
Following this line, in a pilot study carried out in our laboratory with 120 sounds from 8 classes and 3 families, we obtained 85% accuracy (jackknifed: 75%) using quadratic discriminant functions in two steps (sounds are first assigned to a family, and then specifically classified). Given that the features we used were optimized not for segmentation but for searching by similarity, we expect to obtain still better results when we include other valuable features.

Note 3: The feature relevance was determined with a genetic algorithm.

Note 4: Here "naive" means that the method assumes feature independence.

Binary trees

Binary trees, in different formulations, are pervasively used for machine learning and classification tasks. They are constructed top-down, beginning with the feature that seems the most informative, that is, the one that maximally reduces entropy. Branches are then created for each of the different values of this descriptor (in the case of non-binary-valued descriptors, a procedure for dichotomic partition of the value range must be defined). The training examples are sorted to the appropriate descendant node, and the entire process is then repeated recursively on the examples of each descendant node in turn. Once the tree has been built, it can be pruned to avoid overfitting and to remove secondary features. Although building a binary tree is a recursive procedure, it is still faster than training a neural network. Binary trees are best suited to approximating discrete-valued target functions, but they can be adapted to real-valued features, as in Jensen's binary decision tree [21], which exemplifies their application to instrument classification. In his system the trees are constructed by asking a large number of questions (e.g. "attack time longer than 60 ms?"); then, for each question, the data are split into two groups, the goodness of the split (average entropy) is calculated, and finally the question that yields the best goodness is chosen. Once the tree has been built using the learning set, it can be used for classifying new sounds (each leaf corresponds to one specific class), but also for making explicit rules about which features best discriminate one instrument from another.
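The question-selection step of such a tree can be sketched as follows: try every candidate threshold question, score each by the size-weighted average entropy of the resulting split, and keep the best. The attack-time values and articulation labels are hypothetical, and this shows only one split, not the full recursive tree construction:

```python
import math

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum(
        (labels.count(c) / n) * math.log2(labels.count(c) / n)
        for c in set(labels)
    )

def best_split(examples):
    """Try every (feature, threshold) question and keep the one whose
    split has the lowest size-weighted average entropy."""
    n = len(examples)
    best = None
    for i in range(len(examples[0][0])):
        for feats, _ in examples:
            t = feats[i]  # candidate threshold taken from the data
            left = [lab for f, lab in examples if f[i] <= t]
            right = [lab for f, lab in examples if f[i] > t]
            if not left or not right:
                continue  # degenerate split, skip it
            score = (len(left) / n) * entropy(left) \
                  + (len(right) / n) * entropy(right)
            if best is None or score < best[0]:
                best = (score, i, t)
    return best  # (average entropy, feature index, threshold)

# Hypothetical 1-D examples: attack time (s) vs. articulation class.
examples = [
    ((0.02,), "plucked"), ((0.03,), "plucked"),
    ((0.30,), "bowed"), ((0.40,), "bowed"),
]
print(best_split(examples))  # best question: "attack time <= 0.03?"
```

A full tree would recurse on each half until the leaves are pure, then prune; the split criterion above is the part that yields human-readable rules.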
Unfortunately, results regarding the classification of new sounds have not yet been published (but see Jensen's thesis [22] for an attempt using log-likelihood classification functions). An application of the C4.5 algorithm [23] can be found in [24], where a database of 18 classes and 62 features was classified with accuracy rates between 64% and 68%, depending on the test procedure. A final example of a binary tree for audio classification, although not specifically tested on musical sounds, is that of Foote [25]. His tree-based supervised vector quantization with maximization of mutual information uses 12 Mel-cepstral coefficients plus energy, computed frame by frame, to partition the feature space into a number of discrete regions. Each split decision in the tree involves comparing one element of the vector with a fixed threshold, chosen to maximize the mutual information between the data and the associated labels that indicate the class of each datum. Once the tree is built, it can be used as a classifier by computing histograms of class frequencies in each leaf of the tree, and using distance measures between the histogram templates derived from the training data and the resulting histogram for the test sound.

Support Vector Machines

SVMs are a recently developed technique based on statistical learning theory [26]. The basic training principle behind SVMs is finding the optimal linear hyperplane such that the expected classification error for unseen test samples is minimized (i.e. they seek good generalization performance). According to the structural risk minimization inductive principle, a function that classifies the training data accurately and that belongs to a set of functions with the lowest complexity will generalize best, regardless of the dimensionality of the input space. Based on this principle, a linear SVM uses a systematic approach to find a linear function with the lowest complexity.
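A very crude sketch of the linear case follows, using stochastic sub-gradient descent on the regularized hinge loss rather than the quadratic program that proper SVM training solves; it finds *a* large-margin separator, not the provably optimal one. All data values are hypothetical toy features:

```python
def train_linear_svm(data, epochs=300, lr=0.05, lam=0.001):
    """Crude linear max-margin classifier: stochastic sub-gradient
    descent on the L2-regularized hinge loss (a stand-in for the
    quadratic program of real SVM training). Labels must be +1 / -1."""
    dim = len(data[0][0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        for x, y in data:
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:   # inside the margin: hinge sub-gradient step
                w = [wi - lr * (lam * wi - y * xi) for wi, xi in zip(w, x)]
                b += lr * y
            else:            # outside the margin: only the weights shrink
                w = [wi * (1 - lr * lam) for wi in w]
    return w, b

def svm_predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1

# Hypothetical 2-D features; labels +1 / -1 for two instruments.
data = [
    ((0.0, 0.0), -1), ((0.0, 1.0), -1),
    ((2.0, 2.0), 1), ((3.0, 2.0), 1),
]
w, b = train_linear_svm(data)
print([svm_predict(w, b, x) for x, _ in data])
```

The `margin < 1` test is what enforces the margin: correctly classified points far from the boundary contribute nothing to the updates, which is why only a few "support" points end up shaping the solution.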
For linearly non-separable data, SVMs can (nonlinearly) map the input to a high-dimensional feature space where a linear hyperplane can be found. Although there is no guarantee that a linear solution will always exist in the high-dimensional space, in practice it is quite feasible to construct a working solution. In sum, training an SVM is equivalent to solving a quadratic programming problem with linear constraints and as many variables as
data points. An SVM was used in [27] for the classification of eight solo instruments playing musical scores by well-known composers. The best accuracy rate was 70%, using 16 Mel-cepstral coefficients and sound segments 0.2 seconds long. When classification was attempted on longer segments an improvement was observed (83%), although two instruments remained very difficult to classify (trombone and harpsichord). Another noteworthy feature of this study is the use of truly independent sets for learning and for testing (mainly solo phrases from commercial recordings).

Artificial Neural Networks

A very simple feedforward network with a backpropagation training algorithm was used, along with K-NN, in [12]. The network (a 3/5/4 architecture) learnt to classify sounds from 4 very different instruments (piano, marimba, accordion and guitar) with high accuracy (best figure 97%), although slightly better figures were obtained with the simpler K-NN algorithm (see above). A comparison between a multilayer perceptron, a time-delay network, and a hybrid self-organizing network/radial basis function network can be found in [28]. Although very high success rates were reported (97% for the perceptron, 100% for the time-delay network, and 94% for the self-organizing network), it should be noted that the experiments used only 40 sounds from 10 different classes, spanning only one octave. Examples of self-organizing map [29] usage can be found in [30], [31], [32], [33]. All these studies use some kind of auditory pre-processing to obtain the features that are fed to the network, then build the map, and finally compare the clustering of sounds made by the network with human subjects' similarity judgments ([1], [34]). From these maps and comparisons the authors propose timbral spaces to be explored, or confirm/disconfirm theoretical models that explain the data.
It can be seen, then, that the classifications we get from these kinds of systems are not directly usable for instrument recognition, as they are not provided with any label to be learnt. Nevertheless, a mechanism for associating their output clusters with specific labels seems feasible (e.g. the radial basis function used by Cemgil, see above). The ARTMAP architecture [35] implements this strategy through a complex topology: an associative memory is connected with an input network that self-organizes binary input patterns, with an output network that does the same with binary and real-valued patterns, and with an orienting subsystem that may alter the input processing depending on the output and associative memory states. Fragoulis et al. [36] successfully used an ARTMAP for the classification of 5 instruments with the help of only ten features (slopes of the first five partials, time delays of the first 4 partials relative to the fundamental, and high-frequency energy). The errors (2%) were attributed to not having taken different playing dynamics into account in the training phase. The most thorough study on instrument classification using neural networks is perhaps that of Kostek [37], although it has been somewhat neglected in the relevant literature. Her team has carried out several studies [38] [39] on network architecture, training procedures, and the number and type of features, although the number of classes to be classified has always been too small. They have used a feedforward NN with one hidden layer, and their classes were trombone, bass trombone, English horn and contrabassoon, instruments with somewhat similar sounds. Accuracy rates tend to be higher than 90%, although they vary depending on the type of training and the number of descriptors.
Although some ANN architectures are capable of approximating any function, and neural networks are therefore a good choice when the function to be learned is not known in advance, they have some drawbacks: the computation time for the learning phase is very long, tweaking their parameters can be tedious and prohibitive, and over-fitting (for example through an excessive number of badly selected examples) can degrade their generalization capabilities. On the positive side, although the figures from the available studies do not quite outperform other, simpler algorithms, neural networks may
exhibit one advantage over some of them: once the net has learnt, the classification decision is very fast (compared to K-NN or to binary trees).

Higher Order Statistics

When signals have Gaussian density distributions, we can describe them thoroughly with second-order measures such as the autocorrelation function or the spectrum. Some authors claim that musical signals, having been generated through non-linear processes, do not fit a Gaussian distribution. In that case, using higher-order statistics or polyspectra, such as the skewness of the bispectrum and the kurtosis of the trispectrum, it is possible to capture information that would be lost with a simpler Gaussian model. With these techniques, and using a maximum likelihood classifier, Dubnov and his collaborators [40] have shown that discrimination between 18 instruments from the string, woodwind and brass families is possible, although they only provide figures for a classification experiment that used generic classes of sounds (not musical notes).

Rough Sets

Rough sets [41] are a novel technique for evaluating the relevance of the features used for description and classification. The technique has been developed in the realm of knowledge-based discovery systems and data mining (although similar, it is not to be confused with fuzzy sets). In rough set theory, any set of similar or indiscernible objects is called an elementary set and forms a basic granule of knowledge about the universe; sets that cannot be characterized exactly in terms of these granules are considered rough (imprecise or vague). Vague concepts cannot be characterized in terms of information about their elements; however, they may be replaced by two precise concepts, respectively called the lower approximation and the upper approximation of the vague concept. The lower approximation consists of all objects that surely belong to the concept, whereas the upper approximation contains all objects that possibly belong to it.
The difference between the two approximations is called the boundary region of the concept. The assignment of an object to a set is made through a membership function that has a probabilistic flavor. Once information is conveniently organized into information tables, the technique can be used to assess the degree of vagueness of the concepts and the interdependency of attributes, and therefore the alternatives for reducing the complexity of the table without reducing the information it provides. Information tables relating cases and features can be interpreted as conditional decision rules of the form IF {feature x is observed} THEN {the object belongs to class y}, and consequently they can be used as classifiers. An elementary but formal introduction to rough sets can be found in [42]. Applications of the technique to different problems, including signal processing [43], along with a discussion of software tools implementing these kinds of formalisms, are presented in [44]. When applied to instrument classification, [45] reports accuracy rates higher than 80% for the same 4 instruments mentioned in the neural networks section. The main cost of using rough sets, however, is the need for quantization of feature values, a non-trivial issue indeed, because in that study different results were obtained depending on the quantization method (see also [46] and [47]). On the other hand, when compared to neural networks or fuzzy-set rules, rough sets have several benefits: they are cheaper in terms of computational cost, and the results are similar to those obtained with the other two techniques.

Towards classification of sounds in more complex contexts

Although we have found several techniques and features that provide a high percentage of success when classifying isolated sounds, it is not clear that they can be applied directly and successfully to the more complex task of segmenting monophonic phrases or complex mixtures.
Additionally, many of them would not fulfil the requirements discussed in [14] for real-world sound-source recognition systems. Instead of assuming a preliminary source separation stage that would make those algorithms directly applicable, we are committed to an approach of signal understanding without separation [48].
This means that, with relatively simple signal-processing and pattern-classification techniques, we elaborate judgments about the musical qualities of a signal (hence describing content). Given that desideratum, we can enumerate some apparently useful strategies to complement the previously discussed methods:

Content awareness (i.e. using metadata when available): the MPEG-7 standard provides descriptors that can help to partially delimit the search space for instrument classification. For example, if we know in advance that the recording is a string quartet, or a heavy-metal song, several hypotheses regarding the sounds to be found can be used to guide the process.

Context awareness: contextual information can be conveyed not only from metadata, or from models in a top-down way; it can also spread from local computations at the signal level, using descriptors derived from the analysis of groups of frames. Note-transition analysis, for example, may provide a suitable context [49].

Use of synchronicities and asynchronicities: co-modulations or the temporal coherence of partials may be used for inferring different sources, as some CASA systems do [50;8].

Use of spatial cues: in stereophonic recordings we can find systematic instrument positioning that can be tracked to reduce the candidate classes.

Use of partial or incomplete cues: in contrast with the problems of source separation or analysis for synthesis/transformation, our problem does not demand any complete characterization or separation of signals; consequently, partial or incomplete cues might suffice.

Use of neglected features: for example, articulations between notes, expressive features (e.g. vibrato, portamento), or what have been called the specificities of instrument sounds [3].

Combining different subsystems: different procedures can make different estimations and errors.
Therefore, a wise combination may yield better results than trying to figure out which procedure is best, or what is good in each one [51] [52]. Combinations can be made at different processing stages: at feature computation (concatenating features), at the output of the classification procedures (combining hypotheses), or in a serial layout where the output of one classification procedure is the input to another (as Martin's MAP + Fisher projection exemplifies).

Use of more powerful algorithms for representing sequences of states: Hidden Markov Models [53] are good candidates for representing the long sequences of feature vectors that define an instrument sound, as [54] have demonstrated for generic sounds.

Conclusions

We have discussed the most commonly used techniques for instrument classification. Although they provide a decent starting point for the more realistic problem of detection and segmentation of musical instruments in real-world audio, conclusive statements based on performance figures can be misleading because of the inherent biases of each algorithm. Enhancing or tuning them for the specificities of realistic musical signals seems a more important task than selecting the best existing algorithm. Consequently, other complementary strategies should be addressed in order to achieve the kind of signal understanding we aim at.

References

[1] Grey, J. M., "Multidimensional perceptual scaling of musical timbres," Journal of the Acoustical Society of America, 61.
[2] Krumhansl, C. L., "Why is musical timbre so hard to understand?," in Nielzen, S. and Olsson, O. (eds.), Structure and Perception of Electroacoustic Sound and Music. Amsterdam: Elsevier, 1989.
[3] McAdams, S., Winsberg, S., de Soete, G., and Krimphoff, J., "Perceptual scaling of synthesized musical timbres: common dimensions, specificities, and latent subject classes," Psychological Research, 58.
[4] Lakatos, S., "A common perceptual space for harmonic and percussive timbres," Perception and Psychophysics, in press.
[5] Peeters, G., McAdams, S., and Herrera, P., "Instrument sound description in the context of MPEG-7," Proc. of the ICMC.
[6] ISO/MPEG-7, Overview of the MPEG-7 Standard. Electronic document.
[7] Green, P. D., Brown, G. J., Cooke, M. P., Crawford, M. D., and Simons, A. J. H., "Bridging the gap between signals and symbols in speech recognition," in Ainsworth, W. A. (ed.), Advances in Speech, Hearing and Language Processing. JAI Press, 1990.
[8] Ellis, D. P. W., "Prediction-driven computational auditory scene analysis," Ph.D. thesis, MIT, Cambridge, MA.
[9] Bell, A. J. and Sejnowski, T. J., "An information maximisation approach to blind separation and blind deconvolution," Neural Computation, 7 (6).
[10] Varga, A. P. and Moore, R. K., "Hidden Markov Model decomposition of speech and noise," Proc. of the ICASSP.
[11] Mitchell, T. M., Machine Learning. Boston, MA: McGraw-Hill.
[12] Kaminskyj, I. and Materka, A., "Automatic source identification of monophonic musical instrument sounds," Proc. of the IEEE International Conference on Neural Networks, 1.
[13] Martin, K. D. and Kim, Y. E., "Musical instrument identification: a pattern-recognition approach," Proc. of the 136th meeting of the Acoustical Society of America.
[14] Martin, K. D., "Sound-source recognition: a theory and computational model," Ph.D. thesis, MIT, Cambridge, MA.
[15] Eronen, A. and Klapuri, A., "Musical instrument recognition using cepstral coefficients and temporal features," Proc. of the ICASSP.
[16] Fujinaga, I., Moore, S., and Sullivan, D. S., "Implementation of exemplar-based learning model for music cognition," Proc. of the International Conference on Music Perception and Cognition.
[17] Fujinaga, I., "Machine recognition of timbre using steady-state tone of acoustical musical instruments," Proc. of the 1998 ICMC.
[18] Fraser, A. and Fujinaga, I., "Toward real-time recognition of acoustic musical instruments," Proc. of the ICMC.
[19] Fujinaga, I. and MacMillan, K., "Realtime recognition of orchestral instruments," Proc. of the ICMC.
[20] Brown, J. C., "Musical instrument identification using pattern recognition with cepstral coefficients as features," Journal of the Acoustical Society of America, 105 (3).
[21] Jensen, K. and Arnspang, J., "Binary decision tree classification of musical sounds," Proc. of the 1999 ICMC.
[22] Jensen, K., "Timbre models of musical sounds," Ph.D. thesis, University of Copenhagen.
[23] Quinlan, J. R., C4.5: Programs for Machine Learning. San Mateo, CA: Morgan Kaufmann.
[24] Wieczorkowska, A., "Classification of musical instrument sounds using decision trees," Proc. of the 8th International Symposium on Sound Engineering and Mastering, ISSEM'99.
[25] Foote, J. T., "A similarity measure for automatic audio classification," Proc. of the AAAI 1997 Spring Symposium on Intelligent Integration and Use of Text, Image, Video, and Audio Corpora, Stanford.
[26] Vapnik, V., Statistical Learning Theory. New York: Wiley.
[27] Marques, J., "An automatic annotation system for audio data containing music," BS and ME thesis, MIT, Cambridge, MA.
[28] Cemgil, A. T. and Gürgen, F., "Classification of musical instrument sounds using neural networks," Proc. of SIU.
[29] Kohonen, T., Self-Organizing Maps. Berlin: Springer-Verlag.
[30] Feiten, B. and Günzel, S., "Automatic indexing of a sound database using self-organizing neural nets," Computer Music Journal, 18 (3).
[31] Cosi, P., De Poli, G., and Lauzzana, G., "Auditory modelling and self-organizing neural networks for timbre classification," Journal of New Music Research, 23.
[32] Cosi, P., De Poli, G., and Parnadoni, P., "Timbre characterization with Mel-Cepstrum and neural nets," Proc. of the 1994 ICMC, 1994.
[33] Toiviainen, P., Tervaniemi, M., Louhivuori, J., Saher, M., Huotilainen, M., and Näätänen, R., "Timbre similarity: convergence of neural, behavioral, and computational approaches," Music Perception, 16 (2).
[34] Wessel, D., "Timbre space as a musical control structure," Computer Music Journal, 3 (2).
[35] Carpenter, G. A., Grossberg, S., and Reynolds, J. H., "ARTMAP: supervised real-time learning and classification of nonstationary data by a self-organizing neural network," Neural Networks, 4.
[36] Fragoulis, D. K., Avaritsiotis, J. N., and Papaodysseus, C. N., "Timbre recognition of single notes using an ARTMAP neural network," Proc. of the 6th IEEE International Conference on Electronics, Circuits and Systems, Paphos, Cyprus.
[37] Kostek, B., Soft Computing in Acoustics: Applications of Neural Networks, Fuzzy Logic and Rough Sets to Musical Acoustics. Heidelberg: Physica-Verlag.
[38] Kostek, B. and Krolikowski, R., "Application of artificial neural networks to the recognition of musical sounds," Archives of Acoustics, 22 (1).
[39] Kostek, B. and Czyzewski, A., "An approach to the automatic classification of musical sounds," Proc. of the AES 108th Convention, Paris.
[40] Dubnov, S., Tishby, N., and Cohen, D., "Polyspectra as measures of sound texture and timbre," Journal of New Music Research, 26 (4).
[41] Pawlak, Z., "Rough sets," Journal of Computer and Information Science, 11 (5).
[42] Pawlak, Z., "Rough set elements," in Polkowski, L. and Skowron, A. (eds.), Rough Sets in Knowledge Discovery. Heidelberg: Physica-Verlag.
[43] Czyzewski, A., "Soft processing of audio signals," in Polkowski, L. and Skowron, A. (eds.), Rough Sets in Knowledge Discovery. Heidelberg: Physica-Verlag, 1998.
[44] Polkowski, L. and Skowron, A. (eds.), Rough Sets in Knowledge Discovery. Heidelberg: Physica-Verlag.
[45] Kostek, B., "Soft computing-based recognition of musical sounds," in Polkowski, L. and Skowron, A. (eds.), Rough Sets in Knowledge Discovery. Heidelberg: Physica-Verlag.
[46] Kostek, B. and Wieczorkowska, A., "Parametric representation of musical sounds," Archives of Acoustics, 22 (1), pp. 3-26.
[47] Wieczorkowska, A., "Rough sets as a tool for audio signal classification," in Ras, Z. W. and Skowron, A. (eds.), Foundations of Intelligent Systems: Proc. of the 11th International Symposium on Foundations of Intelligent Systems (ISMIS-99). Berlin: Springer-Verlag, 1999.
[48] Scheirer, E. D., "Music-listening systems," Ph.D. thesis, MIT, Cambridge, MA.
[49] Kashino, K. and Murase, H., "Music recognition using note transition context," Proc. of the 1998 IEEE ICASSP, Seattle.
[50] Cooke, M., Modelling Auditory Processing and Organisation. Cambridge: Cambridge University Press.
[51] Elder IV, J. F. and Ridgeway, G., "Combining estimators to improve performance," Proc. of the 5th International Conference on Knowledge Discovery and Data Mining.
[52] Ellis, D. P. W., "Improved recognition by combining different features and different systems," to appear in Proc. of AVIOS-2000, San Jose, CA, May 2000.
[53] Rabiner, L. R., "A tutorial on Hidden Markov Models and selected applications in speech recognition," Proc. of the IEEE, 77.
[54] Zhang, T. and Jay Kuo, C.-C., "Heuristic approach for generic audio data segmentation and annotation," ACM Multimedia Conference, Orlando, FL.
[55] Michie, D., Spiegelhalter, D. J., and Taylor, C. C., Machine Learning, Neural and Statistical Classification. Chichester: Ellis Horwood, 1994.