Computational Rhythm Similarity Development and Verification Through Deep Networks and Musically Motivated Analysis


NEW YORK UNIVERSITY

Computational Rhythm Similarity Development and Verification Through Deep Networks and Musically Motivated Analysis

by Tlacael Esparza

Submitted in partial fulfillment of the requirements for the Master of Music in Music Technology in the Department of Music and Performing Arts Professions in The Steinhardt School, New York University

Advisor: Juan Bello

January 2014

Abstract

In developing computational measures of rhythmic similarity in music, validation methods typically rely on proxy classification tasks on common datasets, equating rhythm similarity to genre. In this paper, a novel state-of-the-art system for rhythm similarity is proposed that leverages deep network architectures for feature learning and classification, using this standard approach of genre classification on a well-known dataset for validation. In addressing this method of validation, an extensive cross-disciplinary analysis of the performance of this system is undertaken. In addition to analyses through MIR, machine learning and statistical methods, a detailed study of both the results and the dataset is performed from a musicological perspective, delving into the musical, historical and cultural specifics that impact the system. Through this study, insights are gained in further gauging the abilities of this measure of rhythm similarity beyond classification accuracy, as well as a deeper understanding of this system design and validation approach as a musically meaningful exercise.

Acknowledgements

I would like to thank Professor Juan Bello for his guidance, encouragement and dedication to my education, and Eric Humphrey, without whom I would have been lost in a deep network somewhere. Many people have helped me along the way with this work and I am very grateful for their time and generosity. These include: Uri Nieto, Mary Farbood, Adriano Santos and Professor Larry Crook, as well as Carlos Silla and Alessandro Koerich for the Latin Music Dataset and their insights into the data collection process. And most importantly, thanks to my family and my fiancée, Ashley Reeb, for their unwavering emotional, spiritual, intellectual and financial support.

Contents

Abstract
Acknowledgements
List of Figures
List of Tables
1 Introduction
2 Explication and Literature Review
  2.1 Computational Music Similarity Measures
  2.2 Rhythm Similarity
    2.2.1 Onset Patterns
  2.3 Machine Learning
    2.3.1 Deep Networks for Feature Learning and Classification
3 Approach
  3.1 Onset Patterns Implementation
  3.2 Deep Network Implementation
    3.2.1 Applying the Deep Network
  3.3 Analytic Approach
4 System Configuration and Results
  4.1 Dataset
  4.2 Methodology
  4.3 OP Parameter Tuning
    4.3.1 Periodicity Resolution
    4.3.2 Frequency Resolution
  4.4 Deep Network Parameterization
    4.4.1 Layer Width
    4.4.2 Network Depth
  4.5 Optimal System Configuration
5 OP Rhythm Similarity Analysis
  5.1 Tempo Dependence
  5.2 Fine Grain Rhythmic Similarity
6 Dataset Observations
  6.1 Ground Truths
  6.2 Artist/Chronological Distribution
  6.3 Brazilian Skew
7 System Verification Issues
  7.1 Rhythm-Genre Connection
  7.2 Inter-genre Influence
  7.3 Rhythm as Genre Signifier
8 Conclusions
Bibliography

List of Figures

2.1 Extraction of Onset Patterns (OP) from the audio signal
4.1 Effect of P on classification. The highest result is highlighted in blue, while significantly different results are in red
4.2 Effect of F on classification. The highest result is highlighted in blue, while significantly different results are in red
4.3 Mean comparison of ANOVA tests on network layer complexity in a 2-layer architecture shows significantly lower results for small M
5.1 Top: Progression from input to output shows increasingly compact genre representation. Bottom: Progression from input to output shows increasingly distant classes
5.2 Gaussian-modeled tempo distributions by genre in the LMD
6.1 Left: OP of a modern Sertaneja track. Right: OP of a Tango recording
6.2 Top: Geographical spread of Bolero vs. Brazilian genres. Bottom: Detail of geographical spread of Brazilian genres

List of Tables

2.1 Summary of main approaches in the literature for computational rhythm similarity
4.1 ANOVA results for classification scores with varying P values show that periodicity resolution is a significant factor
4.2 Classification accuracies for different features on the LMD
4.3 Classification accuracies by genre, ordered from highest classification score to lowest, show Brazilian genres generally performing worse than the rest
4.4 Confusion matrix shows classification affinities between Sertaneja and several other genres
4.5 Comparison of different classifiers on OP data. The proposed system outperforms all others by a margin of 2.23%
5.1 Results of binary logistic regression, with classification success as the dependent variable and BPM and density as inputs, show density is significant while BPM is not
5.2 Hosmer & Lemeshow test shows BPM and density data to be poor predictors of classification success
6.1 Feel breakdown by genre showing percentage of tracks in each genre that are swung
6.2 Comparison of actual genre feel versus predicted genre feel for LMD classification results
6.3 LMD infometrics

Chapter 1

Introduction

A fundamental goal in the field of music information retrieval (MIR) is to extract musically meaningful information from digitized audio signals through computational methods. This, in its vagueness and breadth, describes most MIR tasks. In practice, and with the field still in its relative infancy, these tasks have often been simplified to extracting musical feature representations that highlight basic characteristics like pitch, harmony, melody, tempo, timbre, structure and rhythm, among others. With the assumption that complex musical features such as mood or genre are signified by sets of fundamental musical attributes, it is hoped that these more abstract characteristics can be identified through combinations of these methods [1, 2].

There are many motivations for, as well as current successful applications of, this work. Pitch and beat tracking algorithms have found widespread use in digital audio workstations such as Pro Tools and Ableton Live, enabling pitch and beat correction, tempo estimation, and time-stretching of recorded audio. These functionalities have been used to great effect in current popular music and are often audibly detectable, as with the music of the artist T-Pain, known for heavy use of auto-tuning software. Beyond music production, these computational methods can be leveraged to analyze and annotate the ever-growing and intractably large collections of music that the digital age has enabled. With an approximated 75,000 official album releases in 2010 alone as an indication of scale [3], and with digital transmission the primary means of maintaining and consuming this music, a computational approach to annotating and cataloging these collections is highly desirable. Indeed, new digital-era companies that serve streams of music to users on demand, such as Spotify, SoundCloud and Pandora, have begun to employ many MIR methods (and researchers) for genre detection, playlist generation and music recommendation, among other services.

A main objective of this thesis is to examine and further develop computational methods of measuring rhythm similarity in music signals. The importance of rhythm to music almost needs no mention. Under composer Edgard Varèse's generous definition of music as "organized sound," rhythm remains fundamental in that time is one of the few dimensions along which sound can be organized. And so, with the goal of the MIR community to fully parse musical content through computational means, contending with rhythm is an important step in this endeavor.

Combining previous research on rhythm from the MIR literature with advances in machine learning, this work presents a state-of-the-art system for measuring rhythm similarity. In the hope of anchoring this abstracted computational process to its stated goal of extracting musically, and specifically rhythmically, meaningful information, this work makes a concerted effort, beyond what is common in the literature, to analyze not only the results, but also the dataset used and the system's design from a multi-disciplinary perspective. Using a standard MIR verification scheme on a well-known dataset, the Latin Music Dataset [4], through statistical analyses of dataset metadata, analysis of the dataset from a musical, cultural and historical perspective, and scrutiny of the basic assumptions built into the design, it is hoped that the system's musical relevance can be understood with greater clarity beyond the common classification-score measuring stick. Further, with a great deal of personal interest and domain knowledge in the subject (rhythm) and the specific area of application for this work (Latin rhythms/music), it is hoped that this research approach will provide a useful and elucidating look into the analysis of computational rhythm similarity measures, and also act as an encouragement to take on this level of scrutiny when developing computational methods for music applications in general.

Chapter 2

Explication and Literature Review

Much of MIR research has largely followed a standard and persistent design model in developing novel methods for parsing music signals. This model comprises two steps: one, a hand-crafted feature extraction stage that aims to single out a given musical characteristic from an input signal; and two, a semantic interpretation or classification stage applying some function that allows the mapping of this feature to a symbolic representation. This chapter takes a survey of previous work in developing both of these system components as they relate to the current task of measuring rhythm similarity. Sections 2.1 and 2.2 summarize standard approaches to feature extraction and review the various attempts at characterizing rhythm through feature design, highlighting the development of Onset Patterns, which serve as a jumping-off point for further research. Section 2.3 reviews improvements to feature extraction methods through the use of sophisticated machine learning algorithms, pointing to a blurring of the distinction between feature extraction and classification and setting the direction for this research.

2.1 Computational Music Similarity Measures

Though some feature extraction methods produce easily interpretable, musically relevant representations of the signal directly, certain feature representations are imbued with musical meaning only as measures of similarity. For instance, the output of a tempo detection algorithm, which looks for strong sub-sonic frequencies, can be interpreted easily by looking for a global maximum in the representation, revealing a beats-per-minute value, a standard unit of tempo.

Conversely, Mel-Frequency Cepstral Coefficients (MFCCs), widely used as a measure of timbre, are not musically interpretable on their own but, paired with a distance metric, can be used to identify sounds based on distance to a known example, a common application of which is instrument identification [5, 6]. In this paradigm of measuring similarity, musical facets can be seen as existing in some multi-dimensional space where similar representations are grouped closely together, and the feature extraction algorithm is a mathematical projection of a given signal into one of these spaces. Through this approach, a posteriori-defined properties such as rhythm and structure, esoteric qualities such as timbre, and complex characteristics such as mood can be inferred based on their distance to labelled examples in their respective feature space. In this way, classification supplies semantic meaning and a perceptually meaningful framework for analyzing these more complicated features. As an example of this, the previously mentioned quality of sound referred to as timbre is not easily defined, difficult to conceptualize, and its manifestations difficult to describe with language. However, agreeing that timbre is the feature that distinguishes the sound of one instrument from another, we can discuss timbre similarity through the task of matching multiple recordings of the same instrument, defining it in finite terms (i.e. the timbre of a flute vs. the timbre of a horn).

One of the major obstacles to this approach is the necessity of labeled datasets. The development, verification and interpretation of these algorithms rely on classification tasks on pre-labeled examples; without an example of flute timbre to match with, an unlabeled signal cannot be identified with this characteristic. Ideally, when developing new similarity features, the verification dataset suits the task well by representing the desired musical feature homogeneously within a given class, but datasets with feature-similarity-based ground-truths can be expensive and time-consuming to produce. To address this, the MIR community actively compiles and shares labeled datasets for these purposes; examples of widely used datasets include labeled audio of monophonic instruments (McGill University Master Samples), audio accompanied by various descriptive tags (Magnatagatune) and many datasets divided along genre membership (LMD, Ballroom Dance, Turkish Music). But in practice, this has often led to the use of datasets not created specifically for the given similarity measure of concern, employing a proxy identification with some other more easily identifiable characteristic. This includes the very common use of genre as a proxy for texture and rhythm similarity (which this thesis research employs knowingly) [7-12], and cover song identification for harmonic and melodic similarity [13-15]. Implicit in this approach is the assumption that ground-truths in these datasets correlate strongly enough with the musical characteristic being measured to provide meaningful classification results and system verification.
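To make this similarity-in-feature-space idea concrete, the sketch below labels an unknown recording by its Euclidean distance to labeled examples, summarizing timbre as a mean MFCC vector. It is a minimal illustration, assuming librosa for feature extraction; the file paths and labels are hypothetical placeholders, not part of this thesis's system.

```python
# Minimal sketch: 1-nearest-neighbor timbre matching in MFCC space.
# Assumes librosa is available; paths/labels are hypothetical.
import numpy as np
import librosa

def timbre_feature(path):
    """Summarize a recording's timbre as its mean MFCC vector."""
    y, sr = librosa.load(path, mono=True)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)  # shape (13, n_frames)
    return mfcc.mean(axis=1)

def nearest_label(query_path, labeled_paths):
    """Label a query by its distance to labeled examples in feature space."""
    q = timbre_feature(query_path)
    return min(labeled_paths,
               key=lambda lbl: np.linalg.norm(timbre_feature(labeled_paths[lbl]) - q))

# e.g. nearest_label("unknown.wav", {"flute": "flute.wav", "horn": "horn.wav"})
```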

2.2 Rhythm Similarity

Rhythm is a complex and broad musical concept with varying definitions in different contexts. Though rhythm exists in music on various time-scales and can describe anything from the timing of a melody to textural shifts and large-scale events, in this paper (and in the MIR literature on the subject), rhythm is taken to refer to regularly repeating sound events in time on the musical measure level (approximately 2-8 seconds); that is, rhythm as those looping musical phrases that a percussionist or a bass player in a dance band might play.

In the MIR literature, analyzing rhythmic similarity is distinct from rhythm description or transcription tasks. Where the latter seek to transform a musical signal into symbolic annotations or describe it directly in some manner, the former is concerned only with isolating rhythm as invariant across different instances, often using highly abstracted representations. Although [16] provides a framework for understanding rhythm similarity with symbolic sequences, for the rapidly growing body of recorded audio this kind of analysis is not applicable for several reasons: the vast majority of audio recordings typically do not have this level of annotation; providing this information by hand is prohibitively time-consuming; and computational methods of annotation remain ineffective [17]. Hence, signal-based methods for rhythm analysis are highly desirable.

From a conceptual perspective, isolating this level of rhythm as an invariance requires removing pitch, tempo and timbre dependence so that a rhythm played on two different instruments, using different pitches and at different speeds, will be recognized as the same. However, previous approaches tailor this list according to the intended application and sometimes include additional dependencies to be removed: phase, referring to the position of the start of a repeating rhythm; and temporal variance, referring to the evolution of a rhythm over longer time frames. Removing phase and temporal variance is a practical consideration specific to signal processing concerns; though a human can often easily recognize the beginning of a rhythmic pattern based on larger musical context, recognizing this computationally has been shown to be problematic [18], and when analyzing a signal, there is no guarantee that the beginning of the signal will correspond to the beginning of a repeating rhythm. Similarly, for track-wise classification, temporal invariance works towards minimizing the effects of portions of audio where there is no discernible rhythm or where changes in rhythm are not representative of the track on the whole.

Aside from a handful of intuitively motivated rhythm similarity systems that extract unit rhythm representations and preserve phase by employing often complicated heuristics to deduce the beginning of a phrase [19-21], most designs remove phase and take a more abstracted approach.

Though differing in important ways, they typically follow a common script: 1) calculate a novelty function from the signal, removing pitch-level frequencies and highlighting rhythmic content; 2) produce a periodicity and/or rhythmic decomposition of this novelty function by analyzing consecutive rhythm-phrase-length windows (typically 4 to 12 seconds), capturing local rhythm content on this scale; 3) transform this local representation by warping, shifting, resampling or normalizing to remove tempo-dependence; 4) aggregate local representations over time to produce a track-wise rhythm representation, removing temporal dependence. Table 2.1 shows a summary of these four steps for each of the main approaches in the literature. In this table, "Effected Dimension" refers to the musical dimension that each stage acts to either preserve or remove.

Though all of these methods remove pitch content in the novelty function calculation, the Scale Transform implemented in [11, 22, 23] and the Fluctuation and Onset Patterns implemented in [10, 11, 24] do preserve some level of timbre through multi-band novelty function representations. Most approaches produce local rhythm representations by using the Auto-Correlation Function (ACF) or Discrete Fourier Transform (DFT). [12] notes that these functions are beneficial for their preservation of the sequential order of rhythmic events, but as periodicity representations, where only rhythm-level frequencies are coded, they also remove phase. Rhythm Patterns [9, 25] diverge from this approach by including, in addition to periodicity analysis, Inter-Onset Intervals (IOI), which encode the spaces between onsets in the novelty function, and Rhythm Patterns, which are bar-length representations of the novelty function. This is a robust approach but relies on unreliable heuristics for extracting the downbeat used to determine the Rhythm Pattern. All of the approaches make some effort to remove temporal variance through temporal aggregation over all frames.

Approach (Effected Dimension)            | Novelty (Pitch/Timbre) | Local Rhythm (Phase)      | Rhythm Scaling/Morphing (Tempo)         | Aggregation (Temporal)
Beat histogram [26-29]                   | single                 | ACF                       | log-lag + shift detection, sub-sampling | histogram
Rhythm patterns [9, 25]                  | single                 | rhythm patterns, ACF, IOI | bar-length normalization                | k-means, histogram
Hybrid ACF/DFT [12, 30]                  | single                 | DFT, ACF and hybrids      | resampling with local tempo             | mean
Scale transform [11, 22, 23]             | single, multiband      | ACF, DFT                  | scale transform                         | mean
Fluctuation/Onset patterns [10, 11, 24]  | multiband              | DFT                       | log-frequency + subsampling             | mean

Table 2.1: Summary of main approaches in the literature for computational rhythm similarity.

The biggest divergences in these designs can be seen in the various methods for removing tempo-sensitivity from the representation. Noting that relative rhythmic structure can be compared more easily as a shift on a log scale than as a stretch on a linear scale, a log-lag mapping in the Beat histogram [28, 29] or a log-frequency mapping in the Onset Pattern [10] allows for reduced sensitivity to tempo changes, where only large tempo differences are noticeable. In [10, 29], the effect of tempo is further reduced by sub-sampling in the log-lag/frequency domain to produce a coarser representation.
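The appeal of this log mapping is easy to demonstrate in a few lines: a tempo change multiplies every periodicity by a constant factor, which on a log2 axis becomes a constant additive shift, so after coarse subsampling nearby tempos land in the same bins. A minimal sketch with illustrative values, not taken from any cited system:

```python
# A tempo change scales all periodicities by one factor; on a log2 axis that
# scaling becomes a uniform shift. Values are illustrative.
import numpy as np

periodicities = np.array([1.0, 2.0, 3.0, 4.0])    # Hz: peaks of some rhythm
faster = 1.5 * periodicities                       # same rhythm at 1.5x tempo
print(np.log2(faster) - np.log2(periodicities))    # constant shift: log2(1.5) everywhere
```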

[28] employs a shift in the log-lag domain to obtain a fully tempo-insensitive representation, but this relies on determining the proper shift value, which is prone to errors. Subject to similar problems are the methods employed in calculating the Hybrid [12] and, as mentioned before, Rhythm Pattern [9] representations, which rely on determining tempo and bar boundaries for tempo normalization and bar-length pattern identification. The octave errors common to tempo estimation algorithms are problematic here, leading to inconsistencies in rhythm representations for these methods. [22, 23] offer a robust, fully tempo-invariant approach that takes the scale transform of the ACF, resulting in a scale-invariant version of the already shift-invariant ACF and obviating the need to determine a shift amount to correct for the shift introduced by log-lag mapping.

Though the Beat Histogram and Hybrid ACF/DFT, if applied successfully, do result in a fully pitch-, tempo-, timbre- and phase-invariant rhythm representation, these are less useful when tasked with measuring rhythm similarity in the context of general similarity in multi-instrumental recorded music. Indeed, with most of these methods performing verification through genre identification on standard dance-based datasets, better classification success has been obtained with the Onset Pattern [10, 11] and the Scale Transform [22, 23], which both preserve some level of timbre dependence through a multi-band representation. This makes sense when the question is not "are these rhythms the same?" but rather "do these two tracks sound similar from a rhythmic perspective?", where the listener looks not only for similar rhythms but for similar orchestrations of those rhythms. While the former might be more conceptually pure with respect to rhythm similarity, the latter is more amenable to a genre classification task and as a tool for measuring general music similarity.

It merits repeating here that nearly all of the rhythm similarity studies mentioned above employ genre identification as a verification method. Recalling the common use of already available datasets in lieu of ones tailored for the task (in this case, a dataset labeled according to a specifically defined understanding of rhythm similarity), this has been a common and generally accepted practice in rhythm similarity research. In using dance-based datasets (LMD [11], Ballroom Dance [9-12, 25], Turkish Music [22, 23]), the underlying assumption behind this practice is not only that a rhythm can be reliably associated with a specific genre, but also that a given genre has a representative rhythm, justifying a bijective mapping from one to the other.

2.2.1 Onset Patterns

Taking the perspective that a timbre-sensitive approach to rhythm similarity is desirable for application to multi-instrumental music signals, and noting the importance of reducing reliance on error-prone heuristics in the design, the Onset Pattern and the Scale Transform stand out as promising approaches. The primary difference between these two lies in their approach to tempo-invariance, where the Scale Transform achieves full tempo invariance and the Onset Pattern shows invariance only for local tempo changes. As [11] effectively shows, tempo can be an important and identifying characteristic for certain genres. Although the motivation here is not genre identification, this suggests that perhaps tempo is also important for the perception of rhythm similarity. If two songs have the same rhythm but have tempos so different that they produce a different effect on the ear, this becomes a characteristic worth tracking. With this in mind, the Onset Pattern, which encodes only relatively large differences in tempo, is especially promising for further development as a general measure of rhythm similarity in music.

Onset Patterns (OP), as first described in [10] and refined in [11], are relatively straightforward to calculate and follow the signal pipeline mentioned above. As illustrated in Figure 2.1: 1) the signal is transformed to the time-frequency domain, processed to produce a novelty function through spectral flux, mean removal and half-wave rectification, and sub-sampled to produce log-spaced frequency sub-bands; 2) log2-frequency DFTs are applied to these sub-bands over 6-8 second windows to produce a periodicity representation; 3) each frame is subsampled in the frequency and periodicity dimensions to generalize the representation; and 4) frames are aggregated to produce a track-level representation. However, not detailed in these steps is the ordering of pooling stages, important to [10]'s design, that act to summarize multi-band information into a smaller representation. In particular, pooling occurs in the frequency dimension before and after calculating periodicity. Also left out is a normalization step to correct for artifacts from the various log-compression or pooling steps. However, justifications for these design choices, as well as the implementation of this normalization step, are left unclear in the original paper.

Figure 2.1: Extraction of Onset Patterns (OP) from the audio signal.

[11] refines this process by systematically testing different designs and parameters. Of particular note in its findings are the importance of window size in the periodicity calculation and the negligible effect of the specific ordering of pooling steps.

With an 8-second-long window (versus 6 seconds in [10]), a single pooling stage can be applied at the end with no effect on overall efficacy. Through this exhaustive search, [11] was able to improve OP performance beyond the original design. However, these results are based on necessarily limited parameter testing, constrained by time and feasibility and largely reliant on ignoring possible effects of interaction between parameter choices, highlighting the difficulties in optimizing feature extractions.

2.3 Machine Learning

Until recently, MIR research has taken the approach of designing algorithms to extract some explicit musical feature, using simple data models and distance measures for verification against ground truths (e.g. [10, 11]'s use of K-Nearest-Neighbor models with a Euclidean distance on OP features). However, for more complex musical characteristics, some in the field are turning their focus away from feature design to more sophisticated classification models and machine learning algorithms such as support vector machines [31-33], multi-layer perceptrons [34-36] and, more recently, deep network architectures [37, 38]. With the standardization of many feature designs such as chroma and MFCCs, among many others, these more advanced machine learning methods have been used to squeeze performance from these features or to extract more complex characteristics from sets of features. In this line of thought, rather than relying on some specific feature extraction method, the task is couched in terms of a data classification problem, which allows for leveraging learning algorithms to extract the relevant information based on a desired outcome.

2.3.1 Deep Networks for Feature Learning and Classification

[39] advocates giving learning algorithms, in particular deep network architectures, a more fundamental role in system development; with a sufficiently sophisticated learning algorithm, an optimally designed feature can be automatically extracted from a minimally processed input signal. This has the potential to solve several problems that have plagued MIR research for over a decade. Besides obviating the need to spend time rigorously testing algorithms in search of optimal designs and parameters, more importantly, it has the potential to capture musical characteristics that would otherwise be too complex or abstruse to formulate within a feature extraction signal chain. Hand-crafted algorithms are necessarily limited by our own perceptual and technical abilities, and an approach that relies on these alone to explore the function space of signal-to-feature mappings limits the range of possible solutions.

As initially demonstrated in [40] for music information retrieval, deep network architectures can be used to this end for their ability to model high-order functions through system depth. By cascading multiple affine transformation layers with simple nonlinear functions, they allow for a system complexity sufficient to model abstract musical characteristics. As [39] argues, using deep architectures to learn features for MIR follows naturally from the observation that many successful designed features in the literature can be described in terms of deep architectures themselves, combining multiple steps of affine transformations, nonlinearities and pooling. Taking the now standard calculation of MFCCs as an example, the steps include: a Mel-scale filterbank transform and discrete cosine projection (affine transformations); and log-scaling and complex modulus (nonlinearities). Hence, from this perspective the primary difference between feature designs is the choice of parameters. Further, given that these parameters can be optimized for a given task with deep networks, not only is it possible to learn better designs for features such as MFCCs, but this points to the prospect of learning better features altogether, unconstrained by the specifics of implementation. In the two-step paradigm described above, the distinction between feature extraction and classification here becomes obscured: step one is reduced to preparing the data for input to step two, a deep network where each layer is a progressively more refined feature representation and the final output layer performs classification.
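As a concrete reading of that observation, the sketch below re-expresses an MFCC-style computation as alternating affine maps and pointwise nonlinearities, with the filterbank and DCT matrices playing the role of fixed layer weights. This is an illustration of the argument, not the thesis's implementation, and the matrix shapes are assumptions.

```python
# MFCC-like pipeline viewed as a cascade of affine transforms and
# nonlinearities, mirroring the deep-architecture reading in the text.
import numpy as np

def mfcc_as_cascade(frame, mel_fb, dct_mat):
    """frame: windowed audio samples; mel_fb: (n_mels, n_fft//2 + 1);
    dct_mat: (n_mfcc, n_mels). Both matrices act as fixed 'weights'."""
    spectrum = np.fft.rfft(frame)
    z1 = np.abs(spectrum)          # nonlinearity: complex modulus
    z2 = mel_fb @ z1               # affine: Mel-scale filterbank transform
    z3 = np.log(z2 + 1e-10)        # nonlinearity: log-scaling
    return dct_mat @ z3            # affine: discrete cosine projection
```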

Deep architectures have found strong use in problems of feature learning for machine vision [41-44], but there has been relatively little research into this approach within the MIR community. Although SVMs, as well as other more sophisticated learning algorithms mentioned above, have been used to improve classification rates for designed features, efforts to learn the features themselves have been few. The initial successful uses of deep networks for music retrieval tasks in [40] and [45] show that learned features outperform MFCCs for genre classification and that sophisticated temporal pooling methods can be learned to incorporate multi-scale information for better results. Further use of deep networks in [38] shows that Convolutional Neural Nets, a specialized deep network architecture, can be used successfully for the task of chord recognition by extracting chord representations directly from several seconds of tiled pitch spectra. The positive results these approaches achieve are encouraging and justify further research into deep networks for feature design tasks such as rhythm similarity.

It is important to note that in these supervised learning schemes, the data used in training and classification plays a more fundamentally important role in feature design. With hand-crafted features, designs are based on some idealized concept of a given musical feature (i.e. tempo, timbre, pitch), and classification tasks serve merely as validation of the design. However, if the feature itself is learned in the process of supervised training of a classification model, it is necessarily shaped by the relationship between class labels and signal attributes in the dataset used for training. This is a positive characteristic of this approach since, as mentioned, it unhinges the perceived musical characteristic from a pre-determined algorithm, but it requires care and scrutiny when creating datasets or using pre-existing ones, as is common practice. Although research in unsupervised deep learning networks shows promise in reducing the reliance on large datasets [46], this work only considers fully-supervised methods.

Chapter 3

Approach

Based on the observations discussed in the previous chapter, this chapter presents a novel variation of the onset pattern approach. By treating the pooling and normalization stages of feature extraction as layers of a deep learning network, these stages can be optimized to the task of genre classification. In this way, the post-processing and pooling steps that are infeasible to optimize manually can be learned as an extension of the Onset Pattern feature in this deep architecture context. Once trained, this transformation is applied independently to all track-wise onset patterns and the outputs are averaged over time, yielding a summary representation for an entire track.

3.1 Onset Patterns Implementation

OP calculation here generally follows the processes outlined in [10] and [11], but for this application the calculation is simplified by removing several post-processing steps. Operating on mono recordings sampled at 22050 Hz, log2-frequency DFTs are taken over 1024-sample windows with a hop size of 256 samples. Frequencies span six octaves beginning at 150 Hz. The frequency resolution of this transform is kept variable to test optimal resolution levels in later experiments. Multi-band novelty functions are generated by computing spectral flux, removing the mean and half-wave rectifying the result. From here, eight-second-long windows of these novelty functions are analyzed at 0.5-second intervals to extract a periodicity spectrum by applying another log2-DFT spanning five octaves beginning at 0.5 Hz. This corresponds to a beats-per-minute range of 30 to 960 BPM, referred to here as the periodicity range. As with the log2-DFT used in the frequency multi-band calculation, periodicity resolution is left as a variable. This gives a frame-matrix with dimensions (F, P), where F is the number of frequency bins and P is the number of periodicity bins.
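A minimal numpy sketch of this pipeline follows, under the stated parameters (22050 Hz input, 1024/256 STFT, six octaves of bands from 150 Hz, 8-second periodicity windows at 0.5-second hops spanning 0.5-16 Hz). It is an illustrative reconstruction from the description above, not the author's code, and the log2-DFT is realized by direct correlation with log-spaced complex sinusoids.

```python
# Illustrative OP extraction: multi-band novelty via spectral flux, then a
# log2-spaced periodicity "DFT" over 8 s windows hopped every 0.5 s.
import numpy as np

SR, N_FFT, HOP = 22050, 1024, 256
FRAME_RATE = SR / HOP                        # novelty frames per second

def band_novelties(y, n_bands=240):
    """Per-band novelty: spectral flux, mean removal, half-wave rectification."""
    win = np.hanning(N_FFT)
    frames = np.lib.stride_tricks.sliding_window_view(y, N_FFT)[::HOP]
    mag = np.abs(np.fft.rfft(frames * win, axis=1))            # (T, N_FFT//2+1)
    edges = 150.0 * 2.0 ** np.linspace(0.0, 6.0, n_bands + 1)  # six octaves
    freqs = np.fft.rfftfreq(N_FFT, 1.0 / SR)
    bands = np.stack([mag[:, (freqs >= lo) & (freqs < hi)].sum(axis=1)
                      for lo, hi in zip(edges[:-1], edges[1:])], axis=1)
    flux = np.diff(bands, axis=0)             # spectral flux per band
    flux -= flux.mean(axis=0)                 # mean removal
    return np.maximum(flux, 0.0)              # half-wave rectification

def onset_patterns(novelty, n_periodicity=15):
    """Periodicity spectrum (0.5-16 Hz, log2-spaced) per 8 s window."""
    win_len, hop_len = int(8 * FRAME_RATE), int(0.5 * FRAME_RATE)
    p_freqs = 0.5 * 2.0 ** np.linspace(0.0, 5.0, n_periodicity)    # five octaves
    t = np.arange(win_len) / FRAME_RATE
    kernels = np.exp(-2j * np.pi * p_freqs[:, None] * t[None, :])  # (P, win)
    return np.array([np.abs(kernels @ novelty[s:s + win_len])      # (P, F)
                     for s in range(0, len(novelty) - win_len + 1, hop_len)])
```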

3.2 Deep Network Implementation

For feature learning and classification, this research makes heavy use of Eric Humphrey's in-development deep learning network Python libraries, informally presented in [47]. Formally, deep networks transform an input Z_1 into an output Z_L through composition of nonlinear functions f_l(\cdot \mid \theta_l), where l \in \{1, \ldots, L\} and L indicates total layer depth. For each layer, Z_l is the input to function f_l with parameters \theta_l. The network is composed of affine transformations, or fully-connected layers, where the outputs from one layer are distributed fully over the inputs to the next layer. Precisely:

F(Z_1 \mid \Theta) = f_L(\ldots f_2(f_1(Z_1 \mid \theta_1) \mid \theta_2) \ldots \mid \theta_L)    (3.1)

where F = [f_1, f_2, \ldots, f_L] is the set of layer functions, \Theta = [\theta_1, \theta_2, \ldots, \theta_L] is the corresponding set of layer parameters, the output of one layer is passed as the input to the next as f_l(Z_l) = Z_{l+1}, and the overall depth of the network is given by L. Layer f_l is a fully-connected, or affine, transformation, defined by the following:

f_l(Z_l \mid \theta_l) = h(W_l Z_l + b_l),  \theta_l = [W_l, b_l]    (3.2)

Here, the input Z_l is flattened to a column vector of length N_l and the dot product is computed with a weight matrix W_l of shape (M_l, N_l), followed by an additive vector bias term b_l of length M_l. Note that an affine layer transforms an N_l-dimensional input to an M_l-dimensional output, with M_l referred to as the width of the layer. The final operation is a point-wise nonlinearity h(\cdot), defined here as \tanh(\cdot), which is bounded on (-1, 1).

When used as a classification system, the first L - 1 layers of a deep network can be viewed as feature extractors, and the last layer, f_L, is simply a linear classifier. This output can be forced to behave as a probability mass function for membership to a given class by making the length of Z_L match the number of classes and by constraining the L_1-norm of the output to equal 1. This probability mass function P(\cdot) for an input Z_1 is achieved by applying the softmax operation \sigma(\cdot) to the output of the network, Z_L, defined as follows:

P(Z_1) = \sigma(Z_L) = \frac{\exp(Z_L)}{\sum_{m=1}^{M_L} \exp(Z_L[m])}    (3.3)
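A compact numpy rendering of Eqs. 3.1-3.3 makes the composition explicit. This is a sketch for illustration only; the thesis itself uses the library described above, not this code.

```python
# Forward pass of a fully-connected tanh network with a softmax output,
# mirroring Eqs. 3.1-3.3. params is a list of (W_l, b_l) pairs.
import numpy as np

def forward(z, params):
    """Compute P(Z_1) = softmax(f_L(... f_1(Z_1) ...))."""
    for W, b in params[:-1]:
        z = np.tanh(W @ z + b)               # f_l(Z_l) = tanh(W_l Z_l + b_l)
    W_L, b_L = params[-1]
    logits = W_L @ z + b_L                   # final linear classifier
    e = np.exp(logits - logits.max())        # numerically stable softmax
    return e / e.sum()                       # nonnegative, sums to 1
```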

In this supervised learning implementation, the output Z_L of this final layer is used to make a prediction, where the most likely class is determined by argmax(P(Z_1)), which, with a provided target value y, can be combined into a loss function. With the network defined as a probability mass function for class membership, it can be trained by iteratively minimizing this loss function using the negative log-likelihood of the correct class over a set of K observations:

\mathcal{L} = -\sum_{k=1}^{K} \log P(X_k = Y_k)    (3.4)

where X_k and Y_k are the input data and corresponding class label, respectively, of the k-th observation. This loss function can then be minimized through gradient descent, which iteratively searches for the minimum value of the loss function. Here, gradients are computed with K > 1, but much smaller than the total number of observations, by sampling data points from the training set and averaging the loss over the batch. Specifically, the update rule for \Theta is defined as its difference with the gradient of the scalar loss \mathcal{L} with respect to the parameters, weighted by the learning rate \eta, given by the following:

\Theta \leftarrow \Theta - \eta \frac{\partial \mathcal{L}}{\partial \Theta}    (3.5)

K = 100 is used, where the observations are drawn uniformly from each class, i.e. 10 observations of each genre, and a constant learning rate of \eta = 0.1. Learning proceeded for 3k iterations without early stopping or model selection. Note that all input data is preprocessed before input to the network to have zero mean and unit variance. This is done by calculating the mean and standard deviation over all data points, and was shown to significantly improve system performance. A minimal code sketch of this procedure is given at the end of this section.

3.2.1 Applying the Deep Network

Unlike previous classification schemes for rhythm similarity methods, track-level aggregation is held off until after frame-wise classification. Here, the deep network is applied independently to a time-series of onset patterns, producing a posteriorgram. Though there are alternative statistics that could be explored, such as the median or maximum, mean-aggregation is taken for each class prediction over the entire track.
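The training procedure above can likewise be sketched in a few lines. For brevity this shows the single-layer (softmax regression) case, for which the gradient has a closed form; a deeper network would obtain its gradients by backpropagation. The standardization, learning rate of 0.1 and class-balanced batch of K = 100 follow the text; everything else is illustrative.

```python
# Standardization plus one minibatch SGD step on the negative log-likelihood
# (Eqs. 3.4-3.5), shown for a single softmax layer.
import numpy as np

def standardize(X):
    """Zero mean and unit variance over all data points, as in the text."""
    return (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-8)

def sgd_step(W, b, X_batch, y_batch, lr=0.1):
    """Theta <- Theta - lr * dL/dTheta for one batch of (inputs, labels)."""
    logits = X_batch @ W.T + b                        # (K, n_classes)
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    P = e / e.sum(axis=1, keepdims=True)              # softmax posteriors
    P[np.arange(len(y_batch)), y_batch] -= 1.0        # dL/dlogits = P - onehot(y)
    return W - lr * (P.T @ X_batch) / len(y_batch), b - lr * P.mean(axis=0)
```

Track-level prediction then proceeds as in Section 3.2.1: the trained network is run on each onset-pattern frame and the resulting posteriorgram is mean-aggregated over the track, e.g. `posteriors.mean(axis=0)`.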

3.3 Analytic Approach

Chapter 2 highlights two connected issues that have prompted the analysis and discussion approach taken in this research. The first issue concerns the practice of genre identification as a proxy task for rhythm similarity. As mentioned, genre classification is the de facto proxy task for verifying rhythm similarity measures, and there remains a dearth in the literature of: 1) in-depth analysis of the suitability of genre for the given feature; 2) informed explication of the assumptions made in system design; and 3) proper examination of classification results that fully takes into account the contents of the dataset used. The facile assumptions made in system verification and the commonly accepted face-value interpretations of classification results belie either a general lack of commitment to musical relevance among researchers or a naiveté about it. As explored in [48], and stated confidently enough to be used as its title: "Classification Accuracy is Not Enough."

The second issue concerns the effect of the dataset on learned features in rhythm similarity. As discussed at the end of Section 2.3.1, in a deep network, features are learned based on provided labeled training examples. Hence, the feature's characteristics are molded by the class representations in the dataset. Though desirable when working with an ideal dataset for the task, in the case of this research, which uses genre membership as a proxy for rhythm similarity, there may be unintended (i.e. not rhythmically relevant) influences on the feature representation.

In an effort to better understand the musical significance of this rhythm similarity research beyond classification score, and in an attempt to account for these various factors, a multi-disciplinary approach is taken here to examine the results, the dataset and the system design. In addition to standard machine learning, MIR and statistical analysis methods, results are examined through rhythmic, musico-cultural and historical analyses, employing personal domain knowledge and borrowing heavily from the related musicological literature.

Chapter 4

System Configuration and Results

4.1 Dataset

In keeping with standard methods, a genre classification task is used to evaluate this measure of rhythm similarity, utilizing the well-known Latin Music Dataset (LMD). The LMD is a collection of Latin dance music comprising 3216 tracks¹, split into 10 distinct genres: Axé, Bachata, Bolero, Forró, Gaúcha, Merengue, Pagode, Salsa, Sertaneja and Tango. The LMD is used here for several reasons: for this dance-based dataset, genre is assumed to serve as a good proxy for rhythm; the size of the LMD compares favorably to other, smaller dance-based datasets such as the Ballroom set, a requisite for supervised deep-learning tasks; and, perhaps more importantly, this research stems from a deeper interest in Latin music in general. Based on the idea that domain knowledge is important to the development and analysis of computational music similarity measures, personal knowledge of and interest in the subject is leveraged for the analyses in Chapters V-VII.

Though the LMD provides full recordings, many of the tracks are from live shows and contain non-musical segments (e.g. clapping, spoken introductions). To reduce this noise, only the middle 30 seconds of each track are used for analysis.

¹ Though the original LMD has 3,227 total recordings, duplicates and tracks that were too short in duration for analysis have been removed.
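The trimming step above is simple to state exactly; a minimal sketch, assuming a mono sample array y with sample rate sr:

```python
# Keep only the middle 30 seconds of a track. Assumes a mono sample array y
# at sample rate sr; shorter tracks were already removed from the dataset.
def middle_30_seconds(y, sr):
    center = len(y) // 2
    half = 15 * sr
    return y[max(0, center - half):center + half]
```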

4.2 Methodology

The following experiments seek to identify the optimal system configuration for genre classification on the LMD. These experiments are broken into two parts: the first concerns the resolution of the OP, and the second concerns complexity in the feature-learning stages of the network. For the OP, the best general feature space is desired: one that is maximally informative while avoiding over-representation, which can slow down and even hinder classification. Various OP resolutions are examined by testing values for frequency bins (F) and periodicity bins (P) as independent factors. Subsequent network tests seek to design a network that appropriately fits the complexity of the task. System complexity is determined by layer depth (L) and layer output size (M); several combinations of values for these parameters are examined.

For baseline classification, the system defined in Section 3.2 with a single-layer network is used, which is simply multi-class logistic regression. This is the classifier used for all OP parameter tests. Scores for all classification tests are averaged over 10 cross-validated folds, stratified against genre.

4.3 OP Parameter Tuning

Initial tests begin on an OP with F = 30 and P = 25 based on results in [11], taking the minimal dimensions that were shown to perform well.

4.3.1 Periodicity Resolution

Over the seven tested OP configurations, with P in the range [5, 100], P = 15 provides the best results. An analysis of variance (ANOVA) test on classification scores shows that periodicity resolution plays a significant role in the outcome, indicated by a Prob>F value less than 0.05, as can be seen in Table 4.1.

Table 4.1: ANOVA results for classification scores with varying P values show that periodicity resolution is a significant factor (Prob>F on the order of 1E-07).

After applying a Tukey HSD adjustment, a comparison of means, Figure 4.1, presents a clear trend, with significantly lower scores for OPs with either too few or too many periodicity bins, and the maximum classification rate obtained with P = 15. These tests, showing better results with fewer dimensions, differ from results in [11], but this disparity most likely arises from differences in data and classification strategy.
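The statistical machinery here is standard; as a hedged illustration, a one-way ANOVA over per-fold accuracies for each P value can be run with scipy. The fold scores below are hypothetical placeholders, not the thesis's actual numbers (pairwise comparisons such as Tukey HSD are available in, e.g., statsmodels).

```python
# One-way ANOVA across per-fold classification accuracies for several P
# values. Fold scores are hypothetical placeholders.
from scipy.stats import f_oneway

scores_by_P = {   # accuracy per cross-validation fold, one list per P value
    5:  [0.81, 0.80, 0.82, 0.79, 0.81, 0.80, 0.82, 0.81, 0.80, 0.79],
    15: [0.88, 0.87, 0.89, 0.88, 0.86, 0.88, 0.89, 0.87, 0.88, 0.88],
    50: [0.84, 0.85, 0.83, 0.84, 0.85, 0.84, 0.83, 0.85, 0.84, 0.83],
}
F_stat, p_value = f_oneway(*scores_by_P.values())
print(F_stat, p_value)   # p_value is Prob>F; < 0.05 marks P as significant
```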

Figure 4.1: Effect of P on classification (mean accuracies for OP dimensions 30x5 through 30x100). The highest result is highlighted in blue, while significantly different results are in red.

4.3.2 Frequency Resolution

Setting P = 15 based on the above, F values are then tested in the range [18, 300]. An ANOVA test on these results shows a significant effect for this parameter, with Prob>F = 0.001, and, as can be seen in Figure 4.2, accuracy rates go up with higher frequency resolution, leveling out for F ≥ 240. Results in [11] show minor but statistically insignificant improvements from increasing the OP frequency resolution; this is consistent with results here for F ≤ 120 and does not preclude the higher scores seen for F > 120. Based on these tests, going forward OPs are calculated setting F = 240 and P = 15.

Figure 4.2: Effect of F on classification (mean accuracies for OP dimensions 18x15 through 300x15). The highest result is highlighted in blue, while significantly different results are in red.

4.4 Deep Network Parameterization

With optimal parameters for this feature set in place, the next step is finding the best network architecture for this data. Returning to the notation of Section 3.2, choices of layer width, M_l for l < L, and network depth, L, are explored here. Note that the input and output dimensionality are fixed as N_1 = 3600 and M_L = 10 due to the previous discussion and the number of classes in the dataset, respectively.

4.4.1 Layer Width

This parameter search begins with a two-layer network (L = 2), sweeping the width of the first layer, M_1, over increasing powers of 2 in the range [16, 8192]. Results demonstrate a performance pivot around M_1 = 128, achieving a maximum accuracy at M_1 = 2048 but otherwise insignificant variation for M_1 ≥ 128. An ANOVA on these results shows significance for this factor (Prob>F = 0.015), but Figure 4.3 indicates minimal impact for M_1 ≥ 128.

Figure 4.3: Mean comparison of ANOVA tests on network layer complexity (mean accuracies by hidden layer size) in a 2-layer architecture shows significantly lower results for small M_1.

4.4.2 Network Depth

Based on the above, deeper architectures are considered by setting M_l = 2048 for l < L and incrementally adding layers up to a maximum depth of L = 6. This fails to show any significant changes in accuracy, with an ANOVA test revealing a Prob>F greater than the null-hypothesis threshold of 0.05. Importantly, while only a limited number of interactions between depth and width are explored, independently varying L or M_l over various values shows no significant difference provided M_l ≥ 128, consistent with previous findings.

4.5 Optimal System Configuration

Further tests continue with a two-layer architecture (L = 2, M_1 = 2048), based on the parameters used for the best score in Figure 4.3, expressed completely by the following:

P(X_1) = \sigma(f_2(f_1(X_1 \mid \theta_1) \mid \theta_2))    (4.1)

For clarity, the dimensionality of the first layer, f_1, is given by (M_1 = 2048, N_1 = 3600), and the dimensionality of the second by (M_2 = 10, N_2 = 2048).

Table 4.2: Classification accuracies for different features on the LMD: LPQ (Texture Descriptors) [49], OP (Holzapfel) [11], Mel Scale Zoning [50] and the proposed OP system.

Table 4.3: Classification accuracies by genre, ordered from highest classification score to lowest (Merengue, Tango, Bachata, Pagode, Salsa, Axé, Bolero, Gaúcha, Forró, Sertaneja), show Brazilian genres generally performing worse than the rest.

With this configuration, classification on the LMD yielded a peak average score of 91.32%, which surpasses previous attempts at genre classification on this dataset. Table 4.2 shows the proposed approach outperforming the others by a margin of more than 8%. One trend that is immediately apparent in the results is a difficulty in classifying Brazilian genres. Table 4.3, with genre-wise classification accuracies ordered from highest to lowest, shows Axé, Gaúcha, Forró and Sertaneja, all Brazilian genres, occupying four of the five bottom slots. Also, when looking at class-by-class confusions, as shown in Table 4.4, certain affinities between genres are apparent. The lowest-scoring Sertaneja has the majority of its false tags predicted as Bolero, but also many predicted as Gaúcha and Forró, while the next three lowest-performing classes, Gaúcha, Forró and Bolero, have most of their false tags predicted as Sertaneja. These trends in class confusions will be expanded on in subsequent chapters.

The increase in accuracy over previous attempts may be partially explained by differences in methodology (i.e. aggregation strategies, signal noise reduction, etc.), but the strength of this deep-network strategy for classification plays a significant role here. Its effect can be seen in Table 4.5, by comparing the proposed approach to simpler classification methods on the same OP input, the former outperforming the rest by a margin of 2.23%.


More information

Improving Frame Based Automatic Laughter Detection

Improving Frame Based Automatic Laughter Detection Improving Frame Based Automatic Laughter Detection Mary Knox EE225D Class Project knoxm@eecs.berkeley.edu December 13, 2007 Abstract Laughter recognition is an underexplored area of research. My goal for

More information

Structured training for large-vocabulary chord recognition. Brian McFee* & Juan Pablo Bello

Structured training for large-vocabulary chord recognition. Brian McFee* & Juan Pablo Bello Structured training for large-vocabulary chord recognition Brian McFee* & Juan Pablo Bello Small chord vocabularies Typically a supervised learning problem N C:maj C:min C#:maj C#:min D:maj D:min......

More information

Automatic Piano Music Transcription

Automatic Piano Music Transcription Automatic Piano Music Transcription Jianyu Fan Qiuhan Wang Xin Li Jianyu.Fan.Gr@dartmouth.edu Qiuhan.Wang.Gr@dartmouth.edu Xi.Li.Gr@dartmouth.edu 1. Introduction Writing down the score while listening

More information

2 2. Melody description The MPEG-7 standard distinguishes three types of attributes related to melody: the fundamental frequency LLD associated to a t

2 2. Melody description The MPEG-7 standard distinguishes three types of attributes related to melody: the fundamental frequency LLD associated to a t MPEG-7 FOR CONTENT-BASED MUSIC PROCESSING Λ Emilia GÓMEZ, Fabien GOUYON, Perfecto HERRERA and Xavier AMATRIAIN Music Technology Group, Universitat Pompeu Fabra, Barcelona, SPAIN http://www.iua.upf.es/mtg

More information

INTER GENRE SIMILARITY MODELLING FOR AUTOMATIC MUSIC GENRE CLASSIFICATION

INTER GENRE SIMILARITY MODELLING FOR AUTOMATIC MUSIC GENRE CLASSIFICATION INTER GENRE SIMILARITY MODELLING FOR AUTOMATIC MUSIC GENRE CLASSIFICATION ULAŞ BAĞCI AND ENGIN ERZIN arxiv:0907.3220v1 [cs.sd] 18 Jul 2009 ABSTRACT. Music genre classification is an essential tool for

More information

Chord Classification of an Audio Signal using Artificial Neural Network

Chord Classification of an Audio Signal using Artificial Neural Network Chord Classification of an Audio Signal using Artificial Neural Network Ronesh Shrestha Student, Department of Electrical and Electronic Engineering, Kathmandu University, Dhulikhel, Nepal ---------------------------------------------------------------------***---------------------------------------------------------------------

More information

Automatic Laughter Detection

Automatic Laughter Detection Automatic Laughter Detection Mary Knox 1803707 knoxm@eecs.berkeley.edu December 1, 006 Abstract We built a system to automatically detect laughter from acoustic features of audio. To implement the system,

More information

AUTOREGRESSIVE MFCC MODELS FOR GENRE CLASSIFICATION IMPROVED BY HARMONIC-PERCUSSION SEPARATION

AUTOREGRESSIVE MFCC MODELS FOR GENRE CLASSIFICATION IMPROVED BY HARMONIC-PERCUSSION SEPARATION AUTOREGRESSIVE MFCC MODELS FOR GENRE CLASSIFICATION IMPROVED BY HARMONIC-PERCUSSION SEPARATION Halfdan Rump, Shigeki Miyabe, Emiru Tsunoo, Nobukata Ono, Shigeki Sagama The University of Tokyo, Graduate

More information

jsymbolic 2: New Developments and Research Opportunities

jsymbolic 2: New Developments and Research Opportunities jsymbolic 2: New Developments and Research Opportunities Cory McKay Marianopolis College and CIRMMT Montreal, Canada 2 / 30 Topics Introduction to features (from a machine learning perspective) And how

More information

Automatic Laughter Detection

Automatic Laughter Detection Automatic Laughter Detection Mary Knox Final Project (EECS 94) knoxm@eecs.berkeley.edu December 1, 006 1 Introduction Laughter is a powerful cue in communication. It communicates to listeners the emotional

More information

THE importance of music content analysis for musical

THE importance of music content analysis for musical IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 15, NO. 1, JANUARY 2007 333 Drum Sound Recognition for Polyphonic Audio Signals by Adaptation and Matching of Spectrogram Templates With

More information

EE391 Special Report (Spring 2005) Automatic Chord Recognition Using A Summary Autocorrelation Function

EE391 Special Report (Spring 2005) Automatic Chord Recognition Using A Summary Autocorrelation Function EE391 Special Report (Spring 25) Automatic Chord Recognition Using A Summary Autocorrelation Function Advisor: Professor Julius Smith Kyogu Lee Center for Computer Research in Music and Acoustics (CCRMA)

More information

Tempo and Beat Analysis

Tempo and Beat Analysis Advanced Course Computer Science Music Processing Summer Term 2010 Meinard Müller, Peter Grosche Saarland University and MPI Informatik meinard@mpi-inf.mpg.de Tempo and Beat Analysis Musical Properties:

More information

AUTOMATIC ACCOMPANIMENT OF VOCAL MELODIES IN THE CONTEXT OF POPULAR MUSIC

AUTOMATIC ACCOMPANIMENT OF VOCAL MELODIES IN THE CONTEXT OF POPULAR MUSIC AUTOMATIC ACCOMPANIMENT OF VOCAL MELODIES IN THE CONTEXT OF POPULAR MUSIC A Thesis Presented to The Academic Faculty by Xiang Cao In Partial Fulfillment of the Requirements for the Degree Master of Science

More information

Music Information Retrieval with Temporal Features and Timbre

Music Information Retrieval with Temporal Features and Timbre Music Information Retrieval with Temporal Features and Timbre Angelina A. Tzacheva and Keith J. Bell University of South Carolina Upstate, Department of Informatics 800 University Way, Spartanburg, SC

More information

Automatic Music Clustering using Audio Attributes

Automatic Music Clustering using Audio Attributes Automatic Music Clustering using Audio Attributes Abhishek Sen BTech (Electronics) Veermata Jijabai Technological Institute (VJTI), Mumbai, India abhishekpsen@gmail.com Abstract Music brings people together,

More information

A STATISTICAL VIEW ON THE EXPRESSIVE TIMING OF PIANO ROLLED CHORDS

A STATISTICAL VIEW ON THE EXPRESSIVE TIMING OF PIANO ROLLED CHORDS A STATISTICAL VIEW ON THE EXPRESSIVE TIMING OF PIANO ROLLED CHORDS Mutian Fu 1 Guangyu Xia 2 Roger Dannenberg 2 Larry Wasserman 2 1 School of Music, Carnegie Mellon University, USA 2 School of Computer

More information

Modeling memory for melodies

Modeling memory for melodies Modeling memory for melodies Daniel Müllensiefen 1 and Christian Hennig 2 1 Musikwissenschaftliches Institut, Universität Hamburg, 20354 Hamburg, Germany 2 Department of Statistical Science, University

More information

A Framework for Segmentation of Interview Videos

A Framework for Segmentation of Interview Videos A Framework for Segmentation of Interview Videos Omar Javed, Sohaib Khan, Zeeshan Rasheed, Mubarak Shah Computer Vision Lab School of Electrical Engineering and Computer Science University of Central Florida

More information

Experiments on musical instrument separation using multiplecause

Experiments on musical instrument separation using multiplecause Experiments on musical instrument separation using multiplecause models J Klingseisen and M D Plumbley* Department of Electronic Engineering King's College London * - Corresponding Author - mark.plumbley@kcl.ac.uk

More information

2. AN INTROSPECTION OF THE MORPHING PROCESS

2. AN INTROSPECTION OF THE MORPHING PROCESS 1. INTRODUCTION Voice morphing means the transition of one speech signal into another. Like image morphing, speech morphing aims to preserve the shared characteristics of the starting and final signals,

More information

Music Composition with RNN

Music Composition with RNN Music Composition with RNN Jason Wang Department of Statistics Stanford University zwang01@stanford.edu Abstract Music composition is an interesting problem that tests the creativity capacities of artificial

More information

Outline. Why do we classify? Audio Classification

Outline. Why do we classify? Audio Classification Outline Introduction Music Information Retrieval Classification Process Steps Pitch Histograms Multiple Pitch Detection Algorithm Musical Genre Classification Implementation Future Work Why do we classify

More information

Deep Neural Networks Scanning for patterns (aka convolutional networks) Bhiksha Raj

Deep Neural Networks Scanning for patterns (aka convolutional networks) Bhiksha Raj Deep Neural Networks Scanning for patterns (aka convolutional networks) Bhiksha Raj 1 Story so far MLPs are universal function approximators Boolean functions, classifiers, and regressions MLPs can be

More information

Release Year Prediction for Songs

Release Year Prediction for Songs Release Year Prediction for Songs [CSE 258 Assignment 2] Ruyu Tan University of California San Diego PID: A53099216 rut003@ucsd.edu Jiaying Liu University of California San Diego PID: A53107720 jil672@ucsd.edu

More information

The Research of Controlling Loudness in the Timbre Subjective Perception Experiment of Sheng

The Research of Controlling Loudness in the Timbre Subjective Perception Experiment of Sheng The Research of Controlling Loudness in the Timbre Subjective Perception Experiment of Sheng S. Zhu, P. Ji, W. Kuang and J. Yang Institute of Acoustics, CAS, O.21, Bei-Si-huan-Xi Road, 100190 Beijing,

More information

Analysis of local and global timing and pitch change in ordinary

Analysis of local and global timing and pitch change in ordinary Alma Mater Studiorum University of Bologna, August -6 6 Analysis of local and global timing and pitch change in ordinary melodies Roger Watt Dept. of Psychology, University of Stirling, Scotland r.j.watt@stirling.ac.uk

More information

Creating a Feature Vector to Identify Similarity between MIDI Files

Creating a Feature Vector to Identify Similarity between MIDI Files Creating a Feature Vector to Identify Similarity between MIDI Files Joseph Stroud 2017 Honors Thesis Advised by Sergio Alvarez Computer Science Department, Boston College 1 Abstract Today there are many

More information

Semi-supervised Musical Instrument Recognition

Semi-supervised Musical Instrument Recognition Semi-supervised Musical Instrument Recognition Master s Thesis Presentation Aleksandr Diment 1 1 Tampere niversity of Technology, Finland Supervisors: Adj.Prof. Tuomas Virtanen, MSc Toni Heittola 17 May

More information

POST-PROCESSING FIDDLE : A REAL-TIME MULTI-PITCH TRACKING TECHNIQUE USING HARMONIC PARTIAL SUBTRACTION FOR USE WITHIN LIVE PERFORMANCE SYSTEMS

POST-PROCESSING FIDDLE : A REAL-TIME MULTI-PITCH TRACKING TECHNIQUE USING HARMONIC PARTIAL SUBTRACTION FOR USE WITHIN LIVE PERFORMANCE SYSTEMS POST-PROCESSING FIDDLE : A REAL-TIME MULTI-PITCH TRACKING TECHNIQUE USING HARMONIC PARTIAL SUBTRACTION FOR USE WITHIN LIVE PERFORMANCE SYSTEMS Andrew N. Robertson, Mark D. Plumbley Centre for Digital Music

More information

Book: Fundamentals of Music Processing. Audio Features. Book: Fundamentals of Music Processing. Book: Fundamentals of Music Processing

Book: Fundamentals of Music Processing. Audio Features. Book: Fundamentals of Music Processing. Book: Fundamentals of Music Processing Book: Fundamentals of Music Processing Lecture Music Processing Audio Features Meinard Müller International Audio Laboratories Erlangen meinard.mueller@audiolabs-erlangen.de Meinard Müller Fundamentals

More information

MUSICAL INSTRUMENT RECOGNITION WITH WAVELET ENVELOPES

MUSICAL INSTRUMENT RECOGNITION WITH WAVELET ENVELOPES MUSICAL INSTRUMENT RECOGNITION WITH WAVELET ENVELOPES PACS: 43.60.Lq Hacihabiboglu, Huseyin 1,2 ; Canagarajah C. Nishan 2 1 Sonic Arts Research Centre (SARC) School of Computer Science Queen s University

More information

Speech To Song Classification

Speech To Song Classification Speech To Song Classification Emily Graber Center for Computer Research in Music and Acoustics, Department of Music, Stanford University Abstract The speech to song illusion is a perceptual phenomenon

More information

IMPROVING RHYTHMIC SIMILARITY COMPUTATION BY BEAT HISTOGRAM TRANSFORMATIONS

IMPROVING RHYTHMIC SIMILARITY COMPUTATION BY BEAT HISTOGRAM TRANSFORMATIONS 1th International Society for Music Information Retrieval Conference (ISMIR 29) IMPROVING RHYTHMIC SIMILARITY COMPUTATION BY BEAT HISTOGRAM TRANSFORMATIONS Matthias Gruhne Bach Technology AS ghe@bachtechnology.com

More information

TOWARD AN INTELLIGENT EDITOR FOR JAZZ MUSIC

TOWARD AN INTELLIGENT EDITOR FOR JAZZ MUSIC TOWARD AN INTELLIGENT EDITOR FOR JAZZ MUSIC G.TZANETAKIS, N.HU, AND R.B. DANNENBERG Computer Science Department, Carnegie Mellon University 5000 Forbes Avenue, Pittsburgh, PA 15213, USA E-mail: gtzan@cs.cmu.edu

More information

Understanding Compression Technologies for HD and Megapixel Surveillance

Understanding Compression Technologies for HD and Megapixel Surveillance When the security industry began the transition from using VHS tapes to hard disks for video surveillance storage, the question of how to compress and store video became a top consideration for video surveillance

More information

Semi-automated extraction of expressive performance information from acoustic recordings of piano music. Andrew Earis

Semi-automated extraction of expressive performance information from acoustic recordings of piano music. Andrew Earis Semi-automated extraction of expressive performance information from acoustic recordings of piano music Andrew Earis Outline Parameters of expressive piano performance Scientific techniques: Fourier transform

More information

DAT335 Music Perception and Cognition Cogswell Polytechnical College Spring Week 6 Class Notes

DAT335 Music Perception and Cognition Cogswell Polytechnical College Spring Week 6 Class Notes DAT335 Music Perception and Cognition Cogswell Polytechnical College Spring 2009 Week 6 Class Notes Pitch Perception Introduction Pitch may be described as that attribute of auditory sensation in terms

More information

Robert Alexandru Dobre, Cristian Negrescu

Robert Alexandru Dobre, Cristian Negrescu ECAI 2016 - International Conference 8th Edition Electronics, Computers and Artificial Intelligence 30 June -02 July, 2016, Ploiesti, ROMÂNIA Automatic Music Transcription Software Based on Constant Q

More information

Automatic music transcription

Automatic music transcription Music transcription 1 Music transcription 2 Automatic music transcription Sources: * Klapuri, Introduction to music transcription, 2006. www.cs.tut.fi/sgn/arg/klap/amt-intro.pdf * Klapuri, Eronen, Astola:

More information

Musical Instrument Identification Using Principal Component Analysis and Multi-Layered Perceptrons

Musical Instrument Identification Using Principal Component Analysis and Multi-Layered Perceptrons Musical Instrument Identification Using Principal Component Analysis and Multi-Layered Perceptrons Róisín Loughran roisin.loughran@ul.ie Jacqueline Walker jacqueline.walker@ul.ie Michael O Neill University

More information

Pitch correction on the human voice

Pitch correction on the human voice University of Arkansas, Fayetteville ScholarWorks@UARK Computer Science and Computer Engineering Undergraduate Honors Theses Computer Science and Computer Engineering 5-2008 Pitch correction on the human

More information

Subjective Similarity of Music: Data Collection for Individuality Analysis

Subjective Similarity of Music: Data Collection for Individuality Analysis Subjective Similarity of Music: Data Collection for Individuality Analysis Shota Kawabuchi and Chiyomi Miyajima and Norihide Kitaoka and Kazuya Takeda Nagoya University, Nagoya, Japan E-mail: shota.kawabuchi@g.sp.m.is.nagoya-u.ac.jp

More information

Effects of acoustic degradations on cover song recognition

Effects of acoustic degradations on cover song recognition Signal Processing in Acoustics: Paper 68 Effects of acoustic degradations on cover song recognition Julien Osmalskyj (a), Jean-Jacques Embrechts (b) (a) University of Liège, Belgium, josmalsky@ulg.ac.be

More information

A repetition-based framework for lyric alignment in popular songs

A repetition-based framework for lyric alignment in popular songs A repetition-based framework for lyric alignment in popular songs ABSTRACT LUONG Minh Thang and KAN Min Yen Department of Computer Science, School of Computing, National University of Singapore We examine

More information

Reconstruction of Ca 2+ dynamics from low frame rate Ca 2+ imaging data CS229 final project. Submitted by: Limor Bursztyn

Reconstruction of Ca 2+ dynamics from low frame rate Ca 2+ imaging data CS229 final project. Submitted by: Limor Bursztyn Reconstruction of Ca 2+ dynamics from low frame rate Ca 2+ imaging data CS229 final project. Submitted by: Limor Bursztyn Introduction Active neurons communicate by action potential firing (spikes), accompanied

More information

Analysis of Seabright study on demand for Sky s pay TV services. Annex 7 to pay TV phase three document

Analysis of Seabright study on demand for Sky s pay TV services. Annex 7 to pay TV phase three document Analysis of Seabright study on demand for Sky s pay TV services Annex 7 to pay TV phase three document Publication date: 26 June 2009 Comments on the study: The e ect of DTT availability on household s

More information

The Million Song Dataset

The Million Song Dataset The Million Song Dataset AUDIO FEATURES The Million Song Dataset There is no data like more data Bob Mercer of IBM (1985). T. Bertin-Mahieux, D.P.W. Ellis, B. Whitman, P. Lamere, The Million Song Dataset,

More information

Lecture 9 Source Separation

Lecture 9 Source Separation 10420CS 573100 音樂資訊檢索 Music Information Retrieval Lecture 9 Source Separation Yi-Hsuan Yang Ph.D. http://www.citi.sinica.edu.tw/pages/yang/ yang@citi.sinica.edu.tw Music & Audio Computing Lab, Research

More information

OBJECTIVE EVALUATION OF A MELODY EXTRACTOR FOR NORTH INDIAN CLASSICAL VOCAL PERFORMANCES

OBJECTIVE EVALUATION OF A MELODY EXTRACTOR FOR NORTH INDIAN CLASSICAL VOCAL PERFORMANCES OBJECTIVE EVALUATION OF A MELODY EXTRACTOR FOR NORTH INDIAN CLASSICAL VOCAL PERFORMANCES Vishweshwara Rao and Preeti Rao Digital Audio Processing Lab, Electrical Engineering Department, IIT-Bombay, Powai,

More information

Analysis and Clustering of Musical Compositions using Melody-based Features

Analysis and Clustering of Musical Compositions using Melody-based Features Analysis and Clustering of Musical Compositions using Melody-based Features Isaac Caswell Erika Ji December 13, 2013 Abstract This paper demonstrates that melodic structure fundamentally differentiates

More information

CS 591 S1 Computational Audio

CS 591 S1 Computational Audio 4/29/7 CS 59 S Computational Audio Wayne Snyder Computer Science Department Boston University Today: Comparing Musical Signals: Cross- and Autocorrelations of Spectral Data for Structure Analysis Segmentation

More information

Pitch Perception and Grouping. HST.723 Neural Coding and Perception of Sound

Pitch Perception and Grouping. HST.723 Neural Coding and Perception of Sound Pitch Perception and Grouping HST.723 Neural Coding and Perception of Sound Pitch Perception. I. Pure Tones The pitch of a pure tone is strongly related to the tone s frequency, although there are small

More information

Detection of Panoramic Takes in Soccer Videos Using Phase Correlation and Boosting

Detection of Panoramic Takes in Soccer Videos Using Phase Correlation and Boosting Detection of Panoramic Takes in Soccer Videos Using Phase Correlation and Boosting Luiz G. L. B. M. de Vasconcelos Research & Development Department Globo TV Network Email: luiz.vasconcelos@tvglobo.com.br

More information

Algorithmic Composition: The Music of Mathematics

Algorithmic Composition: The Music of Mathematics Algorithmic Composition: The Music of Mathematics Carlo J. Anselmo 18 and Marcus Pendergrass Department of Mathematics, Hampden-Sydney College, Hampden-Sydney, VA 23943 ABSTRACT We report on several techniques

More information

SYNTHESIS FROM MUSICAL INSTRUMENT CHARACTER MAPS

SYNTHESIS FROM MUSICAL INSTRUMENT CHARACTER MAPS Published by Institute of Electrical Engineers (IEE). 1998 IEE, Paul Masri, Nishan Canagarajah Colloquium on "Audio and Music Technology"; November 1998, London. Digest No. 98/470 SYNTHESIS FROM MUSICAL

More information

Query By Humming: Finding Songs in a Polyphonic Database

Query By Humming: Finding Songs in a Polyphonic Database Query By Humming: Finding Songs in a Polyphonic Database John Duchi Computer Science Department Stanford University jduchi@stanford.edu Benjamin Phipps Computer Science Department Stanford University bphipps@stanford.edu

More information

gresearch Focus Cognitive Sciences

gresearch Focus Cognitive Sciences Learning about Music Cognition by Asking MIR Questions Sebastian Stober August 12, 2016 CogMIR, New York City sstober@uni-potsdam.de http://www.uni-potsdam.de/mlcog/ MLC g Machine Learning in Cognitive

More information

Lyrics Classification using Naive Bayes

Lyrics Classification using Naive Bayes Lyrics Classification using Naive Bayes Dalibor Bužić *, Jasminka Dobša ** * College for Information Technologies, Klaićeva 7, Zagreb, Croatia ** Faculty of Organization and Informatics, Pavlinska 2, Varaždin,

More information

About Giovanni De Poli. What is Model. Introduction. di Poli: Methodologies for Expressive Modeling of/for Music Performance

About Giovanni De Poli. What is Model. Introduction. di Poli: Methodologies for Expressive Modeling of/for Music Performance Methodologies for Expressiveness Modeling of and for Music Performance by Giovanni De Poli Center of Computational Sonology, Department of Information Engineering, University of Padova, Padova, Italy About

More information

Topics in Computer Music Instrument Identification. Ioanna Karydi

Topics in Computer Music Instrument Identification. Ioanna Karydi Topics in Computer Music Instrument Identification Ioanna Karydi Presentation overview What is instrument identification? Sound attributes & Timbre Human performance The ideal algorithm Selected approaches

More information

However, in studies of expressive timing, the aim is to investigate production rather than perception of timing, that is, independently of the listene

However, in studies of expressive timing, the aim is to investigate production rather than perception of timing, that is, independently of the listene Beat Extraction from Expressive Musical Performances Simon Dixon, Werner Goebl and Emilios Cambouropoulos Austrian Research Institute for Artificial Intelligence, Schottengasse 3, A-1010 Vienna, Austria.

More information

VISUAL CONTENT BASED SEGMENTATION OF TALK & GAME SHOWS. O. Javed, S. Khan, Z. Rasheed, M.Shah. {ojaved, khan, zrasheed,

VISUAL CONTENT BASED SEGMENTATION OF TALK & GAME SHOWS. O. Javed, S. Khan, Z. Rasheed, M.Shah. {ojaved, khan, zrasheed, VISUAL CONTENT BASED SEGMENTATION OF TALK & GAME SHOWS O. Javed, S. Khan, Z. Rasheed, M.Shah {ojaved, khan, zrasheed, shah}@cs.ucf.edu Computer Vision Lab School of Electrical Engineering and Computer

More information

HUMAN PERCEPTION AND COMPUTER EXTRACTION OF MUSICAL BEAT STRENGTH

HUMAN PERCEPTION AND COMPUTER EXTRACTION OF MUSICAL BEAT STRENGTH Proc. of the th Int. Conference on Digital Audio Effects (DAFx-), Hamburg, Germany, September -8, HUMAN PERCEPTION AND COMPUTER EXTRACTION OF MUSICAL BEAT STRENGTH George Tzanetakis, Georg Essl Computer

More information

Feature-Based Analysis of Haydn String Quartets

Feature-Based Analysis of Haydn String Quartets Feature-Based Analysis of Haydn String Quartets Lawson Wong 5/5/2 Introduction When listening to multi-movement works, amateur listeners have almost certainly asked the following situation : Am I still

More information

Combination of Audio & Lyrics Features for Genre Classication in Digital Audio Collections

Combination of Audio & Lyrics Features for Genre Classication in Digital Audio Collections 1/23 Combination of Audio & Lyrics Features for Genre Classication in Digital Audio Collections Rudolf Mayer, Andreas Rauber Vienna University of Technology {mayer,rauber}@ifs.tuwien.ac.at Robert Neumayer

More information

CSC475 Music Information Retrieval

CSC475 Music Information Retrieval CSC475 Music Information Retrieval Monophonic pitch extraction George Tzanetakis University of Victoria 2014 G. Tzanetakis 1 / 32 Table of Contents I 1 Motivation and Terminology 2 Psychacoustics 3 F0

More information

Perceptual Evaluation of Automatically Extracted Musical Motives

Perceptual Evaluation of Automatically Extracted Musical Motives Perceptual Evaluation of Automatically Extracted Musical Motives Oriol Nieto 1, Morwaread M. Farbood 2 Dept. of Music and Performing Arts Professions, New York University, USA 1 oriol@nyu.edu, 2 mfarbood@nyu.edu

More information

Music Mood Classification - an SVM based approach. Sebastian Napiorkowski

Music Mood Classification - an SVM based approach. Sebastian Napiorkowski Music Mood Classification - an SVM based approach Sebastian Napiorkowski Topics on Computer Music (Seminar Report) HPAC - RWTH - SS2015 Contents 1. Motivation 2. Quantification and Definition of Mood 3.

More information

Using the new psychoacoustic tonality analyses Tonality (Hearing Model) 1

Using the new psychoacoustic tonality analyses Tonality (Hearing Model) 1 02/18 Using the new psychoacoustic tonality analyses 1 As of ArtemiS SUITE 9.2, a very important new fully psychoacoustic approach to the measurement of tonalities is now available., based on the Hearing

More information