Audio-based Music Segmentation Using Multiple Features


Audio-based Music Segmentation Using Multiple Features

Pedro Girão Antunes

Dissertation submitted for obtaining the degree of Master in Electrical and Computer Engineering

Jury
President: Doutor Carlos Filipe Gomes Bispo
Supervisor: Doutor David Manuel Martins de Matos
Members: Doutora Isabel Maria Martins Trancoso
         Doutor Thibault Nicolas Langlois

December 2011


Acknowledgements

I would like to show my gratitude to my professor, David Matos, and to Carlos Rosão for his contribution. I also thank my family, especially my parents Ana and António, my brother Francisco and my aunt Maria do Carmo; and my friends, especially Mariana Fontes, Gonçalo Paiva, João Fonseca, Bernardo Lopes, Catarina Vazconcelos, Pedro Mendes, João Devesa, Luis Nunes, Manuel Dordio and Miguel Pereira.

Lisboa, December 13, 2011
Pedro Girão Antunes


Resumo

Structural segmentation based on the musical audio signal is a growing research area. It aims to segment a piece of music into structurally significant parts, or high-level segments. Among many applications, it offers great potential for improving the acoustic and musicological understanding of a piece of music. This thesis describes a method for automatically locating points of change in the music, the boundaries between segments, based on a two-dimensional representation of the signal, the SDM (Self Distance Matrix), and on audio onsets. The features used to compute the SDM are the MFCCs, the chromagram and the rhythmogram, which are also combined. The audio onsets are determined using several state of the art methods; their use rests on the assumption that every segment boundary must be an audio onset. Essentially, the SDM is used to determine which of the detected onsets are moments of segment change. To that end, a checkerboard kernel is applied along the diagonal of the SDM, yielding a function whose peaks are considered candidate boundary instants. The selected instants are the audio onsets closest to the detected peaks. The implementation of the method relies on Matlab and several toolboxes. The results, obtained for a corpus of 50 songs, are comparable with the state of the art.


Abstract

Structural segmentation based on the musical audio signal is a growing research area. It aims to segment a piece of music into structurally significant parts, or higher level segments. Among many applications, it offers great potential for improving the acoustic and musicological modeling of a piece of music. This thesis describes a method for automatically locating points of change in the music, based on a two-dimensional representation of the signal, the SDM (Self Distance Matrix), and on the detection of audio onsets. The features used for the computation of the SDM are the MFCCs, the chromagram and the rhythmogram, which are also combined together. The audio onsets are determined using distinct state of the art methods; they are used under the assumption that every segment changing moment must be an audio onset. Essentially, the SDM is used to determine which of the detected onsets are moments of segment change. To do so, a checkerboard kernel with radial smoothing is correlated along the diagonal of the SDM, yielding a novelty score function whose peaks are considered candidate instants. The selected instants are the audio onsets closest to the detected peaks. The implementation of the method relies on Matlab and several toolboxes. Our results, obtained for a corpus of 50 songs, are comparable with the state of the art.


Contents

1 Introduction
   1.1 Music - Audio Signal
   1.2 MIR Audio-based Approaches
   1.3 Automatic Music Structural Segmentation
       Feature extraction
       Timbre Features
       Pitch related Features
       Rhythmic Features
       Techniques
   1.4 Objective
   1.5 Document Structure

2 Music Structure Analysis
   2.1 Structural Segmentation Types of Approaches
       Novelty-based Approaches
       State Approaches
       Sequence Approaches
   2.2 Segment Boundaries and Note Onsets
   2.3 Summary

3 Method
   3.1 Extracted Features
       Window of Analysis
       Mel Frequency Cepstral Coefficients
       Chromagram
       Rhythmogram
   3.2 Segment Boundaries Detection
       Self Distance Matrix
       Checkerboard Kernel Correlation
       Peak Selection
   3.3 Mixing Features
   3.4 Note Onsets
   3.5 Summary

4 Evaluation and Discussion of the Results
   Corpus and Groundtruth
   Baseline Results
   Feature Window of Analysis
   SDM Distance Measure
   Note Onsets
   Mixing Features
   Discussion
   Summary

5 Conclusion
   Conclusion
   Contributions
   Future Work


List of Figures

1.1 Signals Spectrum
1.2 Musical Score
1.3 Audio Signal
1.4 MIR Tasks
1.5 Features Representation
2.1 HMM
2.2 SDM Sequence
3.1 Chroma Helix
3.2 Rhythmogram
3.3 Flowchart of the method implemented
3.4 MFCC SDM
3.5 Checkerboard Kernel
3.6 Novelty-score Computation
3.7 Novelty-score


List of Tables

3.1 State of the Art Works and Features
4.1 Method Baseline Setup
4.2 Corpus
4.3 Baseline Average F-measure Results
4.4 Average Results - Window Size Experiment
4.5 Average Results - Distance Measure Experiment
4.6 Average Results - Onsets Experiment
4.7 Best Sum of SDMs Results
4.8 Best SVD Results
4.9 Best Intersection Results
4.10 Average Results - Feature Mixture Experiment
4.11 Average Results - Feature Mixture Experiment
4.12 Method Best Setup
4.13 Best Result
4.14 State of the Art Results
4.15 MIREX Boundary recovery results


Nomenclature

abs   Absolute value
AT    Automatically generated boundaries
C_k   Gaussian tapered checkerboard kernel
d_s   Distance measure function
F     F-measure
GT    Groundtruth boundary annotations
N     Novelty-score function
P     Precision
r     Correlation coefficient
R     Recall
v     Feature vector
w_s   Window size
w_t   Groundtruth threshold


1 Introduction

The expansion of music in digital format, due to the growing efficiency of compression algorithms, led to the massification of music consumption. Such a phenomenon led to the creation of a new research field called music information retrieval (MIR). Information retrieval (IR) is the science of retrieving, from a collection of items, a subset that serves some defined purpose; in this case it is applied to music. The goal of this chapter is to present the context in which this thesis has been developed, including the motivation for this work, some practical aspects related to automatic audio segmentation and, finally, a summary of the work carried out and how it is organized.

1.1 Music - Audio Signal

In an objective and simple way, music can be defined as the art of arranging sounds and silences in time. Any sound can be described as a combination of sine waves, each with its own frequency of vibration, amplitude, and phase. In particular, the sounds produced by musical instruments are the result of the combination of different frequencies which are all integer multiples of a fundamental frequency, called harmonics (figure 1.1). The perception of this frequency is called pitch, which is one of the characterizing elements of a sound alongside loudness (related to the amplitude of the signal) and timbre. Typically, humans cannot perceive the harmonics as separate notes. Instead, a musical note composed of many harmonically related frequencies is perceived as one sound, where the relative strengths of the individual harmonic frequencies give the timbre of that sound. In polyphonic music, sound is produced by various instruments that interact through time, all together composing the diverse dimensions of music. The main musical dimensions of interest for music retrieval are:

Timbre can be simply defined as everything about a sound which is neither loudness nor pitch (Erickson 1975). As an example, it is what is different about the same tone performed on an acoustic guitar and on a flute.

Figure 1.1: Some periodic audio signals (left) and their frequency counterparts (right). The first signal is a simple sine wave used to tune musical instruments. The subsequent signals present a growing complexity relative to the sine wave. Their harmonics are the peaks in the frequency plot, evident for the violin and the flute.

Rhythm is the arrangement of sounds and silences in time. It is related to the periodic repetition of a temporal pattern of onsets. The perception of rhythm is closely related to the sound onsets alone, so sounds can be unpitched, as, for example, the sounds of percussion instruments are.

Melody is a linear succession of musical tones which is perceived as a single entity. Usually the tones have similar timbre and a recognizable pitch within a small frequency range.

Harmony is the conjugation of diverse pitches played simultaneously. Harmony can be conveyed by polyphonic instruments, by a group of monophonic instruments, or may be indirectly implied by the melody.

Structure is on a different level from the previous dimensions, as it covers them all. Structure, or musical form, relates to the way the previous dimensions create patterns, making structural segments that repeat themselves in some way, like the chorus, the verse and so on.

Music can be represented in a symbolic way, as a musical score, used by musicians to read and write music (figure 1.2). Another, more common, form of representation is the auditory representation as a waveform (e.g., WAV, MP3, etc.) (figure 1.3). It is on this representation that most MIR research is based; such methods are called audio-based approaches.

Figure 1.2: A musical score sample of the famous song Hey Jude by The Beatles.

Figure 1.3: Audio signal from the song Northern Sky by Nick Drake.

1.2 MIR Audio-based Approaches

The main idea underlying content-based approaches is that a document can be described by a set of features that are directly computed from its content, in this case, audio. Despite the existence of metadata, namely author name, work title, genre classification and so on, the basic assumption behind audio-based approaches is that metadata may be unsuitable, unreliable, or missing. On one hand, relying only on the information within the music is advantageous because that is generally the only information available. On the other hand, it presents many difficulties due to the heterogeneity and complexity of musical data. Listening to music, we humans can easily perceive a variety of events: the progression of harmonies and the melodic cadences, although we might not be able to name them; changes of instrumentation; the presence of drum fills; the presence of vocals; etc. We can perceive many events in music and, even without formal musical training, by identifying repetitions and abrupt changes, we can perceive structure. Over the past decade, MIR as a research field has grown significantly. Given the multidisciplinary nature of the field, it brings together experts from many different areas of research: signal processing, database research, machine learning, musicology, perception, psychology, sociology, etc.

Figure 1.4: Some MIR tasks organized by level.

Figure 1.4 presents some examples of MIR tasks and their level. Note that the objectivity of a task tends to be inversely proportional to its level. This thesis focuses on the structural segmentation task.

1.3 Automatic Music Structural Segmentation

Every piece of music has an overall plan or structure, called the form of the music. Musical forms offer a great range of complexity. For example, most occidental pop music tends to be short and simple, often built upon repetition; on the other hand, classical music traditions around the world tend to encourage longer, more complex forms. Note that, from an abstract point of view, structure is closely related to the human perception of it. For instance, most occidental people can easily distinguish the verse from the chorus of some pop song, but will have trouble recognizing what is going on in a piece of Chinese traditional music. Furthermore, classical music forms may be difficult to recognize without the familiarity that comes from study or repeated hearings. Regarding pop music, modern production techniques often use copy and paste to clone multiple segments of the same type, and even to clone components within a segment. This obviously facilitates the work of automatic segmentation, so good results are obtained on this kind of music. This task can be divided into three problems:

Determine the segment boundaries - the beginning and ending instants of each segment;

Determine the recurrent form - grouping the segments that are occurrences of the same musical part. They can be repetitions of the exact same segment or slight variations, depending on the music genre.

The groups are often specified by letters A, B, C... Each group of segments is called a part;

Determine the part label - for example, the chorus, the verse, the intro, etc.

The second and third problems are similar: they are basically distance measurements. The third one normally depends on the second one, so it will be considered less important within the scope of this thesis, also because of the extreme difficulty it presents. Some work has been done on this particular problem, for example by Paulus (2010). Furthermore, there are some methods focused only on the detection of the chorus, for example Goto (2006). The second problem is more commonly addressed. In some cases, it follows the first one, i.e., after determining the segment boundaries, each piece of music standing between two boundaries is considered to be a segment. Segments are then grouped by applying a distance measure. An example of this method is Cooper and Foote (2003). Others address directly the problem of determining the parts, which is generally done using clustering algorithms or Hidden Markov Models (HMM). The main idea underlying these methods is that music is made of repetition and, in that sense, the states of the HMM represent the different parts. Note that these methods also determine the segment boundaries: by determining the structural parts, the boundary instants are implicitly determined. Finally, the first problem is the one addressed by this thesis. One example of work addressing this problem is that carried out by Foote (2000), following his work on a two-dimensional representation of a musical signal, the Self-Similarity Matrix (SSM) (Foote 1999), one of the most important breakthroughs in the structural segmentation task. Other works include the one by Tzanetakis and Cook (1999). In chapter 2, the state of the art approaches are presented in more detail. Knowledge of the structure has various useful practical applications, for example: audio browsing, i.e., besides browsing an album through songs it could also be possible to browse a song through segments; a starting point for other MIR tasks, including music summarization (automatic selection of short representative audio thumbnails), music recommendation (recommending songs with similar structure), genre classification, etc.; and even assisting in musicological studies, for example, studying the musical structure of songs from a determined culture or time, or the structure of songs that were in the top charts of the last decades. All the procedures start with a feature extraction step, where the audio stream is split into a number of frames from which feature vectors are calculated.

Since the audio stream samples themselves do not provide relevant information, feature extraction is essential. Even more essential is to understand the meaning of the extracted features, i.e., what they represent regarding the musical dimensions. The subsequent steps depend on the procedure and on the goals to be reached (summarization, chorus detection, segment boundaries detection, etc.); however, they are limited by the extracted features and what they represent. So the feature extraction step plays a central role in any MIR procedure.

Feature extraction

Feature extraction is essential for any music information retrieval system, in particular when detecting segment boundaries. In general, humans can easily perceive segment boundaries in popular music that is familiar to them. But what information contained in a musical signal is important to perceive that event? According to the experiments of Bruderer et al. on human perception of structural boundaries in popular music (Bruderer et al. 2006), global structure (repetition, break), change in timbre, change in level and change in rhythm represent the main perceptual cues responsible for the perception of boundaries in music. Therefore, in order to optimize the detection of such boundaries, the extracted features should roughly represent these perceptual cues. Considering the perceptual cues and the presented musical dimensions, the musical signal is generally summarized in three dimensions: the timbre, the tonal part (pitch related, harmony and melody) and the rhythm. The features used in our method are presented in more detail in chapter 3.

Timbre Features

Perceptually, timbre is one of the most important dimensions in a piece of music. Its importance relative to other musical dimensions can be easily understood by the fact that anyone can recognize familiar instruments, even without conscious thought, and people are able to do it with much less effort and much more accuracy than when recognizing harmonies or scales. As determined by Terasawa et al. (2005), Mel-frequency cepstral coefficients (MFCC) are a good model for the perceptual timbre space. MFCC is well known as a front-end for speech recognition systems. The first part of figure 1.5 represents a 40 dimensional MFCC vector over time. In addition to the use of MFCCs, in order to complete the timbre information of the musical signal, the computation of the spectral centroid, spectral spread and spectral slope can also be useful (Kaiser and Sikora 2010).

Figure 1.5: Representation of various features as well as the segment boundaries groundtruth (dashed lines). The first part corresponds to the MFCCs, the second to the chromagram and the third to the rhythmogram.

As an alternative to the use of MFCCs, Levy and Sandler (2008) use the AudioSpectrumEnvelope, AudioSpectrumProjection and SoundModel descriptors of the MPEG-7 standard. Another alternative feature is the Perceptual Linear Prediction (PLP) (Hermansky 1990), used by Jensen (2007).

Pitch related Features

Pitch, upon which harmonic and melodic sequences are built, represents an important musical dimension. One example of its importance to human perception is music covers. Music covers usually preserve harmony and melody while using a different set of musical instruments, thus altering the timbre information of the song. However, they are usually accurately recognized by people. In the context of music structural segmentation, chroma features represent the most powerful representation for describing harmonic information (Müller 2007). The most important advantage of chroma features is their robustness to changes in timbre. A similar feature is the Pitch Class Profile coefficients (PCP) (Gómez 2006), used by Shiu et al. (2006).

Rhythmic Features

Rhythmic features are among the least used in the task of music structural segmentation, despite the perceptual cue "change in rhythm" identified in the study by Bruderer et al. In fact, Paulus and Klapuri (2008) noted that the use of rhythmic information in addition to timbre and harmonic features provides useful information for structure analysis. The rhythmic content of a musical signal can be described with a rhythmogram, as introduced by Jensen (2004) (third part of figure 1.5). It is comparable to a spectrogram, but instead of representing the frequency spectrum of the signal, it represents its rhythmic content.

Techniques

Some techniques were already referred to in the beginning of this section; they are presented in more detail in chapter 2:

Self Distance Matrix The Self Distance Matrix (SDM) compares the feature vectors with each other, using some determined distance measure (for example, the Euclidean distance) (Foote 1999).

Hidden Markov Models The use of an HMM to represent music assumes that each state represents some musical information, thus defining a musical alphabet where each state represents a letter.

Clustering The idea underlying the use of clusters to represent music is that different segments are represented by different clusters.

Time difference Using the time differential of the feature vector, large differences would indicate sudden transitions, and thus possible segment boundaries.

Cost Function The cost function determines the cost of a given segment, such that segments whose composing frames have a high degree of self similarity have a low cost.

1.4 Objective

The goal of this thesis is to perform structural segmentation of audio stream files, that is, to identify the instants of segment change, the boundaries between segments. The computed boundaries will then be compared with manually annotated ones in order to evaluate their quality.

1.5 Document Structure

Having presented the context in which this thesis has been developed, including the motivation for this work and some practical aspects related to automatic audio segmentation, the remainder of this document is organized as follows: Chapter 2 introduces the state of the art approaches. Chapter 3 introduces the used features, followed by the presentation of the implemented method and each used tool. Chapter 4 presents the final results, their discussion and a comparison with the state of the art. Chapter 5 presents the conclusions and future work.


2 Music Structure Analysis

Music is structured, generally respecting some rules that vary with the genre of music. Music can be divided into many genres in many different ways, and each genre of music can also be divided into a variety of styles. For instance, the Pop/Rock genre includes over 50 different styles, most of them extremely different (for example: Death Metal and Country Rock). Thus, even if there is controversy about the way music genres are divided, the diversity of sounds in different genres is unquestionable. In that sense, achieving the capability to adapt to such a variety of sounds presents the major difficulty for automatic segmentation approaches. The goal of this chapter is to introduce the state of the art approaches to the problem of structural segmentation in music. They are organized in three sets, as proposed by Paulus et al. (2010): novelty-based approaches, state approaches and sequence approaches. Additionally, the relation between segment boundaries and note onsets is discussed.

2.1 Structural Segmentation Types of Approaches

The various techniques used to solve the structural segmentation problem so far can be grouped according to their paradigm. Peeters (2004) considered dividing the approaches into two sets: sequence approaches and state approaches. The sequence approaches consider that there are sequences of events that are repeated several times in a given piece of music. The state approaches consider the musical audio signal to be a succession of states, where each state produces some part of the signal. Paulus et al. (2010), on the other hand, suggested dividing the methods into three main sets: novelty-based approaches, homogeneity-based approaches and repetition-based approaches. In fact, the homogeneity-based approaches are basically the same as the state approaches defined by Peeters, and the repetition-based approaches are the sequence approaches. The third set proposed by Paulus, the novelty-based approaches, can be seen as a front-end for one of the other approaches or both. The goal of this section is to introduce each of the three sets of approaches, as well as the state of the art methods belonging to each, starting with the novelty-based approaches, followed by the state approaches and finally the sequence approaches.

Novelty-based Approaches

The goal of the novelty-based approaches is to locate instants where changes occur in a song, usually referred to as segment boundaries. Knowing those, segments can be defined between them. The most common way of doing so is using a Self-Distance Matrix (SDM). The SDM is computed as follows:

SDM(i, j) = d_s(v_i, v_j),   i, j = 1, ..., n    (2.1)

where d_s represents a distance measure (for example, the Euclidean distance), v represents a feature vector and i and j are the frame numbers, a frame being the smallest piece of music used. Correlating a checkerboard kernel (figure 3.5) along the diagonal of the SDM yields a novelty-score function, whose peaks represent candidate boundaries between segments. This method was first introduced by Foote (2000). More about it is presented in chapter 3. Another method to detect boundaries was proposed by Tzanetakis and Cook (1999), using the time differential of the feature vector, defined as the Mahalanobis distance:

Δ_i = sqrt((v_i - v_{i-1})^T Σ^{-1} (v_i - v_{i-1}))    (2.2)

where Σ is an estimate of the feature covariance matrix, calculated from the training data, and i is the frame number. This measure is related to the Euclidean distance but takes into account the variance and correlations among features. Large differences would indicate sudden transitions, and thus possible boundaries. More recently, Jensen (2007) proposed a method where boundaries are detected using a cost function. This cost function determines the cost of a given segment, such that segments whose composing frames have a high degree of self similarity have a low cost.
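As a hedged illustration of the time-differential measure of equation 2.2, the following minimal Matlab sketch uses a random stand-in feature matrix and estimates the covariance from the features themselves rather than from separate training data, as Tzanetakis and Cook do:

```matlab
% Minimal sketch of the Mahalanobis time-differential of equation 2.2.
% V is a stand-in feature matrix (one feature vector per column); the
% covariance here is estimated from V itself, not from training data.
d = 12;  n = 500;
V = randn(d, n);                         % hypothetical feature vectors

Sigma    = cov(V');                      % feature covariance estimate
SigmaInv = inv(Sigma + 1e-6 * eye(d));   % regularized inverse

delta = zeros(1, n);
for i = 2:n
    dv = V(:, i) - V(:, i-1);            % feature-vector differential
    delta(i) = sqrt(dv' * SigmaInv * dv);
end
% Large values of delta indicate sudden transitions, i.e. candidate
% segment boundaries.
```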

Figure 2.1: Representation of a simple HMM. Taken from Dannenberg and Goto (2008).

State Approaches

This kind of approach considers the music audio signal as a succession of states. The most notable methods in this set are the ones based on Hidden Markov Models (HMM) (Rabiner 1989). Using an HMM, the concept of state is taken more explicitly. It is assumed that each musical excerpt is represented by a state in the HMM. This way a musical alphabet is defined, where each musical excerpt (each state) represents a letter (what is referred to here as a musical excerpt can be one frame or a group of frames, depending on the approach). Time advances in discrete steps corresponding to feature vectors; transitions from one state to the next are modeled by a probabilistic distribution that depends only on the current state. This forms a Markov model that generates a sequence of states. Note that the states are hidden because only the feature vectors are observable. Another probability function models the generation of a determined feature vector from a determined state (figure 2.1). The features are then decoded using the Viterbi algorithm and the most likely sequence of states is determined. The first approaches to use this method (Aucouturier and Sandler 2001) (Chu and Logan 2000) (Peeters and Rodet 2002) were initially implemented using a small number of states, under the assumption that each state would represent one part (verse, chorus, etc.). Although this model had a certain appeal, it did not work very well because the result was often temporally fragmented. In the analogy used before, in this case different letters would represent different segments. Levy and Sandler (2008) used the same method with much better results, using a larger number of states and then calculating histograms of the states with a sliding window over the entire sequence of states. Their assumption was that each segment type is characterized by a particular distribution of states, because roughly each kind of segment contains similar music. In order to implement such an assumption, clustering algorithms are applied to the histograms, where each cluster corresponds to a particular part. In terms of the analogy, in this case segments would be composed by sets of letters, i.e., words: a particular part would correspond to a particular word.
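To make the histogram idea concrete, here is a minimal sketch assuming an already decoded state sequence; the sequence, alphabet size and window length are hypothetical placeholders, not values from Levy and Sandler:

```matlab
% Minimal sketch of the sliding state-histogram idea: given a decoded
% HMM state sequence, build one state distribution per frame.
states = [1 1 3 3 3 2 2 1 1 3 3 2 2 2 1];  % example decoded states
nStates = 3;                                % size of the "alphabet"
halfWin = 2;                                % half-length of sliding window

nFrames = numel(states);
H = zeros(nStates, nFrames);                % one histogram per frame
for i = 1:nFrames
    lo = max(1, i - halfWin);
    hi = min(nFrames, i + halfWin);
    win = states(lo:hi);
    for s = 1:nStates
        H(s, i) = sum(win == s);            % count occurrences of state s
    end
    H(:, i) = H(:, i) / numel(win);         % normalize to a distribution
end
% Clustering the columns of H (e.g., with a k-means variant) would then
% group frames into parts, each cluster corresponding to a segment type.
```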

Figure 2.2: Representation of parallel stripes. The bottom row is a zoom of the top one; note that, from left to right, the matrix is being processed as described in the main text. Figure taken from Müller (2007).

Another common approach is based on clustering instead of HMMs. In Cooper and Foote (2003), clustering is used to determine the most frequent segment, where the segments were determined using the novelty-score peaks. And Goodwin and Laroche (2004) used an algorithm that performs segmentation and clustering at the same time.

Sequence Approaches

The sequence approaches consider the music audio signal as a repetition of sequences of events. This set of approaches relies mainly on the detection of diagonal stripes parallel to the matrix main diagonal (figure 2.2). These stripes represent similar sequences of features, as first verified by Foote (1999). The diagonal stripes, when present, can easily be detected by humans in the SDM. However, the same is not true for automatic detection, due to varied distortions of the musical signal, for example changes in dynamics or tempo (e.g., a ritardando). In order to facilitate the detection of such stripes, several authors propose the use of low pass filtering along the diagonal to smooth the SDM. Peeters (2007), in addition, proposed a high-pass filter in the direction perpendicular to the stripes to enhance them. Others proposed enhancement methods employing multiple iterations of erosion and dilation filtering along the diagonals (Lu et al. 2004). At this point the discovery of music repetition turned into an image processing task. Goto (2006) proposed the use of a time-lag matrix where the coordinates of the system were changed so that stripes appear horizontally or vertically, and would be easily detected. Shiu et al. (2006) proposed the use of the Viterbi algorithm to detect the diagonal stripes of musical parts that present a weaker similarity value, for example verses.

These approaches somewhat fail in the basic assumption that the stripes are parallel to the main diagonal. Furthermore, although the detection of sequence repetition represents a great improvement for musical structure analysis, it is usually not enough to represent the whole higher level structural segmentation, as it requires a part to occur at least twice to be found. Accordingly, the combination of state approaches with sequence approaches appears to be the most reasonable option. A good example of the combination of both approaches is the work by Paulus and Klapuri (2009).

2.2 Segment Boundaries and Note Onsets

Note onsets are defined as the starts of musical notes, not only pitched notes but also unpitched, rhythmic ones. In monophonic music a note onset is well defined, as is its duration; however, in polyphonic music, note onsets of various instruments overlap. This makes them more difficult to identify, both automatically and perceptually. A variety of methods to detect note onsets is presented by Rosão and Ribeiro (2011). Considering the segment boundary detection task, we believe that note onsets can be used to validate the segment boundaries. The assumption is that any segment is defined between note onsets, so any segment must start at a note onset. In that sense, the note onsets are seen as the events that trigger the segment change, and not only the segment change but every other event in music. In the extreme, without note onsets there is absence of sound.

2.3 Summary

In this chapter the state of the art approaches were introduced according to the division proposed by Paulus et al. (2010): novelty-based approaches, state approaches and sequence approaches. The first set is focused on the detection of segment boundaries and is generally used as a front-end for one of the other approaches. The second set considers the musical audio signal to be a succession of states, where each state produces some part of the signal. The last set considers that there are sequences of events repeated several times in a given piece of music. To finalize the chapter, we considered the note onsets to be the events that trigger segment changes.


3 Method

Considering the introduced sets of methods, the implemented method belongs to the novelty-based approaches. It is focused on determining the segment boundaries. The goal of this chapter is to introduce the method developed to solve the problem of segmentation of audio music streams, describing each used tool. It starts by considering the features collected from the audio stream and how they were mixed, followed by the introduction of the actual method.

3.1 Extracted Features

The extraction of features is a very important step in any MIR system. Table 3.1 shows the features used in some structural segmentation works. In our case the extracted features are an attempt to represent the three main musical dimensions: timbre, tonal (harmony and melody) and rhythm. In this section we introduce the extracted features and their mixture; before that, we consider the windows of analysis used to collect those features.

Window of Analysis

The audio stream is first downsampled to 22050Hz, since this sampling rate is sufficient for the analysis. The samples are then grouped in windows, or frames. In music structure segmentation, comparing frames with each other is a usual task, as is evident in the SDM. Such a task can represent heavy computation, depending on the number of frames used. Generally, larger frame lengths (0.1-1s) are used than in most audio content analysis. This reduces the number of frames in a song, and thus the SDM size. Moreover, a larger frame length yields a coarser temporal resolution which, according to Peeters, represents something musically more meaningful (Peeters 2004). Some proposed methods use variable length frames instead of fixed ones. This has two benefits: tempo invariance, meaning that a melody with some tempo fluctuation relative to a melody with the same pitch progression can still be successfully matched; and sharper feature differences, preventing the features of a sound event from spreading to other frames.

Authors | Task | Features
Goto (2006) | Chorus Detection | Chroma
Jensen (2007) | Music Structure | Perceptual Linear Prediction (PLP), Chroma and Rhythmogram
Kaiser and Sikora (2010) | Music Structure | 13 MFCCs, spectral centroid, spectral slope and spectral spread
Levy and Sandler (2008) | Music Structural Segmentation | AudioSpectrumEnvelope, AudioSpectrumProjection and SoundModel descriptors of the MPEG-7 standard
Paulus and Klapuri (2009) | Music Structural Segmentation | 12 MFCCs (excluding the 0th), Chroma and Rhythmogram
Peeters (2007) | Music Structural Segmentation | 13 MFCCs (excluding the 0th), 12 Spectral Contrast coefficients and Pitch Class Profile coefficients
Peiszer et al. (2008) | Music Structural Segmentation | 40 MFCCs
Shiu et al. (2006) | Similar Segment Identification | Pitch Class Profile coefficients
Turnbull and Lanckriet (2007) | Music Structural Segmentation | MFCCs and Chroma

Table 3.1: Compilation of works and features used.

Peiszer et al. (2008), for example, used the note onsets to set out window sizes. In our case, in order to accomplish sharper feature differences, the size of the windows is determined depending on the bpm. The bpm is determined using the function mirtempo() from MIRtoolbox (Lartillot 2011), which estimates the tempo by detecting periodicities in the onset detection curve; in this case the onsets are determined using the function mironsets(), also from the MIRtoolbox. The mirtempo() function is quite accurate. The assumption is made that the tempo in pop music is constant. The window size is determined as follows:

w_s = 1 / (2 · bpm/60)    (3.1)

This yields window sizes between 0.15s and 0.3s, equivalent to bpm between 100 and 200, depending on the song. We used no overlapping of windows, except for the rhythmogram. The impact of using a variable window size compared to a fixed one is discussed with actual evaluation values in the next chapter.
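A minimal sketch of the tempo-dependent framing of equation 3.1 follows; the bpm value and the signal are hypothetical stand-ins (the thesis obtains the tempo with mirtempo()):

```matlab
% Minimal sketch of the tempo-dependent framing of equation 3.1.
fs  = 22050;                    % sampling rate after downsampling (Hz)
bpm = 120;                      % assumed tempo estimate (placeholder)
ws  = 1 / (2 * bpm / 60);       % window size in seconds (eq. 3.1)
wlen = round(ws * fs);          % window size in samples

x = randn(30 * fs, 1);          % stand-in for a 30 s audio signal
nFrames = floor(numel(x) / wlen);
frames = reshape(x(1:nFrames * wlen), wlen, nFrames);  % no overlap
% Each column of "frames" is one analysis window from which a feature
% vector (MFCC, chroma, ...) would be computed.
```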

Figure 3.1: Pitch represented by two dimensions: height, moving vertically in octaves, and chroma, or pitch class, determining the rotational position within the helix. Taken from Gómez (2006).

Mel Frequency Cepstral Coefficients

The MFCCs are extensively used to represent timbre in music. We used 40 MFCCs calculated using a filter bank composed of linear and logarithmic filters to model loudness compression, in order to simulate the characteristics of the human auditory system. The coefficients are obtained by taking the discrete cosine transform (DCT) of the log-power spectrum on the Mel-frequency scale. Most authors do not use more than 20 coefficients. However, some tests were made with 40 coefficients and the final results show that the increase in the number of coefficients has a significant influence (around 5% in our case). This was also verified by Peiszer et al. (2008), as well as by Santos (2010), who also used 40 MFCCs. The first part of figure 1.5 represents a 40 dimensional MFCC vector over time.

Chromagram

The chroma refers to the 12 traditional pitch classes (the 12 semitones) {C, C#, D, ..., A#, B}. As pitch repeats itself every octave (12 semitones), a pitch class is defined to be the set of all pitches that share the same chroma. This is represented in figure 3.1. For example, the pitch class corresponding to the chroma F is the set {..., F0, F1, F2, ...}, where 0, 1 and 2 represent the pitch F of each octave. Therefore, the chroma representation is a 12 dimensional vector, where each dimension is the respective chroma content of the signal. Figure 1.5 shows a chromagram, the chroma represented over time.

We used the method of Müller and Ewert (2011) to extract the chroma features. First, the pitch values are determined using a bank of 88 filters, one centered on each pitch from A0 to C8. The chroma vector is then calculated simply by adding the pitches that correspond to the same chroma.

Rhythmogram

The rhythmogram was first presented by Jensen (2004). It is computed by determining the autocorrelation of the note onsets over intervals of 2s, using a millisecond scale, which produces a vector of dimension 200 (figure 3.2). Unlike the other two extracted features, the rhythmogram is calculated using a window of analysis of 2s and a hop size of w_s. This way, the rhythmogram has the same number of samples per song as the MFCCs and the chromagram. We used 4 different onset sets: one taken from Peiszer et al. (2008), which came from a beat tracker; another by Rosão (2011), based on the Spectral Flux (Bello et al. 2005); and the others obtained with the MIRToolbox function mironsets() (Lartillot 2011), one using the envelope of the signal and the other using the Spectral Flux as well. The first onset set contains very few onsets compared with the other three, which suggests that some selection must have taken place. That fact is adverse to the usefulness of the rhythmogram: as shown in figure 3.2 (c), the resulting rhythmogram carries too little information. On the other hand, the other note onsets, figure 3.2 (a), (b) and (d), convey much more information. This is reflected in the final results, as will be shown in the next chapter.

Figure 3.2: Rhythmograms computed using different note onsets: a) Rosão; b) mironsets() using spectral flux; c) from Peiszer et al. (2008); and d) mironsets().
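The following is a minimal sketch of a rhythmogram in the spirit of Jensen (2004): autocorrelation of an onset curve over 2 s windows. The onset curve and the 10 ms lag step are hypothetical stand-ins, not the thesis' exact parameters:

```matlab
% Minimal sketch of a rhythmogram: autocorrelation of a note-onset
% curve over 2 s windows at millisecond resolution (stand-in data).
onsetCurve = zeros(1, 30000);          % 30 s onset curve, 1 ms resolution
onsetCurve(500:500:end) = 1;           % fake onsets every 0.5 s
winLen = 2000;                          % 2 s analysis window (samples)
hop    = 250;                           % hop in ms (w_s = 0.25 s here)
maxLag = 200;                           % 200-dimensional feature vector

nFrames = floor((numel(onsetCurve) - winLen) / hop) + 1;
R = zeros(maxLag, nFrames);             % the rhythmogram
for f = 1:nFrames
    seg = onsetCurve((f-1)*hop + (1:winLen));
    seg = seg - mean(seg);
    for k = 1:maxLag
        lag = 10 * k;                   % 10 ms lag step, 200 lags span 2 s
        R(k, f) = sum(seg(1:end-lag) .* seg(1+lag:end));
    end
end
% Each column of R describes the local rhythmic content of the signal.
```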

3.2 Segment Boundaries Detection

In this section the algorithm is presented. Figure 3.3 shows a flowchart of the method, implemented using Matlab. The method is based on the approach by Foote (2000), where the novelty-score function was first introduced. Firstly, the SDM is computed; then a novelty-score function is calculated from it; and finally the peaks of that function are determined as candidates for segment boundaries. In the following, each step of the algorithm is presented.

Self Distance Matrix

The SDM is determined by equation 2.1. The distance measure used was the Manhattan distance, as it is known to perform well when dealing with high dimensionality data (Aggarwal et al. 2001), which is the case. That said, experiments made with the Euclidean and the Cosine distances showed that there is not much difference in performance (Chapter 4). The SDM presents some characteristics. First, since every frame is similar to itself, the matrix diagonal will be zero. Furthermore, assuming that the distance measure is symmetric, the matrix will be symmetric as well. The SDM can be visualized using a gray-scale image where similar frames are presented as black and infinitely different ones as white, or the other way round; this permits a somewhat useful visual representation of a piece of music (figure 3.4). The rectangular structures present in the matrix represent the structural elements of a song. In order to detect them, a checkerboard kernel (figure 3.5) is correlated along the matrix diagonal.
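As a minimal sketch of the SDM of equation 2.1 with the Manhattan distance, assuming a feature matrix with one column per frame (the feature values here are random stand-ins):

```matlab
% Minimal sketch of the SDM of equation 2.1 using the Manhattan distance.
% V is a stand-in feature matrix: one d-dimensional feature vector per
% analysis frame, one frame per column.
d = 40;                          % feature dimension (e.g., 40 MFCCs)
n = 300;                         % number of frames
V = rand(d, n);                  % hypothetical feature vectors

SDM = zeros(n, n);
for i = 1:n
    for j = i:n
        dist = sum(abs(V(:, i) - V(:, j)));  % Manhattan (L1) distance
        SDM(i, j) = dist;
        SDM(j, i) = dist;        % symmetric, with a zero diagonal
    end
end
imagesc(SDM); colormap(gray); axis square;  % gray-scale visualization
```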

Figure 3.3: Flowchart of the method implemented.

Checkerboard Kernel Correlation

A checkerboard kernel is presented in figure 3.5. Such a kernel is correlated along the matrix diagonal, according to the novelty-score function:

N(i) = Σ_{m=-k/2}^{k/2} Σ_{n=-k/2}^{k/2} abs(r(C_k(m, n), SDM(i + m, i + n)))    (3.2)

where C_k denotes a Gaussian tapered checkerboard kernel of size k, radially symmetric and centered on (0, 0), and i is the frame number; figure 3.6 illustrates the novelty-score computation.

Figure 3.4: The MFCC SDM for the song Northern Sky by Nick Drake.

Figure 3.5: Checkerboard kernel with a size of 96 (k = 96).

Figure 3.6: Illustration of the novelty-score computation.

In equation 3.2, abs() denotes the absolute value and r() the correlation coefficient, which is computed as follows:

r = Σ_m Σ_n (A_mn - Ā)(B_mn - B̄) / sqrt((Σ_m Σ_n (A_mn - Ā)²)(Σ_m Σ_n (B_mn - B̄)²))    (3.3)

where A and B represent the Gaussian tapered checkerboard kernel matrix and the subset of the SDM, respectively, and Ā and B̄ are the respective scalar means. This computation of N(i) is slightly different from the one presented in Foote (2000) and produced better final results. This can be justified by the fact that the computation of the correlation takes into account the mean values of both matrices, thus eliminating eventual noise.

Peak Selection

The peaks of the novelty-score function are determined simply by detecting the sign changes (positive to negative) in the derivative of the novelty-score function. Generally, the number of detected peaks is well above the number of segment boundaries present in an average 3 minute pop song, so some selection is needed. One way of doing so is using windows of 6s, with half overlap, to analyze the function; in each window the local maximum, if any, is chosen. Figure 3.7 shows a novelty-score peak selection using this method. This is done under the assumption that there are no segments smaller than 6s. Another way is to define a threshold and eliminate the peaks beneath its value. This approach was tested but with unsatisfying results, due to the fact that most top peaks are not actual boundaries; instead, lower local maxima are. To face this problem, an average weighted threshold was tested, which obtained better results than the constant threshold but still below the results obtained with the window approach.
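The following minimal sketch covers the kernel construction, the novelty score of equations 3.2 and 3.3, and the windowed peak selection, assuming the SDM from the previous sketch; the kernel size, taper width and frame hop are illustrative values, not the thesis' tuned parameters:

```matlab
% Minimal sketch of the novelty score (eqs. 3.2/3.3) and peak selection.
% Assumes SDM (n x n) from the previous sketch; k, the taper width and
% the frame hop are illustrative values.
k = 64;                                   % kernel size (even)
[m, n] = meshgrid(-k/2:k/2-1, -k/2:k/2-1);
checker = sign((m + 0.5) .* (n + 0.5));   % +1/-1 checkerboard quadrants
taper   = exp(-(m.^2 + n.^2) / (2 * (k/4)^2));  % radial Gaussian taper
Ck = checker .* taper;                    % Gaussian tapered kernel C_k

nFrames = size(SDM, 1);
N = zeros(1, nFrames);
for i = k/2 + 1 : nFrames - k/2
    sub = SDM(i-k/2 : i+k/2-1, i-k/2 : i+k/2-1);  % k x k submatrix
    c = corrcoef(Ck(:), sub(:));          % correlation coefficient (eq. 3.3)
    N(i) = abs(c(1, 2));                  % novelty score at frame i
end

% Peak selection: local maxima of N within 6 s windows, half overlap.
ws   = 0.25;                              % frame hop in seconds (example)
wlen = round(6 / ws);                     % 6 s window in frames
peaks = [];
for start = 1 : wlen/2 : nFrames - wlen
    idx = start : start + wlen - 1;
    [val, rel] = max(N(idx));
    p = idx(rel);
    % keep p only if it is a true local maximum of N
    if p > 1 && p < nFrames && N(p) >= N(p-1) && N(p) >= N(p+1) && val > 0
        peaks(end+1) = p; %#ok<AGROW>
    end
end
peaks = unique(peaks);                    % candidate boundary frames
```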

Figure 3.7: The novelty-score from the SDM of figure 3.4. The groundtruth is represented by the red dashed vertical lines and the automatically generated boundaries by the red crosses.

3.3 Mixing Features

The idea underlying the mixing of features is to use information from different musical dimensions, so that they complement each other. In that sense, mixing features seems a perfectly justified operation, and even a simple one to perform, but in practice it is not. The first basic idea used to combine features was to validate the boundaries by intersecting the novelty-score peaks of the three different features alone. Every instant that was repeated at least twice in two different features, within a threshold of 1.5s, was considered. The intersection was done in three different ways, each one taking as reference one of the three features, i.e., first the MFCC selected peaks are compared to the chromagram and rhythmogram selected peaks, and peaks are discarded if they are not repeated at least once; then the same is done for the rhythmogram and for the chromagram. Note that this can also be viewed as a peak selection process, and not a feature mixture per se, since the idea of different features completing one another is not present in this approach.
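A minimal sketch of this peak intersection follows, assuming three vectors of candidate boundary times in seconds; the times are made-up illustrative values, while the 1.5 s tolerance mirrors the description above:

```matlab
% Minimal sketch of the peak-intersection idea: keep a reference peak if
% it reappears (within 1.5 s) in at least one of the other two features.
% The peak times below are made-up illustrative values, in seconds.
mfccPeaks   = [12.1 33.8 61.0 95.4];
chromaPeaks = [11.9 45.2 60.7 96.0];
rhythmPeaks = [12.4 33.5 80.2];
tol = 1.5;                               % matching threshold in seconds

keep = false(size(mfccPeaks));           % MFCC taken as the reference
for i = 1:numel(mfccPeaks)
    nearChroma = any(abs(chromaPeaks - mfccPeaks(i)) <= tol);
    nearRhythm = any(abs(rhythmPeaks - mfccPeaks(i)) <= tol);
    keep(i) = nearChroma || nearRhythm;  % confirmed by another feature
end
validated = mfccPeaks(keep);             % all four peaks survive here
% The same procedure is repeated taking the chromagram and then the
% rhythmogram as the reference feature.
```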

The second idea was to sum the SDMs before the computation of the novelty-score function, as follows:

SDM(M+R) = α SDM(MFCC) + SDM(Rhythmogram)    (3.4)
SDM(C+R) = β SDM(Chroma) + SDM(Rhythmogram)    (3.5)
SDM(M+C) = SDM(MFCC) + σ SDM(Chroma)    (3.6)
SDM(M+C+R) = α SDM(MFCC) + β SDM(Chroma) + SDM(Rhythmogram)    (3.7)

where the features of the respective SDMs are indicated in brackets, and M, C and R stand for MFCC, Chromagram and Rhythmogram, respectively. The coefficients α, β and σ are computed as follows:

α = mean(SDM(Rhythmogram)) / mean(SDM(MFCC))    (3.8)
β = mean(SDM(Rhythmogram)) / mean(SDM(Chroma))    (3.9)
σ = mean(SDM(MFCC)) / mean(SDM(Chroma))    (3.10)

where the operation mean() determines the mean value of the matrix. Its purpose is to balance the terms of the sum, trying to give the same weight to each one. Finally, the third idea was to use a dimensionality reduction method on the concatenated feature vectors, combining features in groups of two and three. This created new feature vectors, which were then used to compute the SDM and the remainder of the method. To that end, the Singular Value Decomposition (SVD) was used. The SVD is based on a theorem from linear algebra which says that a rectangular matrix M (which in this case represents the feature vectors) can be broken down into the product of three matrices: an orthogonal matrix U, a diagonal matrix S, and the transpose of an orthogonal matrix V. The decomposition is usually presented as:

M_mn = U_mm S_mn V_nn^T    (3.11)

The diagonal of S presents a descending curve representing the decreasing contribution of each component; accordingly, only the first components of V^T are used, meaning that the ones left unused are useless or even adverse (noise) for further computations. The results for each hypothesis are presented and discussed in the next chapter.
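A minimal sketch of both mixing strategies, assuming per-feature SDMs and feature matrices shaped as in the earlier sketches (all inputs are random stand-ins, and the number of kept SVD components is an arbitrary example):

```matlab
% Minimal sketch of the two mixing strategies above, with stand-in data.
n = 300;                                  % number of frames
SDMmfcc   = rand(n);  SDMchroma = rand(n);  SDMrhythm = rand(n);

% (1) Balanced sum of SDMs (eqs. 3.4-3.10).
alpha = mean(SDMrhythm(:)) / mean(SDMmfcc(:));
beta  = mean(SDMrhythm(:)) / mean(SDMchroma(:));
SDMmix = alpha * SDMmfcc + beta * SDMchroma + SDMrhythm;   % eq. 3.7

% (2) SVD reduction of concatenated feature vectors (eq. 3.11).
Vmfcc = rand(40, n);  Vchroma = rand(12, n);   % hypothetical features
M = [Vmfcc; Vchroma]';                    % one frame per row of M
[U, S, V] = svd(M, 'econ');               % M = U*S*V'
nKeep = 10;                               % number of components kept
Vred = (M * V(:, 1:nKeep))';              % reduced feature vectors
% Vred (nKeep x n) then replaces the original features when computing
% the SDM and the rest of the method.
```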


More information

MODELING RHYTHM SIMILARITY FOR ELECTRONIC DANCE MUSIC

MODELING RHYTHM SIMILARITY FOR ELECTRONIC DANCE MUSIC MODELING RHYTHM SIMILARITY FOR ELECTRONIC DANCE MUSIC Maria Panteli University of Amsterdam, Amsterdam, Netherlands m.x.panteli@gmail.com Niels Bogaards Elephantcandy, Amsterdam, Netherlands niels@elephantcandy.com

More information

IMPROVING RHYTHMIC SIMILARITY COMPUTATION BY BEAT HISTOGRAM TRANSFORMATIONS

IMPROVING RHYTHMIC SIMILARITY COMPUTATION BY BEAT HISTOGRAM TRANSFORMATIONS 1th International Society for Music Information Retrieval Conference (ISMIR 29) IMPROVING RHYTHMIC SIMILARITY COMPUTATION BY BEAT HISTOGRAM TRANSFORMATIONS Matthias Gruhne Bach Technology AS ghe@bachtechnology.com

More information

Audio Feature Extraction for Corpus Analysis

Audio Feature Extraction for Corpus Analysis Audio Feature Extraction for Corpus Analysis Anja Volk Sound and Music Technology 5 Dec 2017 1 Corpus analysis What is corpus analysis study a large corpus of music for gaining insights on general trends

More information

Music Structure Analysis

Music Structure Analysis Overview Tutorial Music Structure Analysis Part I: Principles & Techniques (Meinard Müller) Coffee Break Meinard Müller International Audio Laboratories Erlangen Universität Erlangen-Nürnberg meinard.mueller@audiolabs-erlangen.de

More information

A System for Automatic Chord Transcription from Audio Using Genre-Specific Hidden Markov Models

A System for Automatic Chord Transcription from Audio Using Genre-Specific Hidden Markov Models A System for Automatic Chord Transcription from Audio Using Genre-Specific Hidden Markov Models Kyogu Lee Center for Computer Research in Music and Acoustics Stanford University, Stanford CA 94305, USA

More information

AN APPROACH FOR MELODY EXTRACTION FROM POLYPHONIC AUDIO: USING PERCEPTUAL PRINCIPLES AND MELODIC SMOOTHNESS

AN APPROACH FOR MELODY EXTRACTION FROM POLYPHONIC AUDIO: USING PERCEPTUAL PRINCIPLES AND MELODIC SMOOTHNESS AN APPROACH FOR MELODY EXTRACTION FROM POLYPHONIC AUDIO: USING PERCEPTUAL PRINCIPLES AND MELODIC SMOOTHNESS Rui Pedro Paiva CISUC Centre for Informatics and Systems of the University of Coimbra Department

More information

AN ARTISTIC TECHNIQUE FOR AUDIO-TO-VIDEO TRANSLATION ON A MUSIC PERCEPTION STUDY

AN ARTISTIC TECHNIQUE FOR AUDIO-TO-VIDEO TRANSLATION ON A MUSIC PERCEPTION STUDY AN ARTISTIC TECHNIQUE FOR AUDIO-TO-VIDEO TRANSLATION ON A MUSIC PERCEPTION STUDY Eugene Mikyung Kim Department of Music Technology, Korea National University of Arts eugene@u.northwestern.edu ABSTRACT

More information

DAT335 Music Perception and Cognition Cogswell Polytechnical College Spring Week 6 Class Notes

DAT335 Music Perception and Cognition Cogswell Polytechnical College Spring Week 6 Class Notes DAT335 Music Perception and Cognition Cogswell Polytechnical College Spring 2009 Week 6 Class Notes Pitch Perception Introduction Pitch may be described as that attribute of auditory sensation in terms

More information

Automatic Extraction of Popular Music Ringtones Based on Music Structure Analysis

Automatic Extraction of Popular Music Ringtones Based on Music Structure Analysis Automatic Extraction of Popular Music Ringtones Based on Music Structure Analysis Fengyan Wu fengyanyy@163.com Shutao Sun stsun@cuc.edu.cn Weiyao Xue Wyxue_std@163.com Abstract Automatic extraction of

More information

Grouping Recorded Music by Structural Similarity Juan Pablo Bello New York University ISMIR 09, Kobe October 2009 marl music and audio research lab

Grouping Recorded Music by Structural Similarity Juan Pablo Bello New York University ISMIR 09, Kobe October 2009 marl music and audio research lab Grouping Recorded Music by Structural Similarity Juan Pablo Bello New York University ISMIR 09, Kobe October 2009 Sequence-based analysis Structure discovery Cooper, M. & Foote, J. (2002), Automatic Music

More information

Music Alignment and Applications. Introduction

Music Alignment and Applications. Introduction Music Alignment and Applications Roger B. Dannenberg Schools of Computer Science, Art, and Music Introduction Music information comes in many forms Digital Audio Multi-track Audio Music Notation MIDI Structured

More information

MUSICAL INSTRUMENT IDENTIFICATION BASED ON HARMONIC TEMPORAL TIMBRE FEATURES

MUSICAL INSTRUMENT IDENTIFICATION BASED ON HARMONIC TEMPORAL TIMBRE FEATURES MUSICAL INSTRUMENT IDENTIFICATION BASED ON HARMONIC TEMPORAL TIMBRE FEATURES Jun Wu, Yu Kitano, Stanislaw Andrzej Raczynski, Shigeki Miyabe, Takuya Nishimoto, Nobutaka Ono and Shigeki Sagayama The Graduate

More information

Classification of Musical Instruments sounds by Using MFCC and Timbral Audio Descriptors

Classification of Musical Instruments sounds by Using MFCC and Timbral Audio Descriptors Classification of Musical Instruments sounds by Using MFCC and Timbral Audio Descriptors Priyanka S. Jadhav M.E. (Computer Engineering) G. H. Raisoni College of Engg. & Mgmt. Wagholi, Pune, India E-mail:

More information

POST-PROCESSING FIDDLE : A REAL-TIME MULTI-PITCH TRACKING TECHNIQUE USING HARMONIC PARTIAL SUBTRACTION FOR USE WITHIN LIVE PERFORMANCE SYSTEMS

POST-PROCESSING FIDDLE : A REAL-TIME MULTI-PITCH TRACKING TECHNIQUE USING HARMONIC PARTIAL SUBTRACTION FOR USE WITHIN LIVE PERFORMANCE SYSTEMS POST-PROCESSING FIDDLE : A REAL-TIME MULTI-PITCH TRACKING TECHNIQUE USING HARMONIC PARTIAL SUBTRACTION FOR USE WITHIN LIVE PERFORMANCE SYSTEMS Andrew N. Robertson, Mark D. Plumbley Centre for Digital Music

More information

Pitch Perception and Grouping. HST.723 Neural Coding and Perception of Sound

Pitch Perception and Grouping. HST.723 Neural Coding and Perception of Sound Pitch Perception and Grouping HST.723 Neural Coding and Perception of Sound Pitch Perception. I. Pure Tones The pitch of a pure tone is strongly related to the tone s frequency, although there are small

More information

Audio Structure Analysis

Audio Structure Analysis Lecture Music Processing Audio Structure Analysis Meinard Müller International Audio Laboratories Erlangen meinard.mueller@audiolabs-erlangen.de Music Structure Analysis Music segmentation pitch content

More information

Musical Acoustics Lecture 15 Pitch & Frequency (Psycho-Acoustics)

Musical Acoustics Lecture 15 Pitch & Frequency (Psycho-Acoustics) 1 Musical Acoustics Lecture 15 Pitch & Frequency (Psycho-Acoustics) Pitch Pitch is a subjective characteristic of sound Some listeners even assign pitch differently depending upon whether the sound was

More information

Music Source Separation

Music Source Separation Music Source Separation Hao-Wei Tseng Electrical and Engineering System University of Michigan Ann Arbor, Michigan Email: blakesen@umich.edu Abstract In popular music, a cover version or cover song, or

More information

WE ADDRESS the development of a novel computational

WE ADDRESS the development of a novel computational IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 18, NO. 3, MARCH 2010 663 Dynamic Spectral Envelope Modeling for Timbre Analysis of Musical Instrument Sounds Juan José Burred, Member,

More information

Repeating Pattern Discovery and Structure Analysis from Acoustic Music Data

Repeating Pattern Discovery and Structure Analysis from Acoustic Music Data Repeating Pattern Discovery and Structure Analysis from Acoustic Music Data Lie Lu, Muyuan Wang 2, Hong-Jiang Zhang Microsoft Research Asia Beijing, P.R. China, 8 {llu, hjzhang}@microsoft.com 2 Department

More information

Drum Sound Identification for Polyphonic Music Using Template Adaptation and Matching Methods

Drum Sound Identification for Polyphonic Music Using Template Adaptation and Matching Methods Drum Sound Identification for Polyphonic Music Using Template Adaptation and Matching Methods Kazuyoshi Yoshii, Masataka Goto and Hiroshi G. Okuno Department of Intelligence Science and Technology National

More information

CS229 Project Report Polyphonic Piano Transcription

CS229 Project Report Polyphonic Piano Transcription CS229 Project Report Polyphonic Piano Transcription Mohammad Sadegh Ebrahimi Stanford University Jean-Baptiste Boin Stanford University sadegh@stanford.edu jbboin@stanford.edu 1. Introduction In this project

More information

Chroma Binary Similarity and Local Alignment Applied to Cover Song Identification

Chroma Binary Similarity and Local Alignment Applied to Cover Song Identification 1138 IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 16, NO. 6, AUGUST 2008 Chroma Binary Similarity and Local Alignment Applied to Cover Song Identification Joan Serrà, Emilia Gómez,

More information

2 2. Melody description The MPEG-7 standard distinguishes three types of attributes related to melody: the fundamental frequency LLD associated to a t

2 2. Melody description The MPEG-7 standard distinguishes three types of attributes related to melody: the fundamental frequency LLD associated to a t MPEG-7 FOR CONTENT-BASED MUSIC PROCESSING Λ Emilia GÓMEZ, Fabien GOUYON, Perfecto HERRERA and Xavier AMATRIAIN Music Technology Group, Universitat Pompeu Fabra, Barcelona, SPAIN http://www.iua.upf.es/mtg

More information

Toward Automatic Music Audio Summary Generation from Signal Analysis

Toward Automatic Music Audio Summary Generation from Signal Analysis Toward Automatic Music Audio Summary Generation from Signal Analysis Geoffroy Peeters IRCAM Analysis/Synthesis Team 1, pl. Igor Stravinsky F-7 Paris - France peeters@ircam.fr ABSTRACT This paper deals

More information

MUSIC is a ubiquitous and vital part of the lives of billions

MUSIC is a ubiquitous and vital part of the lives of billions 1088 IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, VOL. 5, NO. 6, OCTOBER 2011 Signal Processing for Music Analysis Meinard Müller, Member, IEEE, Daniel P. W. Ellis, Senior Member, IEEE, Anssi

More information

2. AN INTROSPECTION OF THE MORPHING PROCESS

2. AN INTROSPECTION OF THE MORPHING PROCESS 1. INTRODUCTION Voice morphing means the transition of one speech signal into another. Like image morphing, speech morphing aims to preserve the shared characteristics of the starting and final signals,

More information

Book: Fundamentals of Music Processing. Audio Features. Book: Fundamentals of Music Processing. Book: Fundamentals of Music Processing

Book: Fundamentals of Music Processing. Audio Features. Book: Fundamentals of Music Processing. Book: Fundamentals of Music Processing Book: Fundamentals of Music Processing Lecture Music Processing Audio Features Meinard Müller International Audio Laboratories Erlangen meinard.mueller@audiolabs-erlangen.de Meinard Müller Fundamentals

More information

Towards Supervised Music Structure Annotation: A Case-based Fusion Approach.

Towards Supervised Music Structure Annotation: A Case-based Fusion Approach. Towards Supervised Music Structure Annotation: A Case-based Fusion Approach. Giacomo Herrero MSc Thesis, Universitat Pompeu Fabra Supervisor: Joan Serrà, IIIA-CSIC September, 2014 Abstract Analyzing the

More information

Music Similarity and Cover Song Identification: The Case of Jazz

Music Similarity and Cover Song Identification: The Case of Jazz Music Similarity and Cover Song Identification: The Case of Jazz Simon Dixon and Peter Foster s.e.dixon@qmul.ac.uk Centre for Digital Music School of Electronic Engineering and Computer Science Queen Mary

More information

GCT535- Sound Technology for Multimedia Timbre Analysis. Graduate School of Culture Technology KAIST Juhan Nam

GCT535- Sound Technology for Multimedia Timbre Analysis. Graduate School of Culture Technology KAIST Juhan Nam GCT535- Sound Technology for Multimedia Timbre Analysis Graduate School of Culture Technology KAIST Juhan Nam 1 Outlines Timbre Analysis Definition of Timbre Timbre Features Zero-crossing rate Spectral

More information

Automatic Labelling of tabla signals

Automatic Labelling of tabla signals ISMIR 2003 Oct. 27th 30th 2003 Baltimore (USA) Automatic Labelling of tabla signals Olivier K. GILLET, Gaël RICHARD Introduction Exponential growth of available digital information need for Indexing and

More information

Topics in Computer Music Instrument Identification. Ioanna Karydi

Topics in Computer Music Instrument Identification. Ioanna Karydi Topics in Computer Music Instrument Identification Ioanna Karydi Presentation overview What is instrument identification? Sound attributes & Timbre Human performance The ideal algorithm Selected approaches

More information

10 Visualization of Tonal Content in the Symbolic and Audio Domains

10 Visualization of Tonal Content in the Symbolic and Audio Domains 10 Visualization of Tonal Content in the Symbolic and Audio Domains Petri Toiviainen Department of Music PO Box 35 (M) 40014 University of Jyväskylä Finland ptoiviai@campus.jyu.fi Abstract Various computational

More information

TOWARDS IMPROVING ONSET DETECTION ACCURACY IN NON- PERCUSSIVE SOUNDS USING MULTIMODAL FUSION

TOWARDS IMPROVING ONSET DETECTION ACCURACY IN NON- PERCUSSIVE SOUNDS USING MULTIMODAL FUSION TOWARDS IMPROVING ONSET DETECTION ACCURACY IN NON- PERCUSSIVE SOUNDS USING MULTIMODAL FUSION Jordan Hochenbaum 1,2 New Zealand School of Music 1 PO Box 2332 Wellington 6140, New Zealand hochenjord@myvuw.ac.nz

More information

Query By Humming: Finding Songs in a Polyphonic Database

Query By Humming: Finding Songs in a Polyphonic Database Query By Humming: Finding Songs in a Polyphonic Database John Duchi Computer Science Department Stanford University jduchi@stanford.edu Benjamin Phipps Computer Science Department Stanford University bphipps@stanford.edu

More information

Melody Extraction from Generic Audio Clips Thaminda Edirisooriya, Hansohl Kim, Connie Zeng

Melody Extraction from Generic Audio Clips Thaminda Edirisooriya, Hansohl Kim, Connie Zeng Melody Extraction from Generic Audio Clips Thaminda Edirisooriya, Hansohl Kim, Connie Zeng Introduction In this project we were interested in extracting the melody from generic audio files. Due to the

More information

DAY 1. Intelligent Audio Systems: A review of the foundations and applications of semantic audio analysis and music information retrieval

DAY 1. Intelligent Audio Systems: A review of the foundations and applications of semantic audio analysis and music information retrieval DAY 1 Intelligent Audio Systems: A review of the foundations and applications of semantic audio analysis and music information retrieval Jay LeBoeuf Imagine Research jay{at}imagine-research.com Rebecca

More information

Hidden Markov Model based dance recognition

Hidden Markov Model based dance recognition Hidden Markov Model based dance recognition Dragutin Hrenek, Nenad Mikša, Robert Perica, Pavle Prentašić and Boris Trubić University of Zagreb, Faculty of Electrical Engineering and Computing Unska 3,

More information

Content-based music retrieval

Content-based music retrieval Music retrieval 1 Music retrieval 2 Content-based music retrieval Music information retrieval (MIR) is currently an active research area See proceedings of ISMIR conference and annual MIREX evaluations

More information

A New Method for Calculating Music Similarity

A New Method for Calculating Music Similarity A New Method for Calculating Music Similarity Eric Battenberg and Vijay Ullal December 12, 2006 Abstract We introduce a new technique for calculating the perceived similarity of two songs based on their

More information

HUMAN PERCEPTION AND COMPUTER EXTRACTION OF MUSICAL BEAT STRENGTH

HUMAN PERCEPTION AND COMPUTER EXTRACTION OF MUSICAL BEAT STRENGTH Proc. of the th Int. Conference on Digital Audio Effects (DAFx-), Hamburg, Germany, September -8, HUMAN PERCEPTION AND COMPUTER EXTRACTION OF MUSICAL BEAT STRENGTH George Tzanetakis, Georg Essl Computer

More information

Music Complexity Descriptors. Matt Stabile June 6 th, 2008

Music Complexity Descriptors. Matt Stabile June 6 th, 2008 Music Complexity Descriptors Matt Stabile June 6 th, 2008 Musical Complexity as a Semantic Descriptor Modern digital audio collections need new criteria for categorization and searching. Applicable to:

More information

REpeating Pattern Extraction Technique (REPET): A Simple Method for Music/Voice Separation

REpeating Pattern Extraction Technique (REPET): A Simple Method for Music/Voice Separation IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 21, NO. 1, JANUARY 2013 73 REpeating Pattern Extraction Technique (REPET): A Simple Method for Music/Voice Separation Zafar Rafii, Student

More information

/$ IEEE

/$ IEEE IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 17, NO. 6, AUGUST 2009 1159 Music Structure Analysis Using a Probabilistic Fitness Measure and a Greedy Search Algorithm Jouni Paulus,

More information

Automatic Music Clustering using Audio Attributes

Automatic Music Clustering using Audio Attributes Automatic Music Clustering using Audio Attributes Abhishek Sen BTech (Electronics) Veermata Jijabai Technological Institute (VJTI), Mumbai, India abhishekpsen@gmail.com Abstract Music brings people together,

More information

Music Structure Analysis

Music Structure Analysis Tutorial Automatisierte Methoden der Musikverarbeitung 47. Jahrestagung der Gesellschaft für Informatik Music Structure Analysis Meinard Müller, Christof Weiss, Stefan Balke International Audio Laboratories

More information

A repetition-based framework for lyric alignment in popular songs

A repetition-based framework for lyric alignment in popular songs A repetition-based framework for lyric alignment in popular songs ABSTRACT LUONG Minh Thang and KAN Min Yen Department of Computer Science, School of Computing, National University of Singapore We examine

More information

CSC475 Music Information Retrieval

CSC475 Music Information Retrieval CSC475 Music Information Retrieval Monophonic pitch extraction George Tzanetakis University of Victoria 2014 G. Tzanetakis 1 / 32 Table of Contents I 1 Motivation and Terminology 2 Psychacoustics 3 F0

More information

STRUCTURAL CHANGE ON MULTIPLE TIME SCALES AS A CORRELATE OF MUSICAL COMPLEXITY

STRUCTURAL CHANGE ON MULTIPLE TIME SCALES AS A CORRELATE OF MUSICAL COMPLEXITY STRUCTURAL CHANGE ON MULTIPLE TIME SCALES AS A CORRELATE OF MUSICAL COMPLEXITY Matthias Mauch Mark Levy Last.fm, Karen House, 1 11 Bache s Street, London, N1 6DL. United Kingdom. matthias@last.fm mark@last.fm

More information

Topic 11. Score-Informed Source Separation. (chroma slides adapted from Meinard Mueller)

Topic 11. Score-Informed Source Separation. (chroma slides adapted from Meinard Mueller) Topic 11 Score-Informed Source Separation (chroma slides adapted from Meinard Mueller) Why Score-informed Source Separation? Audio source separation is useful Music transcription, remixing, search Non-satisfying

More information

Voice & Music Pattern Extraction: A Review

Voice & Music Pattern Extraction: A Review Voice & Music Pattern Extraction: A Review 1 Pooja Gautam 1 and B S Kaushik 2 Electronics & Telecommunication Department RCET, Bhilai, Bhilai (C.G.) India pooja0309pari@gmail.com 2 Electrical & Instrumentation

More information

Analysis, Synthesis, and Perception of Musical Sounds

Analysis, Synthesis, and Perception of Musical Sounds Analysis, Synthesis, and Perception of Musical Sounds The Sound of Music James W. Beauchamp Editor University of Illinois at Urbana, USA 4y Springer Contents Preface Acknowledgments vii xv 1. Analysis

More information

The Intervalgram: An Audio Feature for Large-scale Melody Recognition

The Intervalgram: An Audio Feature for Large-scale Melody Recognition The Intervalgram: An Audio Feature for Large-scale Melody Recognition Thomas C. Walters, David A. Ross, and Richard F. Lyon Google, 1600 Amphitheatre Parkway, Mountain View, CA, 94043, USA tomwalters@google.com

More information

An Accurate Timbre Model for Musical Instruments and its Application to Classification

An Accurate Timbre Model for Musical Instruments and its Application to Classification An Accurate Timbre Model for Musical Instruments and its Application to Classification Juan José Burred 1,AxelRöbel 2, and Xavier Rodet 2 1 Communication Systems Group, Technical University of Berlin,

More information

Research Article Multiple Scale Music Segmentation Using Rhythm, Timbre, and Harmony

Research Article Multiple Scale Music Segmentation Using Rhythm, Timbre, and Harmony Hindawi Publishing Corporation EURASIP Journal on Advances in Signal Processing Volume 007, Article ID 7305, pages doi:0.55/007/7305 Research Article Multiple Scale Music Segmentation Using Rhythm, Timbre,

More information

Lecture 9 Source Separation

Lecture 9 Source Separation 10420CS 573100 音樂資訊檢索 Music Information Retrieval Lecture 9 Source Separation Yi-Hsuan Yang Ph.D. http://www.citi.sinica.edu.tw/pages/yang/ yang@citi.sinica.edu.tw Music & Audio Computing Lab, Research

More information

Week 14 Music Understanding and Classification

Week 14 Music Understanding and Classification Week 14 Music Understanding and Classification Roger B. Dannenberg Professor of Computer Science, Music & Art Overview n Music Style Classification n What s a classifier? n Naïve Bayesian Classifiers n

More information

FINDING REPEATING PATTERNS IN ACOUSTIC MUSICAL SIGNALS : APPLICATIONS FOR AUDIO THUMBNAILING.

FINDING REPEATING PATTERNS IN ACOUSTIC MUSICAL SIGNALS : APPLICATIONS FOR AUDIO THUMBNAILING. FINDING REPEATING PATTERNS IN ACOUSTIC MUSICAL SIGNALS : APPLICATIONS FOR AUDIO THUMBNAILING. JEAN-JULIEN AUCOUTURIER, MARK SANDLER Sony Computer Science Laboratory, 6 rue Amyot, 75005 Paris, France jj@csl.sony.fr

More information

Sparse Representation Classification-Based Automatic Chord Recognition For Noisy Music

Sparse Representation Classification-Based Automatic Chord Recognition For Noisy Music Journal of Information Hiding and Multimedia Signal Processing c 2018 ISSN 2073-4212 Ubiquitous International Volume 9, Number 2, March 2018 Sparse Representation Classification-Based Automatic Chord Recognition

More information

Singer Recognition and Modeling Singer Error

Singer Recognition and Modeling Singer Error Singer Recognition and Modeling Singer Error Johan Ismael Stanford University jismael@stanford.edu Nicholas McGee Stanford University ndmcgee@stanford.edu 1. Abstract We propose a system for recognizing

More information

Pitch. The perceptual correlate of frequency: the perceptual dimension along which sounds can be ordered from low to high.

Pitch. The perceptual correlate of frequency: the perceptual dimension along which sounds can be ordered from low to high. Pitch The perceptual correlate of frequency: the perceptual dimension along which sounds can be ordered from low to high. 1 The bottom line Pitch perception involves the integration of spectral (place)

More information

The Effect of DJs Social Network on Music Popularity

The Effect of DJs Social Network on Music Popularity The Effect of DJs Social Network on Music Popularity Hyeongseok Wi Kyung hoon Hyun Jongpil Lee Wonjae Lee Korea Advanced Institute Korea Advanced Institute Korea Advanced Institute Korea Advanced Institute

More information