An Examination of Foote's Self-Similarity Method


WINTER 2001, MUS 220D, Units: 4
Unjung Nam

This study is based on my dissertation proposal. Its purpose is to improve my understanding of the feature extractors used in the field of content-based music retrieval/classification. I am particularly interested in Jonathan Foote's self-similarity method. I have summarized various articles on feature extractors used in the field of audio information retrieval/classification, and I have included two experiments with Foote's method on real musical pieces. Although the analyses of these experiments are still in progress, this study has helped me understand the structural process of Foote's method. It appears that each variable in the system, such as the frame rate and kernel size, should be optimized for each case through thorough empirical experiments on different kinds of musical signals. Moreover, processing a full piece of music (usually more than one minute long) with this system requires a lot of computing time. Hence, one of the final aims of my study is to suggest ways of overcoming some of the limitations of this system.

Table of Contents
Motivation
1. Features Used in Content-Based Music Retrieval Methods
  1.1 Spectral Features
  1.2 Temporal Features
  1.3 Other Musical Features
  1.4 Overall Structural Features
2. Foote's Self-Similarity Method
  2.1 Parameterization
  2.2 Distance Matrix Embedding
  2.3 Kernel Correlation
3. Experiments with Real Music
  3.1 Bach's Well-Tempered Clavier I, Prelude No. 1 in C major
  3.2 Bach's Air on a G String
  3.3 Discussion
Appendix
Bibliography

Motivation

Last quarter I worked on an automatic musical style classification system that classifies music into three genres (classical, pop/rock, and jazz) based on three acoustic features: spectral centroid, short-time energy function, and short-time zero-crossing rate. Though the system worked well in clustering a small number of digital music files (20 files) with a K-means clustering algorithm and a K-nearest-neighbor classifier, it failed to choose the model space most similar to the input signal. At first I concluded that the failure was due to a classification model space built from only a small number of music files, and to an inappropriate selection of features and clustering method. However, as I continued my research, increasing the number of features and music files in the classification model space, or using a more sophisticated clustering method, did not necessarily improve the system. An expert whom I met at the poster demo session for this system pointed out that the more features I use, the more complicated the system becomes, and consequently the harder it is for the system to work well. Since that method is only successful in clustering musical genres in a coarse way, I think it can serve as the coarse level of a larger system, and it is necessary to search for methods for a lower-level (that is, more detailed) classification system, for instance one that distinguishes music with different tempi or different lead vocals. In that case the time-varying characteristics of music should be taken into account. Foote's method seems promising for this purpose, because his approach uses the signal to model itself and thus neither relies on particular acoustic cues nor requires training. Also, since it can find individual note boundaries or even natural segment boundaries such as verse/chorus or speech/music transitions, it is well suited to comparing similar musical pieces in terms of their musical information. His method is also attractive because, regardless of the parameterization used, the result is a compact vector of parameters for every frame; the actual parameterization is not crucial as long as similar sounds yield similar parameters. It is no doubt an advantage that different parameterizations may be used for different applications.

1. Features Used in Content-Based Music Retrieval Methods

Sounds have traditionally been described by their pitch, loudness, and timbre, and much previous work on general audio content retrieval has tried to extract these features. It is apparent, however, that music requires higher-level information, such as musical characteristics, beyond the features listed above; in some cases, features used in general audio content retrieval may not be efficient for music retrieval/classification. Below are the features used in audio information retrieval/classification systems. I will experiment with these further and try to find useful feature extractors for future music classification systems.

1.1 Spectral Features

Timbre-related features are often used in audio information classification/retrieval. In Scheirer and Slaney's (1997) speech/music discriminator, the spectral centroid and its variance are used, on the assumption that music involves percussive sounds which, by including high-frequency noise, push the spectral mean higher; in addition, excitation energies can be higher for music than for speech. They also calculated the spectral flux, the 2-norm of the frame-to-frame spectral amplitude difference vector, assuming that music has a higher rate of change and goes through more drastic frame-to-frame changes than speech does. The rolloff of the spectrum is also used to measure the "skewness" of the spectral shape, on the assumption that unvoiced speech has a high proportion of its energy in the high-frequency range of the spectrum, whereas most of the energy of voiced speech and music is contained in the lower bands. Wold et al. (1996) analyzed Mel-filtered cepstral coefficients and the centroid of the STFT to cluster different kinds of audio sources, classifying audio content into ten groups: animal, bells, crowds, laughter, machine, instrument, male speech, female speech, telephone, and water. At the music analysis level, they tried to identify the source instruments by training on all possible cases of instruments using histogram modeling. Zhang and Kuo (1999a, 1999b) and Tzanetakis and Cook (1999) used the features listed above in similar ways. Whether these methods can cluster musical pieces by their instrumentation should be examined carefully with a larger number of pieces from the database. Some of the spectral features that I might add in my future research are as follows. The spectrogram might show the textural complexity of a sound mixture: it may be possible to analyze how many sources are playing in it, to figure out whether a singing voice is involved in the signal and whether it is female, male, or purely instrumental, and the high-frequency distribution may show whether there is a drum beat. The spectral centroid can show the frequency components in the signal, indicating what instrument combination might be used in it. A sketch of these frame-wise spectral features appears below.
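To make these features concrete, the following is a minimal MATLAB sketch (save as spectralFeatures.m) of the three frame-wise measures discussed above. It is not code from any of the cited systems; the frame length, hop size, and the 85% rolloff threshold are illustrative assumptions of mine, and x is assumed to be a mono signal in a column vector.

    % spectralFeatures.m -- minimal sketch of frame-wise spectral
    % centroid, flux, and rolloff. Frame length, hop size, and the
    % 85% rolloff threshold are illustrative assumptions.
    % x: mono signal (column vector), fs: sampling rate in Hz.
    function [centroid, flux, rolloff] = spectralFeatures(x, fs)
      N = 1024; hop = 512;                   % frame and hop size (assumed)
      nFrames = floor((length(x) - N) / hop) + 1;
      centroid = zeros(1, nFrames); flux = zeros(1, nFrames);
      rolloff  = zeros(1, nFrames);
      prevMag = zeros(N/2, 1);
      freqs = (0:N/2-1)' * fs / N;           % bin center frequencies (Hz)
      w = 0.5 * (1 - cos(2*pi*(0:N-1)'/N));  % Hann window, no toolbox needed
      for k = 1:nFrames
        frame = x((k-1)*hop + (1:N)) .* w;
        mag = abs(fft(frame)); mag = mag(1:N/2);
        centroid(k) = sum(freqs .* mag) / (sum(mag) + eps); % spectral mean
        nmag = mag / (sum(mag) + eps);                      % normalized spectrum
        flux(k) = sqrt(sum((nmag - prevMag).^2));           % 2-norm of change
        prevMag = nmag;
        c = cumsum(mag);                                    % cumulative energy
        rolloff(k) = freqs(find(c >= 0.85 * c(end), 1));    % 85% rolloff frequency
      end
    end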

Mel-frequency cepstral coefficients (MFCCs) are a promising feature that shows the spectral shape of the source signal. I will include a detailed explanation of, and experiments with, MFCCs in my next report.

1.2 Temporal Features

Tempo or a particular rhythmic pattern is a crucial component for measuring similarity between musical pieces. Tempo can be classified as fast, medium, or slow, and can indicate certain styles of music; a particular rhythmic pattern can be used to identify a type of folk or dance music such as Latin dance, salsa, or tango. A substantial amount of work has been done on beat tracking of music. Most early beat-tracking systems dealt with MIDI signals (Allen and Dannenberg, 1990) and were not successful at processing, in real time, audio signals containing the sounds of various instruments. Scheirer (1998) uses correlated energy peaks across sub-bands to track beats in popular music. A more complete approach to beat tracking of acoustic signals was developed by Goto and Muraoka (1994, 1998). Their system, called BTS, uses frequency histograms to find significant peaks in the low-frequency regions, corresponding to the frequencies of the bass and snare drums, and then tracks these low-frequency signals by matching patterns of onset times against a set of pre-stored drum-beat patterns. This method succeeded in tracking the beat of most of the popular songs on which it was tested. Their later system allowed music without drums to be tracked by recognizing chord changes, assuming that significant harmonic changes occur at strong rhythmic positions. Dixon (2000) noted that these systems required a powerful parallel computer to run in real time, and built a more modest one: his system detects salient note onsets in the time domain, determines possible inter-beat intervals with a clustering algorithm, and then employs multiple agents to find the sequence of events that represents the beat of the popular music.

1.3 Other Musical Features

Other music-related features are harmony and melody. It might be impossible to extract these features from all kinds of music; harmony and melody extractors will apply mainly to classical music and to music of relatively simple texture. Fujishima (1999) developed a real-time system that recognizes chords from acoustic signals. His system derives a Pitch Class Profile (PCP) from the DFT spectrum, and a pattern-matching algorithm is applied to the PCP to determine the chord type and root. The system was tested on classical pieces, and he mentioned that future work includes a better classifier, a table-tuning technique, bass-note tracking, and the use of musical context for symbol-level error correction. Another interesting work, by Purwins, Blankertz, and Obermayer (2000), was explored at CCRMA. They developed a system that uses "cq-profiles" to track tonal modulations in music. Cq-profiles are 12-dimensional vectors, each component referring to a pitch class; they are calculated with a constant-Q filter bank (Brown and Puckette, 1992), and the cq-profile technique serves as a simple auditory model. A Self-Organizing Map (SOM) is combined with this, from which an arrangement of keys emerges that resembles results from psychological experiments and from music theory. There is also work on tracking melody and bass lines in sound mixtures. Goto (2000) built a system for estimating the fundamental frequency (F0) of the melody and bass lines in monaural real-world musical audio signals. He proposes a predominant-F0 estimation method called PreFEst that obtains the most predominant F0 supported by harmonics within an intentionally limited frequency range. It evaluates the relative dominance of every possible F0 using the Expectation-Maximization algorithm, and accounts for the temporal continuity of F0s using a multiple-agent architecture. Several related research efforts are under way in the field of automatic music transcription.

1.4 Overall Structural Features

It is possible to analyze whether changes in tempo or harmony occur in a piece of music; furthermore, whether they occur in a regular pattern might characterize a certain style of music. Scheirer (1999) developed a new technique based on the correlogram (Licklider, 1951), whose approach is to understand the musical signal without source separation. The algorithm was demonstrated to locate perceptual events in time and frequency. The model stands as a theoretical alternative to methods that use pitch as their primary grouping cue, and it operates within a strict probabilistic framework, which makes it convenient to incorporate into a larger signal-understanding testbed. Though his method locates many of the perceptual objects, it needs a pre-determined interpretation of its mapping. Foote (1999a) did very interesting work on music self-similarity analysis. His method automatically locates points of significant change in music by analyzing local self-similarity. It can find individual note boundaries or even natural segment boundaries such as verse/chorus or speech/music transitions, even in the absence of cues such as silence. The approach uses the signal to model itself, and thus neither relies on particular acoustic cues nor requires training. It has applications in the indexing, segmenting, summarization, and beat tracking of music. The details of Foote's method are summarized below, based on his paper (Foote, 1999a), and reviewed in the following section.

2. Foote's Self-Similarity Method

The method produces a time series proportional to the acoustic novelty of the source audio at each instant; high values and peaks correspond to large audio changes. The novelty score can be thresholded to find these instants, which can serve as segment boundaries. The system flow is as follows (Foote, 1999a):

Source audio → Parameterization → Distance matrix embedding → Kernel correlation → Novelty score → Thresholding → Segment boundaries

2.1 Parameterization

The first step is to parameterize the audio. This is typically done by windowing the audio waveform; variable window widths and overlaps can be used. Each frame is then parameterized with a standard analysis such as a Fourier transform or Mel-frequency cepstral coefficient analysis. Other parameterizations include those based on linear prediction, on psychoacoustic considerations (Slaney, 1998), or even on a combination of both (perceptual linear prediction). Foote notes that regardless of the parameterization used, the result is a compact vector of parameters for every frame, and that the actual parameterization is not crucial as long as similar sounds yield similar parameters. Different parameterizations may suit different applications; for example, he has shown that the MFCC representation, which preserves the coarse spectral shape while discarding the fine harmonic structure due to pitch, may be particularly appropriate for certain applications, since MFCCs tend to match similar timbres rather than exact pitches. He claims that the system is therefore very flexible and can subsume almost any existing audio analysis method. A minimal parameterization sketch is given below.
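As a concrete illustration, here is a minimal sketch of this step in MATLAB, using the mfcc function from Slaney's Auditory Toolbox (the same code the experiments in Section 3 rely on). The file name is hypothetical, the 50 Hz frame rate is borrowed from the framerate = 50 settings reported in Section 3, and the exact argument conventions of mfcc should be checked against the toolbox documentation.

    % Parameterization sketch: one compact feature vector per frame,
    % using mfcc from Slaney's Auditory Toolbox (assumed on the path).
    [x, fs] = audioread('prelude.wav');  % hypothetical file name
    x = mean(x, 2);                      % mix down to mono
    frameRate = 50;                      % frames/s, matching Section 3
    ftv = mfcc(x', fs, frameRate);       % feature vectors v_i as columns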

2.2 Distance Matrix Embedding

Once the audio has been parameterized, it is embedded in a two-dimensional representation. [Figure: the waveform is compared with itself; the (dis)similarity D(i, j) between frames i and j becomes the (i, j)th element of the similarity matrix S.]

A measure D of the (dis)similarity between feature vectors v_i and v_j is calculated between every pair of audio frames i and j. A simple distance measure is the Euclidean distance in the parameter space, i.e. the square root of the sum of the squares of the differences of each vector parameter:

D_E(i, j) = \| v_i - v_j \|

Another useful metric of vector similarity is the scalar (dot) product of the vectors. To remove the dependence on magnitude (and hence on energy, given our features), the product can be normalized to give the cosine of the angle between the parameter vectors:

D_C(i, j) = \frac{v_i \cdot v_j}{\| v_i \| \, \| v_j \|}

This measure has the property that it yields a large similarity score even if the vectors are small in magnitude. Using the cosine measure means that windows with low energy, such as those containing silence, will be judged spectrally similar, which is generally desirable. Because windows, and hence feature vectors, occur at a rate much faster than typical musical events, a better similarity measure can be obtained by computing the vector correlation over a window w. This also captures the time dependence of the vectors: to produce a high similarity score, the vectors in a window must not only be similar, their sequence must be similar as well.

D(i, j, w) = \frac{1}{w} \sum_{k=0}^{w-1} D(i + k, j + k)

The distance measure is a function of two frames, hence of two instants in the source signal. It is convenient to consider the similarity between all possible instants in a signal; embedding the distance measure in a two-dimensional representation does this. The matrix S contains the similarity metric calculated for all frame combinations, hence time indexes i and j, such that the (i, j)th element of S is D(i, j). In general S has maximum values on the diagonal (because every window is maximally similar to itself); furthermore, if D is symmetric then S is symmetric as well. To simplify computation, the similarity can be represented in the slanted domain L(i, l), where l is the lag l = i - j. S can be visualized as a square image in which each pixel (i, j) is given a gray-scale value proportional to the similarity measure D(i, j), scaled so that the maximum value is given the maximum brightness. Regions of high audio similarity, such as silence or long sustained notes, appear as bright squares on the diagonal. Repeated figures, such as themes, phrases, or choruses, are visible as bright off-diagonal rectangles. If the music has a high degree of repetition, this is visible as diagonal stripes or checkerboards, offset from the main diagonal by the repetition time.

Let us take an example: a 3-dimensional feature vector from the parameterization that globally repeats its pattern three times, with a period of 12 elements. [Figure: the three components of the feature vector ftv, plotted in red, blue, and purple.] A sketch of how the similarity matrices shown next can be computed follows.
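Here is a minimal MATLAB sketch (save as simMatrix.m), assuming the feature vectors v_i are the columns of a matrix ftv as in the parameterization sketch above. It builds the cosine similarity matrix S and, for a window size w, the windowed measure D(i, j, w).

    % simMatrix.m -- sketch assuming feature vectors v_i are the columns
    % of ftv: cosine similarity matrix S and the windowed version
    % D(i,j,w), averaged over w successive diagonal offsets.
    function [S, Sw] = simMatrix(ftv, w)
      n = size(ftv, 2);
      V = ftv ./ (sqrt(sum(ftv.^2, 1)) + eps); % unit-norm columns (implicit expansion)
      S = V' * V;                              % S(i,j) = cosine of angle(v_i, v_j)
      Sw = zeros(n - w + 1);                   % windowed similarity
      for k = 0:w-1                            % average S along the diagonal direction
        Sw = Sw + S(1+k : n-w+1+k, 1+k : n-w+1+k);
      end
      Sw = Sw / w;                             % D(i,j,w) for the interior frames
    end

Calling [S, Sw] = simMatrix(ftv, 4) and displaying the result with imagesc(S); axis image; colormap(gray) should reproduce the kind of gray-scale images shown below.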

Here is the matrix S calculated with the Euclidean distance measure; the X- and Y-axes are time (s). Note that the diagonal is black, because the Euclidean distance of a frame to itself is 0. Also, since the Euclidean distance takes values from 0 to infinity, it may not be a useful tool for plotting a similarity matrix. Here is the matrix S calculated with the cosine similarity measure.

As the red lines drawn over each pattern show, the feature vectors are repeated three times, every 13 elements. The repeated pattern (marked with a bracket in the feature vectors; the 3rd, 4th, and 5th elements in the blue box) shows a relatively bright color. The 3rd through 7th elements do not change much, so they act like a sustained tone or repeated pattern; thus the region marked by the purple line shows a bright color. Consider a segment from the cosine matrix S: white squares on the diagonal correspond to the notes, which have high self-similarity, while dark squares off the diagonal correspond to regions of low cross-similarity. In the cosine similarity matrix S, similar regions are close to 1 while dissimilar regions are closer to -1. In the figure above, the sky-blue square holds the cross-similarity values between the two sky-blue rectangles, and the same applies to the red square. The sky-blue square is brighter than the red one, which means that the red rectangles are less similar to each other than the sky-blue rectangles are. Here is a windowed matrix S calculated with the cosine similarity measure and window size 4.

The matrix S is now smoothed: the windowing smooths out the abrupt changes that appear in the original S. This can be useful for tracking individual notes or other musical events, because the window rate (the rate of the feature vectors) is usually higher than the rate of typical musical events. The window rate cannot simply be set to the rate of musical events, because it must be higher than that in order to track the detailed feature vectors.

2.3 Kernel Correlation

The structure of S is the key to the novelty measure. Finding the instant when the notes change can be done by correlating S with a kernel that itself looks like a checkerboard. Perhaps the simplest checkerboard kernel is the 2x2 unit kernel

C = \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix}

which can be decomposed into coherence and anti-coherence kernels as follows:

\begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} - \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}

The first term measures the self-similarity on either side of the center point; this will be high when both regions are self-similar. The second term measures the cross-similarity between the two regions; this will be high when the regions are substantially similar, with little difference across the center point. The difference of the two values estimates the novelty of the signal at the center point; it will be high when the two regions are each self-similar but different from each other.

Correlating a checkerboard kernel C with the similarity matrix S results in a measure of novelty. To see how this works, imagine sliding C along the diagonal of our example and summing the element-by-element product of C and S. When C is over a relatively uniform region, such as a sustained note, the positive and negative regions tend to sum to zero. Conversely, when C is positioned exactly at the crux of the checkerboard, the negative regions of the kernel line up with the regions of low cross-similarity, and the overall sum is large. Calculating this correlation along the diagonal of S therefore gives a time-aligned measure of audio novelty N(i), where i is the frame number, hence the time index, of the original source audio:

N(i) = \sum_{m=-W/2}^{W/2} \sum_{n=-W/2}^{W/2} C(m, n) \, S(i + m, i + n)

By convention, the kernel C has a width (lag) of W and is centered on (0, 0). For computation, S can be zero-padded to avoid undefined values or, as in the present examples, the correlation can be computed only for the interior of the signal where the kernel overlaps S completely. Thus only regions of S with a lag of W or smaller are used, so the slant representation is particularly helpful. Also, both S and C are typically symmetric, so only one half of the values under the double summation (those for m >= n) need to be computed.

The width W of the kernel directly affects the properties of the novelty measure. A small kernel detects novelty on a short time scale, such as beats or notes. Increasing the kernel size decreases the time resolution and increases the length of the novel events that can be detected: larger kernels average over short-time novelty and detect longer structure, such as musical transitions between verse and chorus, key modulations, or symphonic movements. This method has no a priori knowledge of musical phrases or pitch, yet it finds perceptually and musically significant points. For example, the next figure shows novelty scores produced from the cosine matrix S with kernel sizes 2 and 4 respectively; the X-axis is time (s) and the Y-axis is the novelty score.

Kernels can be smoothed to avoid edge effects by using windows (such as a Hamming window) that taper toward zero at the edges. For the experiments presented here, a radially symmetric Gaussian function is used, as sketched below.
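The following MATLAB sketch (save as noveltyScore.m) puts the pieces together: it builds a checkerboard kernel of width W with a radially symmetric Gaussian taper, correlates it along the diagonal of S, and picks peaks above a threshold as candidate segment boundaries. The Gaussian width W/4 and the simple peak-picking rule are my own illustrative choices, not values prescribed by Foote (1999a).

    % noveltyScore.m -- sketch of the kernel correlation: a Gaussian-
    % tapered checkerboard kernel of (even) width W is slid along the
    % diagonal of S; peaks above a threshold are candidate boundaries.
    function [N, bounds] = noveltyScore(S, W, thresh)
      [u, v] = meshgrid(-W/2+0.5 : W/2-0.5);       % kernel grid centered on (0,0)
      C = sign(u) .* sign(v);                      % +1/-1 checkerboard pattern
      C = C .* exp(-(u.^2 + v.^2) / (2*(W/4)^2));  % radial Gaussian taper (sigma = W/4)
      n = size(S, 1);
      N = zeros(1, n - W + 1);                     % novelty, interior frames only
      for i = 1:numel(N)                           % slide the kernel along the diagonal
        N(i) = sum(sum(C .* S(i:i+W-1, i:i+W-1)));
      end
      bounds = 1 + find(N(2:end-1) > N(1:end-2) & ...  % local maxima ...
                        N(2:end-1) > N(3:end) & ...
                        N(2:end-1) > thresh);          % ... above the threshold
      % boundary k maps back to source time (bounds(k) + W/2) / frameRate seconds
    end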

The figure below shows the novelty scores of the previous figure with a radial Gaussian taper, with kernel sizes 2 and 4 respectively.

3. Experiments with Real Music

The sound files from which I took the excerpts are the following. They are sampled at … Hz, 16 bits, mono, and I have used MFCCs (code from Slaney's Auditory Toolbox) for the parameterization. The sound samples can be heard on my webpage.

3.1 Bach's Well-Tempered Clavier I, Prelude No. 1 in C major

Since Foote ran one of his experiments on this piece, I tried to test my code on it as well. I tested two versions of the piece, each about 12 seconds long: 1) an acoustic realization from MIDI data, and 2) a chorus singing arrangement. The score of the piece is shown below.

The following figures are the cosine similarity matrices of each version; the X- and Y-axes are time (s). 1) Acoustic realization from MIDI data. 2) Chorus singing arrangement.

The figures below show the novelty scores of the similarity matrices above, with a radial Gaussian taper.

1) Acoustic realization from MIDI data: size(S) = 548. With kernel size 32, each peak clearly shows an individual note in the music; the X-axis is time (s) and the Y-axis is the novelty score. With kernel size 400, note that the figure shows the novelty score for only about 2 seconds. This is because the kernel size of 400 is about 80% of the size of the similarity matrix: the 400x400 checkerboard kernel can slide along the diagonal of the matrix for only about 2 seconds' worth of frames. In the figure below, the large checkerboard kernel slides along the diagonal of the similarity matrix; it is marked at three positions in the graph. Because the kernel is so large, only a short stretch of the novelty score can be produced, as seen in the previous figure; the blue line marks the duration for which the novelty score is produced.

2) Chorus singing arrangement: size(S) = 1728. With kernel size 64.

With kernel size 1000, note that the figure above shows the novelty score for only about 2 seconds of S, because the kernel size of 1000 is about 60% of the size of the similarity matrix. The red-circled peaks appear to be the perceptually salient pitch progression e-f-f-e, which repeats twice across its eight occurrences.

3.2 Bach's Air on a G String

Here the similarity matrix is tested on the same music in different versions. If the versions return similar matrices, this method may be used for clustering the same musical piece played in different instrumental arrangements; in other words, the method would work timbre-independently. I have experimented with four versions of Bach's Air on a G String, each about 30 seconds long and sampled at … Hz, 16 bits, mono. The four audio files are played by: 1) an acoustic realization from MIDI data, 2) a quartet, 3) a wind orchestra, and 4) a jazz arrangement. The score of the piece is shown below.

The following figures are the cosine similarity matrices of the four versions. Their tempi differ slightly, so their durations differ slightly as well; the X- and Y-axes are time (s). 1) Acoustic realization from MIDI data (framerate = 50). 2) Quartet (framerate = 50).

3) Wind orchestra (framerate = 25). 4) Jazz arrangement (framerate = 50).

The following figures show the novelty score graphs for different kernel widths.

1) Acoustic realization from MIDI data: kernel size 64 with Gaussian taper. In this figure each peak represents an individual bass eighth note; approximately 32 eighth notes appear, which I have marked with red circles. The X-axis is time (s) and the Y-axis is the novelty score.

2) Quartet: kernel size 64 with Gaussian taper. Again each peak represents an individual bass eighth note; approximately 36 eighth notes appear.

3) Wind orchestra. This figure shows a very low novelty score compared to the others. I think this is because the novelty of this signal is genuinely low and comparatively stable, which can also be seen in the smooth, even brightness of its similarity matrix; listening to this version, it is indeed played very smoothly. However, I should study this case further in order to clarify what is happening here. Note that the Y-axis shows values on the order of 10^-3.

4) Jazz arrangement. Here too each peak represents an individual bass eighth note; approximately 36 eighth notes appear.

3.3 Discussion

Just by looking at the graphs and comparing them, we can see some similarities. However, there should be a quantitative measure of the similarity between two such graphs, and I have not yet developed a method for measuring the (dis)similarities among them. One approach would be to build a representative case and compare other versions of the same musical piece against it. My hypothesis is that the musical information embedded in symbolic representations of music, such as MIDI or Humdrum, could be used, via some general rule, to plot a representative case for a piece of music. The beat spectrum computed on the novelty score will also be explored in the next stage of my research. As mentioned earlier, each variable in the system, such as the frame rate and kernel size, should be examined carefully; thorough empirical experiments on different kinds of musical signals should yield optimal settings for each case. Finally, the problem of expensive computing time could be addressed by using C code together with MATLAB.

Bibliography

Allen, P. E. and R. B. Dannenberg (1990). Tracking musical beats in real time. In Proceedings of the 1990 ICMC.
Assayag, G., S. Dubnov, and O. Delerue (1999). Guessing the composer's mind: applying universal prediction to musical style. In Proceedings of the 1999 ICMC, Beijing, China.
Brown, J. C. and M. S. Puckette (1992). An efficient algorithm for the calculation of a constant Q transform. Journal of the Acoustical Society of America 92(5).
Dixon, S. (2000). A lightweight multi-agent musical beat tracking system. In Proceedings of the AAAI Workshop on Artificial Intelligence and Music: Towards Formal Models for Composition, Performance and Analysis.
Duda, R. O., P. E. Hart, and D. G. Stork (2000). Pattern Classification (second ed.). New York: Wiley-Interscience.
Foote, J. (1997a). Content-based retrieval of music and audio. In C.-C. J. Kuo et al. (eds.), Multimedia Storage and Archiving Systems II, Proc. of SPIE, Vol. 3229.
Foote, J. (1997b). A similarity measure for automatic audio classification. In Proc. AAAI 1997 Spring Symposium on Intelligent Integration and Use of Text, Image, Video, and Audio Corpora, Stanford, March 1997.
Foote, J. (1999a). Methods for the automatic analysis of music and audio. FXPAL Technical Report, December 1999.
Foote, J. (1999b). Visualizing music and audio using self-similarity. In Proceedings of ACM Multimedia '99, Orlando, FL. ACM Press.
Foote, J. and S. Uchihashi (2001). The beat spectrum: a new approach to rhythm analysis. Submitted to ICME, 2001.
Fujishima, T. (1999). Realtime chord recognition of musical sound: a system using Common Lisp Music. In Proceedings of the 1999 ICMC.
Goto, M. (2000). A robust predominant-F0 estimation method for real-time detection of melody and bass lines in CD recordings. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing.
Goto, M. and Y. Muraoka (1994). A beat tracking system for acoustic signals of music. In Proceedings of 1994 ACM Multimedia.
Goto, M. and Y. Muraoka (1998). An audio-based real-time beat tracking system and its applications. In Proceedings of the 1998 ICMC.
Hippel, P. V. (2000). Questioning a melodic archetype: do listeners use gap-fill to classify melodies? Music Perception 18(2).
Huron, D. (2000). Perceptual and cognitive applications in music information retrieval. In Proceedings of the International Symposium on Music Information Retrieval (MUSIC IR).
Licklider, J. R. (1951). A duplex theory of pitch perception. Experientia 7.
Logan, B. and S. Chu (2000). Music summarization using key phrases. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing.
Purwins, H., B. Blankertz, and K. Obermayer (2000). A new method for tracking modulations in tonal music in audio data format. In Proceedings of the IEEE-INNS-ENNS International Joint Conference on Neural Networks.
Scheirer, E. and M. Slaney (1997). Construction and evaluation of a robust multifeature speech/music discriminator. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing.
Scheirer, E. D. (1998). Tempo and beat analysis of acoustic musical signals. Journal of the Acoustical Society of America 103(1).
Scheirer, E. D. (1999). Towards music understanding without separation: segmenting music with correlogram comodulation. In Proceedings of the 1999 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics.
Tzanetakis, G. and P. Cook (1999). Multifeature audio segmentation for browsing and annotation. In Proceedings of the 1999 IEEE WASPAA.
Wold, E., T. Blum, D. Keislar, and J. Wheaton (1996). Content-based classification, search and retrieval of audio data. IEEE Multimedia Magazine 3(3).
Zhang, T. and C.-C. J. Kuo (1999a). Heuristic approach for generic audio data segmentation and annotation. In Proceedings of the 7th ACM Multimedia.
Zhang, T. and C.-C. J. Kuo (1999b). Hierarchical system for content-based audio classification and retrieval. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing.


Polyphonic Audio Matching for Score Following and Intelligent Audio Editors Polyphonic Audio Matching for Score Following and Intelligent Audio Editors Roger B. Dannenberg and Ning Hu School of Computer Science, Carnegie Mellon University email: dannenberg@cs.cmu.edu, ninghu@cs.cmu.edu,

More information

Analysis of local and global timing and pitch change in ordinary

Analysis of local and global timing and pitch change in ordinary Alma Mater Studiorum University of Bologna, August -6 6 Analysis of local and global timing and pitch change in ordinary melodies Roger Watt Dept. of Psychology, University of Stirling, Scotland r.j.watt@stirling.ac.uk

More information

AUTOMATIC ACCOMPANIMENT OF VOCAL MELODIES IN THE CONTEXT OF POPULAR MUSIC

AUTOMATIC ACCOMPANIMENT OF VOCAL MELODIES IN THE CONTEXT OF POPULAR MUSIC AUTOMATIC ACCOMPANIMENT OF VOCAL MELODIES IN THE CONTEXT OF POPULAR MUSIC A Thesis Presented to The Academic Faculty by Xiang Cao In Partial Fulfillment of the Requirements for the Degree Master of Science

More information

Acoustic Scene Classification

Acoustic Scene Classification Acoustic Scene Classification Marc-Christoph Gerasch Seminar Topics in Computer Music - Acoustic Scene Classification 6/24/2015 1 Outline Acoustic Scene Classification - definition History and state of

More information

MODELING RHYTHM SIMILARITY FOR ELECTRONIC DANCE MUSIC

MODELING RHYTHM SIMILARITY FOR ELECTRONIC DANCE MUSIC MODELING RHYTHM SIMILARITY FOR ELECTRONIC DANCE MUSIC Maria Panteli University of Amsterdam, Amsterdam, Netherlands m.x.panteli@gmail.com Niels Bogaards Elephantcandy, Amsterdam, Netherlands niels@elephantcandy.com

More information

Outline. Why do we classify? Audio Classification

Outline. Why do we classify? Audio Classification Outline Introduction Music Information Retrieval Classification Process Steps Pitch Histograms Multiple Pitch Detection Algorithm Musical Genre Classification Implementation Future Work Why do we classify

More information

A PERPLEXITY BASED COVER SONG MATCHING SYSTEM FOR SHORT LENGTH QUERIES

A PERPLEXITY BASED COVER SONG MATCHING SYSTEM FOR SHORT LENGTH QUERIES 12th International Society for Music Information Retrieval Conference (ISMIR 2011) A PERPLEXITY BASED COVER SONG MATCHING SYSTEM FOR SHORT LENGTH QUERIES Erdem Unal 1 Elaine Chew 2 Panayiotis Georgiou

More information

Music Segmentation Using Markov Chain Methods

Music Segmentation Using Markov Chain Methods Music Segmentation Using Markov Chain Methods Paul Finkelstein March 8, 2011 Abstract This paper will present just how far the use of Markov Chains has spread in the 21 st century. We will explain some

More information

Melody Retrieval On The Web

Melody Retrieval On The Web Melody Retrieval On The Web Thesis proposal for the degree of Master of Science at the Massachusetts Institute of Technology M.I.T Media Laboratory Fall 2000 Thesis supervisor: Barry Vercoe Professor,

More information

EE373B Project Report Can we predict general public s response by studying published sales data? A Statistical and adaptive approach

EE373B Project Report Can we predict general public s response by studying published sales data? A Statistical and adaptive approach EE373B Project Report Can we predict general public s response by studying published sales data? A Statistical and adaptive approach Song Hui Chon Stanford University Everyone has different musical taste,

More information

Musical Instrument Identification Using Principal Component Analysis and Multi-Layered Perceptrons

Musical Instrument Identification Using Principal Component Analysis and Multi-Layered Perceptrons Musical Instrument Identification Using Principal Component Analysis and Multi-Layered Perceptrons Róisín Loughran roisin.loughran@ul.ie Jacqueline Walker jacqueline.walker@ul.ie Michael O Neill University

More information

A Study of Synchronization of Audio Data with Symbolic Data. Music254 Project Report Spring 2007 SongHui Chon

A Study of Synchronization of Audio Data with Symbolic Data. Music254 Project Report Spring 2007 SongHui Chon A Study of Synchronization of Audio Data with Symbolic Data Music254 Project Report Spring 2007 SongHui Chon Abstract This paper provides an overview of the problem of audio and symbolic synchronization.

More information

DAT335 Music Perception and Cognition Cogswell Polytechnical College Spring Week 6 Class Notes

DAT335 Music Perception and Cognition Cogswell Polytechnical College Spring Week 6 Class Notes DAT335 Music Perception and Cognition Cogswell Polytechnical College Spring 2009 Week 6 Class Notes Pitch Perception Introduction Pitch may be described as that attribute of auditory sensation in terms

More information

Music Understanding At The Beat Level Real-time Beat Tracking For Audio Signals

Music Understanding At The Beat Level Real-time Beat Tracking For Audio Signals IJCAI-95 Workshop on Computational Auditory Scene Analysis Music Understanding At The Beat Level Real- Beat Tracking For Audio Signals Masataka Goto and Yoichi Muraoka School of Science and Engineering,

More information

Toward Automatic Music Audio Summary Generation from Signal Analysis

Toward Automatic Music Audio Summary Generation from Signal Analysis Toward Automatic Music Audio Summary Generation from Signal Analysis Geoffroy Peeters IRCAM Analysis/Synthesis Team 1, pl. Igor Stravinsky F-7 Paris - France peeters@ircam.fr ABSTRACT This paper deals

More information

Audio Structure Analysis

Audio Structure Analysis Tutorial T3 A Basic Introduction to Audio-Related Music Information Retrieval Audio Structure Analysis Meinard Müller, Christof Weiß International Audio Laboratories Erlangen meinard.mueller@audiolabs-erlangen.de,

More information

Panel: New directions in Music Information Retrieval

Panel: New directions in Music Information Retrieval Panel: New directions in Music Information Retrieval Roger Dannenberg, Jonathan Foote, George Tzanetakis*, Christopher Weare (panelists) *Computer Science Department, Princeton University email: gtzan@cs.princeton.edu

More information

Features for Audio and Music Classification

Features for Audio and Music Classification Features for Audio and Music Classification Martin F. McKinney and Jeroen Breebaart Auditory and Multisensory Perception, Digital Signal Processing Group Philips Research Laboratories Eindhoven, The Netherlands

More information

Music Emotion Recognition. Jaesung Lee. Chung-Ang University

Music Emotion Recognition. Jaesung Lee. Chung-Ang University Music Emotion Recognition Jaesung Lee Chung-Ang University Introduction Searching Music in Music Information Retrieval Some information about target music is available Query by Text: Title, Artist, or

More information

The Intervalgram: An Audio Feature for Large-scale Melody Recognition

The Intervalgram: An Audio Feature for Large-scale Melody Recognition The Intervalgram: An Audio Feature for Large-scale Melody Recognition Thomas C. Walters, David A. Ross, and Richard F. Lyon Google, 1600 Amphitheatre Parkway, Mountain View, CA, 94043, USA tomwalters@google.com

More information

An Empirical Comparison of Tempo Trackers

An Empirical Comparison of Tempo Trackers An Empirical Comparison of Tempo Trackers Simon Dixon Austrian Research Institute for Artificial Intelligence Schottengasse 3, A-1010 Vienna, Austria simon@oefai.at An Empirical Comparison of Tempo Trackers

More information

AN ARTISTIC TECHNIQUE FOR AUDIO-TO-VIDEO TRANSLATION ON A MUSIC PERCEPTION STUDY

AN ARTISTIC TECHNIQUE FOR AUDIO-TO-VIDEO TRANSLATION ON A MUSIC PERCEPTION STUDY AN ARTISTIC TECHNIQUE FOR AUDIO-TO-VIDEO TRANSLATION ON A MUSIC PERCEPTION STUDY Eugene Mikyung Kim Department of Music Technology, Korea National University of Arts eugene@u.northwestern.edu ABSTRACT

More information