Research Article Multiple Scale Music Segmentation Using Rhythm, Timbre, and Harmony


Hindawi Publishing Corporation
EURASIP Journal on Advances in Signal Processing
Volume 2007, Article ID 73205, 11 pages
doi:10.1155/2007/73205

Kristoffer Jensen
Department of Medialogy, Aalborg University Esbjerg, Niels Bohrs Vej 6, Esbjerg 6700, Denmark

Received 30 November 2005; Revised 7 August 2006; Accepted 7 August 2006

Recommended by Ichiro Fujinaga

The segmentation of music into intro-chorus-verse-outro, and similar segments, is a difficult topic. A method for performing automatic segmentation based on features related to rhythm, timbre, and harmony is presented, and compared, between the features and between the features and manual segmentation of a database of 48 songs. Standard information retrieval performance measures are used in the comparison, and it is shown that the timbre-related feature performs best.

Copyright 2007 Hindawi Publishing Corporation. All rights reserved.

1. INTRODUCTION

Segmentation has a perceptual and subjective nature. Manual segmentation can be due to different attributes of music, such as rhythm, timbre, or harmony. Measuring similarity between music segments is a fundamental problem in computational music theory. In this work, automatic music segmentation is performed based on three different features that are calculated so as to be related to the perception of rhythm, timbre, and harmony.

Segmentation of music has many applications, such as music information retrieval, copyright infringement resolution, fast music navigation, and repetitive structure finding. In particular, navigation has been a key motivation in this work, for possible inclusion in the mixxx [1] DJ simulation software. Another possibility is the use of the automatic segmentation for music recomposition [2]. In addition to this, the visualization of the rhythm, timbre, and harmony related features is believed to be a useful tool for computer-aided music analysis.

Music segmentation is a popular research topic today. Several authors have presented segmentation and visualization of music using a self-similarity matrix [3-5] with good results. Foote [5] used a measure of novelty calculated from the self-similarity matrix. Cooper and Foote [6] use singular value decomposition on the self-similarity matrix for automatic audio summary generation. Jensen [7] optimized the processing cost by using a smoothed novelty measure, calculated on a small square on the diagonal of the self-similarity matrix. In [8], short and long features are used for summary generation using image structuring filters and unsupervised learning. Dannenberg and Hu [9] use ad hoc dynamic programming algorithms on different audio features for identifying patterns in music. Goto [10] detects the chorus section using identification of repeated sections on the chroma feature. Other segmentation approaches include information-theoretic methods [11]. Jehan [12] recently proposed a recursive multiclass approach to the analysis of acoustic similarities in popular music using dynamic programming.

A previous work used a model of rhythm, the rhythmogram, to segment popular Chinese music [13]. The rhythmogram is calculated by taking overlapping autocorrelations of large blocks of a feature (the perceptual spectral flux, PSF) that gives a good estimate of the note onsets.
In this work, two other features are used: one that provides an estimate of the timbral content of the music (the timbregram), and one that gives an estimate of the harmonic content (the chromagram). Both features are calculated on a novel spectral feature, the Gaussian weighted average spectrogram (GWS). This feature multiplies all the STFT frequency bins with a Gaussian of varying position and given standard deviation and sums them over time. Thus, an average measure of the STFT can be obtained, with the main weight on an arbitrary time position and a given influence from the surrounding time positions. This model has several advantages, as will be detailed below. A novel method to compute segmentation splits using a shortest path algorithm is presented, using a model of the cost of a segmentation as the sum of the individual costs of segments. It is shown that with this assumption, the problem can be solved efficiently to optimality.

The method is applied to three different databases of rhythmic music. The segmentation based on the rhythm, timbre, and chroma features is compared to the manual segmentation using standard IR measures.

This paper is organized as follows. First, the feature extraction is presented, then the self-similarity is detailed, and the shortest path algorithm is outlined. The segmentation is compared to the optimum results of manually segmented music in the experiment section, and finally a conclusion is given.

2. FEATURE EXTRACTION

In audio signal segmentation, the feature used for segmentation can have an important influence on the segmentation result. The rhythmic feature used here (the rhythmogram) [7] is based on the autocorrelation of the PSF [7]. The PSF has high energy at the time positions where perceptually important sound components, such as notes, have been introduced. The timbre feature (the timbregram) is based on the Gaussian weighted averaged perceptual linear prediction (PLP), a speech front-end [14], and the harmony feature (the chromagram) is based on the chroma [3], calculated on the Gaussian weighted short-time Fourier transform (STFT). The Gaussian weighted spectrogram (GWS) introduced here is shown to have several advantages, including resilience to noise and independence of block size.

The STFT performs a fast Fourier transform (FFT) on short overlapping blocks. Each FFT thus gives information about the frequency content of a given time segment. The STFT is often visualized in the spectrogram. A speech front-end, such as the PLP, alters the STFT data by scaling the intensity and frequency so that they correspond to the way the human auditory system perceives sounds. The chroma maps the energy of the FFT into twelve bands, corresponding to the twelve notes of one octave. By using the rhythmic, timbral, and harmonic contents to identify the structure of the music, a rather complete understanding is assumed to be found.

2.1. Rhythmogram

Any model of rhythm should have as basis some kind of feature that reacts to the note onsets. The note onsets mark the main characteristics of the rhythm. In a previous work [7], a large number of features were compared to an annotated database of twelve songs, and the perceptual spectral flux (PSF) was found to perform best. The PSF is calculated as

    psf(n) = \sum_{k=1}^{N_b/2} W(f_k) \left\{ \left(a_k^{n}\right)^{1/3} - \left(a_k^{n-1}\right)^{1/3} \right\},    (1)

where n is the feature block index, N_b is the block size, and a_k and f_k are the magnitude and frequency of bin k of the short-time Fourier transform (STFT), obtained using a Hanning window. The step size is 10 milliseconds, and the block size is 46 milliseconds. W is the frequency weighting used to obtain a value closer to the human loudness contour. This frequency weighting is obtained in this work by a simple equal loudness contour model [15]. The power function is used to simulate the intensity-loudness power law and to reduce random amplitude variations. These two steps are inspired by the PLP front-end [14] used in speech recognition. The PSF was compared to other note onset detection features, with good results in the percussive case, in a recent study [16]. The PSF feature detects most of the manual note onsets correctly, but it still has many peaks that do not correspond to note onsets, and many note onsets do not have a peak in the PSF.
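To make (1) concrete, the following is a minimal numpy sketch of how a PSF-like onset feature could be computed. The block and step sizes follow the text (46 ms and 10 ms); the equal-loudness weighting W(f_k) is reduced to a crude placeholder here, and all function and variable names are illustrative rather than taken from any reference implementation.

```python
import numpy as np

def perceptual_spectral_flux(x, sr, block_ms=46, step_ms=10):
    """Sketch of the PSF of (1): loudness-weighted, cube-root compressed
    spectral flux between consecutive STFT frames."""
    n_block = int(sr * block_ms / 1000)
    n_step = int(sr * step_ms / 1000)
    win = np.hanning(n_block)

    # Crude placeholder for the equal-loudness weighting W(f_k):
    # emphasize the mid range, attenuate very low and very high frequencies.
    freqs = np.fft.rfftfreq(n_block, 1.0 / sr)
    w = 1.0 / (1.0 + ((freqs - 3000.0) / 4000.0) ** 2)

    # Cube-root compressed magnitude spectra (intensity-loudness power law).
    frames = []
    for start in range(0, len(x) - n_block, n_step):
        spec = np.abs(np.fft.rfft(win * x[start:start + n_block]))
        frames.append(spec ** (1.0 / 3.0))
    frames = np.asarray(frames)

    # psf(n) = sum_k W(f_k) * { a_k(n)^(1/3) - a_k(n-1)^(1/3) }, as in (1).
    flux = np.diff(frames, axis=0)
    return np.sum(w * flux, axis=1)
```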
In order to obtain a more robust rhythm feature, the autocorrelation of the PSF is calculated on overlapping blocks of 8 seconds, with a half-second step size (2 Hz feature sample rate),

    rg_n(i) = \sum_{j=n/f_{sr}+1}^{n/f_{sr}+8 f_{sr}} psf(j)\, psf(j+i),    (2)

where f_sr is the feature sample rate and n is the block index. Only the information between zero and two seconds is retained. The autocorrelation is normalized so that the autocorrelation at zero lag equals one. If visualized with lag time on the y-axis, time position on the x-axis, and the autocorrelation values as colors, it gives a fast overview of the rhythmic evolution of a song. This representation, called the rhythmogram [7], provides information about the rhythm and the evolution of the rhythm in time. The autocorrelation has been chosen, instead of the fast Fourier transform (FFT), for two reasons. First, it is believed to be more in accordance with the human perception of rhythm [17], and second, it is believed to be more easily understood visually.

The rhythmograms of two songs, Whenever, Wherever by Shakira and All of Me by Billie Holiday, are shown in Figure 1. The recent Shakira pop song has a steady rhythm, with only minor changes in instrumentation that change the weight of some of the rhythm intervals without affecting the fundamental beat, while All of Me does not seem to have any stationary rhythm.

2.2. Gaussian windowed spectrogram

While the rhythmogram gives a good estimate of the changes in the music, as it is believed to encompass changes in instrumentation and rhythm, it does not take into account singing and solo instruments, which are liable to have influence outside the segment, and it has been found that the manual segmentation sometimes prioritizes the singing or solo instrument over the rhythmic boundary. Therefore, other features have been included that are calculated from the spectral content of the music. If these features are calculated on short segments (10 to 50 milliseconds), they give detailed information in time, too varying to be used in the segmentation method used here. Instead, the features are calculated on a large segment, but localized in time by using the average of many STFT blocks multiplied with a Gaussian,

    gws_k(t) = \sum_{i} stft_k(i)\, g(\mu, \sigma).    (3)

Figure 1: Rhythmogram of Whenever, Wherever and All of Me.

Here stft_k(i) is the kth bin (corresponding to the frequency f_k = k \cdot sr / N_b) of the ith block of the short-time Fourier transform, and g(μ, σ) is the Gaussian, defined as

    g(\mu, \sigma) = \frac{1}{\sigma \sqrt{2\pi}}\, e^{-(t-\mu)^2 / (2\sigma^2)}.    (4)

Thus, by varying μ, information about different time localizations can be obtained, and by increasing σ, more influence from the surrounding time steps can be included.

2.2.1. Comparison to large window FFT

The advantages of such a hybrid model are numerous.

Noise. Assuming the signal consists of a sum of sinusoids plus rather stationary noise, this noise is smoothed in the GWS. Thus the voiced part will stand out more strongly and be more pertinent to observation or subsequent processing.

Transients. A transient will be averaged out over the full length of the block in the case of the FFT, while it will have a strong presence in the GWS when it lies in the middle of the Gaussian.

Peak width. The GWS has a peak width that is independent of the actual duration that the GWS encompasses, while the FFT has a decreasing peak width with increasing block size. In the case of music with slightly varying pitch, such as live music, or when vibrato is used, a small peak width is advantageous.

Peak separation. In the case of two partials in close proximity, the partials will retain their separation with the GWS, while the separation will increase with the FFT. While this is not an issue in itself, the rise of space between strong partials that contains noise is.

Processor cost. The FFT has a processor cost of O(N log N), while the GWS has a processor cost of O(M(N + N log N)), where M is the number of STFT blocks and N is the STFT block size. If the FFT is instead computed with the same total block size as the GWS, that is, with cost O(MN log(MN)), the GWS is approximately log(MN)/log(N) times faster.

Comparison to common speech features. While a speech feature, such as the PLP, has a better time resolution, it has no frequency resolution with regard to individual partials. The GWS, in comparison, still takes into account new notes in an otherwise dense spectrum.

In conclusion, the GWS permits analyzing the music with a varying time resolution, giving noise elimination, while maintaining the frequency resolution at all time resolutions and at a lower cost than the large window FFT.
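As an illustration of (3)-(4), a small sketch of the GWS over an already computed frame matrix (STFT magnitudes or PLP frames) is given below; the function and parameter names are illustrative and not taken from any published toolbox.

```python
import numpy as np

def gws(frames, frame_rate, centers_s, sigma):
    """Gaussian weighted average of feature frames, as in (3)-(4).

    frames     : array (n_frames, n_bins), e.g. STFT magnitudes or PLP frames
    frame_rate : frame rate of the input feature, in frames per second
    centers_s  : time positions mu (in seconds) at which to evaluate the GWS
    sigma      : standard deviation of the Gaussian, in frames
    """
    idx = np.arange(frames.shape[0])
    out = []
    for mu_s in centers_s:
        mu = mu_s * frame_rate                       # centre in frame units
        g = np.exp(-((idx - mu) ** 2) / (2.0 * sigma ** 2))
        g /= sigma * np.sqrt(2.0 * np.pi)            # Gaussian of (4)
        out.append(g @ frames)                       # weighted sum over frames
    return np.asarray(out)

# Example usage (illustrative parameter values): one GWS frame every half second.
# timbregram = gws(plp_frames, frame_rate=100, centers_s=np.arange(0, 180, 0.5), sigma=100)
```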

Figure 2: Timbregram: PLP calculated using the GWS of Whenever, Wherever and All of Me.

2.3. Timbre

The timbre is understood here as the spectral estimate, and it is obtained using the Gaussian average on the perceptual linear prediction (PLP) [14]. This involves using the Bark [18] scale, together with an amplitude scaling that gives an approximation of the human auditory system. The PLP is calculated with a block size of approximately 10 milliseconds and with an overlap of 1/2. The GWS is calculated from the PLP in steps of 1/2 second and with σ = 100. This gives a 3 dB width of a little less than one second. A smaller σ would give too scattered information, while a too large value would smooth the PLP too much. An example of the PLP for the same two songs as above is shown in Figure 2.

The timbregram is just as informative as the rhythmogram, although it does not give similar information. While the rhythm evolution is illustrated in the rhythmogram, it is the evolution of the timbre that is shown with the timbregram. This includes the insertion of new instruments, such as the trumpet solo in All of Me at approximately 30 seconds. The voice is most prominent in the timbregram. The repeating chorus sections are very visible in Whenever, Wherever, mainly because of the repeating singing style in each chorus, while the choruses are less visible in All of Me, since it is sung differently each time.

2.4. Harmony

The harmony is calculated on an average spectrum, using the Gaussian average, as is the spectral estimate. In this case, the chroma [3] is used as the measure of harmony. Thus, only the relative content of energy in the twelve notes of the octave is found; no information about the octave of the notes is included in the chromagram. It is calculated from the STFT, using a block size of 46 milliseconds and a step size of 10 milliseconds. The chroma is obtained by summing the energy of all spectral peaks whose log2 frequencies fold onto the same pitch class, that is, peaks separated by whole octaves. By averaging, using the Gaussian average, no specific time localization information is obtained for the individual notes or chords. Instead, an estimate of the notes played in a short interval is given, as an estimate of the scale used in the interval. A step size of 1/2 second is used, together with a σ value of 200, corresponding to a 3 dB window of approximately 3 seconds. The chromagram of the same two songs as above is shown in Figure 3.

It is obvious that the chromagram shows yet another aspect of the music. While the rhythmogram pinpoints rhythmic similarities, and the timbregram indicates the spectral part of the timbre, the chromagram gives rather precise information about the chroma of the notes played in the vicinity of the time location. Often, these three aspects of the music change simultaneously at a segment boundary. Sometimes, however, none of the features can help in, for instance, identifying similar segments. This is the case for the title chorus of All of Me, where Billie Holiday and the rhythm section change the key, the rhythm, and the timbre between the first and the second occurrence. Even so, most often, the segment splits are well indicated by any of the features. This is shown in the next section, where first the self-similarity of the features is calculated, then the segment splits are computed using a shortest path algorithm with a variable segment split cost, and finally these segment splits are matched to manual segment splits of different rhythmic music.
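The chroma folding of Section 2.4 can be sketched as follows. Instead of explicit peak picking, this simplified version assigns every FFT bin to the pitch class of its centre frequency, which is enough to show how octave information is discarded; the reference frequency and the frequency limits are illustrative choices, not values from the paper.

```python
import numpy as np

def chroma_frame(spec, sr, n_fft, fmin=55.0, fmax=5000.0):
    """Fold the energy of one magnitude spectrum (length n_fft//2 + 1)
    into 12 pitch classes, discarding octave information."""
    freqs = np.fft.rfftfreq(n_fft, 1.0 / sr)
    valid = (freqs >= fmin) & (freqs <= fmax)
    # Pitch class = round(12 * log2(f / reference)) mod 12, reference = C4.
    pc = np.mod(np.round(12.0 * np.log2(freqs[valid] / 261.63)), 12).astype(int)
    chroma = np.zeros(12)
    np.add.at(chroma, pc, spec[valid] ** 2)
    return chroma / (chroma.sum() + 1e-12)   # keep only the relative energy
```

Stacking chroma_frame over all STFT blocks and smoothing the result with the GWS, as described above, then yields a chromagram.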
2.5. Visualization

The rhythmogram, the timbregram, and the chromagram all give pertinent information about the evolution in time of the rhythm, timbre, and chroma, as can be seen in Figures 1, 2, and 3.

Figure 3: Chromagram: chroma calculated using the GWS of Whenever, Wherever and All of Me.

This is believed to be a great help in tasks involving manipulation and analysis of music, for instance for music theorists, DJs, digital turntablists, and others involved in the understanding and distribution of music.

3. SELF-SIMILARITY

In order to get a better representation of the similarity of the song, a measure of self-similarity is used. This was first used in [19] to give evidence of recurrence in dynamic systems. Self-similarity calculation is a means of giving evidence of the similarity and dissimilarity of the features. Several studies have used a measure of self-similarity [8] in automatic music analysis. Foote [4] used the dot product on MFCCs sampled at a 100 Hz rate to visualize the self-similarity of different music excerpts. Bartsch and Wakefield [3] used the chroma-based representation to calculate the cross-correlation and identify repeated segments, corresponding to the chorus, for audio thumbnailing. Later, Foote [5] introduced a checkerboard kernel correlation as a novelty measure that identifies notes with small time lags, and structure with larger lags, with good success. Jensen [7] used a smoothed novelty measure to identify structure without the costly calculation of the full checkerboard kernel correlation. In this work, the L2 norm is used to calculate the distance between two blocks.

The self-similarities of Whenever, Wherever and All of Me calculated for the rhythmogram, the timbregram, and the chromagram are shown in Figure 4. It is clear that Whenever, Wherever contains more similar music (indicated with a dark color) than All of Me. It has a distinctly different intro and outro, and three repetitions of the chorus, the third one repeated. While this is visible, in part, in the rhythmogram, and quite so in the timbregram, it is most prominent in the chromagram, where the three repetitions of the chorus stand out. As for the intro and the outro, they are quite similar with regard to rhythm, as can be seen in the rhythmogram, rather dissimilar with regard to timbre, and more dissimilar with respect to the chromagram. This is explained by the fact that the intro is played on a guitar and the outro on a pan flute; although they have similar note durations, the timbres of a pan flute and a guitar are quite dissimilar, and they do not play the same notes. The situation for All of Me is that the rhythm is changing all the time, in short segments with a duration of approximately 10 seconds. The saxophone solo at approximately 30 seconds is rather homogeneous and similar to the piano intro and some parts of the vocal verse. A large part of the song is more similar with respect to timbre than to rhythm or harmony, although most of the song is only similar to itself in short segments of approximately 10 seconds for the timbre, as it is for the chromagram.

4. SHORTEST PATH

Although the segments are visible in the self-similarity plots, there is still a need for a method for identifying the segment splits. Such a method was presented in [13]. In order to segment the music, a model for the cost of one segment and of the segment split is necessary. When this is obtained, the problem is solved using the shortest path algorithm for directed acyclic graphs. This method provides the optimum solution.
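Before the cost model is formalized in the following subsections, the whole chain can be sketched compactly: pairwise distances between feature frames give the self-similarity matrix A of Section 3, the segment cost of (5) below is accumulated from A, and the minimum-cost segmentation of (6) is found by dynamic programming, which is a shortest-path computation on the directed acyclic graph described in Section 4.3. The Euclidean distance and all names are illustrative; for clarity the sketch recomputes the segment cost directly instead of using the cumulative sums that would give the quadratic running time discussed below.

```python
import numpy as np

def segment(features, alpha):
    """Optimal segmentation of a feature sequence (Sections 3-4).

    features : array (N, d) of rhythmogram, timbregram, or chromagram frames
    alpha    : fixed cost of starting a new segment, as in (6)
    Returns the segment boundaries as a sorted list of start indices.
    """
    N = features.shape[0]
    # Self-similarity of Section 3: pairwise distances (small = similar).
    diff = features[:, None, :] - features[None, :, :]
    A = np.sqrt((diff ** 2).sum(axis=2))

    def cost(i, j):
        # Segment cost of (5): summed average within-segment distance.
        return A[i:j + 1, i:j + 1].sum() / (j - i + 1)

    # Dynamic programming = shortest path through the DAG of Section 4.3.
    best = np.full(N + 1, np.inf)   # best[k]: minimal cost of segmenting frames 0..k-1
    best[0] = 0.0
    prev = np.zeros(N + 1, dtype=int)
    for j in range(1, N + 1):           # candidate segment ends at frame j-1
        for i in range(j):              # candidate segment starts at frame i
            c = best[i] + alpha + cost(i, j - 1)
            if c < best[j]:
                best[j], prev[j] = c, i
    # Backtrack the segment start indices.
    bounds, k = [], N
    while k > 0:
        bounds.append(prev[k])
        k = prev[k]
    return sorted(bounds)
```

Sweeping alpha then reproduces the behaviour discussed in Section 4.4: a larger split cost yields fewer, longer segments.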
4.1. Cost of one segment

For all features, a sequence 1, 2, ..., N of N blocks of music is to be divided into a number of segments. Let c(i, j) denote the cost of a segment from block i to block j, where 1 ≤ i ≤ j ≤ N. This cost of a segment is chosen to be a measure of the self-similarity of the segment, such that segments with a high degree of self-similarity have a low cost,

Figure 4: L2 self-similarity for the rhythmogram (left), timbregram (middle), and chromagram (right) of Whenever, Wherever (top) and All of Me (bottom).

    c(i, j) = \frac{1}{j - i + 1} \sum_{k=i}^{j} \sum_{l=i}^{j} A_{lk}.    (5)

This cost function computes the sum of the average self-similarity of each block in the segment to all other blocks in the segment. While a normalization by the square of the segment length j - i + 1 would give the true average, this would severely impede the influence of new segments with larger self-similarity inside a large segment, since the large values would be normalized by a relatively large segment length.

4.2. Cost of segment split

Let i_1 j_1, i_2 j_2, ..., i_K j_K be a segmentation into K segments, where i_1 = 1, i_2 = j_1 + 1, i_3 = j_2 + 1, ..., j_K = N. The total cost of this segmentation is the sum of the segment costs plus an additional fixed cost α for each new segment,

    E = \sum_{n=1}^{K} \left\{ \alpha + c(i_n, j_n) \right\}.    (6)

By increasing α, the number of resulting segments is decreased. The appropriate value of α is found by optimizing the matching of automatic and manual segment splits.

4.3. Shortest path

In order to compute a best possible segmentation, an edge-weighted directed graph G = (V, E) is constructed. The set of nodes is V = {1, 2, ..., N + 1}. For each possible segment (i, j), where 1 ≤ i ≤ j ≤ N, an edge (i, j + 1) exists in E. The weight of the edge (i, j + 1) is α + c(i, j). A path in G from node 1 to node N + 1 corresponds to a complete segmentation, where each edge identifies an individual segment. The weight of the path is equal to the total cost of the corresponding segmentation. Therefore, a shortest path (or path with minimum total weight) from node 1 to node N + 1 gives a segmentation with minimum total cost. Such a shortest path can be computed in time O(|V| + |E|) = O(N^2), since G is acyclic and has |E| = O(N^2) edges [20]. An illustration of the directed acyclic graph for a short sequence is shown in Figure 5.

4.4. Function of split cost

The segment split cost (α) of the segmentation algorithm is analyzed here. What is mainly interesting is to investigate whether the total cost of a segmentation (6) has a local minimum. Unfortunately, this is not the case. The total cost is very small for small α, and it increases with α. This is clear, as a new segmentation (with one less segment) is chosen (for an increased α) once the cost of the new segmentation is equal to that of the original segmentation.

Figure 5: Example of a directed acyclic graph with three segments. Nodes 1 to 4 are connected by edges (i, j + 1) with weights α + c(i, j): α + c(1, 1), α + c(1, 2), α + c(1, 3), α + c(2, 2), α + c(2, 3), and α + c(3, 3).

The new segmentation cost is then increased with α, until yet another segmentation is chosen at equal cost. Another interesting parameter is the total number of segments. It is plausible that the segmentation system is to be used in a situation where a given number of segments is wanted. This number decreases with the segment split cost, as expected. Experiments with a large number of songs show that, for most songs, the number of segments for a given α lies between half and double the median number of segments.

5. EXPERIMENTS

The segmentation system is now complete. It consists of three different features (the rhythmogram, the timbregram, and the chromagram), a self-similarity measure, and finally the segmentation based on a shortest path algorithm. Two things are interesting in the evaluation of the automatic segmentation system. The first is how the automatic segmentation using the different features compares to how humans would segment the music. The second is whether the different features identify the same segmentation points. In order to test the result, a database of rhythmic music has been collected and manually marked. This database is used here. Three different databases have been segmented manually by three different persons, and segmented automatically using the rhythmic, the timbral, and the harmonic feature. The segmentation points are then matched, and the performance of the segmentation is calculated. No cross-validation has been performed between the subjects.

5.1. Material

Three different databases have been collected. One, consisting of Chinese music, has been segmented using the Chinese numbered notation system [13]. This music consists of randomly selected popular Chinese songs from Mainland China, Taiwan, and Hong Kong. They have a variety of tempos, genres, and styles, including pop, rock, lyrical, and folk. This music is mainly from 2004. The second database consists of 13 songs, mainly electronica and techno, from 2004, and the third database consists of 15 songs with varying styles: alternative rock, ethno pop, pop, and techno. This music spans from the 1940s to recent years.

5.2. Manual segmentation

In order to compare with the automatic segmentation, the databases of music have been manually segmented by three different persons. Each database has been segmented by one person only. While cross-validation of the manual segmentation could prove useful, it is believed that it would only add confusion to the experimental results. The Chinese pop music was segmented with the aid of a notation system and listening, the other two by listening only. The instructions to the subjects were to try to segment the music according to the assumed structure of popular music, consisting of an intro, chorus, verse, bridge, and outro, with repetitions and omissions, and potentially other segments (solos, variations, etc.). The persons performing the segmentation are professional musicians with a background in jazz and rhythmic music. Standard audio editing software was used (Peak and Audacity on Macintosh). For the total database, there is an average of 13 segments per song (first and third quartiles are 9 and 17, resp.).
The average length of a segment is approximately 20 seconds.

5.3. Matching

The last step in the segmentation is to compare the manual and the automatic segment splits for different values of the new segment cost (α). To do this, the automatic segmentations are calculated for increasing values of α; a low value induces many segments, while a high value gives few segments. The manual and automatic segment split positions are then matched if they are closer than a threshold. For each value of α, the ratios of matched splits to the total number of manual splits and to the number of automatic splits (recall R and precision P, resp.) are found, and the distance to the optimal result is minimized,

    d(\alpha) = \left(1 - P(\alpha)\right)^2 + \left(1 - R(\alpha)\right)^2.    (7)

Since this distance is not common in information retrieval, it is used for matching only here. In the rest of the text, the recall and precision measures, and the weighted sum of these, F, are used.

The threshold for identifying a match is important for the matching result. A too short threshold will leave correct, but slightly misplaced, segment points unmatched. An analysis of the number of correctly matched manual splits shows that it decreases from between 10 and 12 to approximately 9 when the matching threshold decreases from 5 seconds to 1 second. The number of automatic splits increases significantly, from between 15 and 17 to 86 (rhythmogram), 71 (timbregram), and 88 (chromagram). The performance of the matching as a function of the matching threshold is shown in Figure 6. The performance, measured as F, increases with the threshold, mainly because the number of automatic splits decreases. While no asymptotic behavior can be detected for threshold values up to 10 seconds, a flattening of the F increase seems to occur at a threshold of between 3-4 seconds. Four seconds would also permit the subsequent identification of the first beat of the correct measure for tempos up to 60 beats/min.
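The matching of this subsection can be sketched as follows: each automatic split is greedily paired with the closest unused manual split within the threshold, and recall, precision, F (taken here as the usual harmonic mean, since the exact weighting is not spelled out above) and the distance of (7) are computed. All names are illustrative.

```python
import numpy as np

def match_splits(auto_splits, manual_splits, threshold=4.0):
    """Match automatic to manual split times (in seconds) within a threshold
    and return recall R, precision P, F, and the distance d of (7)."""
    unused = sorted(manual_splits)
    matched = 0
    for a in sorted(auto_splits):
        if not unused:
            break
        # Pair with the closest manual split that has not been used yet.
        j = int(np.argmin([abs(a - m) for m in unused]))
        if abs(a - unused[j]) <= threshold:
            matched += 1
            unused.pop(j)
    R = matched / max(len(manual_splits), 1)
    P = matched / max(len(auto_splits), 1)
    F = 2 * P * R / (P + R) if (P + R) > 0 else 0.0
    d = (1 - P) ** 2 + (1 - R) ** 2
    return R, P, F, d
```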

Figure 6: Mean F performance as a function of the matching threshold for 49 songs.

Table 1: F of the total database for the comparison between the segmentations using the rhythmogram, timbregram, and chromagram.

Feature        Rhythmogram   Timbregram   Chromagram
Rhythmogram    1.00
Timbregram                   1.00
Chromagram                                1.00

It is, therefore, used as the matching threshold in the experiments.

5.4. Comparison between features

A priori, the rhythmic, timbre, and chroma features should produce approximately the same segmentations. In order to verify this, the distance between the three features has been calculated for all the songs. This has been done using, for each feature, the mean α value found in the task of optimizing the automatic splits to the manual splits in the next section. The features generally match well; only a handful of songs have a perfect match. The F performance measures for matching the automatic splits using the three different features are shown in Table 1. An F value of 0.6 corresponds approximately to recall and precision values of between 50-70%. If the comparison between features is done by selecting an α value that renders a fixed number of splits (for instance, the same number as the manual segmentation), the F value increases by approximately 3%. This still hides some discrepancies, however, as some songs have rather different segmentations for the different features. One such example, for the first minute of The Marriage (The Marriage of Hat and Boots by August Engkilde presents Electronic Panorama Orchestra, Popscape 2004), is shown in Figure 7. The rhythm has only two segment splits (at 3 and 37 seconds) in the first minute: one when the bass rhythm starts and another when the drums join in. The timbre has one additional split at the start of the singing, and another just before one minute. The chroma has the same splits as the timbre, although the first of these is earlier, seemingly because of the slide guitar changing note.

Table 2: F of the three databases for the segmentation using the rhythmogram, timbregram, and chromagram.

Database             Rhythmogram   Timbregram   Chromagram
Chinese pop          0.7           0.75         0.66
Electronica          0.74          0.77         0.66
Varied               0.68          0.74         0.7
Total                0.7           0.75         0.68
Total with fixed α

5.5. Comparison with manual segmentation

In this section, the match between the automatic and manual segmentations is investigated. For the full database, the rhythm has an average of 10.04 matched splits out of 13.39 manual splits (recall = 75%) and 17.65 automatic splits (precision = 56.9%), with F = 0.7. The timbre has an average of 10.73 matched splits out of 13.39 manual splits (recall = 80.1%) and 15.96 automatic splits (precision = 67.3%), with F = 0.73. The chroma has an average of 10.12 matched splits out of 13.39 manual splits (recall = 75.6%) and 20.59 automatic splits (precision = 49.2%). The Chinese pop database has F values of 0.7, 0.75, and 0.66 for rhythm, timbre, and harmony, the electronica 0.74, 0.77, and 0.66, while the varied database has 0.68, 0.74, and 0.7. These results can be seen in Table 2. These results have been obtained for an optimal α value, found using (7). The α values are rather invariant with respect to the song, with first and third quartiles always within ±50% of the mean. The mean α is used to separate training and test. The matching performance for the automatic segmentation using the mean α can be seen in Table 2.
The timbre has a better performance in all cases, and it seems that this is the main attribute used when segmenting music manually. The rhythm has the second best performance for the Chinese pop and the electronica databases, indicating either that this music is more rhythmically based, or that the persons performed the manual segmentation based on rhythm, while in the varied database the chroma has the second best performance. All in all, the segmentation identifies most of the manual splits correctly, while keeping the false hits down. The features have comparable results. As the shortest path is the optimum solution, given the error criterion, the performance errors are a result of either bad features or errors in the manual segmentation. The automatic segmentation has 65% coincidence between the rhythm and timbre features, 60% between rhythm and chroma, 63% between timbre and chroma, and roughly 50% coincidence between all three segmentations. While [21] finds 55% correspondence between subjects in a free segmentation task, those results are not easily exploitable because of the short sound files (1 minute).

Figure 7: Rhythm, timbre, and chroma of The Marriage. Feature (top) and self-similarity (bottom). The automatic segmentation points are marked with vertical solid lines.

Figure 8: Rhythm, timbre, and chroma of Whenever, Wherever.

However, since manual segmentation seemingly does not perform better than the matching between automatic and manual splits, it is believed that the results presented here are rather good. Indeed, by manual inspection of the automatic and manual segmentations, the automatic segmentation often makes better sense than the manual one when they conflict. As an example of the results, the rhythmogram, timbregram, and chromagram for Whenever, Wherever and All of Me are shown in Figures 8 and 9, respectively; the manual segmentation is shown with dashed lines and the automatic with solid lines. The performance for Whenever, Wherever is F = 0.83, 0.8, and 0.8, a good match on all features. All of Me has F = 0.48, 0.8, and 0.7. Obviously, in this song, the manual segmentation was made on the timbre only, as it has a significantly better matching score.

6. CONCLUSION

This paper has introduced three features, one associated with rhythm called the rhythmogram, one associated with timbre called the timbregram, and one associated with harmony called the chromagram. All three features are calculated as an average over time, the timbregram and chromagram using a novel smoothing based on the Gaussian window. The three features are used to calculate the self-similarity. The features and the self-similarity are excellent candidates for visualizing the primary attributes of music: rhythm, timbre, and harmony.

Figure 9: Rhythm, timbre, and chroma of All of Me.

The songs are segmented using a shortest path algorithm based on a model of the cost of one segment and of the segment split. The variable cost of the segment split makes it possible to choose the scale of segmentation: either fine, which creates many segments of short length, or coarse, which creates a few long segments. The rhythm, timbre, and chroma create approximately the same number of segments at the same locations in most of the cases. The matching performances (F), when compared to the manual segmentations, are 0.7, 0.75, and 0.68 for rhythm, timbre, and chroma, giving indications that the timbre is the main feature for the task of segmenting music manually. This decreases when separating training and test data, but it is always better than how the automatic segmentation compares between features. The automatic segmentation is considered to provide an excellent performance, given how dependent it is on the music, the person performing the segmentation, and the tools used. The features and the segmentation can be used for audio thumbnailing, making a preview, for use in intelligent music scrolling, or in music recomposition.

REFERENCES

[1] T. H. Andersen, "Mixxx: towards novel DJ interfaces," in Proceedings of the International Conference on New Interfaces for Musical Expression (NIME 03), Montreal, Quebec, Canada, May 2003.
[2] D. Murphy, "Pattern play," in Additional Proceedings of the 2nd International Conference on Music and Artificial Intelligence, A. Smaill, Ed., Edinburgh, Scotland, September 2002.
[3] M. A. Bartsch and G. H. Wakefield, "To catch a chorus: using chroma-based representations for audio thumbnailing," in Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, New Paltz, NY, USA, October 2001.
[4] J. Foote, "Visualizing music and audio using self-similarity," in Proceedings of the 7th ACM International Multimedia Conference & Exhibition, Orlando, Fla, USA, November 1999.
[5] J. Foote, "Automatic audio segmentation using a measure of audio novelty," in Proceedings of the IEEE International Conference on Multimedia and Expo (ICME 00), New York, NY, USA, July-August 2000.
[6] M. Cooper and J. Foote, "Summarizing popular music via structural similarity analysis," in Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA 03), New Paltz, NY, USA, October 2003.
[7] K. Jensen, "A causal rhythm grouping," in Proceedings of the 2nd International Symposium on Computer Music Modeling and Retrieval (CMMR 04), vol. 3310 of Lecture Notes in Computer Science, 2005.
[8] G. Peeters and X. Rodet, "Signal-based music structure discovery for music audio summary generation," in Proceedings of the International Computer Music Conference (ICMC 03), Singapore, October 2003.
[9] R. B. Dannenberg and N. Hu, "Pattern discovery techniques for music audio," Journal of New Music Research, vol. 32, 2003.
[10] M. Goto, "A chorus-section detecting method for musical audio signals," in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 03), vol. 5, Hong Kong, April 2003.
[11] S. Dubnov, G. Assayag, and R. El-Yaniv, "Universal classification applied to musical sequences," in Proceedings of the International Computer Music Conference (ICMC 98), Ann Arbor, Mich, USA, October 1998.
[12] T. Jehan, "Hierarchical multi-class self similarities," in Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA 05), New Paltz, NY, USA, October 2005.
[13] K. Jensen, J. Xu, and M. Zachariasen, "Rhythm-based segmentation of popular Chinese music," in Proceedings of the 6th International Conference on Music Information Retrieval (ISMIR 05), London, UK, September 2005.
[14] H. Hermansky, "Perceptual linear predictive (PLP) analysis of speech," Journal of the Acoustical Society of America, vol. 87, no. 4, 1990.

[15] K. Jensen, "Perceptual atomic noise," in Proceedings of the International Computer Music Conference (ICMC 05), Barcelona, Spain, September 2005.
[16] N. Collins, "A comparison of sound onset detection algorithms with emphasis on psychoacoustically motivated detection functions," in Proceedings of the AES 118th Convention, Barcelona, Spain, May 2005.
[17] P. Desain, "A (de)composable theory of rhythm," Music Perception, vol. 9, no. 4, 1992.
[18] A. Sekey and B. A. Hanson, "Improved 1-Bark bandwidth auditory filter," Journal of the Acoustical Society of America, vol. 75, no. 6, 1984.
[19] J. P. Eckmann, S. O. Kamphorst, and D. Ruelle, "Recurrence plots of dynamical systems," Europhysics Letters, vol. 4, no. 9, 1987.
[20] T. H. Cormen, C. Stein, R. L. Rivest, and C. E. Leiserson, Introduction to Algorithms, The MIT Press, Cambridge, Mass, USA; McGraw-Hill, New York, NY, USA, 2nd edition, 2001.
[21] G. Tzanetakis and P. Cook, "Multifeature audio segmentation for browsing and annotation," in Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA 99), New Paltz, NY, USA, October 1999.

Kristoffer Jensen obtained his Masters degree in computer science in 1988 from the Technical University of Lund, Sweden, and a D.E.A. in signal processing in 1989 from ENSEEIHT, Toulouse, France. His Ph.D. was delivered and defended in 1999 at the Department of Computer Science, University of Copenhagen, Denmark, treating signal processing applied to music from a physical and perceptual point of view. This mainly involved classification and modeling of musical sounds. He has been involved in synthesizers for children, state-of-the-art next-generation effect processors, and signal processing in music informatics. His current research topic is signal processing with musical applications, and related fields, including perception, psychoacoustics, physical models, and expression of music. He currently holds a position as Associate Professor at the Software and Media Technology Department, Aalborg University Esbjerg.


Query By Humming: Finding Songs in a Polyphonic Database

Query By Humming: Finding Songs in a Polyphonic Database Query By Humming: Finding Songs in a Polyphonic Database John Duchi Computer Science Department Stanford University jduchi@stanford.edu Benjamin Phipps Computer Science Department Stanford University bphipps@stanford.edu

More information

Automatic Piano Music Transcription

Automatic Piano Music Transcription Automatic Piano Music Transcription Jianyu Fan Qiuhan Wang Xin Li Jianyu.Fan.Gr@dartmouth.edu Qiuhan.Wang.Gr@dartmouth.edu Xi.Li.Gr@dartmouth.edu 1. Introduction Writing down the score while listening

More information

REpeating Pattern Extraction Technique (REPET): A Simple Method for Music/Voice Separation

REpeating Pattern Extraction Technique (REPET): A Simple Method for Music/Voice Separation IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 21, NO. 1, JANUARY 2013 73 REpeating Pattern Extraction Technique (REPET): A Simple Method for Music/Voice Separation Zafar Rafii, Student

More information

Audio Structure Analysis

Audio Structure Analysis Advanced Course Computer Science Music Processing Summer Term 2009 Meinard Müller Saarland University and MPI Informatik meinard@mpi-inf.mpg.de Music Structure Analysis Music segmentation pitch content

More information

Automatic Rhythmic Notation from Single Voice Audio Sources

Automatic Rhythmic Notation from Single Voice Audio Sources Automatic Rhythmic Notation from Single Voice Audio Sources Jack O Reilly, Shashwat Udit Introduction In this project we used machine learning technique to make estimations of rhythmic notation of a sung

More information

DAY 1. Intelligent Audio Systems: A review of the foundations and applications of semantic audio analysis and music information retrieval

DAY 1. Intelligent Audio Systems: A review of the foundations and applications of semantic audio analysis and music information retrieval DAY 1 Intelligent Audio Systems: A review of the foundations and applications of semantic audio analysis and music information retrieval Jay LeBoeuf Imagine Research jay{at}imagine-research.com Rebecca

More information

The Research of Controlling Loudness in the Timbre Subjective Perception Experiment of Sheng

The Research of Controlling Loudness in the Timbre Subjective Perception Experiment of Sheng The Research of Controlling Loudness in the Timbre Subjective Perception Experiment of Sheng S. Zhu, P. Ji, W. Kuang and J. Yang Institute of Acoustics, CAS, O.21, Bei-Si-huan-Xi Road, 100190 Beijing,

More information

The Intervalgram: An Audio Feature for Large-scale Melody Recognition

The Intervalgram: An Audio Feature for Large-scale Melody Recognition The Intervalgram: An Audio Feature for Large-scale Melody Recognition Thomas C. Walters, David A. Ross, and Richard F. Lyon Google, 1600 Amphitheatre Parkway, Mountain View, CA, 94043, USA tomwalters@google.com

More information

Analytic Comparison of Audio Feature Sets using Self-Organising Maps

Analytic Comparison of Audio Feature Sets using Self-Organising Maps Analytic Comparison of Audio Feature Sets using Self-Organising Maps Rudolf Mayer, Jakob Frank, Andreas Rauber Institute of Software Technology and Interactive Systems Vienna University of Technology,

More information

hit), and assume that longer incidental sounds (forest noise, water, wind noise) resemble a Gaussian noise distribution.

hit), and assume that longer incidental sounds (forest noise, water, wind noise) resemble a Gaussian noise distribution. CS 229 FINAL PROJECT A SOUNDHOUND FOR THE SOUNDS OF HOUNDS WEAKLY SUPERVISED MODELING OF ANIMAL SOUNDS ROBERT COLCORD, ETHAN GELLER, MATTHEW HORTON Abstract: We propose a hybrid approach to generating

More information

Content-based Music Structure Analysis with Applications to Music Semantics Understanding

Content-based Music Structure Analysis with Applications to Music Semantics Understanding Content-based Music Structure Analysis with Applications to Music Semantics Understanding Namunu C Maddage,, Changsheng Xu, Mohan S Kankanhalli, Xi Shao, Institute for Infocomm Research Heng Mui Keng Terrace

More information

Music Information Retrieval

Music Information Retrieval Music Information Retrieval When Music Meets Computer Science Meinard Müller International Audio Laboratories Erlangen meinard.mueller@audiolabs-erlangen.de Berlin MIR Meetup 20.03.2017 Meinard Müller

More information

Musical Instrument Identification Using Principal Component Analysis and Multi-Layered Perceptrons

Musical Instrument Identification Using Principal Component Analysis and Multi-Layered Perceptrons Musical Instrument Identification Using Principal Component Analysis and Multi-Layered Perceptrons Róisín Loughran roisin.loughran@ul.ie Jacqueline Walker jacqueline.walker@ul.ie Michael O Neill University

More information

A Parametric Autoregressive Model for the Extraction of Electric Network Frequency Fluctuations in Audio Forensic Authentication

A Parametric Autoregressive Model for the Extraction of Electric Network Frequency Fluctuations in Audio Forensic Authentication Journal of Energy and Power Engineering 10 (2016) 504-512 doi: 10.17265/1934-8975/2016.08.007 D DAVID PUBLISHING A Parametric Autoregressive Model for the Extraction of Electric Network Frequency Fluctuations

More information

A Beat Tracking System for Audio Signals

A Beat Tracking System for Audio Signals A Beat Tracking System for Audio Signals Simon Dixon Austrian Research Institute for Artificial Intelligence, Schottengasse 3, A-1010 Vienna, Austria. simon@ai.univie.ac.at April 7, 2000 Abstract We present

More information

AUTOMATIC ACCOMPANIMENT OF VOCAL MELODIES IN THE CONTEXT OF POPULAR MUSIC

AUTOMATIC ACCOMPANIMENT OF VOCAL MELODIES IN THE CONTEXT OF POPULAR MUSIC AUTOMATIC ACCOMPANIMENT OF VOCAL MELODIES IN THE CONTEXT OF POPULAR MUSIC A Thesis Presented to The Academic Faculty by Xiang Cao In Partial Fulfillment of the Requirements for the Degree Master of Science

More information

Music Radar: A Web-based Query by Humming System

Music Radar: A Web-based Query by Humming System Music Radar: A Web-based Query by Humming System Lianjie Cao, Peng Hao, Chunmeng Zhou Computer Science Department, Purdue University, 305 N. University Street West Lafayette, IN 47907-2107 {cao62, pengh,

More information

Onset Detection and Music Transcription for the Irish Tin Whistle

Onset Detection and Music Transcription for the Irish Tin Whistle ISSC 24, Belfast, June 3 - July 2 Onset Detection and Music Transcription for the Irish Tin Whistle Mikel Gainza φ, Bob Lawlor*, Eugene Coyle φ and Aileen Kelleher φ φ Digital Media Centre Dublin Institute

More information

AUTOMASHUPPER: AN AUTOMATIC MULTI-SONG MASHUP SYSTEM

AUTOMASHUPPER: AN AUTOMATIC MULTI-SONG MASHUP SYSTEM AUTOMASHUPPER: AN AUTOMATIC MULTI-SONG MASHUP SYSTEM Matthew E. P. Davies, Philippe Hamel, Kazuyoshi Yoshii and Masataka Goto National Institute of Advanced Industrial Science and Technology (AIST), Japan

More information

IMPROVING MARKOV MODEL-BASED MUSIC PIECE STRUCTURE LABELLING WITH ACOUSTIC INFORMATION

IMPROVING MARKOV MODEL-BASED MUSIC PIECE STRUCTURE LABELLING WITH ACOUSTIC INFORMATION IMPROVING MAROV MODEL-BASED MUSIC PIECE STRUCTURE LABELLING WITH ACOUSTIC INFORMATION Jouni Paulus Fraunhofer Institute for Integrated Circuits IIS Erlangen, Germany jouni.paulus@iis.fraunhofer.de ABSTRACT

More information

Effects of acoustic degradations on cover song recognition

Effects of acoustic degradations on cover song recognition Signal Processing in Acoustics: Paper 68 Effects of acoustic degradations on cover song recognition Julien Osmalskyj (a), Jean-Jacques Embrechts (b) (a) University of Liège, Belgium, josmalsky@ulg.ac.be

More information

A Framework for Segmentation of Interview Videos

A Framework for Segmentation of Interview Videos A Framework for Segmentation of Interview Videos Omar Javed, Sohaib Khan, Zeeshan Rasheed, Mubarak Shah Computer Vision Lab School of Electrical Engineering and Computer Science University of Central Florida

More information

However, in studies of expressive timing, the aim is to investigate production rather than perception of timing, that is, independently of the listene

However, in studies of expressive timing, the aim is to investigate production rather than perception of timing, that is, independently of the listene Beat Extraction from Expressive Musical Performances Simon Dixon, Werner Goebl and Emilios Cambouropoulos Austrian Research Institute for Artificial Intelligence, Schottengasse 3, A-1010 Vienna, Austria.

More information

Application Of Missing Feature Theory To The Recognition Of Musical Instruments In Polyphonic Audio

Application Of Missing Feature Theory To The Recognition Of Musical Instruments In Polyphonic Audio Application Of Missing Feature Theory To The Recognition Of Musical Instruments In Polyphonic Audio Jana Eggink and Guy J. Brown Department of Computer Science, University of Sheffield Regent Court, 11

More information

VISUAL CONTENT BASED SEGMENTATION OF TALK & GAME SHOWS. O. Javed, S. Khan, Z. Rasheed, M.Shah. {ojaved, khan, zrasheed,

VISUAL CONTENT BASED SEGMENTATION OF TALK & GAME SHOWS. O. Javed, S. Khan, Z. Rasheed, M.Shah. {ojaved, khan, zrasheed, VISUAL CONTENT BASED SEGMENTATION OF TALK & GAME SHOWS O. Javed, S. Khan, Z. Rasheed, M.Shah {ojaved, khan, zrasheed, shah}@cs.ucf.edu Computer Vision Lab School of Electrical Engineering and Computer

More information

MUSICAL INSTRUMENT IDENTIFICATION BASED ON HARMONIC TEMPORAL TIMBRE FEATURES

MUSICAL INSTRUMENT IDENTIFICATION BASED ON HARMONIC TEMPORAL TIMBRE FEATURES MUSICAL INSTRUMENT IDENTIFICATION BASED ON HARMONIC TEMPORAL TIMBRE FEATURES Jun Wu, Yu Kitano, Stanislaw Andrzej Raczynski, Shigeki Miyabe, Takuya Nishimoto, Nobutaka Ono and Shigeki Sagayama The Graduate

More information

Music Database Retrieval Based on Spectral Similarity

Music Database Retrieval Based on Spectral Similarity Music Database Retrieval Based on Spectral Similarity Cheng Yang Department of Computer Science Stanford University yangc@cs.stanford.edu Abstract We present an efficient algorithm to retrieve similar

More information

ONLINE ACTIVITIES FOR MUSIC INFORMATION AND ACOUSTICS EDUCATION AND PSYCHOACOUSTIC DATA COLLECTION

ONLINE ACTIVITIES FOR MUSIC INFORMATION AND ACOUSTICS EDUCATION AND PSYCHOACOUSTIC DATA COLLECTION ONLINE ACTIVITIES FOR MUSIC INFORMATION AND ACOUSTICS EDUCATION AND PSYCHOACOUSTIC DATA COLLECTION Travis M. Doll Ray V. Migneco Youngmoo E. Kim Drexel University, Electrical & Computer Engineering {tmd47,rm443,ykim}@drexel.edu

More information

Subjective Similarity of Music: Data Collection for Individuality Analysis

Subjective Similarity of Music: Data Collection for Individuality Analysis Subjective Similarity of Music: Data Collection for Individuality Analysis Shota Kawabuchi and Chiyomi Miyajima and Norihide Kitaoka and Kazuya Takeda Nagoya University, Nagoya, Japan E-mail: shota.kawabuchi@g.sp.m.is.nagoya-u.ac.jp

More information

AN APPROACH FOR MELODY EXTRACTION FROM POLYPHONIC AUDIO: USING PERCEPTUAL PRINCIPLES AND MELODIC SMOOTHNESS

AN APPROACH FOR MELODY EXTRACTION FROM POLYPHONIC AUDIO: USING PERCEPTUAL PRINCIPLES AND MELODIC SMOOTHNESS AN APPROACH FOR MELODY EXTRACTION FROM POLYPHONIC AUDIO: USING PERCEPTUAL PRINCIPLES AND MELODIC SMOOTHNESS Rui Pedro Paiva CISUC Centre for Informatics and Systems of the University of Coimbra Department

More information

Automatic Laughter Detection

Automatic Laughter Detection Automatic Laughter Detection Mary Knox Final Project (EECS 94) knoxm@eecs.berkeley.edu December 1, 006 1 Introduction Laughter is a powerful cue in communication. It communicates to listeners the emotional

More information

AUTOREGRESSIVE MFCC MODELS FOR GENRE CLASSIFICATION IMPROVED BY HARMONIC-PERCUSSION SEPARATION

AUTOREGRESSIVE MFCC MODELS FOR GENRE CLASSIFICATION IMPROVED BY HARMONIC-PERCUSSION SEPARATION AUTOREGRESSIVE MFCC MODELS FOR GENRE CLASSIFICATION IMPROVED BY HARMONIC-PERCUSSION SEPARATION Halfdan Rump, Shigeki Miyabe, Emiru Tsunoo, Nobukata Ono, Shigeki Sagama The University of Tokyo, Graduate

More information

Introductions to Music Information Retrieval

Introductions to Music Information Retrieval Introductions to Music Information Retrieval ECE 272/472 Audio Signal Processing Bochen Li University of Rochester Wish List For music learners/performers While I play the piano, turn the page for me Tell

More information

Efficient Vocal Melody Extraction from Polyphonic Music Signals

Efficient Vocal Melody Extraction from Polyphonic Music Signals http://dx.doi.org/1.5755/j1.eee.19.6.4575 ELEKTRONIKA IR ELEKTROTECHNIKA, ISSN 1392-1215, VOL. 19, NO. 6, 213 Efficient Vocal Melody Extraction from Polyphonic Music Signals G. Yao 1,2, Y. Zheng 1,2, L.

More information

Automatic Commercial Monitoring for TV Broadcasting Using Audio Fingerprinting

Automatic Commercial Monitoring for TV Broadcasting Using Audio Fingerprinting Automatic Commercial Monitoring for TV Broadcasting Using Audio Fingerprinting Dalwon Jang 1, Seungjae Lee 2, Jun Seok Lee 2, Minho Jin 1, Jin S. Seo 2, Sunil Lee 1 and Chang D. Yoo 1 1 Korea Advanced

More information

Measurement of overtone frequencies of a toy piano and perception of its pitch

Measurement of overtone frequencies of a toy piano and perception of its pitch Measurement of overtone frequencies of a toy piano and perception of its pitch PACS: 43.75.Mn ABSTRACT Akira Nishimura Department of Media and Cultural Studies, Tokyo University of Information Sciences,

More information

GRADIENT-BASED MUSICAL FEATURE EXTRACTION BASED ON SCALE-INVARIANT FEATURE TRANSFORM

GRADIENT-BASED MUSICAL FEATURE EXTRACTION BASED ON SCALE-INVARIANT FEATURE TRANSFORM 19th European Signal Processing Conference (EUSIPCO 2011) Barcelona, Spain, August 29 - September 2, 2011 GRADIENT-BASED MUSICAL FEATURE EXTRACTION BASED ON SCALE-INVARIANT FEATURE TRANSFORM Tomoko Matsui

More information

MODELING RHYTHM SIMILARITY FOR ELECTRONIC DANCE MUSIC

MODELING RHYTHM SIMILARITY FOR ELECTRONIC DANCE MUSIC MODELING RHYTHM SIMILARITY FOR ELECTRONIC DANCE MUSIC Maria Panteli University of Amsterdam, Amsterdam, Netherlands m.x.panteli@gmail.com Niels Bogaards Elephantcandy, Amsterdam, Netherlands niels@elephantcandy.com

More information

Automatic Extraction of Popular Music Ringtones Based on Music Structure Analysis

Automatic Extraction of Popular Music Ringtones Based on Music Structure Analysis Automatic Extraction of Popular Music Ringtones Based on Music Structure Analysis Fengyan Wu fengyanyy@163.com Shutao Sun stsun@cuc.edu.cn Weiyao Xue Wyxue_std@163.com Abstract Automatic extraction of

More information

Timing In Expressive Performance

Timing In Expressive Performance Timing In Expressive Performance 1 Timing In Expressive Performance Craig A. Hanson Stanford University / CCRMA MUS 151 Final Project Timing In Expressive Performance Timing In Expressive Performance 2

More information

Transcription of the Singing Melody in Polyphonic Music

Transcription of the Singing Melody in Polyphonic Music Transcription of the Singing Melody in Polyphonic Music Matti Ryynänen and Anssi Klapuri Institute of Signal Processing, Tampere University Of Technology P.O.Box 553, FI-33101 Tampere, Finland {matti.ryynanen,

More information

Shades of Music. Projektarbeit

Shades of Music. Projektarbeit Shades of Music Projektarbeit Tim Langer LFE Medieninformatik 28.07.2008 Betreuer: Dominikus Baur Verantwortlicher Hochschullehrer: Prof. Dr. Andreas Butz LMU Department of Media Informatics Projektarbeit

More information

Supervised Learning in Genre Classification

Supervised Learning in Genre Classification Supervised Learning in Genre Classification Introduction & Motivation Mohit Rajani and Luke Ekkizogloy {i.mohit,luke.ekkizogloy}@gmail.com Stanford University, CS229: Machine Learning, 2009 Now that music

More information