Popular Song Summarization Using Chorus Section Detection from Audio Signal


Sheng Gao and Haizhou Li
Institute for Infocomm Research, A*STAR, Singapore
gaosheng@i2r.a-star.edu.sg, hli@i2r.a-star.edu.sg

Abstract - A music signal is a one-dimensional temporal sequence. This makes it difficult for listeners to quickly locate the most attractive parts of a popular song unless they play the song through to the end. To improve the listening experience, music summarization, a tool that summarizes a song using its most attractive sections, is needed. In this paper, a system and method are presented that summarize popular songs by detecting the chorus sections in the input audio signal. The proposed summarization system uses a unique audio feature representation, octave-dependent probabilistic latent semantic analysis, and a chorus detection algorithm that combines repeated-segment extraction with chorus identification. The performance of music summarization is evaluated on a song database with ground-truth chorus sections, i.e. the start and end timestamps of each chorus section. To the best of our knowledge, this is the first systematic evaluation of music summarization performance. In terms of multiple metrics such as boundary accuracy, precision, recall and F1, we show that the proposed system is clearly superior to widely accepted methods.

I. INTRODUCTION

A music signal is a one-dimensional temporal sequence. This makes it difficult for listeners to quickly locate the most attractive parts of a popular song unless they play it sequentially to the end. Listening to a whole song of several minutes can be tedious, and although the listener can skip around using the fast forward and backward functions of the play bar, it is still not easy to find the exact beginning of the interesting segments. Therefore, to improve the listening experience, music summarization, a tool that summarizes the song content using its most attractive sections, is needed.

Music is a highly structured signal. Artists exploit repeated lyrics (sometimes with a few word modifications), themes, tones, etc. to express their emotions and ideas. In terms of the signal, we can visually identify repeated frequency-temporal patterns in the frequency domain. Among the different parts of a popular song, the chorus sections are the most important. The chorus contains the main idea, or big picture, of what is being expressed lyrically and musically. It is repeated throughout the song, and its melody and lyrics rarely vary. The chorus sections are therefore good candidates for music summarization.

In general, a popular song carries two modalities: text, i.e. the lyrics, and audio. To use lyrics for chorus detection, the boundaries of the lyrics (i.e. phrases or sentences) must be known a priori. Unfortunately, they are not always available for popular songs. In comparison, the audio signal is always available. From spectrum reading, it is observed that repeated audio segments share a similar frequency-temporal pattern, even if there is some noise due to modifications of the lyric words, tone, or accompanying instruments. At the frame level (a few milliseconds) the chorus is indistinguishable from the other parts, but over a long temporal window (e.g. 1 second) it is quite distinguishable.
In this paper, we build these observations from spectrum reading and visual analysis into the design of a chorus detection system. Starting from the widely accepted chroma feature [9, 12, 13], we develop a novel octave-dependent probabilistic latent semantic analysis (OdPlsa) to analyse the audio signal, on top of which a chorus detection system is developed. In Section II, related work is discussed. In Section III, the octave-dependent probabilistic latent semantic analysis (OdPlsa) is presented in detail. In Section IV, the chorus detection algorithm is introduced. In Section V, experiments are carried out to evaluate the performance of the chorus-detection-based music summarization system. Finally, we summarize our work.

II. RELATED WORK

An audio signal, like speech, is a one-dimensional time series. Thus, the feature extraction techniques of speech processing can be applied to music information retrieval (MIR) tasks such as measuring audio similarity, audio classification, and music structure analysis (e.g. [4], [12], [15], [16], and [22]). A comprehensive survey can be found in [15]. Among the many kinds of speech features, the Mel-frequency cepstral coefficient (MFCC) has been found useful in MIR tasks. For example, in [2] MFCC is extracted to measure audio similarity for music summarization. In [7], it is used for chorus detection and emotion classification. However, MFCC is found to be sensitive to key and accompaniment changes. To address these issues, the chroma (pitch class profile) feature was proposed and has become an effective representation for music signals [6][8][9][13][18-21]. It is a 12-dimensional vector corresponding to the 12 distinct semitones in music theory.

Chroma characterizes the magnitude distribution over the 12 semitones of each frame. It is obtained by mapping the FFT frequency bands to the semitones and, for each semitone, combining the energies across the different octaves. It has several implementations. The chroma feature is widely used in chorus detection, music segmentation, cover song detection, etc. Despite many successes, chroma has two drawbacks in characterizing music signals. One is that it only describes the frame-level magnitude distribution and ignores the long-context magnitude distribution in the song; as a result, its discriminative power is limited. The other is that a linear method is used to fuse the magnitudes across the octaves without considering the representation capability of the semitone in each octave. These two issues concern the spectrum distribution at the song level and the relation between octaves. They motivate us to investigate octave-dependent probabilistic latent semantic analysis (OdPlsa), an extension of probabilistic latent semantic analysis [1]. OdPlsa learns latent clusters by analysing the magnitude distribution of the semitones along the octave and temporal dimensions, and then represents each audio segment in the space of these latent clusters. It exploits the music structure in the semitone, octave, and temporal dimensions, which distinguishes it from existing latent component techniques for audio signal analysis. For example, in [10], PLSA is directly applied to model the occurrence of symbolic audio tokens in a song. In [3], shift-invariant PLSA is used to model the magnitude distribution over the 12-dimensional semitone and time axes, and the shift property of the model can handle key changes.

Following the steps in [9, 13], chorus detection is carried out on the extracted audio features. Many papers have been published on chorus detection or music structure detection [6-9, 11-15, 18-21]. First, the pair-wise audio segment similarity is calculated. The repeated segments are expected to have high similarity values, while other pairs have lower values. When the matrix is viewed as an image, the repeated segments form a line along the diagonal direction. However, due to imperfect audio features, the line is sometimes not observable: the repeated segments may not have similarity values significantly higher than those of dissimilar segments, which makes line extraction difficult. In [9], Goto applies a 2D-filter method to enhance the likely similar candidate points and suppress the others, which enhances the lines. To get the chorus section, he also designs methods to measure the possibility of a repeated segment being a chorus. Similarly, in [9, 18, 20], the researchers exploit various heuristic algorithms to find the most significant lines in the matrix. In general, chorus detection is heuristic and unsupervised.
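To make the chroma computation described above concrete, the following is a minimal Python/NumPy sketch. It assumes a magnitude spectrogram as input and the common A4 = 440 Hz bin-to-MIDI mapping with per-frame normalisation; the exact mapping and normalisation are not specified in the papers cited, so these choices are illustrative.

import numpy as np

def chroma_from_spectrogram(mag, sr=8000, n_fft=1024):
    # mag: magnitude spectrogram, shape (n_fft // 2 + 1, n_frames)
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)       # centre frequency of each FFT bin (Hz)
    valid = freqs > 0                                # skip the DC bin (log2 undefined at 0 Hz)
    midi = 69 + 12 * np.log2(freqs[valid] / 440.0)   # MIDI note number, A4 = 440 Hz
    pitch_class = np.mod(np.round(midi).astype(int), 12)
    chroma = np.zeros((12, mag.shape[1]))
    for pc in range(12):                             # fold the energies of all octaves into one bin
        chroma[pc] = mag[valid][pitch_class == pc].sum(axis=0)
    return chroma / (chroma.max(axis=0, keepdims=True) + 1e-9)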
III. OCTAVE-DEPENDENT PROBABILISTIC LATENT SEMANTIC ANALYSIS (ODPLSA)

A. OdPlsa-based Audio Analysis

Because OdPlsa models the spectrogram, FFT-based short-time frequency analysis is first applied to the audio signal. Each FFT frequency band is then mapped to its corresponding MIDI note, whose magnitude is a weighted sum over all related FFT bands; the process is similar to [9]. Each MIDI note is identified by its octave level and semitone. The FFT sequence in the octave space is thus denoted X(o, s, t), where

o: the octave level;
s: the semitone, in [0, 11];
t: the time stamp (the basic unit is a frame), in [0, T-1], with T the sequence length.

The short-time frame feature (16 ms in this paper) has weak discrimination, so the sequence is usually segmented with a window of L continuous frames. The sequence then becomes a chunk sequence of length T/L, where the i-th chunk is the subsequence

X_i = { X(o, s, t) : iL <= t <= (i+1)L - 1 }.

Thus in OdPlsa, each component pattern models the magnitude distribution in both the semitone and time dimensions (of L-frame length). To simplify the notation, semitone and time are combined into a single variable f = (s, tau). Without confusion, the chunk sequence is still written X(o, f, t), where t now indexes the t-th chunk and f refers to a specific semitone-time location in the 12 x L rectangular region. The occurrence probability of a particular spectrum at the o-th octave and f-th location of the t-th chunk, P(o, f | t), is modelled by the following mixture model with K components:

P(o, f | t) = \sum_{k=1}^{K} P(f | k) P(k | o, t) P(o | t).   (1)

In the model, P(f | k) models the magnitude distribution of the k-th pattern, P(k | o, t) tells us how the patterns are distributed across the octaves in each chunk, and P(o | t) describes the importance of each octave in the chunk. The log-likelihood of the model generating the observation sequence X(o, f, t) is thus defined as

LL = \sum_{o, f, t} X(o, f, t) \log P(o, f | t).   (2)

To estimate the pattern models, an auxiliary variable P(k | o, f, t) (abbreviated P(k | .)) is introduced into Eq. (2), and we get

Q = \sum_{o, f, t} X(o, f, t) \sum_{k} P(k | o, f, t) \log [ P(f | k) P(k | o, t) P(o | t) ].   (3)

Then the traditional EM algorithm can be applied to estimate the model parameters P(f | k), P(k | o, t) and P(o | t). In the E-step, P(k | o, f, t) is estimated as

P(k | o, f, t) = P(f | k) P(k | o, t) P(o | t) / Z,   (4)

where Z = \sum_{k'} P(f | k') P(k' | o, t) P(o | t) is a normalizing constant. In the M-step, the model parameters are estimated as

P(f | k) \propto \sum_{o, t} X(o, f, t) P(k | o, f, t),   (5)
P(k | o, t) \propto \sum_{f} X(o, f, t) P(k | o, f, t),   (6)
P(o | t) \propto \sum_{f, k} X(o, f, t) P(k | o, f, t).   (7)

As in Eq. (4), the denominators of Eqs. (5)-(7) are normalizing constants.
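The EM updates of Eqs. (4)-(7) translate directly into code. Below is a minimal sketch under the equations as reconstructed above; it materializes the full responsibility array for clarity rather than memory efficiency, and the function and variable names are ours, not the authors'.

import numpy as np

def odplsa(X, K=8, n_iter=50, eps=1e-12, seed=0):
    # X: non-negative magnitudes, shape (O, F, T); O octaves, F = 12*L locations, T chunks
    rng = np.random.default_rng(seed)
    O, F, T = X.shape
    Pf_k = rng.random((F, K)); Pf_k /= Pf_k.sum(0, keepdims=True)        # P(f|k)
    Pk_ot = rng.random((K, O, T)); Pk_ot /= Pk_ot.sum(0, keepdims=True)  # P(k|o,t)
    Po_t = np.full((O, T), 1.0 / O)                                      # P(o|t)
    for _ in range(n_iter):
        # E-step, Eq.(4): responsibilities P(k|o,f,t), normalized over k
        joint = np.einsum('fk,kot,ot->koft', Pf_k, Pk_ot, Po_t)
        post = joint / (joint.sum(0, keepdims=True) + eps)
        W = post * X[None]                    # X(o,f,t) * P(k|o,f,t), shape (K,O,F,T)
        # M-step, Eqs.(5)-(7), each followed by its normalization
        Pf_k = W.sum(axis=(1, 3)).T
        Pf_k /= Pf_k.sum(0, keepdims=True) + eps
        Pk_ot = W.sum(axis=2)
        Pk_ot /= Pk_ot.sum(0, keepdims=True) + eps
        Po_t = W.sum(axis=(0, 2))
        Po_t /= Po_t.sum(0, keepdims=True) + eps
    return Pf_k, Pk_ot, Po_t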

B. Audio Representation in the OdPlsa Latent Space

The component pattern model in OdPlsa is P(f | k). It describes the magnitude distribution over two dimensions, in the current implementation the 12 semitones and the L-frame time window, and it is shared across the different octaves. We can imagine the patterns as the most prominent information occurring in the music signal, such as a frequently repeated melody or chord. Figure 1 depicts an example of the spectrogram and the learned patterns for a song. From the learned models, we can know how the patterns are distributed over the octaves, i.e. P(k | o), according to the following equations:

P(k, o | t) = P(k | o, t) P(o | t),   (8)
P(k, o) = (1/T) \sum_{t} P(k, o | t),   (9)
P(k | o) = P(k, o) / \sum_{k'} P(k', o).   (10)

For each chunk of music, we can represent its content in the audio pattern space as

v_t = [ P(k, o | t) ]_{k=1..K, o=1..O}.   (11)

Fig. 1. Illustration of the patterns learned from the spectrum of a song. (a) Spectrum in MIDI notes (a blue row means the MIDI note is missed due to the FFT resolution). (b) Learned audio patterns, i.e. P(f | k), from the song in (a), reshaped to the 12-semitone dimension and L-frame window (K=8, L=20).

If the octave size is O, the feature dimension is K*O. This representation depends on the octave. If we want an octave-invariant feature, we can sum the pattern distribution over all octaves:

v_t(k) = \sum_{o} P(k | o, t) P(o | t).   (12)

IV. MUSIC SUMMARIZATION WITH CHORUS SECTIONS

The task of chorus detection is to extract the repeated chorus sections from a single song. The chorus sections have a similar melody over time despite some changes in the lyrics or accompaniment, so temporal information is expected to play an important role. As discussed in the above section, OdPlsa can find temporal patterns in the signal. We therefore apply the OdPlsa algorithm to learn the latent space from the single song, and each audio segment is represented in that latent space (see Section III). Then the pair-wise segment similarity is calculated (here the negative Euclidean distance is used). The similarity matrix is symmetric, so only the diagonal and the lower-triangle points are considered (see Fig. 2a).

When visualizing the similarity matrix, two repeated segments will in theory form an observable line along the diagonal direction of the 2D image, while dissimilar pairs have very low values. In practice, however, it is not so simple. If one chorus section almost duplicates a previous one (e.g. same lyrics, accompaniment, etc.), the line is noticeable in the image. But if the chorus section has a few changes compared to the previous one, such as small modifications of the lyrics, accompaniment or tone, its similarity values will not be significantly higher than those of non-repeated segments, and the line will not be observable (this is the case in Fig. 2a).

We first summarize the steps of the chorus detection algorithm in Table I; each step is then explained. In order to detect the possible lines in the similarity-matrix image, we study the distribution of the similarity values and find that it is nearly Gaussian. So we normalize the similarity values by their mean and variance, and only the top-N values are kept for further processing (see Table I, step 1). The top-N values are set to one (white points in Fig. 2b) and the others to zero (black points in Fig. 2b). This produces a black-and-white image (see Fig. 2b) in which the diagonal lines are more obvious than in the original (see Fig. 2a).
The second step (see Table I, step 2) is to find all possible lines along the diagonals. Each diagonal is a zero-one sequence in which a one means the corresponding segments are similar and a zero means they are dissimilar. A run of continuous ones forms a possible line, and each line corresponds to two similar song sections (with beginning and ending timestamps). In practice, we find that too many short lines are extracted. To reduce their effect, we add a few constraints on the minimal length of each section and on how much overlap is allowed between two similar sections. The candidate lines found in this way are marked in green in Fig. 2b, with their beginnings and endings marked by red X. From these candidate lines, the chorus sections are finally selected (see Table I, steps 3 & 4). When we analyse the candidate lines, some of their corresponding sections overlap.
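Steps 1 and 2 of the algorithm can be sketched as follows. The helper names and the concrete defaults for the smoothing gap and minimal length (max_gap, min_len) are our assumptions standing in for the thresholds in Table I.

import numpy as np

def similarity_matrix(V):
    # V: chunk features, shape (T, D); similarity = negative Euclidean distance
    diff = V[:, None, :] - V[None, :, :]
    return -np.sqrt((diff ** 2).sum(axis=-1))

def find_repeat_lines(S, top_ratio=0.02, max_gap=2, min_len=10):
    Z = (S - S.mean()) / (S.std() + 1e-9)                    # step 1: normalize by mean and variance
    B = (Z >= np.quantile(Z, 1.0 - top_ratio)).astype(int)   # keep roughly the top 2% of points
    lines = []
    for d in range(1, S.shape[0]):                           # step 2: scan the lower-triangle diagonals
        idx = np.flatnonzero(np.diagonal(B, offset=-d))
        if idx.size == 0:
            continue
        start = prev = idx[0]
        for i in idx[1:]:
            if i - prev <= max_gap:                          # 2a: bridge small gaps along the diagonal
                prev = i
                continue
            if prev - start + 1 >= min_len:                  # 2b: drop too-short lines
                lines.append((d, start, prev))               # (lag d, begin, end) in chunk units
            start = prev = i
        if prev - start + 1 >= min_len:
            lines.append((d, start, prev))
    return B, lines

A line (d, b, e) states that chunks [b, e] repeat at chunks [b + d, e + d], which is how the section pairs with their timestamps are recovered.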

So a heuristic method is developed to merge the overlapped lines. For example, two lines are merged if the overlap ratio of their corresponding sections exceeds a threshold (e.g. 80%). After this step, the number of lines is significantly reduced. From these lines, we obtain the corresponding sections. For each section, we have a value indicating its length and a value indicating the number of its repetitions in the song. These are the indicators of whether the section is a chorus: a chorus should repeat more than two times, and its length should exceed a threshold such as 10 seconds. According to these indicators, the sections with the maximal repetition count whose lengths satisfy the constraints are chosen as the chorus. In Fig. 2c, the green lines mark the detected chorus sections. These chorus sections are used to summarize the song.

TABLE I
CHORUS DETECTION ALGORITHM

Input: similarity matrix S (see Fig. 2a); N: the number of audio chunks; S(i,j): the (i,j)-th entry of S.
Output: chorus sections.
1. Find the significant points that most likely lie in similar segments and binarize:
   B(i,j) = 1 if (S(i,j) - mu)/sigma > theta1, else 0,
   where mu is the mean of the similarity scores and sigma their standard deviation. theta1 is set to keep the top 2% of points. Only the 1-valued points are considered in the next steps (see Fig. 2b).
2. Find all lines along the diagonals (see Fig. 2b): a) if the gap between two neighbouring points along a diagonal is less than a smoothing threshold (theta2), merge them with the previous points; b) if the line length is less than a threshold (theta3), discard it.
3. Merge overlapped lines among the candidates and generate new lines. Keep only the top-n (=30) longest lines for the next step.
4. Get all segments from the above lines and count the repetition number of each segment. Select the longest segments with repetition numbers more than 2 as chorus sections (each section is marked with its starting and ending timestamps) (see Fig. 2c).

Fig. 2. Visualizing chorus detection: from similarity matrix to chorus. (a) Similarity matrix (only the lower triangle of the matrix). (b) Significant points (white dots) and detected lines (green), with red X marking starts and ends. (c) Detected chorus sections (green lines).

V. EXPERIMENTAL RESULTS

The evaluated music dataset contains 247 popular songs with a large diversity of styles, provided by our industry collaborator. For each song, the boundaries of all chorus sections are manually tagged. On average there are ~2.6 choruses per song.

A. Evaluation Metrics

The performance is reported using multiple metrics that reflect different aspects of the detection system. a) n-second starting time accuracy (M1): if the absolute difference between the starting time of any detected chorus section and the ground-truth starting time is less than n seconds, the chorus of the song is considered correct. b) n-percent overlap accuracy (M2): if the overlap between any detected chorus section and the ground-truth chorus section is more than n percent of the length of the ground truth, the song chorus is considered correct. c) n-second starting-time-based precision, recall and F1 (M3): this is based on the n-second starting time accuracy, from which we know whether each detected section is a true positive or a false positive.
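The three metrics can be computed as in the following sketch, where detected and truth are lists of (start, end) pairs in seconds; the per-song aggregation into dataset-level accuracies is omitted for brevity.

def m1_hit(detected, truth, n=1.0):
    # M1: some detected chorus starts within n seconds of a true chorus start
    return any(abs(d[0] - g[0]) <= n for d in detected for g in truth)

def m2_hit(detected, truth, ratio=0.8):
    # M2: some detected section covers at least `ratio` of a true section's length
    for gs, ge in truth:
        for ds, de in detected:
            if max(0.0, min(de, ge) - max(ds, gs)) >= ratio * (ge - gs):
                return True
    return False

def m3_prf(detected, truth, n=1.0):
    # M3: section-level precision/recall/F1 from n-second start-time matches
    tp = sum(any(abs(d[0] - g[0]) <= n for g in truth) for d in detected)
    prec = tp / len(detected) if detected else 0.0
    rec = sum(any(abs(d[0] - g[0]) <= n for d in detected) for g in truth) / len(truth)
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return prec, rec, f1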

This gives us a complete picture of the detected sections.

B. Experimental Setup

Each song is 16-bit, 8 kHz sample rate, mono. The 1024-point FFT (128 ms frame length) spectrum is extracted with 112 ms overlap between consecutive frames. We only consider FFT bands in the range between 32.7 Hz (C1) and 4000 Hz (B7), covering 7 octaves. The OdPlsa-based feature is extracted as in Section III (K=8) with a 20-frame (0.32 s) audio chunk without any overlap. The benchmark system is based on the 12-dimensional chroma feature implemented as in [9]. In addition, the effect of the first-order difference feature, delta_t = (v_{t+1} - v_{t-1}) / 2, is also studied, considering its success in speech recognition [16]. We segment the sequence into 0.64 s sections with 0.32 s overlap and calculate the pair-wise segment similarity scores (the similarity function is the negative Euclidean distance). Then the chorus detection algorithm (see Table I) is run to find all chorus sections.

C. Comparison between OdPlsa and Chroma

The results are reported in Fig. 3 in terms of 1-second starting time accuracy (M1), 80% overlap accuracy (M2), and 1-second starting time F1 (M3). The relative gains over the chroma-based benchmark system reach up to 18.4% (M1), 34.4% (M2) and 39.8% (M3). The OdPlsa-based feature significantly outperforms the popular chroma feature. The improvement comes because the OdPlsa feature characterizes the pattern distribution in both the octave and temporal dimensions, as well as the distribution of each frequency band at the song level, whereas chroma only describes the distribution along the frequency bands. Our results demonstrate that OdPlsa is a powerful tool for audio content representation.

Fig. 3. Performance comparison between OdPlsa and the chroma benchmark on M1, M2 and M3.

D. Effect of the n-order Difference Feature

In speech recognition, n-order difference features improve accuracy. Here we study the effect of the first-order difference feature on the performance of chorus detection with the OdPlsa feature. The comparison is shown in Fig. 4. With the addition of the first-order feature, the relative improvements are 13.3% (M1), 9.7% (M2) and 5.3% (M3). So the first-order OdPlsa feature is a good addition to the chorus detection system. We do not observe benefits from higher-order features.

Fig. 4. Effect of the first-order feature on performance with OdPlsa (without vs. with the first-order feature) on M1, M2 and M3.
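A minimal sketch of the first-order difference feature studied above; padding by edge repetition is our assumption, since the boundary handling is not stated.

import numpy as np

def add_delta(V):
    # V: feature sequence, shape (T, D); delta_t = (v_{t+1} - v_{t-1}) / 2
    Vpad = np.pad(V, ((1, 1), (0, 0)), mode='edge')   # repeat the first and last frames
    delta = (Vpad[2:] - Vpad[:-2]) / 2.0
    return np.concatenate([V, delta], axis=1)         # static + delta, shape (T, 2D)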
E. Effect of Various Implementations

In Section III-B, we discussed different implementations of the audio segment representation. In the above experiments, the OdPlsa feature is calculated according to Eq. (11): the octave effect is considered, and the feature dimension is the product of the latent cluster number and the octave size. The OdPlsa feature can also be computed according to Eq. (12), i.e. ignoring the octave effect. Table II shows the performance of the two implementations, separately for the systems with and without the first-order feature. In the table, OdPlsa_OctaveInd refers to the feature computed according to Eq. (12). The comparison shows that the octave-dependent OdPlsa feature is much better than the octave-independent one. The first-order feature also improves the octave-independent system, consistent with the conclusion of the previous section.

TABLE II
PERFORMANCE EFFECT OF DIFFERENT IMPLEMENTATIONS OF THE ODPLSA FEATURE (ODPLSA_OCTAVEIND VS. ODPLSA)

                            M1      M2      M3
Without delta:
  OdPlsa_OctaveInd           -       -       -
  OdPlsa                     -       -       -
With delta:
  OdPlsa_OctaveInd           -       -       -
  OdPlsa                     -       -       -

VI. CONCLUSION

In this paper, a chorus-detection-based music summarization system is presented, built on the novel octave-dependent PLSA feature extraction algorithm and a unique chorus detection method. The performance of music summarization is reported on a popular-song database with ground-truth chorus sections. To the best of our knowledge, this is the first time the performance of chorus detection has been reported on a few hundred songs. Our experiments show that the proposed technique and system are superior to the widely used chroma-feature-based system.

ACKNOWLEDGEMENTS

We thank Lei Jia and Hui Song from Baidu Inc. for their support of the project and for kindly providing the music dataset.

REFERENCES
[1] T. Hofmann, "Probabilistic latent semantic indexing," Proc. of ACM SIGIR '99.
[2] M. Cooper and J. Foote, "Automatic music summarization via similarity analysis," Proc. of ISMIR '02.
[3] R. J. Weiss and J. P. Bello, "Unsupervised discovery of temporal structure in music," IEEE Journal of Selected Topics in Signal Processing, 5(6), Oct. 2011.
[4] D. Turnbull, G. Lanckriet, E. Pampalk and M. Goto, "A supervised approach for detecting boundaries in music using difference features and boosting," Proc. of ISMIR '07.
[5] J. Serra, M. Muller, P. Grosche and J. Ll. Arcos, "Unsupervised detection of music boundaries by time series structure features," Proc. of AAAI '12.
[6] J. V. Balen, J. A. Burgoyne, F. Wiering and R. C. Veltkamp, "An analysis of chorus features in popular song," Proc. of ISMIR '13.
[7] C.-H. Yeh, Y.-D. Lin, M.-S. Lee and W.-Y. Tseng, "Popular music analysis: chorus and emotion detection," Proc. of APSIPA '10.
[8] A. Eronen, "Chorus detection with combined use of MFCC and chroma features and image processing filters," Proc. of DAFx '07.
[9] M. Goto, "A chorus section detection method for musical audio signals and its application to a music listening station," IEEE Trans. on Audio, Speech, and Language Processing, Vol. 14, No. 5, Sept. 2006.
[10] P. Smaragdis, M. Shashanka and B. Raj, "Topic models for audio mixture analysis," Proc. of NIPS Workshop on Applications for Topic Models: Text and Beyond, 2009.
[11] C. Burges, D. Plastina, J. Platt, E. Renshaw and H. S. Malvar, "Using audio fingerprinting for duplicate detection and audio thumbnails," Proc. of ICASSP '05.
[12] B. McFee and D. P. W. Ellis, "Learning to segment songs with ordinal linear discriminant analysis," Proc. of ICASSP '14.
[13] M. Goto, "SmartMusicKIOSK: music listening station with chorus-search function," Proc. of ACM Symposium on User Interface Software and Technology (UIST), 2003.
[14] M. Muller, P. Grosche and N.-Z. Jiang, "A segment-based fitness measure for capturing repetitive structures of music recordings," Proc. of ISMIR '11.
[15] M. Schedl, E. Gómez and J. Urbano, "Music information retrieval: recent developments and applications," Foundations and Trends in Information Retrieval, Vol. 8, Issue 2-3, 2014.
[16] L. R. Rabiner and B.-H. Juang, Fundamentals of Speech Recognition, Prentice Hall, 1993.
[17] A. Bosch, A. Zisserman and X. Munoz, "Scene classification via pLSA," Proc. of ECCV '06.
[18] M. A. Bartsch and G. H. Wakefield, "To catch a chorus: using chroma-based representations for audio thumbnailing," Proc. of IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2001.
[19] J. Paulus, M. Muller and A. Klapuri, "Audio-based music structure analysis," Proc. of ISMIR '10.
[20] V. Mildner, P. Klenner and K.-D. Kammeyer, "Chorus detection in songs of pop music," Proc. of ESSV '03.
[21] N. C. Maddage, C.-S. Xu, M. S. Kankanhalli and X. Shao, "Content-based music structure analysis with applications to music semantics understanding," Proc. of ACM MM '04.
[22] P. Golik, B. Harb, A. Misra, M. Riley, A. Rudnick and E. Weinstein, "Mobile music modeling, analysis and recognition," Proc. of ICASSP '12.
[23] J. Arenas-Garcia, A. Meng, K. B. Petersen, T. Lehn-Schioler, L. K. Hansen and J. Larsen, "Unveiling music structure via PLSA similarity fusion," Proc. of IEEE Workshop on Machine Learning for Signal Processing, 2007.
[24] S. Ravuri and D. P. W. Ellis, "Cover song detection: from high scores to general classification," Proc. of ICASSP '10.
