A Music Retrieval System Using Melody and Lyric


2012 IEEE International Conference on Multimedia and Expo Workshops

Zhiyuan Guo, Qiang Wang, Gang Liu, Jun Guo, Yueming Lu
Pattern Recognition and Intelligent System Laboratory
Key Laboratory of Trustworthy Distributed Computing and Service, Ministry of Education
Beijing University of Posts and Telecommunications, Beijing, China
guozhiyuan.cathie@gmail.com

Abstract

Using melody and/or lyric to query a music retrieval system is convenient for users but challenging for developers. This paper proposes efficient schemes for realizing the key algorithms of such a system. Specifically, we characterize our system, which adds lyric information to the query, as follows: a Support Vector Machine (SVM) is employed to distinguish humming queries from singing queries; for a singing query, the lyrics of candidates pre-selected by a commonly used melody matching method are used to dynamically build the recognition network; and a novel fusion strategy based on classification confidence is proposed to combine the lyric and melody scores. Experimental results show that error reduction rates of 22.9%, 25.0%, 28.7% and 33.5% in mean reciprocal rank (MRR) are achieved by the proposed method for four existing query-by-singing/humming (QBSH) systems.

Keywords: QBSH; SVM; isolated-word recognition; music retrieval

I. INTRODUCTION

Query-by-singing/humming (QBSH) systems, which help users find a wanted song from a sung or hummed query, provide an intuitive and practical way to retrieve music. In the past decades, substantial research has been devoted to QBSH systems [1-10], and various effective matching methods have been proposed, such as dynamic time warping (DTW) [2], linear scaling (LS) [3], recursive alignment (RA) [4], and earth mover's distance (EMD) [5]. However, most QBSH systems use only melody features and ignore lyric information. Moreover, many researchers believe that singing queries are more difficult to handle than humming queries, because the speech in singing audio reduces the accuracy of melody feature extraction, which is closely tied to the quality of a QBSH system. To deal with this problem, Haus et al. [6] applied signal processing techniques to singing queries to extract melody features more accurately. In fact, lyric information is very useful for song identification: Guo et al. [11] developed a music retrieval system using spoken lyric queries. Since most users are non-professional singers, the input singing/humming queries are likely to contain errors and biases, and a QBSH system based only on melody may then fail to retrieve the song. Lyric is therefore natural complementary information for song identification when the input query is sung.

Using both melody and lyric in a QBSH system is intuitive but challenging. Firstly, because only singing queries contain lyric information, there is a risk of extracting false lyrics that do not actually exist in humming queries, which leads to a serious deterioration of performance. Secondly, it is difficult to extract lyric features from singing queries, because the speech is deformed. Few studies have been devoted to QBSH systems based on both melody and lyric. Suzuki et al. [7] proposed a QBSH method using both lyric and melody information, but it cannot handle humming queries, which contain no lyric information.
To solve this issue, Wang et al. [8] used a singing/humming discriminator (SHD) to distinguish humming queries from singing queries. They first converted the query into a phone string using a phone-level continuous speech recognizer, and then counted the number of distinct phones in the string. Since a singing query usually contains more distinct phones than a humming query, the input query could be classified as humming or singing. A singing query was then converted into a syllable string, and each candidate obtained a lyric score from a syllable-level recognizer. The method provided a slight improvement, but the processing time was greatly increased because two recognition passes were added. Moreover, since the classification accuracy depended heavily on the phone recognition results, which unfortunately were not accurate enough, the improvement in retrieval accuracy was insignificant.

This paper proposes a novel QBSH method using both melody and lyric information. Different from Wang's method, we use a well-trained SVM to identify singing queries, and a dynamically constructed isolated-word recognizer to recognize the lyrics of a singing query. Moreover, a robust fusion method based on classification confidence is used to combine the lyric and melody scores. Experimental results show that our classifier significantly outperforms the classifier proposed by Wang, and that error reduction rates of 22.9%, 25.0%, 28.7% and 33.5% in mean reciprocal rank (MRR) are achieved by the proposed method for four existing QBSH systems.

The remainder of this paper is organized as follows: Section II gives an overview of the proposed QBSH system. The proposed method is introduced in Section III. Section IV presents the experimental results, and Section V concludes the paper.

II. OVERVIEW OF THE PROPOSED QBSH SYSTEM

Fig. 1 gives an overview of the proposed QBSH system, which proceeds as follows:

Step 1: A melody matching method is used to sort the music in the database, and the clips with the top K highest melody scores are selected as candidates. Four different methods were used in our experiments, viz. DTW [2], LS [3], RA [4] and EMD [5], all of which have proved effective in QBSH systems.

Step 2: A well-trained SVM classifies the input query as humming or singing.

Step 3: If the query is classified as humming, the candidates ranked by melody score are returned to the user.

Step 4: If the query is classified as singing, a dynamically constructed isolated-word recognizer assigns lyric scores to all candidates, and the lyric and melody scores are fused. The results ranked by the combined scores are returned to the user.

Figure 1. The framework of the proposed QBSH system.

III. THE PROPOSED QBSH METHOD

A. Melody retrieval

Melody retrieval aims at finding the candidate clips that are most similar to the query with respect to melody. Many melody matching methods can achieve this goal. Most of them, such as LS, DTW, and RA, calculate the distance between two pitch sequences; others, such as EMD, calculate the distance between two note sequences. A clip with a smaller distance to the query is considered more similar to it and obtains a higher melody score. Let Q and P represent the query and a clip in the database respectively; both are assumed to have been converted into pitch or note sequences according to the adopted melody matching method. Let D(Q, P) denote the distance between the query Q and the clip P. The melody score MS(P) of P is calculated by (1), and the clips with the top K largest melody scores are selected as candidates.

MS(P) = 1 / D(Q, P)    (1)

We now give a brief description of the four commonly used melody matching methods.

1) DTW: Dynamic time warping (DTW) [2] is a pitch-based matching method. The distance between two pitch sequences S1 = (p_1, p_2, ..., p_n) and S2 = (q_1, q_2, ..., q_m) can be calculated iteratively by (2). Here, n and m are the lengths of S1 and S2 respectively, p_i is the i-th pitch of S1, and q_j is the j-th pitch of S2. d(i, j) is the cost associated with p_i and q_j, defined as d(i, j) = |p_i - q_j - b|, where |.| is the absolute value operation and b is a constant. D(i, j) represents the minimum distance from the start point to the lattice point (i, j); thus D(n, m) is the distance between S1 and S2.

D(i, j) = d(i, j) + min{ D(i-2, j-1), D(i-1, j-1), D(i-1, j-2) }    (2)
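To make the recurrence concrete, here is a minimal Python sketch of (2); the toy pitch contours and the default transposition constant b are illustrative assumptions, not the authors' exact implementation.

```python
import numpy as np

def dtw_distance(s1, s2, b=0.0):
    """Melody DTW per Eq. (2): D(i,j) = d(i,j) + min over the three
    local paths (i-2,j-1), (i-1,j-1), (i-1,j-2)."""
    n, m = len(s1), len(s2)
    INF = float("inf")
    # D is padded by 2 on each axis so the (i-2, .) and (., j-2) lookups stay in range.
    D = np.full((n + 2, m + 2), INF)
    D[1, 1] = 0.0  # virtual start point before the first lattice cell
    for i in range(2, n + 2):
        for j in range(2, m + 2):
            d = abs(s1[i - 2] - s2[j - 2] - b)  # local cost d(i, j)
            D[i, j] = d + min(D[i - 2, j - 1], D[i - 1, j - 1], D[i - 1, j - 2])
    return D[n + 1, m + 1]

# Example: distance between a query pitch contour and a database clip.
query = [60.0, 60.5, 62.0, 64.0, 64.0]
clip = [60.0, 62.0, 62.0, 64.0, 65.0, 64.0]
print(dtw_distance(query, clip))
```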
2) LS: Linear scaling (LS) [3] is a simple but effective pitch-based melody matching method. Its main idea is to rescale the input audio, based on the observation that the length of the input audio is rarely equal to that of the corresponding part of the MIDI data. LS tries different rescaling factors to stretch or compress the pitch contour of the input audio so that it matches the correct part of the MIDI file more accurately; the most appropriate rescaling factor yields the minimum distance between the input audio and the music clip in the database.
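A minimal sketch of the linear-scaling idea follows; the set of rescaling factors and the mean-absolute-difference distance are assumptions for illustration (the paper does not specify them). Since the corpus queries are sung from the beginning of a song, the scaled query is compared against the prefix of the reference contour.

```python
import numpy as np

def ls_distance(query, reference, factors=(0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3)):
    """Linear scaling: stretch/compress the query pitch contour by several
    rescaling factors and keep the best match against the reference contour."""
    best = float("inf")
    for f in factors:
        length = int(round(len(query) * f))
        if length < 2 or length > len(reference):
            continue  # skip factors that over-stretch past the reference
        # Resample the query contour to the scaled length.
        scaled = np.interp(np.linspace(0, len(query) - 1, length),
                           np.arange(len(query)), query)
        # Compare against the first `length` frames of the reference.
        dist = np.mean(np.abs(scaled - reference[:length]))
        best = min(best, dist)
    return best

query = np.array([60, 60, 62, 64, 64, 62], dtype=float)
reference = np.array([60, 60, 61, 62, 63, 64, 64, 63, 62, 60], dtype=float)
print(ls_distance(query, reference))
```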

3) RA: Recursive alignment (RA) [4] is another pitch-based melody matching method. Since linear scaling cannot handle nonlinear alignment, RA solves this problem in a top-down fashion that is better at capturing long-distance information in human singing. The method differs from DTW in that it starts the optimization from a global view: RA uses LS as a subroutine and recursively tunes the local matching to optimize the alignment. Further details may be found in [4].

4) EMD: Earth mover's distance (EMD) [5] measures the minimal cost that must be paid to transform one distribution into another. Melody matching can be naturally cast as a transportation problem by defining one clip as the supplier and the other as the consumer. To obtain the EMD between the input query and a candidate clip, we convert each clip into a set of weighted notes. Let P = {(p_1, w_p1), (p_2, w_p2), ..., (p_n, w_pn)} be the note set of a candidate clip acting as the supplier, where p_i is a note occurring in the candidate and w_pi is the duration of p_i. Similarly, let Q = {(q_1, w_q1), (q_2, w_q2), ..., (q_m, w_qm)} represent the query acting as the demander. The EMD, which here represents the melody distance between the two clips, can be computed quickly by many algorithms.

B. SVM-based Singing/Humming Classification

The SVM [12] has attracted many researchers due to its excellent performance on a wide range of classification problems; it has been reported that SVMs can match or exceed other classifiers while requiring significantly less training data. In our system, an SVM is used to distinguish humming clips from singing clips. The SVM is trained on 30 humming clips and 30 singing clips. All the training data and input audio were segmented into 0.25-second frames with 50% overlap, and 32-dimensional features were extracted for each frame.

1) Features: The 32-dimensional feature vector of each frame consists of one-dimensional zero crossing rate, one-dimensional spectral energy, one-dimensional spectral centroid, one-dimensional spectral bandwidth, eight-dimensional spectral band energy, eight-dimensional sub-band spectral flux, and twelve-dimensional mel-frequency cepstral coefficients (MFCC).

2) Singing/Humming classification using SVM: In the classification process, we first segment the input audio into frames, and then classify each frame as singing or humming using the trained SVM. To mitigate the impact of inevitable classification errors, a median filter is used to smooth the classification result contour. Fig. 2 shows the results for the first 90 frames of a query before and after smoothing; the filter window width is set to 3 in this example, and two jitters are removed.

Figure 2. An example of smoothing results. On the vertical axis, 1 represents humming and -1 represents singing. The left panel shows the initial classification results of the first 90 frames of the input query, and the right panel shows the smoothed results.

Let N_s denote the number of singing frames of an input query, counted from the smoothed SVM classification results, and N_h the number of humming frames. N_s/(N_s + N_h) is the proportion of singing frames in the input query; a larger value indicates a higher probability that the input query is sung. It is important to note that different misclassifications have different costs: misclassifying a humming query as singing costs more than misclassifying a singing query as humming, because in the former case lyric information that does not actually exist will be extracted, leading to deterioration, whereas in the latter case a singing query can still find its corresponding song from melody information alone. We should therefore favor the classification accuracy of the singing category. A threshold T_s (0.5 <= T_s < 1) is used for this purpose: the input query is classified as a singing query when N_s/(N_s + N_h) is greater than or equal to T_s. A larger T_s leads to a higher classification accuracy for singing, i.e., the singing decisions become more reliable.
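A minimal sketch of the frame-voting decision described above, assuming the per-frame SVM outputs are already available (+1 for humming, -1 for singing, matching Fig. 2) and using a width-3 median filter:

```python
import numpy as np
from scipy.signal import medfilt

def classify_query(frame_labels, t_s=0.55):
    """Frame-level singing/humming decisions -> query-level decision.
    frame_labels: per-frame SVM outputs, +1 (humming) / -1 (singing)."""
    smoothed = medfilt(frame_labels, kernel_size=3)  # remove isolated jitters
    n_s = int(np.sum(smoothed == -1))  # singing frames N_s
    n_h = int(np.sum(smoothed == 1))   # humming frames N_h
    confidence = n_s / (n_s + n_h)     # proportion of singing frames
    label = "singing" if confidence >= t_s else "humming"
    return label, confidence

# A toy query: mostly singing frames with two humming jitters.
frames = np.array([-1, -1, 1, -1, -1, -1, 1, -1, -1, -1])
print(classify_query(frames))  # -> ('singing', ...)
```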
C. Lyric Recognition

If the query clip is classified as singing, a lyric recognizer is used to assign a lyric score to each candidate clip selected by the melody matching method. Since the melody matching method has located each candidate clip within its song, the corresponding lyrics are easy to obtain. Using each lyric as a word, an isolated-word recognition network can be constructed straightforwardly. Fig. 3 shows the structure of the recognition network, which has K paths representing the K candidate lyrics; K is usually between 20 and 100. The isolated-word recognizer uses continuous density hidden Markov models with cross-word, context-dependent, tied-state tri-phones, and 39-dimensional MFCC features are extracted from each frame for recognition. When the recognition process finishes, each word obtains a posterior probability, and the lyric score of a candidate is the posterior probability of its corresponding lyric. An isolated-word recognizer performs better than a continuous speech recognizer in this system, since the lyric of the input singing query is one of the K candidate lyrics. Due to the simplicity of the recognition network, lyric recognition is fast and accurate.

Figure 3. The lyric recognition network.
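As a rough illustration of the scoring step, the sketch below converts per-lyric acoustic log-likelihoods into posterior probabilities under a uniform prior over the K candidates. The actual system obtains these values from the HMM-based recognizer, so the numbers and the softmax-style normalization here are assumptions, not the recognizer's internals.

```python
import numpy as np

def lyric_scores(log_likelihoods):
    """Normalize per-candidate acoustic log-likelihoods into posterior
    probabilities over the K candidate lyrics (uniform prior assumed)."""
    ll = np.asarray(log_likelihoods, dtype=float)
    ll -= ll.max()                   # stabilize the exponentials
    posteriors = np.exp(ll)
    return posteriors / posteriors.sum()

# Hypothetical log-likelihoods for K = 4 candidate lyrics.
scores = lyric_scores([-1050.0, -1042.5, -1060.3, -1048.1])
print(scores)  # the lyric score LS(c_j) of each candidate
```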

D. Combination of melody and lyric scores

A score-level fusion strategy is proposed to combine the lyric and melody scores of the candidate clips of a singing query. Various rules have been proposed for score-level fusion [9], such as MIN, MAX, SUM, PRODUCT, and Weighted SUM: MIN selects the minimum of the scores, MAX selects the maximum, PRODUCT multiplies them, SUM adds them, and Weighted SUM adds them with weights. The proposed QBSH system uses the Weighted SUM rule, which we have verified achieves the best performance among these rules. The final score of a candidate clip is calculated as:

CS(c_j) = p * MS(c_j) + (1 - p) * LS(c_j)    (3)

where c_j is the j-th candidate, MS(c_j) is the melody score of c_j, LS(c_j) is its lyric score (as mentioned in Section III.C, the lyric score of a candidate is the posterior probability of its corresponding lyric), CS(c_j) is the fused score, and p is a weight coefficient determined empirically.

Furthermore, the QBSH system deteriorates when a humming query is wrongly classified as a singing query. We therefore weight the lyric score by the classification confidence of singing, giving the improved score-level fusion:

CS(c_j) = p * MS(c_j) + (1 - p) * (N_s / (N_s + N_h)) * LS(c_j)    (4)

where N_s is the number of frames of the query classified as singing, N_h is the number classified as humming, and N_s/(N_s + N_h) is thus the confidence that the input query is sung. The improved fusion method is more robust against classification errors.
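A minimal sketch of the confidence-weighted fusion in (4); the weight p = 0.7 is a placeholder for the empirically determined coefficient, and the score values are made up.

```python
import numpy as np

def fused_scores(melody_scores, lyric_scores, n_s, n_h, p=0.7):
    """Confidence-weighted score fusion per Eq. (4).
    melody_scores, lyric_scores: arrays over the K candidates.
    n_s, n_h: singing/humming frame counts from the SVM stage."""
    confidence = n_s / (n_s + n_h)  # confidence that the query is sung
    ms = np.asarray(melody_scores, dtype=float)
    ls = np.asarray(lyric_scores, dtype=float)
    return p * ms + (1 - p) * confidence * ls

melody = [0.9, 0.7, 0.5]   # MS(c_j) from melody matching
lyric = [0.1, 0.6, 0.3]    # LS(c_j) from the lyric recognizer
combined = fused_scores(melody, lyric, n_s=85, n_h=5)
ranking = np.argsort(combined)[::-1]  # candidates ranked by fused score
print(combined, ranking)
```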
IV. EXPERIMENTS

A. Experimental Data and Setup

The MIREX (Music Information Retrieval Evaluation Exchange) QBSH corpus released by Jang [13] is used to evaluate the proposed method. The corpus includes 48 MIDI files and 4431 singing or humming queries, all recorded from the beginning of a song. We add 1000 MIDI files to the MIREX corpus to form the MIDI database, and the lyrics database consists of the lyrics of all songs in the MIDI database. Since our lyric recognizer is for Mandarin, we selected the 878 queries in the corpus belonging to Chinese songs. 60 queries, comprising 30 humming clips and 30 singing clips, were randomly selected to train the SVM; the remaining 818 clips, comprising 417 singing queries and 401 humming queries, form the test set. The acoustic model (AM) of the lyric recognizer was trained on the Chinese speech recognition corpus of the 863 Program [14], a database provided by the Chinese National High Technology Project 863 for Chinese LVCSR system development; all the audio in that corpus is normal speech. All the experiments were conducted on a PC in C++.

B. Evaluation Metrics

The evaluation measures are the top-M hit rate and the mean reciprocal rank (MRR). Let r_i denote the rank of the correct song for the i-th query; the top-M hit rate is the proportion of queries for which r_i <= M. The MRR is the average of the reciprocal ranks across all queries, calculated as (5), where n is the number of queries:

MRR = (1/n) * sum_{i=1}^{n} (1 / r_i)    (5)

C. Singing/humming classification results using SVM

Table I shows the singing/humming classification accuracies of Wang's method and of the proposed SVM-based method for different values of the threshold T_s described in Section III.B. The second column of the table gives the results of Wang's method [8]. Here, the classification accuracy of singing (humming) is defined as the proportion of correctly classified clips among the clips classified as singing (humming). An overall classification accuracy of 89.27% is obtained when T_s is set to 0.5. The proposed SVM-based classifier significantly outperforms Wang's method, not only in overall classification accuracy but also in the accuracy of the singing category, which is the more important one in a QBSH system according to the analysis in Section III.B. Moreover, our method can easily control the classification accuracy of the singing category by choosing T_s: the accuracy of the singing category increases as T_s increases.

D. Performance of different lyric recognition methods

All 417 singing clips are used to compare our recognition method (with different values of K) against Wang's method. Fig. 4 shows the experimental results. SCSR is short for syllable-based continuous speech recognition, the method used by Wang et al. [8]; the other four curves correspond to our lyric recognition method, where K is the number of candidates selected by RA [4]. In these experiments, only the lyric scores are used to rank the candidates.
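The rankings produced in these experiments are scored with the two measures from Section IV.B; a minimal sketch of their computation follows (the example ranks are made up):

```python
def top_m_hit_rate(ranks, m):
    """Proportion of queries whose correct song is ranked within the top M."""
    return sum(1 for r in ranks if r <= m) / len(ranks)

def mrr(ranks):
    """Mean reciprocal rank per Eq. (5)."""
    return sum(1.0 / r for r in ranks) / len(ranks)

ranks = [1, 3, 1, 2, 10, 1]  # rank of the correct song for each test query
print(top_m_hit_rate(ranks, m=1), mrr(ranks))
```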

TABLE I. THE HUMMING AND SINGING CLASSIFICATION ACCURACIES.

Categories   SHD [8]   T_s=0.50   T_s=0.55   T_s=0.60   T_s=0.65
Humming      54.80%    70.36%     68.6%      65.54%     63.05%
Singing      95.43%    96.57%     97.2%      97.30%     97.64%

Figure 4. The retrieval performances using only the lyric scores derived by different recognition methods. The vertical axis shows the hit rate; the horizontal axis shows the top-T candidates and MRR.

Figure 5. The performance of RA, DTW and their combinations with lyric recognition.

As can be seen, the proposed isolated-word lyric recognizer is much more effective than the continuous speech recognizer used in Wang's method [8], and its recognition speed is approximately 3 times faster than SCSR. It should be noted that the top-20 hit rate decreases when K increases from 10 to 20: a smaller K means fewer competing paths in the network, which helps recognition, but if K is too small the right lyric may not be included among the K candidates, which necessarily leads to recognition errors.

E. Retrieval accuracy of the proposed QBSH systems

The melody retrieval part of our system can adopt any existing melody matching method. We realize four systems, namely Melody&Lyric RA, Melody&Lyric DTW, Melody&Lyric LS and Melody&Lyric EMD, using RA, DTW, LS and EMD respectively as the melody matching method. Fig. 5 and Fig. 6 show the performance of these four systems (K=50 and T_s=0.55 for all four); the axes have the same meaning as in Fig. 4. As can be seen, Melody&Lyric RA, which uses both lyric and melody information, performs better than Melody RA, which is based on melody information only, and the same conclusion holds for the other three systems. From Fig. 5 and Fig. 6 we can also see that among the melody-only methods RA performs best, DTW second, LS third and EMD worst; after adding lyric information, the corresponding four systems achieve error reduction rates of 22.9%, 25.0%, 28.7% and 33.5% respectively. This indicates that the worse the melody-only system performs, the greater the improvement from adding lyrics.

Figure 6. The performance of LS, EMD and their combinations with lyric recognition.

V. CONCLUSION

In this paper, we proposed a novel QBSH method that adds lyric information. An SVM classifier is used to identify singing queries, an isolated-word recognizer is used to recognize lyrics, and a confidence-based fusion method is proposed to combine the melody and lyric scores. Our experiments demonstrate that the method shows promising results on the test data. Our current lyric recognizer is designed for Mandarin and cannot handle English songs; we will try to develop a Mandarin-English bilingual recognizer in future work.

ACKNOWLEDGMENT

This work was partially supported by the project under Grant No. B08004, a key project of the Ministry of Science and Technology of China under Grant No. 2012ZX, the Innovation Fund of the Information and Communication Engineering School of BUPT in 2011, the Development Program (863) of China under Grant No. 2011AA01A205, and the Next-Generation Broadband

Wireless Mobile Communications Network Technology Key Project under Grant No. 2011ZX.

REFERENCES

[1] A. Ghias, J. Logan, D. Chamberlin, B.C. Smith, "Query by humming: Musical information retrieval in an audio database," Proc. ACM Multimedia, pp. 231-236, 1995.
[2] J.S.R. Jang, M.Y. Gao, "A query-by-singing system based on dynamic programming," Proc. International Workshop on Intelligent Systems Resolutions (the 8th Bellman Continuum), Hsinchu, Taiwan, Dec. 2000.
[3] J.S.R. Jang, H. Lee, M. Kao, "Content-based music retrieval using linear scaling and branch-and-bound tree search," Proc. ICME, 2001.
[4] X. Wu, M. Li, J. Liu, J. Yang, Y. Yan, "A top-down approach to melody match in pitch contour for query by humming," Proc. International Conference of Chinese Spoken Language Processing, 2006.
[5] S. Huang, L. Wang, S. Hu, H. Jiang, B. Xu, "Query by humming via multiscale transportation distance in random query occurrence context," Proc. ICME, 2008.
[6] G. Haus, E. Pollastri, "An audio front end for query-by-humming systems," Proc. ISMIR, 2001.
[7] M. Suzuki, T. Hosoya, A. Ito, S. Makino, "Music information retrieval from a singing voice using lyrics and melody information," EURASIP Journal on Advances in Signal Processing, vol. 2007, 2007.
[8] C.C. Wang, J.S.R. Jang, W. Wang, "An improved query by singing/humming system using melody and lyrics information," Proc. ISMIR, 2010.
[9] G.P. Nam, T.T.T. Luong, H.H. Nam, "Intelligent query by humming system based on score level fusion of multiple classifiers," EURASIP Journal on Advances in Signal Processing, vol. 2011, 2011.
[10] Q. Wang, Z. Guo, G. Liu, J. Guo, Y. Lu, "Query by humming by using locality sensitive hashing based on combination of pitch and note," Proc. International Conference on Multimedia & Expo Workshops (ICMEW), 2012.
[11] Z. Guo, Q. Wang, G. Liu, J. Guo, "A music retrieval system based on spoken lyric queries," International Journal of Advancements in Computing Technology, in press.
[12] B.E. Boser, I. Guyon, V. Vapnik, "A training algorithm for optimal margin classifiers," Proc. COLT, pp. 144-152, 1992.
[13] J.S.R. Jang, the MIREX QBSH corpus.
[14] Y. Qian, S. Lin, Y. Zhang, Y. Liu, H. Liu, Q. Liu, "An introduction to corpora resources of 863 program for Chinese language processing and human machine interaction," Proc. ALR2004, affiliated to IJCNLP, 2004.
