Automatic Extraction of Popular Music Ringtones Based on Music Structure Analysis
Fengyan Wu, Shutao Sun, Weiyao Xue

Abstract: Automatic extraction of popular music ringtones has become an important and useful area for the communications and telecommunications industry. Quick, batch extraction of ringtones increases convenience in practical applications. In this paper, we propose an automatic technique for extracting ringtones from popular music based on musical structure analysis. This is a meaningful attempt to apply the theory of musical structure analysis to a practical problem. Experiments show the feasibility of the process, and several sets of comparative experiments show that the thresholds used at various stages can have different effects. The experiments also revealed some problems that inspired us to optimize the processing. On our test database of 186 popular songs, the best boundary-detection accuracy with a tolerance of ±3 seconds reaches 79.9%. We also invited independent listeners to evaluate the results using a voting mechanism.

Keywords: SVM; boundary detection; random forest; automatic extraction

I. INTRODUCTION

It is well known that automatic music segmentation is significant in many fields. Segments correspond to structurally meaningful regions of a performance, such as a verse or chorus [1]. In recent years, many researchers have studied automatic segmentation techniques, and many have tried to extract the chorus of a song. However, ringtone extraction has not received enough attention. This paper proposes a framework to extract ringtones from music automatically by analyzing the structure of popular music. With the development of the communications business, mobile phones have become an indispensable part of people's lives, and ringtones and ring-music bring more fun and pleasure when people make calls.
However, in most cases, ringtone editing remains labor-intensive work: people need to listen to each song, set the starting and ending points for a clip within the audio file, and then cut the segment out. Manually checking each song and cropping specific parts with proper tools is highly time-consuming and wastes human resources, so quick, batch extraction of ringtones is urgently needed. In this paper, we propose a simple method for extracting music ringtones using musical structure analysis and random forest classification. Although ringtones are chosen according to individual preferences, we studied a large number of ringtones provided on major sites and found that most of them follow these rules:

1) Ringtones are segments of a song that are frequently repeated.
2) A ringtone is not strictly defined as the intro or the chorus of a song; people tend to choose the parts that are easy to remember.
3) A ringtone has strong melodic characteristics; the segments are popular and catchy.
4) Ringtones extracted from songs of the same genre or composer tend to have similar melody and structure.
5) Most ringtones last 45-60 s, and the starting point of a ringtone is taken to be the start of a sentence, not the middle of a phrase.

Based on the observations above, we propose a new framework to extract ringtones from music automatically by analyzing the structure of popular music. We try to find the starting and ending points of a ringtone, and we verify the feasibility of the method experimentally. The rest of the paper is organized as follows: Section 2 describes related work, Section 3 presents the method, Section 4 reports the evaluation, and Section 5 concludes and discusses perspectives.

II. RELATED WORK

The structure of popular music is usually composed of five parts: intro, verse, chorus, bridge, and outro.
An important part of music analysis is the detection of this structure [2]; in most cases, structure analysis is used to detect boundaries within the music. The chorus, an important part of a song that contains its memorable points, is the most likely candidate for a ringtone. Therefore, musical structure analysis and chorus extraction are core topics of ringtone extraction. The chorus-extraction problem has been addressed previously by Logan [3] and Chu [4], who applied Hidden Markov Models and clustering techniques to mel-frequency cepstral coefficients (MFCC) and built a set of spectral features that have been used with great success in speech processing [5]. In 2003, Chai and Vercoe [6-8] analyzed the hierarchical structure of music signals and proposed algorithms that use the results of structural analysis for music summarization and chorus extraction. Chen [9] built a music summarization system based on a structure-labeling system that spots the main theme segment. Regnier [10] showed that partial clustering is a promising approach for singing-voice detection and separation. However, as mentioned in Section 1, ringtone extraction has more flexible requirements than chorus extraction: we need to extract the ringtone according to listening habits, genre, or singer, not only extract the chorus. In this paper, we propose a framework to extract ringtones from music automatically using musical structure analysis and machine learning.

(ICIS 2016, June 26-29, 2016, Okayama, Japan. Copyright 2016 IEEE.)

III. SYSTEM DESIGN OF RINGTONE EXTRACTION

The ringtone-extraction system for popular music is shown in Fig. 1; its processing can be grouped into three steps.

Fig. 1. The system framework of popular music ringtone extraction.

A. System Description

There are three major steps in the extraction process: first, beat tracking and feature extraction; second, segment boundary detection and ringtone extraction, where boundary detection finds the points at which to cut a song into segments that may contain the ringtone; and third, smoothing, filtering, and choosing suitable segments to be ringtones. Fig. 1 shows the proposed system framework, with the three parts of the process numbered in the figure. Step one and segment boundary detection are introduced in detail in our earlier article [11]; this paper focuses on extracting the ringtone segments, i.e., the second and third steps.
Importantly, the system tries to find the starting point of a sung sentence rather than of an individual sung word, because a ringtone always starts at the beginning of a complete lyric. Therefore, both the training and testing samples for this experiment are divided into fragments based on the result of beat tracking, since we assume that a sentence does not begin in the middle of a beat. After beat tracking, we extract MFCC and Chroma features for each beat.

B. Feature Extraction and Segment Boundary Detection

The feature extraction and segment boundary detection processes are detailed in our earlier paper [11]. In our system, we use Simon Dixon's beat tracker BeatRoot [8] to extract the beat onsets from the songs. The extracted beats generally range from 450 to 500 ms in duration, and we use a beat as the unit of annotation. In our experiments, we used two kinds of features that are common in speech recognition: MFCC (Mel-Frequency Cepstral Coefficients) and Chroma, together with their first and second derivatives; both were introduced in our article [11]. The MFCC features have 36 dimensions and the Chroma features 33.

In this paper, we use an SVM for segment boundary detection, with a Radial Basis Function (RBF) kernel:

K(v1, v2) = exp(-||v1 - v2||^2)    (1)

where v1 and v2 are feature vectors extracted by the method in [11]. As is well known, the penalty parameter C has a significant impact on SVM results, so we perform cross-validation on each training set to find its optimal value. Segment boundary detection attaches a label, singing voice or music, to each beat. After this annotation, a simple filter is used to pick out novelty points.
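To make Eq. (1) concrete, here is a minimal sketch of the kernel evaluation in NumPy. The explicit width parameter gamma is our addition (the paper's formula leaves it implicit), and in practice the SVM itself would come from a library such as LIBSVM or scikit-learn, which is an assumption about tooling, not something the paper states:

```python
import numpy as np

def rbf_kernel(v1, v2, gamma=1.0):
    """RBF kernel of Eq. (1): exp(-gamma * ||v1 - v2||^2).

    gamma is made explicit here; the paper's formula omits it.
    """
    diff = np.asarray(v1, dtype=float) - np.asarray(v2, dtype=float)
    return float(np.exp(-gamma * diff.dot(diff)))
```

Identical feature vectors give a kernel value of exactly 1, and the value decays toward 0 as the beats become more dissimilar, which is why cross-validating C (and gamma) matters so much for the boundary labels.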
We regard a single beat, or two consecutive beats, whose labels differ from both the preceding and following beats as novelty peaks, since a beat lasting about 0.5 s may be too short to be noticed by listening; a single differing beat can be regarded as an erroneous boundary-detection result. By filtering out such novelty peaks, we obtain the set of segment boundary points of the song.
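The novelty-peak filter described above can be sketched as follows. This is a hypothetical helper, assuming binary beat labels (1 = singing voice, 0 = music): interior runs of at most two beats whose label differs from both neighbors are flipped to the surrounding label.

```python
def smooth_beat_labels(labels, min_run=2):
    """Flip interior runs of <= min_run beats to the surrounding label.

    Runs at the very start or end of the song are left untouched,
    since they have only one neighbor to compare against.
    """
    out = list(labels)
    i, n = 0, len(out)
    while i < n:
        j = i
        while j < n and out[j] == out[i]:
            j += 1
        # Interior run shorter than min_run, surrounded by one other label.
        if i > 0 and j < n and (j - i) <= min_run and out[i - 1] == out[j]:
            for k in range(i, j):
                out[k] = out[i - 1]
        i = j
    return out
```

For example, a lone "singing" beat inside a run of "music" beats is flipped back to "music", while a genuine three-beat singing run survives the filter.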
C. Ringtone Extraction and Random Forest Classification

Having obtained the boundary point set by the method above, the next step is to choose the segments most likely to be ringtones. Summarizing the rules in Section 1, two kinds of segments can be considered ringtones: one typical type is the chorus of a song; the other is an intro longer than 30 seconds. In our experiments, we use random forest classification to choose chorus segments, and screening with smooth filtering to choose qualified intros. Smooth filtering can also fix a suitable starting point for a ringtone.

A random forest builds an ensemble of trees in a randomized way, and most of its useful options depend on two data objects generated during training. When the training set for the current tree is drawn by sampling with replacement, about one-third of the cases are left out of the sample; this out-of-bag data is used to obtain a running unbiased estimate of the classification error as trees are added to the forest. After each tree is built, all of the data are run down the tree, and proximities are computed for each pair of cases: if two cases occupy the same terminal node, their proximity is increased by one. At the end of the run, the proximities are normalized by dividing by the number of trees. Proximities are used for replacing missing data, locating outliers, and producing illuminating low-dimensional views of the data.

We use the result of segment boundary detection as the input of the random forest algorithm. Boundary detection cuts the music into segments, and we extract the mean MFCC and Chroma features of each segment. We use clips of popular ringtones downloaded from the Internet to build a training set, and we test all segments of a test song to choose the valuable ones.
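The per-segment averaging step can be sketched as below, assuming a per-beat feature matrix and a list of boundary beat indices (the names are illustrative, not from the paper). The resulting mean vectors are what would be fed to a random forest classifier, e.g. scikit-learn's RandomForestClassifier:

```python
import numpy as np

def segment_mean_features(beat_features, boundaries):
    """Average the per-beat feature vectors inside each segment.

    beat_features: (num_beats, feature_dim) array of MFCC/Chroma per beat.
    boundaries: sorted beat indices; segment i spans
                [boundaries[i], boundaries[i + 1]).
    Returns a (num_segments, feature_dim) array of mean vectors.
    """
    feats = np.asarray(beat_features, dtype=float)
    means = [feats[s:e].mean(axis=0)
             for s, e in zip(boundaries[:-1], boundaries[1:])]
    return np.stack(means)
```

Averaging over a segment deliberately discards beat-level ordering: the classifier only needs a coarse timbre/harmony summary to decide whether the segment resembles known ringtone clips.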
The results of this experiment are presented in Section 4.

D. Smoothing and Correction

As mentioned in Section 1, the starting point of a ringtone should be the start of a sentence, not a point between two phrases. We scan the labels of the beats of the selected segment and take the first non-singing-voice label as the precise starting point of the segment; in this way we fix the starting point of the extracted ringtone. In some special cases the intro is a long melody and can itself be considered a ringtone. We judge this with the following rule: if the first segment of a song lasts more than 30 seconds and almost all of its beats are labeled non-singing voice, we consider it a ringtone. Smoothing and correction are indeed needed: a suitable starting point is important for the user experience, and extracting ringtone segments that better satisfy users also enhances that experience.

IV. EVALUATION AND RESULTS

All experiments are based on the TUT standard-annotation music collection, which contains 186 songs by the Beatles. We chose 140 of them to build the training set and the rest to test the results. The SVM training set contains 100 pieces of popular music with durations ranging from 20 to 30 s; half of them are singing voice mixed with musical instruments, and the others are music without singing voice. We regard mute pieces as non-singing voice: any mute section encountered while scanning the beat labels is recognized as non-singing voice. The random forest training set contains 140 corresponding ringtone clips of the Beatles downloaded from the Internet. The SVM testing set contains 50 pieces of popular music with durations ranging from 20 to 30 s, 25 of them singing voice and the other 25 non-singing voice; it also contains 20 songs chosen from the TUT collection with standard annotation. To select appropriate parameters, all experimental data were cross-validated.
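The intro rule from Section III-D can be sketched as a predicate. The "almost all non-singing" threshold below (at most 10% voiced beats) is our assumption, since the paper does not quantify it:

```python
def is_intro_ringtone(duration_s, beat_labels, min_duration=30.0,
                      max_voice_ratio=0.1):
    """True if a song's first segment qualifies as an intro ringtone.

    Paper's rule: longer than 30 s and almost all beats labeled
    non-singing voice (0). max_voice_ratio quantifies "almost all"
    and is an assumed value, not stated in the paper.
    """
    if not beat_labels:
        return False
    voiced = sum(beat_labels) / len(beat_labels)
    return duration_s > min_duration and voiced <= max_voice_ratio
```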
Since the suitability of an extracted ringtone depends on user preferences, accuracy cannot be reflected in a fully intuitive way. In the experiments of this paper, we ran several groups of comparative experiments and used a more realistic calculation method to reflect the accuracy of the extraction.

A. Experiment on Segment Boundary Detection

We tested 50 pieces of popular music (25 singing-voice pieces and 25 non-singing-voice pieces), each about 30 s long. We used the SVM to label each beat as 0 (non-singing voice) or 1 (singing voice), and then compared the labels with the standard annotation to calculate precision. This experiment shows the direct result of the classification.

TABLE I. THE AVERAGE PRECISION OF THE SEGMENT BOUNDARY DETECTION

Method                         | Without Filtering | Filtering
SVM classification with pieces | -                 | -
SVM classification with songs  | -                 | -

As Table I shows, SVM classification provides reasonable singing-voice boundary detection. However, the filtering threshold can strongly affect the accuracy of the segment boundary detection.

B. Ringtone Extraction from the Segment Boundary Set

As previously mentioned, the chorus is the most likely candidate for a ringtone. In this experiment, we treat chorus-extraction accuracy as ringtone-extraction accuracy, and all results are compared with the standard annotation. We chose two data sets as testing sets: one is the result set of SVM classification with the simple filtering mentioned above; the other is the standard annotation of the segment boundaries. The standard-annotation set reflects the efficiency of random forest classification more accurately. Because a ringtone is set according to personal preference, we consider a result correct when its coverage rate, compared with the ringtone downloaded from the Internet, is higher than 80%.
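The 80% coverage criterion can be sketched as follows, with interval endpoints in seconds. The paper does not state how the denominator is defined, so we assume coverage is measured against the reference clip's duration:

```python
def coverage_rate(predicted, reference):
    """Fraction of the reference ringtone interval covered by the
    predicted interval; both are (start, end) pairs in seconds."""
    p_start, p_end = predicted
    r_start, r_end = reference
    overlap = max(0.0, min(p_end, r_end) - max(p_start, r_start))
    return overlap / (r_end - r_start)

def is_correct(predicted, reference, threshold=0.8):
    """Apply the paper's 80% coverage rule (we treat >= 0.8 as correct)."""
    return coverage_rate(predicted, reference) >= threshold
```

For example, a predicted clip from 10 s to 60 s covers 40 of the 50 seconds of a reference clip from 20 s to 70 s, a coverage rate of 0.8, and would count as correct under this rule.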
As mentioned above, we used MFCC and Chroma features in comparative experiments. The features tested were:
MFCC (36 dimensions), Chroma (39), and MFCC & Chroma (25).

TABLE II. THE AVERAGE PRECISION FOR 30 SONGS DOWNLOADED FROM THE INTERNET, WITH SMOOTH FILTERING

Method                  | MFCC | Chroma | MFCC & Chroma
SVM Result Set          | -    | -      | -
Standard Annotation Set | -    | -      | -

Fig. 2. The results of the three kinds of features.

Table II and Fig. 2 show the results of the comparative experiments. The columns of the table show the precision of the three kinds of features, and the rows show the different results for the SVM result set and the standard annotation set. These experiments verified the feasibility of using the random forest algorithm to extract ringtones from music, and they show that Chroma features extract music ringtones more effectively than MFCC. However, combining the two kinds of features did not improve the results. The results also showed that the precision of SVM boundary detection strongly affects ringtone extraction; the segment boundaries taken from the standard annotation set show the objective ability of the random forest algorithm to choose the right segment.

C. Smoothing and Screening the Segments

The starting point of a segment boundary generated by the SVM with filtering may not be appropriate for a ringtone, because the filtering threshold can affect the resulting segment. In this experiment, we assume that a non-singing-voice beat is more likely to be the start of a sentence, so we screen the labels of the beats of the selected segment to find the first non-singing-voice label as the precise starting point. Also, an intro lasting more than 30 s can be considered a ringtone. The results of this experiment are difficult to express as numbers, so we randomly recruited 50 people to listen to the extracted ringtones and then interviewed them about their impressions; 84% of them considered the screened ringtones more suitable.

V. CONCLUSION

Ringtone extraction answers a wide range of business needs.
This paper proposed an automatic framework for music ringtone extraction using musical structure analysis and machine learning. The experiments not only show the feasibility of the process: we designed a set of rules for the automatic extraction of music ringtones and verified their feasibility through experiments; we compared the effects of three kinds of audio features and found that Chroma features perform most stably in this process; we used a more objective standard-annotation set to verify the feasibility of the random forest algorithm for extracting ringtones; and we proposed a method to find the starting point of a ringtone and verified its effect through user research. This is a meaningful attempt to apply the theory of musical structure analysis to a practical problem, and the system is a base for further research.

But the experiments also revealed some problems. All training and testing sets come from the same artist, so the accuracy of the random forest for songs from other artists or genres may not be stable. Moreover, the result of machine learning may depend on the selection of the training set; cross-validation and training-data selection may be key factors in the classification. Future work will be directed towards improving the classification accuracy of each machine-learning stage and using speech-signal knowledge to fix the starting point of a ringtone.

ACKNOWLEDGMENT

This work is supported by the Chinese Music Audience Automatic Classification of Music Technology Innovation Program of the Ministry of Culture (WHB201520).

REFERENCES

[1] B. McFee and D. P. W. Ellis, "Learning to segment songs with ordinal linear discriminant analysis," in Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), 2014.
[2] H.-T. Cheng, Y.-H. Yang, Y.-C. Lin, and H. H. Chen, "Music ... using audio and textual information," IEEE.
[3] M. A. Bartsch and G. H. Wakefield, "To catch a chorus: Using chroma-based representations for audio thumbnailing," in Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics.
[4] B. Logan and S. Chu, "Music summarization using key phrases," in Proc. Int. Conf. on Acoustics, Speech and Signal Processing, 2000.
[5] L. R. Rabiner and B. H. Juang, Fundamentals of Speech Recognition. Prentice-Hall.
[6] W. Chai and B. Vercoe, "Music thumbnailing via structural analysis," in Proc. ACM Multimedia Conference, 2003.
[7] W. Chai and B. Vercoe, "Structural analysis of music signals for indexing and thumbnailing," in Proc. ACM/IEEE Joint Conference on Digital Libraries, 2003.
[8] W. Chai, "Structural analysis of musical signals via pattern matching," in Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, 2003.
[9] Y. Chen, "Music structural analysis and application," U.D.C. 681.3.
[10] N. C. Maddage, "Automatic structure detection for popular music," Institute for Infocomm Research.
[11] D. Dimitriadis, P. Maragos, and A. Potamianos, "Robust AM-FM features for speech recognition," IEEE Signal Process. Lett., vol. 12, 2005.
[12] F. Wu, "Singing voice detection of popular music using beat tracking and SVM classification," in Proc. Int. Conf. on Computer and Information Science (ICIS 2015).
[13] "Reliable onset detection scheme for singing voices based on enhanced difference filtering and combined features," in Proc. Wireless Communications & Signal Processing (WCSP), 2009.
[14] Z. Shi, H. Li, and J. Sun, "Vocal discrimination in pop music based on SVM," Computer Engineering and Applications, vol. 44, no. 25, 2008.
[15] S. Dixon, "Automatic extraction of tempo and beat from expressive performances," Journal of New Music Research, vol. 30, no. 1, pp. 39-58, 2001.
More informationRepeating Pattern Discovery and Structure Analysis from Acoustic Music Data
Repeating Pattern Discovery and Structure Analysis from Acoustic Music Data Lie Lu, Muyuan Wang 2, Hong-Jiang Zhang Microsoft Research Asia Beijing, P.R. China, 8 {llu, hjzhang}@microsoft.com 2 Department
More informationWeek 14 Music Understanding and Classification
Week 14 Music Understanding and Classification Roger B. Dannenberg Professor of Computer Science, Music & Art Overview n Music Style Classification n What s a classifier? n Naïve Bayesian Classifiers n
More informationShades of Music. Projektarbeit
Shades of Music Projektarbeit Tim Langer LFE Medieninformatik 28.07.2008 Betreuer: Dominikus Baur Verantwortlicher Hochschullehrer: Prof. Dr. Andreas Butz LMU Department of Media Informatics Projektarbeit
More informationMOVIES constitute a large sector of the entertainment
1618 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 18, NO. 11, NOVEMBER 2008 Audio-Assisted Movie Dialogue Detection Margarita Kotti, Dimitrios Ververidis, Georgios Evangelopoulos,
More informationHidden Markov Model based dance recognition
Hidden Markov Model based dance recognition Dragutin Hrenek, Nenad Mikša, Robert Perica, Pavle Prentašić and Boris Trubić University of Zagreb, Faculty of Electrical Engineering and Computing Unska 3,
More informationMusic Emotion Recognition. Jaesung Lee. Chung-Ang University
Music Emotion Recognition Jaesung Lee Chung-Ang University Introduction Searching Music in Music Information Retrieval Some information about target music is available Query by Text: Title, Artist, or
More informationMusic Genre Classification and Variance Comparison on Number of Genres
Music Genre Classification and Variance Comparison on Number of Genres Miguel Francisco, miguelf@stanford.edu Dong Myung Kim, dmk8265@stanford.edu 1 Abstract In this project we apply machine learning techniques
More informationStatistical Modeling and Retrieval of Polyphonic Music
Statistical Modeling and Retrieval of Polyphonic Music Erdem Unal Panayiotis G. Georgiou and Shrikanth S. Narayanan Speech Analysis and Interpretation Laboratory University of Southern California Los Angeles,
More informationSinging voice synthesis based on deep neural networks
INTERSPEECH 2016 September 8 12, 2016, San Francisco, USA Singing voice synthesis based on deep neural networks Masanari Nishimura, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, and Keiichi Tokuda
More informationNeural Network for Music Instrument Identi cation
Neural Network for Music Instrument Identi cation Zhiwen Zhang(MSE), Hanze Tu(CCRMA), Yuan Li(CCRMA) SUN ID: zhiwen, hanze, yuanli92 Abstract - In the context of music, instrument identi cation would contribute
More informationSinger Identification
Singer Identification Bertrand SCHERRER McGill University March 15, 2007 Bertrand SCHERRER (McGill University) Singer Identification March 15, 2007 1 / 27 Outline 1 Introduction Applications Challenges
More informationEXPLORING THE USE OF ENF FOR MULTIMEDIA SYNCHRONIZATION
EXPLORING THE USE OF ENF FOR MULTIMEDIA SYNCHRONIZATION Hui Su, Adi Hajj-Ahmad, Min Wu, and Douglas W. Oard {hsu, adiha, minwu, oard}@umd.edu University of Maryland, College Park ABSTRACT The electric
More informationDetecting Musical Key with Supervised Learning
Detecting Musical Key with Supervised Learning Robert Mahieu Department of Electrical Engineering Stanford University rmahieu@stanford.edu Abstract This paper proposes and tests performance of two different
More informationRecognition and Summarization of Chord Progressions and Their Application to Music Information Retrieval
Recognition and Summarization of Chord Progressions and Their Application to Music Information Retrieval Yi Yu, Roger Zimmermann, Ye Wang School of Computing National University of Singapore Singapore
More informationAn Examination of Foote s Self-Similarity Method
WINTER 2001 MUS 220D Units: 4 An Examination of Foote s Self-Similarity Method Unjung Nam The study is based on my dissertation proposal. Its purpose is to improve my understanding of the feature extractors
More informationBrowsing News and Talk Video on a Consumer Electronics Platform Using Face Detection
Browsing News and Talk Video on a Consumer Electronics Platform Using Face Detection Kadir A. Peker, Ajay Divakaran, Tom Lanning Mitsubishi Electric Research Laboratories, Cambridge, MA, USA {peker,ajayd,}@merl.com
More informationSinger Recognition and Modeling Singer Error
Singer Recognition and Modeling Singer Error Johan Ismael Stanford University jismael@stanford.edu Nicholas McGee Stanford University ndmcgee@stanford.edu 1. Abstract We propose a system for recognizing
More informationMusic Database Retrieval Based on Spectral Similarity
Music Database Retrieval Based on Spectral Similarity Cheng Yang Department of Computer Science Stanford University yangc@cs.stanford.edu Abstract We present an efficient algorithm to retrieve similar
More informationThe song remains the same: identifying versions of the same piece using tonal descriptors
The song remains the same: identifying versions of the same piece using tonal descriptors Emilia Gómez Music Technology Group, Universitat Pompeu Fabra Ocata, 83, Barcelona emilia.gomez@iua.upf.edu Abstract
More informationCS 591 S1 Computational Audio
4/29/7 CS 59 S Computational Audio Wayne Snyder Computer Science Department Boston University Today: Comparing Musical Signals: Cross- and Autocorrelations of Spectral Data for Structure Analysis Segmentation
More informationhit), and assume that longer incidental sounds (forest noise, water, wind noise) resemble a Gaussian noise distribution.
CS 229 FINAL PROJECT A SOUNDHOUND FOR THE SOUNDS OF HOUNDS WEAKLY SUPERVISED MODELING OF ANIMAL SOUNDS ROBERT COLCORD, ETHAN GELLER, MATTHEW HORTON Abstract: We propose a hybrid approach to generating
More informationAutomatic Music Clustering using Audio Attributes
Automatic Music Clustering using Audio Attributes Abhishek Sen BTech (Electronics) Veermata Jijabai Technological Institute (VJTI), Mumbai, India abhishekpsen@gmail.com Abstract Music brings people together,
More informationComputational Models of Music Similarity. Elias Pampalk National Institute for Advanced Industrial Science and Technology (AIST)
Computational Models of Music Similarity 1 Elias Pampalk National Institute for Advanced Industrial Science and Technology (AIST) Abstract The perceived similarity of two pieces of music is multi-dimensional,
More informationMusic Segmentation Using Markov Chain Methods
Music Segmentation Using Markov Chain Methods Paul Finkelstein March 8, 2011 Abstract This paper will present just how far the use of Markov Chains has spread in the 21 st century. We will explain some
More informationUC San Diego UC San Diego Previously Published Works
UC San Diego UC San Diego Previously Published Works Title Classification of MPEG-2 Transport Stream Packet Loss Visibility Permalink https://escholarship.org/uc/item/9wk791h Authors Shin, J Cosman, P
More informationMusic Mood Classification - an SVM based approach. Sebastian Napiorkowski
Music Mood Classification - an SVM based approach Sebastian Napiorkowski Topics on Computer Music (Seminar Report) HPAC - RWTH - SS2015 Contents 1. Motivation 2. Quantification and Definition of Mood 3.
More informationLAUGHTER serves as an expressive social signal in human
Audio-Facial Laughter Detection in Naturalistic Dyadic Conversations Bekir Berker Turker, Yucel Yemez, Metin Sezgin, Engin Erzin 1 Abstract We address the problem of continuous laughter detection over
More informationPhone-based Plosive Detection
Phone-based Plosive Detection 1 Andreas Madsack, Grzegorz Dogil, Stefan Uhlich, Yugu Zeng and Bin Yang Abstract We compare two segmentation approaches to plosive detection: One aproach is using a uniform
More informationA Computational Model for Discriminating Music Performers
A Computational Model for Discriminating Music Performers Efstathios Stamatatos Austrian Research Institute for Artificial Intelligence Schottengasse 3, A-1010 Vienna stathis@ai.univie.ac.at Abstract In
More informationMusic Synchronization. Music Synchronization. Music Data. Music Data. General Goals. Music Information Retrieval (MIR)
Advanced Course Computer Science Music Processing Summer Term 2010 Music ata Meinard Müller Saarland University and MPI Informatik meinard@mpi-inf.mpg.de Music Synchronization Music ata Various interpretations
More informationSpeech Recognition and Signal Processing for Broadcast News Transcription
2.2.1 Speech Recognition and Signal Processing for Broadcast News Transcription Continued research and development of a broadcast news speech transcription system has been promoted. Universities and researchers
More informationComputational Modelling of Harmony
Computational Modelling of Harmony Simon Dixon Centre for Digital Music, Queen Mary University of London, Mile End Rd, London E1 4NS, UK simon.dixon@elec.qmul.ac.uk http://www.elec.qmul.ac.uk/people/simond
More informationOutline. Why do we classify? Audio Classification
Outline Introduction Music Information Retrieval Classification Process Steps Pitch Histograms Multiple Pitch Detection Algorithm Musical Genre Classification Implementation Future Work Why do we classify
More informationAutomatic Laughter Detection
Automatic Laughter Detection Mary Knox 1803707 knoxm@eecs.berkeley.edu December 1, 006 Abstract We built a system to automatically detect laughter from acoustic features of audio. To implement the system,
More informationMusic Mood. Sheng Xu, Albert Peyton, Ryan Bhular
Music Mood Sheng Xu, Albert Peyton, Ryan Bhular What is Music Mood A psychological & musical topic Human emotions conveyed in music can be comprehended from two aspects: Lyrics Music Factors that affect
More informationComputer Coordination With Popular Music: A New Research Agenda 1
Computer Coordination With Popular Music: A New Research Agenda 1 Roger B. Dannenberg roger.dannenberg@cs.cmu.edu http://www.cs.cmu.edu/~rbd School of Computer Science Carnegie Mellon University Pittsburgh,
More informationDetection of Panoramic Takes in Soccer Videos Using Phase Correlation and Boosting
Detection of Panoramic Takes in Soccer Videos Using Phase Correlation and Boosting Luiz G. L. B. M. de Vasconcelos Research & Development Department Globo TV Network Email: luiz.vasconcelos@tvglobo.com.br
More informationWeek 14 Query-by-Humming and Music Fingerprinting. Roger B. Dannenberg Professor of Computer Science, Art and Music Carnegie Mellon University
Week 14 Query-by-Humming and Music Fingerprinting Roger B. Dannenberg Professor of Computer Science, Art and Music Overview n Melody-Based Retrieval n Audio-Score Alignment n Music Fingerprinting 2 Metadata-based
More informationMusic Structure Analysis
Lecture Music Processing Music Structure Analysis Meinard Müller International Audio Laboratories Erlangen meinard.mueller@audiolabs-erlangen.de Book: Fundamentals of Music Processing Meinard Müller Fundamentals
More informationRecognising Cello Performers Using Timbre Models
Recognising Cello Performers Using Timbre Models Magdalena Chudy and Simon Dixon Abstract In this paper, we compare timbre features of various cello performers playing the same instrument in solo cello
More informationSINGING PITCH EXTRACTION BY VOICE VIBRATO/TREMOLO ESTIMATION AND INSTRUMENT PARTIAL DELETION
th International Society for Music Information Retrieval Conference (ISMIR ) SINGING PITCH EXTRACTION BY VOICE VIBRATO/TREMOLO ESTIMATION AND INSTRUMENT PARTIAL DELETION Chao-Ling Hsu Jyh-Shing Roger Jang
More informationData Driven Music Understanding
Data Driven Music Understanding Dan Ellis Laboratory for Recognition and Organization of Speech and Audio Dept. Electrical Engineering, Columbia University, NY USA http://labrosa.ee.columbia.edu/ 1. Motivation:
More informationDetection and demodulation of non-cooperative burst signal Feng Yue 1, Wu Guangzhi 1, Tao Min 1
International Conference on Applied Science and Engineering Innovation (ASEI 2015) Detection and demodulation of non-cooperative burst signal Feng Yue 1, Wu Guangzhi 1, Tao Min 1 1 China Satellite Maritime
More informationMusic Genre Classification
Music Genre Classification chunya25 Fall 2017 1 Introduction A genre is defined as a category of artistic composition, characterized by similarities in form, style, or subject matter. [1] Some researchers
More informationInteracting with a Virtual Conductor
Interacting with a Virtual Conductor Pieter Bos, Dennis Reidsma, Zsófia Ruttkay, Anton Nijholt HMI, Dept. of CS, University of Twente, PO Box 217, 7500AE Enschede, The Netherlands anijholt@ewi.utwente.nl
More informationVoice & Music Pattern Extraction: A Review
Voice & Music Pattern Extraction: A Review 1 Pooja Gautam 1 and B S Kaushik 2 Electronics & Telecommunication Department RCET, Bhilai, Bhilai (C.G.) India pooja0309pari@gmail.com 2 Electrical & Instrumentation
More informationHUMMING METHOD FOR CONTENT-BASED MUSIC INFORMATION RETRIEVAL
12th International Society for Music Information Retrieval Conference (ISMIR 211) HUMMING METHOD FOR CONTENT-BASED MUSIC INFORMATION RETRIEVAL Cristina de la Bandera, Ana M. Barbancho, Lorenzo J. Tardón,
More informationDETECTION OF SLOW-MOTION REPLAY SEGMENTS IN SPORTS VIDEO FOR HIGHLIGHTS GENERATION
DETECTION OF SLOW-MOTION REPLAY SEGMENTS IN SPORTS VIDEO FOR HIGHLIGHTS GENERATION H. Pan P. van Beek M. I. Sezan Electrical & Computer Engineering University of Illinois Urbana, IL 6182 Sharp Laboratories
More informationAnalytic Comparison of Audio Feature Sets using Self-Organising Maps
Analytic Comparison of Audio Feature Sets using Self-Organising Maps Rudolf Mayer, Jakob Frank, Andreas Rauber Institute of Software Technology and Interactive Systems Vienna University of Technology,
More informationEnhancing Music Maps
Enhancing Music Maps Jakob Frank Vienna University of Technology, Vienna, Austria http://www.ifs.tuwien.ac.at/mir frank@ifs.tuwien.ac.at Abstract. Private as well as commercial music collections keep growing
More informationResearch & Development. White Paper WHP 232. A Large Scale Experiment for Mood-based Classification of TV Programmes BRITISH BROADCASTING CORPORATION
Research & Development White Paper WHP 232 September 2012 A Large Scale Experiment for Mood-based Classification of TV Programmes Jana Eggink, Denise Bland BRITISH BROADCASTING CORPORATION White Paper
More informationPOST-PROCESSING FIDDLE : A REAL-TIME MULTI-PITCH TRACKING TECHNIQUE USING HARMONIC PARTIAL SUBTRACTION FOR USE WITHIN LIVE PERFORMANCE SYSTEMS
POST-PROCESSING FIDDLE : A REAL-TIME MULTI-PITCH TRACKING TECHNIQUE USING HARMONIC PARTIAL SUBTRACTION FOR USE WITHIN LIVE PERFORMANCE SYSTEMS Andrew N. Robertson, Mark D. Plumbley Centre for Digital Music
More informationAutomatic Labelling of tabla signals
ISMIR 2003 Oct. 27th 30th 2003 Baltimore (USA) Automatic Labelling of tabla signals Olivier K. GILLET, Gaël RICHARD Introduction Exponential growth of available digital information need for Indexing and
More information