Singing Voice Conversion Using Posted Waveform Data on Music Social Media


Koki Senda, Yukiya Hono, Kei Sawada, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku and Keiichi Tokuda
Department of Computer Science and Engineering, Nagoya Institute of Technology, Nagoya, Japan
{kksn924, hono, swkei, bonanza, uratec, nankaku, Tel:

Abstract: This paper proposes a method of selecting training data for many-to-one singing voice conversion (VC) from data on the social media music app nana. On this social media app, users can share sounds such as speaking, singing, and instrumental music recorded with their smartphones. The number of hours of accumulated data has exceeded one million, and it is regarded as big data. It is widely known that big data can create huge value through advanced deep learning technology. nana's database contains a large amount of post data in which multiple users have sung the same song. This data is considered suitable training data for VC, because VC frameworks based on statistical approaches often require parallel data sets that consist of pairs of data from source and target singers who sing the same phrases. The proposed method can compose parallel data sets usable for many-to-one statistical VC from nana's database by extracting frames that have small differences in utterance timing, based on the results of dynamic programming (DP) matching. Experimental results indicate that a system that uses training data composed by our method can convert acoustic features more accurately than a system that does not use the method.

I. INTRODUCTION

Social media has made it possible for people all over the world to transmit their information. There are many kinds of social media websites and apps, such as YouTube, Facebook, and Instagram, and a large amount of data transmitted by users has been accumulating. This big data has increasing potential to create value in every field [1]. Recently, machine learning methods for dealing with big data have been widely researched in many institutes and laboratories. The Multi-Genre Broadcast (MGB) Challenge, an official challenge of the IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), is one of the international workshops evaluating big data technologies related to the speech field [3]. The challenge at ASRU 2015 was an evaluation of transcription [6], [7], speaker diarization [4], [5], dialect detection, and lightly supervised alignment using approximately 1,600 hours of recorded British Broadcasting Corporation (BBC) television program data. Commercialized products using deep learning technology already exist, such as the speech recognition system employed in Google Home, which was trained using tens of thousands of hours of speech data [2].

The social media app nana [8] stores a large amount of music data. This social media app is designed to allow users to share singing and instrumental sounds easily using their smartphones. More than one million hours of data have been uploaded, and the amount continues to increase. In nana, users can collaborate on other users' uploaded posts by overdubbing their own sounds onto them. In particular, accompaniment posts of popular songs are collaborated on by many users. The relationship between collaborating and collaborated posts is represented by a tree structure. Because each tree generally consists of one song, the same song is sung in almost all singing posts of each tree. A database that contains a large number of posts in which the same songs are sung by multiple users has large potential.
This data can be used as parallel data sets, the training data of voice conversion (VC). VC is a method of converting a speaker's voice into another kind of voice, especially another speaker's voice, while maintaining linguistic information. Statistical approaches have been widely researched [9], [10]. Conventional statistical VC is often based on a Gaussian mixture model (GMM) [11]. More recently, a VC framework based on deep neural networks (DNNs) has been proposed [12], [13]. These statistical VC frameworks typically train statistical models using a parallel data set that consists of pairs of speech data from source and target speakers uttering the same sentences.

Not all of nana's data can be used as VC training data. For example, the database contains posts in which songs are only partly sung. We propose a method based on dynamic programming (DP) matching for composing training data from the database. The target data and the other data in the same collaboration tree are compared by DP matching, and then a parallel data set is extracted.

The rest of this paper is organized as follows. Sections 2 and 3 describe the social media music app nana and voice conversion using nana's data, respectively. Section 4 describes the experimental conditions and experimental results. Section 5 presents concluding remarks and future work.

II. SOCIAL MEDIA MUSIC APP NANA

The social media music app nana [8] was developed by nana music, Inc. as a social music platform. Users can record and upload sounds such as speaking, singing, and instrumental music to nana with their smartphones. Through the app, users worldwide can communicate with each other through music. As of April 2018, there are six million users in 113 countries.

Fig. 1. The recording process.
Fig. 2. Playing the sound of a post.
Fig. 3. Showing all the collaborators.
Fig. 4. A tree structure of an example collaboration relationship.

More than 61 million posts have accumulated in the database, and the number of posts is still increasing. Users upload sounds to nana according to the following procedure: 1) record sounds; 2) add information about the recorded sounds, such as the title, artist's name, and an explanation; 3) choose sound effects, such as echo, to arrange the sounds. Fig. 1 shows the screen of the recording process. Users can listen to uploaded posts, as shown in Fig. 2, and they get feedback such as "Comment" and "Applause" (a function equivalent to "Like") from other users.

In addition to these general functions, users can collaborate on posts. This characteristic function is called "Collab." Users can post their own sounds overdubbed on another user's post. The function has two main effects. First, multiple users can create one sound together. For example, multiple users' singing is overdubbed to create choruses, and instruments are overdubbed to create band performances. Fig. 3 shows the relationship between collaborators on a post. On the screen shown in the figure, you can see all the posts in the Collab series of each post. Second, users can easily post accompanied singing voices because they can use an accompaniment post of another user. They do not have to prepare an accompaniment sound source by themselves; all they need to do is sing. Because many users have posted their singing in this way, the database holds many songs sung by multiple users. Popular songs are typically sung by tens of thousands of users.

Posts using Collab have two types of data. The first type is mixed sound source data, in which all the posts in the Collab series are overdubbed together; users can listen to only this type of sound source. The other type is single source data, which consists of just the sound recorded when posting. In most cases, each such recording contains only one singing voice or one instrumental sound, although some contain multiple sounds, for example, singing while playing an instrument such as guitar or piano.

All posts uploaded using Collab are related to the collaborated post. This relationship is represented by a tree structure in which each post is a node. When a post A exists and a post B collaborates with A, post A becomes the parent node and post B becomes the child node. Fig. 4 shows an example of the tree structure. The post that is collaborated on first becomes the root node and is, in most cases, an accompaniment. In Fig. 4, the guitar post is the root node. Generally, every song tree is composed of singing voices and instrumental sounds related to one song. In almost all singing voice posts of each tree, the same song is sung because they have been sung with the same accompaniment. Fig. 5 shows an example of singing voice posts regarded as the same song in a tree. We focus on this tree structure to extract such singing voice posts sung by many users for many-to-one singing voice conversion.

Fig. 5. The posts that are regarded as the same song because they have the same root node post.
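As a concrete illustration of this extraction, the following is a minimal Python sketch; the Post class, its field names, and the sample tree are hypothetical stand-ins for nana's actual schema, which the paper does not specify.

```python
from dataclasses import dataclass, field

@dataclass
class Post:
    """One post in a Collab tree (hypothetical schema, for illustration)."""
    post_id: str
    kind: str  # e.g. "vocal", "guitar", "piano"
    children: list = field(default_factory=list)  # posts that collab with this one

def collect_vocal_posts(root: Post) -> list:
    """Gather all singing posts under one root node, i.e., one song."""
    vocals, stack = [], [root]
    while stack:
        post = stack.pop()
        if post.kind == "vocal":
            vocals.append(post)
        stack.extend(post.children)
    return vocals

# A tree mirroring Fig. 4: a guitar accompaniment as the root node.
root = Post("p0", "guitar", [
    Post("p1", "vocal"),
    Post("p2", "piano", [Post("p3", "vocal"), Post("p4", "drums")]),
])
print([p.post_id for p in collect_vocal_posts(root)])  # candidate same-song posts
```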
III. VOICE CONVERSION USING SINGING POST DATA

Voice conversion (VC) is a method of converting an input speaker's voice into various types of voices while keeping linguistic information unchanged. It is mainly used for speaker conversion. A typical VC framework uses a statistical approach [9], [10]. In statistical VC, parallel data sets, which consist of pairs of speech data from source and target speakers uttering the same sentences, are used for training models. One conventional statistical VC is based on a Gaussian mixture model (GMM) [11]. GMM-based VC represents the relationship between the acoustic features of a source speaker and those of a target speaker using a linear combination of multiple Gaussian distributions. A newer approach based on deep neural networks (DNNs) has been proposed [12], [13]; it can convert acoustic features with a higher degree of precision than the GMM-based one.
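To make such a frame-by-frame conversion model concrete, here is a minimal PyTorch sketch assuming the architecture described later in Section IV (a feed-forward network with three hidden layers of 1024 units operating on 44-dimensional mel-cepstra); the ReLU activation and MSE loss are our assumptions, since the paper does not state them, so this is a sketch of the general approach rather than the authors' exact implementation.

```python
import torch
import torch.nn as nn

FEAT_DIM = 44  # 0th through 43rd mel-cepstral coefficients (see Section IV)

# Feed-forward conversion model: one source frame in, one target frame out.
model = nn.Sequential(
    nn.Linear(FEAT_DIM, 1024), nn.ReLU(),  # activation is an assumption
    nn.Linear(1024, 1024), nn.ReLU(),
    nn.Linear(1024, 1024), nn.ReLU(),
    nn.Linear(1024, FEAT_DIM),
)
loss_fn = nn.MSELoss()  # trained on time-aligned source/target frame pairs

def convert(source_frames: torch.Tensor) -> torch.Tensor:
    """Convert a (T x FEAT_DIM) sequence of source frames frame by frame."""
    with torch.no_grad():
        return model(source_frames)
```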

VC approaches are also distinguished by the number of source and target speakers. In addition, other approaches exist, such as singing VC; examples also include conversion of gender, age, and so on. We employ DNN-based many-to-one singing VC in our system, which converts an arbitrary singer's voice into a particular singer's voice. In many-to-one singing VC, the input singing of an arbitrary singer (source singer) is converted into the singing of a particular singer (target singer), as shown in Fig. 6. Therefore, the parallel data set has to consist of multiple source singers' voices and one target singer's voice.

Fig. 6. Overview of many-to-one singing VC.

Fig. 7 shows an overview of our VC system. In the training step, first, the acoustic features are extracted from the source and target data. Then, the time alignment between these feature sequences is obtained by dynamic time warping (DTW) [14]. Finally, the neural network conversion model is trained using the time-aligned acoustic feature sequences. In the conversion step, the acoustic features extracted from the input data are converted by the trained model frame by frame. Then, the output singing voice is synthesized from the converted features using a vocoder.

Fig. 7. Overview of our VC system.

We extracted a data set of many users singing the same song from nana's database and applied it to this VC system because it is suitable as a parallel data set. Although it is generally difficult to obtain the intended data from big data, such singing voice data is easily extracted from the database using the tree structure representing collaboration relationships (Fig. 5). However, not all of the extracted data contains the same phrases, because users can record and post arbitrary content. For instance, in some posts a singer harmonizes with another singer, and in others only the hook of a song is sung. Hence, a method is needed to remove unsuitable data and create appropriate parallel data sets.

An approach employing dynamic programming (DP) matching to extract a parallel data set has been proposed, on the assumption that any two recordings of users singing the same song have higher similarity than two recordings of users singing different songs [15]. DP matching is a classical elastic matching method, widely applied to pattern recognition tasks such as speech recognition [14] and character recognition [16]. It dynamically matches each vector of two vector sequences that have different lengths, and the result is called a matching path. The accumulated Euclidean distance between matched vectors, computed at the end of matching, indicates the similarity between the vector sequences. Therefore, it is possible to compare the similarities between any two posts in the database even though they have different lengths. In the conventional method, a target post is decided first. Then, all the singing voice posts in the same tree are compared with the target post by DP matching, and the posts with a small accumulated distance are extracted as the source data of a parallel data set.
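The following is a minimal NumPy sketch of this DP matching step, assuming two mel-cepstral feature sequences as input; the function and variable names are ours, for illustration. It returns both the matching path, which our proposed method inspects, and the accumulated distance, which the conventional method thresholds.

```python
import numpy as np

def dp_matching(x: np.ndarray, y: np.ndarray):
    """DP matching of feature sequences x (N x D) and y (M x D).
    Returns the matching path [(i, j), ...] and the accumulated distance."""
    n, m = len(x), len(y)
    dist = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=2)  # local Euclidean distances
    acc = np.full((n, m), np.inf)
    acc[0, 0] = dist[0, 0]
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            acc[i, j] = dist[i, j] + min(
                acc[i - 1, j] if i > 0 else np.inf,
                acc[i, j - 1] if j > 0 else np.inf,
                acc[i - 1, j - 1] if i > 0 and j > 0 else np.inf,
            )
    # Backtrack from the end point to recover the matching path.
    path, i, j = [(n - 1, m - 1)], n - 1, m - 1
    while (i, j) != (0, 0):
        candidates = [(i - 1, j - 1), (i - 1, j), (i, j - 1)]
        i, j = min((c for c in candidates if c[0] >= 0 and c[1] >= 0),
                   key=lambda c: acc[c])
        path.append((i, j))
    return path[::-1], float(acc[-1, -1])
```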
However, with this method, most singers of the selected source data would be similar to the target singer, because the accumulated distance depends on the similarity between the singers' voices as well as on which song was sung. In many-to-one singing VC, various types of voice data should be used as source training data in order to convert arbitrary singers' voices.

Our method therefore uses matching paths instead of the accumulated distance. The differences in utterance timing between matched frames are calculated from the matching paths, and the pairs of frames whose timing difference is smaller than a threshold are extracted for training. This is based on the hypothesis that posts sung with the same accompaniment have small differences in the utterance timing of each phrase. When this condition is satisfied, the matching path is close to the diagonal, as shown in Fig. 8. We call every part of the matching path extracted with this method a segment. We expect this method to remove unsuitable data and create parallel data sets that consist of various types of voices.

Fig. 8. The training data extraction method that we propose.

In this method, two parameters have to be set. The first is the maximum distance between the matching path and the diagonal; we call this parameter the max distance (Fig. 9). Increasing this value increases the amount of data while degrading its quality. The second is the minimum segment length; we call this parameter the min seg-size. Fig. 10 shows an example of selection based on segment length. Increasing this value reduces the amount of data while improving its quality. A sketch of this selection is shown below.

Fig. 9. Max distance of the matching path from the diagonal.
Fig. 10. Selection of training data considering segment length.
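Here is a minimal sketch of this selection rule, reusing the matching path computed by the dp_matching sketch above; the measurement of the timing difference as |i - j| times the frame shift, the parameter values, and all names are illustrative assumptions rather than the paper's exact formulation.

```python
FRAME_SHIFT = 0.005  # 5-ms frame shift (Section IV)

def extract_segments(path, max_distance=0.05, min_seg_size=0.1):
    """Keep runs of matched frame pairs (i, j) whose utterance-timing
    difference |i - j| * FRAME_SHIFT stays within max_distance (s),
    discarding runs shorter than min_seg_size (s)."""
    segments, current = [], []
    for i, j in path:
        if abs(i - j) * FRAME_SHIFT <= max_distance:
            current.append((i, j))
            continue
        if len(current) * FRAME_SHIFT >= min_seg_size:
            segments.append(current)
        current = []
    if len(current) * FRAME_SHIFT >= min_seg_size:
        segments.append(current)
    return segments  # frame pairs to be used as parallel training data
```

The retained frame pairs from all source posts in a tree, paired with the corresponding target frames, then form the parallel data set.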

IV. EXPERIMENTS

Two experiments were conducted to evaluate the proposed method.

A. Experimental conditions

In this section, we describe the common experimental conditions. We used data from 9 trees of songs A, B, ..., and I in nana's database. The target post data was the full-length main melody sung by one female singer. The source post data were randomly selected from each tree, including posts singing a backing chorus or singing only part of a song. Singing voice signals were sampled at 32 kHz, and acoustic features were extracted with a 5-ms shift. As acoustic features, the 0th through 43rd mel-cepstral coefficients were extracted from the smoothed spectrum analyzed by STRAIGHT [17]. The DNN used in this system was trained on these mel-cepstral coefficients. The architecture of the DNN was a 3-hidden-layer feed-forward neural network with 1024 units per hidden layer. The features were normalized to zero mean and unit variance. The mel-cepstral distortion between the target and the converted mel-cepstra was used as the objective evaluation measure, which is defined as

$$\text{Mel-CD} = \frac{10}{\ln 10} \sqrt{2 \sum_{d=1}^{D} \left( c_d^{(1)} - c_d^{(2)} \right)^2}, \qquad (1)$$

where $c_d^{(1)}$ and $c_d^{(2)}$ are the $d$-th coefficients of the target and the converted mel-cepstra, respectively.
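For concreteness, a direct NumPy transcription of Eq. (1) follows; averaging over time-aligned frames, as done here, is a common convention that the equation itself leaves implicit, so treat it as an assumption.

```python
import numpy as np

def mel_cd(target: np.ndarray, converted: np.ndarray) -> float:
    """Mel-cepstral distortion (dB) per Eq. (1), averaged over frames.
    target, converted: (T x D) time-aligned mel-cepstral sequences."""
    diff_sq = np.sum((target - converted) ** 2, axis=1)  # sum over d = 1..D
    return float(np.mean((10.0 / np.log(10.0)) * np.sqrt(2.0 * diff_sq)))
```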
B. Close test for parameter consideration

This experiment was carried out to determine the two parameters, the max distance (the maximum distance between the matching path and the diagonal) and the min seg-size (the minimum segment length). Combinations of the parameters were compared based on the mel-cepstral distortion. From the 9 trees, 8 trees (songs A, B, ..., and H) were selected. Then, 50 source posts and 1 target post were selected from each tree as the training data, giving 400 source-singer posts and 8 target-singer posts in total.

Tables I and II show the experimental results. They describe the Mel-CD and the percentage of extracted frames relative to all frames. In Table I, the best Mel-CD value in every column is boldface, and the best value in the table is underlined. In Table II, the corresponding cells are boldface and underlined. The value "-" means that no frames were extracted to train the model because no matching path satisfied the condition.

TABLE I. MEL-CD [dB], for combinations of min seg-size (s) and max distance (s); without the proposed method: dB.
TABLE II. THE RATIO OF THE NUMBER OF FRAMES USED FOR TRAINING (%), for the same combinations; the total number of frames: 12,247,.

Although more frames were used for training in the lower-left side of Table II, there are cells with smaller Mel-CD values near the diagonal in Table I. This is because there is a trade-off between the quantity and the quality of the extracted data. The quality of the data improves in the upper-right side of the table, where the values of the max distance are smaller and the values of the min seg-size are larger. These results indicate that our method with optimal parameters improves conversion accuracy.

C. Open test

In this experiment, four models were trained using the singing data of different numbers of collaboration trees.
They were compared on open data using the best parameters of our method from the previous experiment, as follows:
Max distance: 0.05 seconds
Min seg-size: 0.1 seconds
The four models were trained using the post data in 1 (song A), 2 (songs A and B), 4 (songs A, B, C, and D), and 8 (songs A, B, ..., and H) trees, respectively. Four hundred posts sung by source singers were randomly selected from the tree(s) for every model as the source singer training data. One post was selected from each tree as training data of the target singer. The test data set was composed of the post data in the tree of song I, which included 18 source singers' data and one target singer's data.

Fig. 11. Mel-CD [dB] versus the number of trees (open test).

Fig. 11 shows the results. Increasing the number of trees mostly caused a decrease in mel-cepstral distortion because the diversity of the training data improved. However, the model using 4 trees outperformed the model using 8 trees. It is assumed that each tree has different suitable parameters; therefore, it is possible that the parameters suitable for each tree differ more among the 8 trees than among the 4 trees.

V. CONCLUSIONS

Using data posted to social media, we proposed a method of extracting training data that can be used for many-to-one singing voice conversion. For training, we use the pairs of matched frames that have small differences in utterance timing, on the assumption that posts sung with the same accompaniment would have little difference in utterance timing. Experimental results showed that setting the two parameters appropriately (the maximum distance between the matching path and the diagonal, and the minimum segment length), while considering the trade-off between the quantity and the quality of the training data, improved the objective evaluation measure. Increasing the number of trees used for the training data enabled accurate conversion of songs that were not used for training. Future work includes researching proper parameters based on various elements (such as tempo), applying different parameters to each song, and subjective evaluation.

VI. ACKNOWLEDGMENT

This research was supported by nana music, Inc.

REFERENCES

[1] J. Yin, W. Lo, and Z. Wu, "From Big Data to Great Services," 2016 IEEE International Congress on Big Data (BigData Congress), pp. .
[2] B. Li, T. Sainath, A. Narayanan, J. Caroselli, M. Bacchiani, A. Misra, I. Shafran, H. Sak, G. Pundak, K. Chin, K.-C. Sim, R. Weiss, K. Wilson, E. Variani, C. Kim, O. Siohan, M. Weintraub, E. McDermott, R. Rose, and M. Shannon, "Acoustic Modeling for Google Home," INTERSPEECH 2017, Aug. 2017, pp. .
[3] P. Bell, M. J. F. Gales, T. Hain, J. Kilgour, P. Lanchantin, X. Liu, A. McParland, S. Renals, O. Saz, M. Wester, and P. C. Woodland, "The MGB Challenge: Evaluating Multi-Genre Broadcast media recognition," IEEE Automatic Speech Recognition and Understanding Workshop.
[4] P. Karanasou, M. J. F. Gales, P. Lanchantin, X. Liu, Y. Qian, L. Wang, P. C. Woodland, and C. Zhang, "Speaker diarisation and longitudinal linking in multi-genre broadcast data," IEEE Automatic Speech Recognition and Understanding Workshop.
[5] J. Villalba, A. Ortega, A. Miguel, and L. Lleida, "Variational Bayesian PLDA for speaker diarization in the MGB Challenge," IEEE Automatic Speech Recognition and Understanding Workshop.
[6] P. C. Woodland, X. Liu, Y. Qian, C. Zhang, M. J. F. Gales, P. Karanasou, P. Lanchantin, and L. Wang, "Cambridge University transcription systems for the Multi-Genre Broadcast Challenge," IEEE Automatic Speech Recognition and Understanding Workshop.
[7] O. Saz, M. Doulaty, S. Deena, R. Milner, R. Ng, M. Hasan, Y. Liu, and T. Hain, "The 2015 Sheffield system for transcription of multi-genre broadcast media," IEEE Automatic Speech Recognition and Understanding Workshop.
[8] nana, (2018).
[9] T. Toda, A. W. Black, and K. Tokuda, "Spectral Conversion Based on Maximum Likelihood Estimation Considering Global Variance of Converted Parameter," ICASSP 2005, 2005.
[10] T. Toda, A. W. Black, and K. Tokuda, "Voice Conversion Based on Maximum-Likelihood Estimation of Spectral Parameter Trajectory," IEEE Transactions on Audio, Speech, and Language Processing, vol. 15, no. 8.
[11] Y. Stylianou, O. Cappé, and E. Moulines, "Continuous Probabilistic Transform for Voice Conversion," IEEE Trans. Speech Audio Process., vol. 6, pp. .
[12] S. Desai, E. V. Raghavendra, B. Yegnanarayana, A. W. Black, and K. Prahallad, "Voice conversion using artificial neural networks," Proceedings of ICASSP 2009, pp. .
[13] N. Hosaka, K. Hashimoto, K. Oura, Y. Nankaku, and K. Tokuda, "Voice Conversion Based on Trajectory Model Training of Neural Networks Considering Global Variance," Interspeech 2016.
[14] H. Sakoe and S. Chiba, "Dynamic programming algorithm optimization for spoken word recognition," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 26, no. 1, pp. .
[15] Y. Hono, K. Sawada, K. Hashimoto, K. Oura, Y. Nankaku, K. Tokuda, D. Kondo, and D. Ishikawa, "Singing voice conversion using post data in music SNS," Proc. of Acoustical Society of Japan Autumn Meeting, pp. , 2017 (in Japanese).
[16] K. Yoshida and H. Sakoe, "Online Handwritten Character Recognition for a Personal Computer System," IEEE Transactions on Consumer Electronics, vol. CE-28, no. 3, pp. .
[17] H. Kawahara, I. Masuda-Katsuse, and A. de Cheveigné, "Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds," Speech Communication, vol. 27, no. 3-4, pp. .
