Singing Voice Conversion Using Posted Waveform Data on Music Social Media
Koki Senda, Yukiya Hono, Kei Sawada, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku and Keiichi Tokuda
Department of Computer Science and Engineering, Nagoya Institute of Technology, Nagoya, Japan
{kksn924, hono, swkei, bonanza, uratec, nankaku, Tel:

Abstract—This paper proposes a method of selecting training data for many-to-one singing voice conversion (VC) from data on the social media music app nana. On this social media app, users can share sounds such as speaking, singing, and instrumental music recorded with their smartphones. The number of hours of accumulated data has exceeded one million, and it is regarded as big data. It is widely known that big data can create huge value through advanced deep learning technology. nana's database contains many posts in which multiple users have sung the same song. This data is considered suitable training data for VC, because VC frameworks based on statistical approaches often require parallel data sets consisting of pairs of data from source and target singers who sing the same phrases. The proposed method composes parallel data sets that can be used for many-to-one statistical VC from nana's database by extracting frames that have small differences in utterance timing, based on the results of dynamic programming (DP) matching. Experimental results indicate that a system using training data composed by our method can convert acoustic features more accurately than a system that does not use the method.

I. INTRODUCTION

Social media has made it possible for people all over the world to transmit their information. There are many kinds of social media websites and apps, such as YouTube, Facebook, and Instagram, and a large amount of data transmitted by users has been accumulating. This big data has increasing potential for creating value in every field [1]. Recently, machine learning methods for dealing with big data have been widely researched in many institutes and laboratories.
The Multi-Genre Broadcast (MGB) Challenge, an official challenge of the IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), is one of the international workshops evaluating big-data technologies related to the speech field [3]. The challenge at ASRU 2015 evaluated transcription [6], [7], speaker diarization [4], [5], dialect detection, and lightly supervised alignment using approximately 1,600 hours of recorded British Broadcasting Corporation (BBC) television program data. Commercialized products using deep learning technology already exist, such as the speech recognition system employed in Google Home, which was trained using tens of thousands of hours of speech data [2].

The social media app nana [8] stores big data consisting of music data. This social media app is designed to let users share singing and instrumental sounds easily using their smartphones. More than one million hours of data have been uploaded, and the amount continues to increase. In nana, users can collaborate with other users' uploaded posts by overdubbing those users' sounds with their own. In particular, accompaniment posts of popular songs are collaborated on by many users. The relationship between collaborating and collaborated posts is represented by a tree structure. Because each tree generally consists of one song, the same song is sung in almost all singing posts of each tree.

A database containing a large number of posts in which the same songs are sung by multiple users has large potential. This data can be used as a parallel data set, the training data for voice conversion (VC). VC is a method of converting a speaker's voice into another kind of voice, especially another speaker's voice, while maintaining linguistic information. Statistical approaches have been widely researched [9], [10]. Conventional statistical VC is often based on a Gaussian mixture model (GMM) [11]. More recently, VC frameworks based on deep neural networks (DNNs) have been proposed [12], [13].
These statistical VC methods typically train statistical models using a parallel data set consisting of pairs of speech data from source and target speakers uttering the same sentences. Not all of nana's data can be used as VC training data; for example, the database contains some partly sung posts. We propose a method based on dynamic programming (DP) matching for composing training data from the database. The target data and the other data in the same collaboration tree are compared by DP matching, and a parallel data set is then extracted.

The rest of this paper is organized as follows. Sections 2 and 3 describe the social media music app nana and voice conversion using nana's data, respectively. Section 4 describes the experimental conditions and results. Section 5 presents concluding remarks and future work.

II. SOCIAL MEDIA MUSIC APP NANA

The social media music app nana [8] was developed by nana music, Inc. as a social music platform. Users can record and upload sounds such as speaking, singing, and instrumental music to nana with their smartphones. Through the app, users worldwide can communicate with each other through music. As of April 2018, there are six million users in 113 countries.

APSIPA-ASC 2018
Fig. 1. The recording process.
Fig. 2. Playing the sound of a post.
Fig. 3. Showing all the collaborators.
Fig. 4. A tree structure of an example collaboration relationship.

More than 61 million posts have accumulated in the database, and the number of posts is still increasing. Users upload sounds to nana according to the following procedure:
1) Record sounds.
2) Add information about the recorded sounds, such as the title, artist's name, and an explanation.
3) Choose sound effects, such as echo, to arrange the sounds.
Fig. 1 shows the screen of the recording process. Users can listen to uploaded posts, as shown in Fig. 2, and receive feedback such as Comments and Applause (a function equivalent to "Like") from other users. In addition to these general functions, users can collaborate on posts. This characteristic function is called "Collab": users can post sounds that are overdubbed on another user's post. The function has two main effects. First, multiple users can create one sound together; for example, multiple users' singing is overdubbed to create choruses, and instruments are overdubbed to create band performances. Fig. 3 shows the relationship between collaborators on a post; on this screen, all the posts in the Collab series of each post can be seen. Second, users can easily post accompanied singing voices because they can use another user's accompaniment post. They do not have to prepare an accompaniment sound source themselves; all they need to do is sing. Because many users have posted their singing in this way, the database holds many songs sung by multiple users. Popular songs are typically sung by tens of thousands of users.

Posts using Collab have two types of data. The first type is mixed sound source data, in which all the posts in the Collab series are overdubbed.
Users are able to listen to only this type of sound source. The other type is single source data, which consists of just the sound recorded when posting. In most cases, each item of this data represents only one singing voice or one instrumental sound, although some of it contains multiple sounds, for example, singing while playing an instrument such as guitar or piano.

Fig. 5. The posts that are regarded as the same song because they have the same root node post.

All posts uploaded using Collab are related to the collaborated post. This relationship is represented by a tree structure in which each post is a node. When a post A exists and a post B collaborates with A, post A becomes the parent node and post B becomes the child node. Fig. 4 shows an example of the tree structure. The post that was posted first becomes the root node and is an accompaniment in most cases; in Fig. 4, the guitar post is the root node. Generally, every song tree is composed of singing voices and instrumental sounds related to one song. In almost all singing voice posts of each tree, the same song is sung, because the posts have been sung with the same accompaniment. Fig. 5 shows an example of singing voice posts regarded as the same song in a tree. We focus on this tree structure to extract such singing voice posts sung by many users for many-to-one singing voice conversion.

III. VOICE CONVERSION USING SINGING POST DATA

Voice conversion (VC) is a method of converting an input speaker's voice into various types of voices while keeping linguistic information unchanged. It is mainly used for speaker conversion. A typical VC framework uses a statistical approach [9], [10]. In statistical VC, parallel data sets, which consist of pairs of speech data from source and target speakers uttering the same sentences, are used for training models.
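The collection of same-song singing posts via the tree structure can be sketched in code. This is only an illustrative sketch: the `Post` class and its fields are invented for the example and are not nana's actual data model.

```python
# Hypothetical sketch of collecting same-song singing posts from a
# collaboration tree; the Post class and its fields are illustrative
# assumptions, not nana's actual schema.

class Post:
    def __init__(self, post_id, kind, parent=None):
        self.post_id = post_id
        self.kind = kind          # e.g. "vocal", "guitar", "piano"
        self.parent = parent      # collaborated (parent) post; None for root
        self.children = []        # posts that collaborated on this one
        if parent is not None:
            parent.children.append(self)

    def root(self):
        """Walk up to the root node (usually the accompaniment post)."""
        node = self
        while node.parent is not None:
            node = node.parent
        return node

def singing_posts_in_tree(any_post):
    """All vocal posts in the tree containing `any_post`.

    Because every tree generally corresponds to one song, these posts
    can be regarded as the same song sung by different users.
    """
    result, stack = [], [any_post.root()]
    while stack:
        node = stack.pop()
        if node.kind == "vocal":
            result.append(node)
        stack.extend(node.children)
    return result

# Example mirroring Fig. 4: a guitar root with vocals and an instrument
# collaborating on it.
guitar = Post(0, "guitar")
vocal1 = Post(1, "vocal", parent=guitar)
piano  = Post(2, "piano", parent=guitar)
vocal2 = Post(3, "vocal", parent=piano)
print(sorted(p.post_id for p in singing_posts_in_tree(vocal2)))  # [1, 3]
```

Starting from any post, walking up to the root and then traversing the whole tree yields every singing post that shares the same accompaniment.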
One conventional statistical VC method is based on a Gaussian mixture model (GMM) [11]. GMM-based VC represents the relationship between the acoustic features of a source speaker and those of a target speaker using a linear combination of multiple Gaussian distributions. A newer approach based on deep neural networks (DNNs) has been proposed [12], [13]; it can convert acoustic features with higher precision than the GMM-based approach. VC approaches are also distinguished by the number of source
Fig. 6. Overview of many-to-one singing VC.
Fig. 7. Overview of our VC system.
Fig. 8. The training data extraction method that we propose.

and target speakers. In addition, other approaches exist, such as singing VC; examples also include conversion of sexuality, age, and so on. We employ DNN-based many-to-one singing VC in our system, which converts an arbitrary singer's voice into a particular singer's voice. In many-to-one singing VC, the input singing of an arbitrary singer (source singer) is converted into the singing of a particular singer (target singer), as shown in Fig. 6. Therefore, the parallel data set has to consist of multiple source singers' voices and one target singer's voice.

Fig. 7 shows an overview of our VC system. In the training step, acoustic features are first extracted from the source and target data. Then, the time alignment between these feature sequences is obtained by dynamic time warping (DTW) [14]. Finally, the neural network conversion model is trained using the time-aligned acoustic feature sequences. In the conversion step, acoustic features extracted from the input data are converted by the trained model frame by frame. Then, the output singing voice is synthesized from the converted features using a vocoder.

We extracted a data set of many users singing the same song from nana's database and applied it to this VC system because it is suitable as a parallel data set. Although it is generally difficult to obtain the intended data from big data, such singing voice data is easily extracted from the database using the tree structure representing collaboration relationships (Fig. 5). However, not all of the extracted data necessarily contains the same phrases, because users can record and post arbitrary content.
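The frame-by-frame conversion step can be sketched as follows. The layer sizes match the architecture described in the experiments (3 hidden layers of 1024 units, 44 mel-cepstral coefficients), but the weights here are untrained toy values, so this only illustrates the shape of the pipeline, not the authors' trained model.

```python
# Minimal numpy sketch of the frame-by-frame conversion step: a trained
# feed-forward network maps each source mel-cepstral frame to a target
# frame. The random weights below are toy stand-ins for a trained model.
import numpy as np

rng = np.random.default_rng(0)
DIM = 44                      # 0th through 43rd mel-cepstral coefficients

def init_layer(n_in, n_out):
    return rng.standard_normal((n_in, n_out)) * 0.01, np.zeros(n_out)

# 3 hidden layers of 1024 units, as in the paper's architecture
layers = [init_layer(DIM, 1024), init_layer(1024, 1024),
          init_layer(1024, 1024), init_layer(1024, DIM)]

def convert(frames):
    """Convert a (T, DIM) sequence of source frames, one frame at a time."""
    h = frames
    for i, (W, b) in enumerate(layers):
        h = h @ W + b
        if i < len(layers) - 1:      # hidden layers apply a nonlinearity
            h = np.tanh(h)
    return h

converted = convert(rng.standard_normal((10, DIM)))
print(converted.shape)  # (10, 44)
```

Because the mapping is applied independently per frame, the converted sequence keeps the timing of the input; only the spectral content changes.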
For instance, there are some posts in which a singer harmonizes with another singer, and others in which a singer sings only the hook of a song. Hence, a method is needed to remove unsuitable data and create appropriate parallel data sets.

Fig. 9. Max distance of the matching path from the diagonal.
Fig. 10. Selection of training data considering segment length.

An approach employing dynamic programming (DP) matching to extract a parallel data set has been proposed, on the assumption that any two recordings of users singing the same song have a higher similarity than two recordings of users singing different songs [15]. DP matching is a classical elastic matching method, widely applied to pattern recognition tasks such as speech recognition [14] and character recognition [16]. It can dynamically match each vector of two vector sequences that have different lengths; the result is called a matching path. The accumulation of Euclidean distances between matched vectors is calculated simultaneously at the end of matching, and it indicates the similarity between the vector sequences. Therefore, it is possible to compare the similarities between two randomly selected posts in the database that have different lengths.

In the conventional method, a target post is first decided. Then, all the singing voice posts in the same tree are compared with the target post by DP matching, and the posts that have a small accumulated distance are extracted as the source data of a parallel data set. However, with this method, most singers of the selected source data would be similar to the target singer, because the accumulated distance depends on the similarity between the singers' voices as well as on what song was sung. In many-to-one singing VC, various types of voice data should be used as source training
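As a concrete illustration, DP matching between two feature sequences can be sketched as below. This is a textbook DTW-style implementation with the standard three step directions; the authors' exact step constraints and distance configuration are not given in the paper.

```python
# A compact DP-matching (DTW) sketch: align two feature sequences of
# different lengths, returning the matching path and the accumulated
# Euclidean distance that indicates their similarity.
import numpy as np

def dp_match(x, y):
    """x: (Tx, D), y: (Ty, D). Returns (path, accumulated_distance)."""
    tx, ty = len(x), len(y)
    d = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=-1)
    acc = np.full((tx, ty), np.inf)
    acc[0, 0] = d[0, 0]
    for i in range(tx):
        for j in range(ty):
            if i == j == 0:
                continue
            prev = min(acc[i-1, j] if i else np.inf,
                       acc[i, j-1] if j else np.inf,
                       acc[i-1, j-1] if i and j else np.inf)
            acc[i, j] = d[i, j] + prev
    # Backtrack the matching path from the end to the start.
    path, i, j = [(tx-1, ty-1)], tx-1, ty-1
    while (i, j) != (0, 0):
        cands = [(i-1, j-1), (i-1, j), (i, j-1)]
        i, j = min((c for c in cands if c[0] >= 0 and c[1] >= 0),
                   key=lambda c: acc[c])
        path.append((i, j))
    return path[::-1], acc[-1, -1]

a = np.array([[0.], [1.], [2.]])
b = np.array([[0.], [0.], [1.], [2.]])
path, dist = dp_match(a, b)
print(dist)   # 0.0 (identical contours match perfectly despite lengths)
```

The accumulated distance `dist` is the similarity score used by the conventional method; the proposed method instead inspects `path` itself.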
data to convert arbitrary singers' voices. Our method uses matching paths instead of the accumulated distance. In our method, the difference in utterance timing between matched frames is calculated from the matching path, and the pairs of frames whose calculated values are smaller than a threshold are extracted for training, based on the hypothesis that posts sung with the same accompaniment have small differences in the utterance timing of each phrase. When the conditions of this method are satisfied, the matching path is close to the diagonal, as shown in Fig. 8. We call every part of the matching path extracted with this method a segment. We expect our method to remove unsuitable data and create parallel data sets that consist of various types of voices.

In this method, two parameters have to be set. The first is the maximum distance between the matching path and the diagonal; we call this parameter the max distance (Fig. 9). Increasing this value increases the amount of data while degrading its quality. The second is the minimum segment length; we call this parameter the min seg-size. Fig. 10 shows an example of selection based on segment length. Increasing this value reduces the amount of data while improving its quality.

IV. EXPERIMENTS

Two experiments were conducted to evaluate the proposed method.

A. Experimental conditions

In this section, we show the common experimental conditions. We used data from 9 trees of songs A, B, ..., and I in nana's database. The target post data was the full-length main melody sung by one female singer. The source post data were randomly selected from each tree, including posts singing a backing chorus or singing only partly. Singing voice signals were sampled at 32 kHz, and acoustic features were extracted with a 5-ms shift. As acoustic features, the 0th through 43rd mel-cepstral coefficients were extracted from the smoothed spectrum analyzed by STRAIGHT [17]. The DNN used in this system was trained on mel-cepstral coefficients.
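The segment extraction with the two parameters can be sketched as follows. The function is an illustrative assumption; its thresholds are expressed in frames here, whereas the paper specifies them in seconds (with a 5-ms frame shift).

```python
# Hedged sketch of the proposed frame selection: keep only path points
# whose timing difference |i - j| stays within max_distance of the
# diagonal, and discard contiguous runs (segments) shorter than
# min_seg_size. Thresholds are in frames here; the paper sets them in
# seconds (5 ms per frame).

def extract_segments(path, max_distance, min_seg_size):
    """path: list of (i, j) pairs from DP matching. Returns kept segments."""
    segments, current = [], []
    for i, j in path:
        if abs(i - j) <= max_distance:     # close enough to the diagonal
            current.append((i, j))
        else:                              # path drifted: close the segment
            if len(current) >= min_seg_size:
                segments.append(current)
            current = []
    if len(current) >= min_seg_size:
        segments.append(current)
    return segments

# A toy matching path that drifts off the diagonal in the middle:
path = [(0, 0), (1, 1), (2, 2), (3, 6), (4, 7), (5, 5), (6, 6)]
segs = extract_segments(path, max_distance=1, min_seg_size=2)
print(segs)  # [[(0, 0), (1, 1), (2, 2)], [(5, 5), (6, 6)]]
```

Only the frame pairs inside the kept segments enter the parallel data set, so harmonizing or partly sung passages, whose paths wander far from the diagonal, are dropped.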
The architecture of the DNN was a 3-hidden-layer feed-forward neural network with 1024 units per hidden layer. The features were normalized to zero mean and unit variance. The mel-cepstral distortion between the target and the converted mel-cepstra was used as the objective evaluation measure, which is defined as

Mel-CD = (10 / ln 10) · sqrt( 2 Σ_{d=1}^{D} (c_d^{(1)} − c_d^{(2)})² ),  (1)

where c_d^{(1)} and c_d^{(2)} are the d-th coefficients of the target and the converted mel-cepstra, respectively.

B. Closed test for parameter consideration

This experiment was carried out to determine the two parameters: the max distance (the maximum distance between the matching path and the diagonal) and the min seg-size (the minimum segment length). Combinations of the parameters were compared based on the mel-cepstral distortion. From the 9 trees, 8 trees (songs A, B, ..., and H) were selected. Then, 50 source posts and 1 target post were selected from each tree as the training data; in total, there were 400 source posts and 8 target posts.

Tables I and II show the experimental results: the Mel-CD and the percentage of extracted frames out of all frames. In Table I, the best Mel-CD value in every column is boldface, and the best value in the table is underlined; in Table II, the corresponding cells are boldface and underlined. The value "-" means that no frames were extracted to train the model because no matching path satisfied the conditions. Although more frames were used for training in the lower-left side of Table II, there are cells with smaller Mel-CD values near the diagonal in Table I. This is because there is a trade-off between the quantity and the quality of the extracted data. The quality of the data improved toward the upper-right side of the table, where the values of max distance are smaller and the values of min seg-size are larger. These results indicate that our method with optimum parameters improves conversion accuracy.

C.
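Eq. (1) can be computed per frame and averaged over an utterance, as in this sketch. Averaging over frames is an assumption here, since the paper does not state how frame-level distortions are aggregated, and whether the 0th coefficient is included in the sum is likewise not specified.

```python
# Sketch of the objective measure in Eq. (1): mel-cepstral distortion
# between target and converted mel-cepstra, in dB, averaged over frames.
# Mel-CD commonly excludes the 0th (energy) coefficient; whether the
# paper does so is not stated.
import math

def mel_cd(target, converted):
    """target, converted: lists of frames; each frame is a list of
    mel-cepstral coefficients c_1..c_D."""
    total = 0.0
    for c1, c2 in zip(target, converted):
        sq = sum((a - b) ** 2 for a, b in zip(c1, c2))
        total += (10.0 / math.log(10)) * math.sqrt(2.0 * sq)
    return total / len(target)

frame_t = [1.0, 0.5, -0.2]
frame_c = [1.0, 0.5, -0.2]
print(mel_cd([frame_t], [frame_c]))  # 0.0 for identical frames
```

Lower values indicate that the converted spectra are closer to the target, which is why smaller Mel-CD is better throughout Tables I and II.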
Open test

In this experiment, four models were trained using the singing data of different numbers of collaboration trees. They

TABLE I. MEL-CD [dB] for each combination of min seg-size (s) and max distance (s); a baseline without the proposed method is also shown. [Table values were lost in extraction.]

TABLE II. THE RATIO OF THE NUMBER OF FRAMES USED FOR TRAINING (%) for each combination of min seg-size (s) and max distance (s). The total number of frames: 12,247,
Fig. 11. Mel-CD [dB] (open test).

were compared on open data using the best parameters of our method from the previous experiment:
Max distance: 0.05 seconds
Min seg-size: 0.1 seconds
The four models were trained using the post data in 1 (song A), 2 (songs A and B), 4 (songs A, B, C, and D), and 8 (songs A, B, ..., and H) trees, respectively. Four hundred posts sung by source singers were randomly selected from the tree(s) for all models as the source singers' training data, and one post was selected from each tree as the target singer's training data. The test data set was composed of the post data in the tree of song I, which included 18 source singers' data and one target singer's data.

Fig. 11 shows the results. Increasing the number of trees mostly caused a decrease in mel-cepstral distortion because the diversity of the training data improved. However, the model using 4 trees outperformed the model using 8 trees. It is assumed that each tree has different suitable parameters; the parameters suitable for each tree may therefore differ more widely across the 8 trees than across the 4 trees.

V. CONCLUSIONS

Using data posted to social media, we proposed a method of extracting training data that can be used for many-to-one singing voice conversion. For training, we used pairs of matched frames that have small differences in utterance timing, on the assumption that posts sung with the same accompaniment would have little difference in utterance timing. Experimental results showed that setting the two parameters appropriately (the maximum distance between the matching path and the diagonal, and the minimum segment length) while considering the trade-off between the quantity and the quality of the training data improved the objective evaluation measure. Increasing the number of trees used for training data enabled accurate conversion of songs that were not used for training.
Future work includes researching proper parameters based on various elements (such as tempo), applying different parameters for each song, and subjective evaluation.

VI. ACKNOWLEDGMENT

This research was supported by nana music, Inc.

REFERENCES
[1] J. Yin, W. Lo, and Z. Wu, "From Big Data to Great Services," 2016 IEEE International Congress on Big Data (BigData Congress), pp. .
[2] B. Li, T. Sainath, A. Narayanan, J. Caroselli, M. Bacchiani, A. Misra, I. Shafran, H. Sak, G. Pundak, K. Chin, K.-C. Sim, R. Weiss, K. Wilson, E. Variani, C. Kim, O. Siohan, M. Weintraub, E. McDermott, R. Rose, and M. Shannon, "Acoustic Modeling for Google Home," INTERSPEECH 2017, Aug. 2017, pp. .
[3] P. Bell, M. J. F. Gales, T. Hain, J. Kilgour, P. Lanchantin, X. Liu, A. McParland, S. Renals, O. Saz, M. Wester, and P. C. Woodland, "The MGB Challenge: Evaluating Multi-Genre Broadcast media recognition," IEEE Automatic Speech Recognition and Understanding Workshop.
[4] P. Karanasou, M. J. F. Gales, P. Lanchantin, X. Liu, Y. Qian, L. Wang, P. C. Woodland, and C. Zhang, "Speaker diarisation and longitudinal linking in multi-genre broadcast data," IEEE Automatic Speech Recognition and Understanding Workshop.
[5] J. Villalba, A. Ortega, A. Miguel, and L. Lleida, "Variational Bayesian PLDA for speaker diarization in the MGB Challenge," IEEE Automatic Speech Recognition and Understanding Workshop.
[6] P. C. Woodland, X. Liu, Y. Qian, C. Zhang, M. J. F. Gales, P. Karanasou, P. Lanchantin, and L. Wang, "Cambridge University transcription systems for the Multi-Genre Broadcast Challenge," IEEE Automatic Speech Recognition and Understanding Workshop.
[7] O. Saz, M. Doulaty, S. Deena, R. Milner, R. Ng, M. Hasan, Y. Liu, and T. Hain, "The 2015 Sheffield system for transcription of multi-genre broadcast media," IEEE Automatic Speech Recognition and Understanding Workshop.
[8] nana, (2018)
[9] T. Toda, A. W. Black, and K. Tokuda, "Spectral Conversion Based on Maximum Likelihood Estimation Considering Global Variance of Converted Parameter," ICASSP 2005, 2005.
[10] T. Toda, A. W.
Black, and K. Tokuda, "Voice Conversion Based on Maximum-Likelihood Estimation of Spectral Parameter Trajectory," IEEE Transactions on Audio, Speech, and Language Processing, vol. 15, no. 8.
[11] Y. Stylianou, O. Cappe, and E. Moulines, "Continuous Probabilistic Transform for Voice Conversion," IEEE Trans. Speech and Audio Processing, vol. 6, pp. .
[12] S. Desai, E. V. Raghavendra, B. Yegnanarayana, A. W. Black, and K. Prahallad, "Voice conversion using artificial neural networks," Proceedings of ICASSP 2009, pp. .
[13] N. Hosaka, K. Hashimoto, K. Oura, Y. Nankaku, and K. Tokuda, "Voice Conversion Based on Trajectory Model Training of Neural Networks Considering Global Variance," Interspeech 2016.
[14] H. Sakoe and S. Chiba, "Dynamic programming algorithm optimization for spoken word recognition," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 26, no. 1, pp. .
[15] Y. Hono, K. Sawada, K. Hashimoto, K. Oura, Y. Nankaku, K. Tokuda, D. Kono, and D. Ishikawa, "Singing voice conversion using post data in music SNS," Proc. of Acoustical Society of Japan Autumn Meeting, pp. , 2017 (in Japanese).
[16] K. Yoshida and H. Sakoe, "Online Handwritten Character Recognition for a Personal Computer System," IEEE Transactions on Consumer Electronics, vol. CE-28, no. 3, pp. .
[17] H. Kawahara, I. Masuda-Katsuse, and A. de Cheveigné, "Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds," Speech Communication, vol. 27, no. 3-4, pp. .
Singing voice synthesis based on deep neural networks
INTERSPEECH 2016 September 8 12, 2016, San Francisco, USA Singing voice synthesis based on deep neural networks Masanari Nishimura, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, and Keiichi Tokuda
More informationPromises and challenges of electronic journals 169. Heting Chu Palmer School of Library & Information Science, Long Island University, NY, USA
Promises an challenges of electronic journals 169 Learne Publishing (1999)13, 169 175 Introuction Rapi avancement of information technologies, incluing the internet an igitizing techniques, means that
More informationPerceptual Quantiser (PQ) to Hybrid Log-Gamma (HLG) Transcoding
Perceptual Quantiser (PQ) to Hybri Log-Gamma (HLG) Transcoing Part of the HR-TV series. Last upate June 07. Introuction This ocument escribes the transcoe process between PQ an HLG where the isplay brightness
More informationA DISPLAY INDEPENDENT HIGH DYNAMIC RANGE TELEVISION SYSTEM
A DISPLAY INDEPENDENT HIGH DYNAMIC RANGE TELEVISION SYSTEM T. Borer an A. Cotton BBC R&D, 56 Woo Lane, Lonon, W12 7SB, UK ABSTRACT High Dynamic Range (HDR) television has capture the imagination of the
More informationAutomatic Laughter Detection
Automatic Laughter Detection Mary Knox Final Project (EECS 94) knoxm@eecs.berkeley.edu December 1, 006 1 Introduction Laughter is a powerful cue in communication. It communicates to listeners the emotional
More informationOBJECTIVE EVALUATION OF A MELODY EXTRACTOR FOR NORTH INDIAN CLASSICAL VOCAL PERFORMANCES
OBJECTIVE EVALUATION OF A MELODY EXTRACTOR FOR NORTH INDIAN CLASSICAL VOCAL PERFORMANCES Vishweshwara Rao and Preeti Rao Digital Audio Processing Lab, Electrical Engineering Department, IIT-Bombay, Powai,
More informationAutomatic Rhythmic Notation from Single Voice Audio Sources
Automatic Rhythmic Notation from Single Voice Audio Sources Jack O Reilly, Shashwat Udit Introduction In this project we used machine learning technique to make estimations of rhythmic notation of a sung
More informationDXR.1 Digital Audio Codec
DXR.1 Digital Auio Coec SECTION 1...INTRODUCTION... 3...DIGITAL SERVICES... 3...WHAT COMES WITH THE DXR.1?... 3 2...SETUP... 4...DATA CONNECTION... 4...POWER CONNECTION... 4...AUDIO CONNECTIONS... 5...CONTACT
More informationSinger Traits Identification using Deep Neural Network
Singer Traits Identification using Deep Neural Network Zhengshan Shi Center for Computer Research in Music and Acoustics Stanford University kittyshi@stanford.edu Abstract The author investigates automatic
More informationAudio-Based Video Editing with Two-Channel Microphone
Audio-Based Video Editing with Two-Channel Microphone Tetsuya Takiguchi Organization of Advanced Science and Technology Kobe University, Japan takigu@kobe-u.ac.jp Yasuo Ariki Organization of Advanced Science
More informationSinger Identification
Singer Identification Bertrand SCHERRER McGill University March 15, 2007 Bertrand SCHERRER (McGill University) Singer Identification March 15, 2007 1 / 27 Outline 1 Introduction Applications Challenges
More informationChord Classification of an Audio Signal using Artificial Neural Network
Chord Classification of an Audio Signal using Artificial Neural Network Ronesh Shrestha Student, Department of Electrical and Electronic Engineering, Kathmandu University, Dhulikhel, Nepal ---------------------------------------------------------------------***---------------------------------------------------------------------
More informationA Comparative Study of Spectral Transformation Techniques for Singing Voice Synthesis
INTERSPEECH 2014 A Comparative Study of Spectral Transformation Techniques for Singing Voice Synthesis S. W. Lee 1, Zhizheng Wu 2, Minghui Dong 1, Xiaohai Tian 2, and Haizhou Li 1,2 1 Human Language Technology
More informationAutomatic Laughter Detection
Automatic Laughter Detection Mary Knox 1803707 knoxm@eecs.berkeley.edu December 1, 006 Abstract We built a system to automatically detect laughter from acoustic features of audio. To implement the system,
More informationLife Science Journal 2014;11(6)
A Stuy of Joranians Television Viewers Habits Hani H. Al-Dmour, Muhamma Alshurieh 2, Sa'a Salehih 3. Marketing Department Faculty of Business, The University of Joran. Amman Joran, E-mail: mourn@ju.eu.jo
More informationOutline. Introduction to number systems: sign/magnitude, ones complement, twos complement Review of latches, flip flops, counters
Outline Last time: Introuction to number systems: sign/magnitue, ones complement, twos complement Review of latches, flip flops, counters This lecture: Review Tables & Transition Diagrams Implementation
More informationSubjective Similarity of Music: Data Collection for Individuality Analysis
Subjective Similarity of Music: Data Collection for Individuality Analysis Shota Kawabuchi and Chiyomi Miyajima and Norihide Kitaoka and Kazuya Takeda Nagoya University, Nagoya, Japan E-mail: shota.kawabuchi@g.sp.m.is.nagoya-u.ac.jp
More informationAdvanced Signal Processing 2
Advanced Signal Processing 2 Synthesis of Singing 1 Outline Features and requirements of signing synthesizers HMM based synthesis of singing Articulatory synthesis of singing Examples 2 Requirements of
More informationDrum Sound Identification for Polyphonic Music Using Template Adaptation and Matching Methods
Drum Sound Identification for Polyphonic Music Using Template Adaptation and Matching Methods Kazuyoshi Yoshii, Masataka Goto and Hiroshi G. Okuno Department of Intelligence Science and Technology National
More informationInstrument Recognition in Polyphonic Mixtures Using Spectral Envelopes
Instrument Recognition in Polyphonic Mixtures Using Spectral Envelopes hello Jay Biernat Third author University of Rochester University of Rochester Affiliation3 words jbiernat@ur.rochester.edu author3@ismir.edu
More informationLab 3 : CMOS Sequential Logic Gates
CARLETON UNIERSITY epartment of Electronics ELEC-3500 igital Electronics Januar 20, 2004 Lab 3 : CMOS Seuential Logic Gates esign an Specification of Seuential Logic Gates an Librar Cell igital circuits
More informationPOST-PROCESSING FIDDLE : A REAL-TIME MULTI-PITCH TRACKING TECHNIQUE USING HARMONIC PARTIAL SUBTRACTION FOR USE WITHIN LIVE PERFORMANCE SYSTEMS
POST-PROCESSING FIDDLE : A REAL-TIME MULTI-PITCH TRACKING TECHNIQUE USING HARMONIC PARTIAL SUBTRACTION FOR USE WITHIN LIVE PERFORMANCE SYSTEMS Andrew N. Robertson, Mark D. Plumbley Centre for Digital Music
More informationA CLASSIFICATION APPROACH TO MELODY TRANSCRIPTION
A CLASSIFICATION APPROACH TO MELODY TRANSCRIPTION Graham E. Poliner and Daniel P.W. Ellis LabROSA, Dept. of Electrical Engineering Columbia University, New York NY 127 USA {graham,dpwe}@ee.columbia.edu
More informationOn Human Capability and Acoustic Cues for Discriminating Singing and Speaking Voices
On Human Capability and Acoustic Cues for Discriminating Singing and Speaking Voices Yasunori Ohishi 1 Masataka Goto 3 Katunobu Itou 2 Kazuya Takeda 1 1 Graduate School of Information Science, Nagoya University,
More information1. Introduction NCMMSC2009
NCMMSC9 Speech-to-Singing Synthesis System: Vocal Conversion from Speaking Voices to Singing Voices by Controlling Acoustic Features Unique to Singing Voices * Takeshi SAITOU 1, Masataka GOTO 1, Masashi
More informationJAMIA. Information Information for Authors
102 2005 Information for Authors Information JAMIA for Authors The Journal of the American Meical Informatics Association (JAMIA) enorses an recommens the guielines publishe as Uniform Requirements for
More informationA Music Retrieval System Using Melody and Lyric
202 IEEE International Conference on Multimedia and Expo Workshops A Music Retrieval System Using Melody and Lyric Zhiyuan Guo, Qiang Wang, Gang Liu, Jun Guo, Yueming Lu 2 Pattern Recognition and Intelligent
More informationTowards Complexity Studies of Indonesian Songs
Towars Complexity Stuies of Inonesian Songs Hokky Situngkir [hs@compsoc.banungfe.net] Dept. Computational Sociology Banung Fe Institute Research Fellow Surya Research International August 8 th 2007 Abstract
More informationEfficient Computer-Aided Pitch Track and Note Estimation for Scientific Applications. Matthias Mauch Chris Cannam György Fazekas
Efficient Computer-Aided Pitch Track and Note Estimation for Scientific Applications Matthias Mauch Chris Cannam György Fazekas! 1 Matthias Mauch, Chris Cannam, George Fazekas Problem Intonation in Unaccompanied
More informationTHE importance of music content analysis for musical
IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 15, NO. 1, JANUARY 2007 333 Drum Sound Recognition for Polyphonic Audio Signals by Adaptation and Matching of Spectrogram Templates With
More informationSpeech and Speaker Recognition for the Command of an Industrial Robot
Speech and Speaker Recognition for the Command of an Industrial Robot CLAUDIA MOISA*, HELGA SILAGHI*, ANDREI SILAGHI** *Dept. of Electric Drives and Automation University of Oradea University Street, nr.
More informationMusic Genre Classification and Variance Comparison on Number of Genres
Music Genre Classification and Variance Comparison on Number of Genres Miguel Francisco, miguelf@stanford.edu Dong Myung Kim, dmk8265@stanford.edu 1 Abstract In this project we apply machine learning techniques
Improving Frame Based Automatic Laughter Detection
Improving Frame Based Automatic Laughter Detection Mary Knox EE225D Class Project knoxm@eecs.berkeley.edu December 13, 2007 Abstract Laughter recognition is an underexplored area of research. My goal for
19th INTERNATIONAL CONGRESS ON ACOUSTICS MADRID, 2-7 SEPTEMBER 2007
19 th INTERNATIONAL CONGRESS ON ACOUSTICS MADRID, 2-7 SEPTEMBER 2007 AN HMM BASED INVESTIGATION OF DIFFERENCES BETWEEN MUSICAL INSTRUMENTS OF THE SAME TYPE PACS: 43.75.-z Eichner, Matthias; Wolff, Matthias;
An Efficient Low Bit-Rate Video-Coding Algorithm Focusing on Moving Regions
1128 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 11, NO. 10, OCTOBER 2001 An Efficient Low Bit-Rate Video-Coding Algorithm Focusing on Moving Regions Kwok-Wai Wong, Kin-Man Lam,
ISSN ICIRET-2014
Robust Multilingual Voice Biometrics using Optimum Frames Kala A 1, Anu Infancia J 2, Pradeepa Natarajan 3 1,2 PG Scholar, SNS College of Technology, Coimbatore-641035, India 3 Assistant Professor, SNS
SINGING COMPANION LESSON BOOK
SINGING COMPANION LESSON BOOK Name: 36 COMPREHENSIVE LESSONS from Malovance, Wieneke, Melodia and Burgmayer CURWEN HAND SIGNS The application of solfeggio is best reinforced by using the Curwen hand signs
International Journal of Advance Engineering and Research Development MUSICAL INSTRUMENT IDENTIFICATION AND STATUS FINDING WITH MFCC
Scientific Journal of Impact Factor (SJIF): 5.71 International Journal of Advance Engineering and Research Development Volume 5, Issue 04, April -2018 e-issn (O): 2348-4470 p-issn (P): 2348-6406 MUSICAL
A Study of Synchronization of Audio Data with Symbolic Data. Music254 Project Report Spring 2007 SongHui Chon
A Study of Synchronization of Audio Data with Symbolic Data Music254 Project Report Spring 2007 SongHui Chon Abstract This paper provides an overview of the problem of audio and symbolic synchronization.
Expressive Singing Synthesis based on Unit Selection for the Singing Synthesis Challenge 2016
Expressive Singing Synthesis based on Unit Selection for the Singing Synthesis Challenge 2016 Jordi Bonada, Martí Umbert, Merlijn Blaauw Music Technology Group, Universitat Pompeu Fabra, Spain jordi.bonada@upf.edu,
EXPLORING THE USE OF ENF FOR MULTIMEDIA SYNCHRONIZATION
EXPLORING THE USE OF ENF FOR MULTIMEDIA SYNCHRONIZATION Hui Su, Adi Hajj-Ahmad, Min Wu, and Douglas W. Oard {hsu, adiha, minwu, oard}@umd.edu University of Maryland, College Park ABSTRACT The electric
Christine Baldwin Project Manager, SuperJournal David Pullinger Project Director, SuperJournal
What readers value in academic journals. Learned Publishing (2000) 13, 229-239. Introduction: SuperJournal 1,2 was a research project in the Electronic Libraries (elib) programme 3 that examined how readers
INTER GENRE SIMILARITY MODELLING FOR AUTOMATIC MUSIC GENRE CLASSIFICATION
INTER GENRE SIMILARITY MODELLING FOR AUTOMATIC MUSIC GENRE CLASSIFICATION ULAŞ BAĞCI AND ENGIN ERZIN arxiv:0907.3220v1 [cs.sd] 18 Jul 2009 ABSTRACT. Music genre classification is an essential tool for
Lecture Notes 12: Digital Cellular Communications
Lecture Notes 12: Digital Cellular Communications. Consider a cellular communications system with hexagonal cells, each containing a base station and a number of mobile units. Figure 5: Cellular Communication
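The hexagonal-cell layout the lecture excerpt describes can be illustrated with the standard co-channel reuse relation from cellular theory (a minimal sketch under that standard result, not quoted from these lecture notes): for a cluster of N hexagonal cells, the reuse distance D relative to the cell radius R satisfies D/R = sqrt(3N).

```python
import math

# Standard hexagonal-geometry result (an assumption here, not quoted from
# the notes): co-channel reuse distance D over cell radius R is sqrt(3 * N)
# for a cluster of N hexagonal cells.
def reuse_ratio(n_cluster: int) -> float:
    return math.sqrt(3 * n_cluster)

# Common cluster sizes and their reuse ratios.
for n in (3, 4, 7):
    print(f"N={n}: D/R = {reuse_ratio(n):.2f}")
```

Larger clusters push co-channel cells farther apart, trading spectral capacity for less interference.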
2. AN INTROSPECTION OF THE MORPHING PROCESS
1. INTRODUCTION Voice morphing means the transition of one speech signal into another. Like image morphing, speech morphing aims to preserve the shared characteristics of the starting and final signals,
A QUERY BY EXAMPLE MUSIC RETRIEVAL ALGORITHM
A QUERY BY EXAMPLE MUSIC RETRIEVAL ALGORITHM H. HARB AND L. CHEN Maths-Info department, Ecole Centrale de Lyon. 36, av. Guy de Collongue, 69134, Ecully, France, EUROPE E-mail: {hadi.harb, liming.chen}@ec-lyon.fr
Acoustic Scene Classification
Acoustic Scene Classification Marc-Christoph Gerasch Seminar Topics in Computer Music - Acoustic Scene Classification 6/24/2015 1 Outline Acoustic Scene Classification - definition History and state of
Experiments on musical instrument separation using multiplecause
Experiments on musical instrument separation using multiplecause models J Klingseisen and M D Plumbley* Department of Electronic Engineering King's College London * - Corresponding Author - mark.plumbley@kcl.ac.uk
AUTOREGRESSIVE MFCC MODELS FOR GENRE CLASSIFICATION IMPROVED BY HARMONIC-PERCUSSION SEPARATION
AUTOREGRESSIVE MFCC MODELS FOR GENRE CLASSIFICATION IMPROVED BY HARMONIC-PERCUSSION SEPARATION Halfdan Rump, Shigeki Miyabe, Emiru Tsunoo, Nobutaka Ono, Shigeki Sagayama The University of Tokyo, Graduate
Music Information Retrieval Community
Music Information Retrieval Community What: Developing systems that retrieve music When: Late 1990 s to Present Where: ISMIR - conference started in 2000 Why: lots of digital music, lots of music lovers,
Robert Alexandru Dobre, Cristian Negrescu
ECAI 2016 - International Conference 8th Edition Electronics, Computers and Artificial Intelligence 30 June -02 July, 2016, Ploiesti, ROMÂNIA Automatic Music Transcription Software Based on Constant Q
AUTOMATIC TIMBRE CLASSIFICATION OF ETHNOMUSICOLOGICAL AUDIO RECORDINGS
AUTOMATIC TIMBRE CLASSIFICATION OF ETHNOMUSICOLOGICAL AUDIO RECORDINGS Dominique Fourer, Jean-Luc Rouas, Pierre Hanna, Matthias Robine LaBRI - CNRS UMR 5800 - University of Bordeaux {fourer, rouas, hanna,
Gaussian Mixture Model for Singing Voice Separation from Stereophonic Music
Gaussian Mixture Model for Singing Voice Separation from Stereophonic Music Minje Kim, Seungkwon Beack, Keunwoo Choi, and Kyeongok Kang Realistic Acoustics Research Team, Electronics and Telecommunications
Keywords Separation of sound, percussive instruments, non-percussive instruments, flexible audio source separation toolbox
Volume 4, Issue 4, April 2014 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Investigation
SYNTHESIS FROM MUSICAL INSTRUMENT CHARACTER MAPS
Published by Institute of Electrical Engineers (IEE). 1998 IEE, Paul Masri, Nishan Canagarajah Colloquium on "Audio and Music Technology"; November 1998, London. Digest No. 98/470 SYNTHESIS FROM MUSICAL
Classification of Musical Instruments sounds by Using MFCC and Timbral Audio Descriptors
Classification of Musical Instruments sounds by Using MFCC and Timbral Audio Descriptors Priyanka S. Jadhav M.E. (Computer Engineering) G. H. Raisoni College of Engg. & Mgmt. Wagholi, Pune, India E-mail:
Automatic Piano Music Transcription
Automatic Piano Music Transcription Jianyu Fan Qiuhan Wang Xin Li Jianyu.Fan.Gr@dartmouth.edu Qiuhan.Wang.Gr@dartmouth.edu Xi.Li.Gr@dartmouth.edu 1. Introduction Writing down the score while listening
MODELS of music begin with a representation of the
602 IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 18, NO. 3, MARCH 2010 Modeling Music as a Dynamic Texture Luke Barrington, Student Member, IEEE, Antoni B. Chan, Member, IEEE, and
Music Source Separation
Music Source Separation Hao-Wei Tseng Electrical and Engineering System University of Michigan Ann Arbor, Michigan Email: blakesen@umich.edu Abstract In popular music, a cover version or cover song, or
Retrieval of textual song lyrics from sung inputs
INTERSPEECH 2016 September 8 12, 2016, San Francisco, USA Retrieval of textual song lyrics from sung inputs Anna M. Kruspe Fraunhofer IDMT, Ilmenau, Germany kpe@idmt.fraunhofer.de Abstract Retrieving the
Topics in Computer Music Instrument Identification. Ioanna Karydi
Topics in Computer Music Instrument Identification Ioanna Karydi Presentation overview What is instrument identification? Sound attributes & Timbre Human performance The ideal algorithm Selected approaches
Computer Organization
Computer Organization Douglas Comer Computer Science Department Purdue University 25 N. University Street West Lafayette, IN 4797-266 http://www.cs.purdue.edu/people/comer Copyright 26. All rights reserved.
hit), and assume that longer incidental sounds (forest noise, water, wind noise) resemble a Gaussian noise distribution.
CS 229 FINAL PROJECT A SOUNDHOUND FOR THE SOUNDS OF HOUNDS WEAKLY SUPERVISED MODELING OF ANIMAL SOUNDS ROBERT COLCORD, ETHAN GELLER, MATTHEW HORTON Abstract: We propose a hybrid approach to generating
An Efficient Test Pattern Generator -Mersenne Twister-
R1-12 SASIMI 2013 Proceedings An Efficient Test Pattern Generator -Mersenne Twister- Hiroshi Iwata Sayaka Satonaka Ken'ichi Yamaguchi Department of Information Engineering, Faculty of Advanced Engineering Nara
TOWARD AN INTELLIGENT EDITOR FOR JAZZ MUSIC
TOWARD AN INTELLIGENT EDITOR FOR JAZZ MUSIC G.TZANETAKIS, N.HU, AND R.B. DANNENBERG Computer Science Department, Carnegie Mellon University 5000 Forbes Avenue, Pittsburgh, PA 15213, USA E-mail: gtzan@cs.cmu.edu
AUDIO KEY LINKS: PLAYBACK DEVICES IMPROVEMENT IST PRESTO Preservation Technologies for European Broadcast Archives
PRESTO Preservation Technologies for European Broadcast Archives IST-1999-20013 AUDIO KEY LINKS: PLAYBACK DEVICES IMPROVEMENT Authors: Daniele AIROLA, Salvatore CANGIALOSI and Giorgio Dimino (RAI)
On human capability and acoustic cues for discriminating singing and speaking voices
Alma Mater Studiorum University of Bologna, August 22-26 2006 On human capability and acoustic cues for discriminating singing and speaking voices Yasunori Ohishi Graduate School of Information Science,
FULL-AUTOMATIC DJ MIXING SYSTEM WITH OPTIMAL TEMPO ADJUSTMENT BASED ON MEASUREMENT FUNCTION OF USER DISCOMFORT
10th International Society for Music Information Retrieval Conference (ISMIR 2009) FULL-AUTOMATIC DJ MIXING SYSTEM WITH OPTIMAL TEMPO ADJUSTMENT BASED ON MEASUREMENT FUNCTION OF USER DISCOMFORT Hiromi
Voice & Music Pattern Extraction: A Review
Voice & Music Pattern Extraction: A Review 1 Pooja Gautam 1 and B S Kaushik 2 Electronics & Telecommunication Department RCET, Bhilai, Bhilai (C.G.) India pooja0309pari@gmail.com 2 Electrical & Instrumentation
Music Recommendation from Song Sets
Music Recommendation from Song Sets Beth Logan Cambridge Research Laboratory HP Laboratories Cambridge HPL-2004-148 August 30, 2004* E-mail: Beth.Logan@hp.com music analysis, information retrieval, multimedia
The Ukulele Circle of Fifths - Song Structure Lesson
The Ukulele Circle of Fifths - Song Structure Lesson You will learn: How the circle of fifths is constructed. How the circle of fifths helps you understand the structure of a song. How to use the circle of
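The construction the lesson excerpt mentions can be sketched in a few lines (a minimal illustration, not taken from the lesson book; the flat spellings such as Gb are a convention chosen for this sketch):

```python
# Chromatic scale with flat spellings (a chosen convention for this sketch).
NOTES = ["C", "Db", "D", "Eb", "E", "F", "Gb", "G", "Ab", "A", "Bb", "B"]

def circle_of_fifths(start: str = "C") -> list:
    """Step a perfect fifth (7 semitones) twelve times from the start note."""
    idx = NOTES.index(start)
    return [NOTES[(idx + 7 * i) % 12] for i in range(12)]

print(circle_of_fifths())
# ['C', 'G', 'D', 'A', 'E', 'B', 'Gb', 'Db', 'Ab', 'Eb', 'Bb', 'F']
```

Because 7 and 12 are coprime, stepping by fifths visits all twelve pitch classes before returning to the start.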
MUSICAL INSTRUMENT IDENTIFICATION BASED ON HARMONIC TEMPORAL TIMBRE FEATURES
MUSICAL INSTRUMENT IDENTIFICATION BASED ON HARMONIC TEMPORAL TIMBRE FEATURES Jun Wu, Yu Kitano, Stanislaw Andrzej Raczynski, Shigeki Miyabe, Takuya Nishimoto, Nobutaka Ono and Shigeki Sagayama The Graduate
AUTOMATIC IDENTIFICATION FOR SINGING STYLE BASED ON SUNG MELODIC CONTOUR CHARACTERIZED IN PHASE PLANE
10th International Society for Music Information Retrieval Conference (ISMIR 2009) AUTOMATIC IDENTIFICATION FOR SINGING STYLE BASED ON SUNG MELODIC CONTOUR CHARACTERIZED IN PHASE PLANE Tatsuya Kako, Yasunori
By Jon R. Davids, MD, Daniel M. Weigl, MD, Joye P. Edmonds, MLIS, AHIP, and Dawn W. Blackhurst, DrPH
COPYRIGHT © 2010 BY THE JOURNAL OF BONE AND JOINT SURGERY, INCORPORATED Reference Accuracy in Peer-Reviewed Pediatric Orthopaedic Literature By Jon R. Davids, MD, Daniel M. Weigl, MD, Joye P. Edmonds, MLIS,
Automatic Extraction of Popular Music Ringtones Based on Music Structure Analysis
Automatic Extraction of Popular Music Ringtones Based on Music Structure Analysis Fengyan Wu fengyanyy@163.com Shutao Sun stsun@cuc.edu.cn Weiyao Xue Wyxue_std@163.com Abstract Automatic extraction of
CS229 Project Report Polyphonic Piano Transcription
CS229 Project Report Polyphonic Piano Transcription Mohammad Sadegh Ebrahimi Stanford University Jean-Baptiste Boin Stanford University sadegh@stanford.edu jbboin@stanford.edu 1. Introduction In this project
Research Article. ISSN (Print) *Corresponding author Shireen Fathima
Scholars Journal of Engineering and Technology (SJET) Sch. J. Eng. Tech., 2014; 2(4C):613-620 Scholars Academic and Scientific Publisher (An International Publisher for Academic and Scientific Resources)
A System for Automatic Chord Transcription from Audio Using Genre-Specific Hidden Markov Models
A System for Automatic Chord Transcription from Audio Using Genre-Specific Hidden Markov Models Kyogu Lee Center for Computer Research in Music and Acoustics Stanford University, Stanford CA 94305, USA
Tempo and Beat Analysis
Advanced Course Computer Science Music Processing Summer Term 2010 Meinard Müller, Peter Grosche Saarland University and MPI Informatik meinard@mpi-inf.mpg.de Tempo and Beat Analysis Musical Properties:
GYROPHONE RECOGNIZING SPEECH FROM GYROSCOPE SIGNALS. Yan Michalevsky (1), Gabi Nakibly (2) and Dan Boneh (1)
GYROPHONE RECOGNIZING SPEECH FROM GYROSCOPE SIGNALS Yan Michalevsky (1), Gabi Nakibly (2) and Dan Boneh (1) (1) Stanford University (2) National Research and Simulation Center, Rafael Ltd. 0 MICROPHONE
Singer Recognition and Modeling Singer Error
Singer Recognition and Modeling Singer Error Johan Ismael Stanford University jismael@stanford.edu Nicholas McGee Stanford University ndmcgee@stanford.edu 1. Abstract We propose a system for recognizing
MUSICAL INSTRUMENT RECOGNITION WITH WAVELET ENVELOPES
MUSICAL INSTRUMENT RECOGNITION WITH WAVELET ENVELOPES PACS: 43.60.Lq Hacihabiboglu, Huseyin 1,2 ; Canagarajah C. Nishan 2 1 Sonic Arts Research Centre (SARC) School of Computer Science Queen s University
Application Of Missing Feature Theory To The Recognition Of Musical Instruments In Polyphonic Audio
Application Of Missing Feature Theory To The Recognition Of Musical Instruments In Polyphonic Audio Jana Eggink and Guy J. Brown Department of Computer Science, University of Sheffield Regent Court, 11
Statistical Modeling and Retrieval of Polyphonic Music
Statistical Modeling and Retrieval of Polyphonic Music Erdem Unal Panayiotis G. Georgiou and Shrikanth S. Narayanan Speech Analysis and Interpretation Laboratory University of Southern California Los Angeles,
A Bootstrap Method for Training an Accurate Audio Segmenter
A Bootstrap Method for Training an Accurate Audio Segmenter Ning Hu and Roger B. Dannenberg Computer Science Department Carnegie Mellon University 5000 Forbes Ave Pittsburgh, PA 15213 {ninghu,rbd}@cs.cmu.edu
Recognising Cello Performers Using Timbre Models
Recognising Cello Performers Using Timbre Models Magdalena Chudy and Simon Dixon Abstract In this paper, we compare timbre features of various cello performers playing the same instrument in solo cello
Supervised Learning in Genre Classification
Supervised Learning in Genre Classification Introduction & Motivation Mohit Rajani and Luke Ekkizogloy {i.mohit,luke.ekkizogloy}@gmail.com Stanford University, CS229: Machine Learning, 2009 Now that music
GRADIENT-BASED MUSICAL FEATURE EXTRACTION BASED ON SCALE-INVARIANT FEATURE TRANSFORM
19th European Signal Processing Conference (EUSIPCO 2011) Barcelona, Spain, August 29 - September 2, 2011 GRADIENT-BASED MUSICAL FEATURE EXTRACTION BASED ON SCALE-INVARIANT FEATURE TRANSFORM Tomoko Matsui
Predicting Time-Varying Musical Emotion Distributions from Multi-Track Audio
Predicting Time-Varying Musical Emotion Distributions from Multi-Track Audio Jeffrey Scott, Erik M. Schmidt, Matthew Prockup, Brandon Morton, and Youngmoo E. Kim Music and Entertainment Technology Laboratory
Music Segmentation Using Markov Chain Methods
Music Segmentation Using Markov Chain Methods Paul Finkelstein March 8, 2011 Abstract This paper will present just how far the use of Markov Chains has spread in the 21 st century. We will explain some
ON FINDING MELODIC LINES IN AUDIO RECORDINGS. Matija Marolt
ON FINDING MELODIC LINES IN AUDIO RECORDINGS Matija Marolt Faculty of Computer and Information Science University of Ljubljana, Slovenia matija.marolt@fri.uni-lj.si ABSTRACT The paper presents our approach
A repetition-based framework for lyric alignment in popular songs
A repetition-based framework for lyric alignment in popular songs ABSTRACT LUONG Minh Thang and KAN Min Yen Department of Computer Science, School of Computing, National University of Singapore We examine
Figure 1: Feature Vector Sequence Generator block diagram.
1 Introduction Figure 1: Feature Vector Sequence Generator block diagram. We propose designing a simple isolated word speech recognition system in Verilog. Our design is naturally divided into two modules.
A PERPLEXITY BASED COVER SONG MATCHING SYSTEM FOR SHORT LENGTH QUERIES
12th International Society for Music Information Retrieval Conference (ISMIR 2011) A PERPLEXITY BASED COVER SONG MATCHING SYSTEM FOR SHORT LENGTH QUERIES Erdem Unal 1 Elaine Chew 2 Panayiotis Georgiou
Release Year Prediction for Songs
Release Year Prediction for Songs [CSE 258 Assignment 2] Ruyu Tan University of California San Diego PID: A53099216 rut003@ucsd.edu Jiaying Liu University of California San Diego PID: A53107720 jil672@ucsd.edu
UNIVERSAL SPATIAL UP-SCALER WITH NONLINEAR EDGE ENHANCEMENT
UNIVERSAL SPATIAL UP-SCALER WITH NONLINEAR EDGE ENHANCEMENT Stefan Schiemenz, Christian Hentschel Brandenburg University of Technology, Cottbus, Germany ABSTRACT Spatial image resizing is an important
Week 14 Query-by-Humming and Music Fingerprinting. Roger B. Dannenberg Professor of Computer Science, Art and Music Carnegie Mellon University
Week 14 Query-by-Humming and Music Fingerprinting Roger B. Dannenberg Professor of Computer Science, Art and Music Overview n Melody-Based Retrieval n Audio-Score Alignment n Music Fingerprinting 2 Metadata-based
REpeating Pattern Extraction Technique (REPET): A Simple Method for Music/Voice Separation
IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 21, NO. 1, JANUARY 2013 73 REpeating Pattern Extraction Technique (REPET): A Simple Method for Music/Voice Separation Zafar Rafii, Student
APPLICATIONS OF A SEMI-AUTOMATIC MELODY EXTRACTION INTERFACE FOR INDIAN MUSIC
APPLICATIONS OF A SEMI-AUTOMATIC MELODY EXTRACTION INTERFACE FOR INDIAN MUSIC Vishweshwara Rao, Sachin Pant, Madhumita Bhaskar and Preeti Rao Department of Electrical Engineering, IIT Bombay {vishu, sachinp,
Analysis of Packet Loss for Compressed Video: Does Burst-Length Matter?
Analysis of Packet Loss for Compressed Video: Does Burst-Length Matter? Yi J. Liang 1, John G. Apostolopoulos, Bernd Girod 1 Mobile and Media Systems Laboratory HP Laboratories Palo Alto HPL-22-331 November