AN ON-THE-FLY MANDARIN SINGING VOICE SYNTHESIS SYSTEM
Cheng-Yuan Lin*, J.-S. Roger Jang*, and Shaw-Hwa Hwang**
*Dept. of Computer Science, National Tsing Hua University, Taiwan
**Dept. of Electrical Engineering, National Taipei University, Taiwan

ABSTRACT

An on-the-fly Mandarin singing voice synthesis system, called SINVOIS (singing voice synthesis), is proposed in this paper. The SINVOIS system receives continuous speech of a song's lyrics and immediately generates the singing voice based on the music score information (embedded in a MIDI file) of the song. Two sub-systems are embedded in the system: a synthesis unit generator and a pitch-shifting module. In the first, the Viterbi decoding algorithm is applied to the continuous speech to generate the synthesis units for the singing voice. In the second, the PSOLA method is employed to implement the pitch-shifting function; energy, duration, and spectrum modifications of the synthesis units are implemented there as well. The synthesized singing voice sounds reasonably good: in a subjective listening test, MOS (mean opinion score) values of 4.5 and 3. were obtained for the original and synthesized singing voices, respectively.

1. INTRODUCTION

Text-to-speech (TTS) systems have been developed over the past few decades, and the most recent TTS systems can produce human-like, natural-sounding speech. The success of TTS systems can be attributed to their wide range of applications as well as to advances in modern computers. The research and development of singing voice synthesis, on the other hand, is not as mature as that of speech synthesis, partly due to its limited application domains. However, as computer-based games and entertainment become popular, interesting applications of singing voice synthesis are emerging, including software for vocal training, synthesized singing voices for virtual singers, and so on.

In a conventional concatenation-based Chinese TTS system, the synthesis units are taken from a set of pre-recorded syllabic clips representing the distinct base syllables of Mandarin Chinese. A concatenation-based singing voice synthesis system works in a similar way, except that the singing voice must be synthesized according to the given music score and lyrics of a song. More specifically, a singing voice synthesis system takes as input the lyrics and melody information of a song. The lyrics are converted into syllables, and the corresponding syllable clips are selected for concatenation. Then the system performs pitch/time modification and adds other desirable effects, such as vibrato and echo, to make the synthesized singing voice sound more natural. The following figure demonstrates the flow chart of a conventional singing voice synthesis system.

However, such a conventional singing voice synthesis system cannot produce personalized singing unless the user records all the base Mandarin syllables in advance, which is a time-consuming process. Moreover, the co-articulation effects must be re-created, since they are not available in the isolated base-syllable recordings. Considering these disadvantages of the conventional systems, we propose the use of speech recognition technology as the front end of our SINVOIS system. In other words, to create a personalized singing voice, the user reads the lyrics, sentence by sentence, to our system. Our system then employs
forced alignment via Viterbi decoding to detect the boundary of each character, as well as its consonant and vowel parts. Once these parts are identified, we can use them as synthesis units to synthesize the singing voice of the song, retaining all the timbre and co-articulation effects of the user. Other add-on features, such as vibrato and echo, can be imposed in post-processing. The following figure demonstrates the flow chart of our SINVOIS system.

2. RELATED WORK

Due to limited computing power, most previous approaches to singing voice synthesis employed acoustic models of human voice production. These include:
1. The SPASM system by Perry Cook [4]
2. The CHANT system by Bennett et al. [1]
3. Rule-based synthesis by Sundberg [14]
4. The frequency modulation method by Chowning [3]

However, the performance of the above methods is not acceptable, since the acoustic models cannot produce natural-sounding human voices. More recently, the success of concatenation-based text-to-speech systems has motivated the use of concatenation for singing voice synthesis. For example, the LYRICOS system by Macon et al. [9][10] is a typical concatenation-based singing voice synthesis system. The SMALLTALK system by the OKI company [7] in Japan is another example; it adopts the PSOLA method [6] (introduced in Section 4) to synthesize singing voices. Even though these systems can produce satisfactory results, they cannot produce personalized singing voices on the fly for a specific user.

3. GENERATION OF SYNTHESIS UNIT

The conventional method of synthesis unit generation for speech synthesis relies on a database of base syllables recorded beforehand by a specific person with a clear voice. Once the recordings of the base syllables are available, the speech data is processed in the following steps:
1. End-point detection [15] based on energies and zero-crossing rates is employed to identify the exact position of speech in the recordings.
2. The pitch marks of each syllable, the positions on the time axis where each pitch period begins, must be found.
3. The consonant part and the vowel part of each syllable are labeled manually.

For best performance, the above three steps are usually carried out manually, which is a rather time-consuming process. In our SINVOIS system, the singing voice must be synthesized on the fly; hence all three steps are performed automatically. Moreover, each syllable boundary must also be identified via Viterbi decoding.

3.1 Syllable Detection

For a given recording of a lyric sentence, each syllable is detected by forced alignment via Viterbi decoding [12][13]. The process can be divided into two steps:
1. Each character in the lyric sentence must be labeled with a base syllable. This task is not as trivial as it seems, since some character-to-syllable mappings are one-to-many. A maximum matching method is used in conjunction with a dictionary of terms to determine the best character-to-syllable mapping.
2. The syllable sequence of the lyric sentence is then converted into bi-phone models to construct a single-sentence linear lexicon. Viterbi decoding [12][13] is employed to align the frames of the speech recording to the bi-phone models of this one-sentence linear lexicon, such that a state sequence of maximal probability is found. The obtained optimal state sequence indicates the best alignment of each frame to a state in the lexicon. We can therefore identify the position of each syllable, including its consonant and vowel parts.

Of course, before Viterbi decoding can be used, an acoustic model must be available. The acoustic model used here contains 52 bi-phone models, obtained from a speech corpus of 7 subjects to achieve speaker independence. The complete acoustic model ensures precision in syllable detection. The following plot demonstrates a typical result of syllable detection.

[Figure: waveform of a lyric sentence with detected syllable boundaries]

For simplicity, once a syllable is detected, zero-crossing rates can be used directly to distinguish the consonant part from the vowel part.

3.2 Identification of Pitch Marks

Pitch marks are the positions where complete pitch periods start. They must be identified for effective time/pitch modification. In our system, pitch mark identification is performed only on the vowel part of each syllable, since the consonant part has no clearly defined pitch; the consonant part of a syllable is kept unchanged during pitch shifting. The steps involved in pitch mark identification are:
1. Use the ACF (autocorrelation function) or AMDF (average magnitude difference function) to compute the average pitch period T of a given syllable recording.
2. Find the global maximum of the syllable waveform and label its time coordinate as t_m; this is the position of the first pitch mark.
3. Search for other pitch marks to the right of t_m by finding the maximum in the region [t_m + 0.9T, t_m + 1.1T]. Repeat the same procedure until all pitch marks to the right of the global maximum are found.
4. Search for the pitch marks to the left of t_m in the same way, with the region [t_m - 1.1T, t_m - 0.9T] instead. Repeat until all pitch marks to the left of the global maximum are found.

The following plot shows a waveform after the pitch marks (denoted as circles) are found.

[Figure: vowel waveform with detected pitch marks shown as circles]

Once the pitch marks are found, we can perform the necessary pitch/time modification according to the music score of the song and add other desirable effects for singing voices. These procedures are introduced in the next section.

4. PITCH SHIFTING MODULE

In this section we introduce the essential operations of the pitch-shifting module, which include pitch/time scale modification and energy normalization.
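The pitch-mark identification steps of Section 3.2 (an ACF estimate of the average period, then a peak search in a window of roughly ±10% around the expected period) can be sketched as follows. This is a minimal illustration, not the authors' implementation; the 60-400 Hz pitch search range is an assumed value:

```python
import numpy as np

def average_pitch_period(vowel, sr):
    """Estimate the average pitch period T (in samples) of a vowel
    segment using the autocorrelation function (ACF)."""
    x = vowel - np.mean(vowel)
    acf = np.correlate(x, x, mode="full")[len(x) - 1:]   # non-negative lags
    lo, hi = int(sr / 400), int(sr / 60)                 # assume 60-400 Hz pitch
    return lo + int(np.argmax(acf[lo:hi]))

def find_pitch_marks(vowel, T):
    """Place pitch marks by peak picking: start at the global maximum,
    then search successive maxima in [0.9T, 1.1T] on either side."""
    marks = [int(np.argmax(vowel))]       # first mark: global maximum
    m = marks[0]
    while m + int(1.1 * T) < len(vowel):  # search to the right
        a, b = m + int(0.9 * T), m + int(1.1 * T)
        m = a + int(np.argmax(vowel[a:b]))
        marks.append(m)
    m = marks[0]
    while m - int(1.1 * T) >= 0:          # search to the left
        a, b = m - int(1.1 * T), m - int(0.9 * T)
        m = a + int(np.argmax(vowel[a:b]))
        marks.insert(0, m)
    return marks
```

On a clean periodic signal the detected marks come out one average period apart, which is the property the later PSOLA stage relies on.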
Afterwards, we perform further fine-tuning, such as the echo effect, pitch vibrato, and co-articulation effects, to make the singing voice more natural.

4.1 Pitch Shifting

Pitch shifting of speech/audio signals is an essential part of speech and music synthesis. There are several well-known approaches to pitch shifting:
1. PSOLA (Pitch Synchronous Overlap and Add) [6]
2. Cross-fading [2]
3. Residual signal with PSOLA [5]
4. Sinusoidal modeling [11]

In our system, we adopt the PSOLA method to achieve a balance between quality and efficiency. The basic concept behind PSOLA is to multiply the speech signal by a Hamming window centered at each pitch mark. The windowed signals at each pitch mark are then relocated as necessary. More specifically, to shift the pitch up, the distance between neighboring pitch marks is decreased; to shift the pitch down, it is increased. When performing a pitch-up operation, the half-size of the Hamming window is equal to the desired distance between neighboring pitch marks of the pitch-shifted signal. When performing a pitch-down operation, the half-size of the Hamming window is equal to the distance between neighboring pitch marks of the original signal. As a result, we may need to insert some zeros between two windowed signals if a pitch-down operation to less than 50% of the original pitch frequency is desired. The following plot demonstrates both the pitch-up and pitch-down operations.

[Figure: original, pitch-up, and pitch-down waveforms, showing the Hamming windows and the inserted zeros]

4.2 Time Modification

Time modification is used to increase or decrease the duration of a synthesis unit. We use a simple linear mapping method for time modification in our system. The method duplicates or deletes fundamental periods as necessary, as shown in the following diagram.

[Figure: contraction and extension of a waveform by deleting or duplicating fundamental periods]

Before performing time modification on a syllable, we must separate the consonant and vowel parts. Usually the consonant part is not changed at all; only the vowel part is shortened or lengthened. The speech-based recording already includes natural co-articulation, so no special arrangement is needed on this issue. (In our previous approach to singing voice synthesis based on isolated syllable recordings, we needed to smooth the transitions between syllables by cross-fading over a small overlapped region.)

4.3 Energy Modification

The concatenated singing voice occasionally sounds unnatural because each synthesis unit has a different energy level (intensity or volume). We therefore adjust the amplitude of each syllable such that its energy equals the average energy of the whole sentence:
1. Compute the energy of each syllable in the recorded lyric sentence, E_1, E_2, ..., E_N, where N is the number of syllables.
2. Compute the average energy E_ave = (1/N) * (E_1 + E_2 + ... + E_N).
3. Multiply the waveform of the k-th syllable by the constant sqrt(E_ave / E_k).

On the other hand, if the recorded lyric sentence already bears the desirable energy profile, this energy normalization need not be applied.
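The windowing-and-relocation idea of Section 4.1 can be sketched as a minimal PSOLA-style pitch shifter. This is a simplified illustration under assumptions of roughly uniform pitch marks and duration preserved by reusing grains, not the authors' implementation:

```python
import numpy as np

def psola_pitch_shift(x, marks, factor):
    """PSOLA-style pitch shift: extract a Hamming-windowed grain two
    pitch periods wide around each pitch mark, then overlap-add the
    grains at a mark spacing scaled by 1/factor (factor > 1 raises
    the pitch, factor < 1 lowers it).  Duration is roughly preserved
    because grains are reused or skipped as the output advances."""
    marks = np.asarray(marks)
    T = int(np.mean(np.diff(marks)))     # average pitch period (samples)
    new_T = max(1, int(T / factor))      # new distance between pitch marks
    y = np.zeros(len(x))
    out_pos = int(marks[0])
    while out_pos < len(x) - T:
        # choose the analysis mark closest to the current output time
        m = int(marks[np.argmin(np.abs(marks - out_pos))])
        a, b = max(m - T, 0), min(m + T, len(x))
        grain = x[a:b] * np.hamming(b - a)
        start = out_pos - (m - a)        # align the grain center to out_pos
        lo, hi = max(start, 0), min(start + (b - a), len(y))
        y[lo:hi] += grain[lo - start:hi - start]
        out_pos += new_T
    return y
```

Because each grain spans two periods and grains are emitted every new_T samples, neighboring windows overlap for moderate shifts; for a pitch-down below half the original frequency the grains no longer touch, which is exactly the zero-insertion case described above.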
4.4 Other Desirable Effects

The results of the above synthesis procedure often contain some undesirable, artificial-sounding buzzy effects. We therefore adopt the following formula to implement an echo effect:

    y[n] = x[n] + a * y[n - k]

or, in its z-transform:

    H(z) = Y(z) / X(z) = 1 / (1 - a * z^(-k))

The value of k controls the amount of delay and can be adjusted accordingly; usually the delay is set to 0.7 second. The echo effect essentially masks the undesirable buzzy components. Furthermore, it makes the whole synthesized singing voice sound more genuine and softer, which is also exemplified by the fact that almost every karaoke machine has a knob for echo control.

Besides the echo effect, the inclusion of a vibrato effect [9] is an important factor in making the synthesized singing voice more natural. The vibrato effect is implemented according to the following guidelines:
1. We use a sinusoidal function to model the vibrato. For instance, to make the pitch curve of a syllable oscillate sinusoidally within the range [a, b] (for instance, [0.8, 1.2]), we rescale and shift the basic sinusoid sin(wt):

    (a + b)/2 + sin(wt) * (b - a)/2,

where w is the vibrato angular frequency and t is the frame index.
2. Reassign the positions of the pitch marks based on the above sinusoidal pitch curve with vibrato.
3. Only syllables with a duration greater than 0.8 seconds are allowed to have the vibrato effect. Moreover, vibrato is applied only to the vowel part.

The following figure demonstrates the synthesized singing voice without the vibrato effect; the first plot is the time-domain waveform and the second is the corresponding pitch curve.

[Figure: waveform and pitch curve (after median/merging filter) without vibrato]

After adding the vibrato effect, the waveform and the pitch curve are shown in the following plot.

[Figure: waveform and pitch curve (after median/merging filter) with vibrato]

5. RESULTS AND ANALYSIS

The performance of our SINVOIS system depends on three factors: the outcome of forced alignment via Viterbi decoding, the result of pitch/time modification, and the special effects applied to the singing voice. We had 5 persons try 5 different Mandarin Chinese pop songs and obtained a 95% recognition rate in syllable detection. The resulting synthesized singing voices all seemed acceptable. We adopted an MOS (mean opinion score) test [8] to obtain subjective assessments of our system: ten persons listened to the fifteen synthesized singing voices, and each person gave a score for each song. The score ranges from 1 to 5, with 5 representing the highest grade for naturalness. The following table demonstrates the average MOS for each song.

[Table: average MOS per song; the individual scores are not recoverable from the source]

From the table, it is obvious that the synthesized singing voices are acceptable, but not satisfactory enough to be described as natural-sounding. The major reason is that the synthesis units are obtained from recordings of speech instead of singing; the synthesized voices are therefore reminiscent of speech rather than natural singing. However, the effect of on-the-fly synthesis is entertaining, and most people are eager to try the system for fun.

6. CONCLUSIONS AND FUTURE WORK
In this paper, we have described the development of a singing voice synthesis system called SINVOIS (singing voice synthesis). The system accepts a user's speech input of the lyric sentences and generates a synthesized singing voice based on the input recording and the song's music score. The operation of the system is divided into two parts: the synthesis unit generator based on Viterbi decoding, and time/pitch modification with special effects. To assess the performance of SINVOIS, we designed an MOS experiment for subjective evaluation. The experimental results are acceptable, but not totally satisfactory, due to the way our synthesis units are obtained. However, the fun of the system also comes from the personal recording, which enables on-the-fly synthesis that retains personal features of the voice timbre.

This is only a preliminary study, and there are many directions for future work. Some of the immediate future work includes:
1. Finding a transformation in the frequency domain that can capture and transform the speech recordings into their singing counterparts.
2. Finding the most likely pitch contour via methods of system identification, such that the synthesized singing voice has a natural pitch contour.
3. Trying other frequency-domain techniques for pitch shifting, such as sinusoidal modeling [11].
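The echo effect of Section 4.4, y[n] = x[n] + a*y[n-k], is a first-order feedback comb filter and can be sketched in a few lines. The feedback gain a is an assumed value here; the paper fixes only the 0.7-second delay:

```python
import numpy as np

def add_echo(x, sr, a=0.3, delay_s=0.7):
    """Feedback comb filter y[n] = x[n] + a*y[n-k], i.e.
    H(z) = 1 / (1 - a*z^(-k)), with k = delay_s * sr samples."""
    k = int(delay_s * sr)
    y = np.array(x, dtype=float)
    for n in range(k, len(y)):
        y[n] += a * y[n - k]     # recursive: echoes of echoes decay by a each pass
    return y
```

Because the filter is recursive, an impulse produces a train of echoes of heights 1, a, a^2, ... at multiples of k samples, which is what softens and masks the buzzy components of the concatenated voice.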
7. REFERENCES

[1] Bennett, Gerald, and Rodet, Xavier, "Synthesis of the singing voice," in Current Directions in Computer Music Research (M. V. Mathews and J. R. Pierce, eds.), pp. 9-44, MIT Press, 1989.
[2] Chen, S. G., and Lin, G. J., "High quality and low complexity pitch modification of acoustic signals," Proceedings of the 1995 IEEE International Conference on Acoustics, Speech, and Signal Processing, Detroit, USA, May 1995.
[3] Chowning, John M., "Frequency modulation synthesis of the singing voice," in Current Directions in Computer Music Research (M. V. Mathews and J. R. Pierce, eds.), MIT Press, 1989.
[4] Cook, P. R., "SPASM, a real-time vocal tract physical model controller; and Singer, the companion software synthesis system," Computer Music Journal, vol. 17, Spring 1993.
[5] Edgington, M., and Lowry, A., "Residual-based speech modification algorithms for text-to-speech synthesis," Proceedings of the Fourth International Conference on Spoken Language Processing (ICSLP 96), vol. 3, 1996.
[6] Charpentier, F., and Moulines, E., "Pitch-synchronous waveform processing techniques for text-to-speech synthesis using diphones," European Conference on Speech Communication and Technology, Paris, 1989.
[7] OKI Electric Industry Co., Ltd., the SMALLTALK system.
[8] ITU-T, Methods for Subjective Determination of Transmission Quality, International Telecommunication Union, 1996.
[9] Macon, Michael W., Jensen-Link, Leslie, Oliverio, James, Clements, Mark A., and George, E. Bryan, "A singing voice synthesis system based on sinusoidal modeling," Proc. International Conference on Acoustics, Speech, and Signal Processing, 1997.
[10] Macon, Michael W., Jensen-Link, Leslie, Oliverio, James, Clements, Mark A., and George, E. Bryan, "Concatenation-based MIDI-to-singing voice synthesis," 103rd Convention of the Audio Engineering Society, New York, 1997.
[11] Macon, Michael W., Speech Synthesis Based on Sinusoidal Modeling, PhD thesis, Georgia Institute of Technology, October 1996.
[12] Ney, H., and Aubert, X., "Dynamic programming search: from digit strings to large vocabulary word graphs," in C.-H. Lee, F. Soong, and K. Paliwal, eds., Automatic Speech and Speaker Recognition, Kluwer, Norwell, Mass., 1996.
[13] Rabiner, L., and Juang, B.-H., Fundamentals of Speech Recognition, Prentice-Hall, Englewood Cliffs, N.J., 1993.
[14] Sundberg, Johan, "Synthesis of singing by rule," in Current Directions in Computer Music Research (M. V. Mathews and J. R. Pierce, eds.), MIT Press, 1989.
[15] Zhang, Yiying, Zhu, Xiaoyan, Hao, Yu, and Luo, Yupin, "A robust and fast endpoint detection algorithm for isolated word recognition," IEEE International Conference on Intelligent Processing Systems (ICIPS '97), vol. 2, 1997.
Analysis, Synthesis, and Perception of Musical Sounds The Sound of Music James W. Beauchamp Editor University of Illinois at Urbana, USA 4y Springer Contents Preface Acknowledgments vii xv 1. Analysis
More informationAutomatic Singing Performance Evaluation Using Accompanied Vocals as Reference Bases *
JOURNAL OF INFORMATION SCIENCE AND ENGINEERING 31, 821-838 (2015) Automatic Singing Performance Evaluation Using Accompanied Vocals as Reference Bases * Department of Electronic Engineering National Taipei
More informationSinging voice synthesis based on deep neural networks
INTERSPEECH 2016 September 8 12, 2016, San Francisco, USA Singing voice synthesis based on deep neural networks Masanari Nishimura, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, and Keiichi Tokuda
More informationMelody transcription for interactive applications
Melody transcription for interactive applications Rodger J. McNab and Lloyd A. Smith {rjmcnab,las}@cs.waikato.ac.nz Department of Computer Science University of Waikato, Private Bag 3105 Hamilton, New
More informationPitch-Synchronous Spectrogram: Principles and Applications
Pitch-Synchronous Spectrogram: Principles and Applications C. Julian Chen Department of Applied Physics and Applied Mathematics May 24, 2018 Outline The traditional spectrogram Observations with the electroglottograph
More informationPOST-PROCESSING FIDDLE : A REAL-TIME MULTI-PITCH TRACKING TECHNIQUE USING HARMONIC PARTIAL SUBTRACTION FOR USE WITHIN LIVE PERFORMANCE SYSTEMS
POST-PROCESSING FIDDLE : A REAL-TIME MULTI-PITCH TRACKING TECHNIQUE USING HARMONIC PARTIAL SUBTRACTION FOR USE WITHIN LIVE PERFORMANCE SYSTEMS Andrew N. Robertson, Mark D. Plumbley Centre for Digital Music
More informationTempo and Beat Analysis
Advanced Course Computer Science Music Processing Summer Term 2010 Meinard Müller, Peter Grosche Saarland University and MPI Informatik meinard@mpi-inf.mpg.de Tempo and Beat Analysis Musical Properties:
More informationDISTRIBUTION STATEMENT A 7001Ö
Serial Number 09/678.881 Filing Date 4 October 2000 Inventor Robert C. Higgins NOTICE The above identified patent application is available for licensing. Requests for information should be addressed to:
More informationAcoustic Measurements Using Common Computer Accessories: Do Try This at Home. Dale H. Litwhiler, Terrance D. Lovell
Abstract Acoustic Measurements Using Common Computer Accessories: Do Try This at Home Dale H. Litwhiler, Terrance D. Lovell Penn State Berks-LehighValley College This paper presents some simple techniques
More informationACCURATE ANALYSIS AND VISUAL FEEDBACK OF VIBRATO IN SINGING. University of Porto - Faculty of Engineering -DEEC Porto, Portugal
ACCURATE ANALYSIS AND VISUAL FEEDBACK OF VIBRATO IN SINGING José Ventura, Ricardo Sousa and Aníbal Ferreira University of Porto - Faculty of Engineering -DEEC Porto, Portugal ABSTRACT Vibrato is a frequency
More informationAUTOMATIC ACCOMPANIMENT OF VOCAL MELODIES IN THE CONTEXT OF POPULAR MUSIC
AUTOMATIC ACCOMPANIMENT OF VOCAL MELODIES IN THE CONTEXT OF POPULAR MUSIC A Thesis Presented to The Academic Faculty by Xiang Cao In Partial Fulfillment of the Requirements for the Degree Master of Science
More informationDoubletalk Detection
ELEN-E4810 Digital Signal Processing Fall 2004 Doubletalk Detection Adam Dolin David Klaver Abstract: When processing a particular voice signal it is often assumed that the signal contains only one speaker,
More informationA METHOD OF MORPHING SPECTRAL ENVELOPES OF THE SINGING VOICE FOR USE WITH BACKING VOCALS
A METHOD OF MORPHING SPECTRAL ENVELOPES OF THE SINGING VOICE FOR USE WITH BACKING VOCALS Matthew Roddy Dept. of Computer Science and Information Systems, University of Limerick, Ireland Jacqueline Walker
More informationTempo Estimation and Manipulation
Hanchel Cheng Sevy Harris I. Introduction Tempo Estimation and Manipulation This project was inspired by the idea of a smart conducting baton which could change the sound of audio in real time using gestures,
More informationAutomatic characterization of ornamentation from bassoon recordings for expressive synthesis
Automatic characterization of ornamentation from bassoon recordings for expressive synthesis Montserrat Puiggròs, Emilia Gómez, Rafael Ramírez, Xavier Serra Music technology Group Universitat Pompeu Fabra
More informationMusic Alignment and Applications. Introduction
Music Alignment and Applications Roger B. Dannenberg Schools of Computer Science, Art, and Music Introduction Music information comes in many forms Digital Audio Multi-track Audio Music Notation MIDI Structured
More informationDSP First Lab 04: Synthesis of Sinusoidal Signals - Music Synthesis
DSP First Lab 04: Synthesis of Sinusoidal Signals - Music Synthesis Pre-Lab and Warm-Up: You should read at least the Pre-Lab and Warm-up sections of this lab assignment and go over all exercises in the
More informationA Modified Static Contention Free Single Phase Clocked Flip-flop Design for Low Power Applications
JOURNAL OF SEMICONDUCTOR TECHNOLOGY AND SCIENCE, VOL.8, NO.5, OCTOBER, 08 ISSN(Print) 598-657 https://doi.org/57/jsts.08.8.5.640 ISSN(Online) -4866 A Modified Static Contention Free Single Phase Clocked
More informationNUMEROUS elaborate attempts have been made in the
IEEE TRANSACTIONS ON COMMUNICATIONS, VOL. 46, NO. 12, DECEMBER 1998 1555 Error Protection for Progressive Image Transmission Over Memoryless and Fading Channels P. Greg Sherwood and Kenneth Zeger, Senior
More informationVoice & Music Pattern Extraction: A Review
Voice & Music Pattern Extraction: A Review 1 Pooja Gautam 1 and B S Kaushik 2 Electronics & Telecommunication Department RCET, Bhilai, Bhilai (C.G.) India pooja0309pari@gmail.com 2 Electrical & Instrumentation
More informationWelcome to Vibrationdata
Welcome to Vibrationdata Acoustics Shock Vibration Signal Processing February 2004 Newsletter Greetings Feature Articles Speech is perhaps the most important characteristic that distinguishes humans from
More informationVOCALISTENER: A SINGING-TO-SINGING SYNTHESIS SYSTEM BASED ON ITERATIVE PARAMETER ESTIMATION
VOCALISTENER: A SINGING-TO-SINGING SYNTHESIS SYSTEM BASED ON ITERATIVE PARAMETER ESTIMATION Tomoyasu Nakano Masataka Goto National Institute of Advanced Industrial Science and Technology (AIST), Japan
More informationFrame Synchronization in Digital Communication Systems
Quest Journals Journal of Software Engineering and Simulation Volume 3 ~ Issue 6 (2017) pp: 06-11 ISSN(Online) :2321-3795 ISSN (Print):2321-3809 www.questjournals.org Research Paper Frame Synchronization
More informationComparison of Dictionary-Based Approaches to Automatic Repeating Melody Extraction
Comparison of Dictionary-Based Approaches to Automatic Repeating Melody Extraction Hsuan-Huei Shih, Shrikanth S. Narayanan and C.-C. Jay Kuo Integrated Media Systems Center and Department of Electrical
More informationSINGING EXPRESSION TRANSFER FROM ONE VOICE TO ANOTHER FOR A GIVEN SONG. Sangeon Yong, Juhan Nam
SINGING EXPRESSION TRANSFER FROM ONE VOICE TO ANOTHER FOR A GIVEN SONG Sangeon Yong, Juhan Nam Graduate School of Culture Technology, KAIST {koragon2, juhannam}@kaist.ac.kr ABSTRACT We present a vocal
More information... A Pseudo-Statistical Approach to Commercial Boundary Detection. Prasanna V Rangarajan Dept of Electrical Engineering Columbia University
A Pseudo-Statistical Approach to Commercial Boundary Detection........ Prasanna V Rangarajan Dept of Electrical Engineering Columbia University pvr2001@columbia.edu 1. Introduction Searching and browsing
More informationLab experience 1: Introduction to LabView
Lab experience 1: Introduction to LabView LabView is software for the real-time acquisition, processing and visualization of measured data. A LabView program is called a Virtual Instrument (VI) because
More informationA repetition-based framework for lyric alignment in popular songs
A repetition-based framework for lyric alignment in popular songs ABSTRACT LUONG Minh Thang and KAN Min Yen Department of Computer Science, School of Computing, National University of Singapore We examine
More informationAudio-Based Video Editing with Two-Channel Microphone
Audio-Based Video Editing with Two-Channel Microphone Tetsuya Takiguchi Organization of Advanced Science and Technology Kobe University, Japan takigu@kobe-u.ac.jp Yasuo Ariki Organization of Advanced Science
More informationSinger Traits Identification using Deep Neural Network
Singer Traits Identification using Deep Neural Network Zhengshan Shi Center for Computer Research in Music and Acoustics Stanford University kittyshi@stanford.edu Abstract The author investigates automatic
More informationHow to Obtain a Good Stereo Sound Stage in Cars
Page 1 How to Obtain a Good Stereo Sound Stage in Cars Author: Lars-Johan Brännmark, Chief Scientist, Dirac Research First Published: November 2017 Latest Update: November 2017 Designing a sound system
More informationMusic Information Retrieval Using Audio Input
Music Information Retrieval Using Audio Input Lloyd A. Smith, Rodger J. McNab and Ian H. Witten Department of Computer Science University of Waikato Private Bag 35 Hamilton, New Zealand {las, rjmcnab,
More informationTopic 4. Single Pitch Detection
Topic 4 Single Pitch Detection What is pitch? A perceptual attribute, so subjective Only defined for (quasi) harmonic sounds Harmonic sounds are periodic, and the period is 1/F0. Can be reliably matched
More informationMusic 209 Advanced Topics in Computer Music Lecture 4 Time Warping
Music 209 Advanced Topics in Computer Music Lecture 4 Time Warping 2006-2-9 Professor David Wessel (with John Lazzaro) (cnmat.berkeley.edu/~wessel, www.cs.berkeley.edu/~lazzaro) www.cs.berkeley.edu/~lazzaro/class/music209
More informationA System for Acoustic Chord Transcription and Key Extraction from Audio Using Hidden Markov models Trained on Synthesized Audio
Curriculum Vitae Kyogu Lee Advanced Technology Center, Gracenote Inc. 2000 Powell Street, Suite 1380 Emeryville, CA 94608 USA Tel) 1-510-428-7296 Fax) 1-510-547-9681 klee@gracenote.com kglee@ccrma.stanford.edu
More informationPitch Perception and Grouping. HST.723 Neural Coding and Perception of Sound
Pitch Perception and Grouping HST.723 Neural Coding and Perception of Sound Pitch Perception. I. Pure Tones The pitch of a pure tone is strongly related to the tone s frequency, although there are small
More informationInvestigation of Digital Signal Processing of High-speed DACs Signals for Settling Time Testing
Universal Journal of Electrical and Electronic Engineering 4(2): 67-72, 2016 DOI: 10.13189/ujeee.2016.040204 http://www.hrpub.org Investigation of Digital Signal Processing of High-speed DACs Signals for
More informationRec. ITU-R BT RECOMMENDATION ITU-R BT * WIDE-SCREEN SIGNALLING FOR BROADCASTING
Rec. ITU-R BT.111-2 1 RECOMMENDATION ITU-R BT.111-2 * WIDE-SCREEN SIGNALLING FOR BROADCASTING (Signalling for wide-screen and other enhanced television parameters) (Question ITU-R 42/11) Rec. ITU-R BT.111-2
More informationPitch. The perceptual correlate of frequency: the perceptual dimension along which sounds can be ordered from low to high.
Pitch The perceptual correlate of frequency: the perceptual dimension along which sounds can be ordered from low to high. 1 The bottom line Pitch perception involves the integration of spectral (place)
More informationSinging Pitch Extraction and Singing Voice Separation
Singing Pitch Extraction and Singing Voice Separation Advisor: Jyh-Shing Roger Jang Presenter: Chao-Ling Hsu Multimedia Information Retrieval Lab (MIR) Department of Computer Science National Tsing Hua
More informationKeywords Separation of sound, percussive instruments, non-percussive instruments, flexible audio source separation toolbox
Volume 4, Issue 4, April 2014 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Investigation
More informationMusical Signal Processing with LabVIEW Introduction to Audio and Musical Signals. By: Ed Doering
Musical Signal Processing with LabVIEW Introduction to Audio and Musical Signals By: Ed Doering Musical Signal Processing with LabVIEW Introduction to Audio and Musical Signals By: Ed Doering Online:
More informationKeywords Xilinx ISE, LUT, FIR System, SDR, Spectrum- Sensing, FPGA, Memory- optimization, A-OMS LUT.
An Advanced and Area Optimized L.U.T Design using A.P.C. and O.M.S K.Sreelakshmi, A.Srinivasa Rao Department of Electronics and Communication Engineering Nimra College of Engineering and Technology Krishna
More informationWeek 14 Query-by-Humming and Music Fingerprinting. Roger B. Dannenberg Professor of Computer Science, Art and Music Carnegie Mellon University
Week 14 Query-by-Humming and Music Fingerprinting Roger B. Dannenberg Professor of Computer Science, Art and Music Overview n Melody-Based Retrieval n Audio-Score Alignment n Music Fingerprinting 2 Metadata-based
More informationTOWARD AN INTELLIGENT EDITOR FOR JAZZ MUSIC
TOWARD AN INTELLIGENT EDITOR FOR JAZZ MUSIC G.TZANETAKIS, N.HU, AND R.B. DANNENBERG Computer Science Department, Carnegie Mellon University 5000 Forbes Avenue, Pittsburgh, PA 15213, USA E-mail: gtzan@cs.cmu.edu
More informationImplementation of Memory Based Multiplication Using Micro wind Software
Implementation of Memory Based Multiplication Using Micro wind Software U.Palani 1, M.Sujith 2,P.Pugazhendiran 3 1 IFET College of Engineering, Department of Information Technology, Villupuram 2,3 IFET
More informationBASE-LINE WANDER & LINE CODING
BASE-LINE WANDER & LINE CODING PREPARATION... 28 what is base-line wander?... 28 to do before the lab... 29 what we will do... 29 EXPERIMENT... 30 overview... 30 observing base-line wander... 30 waveform
More informationMusic Segmentation Using Markov Chain Methods
Music Segmentation Using Markov Chain Methods Paul Finkelstein March 8, 2011 Abstract This paper will present just how far the use of Markov Chains has spread in the 21 st century. We will explain some
More informationColor Image Compression Using Colorization Based On Coding Technique
Color Image Compression Using Colorization Based On Coding Technique D.P.Kawade 1, Prof. S.N.Rawat 2 1,2 Department of Electronics and Telecommunication, Bhivarabai Sawant Institute of Technology and Research
More informationPCM ENCODING PREPARATION... 2 PCM the PCM ENCODER module... 4
PCM ENCODING PREPARATION... 2 PCM... 2 PCM encoding... 2 the PCM ENCODER module... 4 front panel features... 4 the TIMS PCM time frame... 5 pre-calculations... 5 EXPERIMENT... 5 patching up... 6 quantizing
More informationControlling Musical Tempo from Dance Movement in Real-Time: A Possible Approach
Controlling Musical Tempo from Dance Movement in Real-Time: A Possible Approach Carlos Guedes New York University email: carlos.guedes@nyu.edu Abstract In this paper, I present a possible approach for
More informationTANSEN: A QUERY-BY-HUMMING BASED MUSIC RETRIEVAL SYSTEM. M. Anand Raju, Bharat Sundaram* and Preeti Rao
TANSEN: A QUERY-BY-HUMMING BASE MUSIC RETRIEVAL SYSTEM M. Anand Raju, Bharat Sundaram* and Preeti Rao epartment of Electrical Engineering, Indian Institute of Technology, Bombay Powai, Mumbai 400076 {maji,prao}@ee.iitb.ac.in
More information