Singing voice synthesis in Spanish by concatenation of syllables based on the TD-PSOLA algorithm

ALEJANDRO RAMOS-AMÉZQUITA
Computer Science Department
Tecnológico de Monterrey (Campus Ciudad de México)
Calle del Puente 222, Colonia Ejidos de Huipulco, Tlalpan 14380, México D.F.
MEXICO

Abstract: - This work presents the development of a Spanish singing voice synthesizer based on the TD-PSOLA algorithm. The main goal was to test the hypothesis that, while diphones are linguistically the units offering the best compromise between intelligibility and flexibility for spoken voice synthesis, syllables are the units best suited to concatenative singing voice synthesis. The hypothesis is particularly strong for Spanish, since its rules for syllable construction are comprehensive, relatively simple, and few in number. To test it, a relatively small set of Spanish vowels and syllables was recorded by a soprano singer at both F4 and C5, each lasting 1 second (±0.2 s). The syllables were modified only in pitch and duration. Matlab was used as the programming platform, mainly because of the author's relative expertise with it. To evaluate the system, several melodic tasks were given to it, including the singing of a popular Mexican song (Las Mañanitas). Results show that a highly intelligible synthesized Spanish singing voice based on syllable concatenation can be achieved with minimal control mechanisms. Duration changes introduce very few noticeable digital errors, and a transposition of up to a perfect fourth was possible without generating very obvious digital errors. A variation of about 5% (0.05) in the frequency scale corresponds roughly to one semitone in the modern equal-tempered scale (the exact semitone ratio is 2^(1/12) ≈ 1.0595).

Key-Words: - Singing voice, synthesis, concatenation, syllables, Spanish, Time Domain, PSOLA.
1 Introduction
Singing voice synthesis springs from two well-researched fields: spoken voice coding and synthesis, and musical instrument synthesis. Voice synthesis has been studied since the 18th century, when Wolfgang von Kempelen built his mechanical synthesizer. Later, Leon Theremin created the first well-known electrical synthesizer, and in 1939 the Vocoder (Voice Coder), built by Homer Dudley, constituted the first singing voice synthesizer.

Nowadays, human voice synthesis models can be classified as spectral or physical models. The former are based fundamentally on mechanisms of auditory perception, while the latter are based on modeling the mechanisms of sound production. The benefit of physical models is that their parameters are closely related to the controls a singer exerts over his or her own vocal system, so some of the actual control mechanisms are incorporated into the design. However, the number of parameters is large, and mapping the intuitive controls of the production mechanism onto the output of the model is not a trivial task [1].

A wide variety of pseudo-physical models are also available. In these, the model is decomposed into a sound source and a vocal tract. A typical example is the linear prediction method, in which the resonances of the vocal tract are modeled as poles of a filter and the residual error is taken as the source signal. The problem with this approach is that modifying the filter does not produce the expected results, since there is more to the voice than the glottal excitation of the source signal: the vocal tract has non-linear effects that such a model cannot reproduce.
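As an illustration of the linear prediction idea described above, the following is a minimal sketch and not part of the system presented in this paper. It assumes the autocorrelation method with the Levinson-Durbin recursion (a standard way to fit an all-pole filter), and the toy two-pole signal, the function names, and the coefficient values are all hypothetical choices made for the example.

```python
import random


def autocorrelation(x, max_lag):
    """Unnormalized autocorrelation r[0..max_lag] of the sequence x."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the normal equations for a[1..order] in x[n] ~ sum_k a[k] * x[n-k].

    Returns the predictor coefficients and the remaining prediction-error energy.
    """
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err                      # reflection coefficient of stage i
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= (1.0 - k * k)
    return a[1:], err


def residual(x, coeffs):
    """Prediction error e[n] = x[n] - sum_k a[k] * x[n-k], i.e. the 'source' signal."""
    p = len(coeffs)
    return [x[n] - sum(coeffs[k] * x[n - 1 - k] for k in range(p))
            for n in range(p, len(x))]


# Hypothetical toy "vocal tract": one resonance, i.e. two complex-conjugate poles,
# excited by white noise. The values 1.3 and -0.6 are arbitrary example numbers.
random.seed(0)
true_coeffs = [1.3, -0.6]
x = [0.0, 0.0]
for _ in range(5000):
    x.append(true_coeffs[0] * x[-1] + true_coeffs[1] * x[-2] + random.gauss(0.0, 1.0))

r = autocorrelation(x, 2)
estimated, error_energy = levinson_durbin(r, 2)
excitation = residual(x, estimated)
print([round(a, 2) for a in estimated])
```

The estimated coefficients describe the filter (the poles), and `excitation` is what remains once the filter's contribution is removed. In this toy case the source really is independent of the filter, which is exactly the assumption that fails for a real voice.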
On the other hand, vocoders divide the speech spectrum into channels for which the gain and source parameters are estimated, while formant synthesizers resemble the linear prediction method in that they include the option of different sound sources, voiced and unvoiced, and model the vocal tract with a set of formant filters [4]. Formant wave functions and the frequency-modulation approach are slightly different spectral methods. The former model each formant's impulse response in the time domain, and each of these functions can be excited at the fundamental frequency required to produce the singing voice. The frequency-modulation approach tries to generate a spectrum resembling that of the singing voice using two oscillators, a carrier and a modulator that drives the carrier's frequency, with excellent results in elegantly modeling the singer's effort.

However, the most successful examples of voice synthesis are those based on the sampling approach, in which the output signal is the result of sequentially concatenating samples from a database. Strictly speaking this is not a synthesis technique, but according to Bonada and Serra [1] it should be regarded as a synthesis model. The success of the method lies in its simplicity and in the fact that it captures the natural sound of its real counterpart. Its most fundamental problem is the lack of the flexibility and expression that a professional musician expects of an instrument. For such instruments, an acceptable quality level can be achieved by using large databases that sample a wide portion of the instrument's sound space. Although storage capacity becomes less of an impediment as technology progresses, it remains an issue to consider. A database of reasonable size can be used if there is a way to morph a sound in the database into a different one such that the outcome also sounds natural. To achieve such a transformation, one can parameterize the stored samples, or one can store exact waveforms in the database, as in the PSOLA methods.

The PSOLA (Pitch Synchronous Overlap and Add) method decomposes the signal into elementary waveforms, each corresponding in length to the pitch period. When these waveforms are added with a certain amount of overlap, the original signal can be rebuilt.
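The decomposition and rebuilding just described can be sketched as follows. This is an illustrative example only, not the Matlab code used in this work: it assumes a perfectly periodic test signal, pitch marks placed exactly one period apart, and Hann windows two periods long (a window factor of 2), and all function and variable names are hypothetical.

```python
import math


def hann_window(length):
    """Periodic Hann window; two copies shifted by length/2 sum to one."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / length) for n in range(length)]


def decompose(signal, pitch_marks, period):
    """Cut one two-period, Hann-weighted segment centered on every pitch mark."""
    window = hann_window(2 * period)
    segments = []
    for mark in pitch_marks:
        start = mark - period
        segment = [0.0] * (2 * period)
        for n in range(2 * period):
            idx = start + n
            if 0 <= idx < len(signal):
                segment[n] = signal[idx] * window[n]
        segments.append((start, segment))
    return segments


def overlap_add(segments, length):
    """Sum the elementary waveforms back at their original positions."""
    out = [0.0] * length
    for start, segment in segments:
        for n, value in enumerate(segment):
            idx = start + n
            if 0 <= idx < length:
                out[idx] += value
    return out


# Hypothetical test tone: fundamental plus one harmonic, 50 samples per period.
period = 50
signal = [math.sin(2 * math.pi * n / period) + 0.3 * math.sin(4 * math.pi * n / period)
          for n in range(10 * period)]
pitch_marks = list(range(period, len(signal) - period + 1, period))

rebuilt = overlap_add(decompose(signal, pitch_marks, period), len(signal))

# Only the interior is covered by two overlapping windows, so compare there.
max_error = max(abs(a - b)
                for a, b in zip(signal[period:-period], rebuilt[period:-period]))
print(max_error)
```

With the marks left where they were found, the overlapping windows sum to one and the interior of the signal is recovered up to rounding error. Moving or repeating the segments before the overlap-add step is what changes pitch and duration.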
A temporal modification can be achieved by repeating or eliminating pitch periods, and a frequency modification by changing the time between elementary waveforms. These methods work only for the voiced sections of the signal, since the model requires a fundamental period. The PSOLA method works well for small transformations, and it resembles the wavetable reading method.

2 Problem Formulation
Spoken voice synthesis and singing voice synthesis have been around for a long time, and yet naturalness is still traded against intelligibility and vice versa. The differences between the singing and the spoken voice are not yet fully understood, and there are reasons to believe that they may differ in aspects as basic as the fundamental phonetic unit. Also, human languages differ in their phones, graphs, pronunciation, and orthographic rules, so knowledge of the acoustic and phonetic characteristics must be developed for each language individually. The present work aims to address this for the Spanish language, for which little or no research on the sung voice has been carried out. The system developed is based on the synthesis model of concatenation of pre-recorded sung syllables, in an effort to prove or disprove the hypothesis that syllables are in fact the basic structural unit of the Spanish singing voice (at least in Mexican popular songs). The hypothesis is supported by the fact that all Spanish lyrical scores present a melodic line with a one-to-one correspondence between notes and syllables (except in melismas). This being a first approach to singing voice synthesis in Spanish, the primary objectives were modest but extraordinarily important as developmental steps toward a complete and comprehensive Spanish singing voice synthesis system.
The main objective was simply to test the results of applying the PSOLA algorithm to modify the pitch and duration of a subset of Spanish syllables, and to try them in a particular Mexican-Spanish popular song (Las Mañanitas). The results would be measured in terms of the balance between how natural the resulting voice sounds and the intelligibility of the performance.

2.1 The PSOLA decision
The problem with the concatenation of segments is that, because of prosodic variability, the segments cannot be generalized to contexts not included in the training process. For this reason there are techniques that modify the prosody of a unit to match a desired prosody. Although such techniques degrade the quality of the synthesized voice, their benefits outweigh the inconvenience of the distortion they introduce. The objective of prosody modification is to change the amplitude, duration, and pitch of a voice segment. The amplitude can easily be modified by direct multiplication; duration and pitch, however, present more difficulties [5]. The PSOLA algorithm allows the smooth concatenation of pre-recorded voice samples and provides good pitch and duration control, making it the algorithm of choice for the present work.

Although all the PSOLA versions work in a similar manner, the time-domain algorithm is the most widely used because of its computational efficiency. The basic algorithm consists of three steps: 1) analysis, in which the signal is divided into separate, often overlapping, short-time signals; 2) modification of each analysis signal to generate the synthesis signals; and 3) synthesis, in which the segments are recombined by overlapping and adding [2]. The short-time signals are obtained from the digital waveform by multiplying it by a sequence of pitch-synchronous analysis windows. The window normally used is the Hanning window, centered on successive instants called pitch marks. These marks are placed pitch-synchronously over the voiced sections and at a constant rate over the unvoiced sections. The window length is proportional to the local pitch period, with a window factor ranging from 2 to 4. The pitch marks are determined either by inspection of the signal or automatically by some estimation method. The segment recombination in the synthesis step is performed after defining the new sequence of pitch marks [2]. Frequency is manipulated by changing the intervals between pitch marks. Duration, on the other hand, is modified by repeating or eliminating voice segments.

2.2 Spanish syllables and the PSOLA method
Presumably, a syllable-based system would dramatically reduce the noise generated by discontinuities in the concatenation procedure.
This is due to the intrinsic articulatory characteristics of syllables: they include the language's co-articulation and constitute a clear phonetic unit, which minimizes the border effect naturally present in systems based on other units such as diphones. Also, the manner of syllable construction in Spanish facilitates the implementation of the PSOLA algorithm, since the pitch of the note associated with a syllable always ties the fundamental frequency to the sung vowel. Being able to extract pitch information, the PSOLA algorithm can control the most important voiced characteristic of the syllables and, more importantly, can modify it.

3 Problem Solution
For this first stage of development, the PSOLA algorithm was implemented in Matlab, mainly because of the author's familiarity with it, but also because of its straightforwardness and accessibility. The program basically consisted of a single function (tdpsola) that included a sub-function needed to find the pitch marks (find_pmarks). The pre-recorded syllables were fed into the program in .wav format, and the voice of a Mexican soprano singer was chosen for the construction of a small database of Spanish sung syllables that would allow the algorithm to be tested.

3.1 The database construction
For the construction of the database, a half-hour warm-up session took place before the actual recording of the segments. It included three-note legatos, chromatic scales with a maximum range of an octave and a half, and arpeggios spanning an octave and a half. The singer's register covers two octaves, from C4 to C6. The recordings were made in a home studio (5 x 5 x 2.5 meters) using a Pro Tools 7.4 system and a Digidesign Mbox digital audio interface at a sampling frequency of 48 kHz and a depth of 24 bits (the highest available). An AKG 414 microphone in cardioid mode was used during the recordings.
Considering the vocal range of most popular singers, the singer was asked to sing the syllables without vibrato at two different pitches, F4 and C5, at a tempo of 60 bpm. Each syllable lasted 1 second (±0.2 s), and the selection of syllables recorded included all the vowels and the syllables occurring in the popular songs that the system would be required to synthesize.

3.2 The tdpsola function
The novelty of the application of the TD-PSOLA algorithm in the present work is the possibility of choosing the percentage of frequency and duration modification separately at the beginning and at the end of the signal, and the fact that it is applied to the singing voice. The modification is distributed linearly throughout the signal. At the end of the modification procedure, the first and last cycles of the modified signal are discarded, since these edge areas are the most likely to present undesired features such as distortion. The main function tdpsola takes six input parameters: 1) the syllable to modify, 2) the sampling frequency of the signal, 3) the frequency scaling percentage at the beginning of the syllable, 4) the frequency scaling percentage at the end of the syllable, 5) the duration scaling percentage at the beginning of the syllable, and 6) the duration scaling percentage at the end of the syllable.

3.2.1 The pitch_marks function
For prosodic modification the most important step is the calculation of the new pitch marks; the new distance between pitch marks is obtained by multiplying the modification percentage by the current distance. After obtaining the number of modified pitch marks, the number of pitch marks to add or remove has to be calculated for the first period. To increase the duration of a signal, periods must be duplicated until the number of samples still required to reach the desired duration is less than the period duration. To decrease the duration, periods must be eliminated until the number of samples still to be removed is less than the period duration. When duplicating windows, the even copies (2nd, 4th, 6th, etc.) are inverted to avoid generating periodic effects in aperiodic signals. The new windows, multiplied by the Hanning window, are added to the output signal, where their samples may overlap with previous samples; a normalization factor has to be calculated to reduce the distortion that the Hanning window can introduce.

The pitch_marks function has two input parameters, the audio signal to analyze and the sampling frequency, and it outputs the positions of the new pitch marks. As can be inferred from the previous paragraph, this function is the most fundamental part of the synthesis system, and its operation can be divided into four steps:
1. The generation of the energy contour of the signal.
2. The approximation of the pitch marks from the local maxima of the energy contour.
3. The addition of pitch marks to the aperiodic sections of the signal.
4. The optimization of the pitch marks to the signal maxima.
The calculation of the pitch marks was performed automatically by means of an algorithm developed by Vladimir Goncharoff and Patrick Gries of the University of Illinois at Chicago [3].

For the evaluation of the singing voice synthesizer, it was given a small number of melody-related tasks: changes of pitch and duration, the execution of intervals in the major scale, and vowel glissandos. Although it is difficult to report in writing the results of a singing performance, the most important test was the actual singing of the Mexican song Las Mañanitas.

Fig. 1 Las Mañanitas, popular Mexican song. The lyrics in the 6th and 7th bars were replaced with the most common version: ser, día de tu santo te las can
Fig. 2 Unmodified syllable Es, 1st bar
Fig. 3 Modified syllable Es, 1st bar

3.3 Results and Results Analysis

Fig. 4 Unmodified syllable San, 7th bar
Fig. 5 Modified syllable San, 7th bar

Results show that a highly intelligible Spanish singing voice can be achieved with minimal control mechanisms. This in itself suggests that choosing syllables as the basic phonetic unit for singing is the right decision. However, the natural sound of the voice is compromised to some degree, and the severity depends on the interval, whether in frequency or duration, as well as on the recorded performance in the database. By way of example, Figures 2 through 5 show spectral differences for the unvoiced S phonemes in different positions (beginning and end of the syllable) and under different frequency and temporal modifications. Figures 3 and 5 show the effect of transposing a syllable recorded with vibrato: the vibrato is exaggerated and sounds unnatural. Nevertheless, results show that, depending on the syllable and when no vibrato is present, a transposition of up to a perfect fourth is possible without generating very obvious digital errors, and that a variation of about 5% (0.05) in the frequency scale corresponds roughly to one semitone in the modern equal-tempered scale (the exact semitone ratio is 2^(1/12) ≈ 1.0595). Although the percentage of variation that the voice segments allow without introducing noticeable errors depended very much on the syllable itself, it was clear that duration variation introduces less noticeable errors, since it is applied mainly to the vowels of the segment, and adding new cycles is less problematic for voiced segments when no transposition is involved, because it has no spectral impact and the overlap is unnoticeable.

For lack of better adjectives, the performance of the synthesizer can range from recited to expressive, and it was found that a better outcome could be achieved if the modification at the first and last pitch marks coincides, making the execution more melodic and reducing the legato effect. This was to be expected, since the period of a Spanish singer usually does not change within a syllable (supporting the idea that the syllable is the fundamental singing unit), and when a singer forces a glissando into a syllable he or she probably does not do so linearly over voiced and unvoiced segments alike. Finally, although a formal intelligibility test has not yet been performed, the listener can clearly recognize the tune being sung, and a promising path toward a comprehensive Spanish singing voice synthesis system lies ahead.

4 Conclusion
It is a common objective of any effort toward singing voice synthesis to achieve a synthesis engine capable of sounding as natural and expressive as a real singer, with only the score and lyrics as input. The general architecture proposed to achieve such a goal includes a section for the generalization of the traditional score, which can include any symbolic information required for the synthesizer's control; a section that translates the input controls into low-level interpretative actions; another section that creates the parametric trajectories that appropriately express the different paths within the sound space of the instrument; and a module containing the synthesis engine, which produces the output signal by concatenating a transformed sample sequence that resembles the performed trajectory [1].

It is this author's opinion that, just as effective speech synthesis became possible only after a fundamental understanding of the physical acoustics of voice production was gathered, a well-rounded singing voice synthesis (with output that is both natural and intelligible) will be possible only after profound knowledge of the singing voice sound space is obtained. This sound space is undeniably related not only to human physical singing capabilities and techniques, but also to the language sung, its inner phonetic and grammatical laws, and the musical style it forms part of. Most of this knowledge is yet to be attained for the Spanish language.

Although, of the general architecture of a singing voice synthesis system, the present work lacks only the section for the generalization of the traditional score, its real contribution is that it provides a means of obtaining more information about the sound space of the instrument itself. For the concatenation approach, the importance of the database is paramount, since it not only includes the interpretative recordings but also carries the models and measurements related to the interpretative space and provides relevant information for converting a high-level representation (the score) into an output signal. In that sense, substantial evidence has been gathered supporting the idea that in Spanish singing the syllable is the basic phonetic unit, and that understanding the singer's manipulation of this unit will not only provide the information needed for natural and expressive singing voice synthesis but will also shed light on Spanish singing techniques and on the Spanish language itself. Finally, given that there are only around 2,000 Spanish syllables, the results of this work lead to the promising conclusion that a complete and comprehensive Spanish singing synthesis system can be achieved with a syllabic database with low concatenation and prosodic distortion. In a second developmental stage, higher-level controls will be added to drive the transformation of the parametric trajectories and obtain finer control of the engine, and the construction of a user interface (musical in nature) for the generalization of the score would render the work complete.

References:
[1] J. Bonada and X. Serra, Synthesis of the Singing Voice by Performance Sampling and Spectral Models, IEEE Signal Processing Magazine, Vol. 24, 2007.
[2] S. Lemmetty, Review of Speech Synthesis Technology, eses/lemmetty_mst/thesis.pdf, date of last access: April 9th.
[3] V. Goncharoff and P. Gries, An Algorithm for Accurately Marking Pitch Pulses in Speech Signals, Proceedings of the IASTED International Conference on Signal and Image Processing, Las Vegas, Nevada, USA, 1998.
[4] V. Siivola, A Survey of Methods for the Synthesis of the Singing Voice, date of last access: April 9th.
[5] X. Huang, A. Acero, and H. W. Hon, Spoken Language Processing: A Guide to Theory, Algorithm, and System Development, Prentice Hall.


EE-217 Final Project The Hunt for Noise (and All Things Audible) EE-217 Final Project The Hunt for Noise (and All Things Audible) 5-7-14 Introduction Noise is in everything. All modern communication systems must deal with noise in one way or another. Different types

More information

A prototype system for rule-based expressive modifications of audio recordings

A prototype system for rule-based expressive modifications of audio recordings International Symposium on Performance Science ISBN 0-00-000000-0 / 000-0-00-000000-0 The Author 2007, Published by the AEC All rights reserved A prototype system for rule-based expressive modifications

More information

Singing voice synthesis based on deep neural networks

Singing voice synthesis based on deep neural networks INTERSPEECH 2016 September 8 12, 2016, San Francisco, USA Singing voice synthesis based on deep neural networks Masanari Nishimura, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, and Keiichi Tokuda

More information

CONTENT-BASED MELODIC TRANSFORMATIONS OF AUDIO MATERIAL FOR A MUSIC PROCESSING APPLICATION

CONTENT-BASED MELODIC TRANSFORMATIONS OF AUDIO MATERIAL FOR A MUSIC PROCESSING APPLICATION CONTENT-BASED MELODIC TRANSFORMATIONS OF AUDIO MATERIAL FOR A MUSIC PROCESSING APPLICATION Emilia Gómez, Gilles Peterschmitt, Xavier Amatriain, Perfecto Herrera Music Technology Group Universitat Pompeu

More information

Florida Performing Fine Arts Assessment Item Specifications for Benchmarks in Course: Chorus 5 Honors

Florida Performing Fine Arts Assessment Item Specifications for Benchmarks in Course: Chorus 5 Honors Task A/B/C/D Item Type Florida Performing Fine Arts Assessment Course Title: Chorus 5 Honors Course Number: 1303340 Abbreviated Title: CHORUS 5 HON Course Length: Year Course Level: 2 Credit: 1.0 Graduation

More information

Pitch correction on the human voice

Pitch correction on the human voice University of Arkansas, Fayetteville ScholarWorks@UARK Computer Science and Computer Engineering Undergraduate Honors Theses Computer Science and Computer Engineering 5-2008 Pitch correction on the human

More information

DELTA MODULATION AND DPCM CODING OF COLOR SIGNALS

DELTA MODULATION AND DPCM CODING OF COLOR SIGNALS DELTA MODULATION AND DPCM CODING OF COLOR SIGNALS Item Type text; Proceedings Authors Habibi, A. Publisher International Foundation for Telemetering Journal International Telemetering Conference Proceedings

More information

Laboratory Assignment 3. Digital Music Synthesis: Beethoven s Fifth Symphony Using MATLAB

Laboratory Assignment 3. Digital Music Synthesis: Beethoven s Fifth Symphony Using MATLAB Laboratory Assignment 3 Digital Music Synthesis: Beethoven s Fifth Symphony Using MATLAB PURPOSE In this laboratory assignment, you will use MATLAB to synthesize the audio tones that make up a well-known

More information

SYNTHESIS FROM MUSICAL INSTRUMENT CHARACTER MAPS

SYNTHESIS FROM MUSICAL INSTRUMENT CHARACTER MAPS Published by Institute of Electrical Engineers (IEE). 1998 IEE, Paul Masri, Nishan Canagarajah Colloquium on "Audio and Music Technology"; November 1998, London. Digest No. 98/470 SYNTHESIS FROM MUSICAL

More information

Music Representations

Music Representations Lecture Music Processing Music Representations Meinard Müller International Audio Laboratories Erlangen meinard.mueller@audiolabs-erlangen.de Book: Fundamentals of Music Processing Meinard Müller Fundamentals

More information

Type-2 Fuzzy Logic Sensor Fusion for Fire Detection Robots

Type-2 Fuzzy Logic Sensor Fusion for Fire Detection Robots Proceedings of the 2 nd International Conference of Control, Dynamic Systems, and Robotics Ottawa, Ontario, Canada, May 7 8, 2015 Paper No. 187 Type-2 Fuzzy Logic Sensor Fusion for Fire Detection Robots

More information

Query By Humming: Finding Songs in a Polyphonic Database

Query By Humming: Finding Songs in a Polyphonic Database Query By Humming: Finding Songs in a Polyphonic Database John Duchi Computer Science Department Stanford University jduchi@stanford.edu Benjamin Phipps Computer Science Department Stanford University bphipps@stanford.edu

More information

Speaking in Minor and Major Keys

Speaking in Minor and Major Keys Chapter 5 Speaking in Minor and Major Keys 5.1. Introduction 28 The prosodic phenomena discussed in the foregoing chapters were all instances of linguistic prosody. Prosody, however, also involves extra-linguistic

More information

Making music with voice. Distinguished lecture, CIRMMT Jan 2009, Copyright Johan Sundberg

Making music with voice. Distinguished lecture, CIRMMT Jan 2009, Copyright Johan Sundberg Making music with voice MENU: A: The instrument B: Getting heard C: Expressivity The instrument Summary RADIATED SPECTRUM Level Frequency Velum VOCAL TRACT Frequency curve Formants Level Level Frequency

More information

Precision testing methods of Event Timer A032-ET

Precision testing methods of Event Timer A032-ET Precision testing methods of Event Timer A032-ET Event Timer A032-ET provides extreme precision. Therefore exact determination of its characteristics in commonly accepted way is impossible or, at least,

More information

An interdisciplinary approach to audio effect classification

An interdisciplinary approach to audio effect classification An interdisciplinary approach to audio effect classification Vincent Verfaille, Catherine Guastavino Caroline Traube, SPCL / CIRMMT, McGill University GSLIS / CIRMMT, McGill University LIAM / OICM, Université

More information

Design of a pitch quantization and pitch correction system for real-time music effects signal processing

Design of a pitch quantization and pitch correction system for real-time music effects signal processing Design of a pitch quantization and pitch correction system for real-time music effects signal processing Corey Cheng * * Massachusetts Institute of Technology, 617-253-2268, coreyc@mit.edu EconoSonoMetrics,

More information

A Parametric Autoregressive Model for the Extraction of Electric Network Frequency Fluctuations in Audio Forensic Authentication

A Parametric Autoregressive Model for the Extraction of Electric Network Frequency Fluctuations in Audio Forensic Authentication Journal of Energy and Power Engineering 10 (2016) 504-512 doi: 10.17265/1934-8975/2016.08.007 D DAVID PUBLISHING A Parametric Autoregressive Model for the Extraction of Electric Network Frequency Fluctuations

More information

Pitch Perception and Grouping. HST.723 Neural Coding and Perception of Sound

Pitch Perception and Grouping. HST.723 Neural Coding and Perception of Sound Pitch Perception and Grouping HST.723 Neural Coding and Perception of Sound Pitch Perception. I. Pure Tones The pitch of a pure tone is strongly related to the tone s frequency, although there are small

More information

AN APPROACH FOR MELODY EXTRACTION FROM POLYPHONIC AUDIO: USING PERCEPTUAL PRINCIPLES AND MELODIC SMOOTHNESS

AN APPROACH FOR MELODY EXTRACTION FROM POLYPHONIC AUDIO: USING PERCEPTUAL PRINCIPLES AND MELODIC SMOOTHNESS AN APPROACH FOR MELODY EXTRACTION FROM POLYPHONIC AUDIO: USING PERCEPTUAL PRINCIPLES AND MELODIC SMOOTHNESS Rui Pedro Paiva CISUC Centre for Informatics and Systems of the University of Coimbra Department

More information

Instrument Recognition in Polyphonic Mixtures Using Spectral Envelopes

Instrument Recognition in Polyphonic Mixtures Using Spectral Envelopes Instrument Recognition in Polyphonic Mixtures Using Spectral Envelopes hello Jay Biernat Third author University of Rochester University of Rochester Affiliation3 words jbiernat@ur.rochester.edu author3@ismir.edu

More information

Work Package 9. Deliverable 32. Statistical Comparison of Islamic and Byzantine chant in the Worship Spaces

Work Package 9. Deliverable 32. Statistical Comparison of Islamic and Byzantine chant in the Worship Spaces Work Package 9 Deliverable 32 Statistical Comparison of Islamic and Byzantine chant in the Worship Spaces Table Of Contents 1 INTRODUCTION... 3 1.1 SCOPE OF WORK...3 1.2 DATA AVAILABLE...3 2 PREFIX...

More information

Bertsokantari: a TTS based singing synthesis system

Bertsokantari: a TTS based singing synthesis system INTERSPEECH 2016 September 8 12, 2016, San Francisco, USA Bertsokantari: a TTS based singing synthesis system Eder del Blanco 1, Inma Hernaez 1, Eva Navas 1, Xabier Sarasola 1, Daniel Erro 1,2 1 AHOLAB

More information

PHYSICS OF MUSIC. 1.) Charles Taylor, Exploring Music (Music Library ML3805 T )

PHYSICS OF MUSIC. 1.) Charles Taylor, Exploring Music (Music Library ML3805 T ) REFERENCES: 1.) Charles Taylor, Exploring Music (Music Library ML3805 T225 1992) 2.) Juan Roederer, Physics and Psychophysics of Music (Music Library ML3805 R74 1995) 3.) Physics of Sound, writeup in this

More information

AN ARTISTIC TECHNIQUE FOR AUDIO-TO-VIDEO TRANSLATION ON A MUSIC PERCEPTION STUDY

AN ARTISTIC TECHNIQUE FOR AUDIO-TO-VIDEO TRANSLATION ON A MUSIC PERCEPTION STUDY AN ARTISTIC TECHNIQUE FOR AUDIO-TO-VIDEO TRANSLATION ON A MUSIC PERCEPTION STUDY Eugene Mikyung Kim Department of Music Technology, Korea National University of Arts eugene@u.northwestern.edu ABSTRACT

More information

A comparison of the acoustic vowel spaces of speech and song*20

A comparison of the acoustic vowel spaces of speech and song*20 Linguistic Research 35(2), 381-394 DOI: 10.17250/khisli.35.2.201806.006 A comparison of the acoustic vowel spaces of speech and song*20 Evan D. Bradley (The Pennsylvania State University Brandywine) Bradley,

More information

HST 725 Music Perception & Cognition Assignment #1 =================================================================

HST 725 Music Perception & Cognition Assignment #1 ================================================================= HST.725 Music Perception and Cognition, Spring 2009 Harvard-MIT Division of Health Sciences and Technology Course Director: Dr. Peter Cariani HST 725 Music Perception & Cognition Assignment #1 =================================================================

More information

Music 209 Advanced Topics in Computer Music Lecture 4 Time Warping

Music 209 Advanced Topics in Computer Music Lecture 4 Time Warping Music 209 Advanced Topics in Computer Music Lecture 4 Time Warping 2006-2-9 Professor David Wessel (with John Lazzaro) (cnmat.berkeley.edu/~wessel, www.cs.berkeley.edu/~lazzaro) www.cs.berkeley.edu/~lazzaro/class/music209

More information

Synthesizing a choir in real-time using Pitch Synchronous Overlap Add (PSOLA)

Synthesizing a choir in real-time using Pitch Synchronous Overlap Add (PSOLA) Synthesizing a choir in real-time using Pitch Synchronous Overlap Add (PSOLA) Norbert Schnell, Geoffroy Peeters, Serge Lemouton, Philippe Manoury, Xavier Rodet! " % & ( )! *, IRCAM -CENTRE GEORGES-POMPIDOU

More information

EE513 Audio Signals and Systems. Introduction Kevin D. Donohue Electrical and Computer Engineering University of Kentucky

EE513 Audio Signals and Systems. Introduction Kevin D. Donohue Electrical and Computer Engineering University of Kentucky EE513 Audio Signals and Systems Introduction Kevin D. Donohue Electrical and Computer Engineering University of Kentucky Question! If a tree falls in the forest and nobody is there to hear it, will it

More information

Introduction! User Interface! Bitspeek Versus Vocoders! Using Bitspeek in your Host! Change History! Requirements!...

Introduction! User Interface! Bitspeek Versus Vocoders! Using Bitspeek in your Host! Change History! Requirements!... version 1.5 Table of Contents Introduction!... 3 User Interface!... 4 Bitspeek Versus Vocoders!... 6 Using Bitspeek in your Host!... 6 Change History!... 9 Requirements!... 9 Credits and Contacts!... 10

More information

A Parametric Autoregressive Model for the Extraction of Electric Network Frequency Fluctuations in Audio Forensic Authentication

A Parametric Autoregressive Model for the Extraction of Electric Network Frequency Fluctuations in Audio Forensic Authentication Proceedings of the 3 rd International Conference on Control, Dynamic Systems, and Robotics (CDSR 16) Ottawa, Canada May 9 10, 2016 Paper No. 110 DOI: 10.11159/cdsr16.110 A Parametric Autoregressive Model

More information

Objectives. Combinational logics Sequential logics Finite state machine Arithmetic circuits Datapath

Objectives. Combinational logics Sequential logics Finite state machine Arithmetic circuits Datapath Objectives Combinational logics Sequential logics Finite state machine Arithmetic circuits Datapath In the previous chapters we have studied how to develop a specification from a given application, and

More information

A New "Duration-Adapted TR" Waveform Capture Method Eliminates Severe Limitations

A New Duration-Adapted TR Waveform Capture Method Eliminates Severe Limitations 31 st Conference of the European Working Group on Acoustic Emission (EWGAE) Th.3.B.4 More Info at Open Access Database www.ndt.net/?id=17567 A New "Duration-Adapted TR" Waveform Capture Method Eliminates

More information

How to Obtain a Good Stereo Sound Stage in Cars

How to Obtain a Good Stereo Sound Stage in Cars Page 1 How to Obtain a Good Stereo Sound Stage in Cars Author: Lars-Johan Brännmark, Chief Scientist, Dirac Research First Published: November 2017 Latest Update: November 2017 Designing a sound system

More information

Getting Started with the LabVIEW Sound and Vibration Toolkit

Getting Started with the LabVIEW Sound and Vibration Toolkit 1 Getting Started with the LabVIEW Sound and Vibration Toolkit This tutorial is designed to introduce you to some of the sound and vibration analysis capabilities in the industry-leading software tool

More information

Lab 5 Linear Predictive Coding

Lab 5 Linear Predictive Coding Lab 5 Linear Predictive Coding 1 of 1 Idea When plain speech audio is recorded and needs to be transmitted over a channel with limited bandwidth it is often necessary to either compress or encode the audio

More information

Figure 2: Original and PAM modulated image. Figure 4: Original image.

Figure 2: Original and PAM modulated image. Figure 4: Original image. Figure 2: Original and PAM modulated image. Figure 4: Original image. An image can be represented as a 1D signal by replacing all the rows as one row. This gives us our image as a 1D signal. Suppose x(t)

More information

MHSIB.5 Composing and arranging music within specified guidelines a. Creates music incorporating expressive elements.

MHSIB.5 Composing and arranging music within specified guidelines a. Creates music incorporating expressive elements. G R A D E: 9-12 M USI C IN T E R M E DI A T E B A ND (The design constructs for the intermediate curriculum may correlate with the musical concepts and demands found within grade 2 or 3 level literature.)

More information

Single Channel Speech Enhancement Using Spectral Subtraction Based on Minimum Statistics

Single Channel Speech Enhancement Using Spectral Subtraction Based on Minimum Statistics Master Thesis Signal Processing Thesis no December 2011 Single Channel Speech Enhancement Using Spectral Subtraction Based on Minimum Statistics Md Zameari Islam GM Sabil Sajjad This thesis is presented

More information

About Giovanni De Poli. What is Model. Introduction. di Poli: Methodologies for Expressive Modeling of/for Music Performance

About Giovanni De Poli. What is Model. Introduction. di Poli: Methodologies for Expressive Modeling of/for Music Performance Methodologies for Expressiveness Modeling of and for Music Performance by Giovanni De Poli Center of Computational Sonology, Department of Information Engineering, University of Padova, Padova, Italy About

More information

Voice & Music Pattern Extraction: A Review

Voice & Music Pattern Extraction: A Review Voice & Music Pattern Extraction: A Review 1 Pooja Gautam 1 and B S Kaushik 2 Electronics & Telecommunication Department RCET, Bhilai, Bhilai (C.G.) India pooja0309pari@gmail.com 2 Electrical & Instrumentation

More information

Automatic characterization of ornamentation from bassoon recordings for expressive synthesis

Automatic characterization of ornamentation from bassoon recordings for expressive synthesis Automatic characterization of ornamentation from bassoon recordings for expressive synthesis Montserrat Puiggròs, Emilia Gómez, Rafael Ramírez, Xavier Serra Music technology Group Universitat Pompeu Fabra

More information

VOCALISTENER: A SINGING-TO-SINGING SYNTHESIS SYSTEM BASED ON ITERATIVE PARAMETER ESTIMATION

VOCALISTENER: A SINGING-TO-SINGING SYNTHESIS SYSTEM BASED ON ITERATIVE PARAMETER ESTIMATION VOCALISTENER: A SINGING-TO-SINGING SYNTHESIS SYSTEM BASED ON ITERATIVE PARAMETER ESTIMATION Tomoyasu Nakano Masataka Goto National Institute of Advanced Industrial Science and Technology (AIST), Japan

More information

6.UAP Project. FunPlayer: A Real-Time Speed-Adjusting Music Accompaniment System. Daryl Neubieser. May 12, 2016

6.UAP Project. FunPlayer: A Real-Time Speed-Adjusting Music Accompaniment System. Daryl Neubieser. May 12, 2016 6.UAP Project FunPlayer: A Real-Time Speed-Adjusting Music Accompaniment System Daryl Neubieser May 12, 2016 Abstract: This paper describes my implementation of a variable-speed accompaniment system that

More information

CSC475 Music Information Retrieval

CSC475 Music Information Retrieval CSC475 Music Information Retrieval Monophonic pitch extraction George Tzanetakis University of Victoria 2014 G. Tzanetakis 1 / 32 Table of Contents I 1 Motivation and Terminology 2 Psychacoustics 3 F0

More information

CS229 Project Report Polyphonic Piano Transcription

CS229 Project Report Polyphonic Piano Transcription CS229 Project Report Polyphonic Piano Transcription Mohammad Sadegh Ebrahimi Stanford University Jean-Baptiste Boin Stanford University sadegh@stanford.edu jbboin@stanford.edu 1. Introduction In this project

More information

NON-LINEAR EFFECTS MODELING FOR POLYPHONIC PIANO TRANSCRIPTION

NON-LINEAR EFFECTS MODELING FOR POLYPHONIC PIANO TRANSCRIPTION NON-LINEAR EFFECTS MODELING FOR POLYPHONIC PIANO TRANSCRIPTION Luis I. Ortiz-Berenguer F.Javier Casajús-Quirós Marisol Torres-Guijarro Dept. Audiovisual and Communication Engineering Universidad Politécnica

More information

Edit Menu. To Change a Parameter Place the cursor below the parameter field. Rotate the Data Entry Control to change the parameter value.

Edit Menu. To Change a Parameter Place the cursor below the parameter field. Rotate the Data Entry Control to change the parameter value. The Edit Menu contains four layers of preset parameters that you can modify and then save as preset information in one of the user preset locations. There are four instrument layers in the Edit menu. See

More information

Digital music synthesis using DSP

Digital music synthesis using DSP Digital music synthesis using DSP Rahul Bhat (124074002), Sandeep Bhagwat (123074011), Gaurang Naik (123079009), Shrikant Venkataramani (123079042) DSP Application Assignment, Group No. 4 Department of

More information

SMS Composer and SMS Conductor: Applications for Spectral Modeling Synthesis Composition and Performance

SMS Composer and SMS Conductor: Applications for Spectral Modeling Synthesis Composition and Performance SMS Composer and SMS Conductor: Applications for Spectral Modeling Synthesis Composition and Performance Eduard Resina Audiovisual Institute, Pompeu Fabra University Rambla 31, 08002 Barcelona, Spain eduard@iua.upf.es

More information

Music Source Separation

Music Source Separation Music Source Separation Hao-Wei Tseng Electrical and Engineering System University of Michigan Ann Arbor, Michigan Email: blakesen@umich.edu Abstract In popular music, a cover version or cover song, or

More information

Week 14 Query-by-Humming and Music Fingerprinting. Roger B. Dannenberg Professor of Computer Science, Art and Music Carnegie Mellon University

Week 14 Query-by-Humming and Music Fingerprinting. Roger B. Dannenberg Professor of Computer Science, Art and Music Carnegie Mellon University Week 14 Query-by-Humming and Music Fingerprinting Roger B. Dannenberg Professor of Computer Science, Art and Music Overview n Melody-Based Retrieval n Audio-Score Alignment n Music Fingerprinting 2 Metadata-based

More information

Musical Acoustics Lecture 15 Pitch & Frequency (Psycho-Acoustics)

Musical Acoustics Lecture 15 Pitch & Frequency (Psycho-Acoustics) 1 Musical Acoustics Lecture 15 Pitch & Frequency (Psycho-Acoustics) Pitch Pitch is a subjective characteristic of sound Some listeners even assign pitch differently depending upon whether the sound was

More information

Simple Harmonic Motion: What is a Sound Spectrum?

Simple Harmonic Motion: What is a Sound Spectrum? Simple Harmonic Motion: What is a Sound Spectrum? A sound spectrum displays the different frequencies present in a sound. Most sounds are made up of a complicated mixture of vibrations. (There is an introduction

More information

ACCURATE ANALYSIS AND VISUAL FEEDBACK OF VIBRATO IN SINGING. University of Porto - Faculty of Engineering -DEEC Porto, Portugal

ACCURATE ANALYSIS AND VISUAL FEEDBACK OF VIBRATO IN SINGING. University of Porto - Faculty of Engineering -DEEC Porto, Portugal ACCURATE ANALYSIS AND VISUAL FEEDBACK OF VIBRATO IN SINGING José Ventura, Ricardo Sousa and Aníbal Ferreira University of Porto - Faculty of Engineering -DEEC Porto, Portugal ABSTRACT Vibrato is a frequency

More information

Experiments on musical instrument separation using multiplecause

Experiments on musical instrument separation using multiplecause Experiments on musical instrument separation using multiplecause models J Klingseisen and M D Plumbley* Department of Electronic Engineering King's College London * - Corresponding Author - mark.plumbley@kcl.ac.uk

More information

y POWER USER MUSIC PRODUCTION and PERFORMANCE With the MOTIF ES Mastering the Sample SLICE function

y POWER USER MUSIC PRODUCTION and PERFORMANCE With the MOTIF ES Mastering the Sample SLICE function y POWER USER MUSIC PRODUCTION and PERFORMANCE With the MOTIF ES Mastering the Sample SLICE function Phil Clendeninn Senior Product Specialist Technology Products Yamaha Corporation of America Working with

More information

Analyzing & Synthesizing Gamakas: a Step Towards Modeling Ragas in Carnatic Music

Analyzing & Synthesizing Gamakas: a Step Towards Modeling Ragas in Carnatic Music Mihir Sarkar Introduction Analyzing & Synthesizing Gamakas: a Step Towards Modeling Ragas in Carnatic Music If we are to model ragas on a computer, we must be able to include a model of gamakas. Gamakas

More information

Fraction by Sinevibes audio slicing workstation

Fraction by Sinevibes audio slicing workstation Fraction by Sinevibes audio slicing workstation INTRODUCTION Fraction is an effect plugin for deep real-time manipulation and re-engineering of sound. It features 8 slicers which record and repeat the

More information

On Human Capability and Acoustic Cues for Discriminating Singing and Speaking Voices

On Human Capability and Acoustic Cues for Discriminating Singing and Speaking Voices On Human Capability and Acoustic Cues for Discriminating Singing and Speaking Voices Yasunori Ohishi 1 Masataka Goto 3 Katunobu Itou 2 Kazuya Takeda 1 1 Graduate School of Information Science, Nagoya University,

More information

ALGORHYTHM. User Manual. Version 1.0

ALGORHYTHM. User Manual. Version 1.0 !! ALGORHYTHM User Manual Version 1.0 ALGORHYTHM Algorhythm is an eight-step pulse sequencer for the Eurorack modular synth format. The interface provides realtime programming of patterns and sequencer

More information

SPEECH TO SINGING SYNTHESIS: INCORPORATING PATAH LAGU IN THE FUNDAMENTAL FREQUENCY CONTROL MODEL FOR MALAY ASLI SONG

SPEECH TO SINGING SYNTHESIS: INCORPORATING PATAH LAGU IN THE FUNDAMENTAL FREQUENCY CONTROL MODEL FOR MALAY ASLI SONG How to cite this paper: Nurmaisara Za ba & Nursuriati Jamil. (2017). Speech to singing synthesis: incorporating patah lagu in the fundamental frequency control model for malay asli song in Zulikha, J.

More information

A Case Based Approach to the Generation of Musical Expression

A Case Based Approach to the Generation of Musical Expression A Case Based Approach to the Generation of Musical Expression Taizan Suzuki Takenobu Tokunaga Hozumi Tanaka Department of Computer Science Tokyo Institute of Technology 2-12-1, Oookayama, Meguro, Tokyo

More information

1 Ver.mob Brief guide

1 Ver.mob Brief guide 1 Ver.mob 14.02.2017 Brief guide 2 Contents Introduction... 3 Main features... 3 Hardware and software requirements... 3 The installation of the program... 3 Description of the main Windows of the program...

More information

Proc. of NCC 2010, Chennai, India A Melody Detection User Interface for Polyphonic Music

Proc. of NCC 2010, Chennai, India A Melody Detection User Interface for Polyphonic Music A Melody Detection User Interface for Polyphonic Music Sachin Pant, Vishweshwara Rao, and Preeti Rao Department of Electrical Engineering Indian Institute of Technology Bombay, Mumbai 400076, India Email:

More information

ACTIVE SOUND DESIGN: VACUUM CLEANER

ACTIVE SOUND DESIGN: VACUUM CLEANER ACTIVE SOUND DESIGN: VACUUM CLEANER PACS REFERENCE: 43.50 Qp Bodden, Markus (1); Iglseder, Heinrich (2) (1): Ingenieurbüro Dr. Bodden; (2): STMS Ingenieurbüro (1): Ursulastr. 21; (2): im Fasanenkamp 10

More information

Acoustic Prosodic Features In Sarcastic Utterances

Acoustic Prosodic Features In Sarcastic Utterances Acoustic Prosodic Features In Sarcastic Utterances Introduction: The main goal of this study is to determine if sarcasm can be detected through the analysis of prosodic cues or acoustic features automatically.

More information

Acoustic concert halls (Statistical calculation, wave acoustic theory with reference to reconstruction of Saint- Petersburg Kapelle and philharmonic)

Acoustic concert halls (Statistical calculation, wave acoustic theory with reference to reconstruction of Saint- Petersburg Kapelle and philharmonic) Acoustic concert halls (Statistical calculation, wave acoustic theory with reference to reconstruction of Saint- Petersburg Kapelle and philharmonic) Borodulin Valentin, Kharlamov Maxim, Flegontov Alexander

More information

A Matlab toolbox for. Characterisation Of Recorded Underwater Sound (CHORUS) USER S GUIDE

A Matlab toolbox for. Characterisation Of Recorded Underwater Sound (CHORUS) USER S GUIDE Centre for Marine Science and Technology A Matlab toolbox for Characterisation Of Recorded Underwater Sound (CHORUS) USER S GUIDE Version 5.0b Prepared for: Centre for Marine Science and Technology Prepared

More information

Book: Fundamentals of Music Processing. Audio Features. Book: Fundamentals of Music Processing. Book: Fundamentals of Music Processing

Book: Fundamentals of Music Processing. Audio Features. Book: Fundamentals of Music Processing. Book: Fundamentals of Music Processing Book: Fundamentals of Music Processing Lecture Music Processing Audio Features Meinard Müller International Audio Laboratories Erlangen meinard.mueller@audiolabs-erlangen.de Meinard Müller Fundamentals

More information

Pitch is one of the most common terms used to describe sound.

Pitch is one of the most common terms used to describe sound. ARTICLES https://doi.org/1.138/s41562-17-261-8 Diversity in pitch perception revealed by task dependence Malinda J. McPherson 1,2 * and Josh H. McDermott 1,2 Pitch conveys critical information in speech,

More information