Advanced Signal Processing 2
Synthesis of Singing

Outline
- Features and requirements of singing synthesizers
- HMM-based synthesis of singing
- Articulatory synthesis of singing
- Examples

Requirements of a singing synthesizer
- Integration of the musical score
  - Note properties: pitch, duration
  - Integration can be done manually or automatically
- Synthesis of (singing) speech sounds
  - Direct synthesis of singing
  - Conversion from spoken synthetic speech
- Modeling of singing effects (vibrato, overshoot, etc.; see the vibrato sketch below)
- Addition / improvement of naturalness
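
As a small illustration of one such singing effect, the following sketch applies sinusoidal vibrato to an F0 contour. The 5.5 Hz rate, 0.5-semitone depth, and 200 frames/s are illustrative assumptions, not values taken from the referenced systems.

```python
import numpy as np

def add_vibrato(f0, frame_rate=200.0, rate_hz=5.5, depth_semitones=0.5):
    """Apply sinusoidal vibrato to an F0 contour given in Hz per frame.

    Assumed values: 200 frames/s, 5.5 Hz vibrato rate, +/-0.5 semitone depth.
    Unvoiced frames (f0 == 0) are left untouched.
    """
    t = np.arange(len(f0)) / frame_rate
    modulation = depth_semitones * np.sin(2.0 * np.pi * rate_hz * t)
    voiced = f0 > 0
    out = f0.copy()
    out[voiced] = f0[voiced] * 2.0 ** (modulation[voiced] / 12.0)
    return out

# Example: a flat 220 Hz (A3) note held for one second
f0 = np.full(200, 220.0)
print(add_vibrato(f0)[:5])
```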

HMM-based synthesis of singing
- Based on [1] and [2]
- Unit selection for singing would require a vast amount of recorded data
- An HMM-based system gets by with relatively little training data
- The system is similar to HMM-based speech synthesis, with two main differences:
  - Contextual factors
  - Time-lag modeling

HMM-based synthesis: Overview (Analysis)
- Parameter extraction
  - Mel-cepstral coefficients
  - Fundamental frequencies
- Training / estimation of
  - Context-dependent HMMs
  - State duration models
  - Time-lag models
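
A minimal sketch of the analysis-stage feature extraction, using librosa as a stand-in: it extracts MFCCs (a convenient approximation; [1] uses true mel-cepstral coefficients, typically computed with tools such as SPTK) and an F0 contour. The file name and parameter values are illustrative assumptions.

```python
import librosa
import numpy as np

# Assumed input file; 16 kHz mono and 5 ms frames are illustrative choices.
y, sr = librosa.load("singing_sample.wav", sr=16000)

# Spectral features: MFCCs as a stand-in for mel-cepstral coefficients
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=25, hop_length=80)

# F0 contour via probabilistic YIN; the range covers a typical singing voice
f0, voiced_flag, voiced_prob = librosa.pyin(
    y, fmin=80.0, fmax=800.0, sr=sr, hop_length=80
)
f0 = np.nan_to_num(f0)  # unvoiced frames -> 0

print(mfcc.shape, f0.shape)
```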

HMM-based synthesis: Overview (Synthesis)
- Musical score → context-dependent label sequence
- Song HMM = concatenation of context-dependent HMMs
- Determination of state durations (using the time-lag models!)
- Generation of speech parameters from the HMMs
- Synthesis of speech with the MLSA filter

HMM-based synthesis: Contextual factors
- Different from those used in the synthesis of read speech
- This method uses:
  - Phoneme
  - Tone (musical note names such as A4 or C#5)
  - Note duration (in units of 100 ms)
  - Position within the current musical bar
- For all of these, the preceding, current, and succeeding values are taken into account
- Determined automatically from score and lyrics
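
A minimal sketch of what such a context-dependent label could carry, written as a Python data structure. The field names and layout are illustrative assumptions, not the actual label format of [1].

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class NoteContext:
    """Context attached to one phoneme segment (illustrative fields only)."""
    phoneme: str          # e.g. "a"
    tone: str             # musical note name, e.g. "A4"
    duration_100ms: int   # note duration in units of 100 ms
    bar_position: int     # position of the note within the current bar

@dataclass
class ContextLabel:
    """Preceding, current, and succeeding context, as used for clustering."""
    previous: Optional[NoteContext]
    current: NoteContext
    next: Optional[NoteContext]

label = ContextLabel(
    previous=None,
    current=NoteContext(phoneme="a", tone="A4", duration_100ms=5, bar_position=1),
    next=NoteContext(phoneme="m", tone="B4", duration_100ms=3, bar_position=2),
)
print(label.current.tone)
```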

HMM-based synthesis: Time-lag modeling (1)
- Strictly following the score would sound unnatural
- There are lags between the start of the notes and the onset of the voice

HMM-based synthesis: Time-lag modeling (2)
- Context-dependent labels are assigned
- Clustered by a decision tree
- Result: decision-tree-clustered, context-dependent time-lag models
- Each model is a one-dimensional Gaussian
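
As a rough analogy (not the MDL-based tree building used in HTS-style systems such as [1]), one can grow a regression tree over context features and read a Gaussian, i.e. the mean and variance of the time-lags, off each leaf. The toy features and lag values below are invented for illustration.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy context features per note: [duration (100 ms units), bar position, MIDI pitch]
X = np.array([[5, 1, 69], [3, 2, 71], [8, 1, 64], [2, 4, 69], [6, 3, 72], [4, 2, 67]])
# Observed time-lags in frames (negative = the singer starts early)
y = np.array([-3.0, -1.0, -5.0, 0.0, -4.0, -2.0])

tree = DecisionTreeRegressor(max_depth=2, min_samples_leaf=2).fit(X, y)

# One Gaussian (mean, variance) per leaf = a crude context-dependent time-lag model
leaves = tree.apply(X)
models = {leaf: (y[leaves == leaf].mean(), y[leaves == leaf].var())
          for leaf in np.unique(leaves)}
print(models)
```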

HMM-based synthesis: Time-lag modeling (3)
- At the synthesis stage:
  - Each note duration is determined from the score
  - Time-lags and state durations are determined simultaneously
- The joint probability to be maximized is

  $$P(\mathbf{d}, \mathbf{g} \mid \mathbf{T}) = P(\mathbf{d} \mid \mathbf{g}, \mathbf{T})\, P(\mathbf{g}) = \prod_{k=1}^{N} P(d_k \mid T_k, g_k, g_{k-1})\, P(g_k)$$

  where $d_k$ are the state durations of the $k$-th note, $g_k$ is the time-lag (of the start timing of the $(k+1)$-th note), and $T_k$ is the duration of the $k$-th note given by the score
- The maximization leads to a set of linear equations
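
As a simplified illustration of why this maximization reduces to linear equations, the sketch below solves the special case without time-lags: Gaussian state-duration models under a fixed total duration, whose Lagrange-multiplier solution is $d_i = m_i + \rho\,\sigma_i^2$ with a single scalar $\rho$. Including the time-lag Gaussians enlarges this to a bigger linear system, but the principle is the same. The numbers are made up.

```python
import numpy as np

def constrained_durations(means, variances, total_frames):
    """Maximize prod_i N(d_i; m_i, s_i^2) subject to sum_i d_i = total_frames.

    Setting the gradient of the log-likelihood plus a Lagrange term to zero
    gives d_i = m_i + rho * s_i^2 with one scalar rho, i.e. a linear system.
    """
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    rho = (total_frames - means.sum()) / variances.sum()
    return means + rho * variances

# Toy example: three HMM states, note length of 60 frames given by the score
print(constrained_durations([25, 20, 10], [9.0, 4.0, 1.0], 60))  # sums to 60
```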

HMM-based synthesis: Experimental evaluation
- Self-recorded singing database with manual corrections
- Results:
  - Smooth and natural-sounding synthesis
  - Time-lag models substantially improved quality
  - Characteristics of the original singer can be found in the synthesized voice
- Samples

Articulatory synthesis of singing
- Based on [3] and [4]
- A completely different approach
- Main features:
  - Complex three-dimensional model of the vocal tract
  - Sound synthesis by acoustic simulation of this model
  - The input of the system is a gestural score
- Extension of an existing speech synthesizer:
  - Transformation of the musical score into a gestural score
  - Pitch-dependent articulation of vowels

Articulatory synthesis: Overview (1)
- 3D wireframe representation of a male vocal tract
- Parameters determined from MRI images for German vowels and consonants
- Shape and position of the movable structures are a function of 23 parameters
- Dynamic MRI data used for coarticulated consonants

Articulatory synthesis: Overview (2)
- Acoustic simulation with a branched tube model
- Short abutting elliptical tube sections
- Represented by an area function and a discrete perimeter function
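
To illustrate what an area function and a discrete perimeter function are, the sketch below computes both for a sequence of elliptical cross-sections, using Ramanujan's approximation for the ellipse perimeter. The semi-axis values are made up, not taken from [3].

```python
import numpy as np

def ellipse_area(a, b):
    """Cross-sectional area of an elliptical tube section (semi-axes a, b in cm)."""
    return np.pi * a * b

def ellipse_perimeter(a, b):
    """Ramanujan's approximation to the ellipse perimeter."""
    return np.pi * (3.0 * (a + b) - np.sqrt((3.0 * a + b) * (a + 3.0 * b)))

# Made-up semi-axes (cm) for a handful of abutting sections along the tract
semi_axes = np.array([[1.2, 0.8], [1.0, 0.9], [0.6, 0.5], [1.5, 1.1]])
area_function = ellipse_area(semi_axes[:, 0], semi_axes[:, 1])
perimeter_function = ellipse_perimeter(semi_axes[:, 0], semi_axes[:, 1])
print(area_function, perimeter_function, sep="\n")
```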

Articulatory synthesis: Overview (3)
- Analogy between acoustical and electrical transmission
- The branched tube model is represented by an inhomogeneous transmission-line circuit with lumped elements
- Each tube section → a two-port T-type network whose elements are functions of the tube geometry (see the sketch below)
- Simulated by finite-difference equations in the time domain
- Additional techniques simulate several types of losses
- All major speech sounds are possible
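
A small sketch of the acoustic-electrical analogy: in the standard lossless lumped-element approximation, a tube section of length l and cross-sectional area A contributes an acoustic "inductance" (mass) L = ρl/A and a "capacitance" (compliance) C = lA/(ρc²). This is textbook acoustics, not the exact element set of [3], which also includes loss terms.

```python
RHO = 1.14e-3    # air density in g/cm^3 (warm, humid air)
C_SOUND = 3.5e4  # speed of sound in cm/s

def lumped_elements(length_cm, area_cm2):
    """Lossless lumped elements of one tube section (acoustic-electric analogy).

    L: acoustic mass ("inductance"), C: acoustic compliance ("capacitance").
    """
    L = RHO * length_cm / area_cm2
    C = length_cm * area_cm2 / (RHO * C_SOUND ** 2)
    return L, C

# Made-up section: 0.5 cm long, 3 cm^2 cross-section
print(lumped_elements(0.5, 3.0))
```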

Articulatory synthesis: Gestural score (1)
- The gestural score is the input of the synthesizer
- It drives the generation of parameters for the vocal tract model
- Utterances are represented by patterns of articulatory gestures
- Gestures are goal-oriented articulatory movements (what has to be done by the vocal tract, but not how)
- There are six types of gestures
- How to obtain it: transformation from a defined XML format for songs

Articulatory synthesis: Gestural score (2)
- Example: [mu:zi:k]
- Only one configuration for the group {[b], [p], [m]}
- Consonant and vowel intervals overlap → coarticulation
- The two lowest rows are examples of target functions for vocal parameters
- These are called motor commands
- They are realized by third-order dynamical systems
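
To give a feel for how a motor command can be realized by a third-order dynamical system, the sketch below runs a step-wise target through three cascaded, critically damped first-order stages, which is one simple way to obtain a smooth third-order target approximation. The time constant, step size, and targets are invented; [3] and [4] use their own formulation.

```python
import numpy as np

def third_order_response(targets, dt=0.001, tau=0.015):
    """Track a piecewise-constant target with three cascaded first-order stages.

    Each stage is y' = (x - y) / tau, integrated with forward Euler; chaining
    three of them gives a smooth, third-order low-pass response to the target.
    dt and tau are in seconds (made-up values).
    """
    y1 = y2 = y3 = targets[0]
    out = np.empty_like(targets, dtype=float)
    for n, x in enumerate(targets):
        y1 += dt * (x - y1) / tau
        y2 += dt * (y1 - y2) / tau
        y3 += dt * (y2 - y3) / tau
        out[n] = y3
    return out

# Motor command: hold 0.0 for 50 ms, then jump to a new target of 1.0 for 150 ms
command = np.concatenate([np.zeros(50), np.ones(150)])
print(third_order_response(command)[::25])
```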

Articulatory synthesis: Pitch-dependent vocal tract target shapes
- The vocal tract shape for the same vowel depends on pitch
- Vowels at higher pitches are sung more open
- Tuning of the vocal tract formants is necessary: the first formant should match the first harmonic of the voice source
- Two extreme shapes are defined, for 110 Hz and 440 Hz
- Linear interpolation is used in between
- The low-pitch shape is the one used for speech synthesis
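
A minimal sketch of that interpolation: given the 23-dimensional vocal tract parameter vectors for the 110 Hz and 440 Hz extreme shapes, blend linearly with pitch and clamp outside the range. The two example vectors are random placeholders, not real model parameters.

```python
import numpy as np

def pitch_dependent_shape(f0, shape_low, shape_high, f_low=110.0, f_high=440.0):
    """Linearly interpolate vocal tract parameters between two extreme shapes.

    shape_low is used at/below f_low (and for speech), shape_high at/above f_high.
    """
    w = np.clip((f0 - f_low) / (f_high - f_low), 0.0, 1.0)
    return (1.0 - w) * shape_low + w * shape_high

# Placeholder 23-dimensional parameter vectors for the two extreme shapes of a vowel
rng = np.random.default_rng(0)
shape_110 = rng.uniform(-1.0, 1.0, size=23)
shape_440 = shape_110 + rng.uniform(0.0, 0.5, size=23)

print(pitch_dependent_shape(261.6, shape_110, shape_440)[:5])  # C4
```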

Conclusion
- The challenge of synthesizing singing: giving a speech synthesizer the ability to sing
- Two very different approaches:
  - HMM-based synthesis
  - Articulatory synthesis

Singing samples
- HMM-based singing synthesis
- Synthesis of Singing Challenge 2007, Belgium

References
[1] Keijiro Saino, Heiga Zen, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda: An HMM-based Singing Voice Synthesis System, 2006
[2] Takayoshi Yoshimura, Keiichi Tokuda, Takashi Masuko, Takao Kobayashi, Tadashi Kitamura: Simultaneous Modeling of Spectrum, Pitch and Duration in HMM-based Speech Synthesis, 1999
[3] Peter Birkholz: Articulatory Synthesis of Singing, 2007
[4] Peter Birkholz, Ingmar Steiner, Stefan Breuer: Control Concepts for Articulatory Speech Synthesis, 2007

Thank you for your attention!