Pitch-Synchronous Spectrogram: Principles and Applications

C. Julian Chen, Department of Applied Physics and Applied Mathematics. May 24, 2018

Outline
The traditional spectrogram
Observations with the electroglottograph (EGG)
Process of human voice production
Pitch-synchronous segmentation of voice signals
Pitch-synchronous spectrogram
Display of timbre spectrum within each period
Display of power evolution within each period
Free evaluation version and full versions

The traditional spectrogram
The graph is always a mixture of pitch and timbre. (A) With a wide window, the overtones of the fundamental frequency dominate. (B) With a narrow window, a mixture of formant peaks and details within each pitch period dominates.

Display of timbre spectrum
The curve is always a mixture of pitch and timbre. It is very difficult to decipher formant frequencies and peak profiles.

The Source-Filter Theory
Source: Fourier transform of the glottal airflow waveform, -12 dB per octave. Filter: an all-pole transfer function. Radiation factor: +6 dB per octave, which is against the law of energy conservation.

Pitch-Asynchronous Speech Parameterization (1)
The speech signal is blocked into overlapping frames with a fixed window size (25 msec) and a fixed shift (10 msec), and then multiplied by a processing window, typically a Hamming window. The windows often cross phoneme boundaries. Timbre and pitch cannot be separated.
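The fixed-frame blocking described above can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the code of any particular speech toolkit. The function names, the 16 kHz sampling rate, and the 125 Hz test tone are assumptions made for the example.

```python
import math

def hamming(n):
    """Hamming window of length n (n >= 2)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def frame_signal(signal, sample_rate, frame_ms=25.0, shift_ms=10.0):
    """Split a signal into overlapping, Hamming-windowed frames of fixed size."""
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    shift = int(round(sample_rate * shift_ms / 1000.0))
    window = hamming(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, shift):
        chunk = signal[start:start + frame_len]
        frames.append([s * w for s, w in zip(chunk, window)])
    return frames

# Hypothetical test signal: 1 second of a 125 Hz sine sampled at 16 kHz.
sample_rate = 16000
pitch_hz = 125.0
signal = [math.sin(2 * math.pi * pitch_hz * n / sample_rate)
          for n in range(sample_rate)]

frames = frame_signal(signal, sample_rate)

# Number of pitch periods covered by one 25 msec frame.
periods_per_frame = 0.025 * pitch_hz
```

At a pitch of 125 Hz each frame covers a little more than three pitch periods, and the frame boundaries fall wherever the fixed 10 msec shift puts them, with no regard to where a pitch period starts.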

Pitch-Asynchronous Speech Parameterization (2)
Using an all-pole filter model from LPC analysis, the formants of speech signals can be extracted. But the process is not convergent.

Anatomy of voice-production organs

Observation of Speech Signals (1): Vowel [a], King-TTS-012, 050007, 2.23-2.28 sec.

Observation of Speech Signals (2): Vowel [i], King-TTS-012, 004419, 1.938-1.968 sec.

Observation of Speech Signals (3): Vowel [u], King-TTS-012, 005044, 1.06-1.11 sec.

Observation of Speech Signals (4): Vowel [e], King-TTS-012, 050053, 2.535-2.585 sec.

Observation of Speech Signals (5): Vowel [o], King-TTS-012, 051022, 1.827-1.877 sec.

The Electroglottograph (EGG)
A non-invasive instrument that detects the change of electric conductance between the two vocal cords, and thus monitors the opening and closing of the glottis (circa 1956).

What Does the Correlation of EGG Signals and Voice Signals Tell Us?
A voice waveform is triggered by a glottal closing, starting with an impulse. The acoustic wave is strong in the closed phase and weak in the open phase. (Fig. 5.6, Resonance in Singing, D. G. Miller).

The Handclap Analogy (Robert Sataloff)
Sound is actually produced by the closing of the vocal folds, in a manner similar to the sound generated by hand clapping. (T)he more frequent they open and close, the higher the pitch. (Sataloff).

The Water-Hammer Analogy (Ronald Baken)
The sharp cutoff of flow is particularly crucial, because it is this relatively sudden stoppage of the air flow that is the raw material of voice. An impulse-like shock wave is produced that excites air molecules in the vocal tract. (R. Baken)

Principle of Superposition (Peter Ladefoged)
The voice signal is a superposition of elementary decaying waves, each starting at a glottal closing event. Pitch is the repetition rate of glottal closing. (Ladefoged)
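The superposition principle can be illustrated with a toy synthesis: one decaying sinusoid is started at every glottal closing instant and all copies are summed. This is only a sketch. The single 700 Hz resonance, the decay rate, and the function names are assumptions chosen for the example, not measured values.

```python
import math

def elementary_wave(n_samples, sample_rate, freq_hz, decay_per_s):
    """One exponentially decaying sinusoid, starting at time zero."""
    return [math.exp(-decay_per_s * n / sample_rate)
            * math.sin(2 * math.pi * freq_hz * n / sample_rate)
            for n in range(n_samples)]

def superpose(closing_samples, wave, total_len):
    """Add one copy of `wave` starting at every glottal closing instant."""
    out = [0.0] * total_len
    for start in closing_samples:
        for k, value in enumerate(wave):
            if start + k >= total_len:
                break
            out[start + k] += value
    return out

sample_rate = 16000
period = 128                                  # samples between closings
closings = list(range(0, 1280, period))       # hypothetical closing instants
wave = elementary_wave(512, sample_rate, freq_hz=700.0, decay_per_s=600.0)
voice = superpose(closings, wave, 1280)

# Pitch is the repetition rate of the closings, independent of `wave`.
pitch_hz = sample_rate / period
```

Changing `period` changes the pitch without touching the elementary wave, and changing `freq_hz` or `decay_per_s` changes the waveform within each period without touching the pitch.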

What Is the Timbre Spectrum?
As the glottis closes, the air moving in the vocal tract at that moment maintains its momentum. The kinetic energy of the moving air in the vocal tract is converted into acoustic energy. The impulse resonates in the vocal tract. The decaying elementary wave in each pitch period is determined by the geometry of the vocal tract, and thus represents the instantaneous timbre. An accurate timbre spectrum must be computed from the waveform in each pitch period.

Process Within Each Pitch Period
A glottal closing starts a pitch period. The acoustic wave decays exponentially during the closed phase. A glottal opening connects the vocal tract with the lungs and thus accelerates the power decay. A glottal opening also generates random noise. The excitation at a glottal opening is mostly weaker than that at a glottal closing.
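One simple way to look at the decay inside a pitch period is to measure the mean power of short consecutive segments. The sketch below models the closed phase only, as a single exponentially decaying 1 kHz sinusoid. The decay rate, segment length, and function name are assumptions for illustration.

```python
import math

def power_decay_db(period, segment_len):
    """Mean power (in dB) of consecutive segments within one pitch period."""
    levels = []
    for start in range(0, len(period) - segment_len + 1, segment_len):
        segment = period[start:start + segment_len]
        mean_square = sum(x * x for x in segment) / segment_len
        levels.append(10.0 * math.log10(mean_square))
    return levels

sample_rate = 16000
decay_per_s = 600.0      # hypothetical amplitude decay rate
period = [math.exp(-decay_per_s * n / sample_rate)
          * math.sin(2 * math.pi * 1000.0 * n / sample_rate)
          for n in range(128)]

# 16 samples = 1 msec = exactly one cycle of the 1 kHz resonance.
levels = power_decay_db(period, segment_len=16)
drop_per_segment = levels[0] - levels[1]
```

For a pure exponential decay the level falls by the same number of decibels in every segment. In this toy model, a faster fall in the later segments of a real pitch period would be the signature of the accelerated decay after the glottal opening.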

Pitch-Synchronous Segmentation Using EGG
The sharp peaks in the EGG derivative occur about 1 msec before the starting impulse, which is in the weakly varying section of a pitch period, suitable as segmentation points.
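Locating the peaks of the EGG derivative can be sketched with a simple threshold-and-local-maximum search. This is not the author's implementation. The synthetic EGG waveform, the polarity convention (larger value means more vocal-fold contact), the threshold ratio, and the minimum gap are all assumptions made for the example.

```python
import math

def synthetic_egg(n_samples, period=128, rise=7):
    """Toy EGG: fast rise in contact at closing, slow linear fall afterwards."""
    egg = []
    for n in range(n_samples):
        p = n % period
        if p < rise:
            egg.append(0.5 * (1.0 - math.cos(math.pi * p / rise)))
        else:
            egg.append(1.0 - (p - rise) / (period - rise))
    return egg

def degg_peaks(egg, threshold_ratio=0.5, min_gap=40):
    """Indices of the sharp positive peaks of the first difference of the EGG."""
    d = [egg[i + 1] - egg[i] for i in range(len(egg) - 1)]
    threshold = threshold_ratio * max(d)
    peaks = []
    for i in range(1, len(d) - 1):
        is_local_max = d[i] > d[i - 1] and d[i] >= d[i + 1]
        if d[i] >= threshold and is_local_max:
            if not peaks or i - peaks[-1] >= min_gap:
                peaks.append(i)
    return peaks

egg = synthetic_egg(640)
marks = degg_peaks(egg)                          # segmentation points
periods = [b - a for a, b in zip(marks, marks[1:])]
```

The resulting `marks` can be used directly as segmentation points, and the differences between consecutive marks give the pitch period in samples.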

Pitch-Synchronous Segmentation from Voice
By multiplying the voice signal with an asymmetric window, an excitation profile function is generated. The peaks of the excitation profile function generate pitch marks.

Ends-meeting procedure to make waveform cyclic
After an ends-meeting procedure, the waveform of each pitch period becomes a sample of a smooth periodic function.
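The slides do not spell out the ends-meeting procedure, so the following is only a stand-in that shows the idea of making one period cyclic. It subtracts a linear ramp so that the sample following the last one would equal the first. The function name and the ramp-removal approach are assumptions, not the author's method.

```python
def make_ends_meet(period, next_sample):
    """Remove a linear ramp so the period wraps around without a jump.

    period:      samples of one pitch period
    next_sample: first sample of the following period
    """
    n = len(period)
    gap = next_sample - period[0]      # mismatch accumulated over the period
    return [x - gap * i / n for i, x in enumerate(period)]

# Hypothetical example: a pure ramp, the worst case for wrap-around.
period = [0.0, 1.0, 2.0, 3.0]
cyclic = make_ends_meet(period, next_sample=4.0)
```

Without such a step, the jump between the last and the first sample of a period would show up in the spectrum of that period as spurious broadband energy.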

Example of a pitch-synchronous spectrogram
For voiced sections, the vertical lines represent glottal closing instants. In each pitch period, the amplitude timbre spectrum is displayed. Unvoiced sections have no glottal closings.
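A pitch-synchronous spectrogram can be sketched as one amplitude spectrum per pitch period, with the period boundaries taken from the segmentation marks. The version below uses a plain discrete Fourier transform over each period. The function names, the 500 Hz test tone, and the evenly spaced marks are assumptions for the example, and the actual program may compute its timbre spectrum differently.

```python
import cmath
import math

def amplitude_spectrum(period):
    """Amplitude of DFT bins 0..N/2 for one pitch period, scaled by 1/N."""
    n = len(period)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(period))
        spectrum.append(abs(acc) / n)
    return spectrum

def pitch_synchronous_spectrogram(signal, marks, sample_rate):
    """One spectrum per pitch period delimited by consecutive marks."""
    columns = []
    for start, end in zip(marks, marks[1:]):
        period = signal[start:end]
        columns.append({
            "time_s": start / sample_rate,
            "bin_hz": sample_rate / len(period),   # bin spacing for this period
            "amplitude": amplitude_spectrum(period),
        })
    return columns

sample_rate = 16000
signal = [math.sin(2 * math.pi * 500.0 * n / sample_rate) for n in range(512)]
marks = [0, 128, 256, 384, 512]        # hypothetical segmentation points
columns = pitch_synchronous_spectrogram(signal, marks, sample_rate)

first = columns[0]
peak_bin = max(range(len(first["amplitude"])),
               key=first["amplitude"].__getitem__)
```

Because each transform spans exactly one period, the spacing of the frequency bins equals the pitch of that period, and each spectrogram column has its own time position and width.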

Display of Timbre Spectrum and Power Decay
By left-clicking the spectrogram at a pitch period, its timbre spectrum is displayed. By right-clicking at a pitch period, a graph of power decay in that pitch period is displayed.

Examples: Timbre spectra of some vowels

Examples: Consistency of Timbre Spectra
Six examples of timbre spectra of vowel [i]. All show a strong peak at about 300 Hz and a group of peaks around 2-4 kHz.

Examples: Timbre spectra of some consonants

Examples: Power decay in a single pitch period

A Free Evaluation Version
Includes pitch-synchronous segmentation of voice signals, spectrogram generation, timbre spectrum generation, and power decay computation.
Only works on Mac OS.
Requires an installation of Tcl/Tk.
Partially open-source: the C++ program is compiled, the Tcl/Tk source code is open.
Includes two sets of standard speech data: the CMU ARCTIC databases for US English speakers, male speaker bdl and female speaker slt.
Manually corrected phoneme label files for the two sets of speech data are also included.

Input panel of the evaluation version
The entire package is in a single directory, PSS. In that directory, type
IMAC: PSS username$ wish pss.tcl <enter>
An input panel appears:

References
1. D. G. Miller, Resonance in Singing, Inside View Press, 2008.
2. R. T. Sataloff, The Human Voice, Scientific American, December 1992, Vol. 108.
3. R. J. Baken, Electroglottography, Journal of Voice, Vol. 6, pp. 98-110 (1992).
4. R. J. Baken, An Overview of Laryngeal Function for Voice Production, in Professional Voice, Third Edition, edited by R. T. Sataloff, Plural Publishing, Vol. 1, pp. 237-256 (2005).
5. P. Ladefoged, Elements of Acoustic Phonetics, University of Chicago Press, 1966.
6. C. J. Chen, Elements of Human Voice, World Scientific Publishing, 2016.