Topic 4. Single Pitch Detection

ECE 477 - Computer Audition, Zhiyao Duan, 2017

What is pitch?

Pitch is a perceptual attribute, so it is subjective. It is only defined for (quasi-)harmonic sounds. Harmonic sounds are periodic, with period 1/F0, so pitch can be reliably matched to the fundamental frequency (F0). F0 is a physical attribute, so it is objective. In computer audition, people often do not distinguish pitch from F0.

Why is pitch detection important?

Harmonic sounds are ubiquitous: music, speech, bird song. Pitch (F0) is an important attribute of harmonic sounds, and it relates to other properties.
In music, the melody relates to key, scale (e.g., chromatic, diatonic, pentatonic), style, emotion, etc.
In speech, intonation relates to word disambiguation (for tonal languages), statement versus question, emotion, etc.
What scales are used? What emotion?

General Process of Pitch Detection

First, segment the audio into time frames, because pitch changes over time.
Then detect the pitch (if any) in each frame. This requires deciding whether the frame contains a pitch at all.
Finally, post-process the estimates to take contextual information into account, since pitch contours are often continuous.

An Example

[Figure]

How long should the frame be?

Too long: the frame contains multiple pitches (low time resolution).
Too short: detection is unreliable (low frequency resolution).
The frame should span at least about 3 periods of the signal.

[Figure: waveform, amplitude versus time (s), from 0.74 s to 0.78 s, with 3 periods marked.]

For speech or music, how long should the frame be?
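The 3-period rule of thumb can be turned into a frame length once a lowest expected F0 is chosen. The sketch below is illustrative only: the 80 Hz lower bound, the 16 kHz sample rate, the 50% hop, and the function names are assumptions, not values from the slides.

```python
import math

def min_frame_length(f0_min_hz, sample_rate_hz, n_periods=3):
    """Smallest frame length (in samples) that spans n_periods of the lowest expected F0."""
    return math.ceil(n_periods * sample_rate_hz / f0_min_hz)

def split_into_frames(x, frame_len, hop):
    """Cut x into overlapping frames; trailing samples that do not fill a frame are dropped."""
    return [x[i:i + frame_len] for i in range(0, len(x) - frame_len + 1, hop)]

# Assumed settings: 16 kHz audio, lowest expected F0 of 80 Hz, 50% overlap.
frame_len = min_frame_length(80, 16000)
frames = split_into_frames(list(range(2000)), frame_len, hop=frame_len // 2)
print(frame_len, len(frames))
```

A lower F0 bound forces a longer frame, which is the time-resolution versus frequency-resolution trade-off described above.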

Pitch-related Properties

The time-domain signal is periodic: F0 = 1/period.
Spectral peaks have harmonic relations: F0 is their greatest common divisor.
Spectral peaks are equally spaced: F0 is the frequency gap.

Pitch Detection Methods

Each property suggests a family of methods.
Time domain: the signal is periodic with F0 = 1/period, so detect the period.
Frequency domain: spectral peaks have harmonic relations and F0 is their greatest common divisor, so detect the divisor.
Cepstrum domain: spectral peaks are equally spaced and F0 is the frequency gap, so detect the gap.

Time Domain: Autocorrelation

A periodic signal correlates strongly with itself when offset by the period (and by multiples of the period).
Problem: the method is sensitive to peak amplitude changes. Which peak would be higher if the signal amplitude increases? The result can be a lower-octave error (also called a subharmonic error).
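A minimal sketch of autocorrelation-based period detection follows. The test tone, the search range, and the function names are assumptions chosen so that exactly one period falls inside the range; this is not the course's implementation.

```python
import math

def autocorrelation(x, max_lag):
    """r[tau] = sum over t of x[t] * x[t + tau], using the same window length for every lag."""
    window = len(x) - max_lag
    return [sum(x[t] * x[t + tau] for t in range(window)) for tau in range(max_lag + 1)]

def acf_pitch(x, sample_rate, f0_min, f0_max):
    """Return the F0 (Hz) whose lag gives the highest autocorrelation value in the search range."""
    tau_min = int(sample_rate / f0_max)
    tau_max = int(sample_rate / f0_min)
    r = autocorrelation(x, tau_max)
    best_lag = max(range(tau_min, tau_max + 1), key=lambda tau: r[tau])
    return sample_rate / best_lag

# Assumed test tone: a 200 Hz sine sampled at 8 kHz, so the period is 40 samples.
sample_rate = 8000
x = [math.sin(2 * math.pi * 200 * n / sample_rate) for n in range(464)]
f0 = acf_pitch(x, sample_rate, f0_min=125, f0_max=500)
print(f0)
```

Lowering `f0_min` to 100 Hz would bring lag 80 into the search range, where a perfectly periodic signal scores about as well as at lag 40. That ambiguity is one source of lower-octave errors.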

YIN Step 2

Replace the ACF with the difference function [de Cheveigne, 2002]. Look for dips instead of peaks, which is why it's called YIN as opposed to YANG. The difference function is immune to amplitude changes.
Problem: some dips close to zero lag might be deeper because of imperfect periodicity.
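As a sketch, the difference function sums the squared difference between the signal and a shifted copy of itself, so a lag equal to the period gives a value near zero. The window convention (a fixed window of `len(x) - max_lag` samples) and the test tone are assumptions made for this illustration.

```python
import math

def difference_function(x, max_lag):
    """d[tau] = sum over t of (x[t] - x[t + tau])**2; zero at tau = 0, near zero at the period."""
    window = len(x) - max_lag
    return [sum((x[t] - x[t + tau]) ** 2 for t in range(window)) for tau in range(max_lag + 1)]

# Assumed test tone: a 200 Hz sine sampled at 8 kHz (period of 40 samples).
sample_rate = 8000
x = [math.sin(2 * math.pi * 200 * n / sample_rate) for n in range(464)]
d = difference_function(x, max_lag=64)
deepest_nonzero_lag = min(range(16, 65), key=lambda tau: d[tau])
print(deepest_nonzero_lag)
```

Note that `d[0]` is always exactly zero, so a search for the deepest dip has to exclude lags near zero in some way.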

YIN Step 3

Use the cumulative mean normalized difference function. Then take the deepest dip?
Problem: this may choose higher-order dips, giving a lower-octave error (or subharmonic error).
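The normalization divides each difference value by the running mean of the values up to that lag, and sets the value at lag 0 to 1. A minimal sketch follows, written from the definition in de Cheveigne and Kawahara (2002); the function name and the toy input are assumptions.

```python
def cumulative_mean_normalized_difference(d):
    """d'[0] = 1 and d'[tau] = d[tau] / ((1 / tau) * sum of d[1..tau]) for tau >= 1."""
    out = [1.0]
    running_sum = 0.0
    for tau in range(1, len(d)):
        running_sum += d[tau]
        # Guard against an all-zero (silent) input, where the ratio is undefined.
        out.append(d[tau] * tau / running_sum if running_sum > 0 else 1.0)
    return out

# Toy difference function with a dip at lag 4 (made-up numbers for illustration).
d_prime = cumulative_mean_normalized_difference([0.0, 2.0, 4.0, 2.0, 0.0])
print(d_prime)
```

Because the function starts at 1 instead of 0, the spurious dip at zero lag disappears and small lags are no longer favored.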

YIN Step 4: Absolute Threshold

Set a threshold, say 0.1, and pick the first dip that falls below the threshold.
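The selection rule can be sketched as follows. Walking down to the bottom of the first dip and falling back to the global minimum when nothing crosses the threshold follow the description in de Cheveigne and Kawahara (2002); the function name and the toy values are assumptions.

```python
def first_dip_below(d_prime, threshold=0.1, tau_min=1):
    """Smallest lag at the bottom of a dip that goes below threshold; global minimum otherwise."""
    n = len(d_prime)
    tau = tau_min
    while tau < n:
        if d_prime[tau] < threshold:
            # Walk down to the local minimum of this dip.
            while tau + 1 < n and d_prime[tau + 1] < d_prime[tau]:
                tau += 1
            return tau
        tau += 1
    return min(range(tau_min, n), key=lambda t: d_prime[t])

# Toy normalized difference function: a dip at lag 5 and a deeper one at lag 9.
toy = [1.0, 1.0, 0.8, 0.3, 0.09, 0.05, 0.2, 0.9, 0.04, 0.02, 0.5]
print(first_dip_below(toy))
```

In the toy input the deeper dip at lag 9 is ignored in favor of the first acceptable dip at lag 5, which is how the threshold avoids lower-octave errors.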

YIN Steps 5 and 6

Step 5: use parabolic interpolation to find the exact dip location. The dip location on the discrete lag grid may deviate from the exact dip location.
Step 6: use the best local estimate. Some analysis points may be better than others (they result in a smaller d′). Use the pitch estimate from the best analysis point within the frame.
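Step 5 can be sketched by fitting a parabola through the dip sample and its two neighbors and taking the vertex. This is the standard three-point formula; the function name and the test curve are assumptions.

```python
def parabolic_vertex(y, i):
    """Refine index i of a local extremum using a parabola through y[i-1], y[i], y[i+1]."""
    if i <= 0 or i >= len(y) - 1:
        return float(i)  # no neighbors on both sides, keep the integer index
    a, b, c = y[i - 1], y[i], y[i + 1]
    curvature = a - 2 * b + c
    if curvature == 0:
        return float(i)  # the three points are collinear
    return i + 0.5 * (a - c) / curvature

# Samples of (t - 3.3)**2: the integer minimum is at index 3, the true minimum at 3.3.
y = [(t - 3.3) ** 2 for t in range(7)]
refined = parabolic_vertex(y, 3)
print(refined)
```

The refined lag is then converted to a frequency as sample rate divided by lag, which gives finer pitch resolution than integer lags allow.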

Frequency Domain Approach

Idea: for each F0 candidate, calculate the support (e.g., spectral energy) it receives from its harmonic positions.
Example: Harmonic Product Spectrum (HPS) [Schroeder, 1968; Noll, 1970].
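A minimal HPS sketch follows: for each candidate bin, multiply the spectral magnitudes at its first few harmonic positions and keep the candidate with the largest product. To stay dependency-free it uses a naive O(N^2) DFT, where an FFT library would normally be used. The test signal (five harmonics of 200 Hz with a deliberately weak fundamental), the number of harmonics, and the function names are assumptions.

```python
import cmath
import math

def magnitude_spectrum(x):
    """Magnitudes of DFT bins 0..N/2, computed with a naive DFT for illustration."""
    n = len(x)
    twiddle = [cmath.exp(-2j * math.pi * k / n) for k in range(n)]
    return [abs(sum(x[t] * twiddle[(k * t) % n] for t in range(n))) for k in range(n // 2 + 1)]

def hps_pitch(x, sample_rate, n_harmonics=3):
    """Return the F0 (Hz) of the bin whose first n_harmonics harmonic bins have the largest product."""
    mag = magnitude_spectrum(x)
    k_max = (len(mag) - 1) // n_harmonics  # highest bin whose harmonics stay in range

    def product(k):
        p = 1.0
        for h in range(1, n_harmonics + 1):
            p *= mag[h * k]
        return p

    best_bin = max(range(1, k_max + 1), key=product)
    return best_bin * sample_rate / len(x)

# Assumed test signal: harmonics of 200 Hz at 8 kHz; the 400 Hz partial is the strongest.
sample_rate = 8000
amps = [0.3, 1.0, 0.8, 0.6, 0.5]
x = [sum(a * math.sin(2 * math.pi * 200 * (h + 1) * n / sample_rate)
         for h, a in enumerate(amps)) for n in range(400)]
f0 = hps_pitch(x, sample_rate)
print(f0)
```

Picking the single largest spectral peak would report 400 Hz for this signal. The product rewards the candidate that is supported at several harmonic positions, so the weak 200 Hz fundamental is still found.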

Cepstral Domain Approach

Idea: find the frequency gap between adjacent spectral peaks. The log-amplitude spectrum looks fairly periodic, and the gap can be viewed as the period of the spectrum. How can that period be found? The cepstrum's idea: take a Fourier transform!

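Cepstrum

$$\text{power cepstrum} = \left| \mathcal{F}^{-1}\left\{ \log\left( \left| \mathcal{F}\{x(t)\} \right|^{2} \right) \right\} \right|^{2}$$

Terminology: spectrum becomes cepstrum, frequency becomes quefrency, filtering becomes liftering.

[Figure: cepstrum, with the peak at the signal period marked.]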
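The power cepstrum can be sketched directly from its definition: a forward transform, the log of the power spectrum, an inverse transform, and a squared magnitude. The peak quefrency inside the search range is taken as the period. The naive DFT (used to avoid dependencies), the small `eps` that keeps the logarithm finite, the test signal, and the function names are all assumptions.

```python
import cmath
import math

def dft(x, inverse=False):
    """Naive O(N^2) DFT or inverse DFT; an FFT would be used in practice."""
    n = len(x)
    sign = 1 if inverse else -1
    twiddle = [cmath.exp(sign * 2j * math.pi * k / n) for k in range(n)]
    out = [sum(x[t] * twiddle[(k * t) % n] for t in range(n)) for k in range(n)]
    return [v / n for v in out] if inverse else out

def power_cepstrum(x, eps=1e-10):
    """|IDFT(log(|DFT(x)|^2 + eps))|^2; eps avoids log(0) at empty bins."""
    log_power = [math.log(abs(v) ** 2 + eps) for v in dft(x)]
    return [abs(c) ** 2 for c in dft(log_power, inverse=True)]

def cepstrum_pitch(x, sample_rate, f0_min, f0_max):
    """Return sample_rate / q for the quefrency q (in samples) with the largest cepstral peak."""
    ceps = power_cepstrum(x)
    q_min = int(sample_rate / f0_max)
    q_max = int(sample_rate / f0_min)
    best_q = max(range(q_min, q_max + 1), key=lambda q: ceps[q])
    return sample_rate / best_q

# Assumed test signal: five harmonics of 200 Hz sampled at 8 kHz (period of 40 samples).
sample_rate = 8000
amps = [0.3, 1.0, 0.8, 0.6, 0.5]
x = [sum(a * math.sin(2 * math.pi * 200 * (h + 1) * n / sample_rate)
         for h, a in enumerate(amps)) for n in range(400)]
f0 = cepstrum_pitch(x, sample_rate, f0_min=125, f0_max=500)
print(f0)
```

The search range matters here as well: the cepstrum also has peaks at multiples of the period, so a range wide enough to include quefrency 80 would admit a lower-octave candidate.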

Pitched or Non-pitched?

Some frames may be silent or inharmonic, so they may not contain a pitch at all. Silence can be detected from the RMS value. How about inharmonic frames?
YIN: threshold on the dip (aperiodicity).
HPS: threshold on the peak amplitude of the product spectrum.
Cepstrum: threshold on the ratio between the amplitudes of the two highest cepstral peaks [Rabiner 1976].
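A sketch of a two-stage decision, RMS for silence and the depth of the YIN dip for aperiodicity, is given below. Both threshold values and the function names are hypothetical; suitable thresholds depend on the recording level and the material.

```python
import math

def rms(frame):
    """Root-mean-square amplitude of a frame."""
    return math.sqrt(sum(v * v for v in frame) / len(frame))

def classify_frame(frame, dip_value, rms_threshold=0.01, dip_threshold=0.1):
    """Label a frame using its RMS and the depth of its selected YIN dip (dip_value)."""
    if rms(frame) < rms_threshold:
        return "silent"
    return "pitched" if dip_value < dip_threshold else "non-pitched"

# Assumed inputs: a sine of amplitude 0.5 and an all-zero frame; dip values are made up.
tone = [0.5 * math.sin(2 * math.pi * n / 40) for n in range(400)]
print(classify_frame(tone, dip_value=0.02))
print(classify_frame([0.0] * 400, dip_value=0.02))
print(classify_frame(tone, dip_value=0.6))
```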

How to evaluate pitch detection?

Choose some recordings (speech, music).
Get the ground truth: either listen to the signal and inspect the spectrum to annotate manually (time consuming!), or annotate automatically using simultaneously recorded laryngograph signals for speech (not quite reliable!).
Measure the pitched/non-pitched classification error.
Calculate the difference between the estimated pitch and the ground truth. Threshold for speech: 10% or 20% in Hz. Threshold for music: 1 quarter-tone (about 3% in Hz).
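The two error measures can be sketched as below, using a quarter-tone (50 cents) tolerance for pitched frames. The convention that 0 Hz marks a non-pitched frame, the function names, and the example values are assumptions.

```python
import math

def cents(f_est, f_ref):
    """Signed pitch difference in cents (100 cents = 1 semitone)."""
    return 1200.0 * math.log2(f_est / f_ref)

def evaluate(est, ref, tol_cents=50.0):
    """est, ref: per-frame F0 in Hz, with 0 meaning non-pitched.

    Returns (voicing_error_rate, pitch_error_rate). The pitch error rate is computed
    only over frames that both sequences mark as pitched.
    """
    voicing_errors = sum((e > 0) != (r > 0) for e, r in zip(est, ref))
    both = [(e, r) for e, r in zip(est, ref) if e > 0 and r > 0]
    pitch_errors = sum(abs(cents(e, r)) > tol_cents for e, r in both)
    return voicing_errors / len(ref), (pitch_errors / len(both) if both else 0.0)

# Made-up example: one octave error and two voicing errors in five frames.
ref = [0.0, 196.0, 196.0, 196.0, 0.0]
est = [0.0, 197.0, 98.0, 0.0, 150.0]
voicing_err, pitch_err = evaluate(est, ref)
print(voicing_err, pitch_err)
```

A quarter-tone corresponds to a frequency ratio of 2^(1/24), roughly 1.029, which matches the "about 3% in Hz" figure above.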

Different Methods vs. Ground Truth

[Figure: pitch estimates from the different methods compared with the ground truth; frames 25 and 65 are marked.]

Frame 65: Pitched (Voiced)

The frame has clear harmonic patterns. The different methods give close results, consistent with the ground truth of 196 Hz.

[Figure: log-magnitude spectrum (dB) versus frequency (Hz), 0 to 3000 Hz.]

Frame 25: Non-pitched (Unvoiced)

The frame has no clear harmonic patterns. The different methods give inconsistent results.

[Figure: log-magnitude spectrum (dB) versus frequency (Hz), 0 to 3000 Hz.]

Pitch Detection with Noise

Can we still hear pitch when there is some background noise, say in a restaurant? Example: violin + babble noise.
Will pitch detection algorithms still work? Which domain is less sensitive to which kind of noise? How can pitch detection be improved in noisy environments?

Summary

Pitch detection is important for many tasks.
Time domain: find the period of the waveform.
Frequency domain: find the divisor of the peaks.
Cepstral domain: find the frequency gap between spectral peaks.
Pitch detection research is fairly mature for noiseless conditions. Pitch detection in noisy environments (also called robust or noise-resilient pitch detection) is an active research topic; examples are BaNa [Yang et al., 2014] and PEFAC [Gonzalez & Brookes, 2014].

References

Childers, D. G., Skinner, D. P., & Kemerait, R. C. (1977). The cepstrum: A guide to processing. In Proc. IEEE.
de Cheveigne, A., & Kawahara, H. (2002). YIN, a fundamental frequency estimator for speech and music. JASA.
Gonzalez, S., & Brookes, M. (2014). PEFAC - a pitch estimation algorithm robust to high levels of noise. TASLP.
Noll, A. M. (1970). Pitch determination of human speech by the harmonic product spectrum, the harmonic sum spectrum and a maximum likelihood estimate. In Proc. SCPC.
Rabiner, L. R., Cheng, M. J., Osenberg, A. E., & McGonegal, C. A. (1976). A comparative performance study of several pitch detection algorithms. TASSP.
Schroeder, M. R. (1968). Period histogram and product spectrum: New methods for fundamental frequency measurement. JASA.
Yang, N., Ba, H., Demirkol, I., & Heinzelman, W. (2014). A noise resilient fundamental frequency detection algorithm for speech and music. TASLP.