CSC475 Music Information Retrieval

CSC475 Music Information Retrieval Monophonic pitch extraction George Tzanetakis University of Victoria 2014 G. Tzanetakis 1 / 32

Table of Contents
1 Motivation and Terminology
2 Psychoacoustics
3 F0 estimation
4 Example Applications

Music Notation
Music notation systems typically encode information about discrete musical pitch (notes on a piano) and timing.

Terminology
The term pitch is used in different ways in the literature, which can result in some confusion.
Perceptual pitch: a perceived quality of sound that can be ordered from low to high.
Musical pitch: a discrete, finite set of perceived pitches that are played on musical instruments.
Measured pitch: a quantity calculated from a sound by an algorithm that tries to match the perceived pitch.
Monophonic: describes a piece of music in which a single sound source (instrument or voice) is playing and only one pitch is heard at any particular time instant.

Psychoacoustics
Definition: the scientific study of sound perception.
It frequently tests the limits of perception:
Frequency range (20 Hz to 20000 Hz)
Intensity (0 dB to 120 dB)
Masking
Missing fundamental (harmonics at integer multiples of a fundamental give the impression of a pitch at that fundamental even when the fundamental itself is absent)

Origins of Psychoacoustics
Pythagoras of Samos established a connection between perception (musical intervals) and physically measurable quantities (string lengths) using the monochord.

Pitch Detection
Pitch is a PERCEPTUAL attribute, correlated with but not equivalent to fundamental frequency. Simple pitch detection algorithms mostly deal with fundamental frequency estimation, but more sophisticated ones take into account knowledge about the human auditory system.
Three families of approaches: time domain, frequency domain, perceptual.

Time-domain Zero-crossings
Zero-crossings are sensitive to noise, so low-pass filtering is applied before counting them.
Figure: C4 sine [Sound]
Figure: C4 clarinet [Sound]
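As an illustration of the idea on this slide, here is a minimal sketch of zero-crossing F0 estimation. The one-pole low-pass filter, the sampling rate, the 3 kHz interfering component and the function names are all assumptions chosen for the example; the slides do not specify a particular filter.

```python
import math

def one_pole_lowpass(x, alpha=0.1):
    """Smooth x with y[n] = y[n-1] + alpha * (x[n] - y[n-1])."""
    y, state = [], 0.0
    for sample in x:
        state += alpha * (sample - state)
        y.append(state)
    return y

def zero_crossing_f0(x, sr):
    """Estimate F0 in Hz from the mean spacing of upward zero crossings."""
    crossings = []
    for n in range(1, len(x)):
        if x[n - 1] < 0.0 <= x[n]:
            # Linear interpolation gives a sub-sample crossing position.
            crossings.append(n - 1 + x[n - 1] / (x[n - 1] - x[n]))
    if len(crossings) < 2:
        return None  # not enough crossings to measure a period
    period = (crossings[-1] - crossings[0]) / (len(crossings) - 1)
    return sr / period

sr = 8000     # sampling rate in Hz (assumed)
f0 = 261.63   # C4
# C4 sine plus a weaker 3 kHz component standing in for noise (assumed test signal).
signal = [math.sin(2 * math.pi * f0 * n / sr)
          + 0.3 * math.sin(2 * math.pi * 3000 * n / sr)
          for n in range(2048)]
estimate = zero_crossing_f0(one_pole_lowpass(signal), sr)
print(round(estimate, 1))
```

The filter attenuates the high-frequency component far more than the fundamental, so each period contributes a single upward crossing. Without it, the extra component can add spurious crossings and bias the estimate upward.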

Autocorrelation
In autocorrelation the signal is delayed and multiplied with itself for different time lags $l$. The autocorrelation function has peaks at the lags at which the signal is self-similar.
Definition:
$$r_x[l] = \sum_{n=0}^{N-1} x[n]\,x[n+l], \qquad l = 0, 1, \ldots, L-1$$
Efficient computation:
$$X[f] = \mathrm{DFT}\{x[n]\}, \qquad S[f] = X[f]\,X^{*}[f], \qquad R[l] = \mathrm{DFT}^{-1}\{S[f]\}$$
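The definition above can be sketched directly in code. This example uses the plain time-domain sum rather than the DFT route, treats samples past the end of the frame as zero, and picks the largest peak inside an assumed search range of 80 to 1000 Hz. The function names, the range and the test tone are illustrative choices, not part of the slides.

```python
import math

def autocorrelation(x, num_lags):
    """r[l] = sum_n x[n] * x[n + l], treating samples past the end as zero."""
    return [sum(x[n] * x[n + l] for n in range(len(x) - l))
            for l in range(num_lags)]

def autocorr_f0(x, sr, fmin=80.0, fmax=1000.0):
    """Return sr / lag for the largest autocorrelation peak in the search range."""
    min_lag = int(sr / fmax)
    max_lag = int(sr / fmin)
    r = autocorrelation(x, max_lag + 1)
    best_lag = max(range(min_lag, max_lag + 1), key=lambda l: r[l])
    return sr / best_lag

sr = 8000
f0 = 200.0  # period of exactly 40 samples at this sampling rate
# Harmonic test tone: fundamental plus two weaker overtones.
tone = [sum(math.sin(2 * math.pi * f0 * h * n / sr) / h for h in (1, 2, 3))
        for n in range(1024)]
print(autocorr_f0(tone, sr))
```

Excluding very small lags matters because the autocorrelation is always largest at lag 0. The resolution is limited to integer lags, so frequencies whose period is not a whole number of samples are quantized unless the peak is interpolated.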

Autocorrelation examples
Figure: C4 sine
Figure: C4 clarinet note

Average Magnitude Difference Function
The average magnitude difference function (AMDF) also shifts the signal, but uses subtraction instead of multiplication, so periodicities show up as nulls rather than peaks. Avoiding multiplications makes it efficient on DSP chips and for real-time processing.
Definition:
$$\mathrm{AMDF}(m) = \sum_{n=0}^{N-1} \left| x[n] - x[n+m] \right|^{k}$$
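A minimal sketch of AMDF-based F0 estimation follows, assuming the exponent $k = 1$ by default and a fixed summation window shared by all lags. The search range, function names and test tone are assumptions made for the example.

```python
import math

def amdf(x, max_lag, k=1):
    """AMDF(m) = sum_n |x[n] - x[n + m]|**k over a window shared by all lags."""
    window = len(x) - max_lag
    return [sum(abs(x[n] - x[n + m]) ** k for n in range(window))
            for m in range(max_lag + 1)]

def amdf_f0(x, sr, fmin=120.0, fmax=1000.0, k=1):
    """Return sr / lag for the deepest AMDF null in the search range."""
    min_lag = int(sr / fmax)
    max_lag = int(sr / fmin)
    d = amdf(x, max_lag, k)
    best_lag = min(range(min_lag, max_lag + 1), key=lambda m: d[m])
    return sr / best_lag

sr = 8000
f0 = 200.0  # period of exactly 40 samples
tone = [sum(math.sin(2 * math.pi * f0 * h * n / sr) / h for h in (1, 2, 3))
        for n in range(1024)]
print(amdf_f0(tone, sr))
```

The nulls repeat at every multiple of the period, so the search range here is deliberately kept below two periods of the test tone. A more robust detector would prefer the smallest lag whose value falls under a threshold instead of the global minimum.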

AMDF Examples
Figure: C4 sine
Figure: C4 clarinet note

Frequency Domain Pitch Detection
Figure: C4 sine
Figure: C4 clarinet note
The fundamental frequency (as well as the pitch) corresponds to a peak in the spectrum, though not necessarily the highest one.

Plotting over time
Figure: spectrogram
Figure: correlogram [Sound]

Modern pitch detection
Modern pitch detection algorithms are based on the basic approaches presented so far, with various enhancements and extra steps that make them more effective for the signals of interest. Open-source and free implementations are available.
YIN: named after the yin and yang of oriental philosophy, alluding to the interplay between autocorrelation and cancellation.
SWIPE: a sawtooth waveform inspired pitch estimator based on matching spectra.

Pitch Perception
Pitch is not just fundamental frequency.
Periodicity, harmonicity, or both?
How can perceived pitch be measured? A common approach is to have a listener adjust a sine wave until its pitch matches.
In 1924 Fletcher observed that one can still hear a pitch when playing harmonic partials that are missing the fundamental frequency (e.g., bass notes heard through a small radio).
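The missing-fundamental observation can be reproduced numerically. The sketch below builds a tone from harmonics 2, 3 and 4 of 200 Hz only, then compares the spectral magnitude at 200 Hz with the periodicity found by autocorrelation. The signal parameters and helper names are assumptions for illustration; the example says something about signal periodicity, not about how the auditory system computes pitch.

```python
import cmath
import math

sr = 8000
f0 = 200.0
n_samples = 400  # exactly 10 periods of f0

# Harmonics 2, 3 and 4 only: there is no component at f0 itself.
x = [sum(math.sin(2 * math.pi * f0 * h * n / sr) for h in (2, 3, 4))
     for n in range(n_samples)]

def dft_magnitude(x, freq, sr):
    """Magnitude of the DFT of x evaluated at a single frequency in Hz."""
    return abs(sum(s * cmath.exp(-2j * math.pi * freq * n / sr)
                   for n, s in enumerate(x)))

def strongest_lag(x, min_lag, max_lag):
    """Lag with the largest autocorrelation value in [min_lag, max_lag]."""
    def r(l):
        return sum(x[n] * x[n + l] for n in range(len(x) - l))
    return max(range(min_lag, max_lag + 1), key=r)

mag_at_f0 = dft_magnitude(x, f0, sr)             # essentially zero
mag_at_2f0 = dft_magnitude(x, 2 * f0, sr)        # a real partial
period_estimate = sr / strongest_lag(x, 8, 100)  # periodicity of the whole signal
print(mag_at_f0 < 1e-6, mag_at_2f0 > 100, period_estimate)
```

A detector that picks the lowest or strongest spectral peak would report 400 Hz or higher for this signal, while the waveform still repeats every 1/200 of a second.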

Duplex theory of pitch perception
Proposed by J.C.R. Licklider in 1951 (also a real visionary regarding the future of computers).
One perception but two overlapping mechanisms:
Counting cycles of a period (< 800 Hz)
Place of excitation along the basilar membrane (> 1600 Hz)

The human auditory system
Incoming sound generates a wave in the fluid-filled cochlea, causing the basilar membrane to be displaced (15000 inner hair cells). Originally it was thought that the cochlea acted as a frequency analyzer similar to the Fourier transform, and that the perceived pitch was based on the place of highest excitation. Evidence from both perception and biophysics showed that pitch perception cannot be explained solely by the place theory.

Auditory Models
From "On the importance of time: a temporal representation of sound" by Malcolm Slaney and R. F. Lyon.

Perceptual Pitch Scales
Attempt to quantify the perception of frequency.
Typically obtained through just noticeable difference (JND) experiments using sine waves.
All agree that perception is linear in frequency below a certain breakpoint and logarithmic above it, but disagree on what that breakpoint is (popular choices include 1000, 700, 625 and 228 Hz).
Examples: Mel, Bark, ERB.
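To make the breakpoint idea concrete, here is a sketch of three commonly quoted formulas, each of the form $A \log(1 + f / f_b)$ with a different breakpoint $f_b$. These formulas are not given in the slides; they are widely used variants from the literature, added here on the assumption that they correspond to the 700, 1000 and 228 Hz breakpoints mentioned above.

```python
import math

def hz_to_mel(f):
    """Mel scale with a 700 Hz breakpoint (O'Shaughnessy's formula)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def hz_to_mel_1000(f):
    """Mel variant with a 1000 Hz breakpoint (attributed to Fant)."""
    return 1000.0 * math.log2(1.0 + f / 1000.0)

def hz_to_erb_rate(f):
    """ERB-rate scale (Glasberg and Moore); 1 / 0.00437 is about 228.8 Hz."""
    return 21.4 * math.log10(1.0 + 0.00437 * f)

for f in (100, 1000, 4000, 8000):
    print(f, round(hz_to_mel(f)), round(hz_to_mel_1000(f)), round(hz_to_erb_rate(f), 1))
```

Well below the breakpoint, doubling the frequency roughly doubles the scale value; well above it, each doubling adds a roughly constant amount, which is the linear-then-logarithmic behaviour described on the slide.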

Musical Pitch
In many styles of music a finite set of discrete frequencies is used rather than the whole frequency continuum. The fundamental unit that is subdivided is the octave (a ratio of 2 in frequency).
Tuning systems subdivide the octave logarithmically into distinct intervals.
There is a tension between harmonic ratios for consonant intervals, the desire to modulate to different keys, regularity, and the presence of pure fifths (a ratio of 1.5, or 3:2).

Tuning systems
Just intonation uses integer ratios that make intervals sound more consonant: $\frac{1}{1}, \frac{9}{8}, \frac{5}{4}, \frac{4}{3}, \frac{3}{2}, \frac{5}{3}, \frac{15}{8}, \frac{2}{1}$.
Pythagorean tuning derives all notes from perfect fifths $\frac{3}{2}$ ($\frac{1}{1}, \frac{256}{243}, \frac{9}{8}, \ldots$). A Pythagorean comma (about $\frac{1}{4}$ of a semitone) is required to get to a correct octave $\frac{2}{1}$.
Equal temperament is what is used today. All notes are spaced by logarithmically equal distances: to go up one step, multiply the current frequency by $\sqrt[12]{2} \approx 1.0594$.
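The arithmetic behind these three systems can be checked with exact fractions. This sketch measures intervals in cents (1200 per octave, a unit not used on the slide but standard for comparing tunings) and derives the Pythagorean comma as twelve pure fifths against seven octaves. Variable names are illustrative.

```python
import math
from fractions import Fraction

def cents(ratio):
    """Size of a frequency ratio in cents (100 cents = 1 equal-tempered semitone)."""
    return 1200.0 * math.log2(float(ratio))

just_major_scale = [Fraction(1, 1), Fraction(9, 8), Fraction(5, 4), Fraction(4, 3),
                    Fraction(3, 2), Fraction(5, 3), Fraction(15, 8), Fraction(2, 1)]

# Twelve pure fifths overshoot seven octaves by the Pythagorean comma.
pythagorean_comma = Fraction(3, 2) ** 12 / Fraction(2, 1) ** 7

semitone = 2 ** (1 / 12)  # equal-tempered step, about 1.0594

for ratio in just_major_scale:
    nearest = round(cents(ratio) / 100)  # nearest equal-tempered step
    print(ratio, round(cents(ratio), 1), round(cents(ratio) - 100 * nearest, 1))
print(pythagorean_comma, round(cents(pythagorean_comma), 1))
```

The last column printed for each just interval is its deviation from the nearest equal-tempered step, which makes the compromise of equal temperament visible: the fifth is nearly pure, while the major third is noticeably wide.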

Notation
The 12 notes of each octave are mapped to the white and black keys of a piano keyboard. The white keys are named using letters (A, B, C, D, E, F, G) or syllables (Do, Re, Mi, Fa, Sol, La, Ti), and the black keys are referenced using modifiers (sharp # or flat b). For example, the black key to the right of a C can be referenced as either C# or Db.

MIDI
In order to associate each note with an actual frequency, a reference tuning must be provided for one note. Today the common choice is A4 at 440 Hz. MIDI (Musical Instrument Digital Interface), a digital format for storing pitch and timing information, stores each note as an integer between 0 and 127. Converting from frequency $f$ to MIDI note number $m$ can be done as follows:
$$m = 69 + 12 \log_2(f/440)$$
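A short sketch of the conversion in both directions follows. The note-naming helper assumes sharps only and the common convention that MIDI note 60 is C4; both the helper and that convention are additions for illustration.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def freq_to_midi(f, a4=440.0):
    """Fractional MIDI note number for a frequency in Hz."""
    return 69.0 + 12.0 * math.log2(f / a4)

def midi_to_freq(m, a4=440.0):
    """Frequency in Hz of a MIDI note number under equal temperament."""
    return a4 * 2.0 ** ((m - 69) / 12.0)

def midi_to_name(m):
    """Name a MIDI note using sharps and the convention that note 60 is C4."""
    m = int(round(m))
    return f"{NOTE_NAMES[m % 12]}{m // 12 - 1}"

for f in (261.63, 440.0, 880.0):
    m = freq_to_midi(f)
    print(f, round(m, 2), midi_to_name(m))
```

Rounding the fractional note number snaps a measured frequency to the nearest equal-tempered note, and the fractional remainder times 100 gives the deviation in cents.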

Pitch Helix
Pitch perception has two dimensions:
Height: naturally organizes pitches from low to high.
Chroma: represents the inherent circularity of pitch (octaves).
Linear pitch (i.e., log(frequency)) can be wrapped around a cylinder to model octave equivalence.
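One way to picture the helix is to map each MIDI note number to a 3D point, with chroma as the angle around the cylinder and height along its axis. The parametrization below (unit radius, one turn per octave) is an assumption chosen for simplicity.

```python
import math

def helix_point(midi_note, radius=1.0):
    """Map a MIDI note to (x, y, z): chroma sets the angle, height sets z."""
    chroma = midi_note % 12                # position within the octave
    angle = 2.0 * math.pi * chroma / 12.0  # one full turn per octave
    height = midi_note / 12.0              # rises by 1 for every octave
    return (radius * math.cos(angle), radius * math.sin(angle), height)

c4 = helix_point(60)
c5 = helix_point(72)
g4 = helix_point(67)
print(c4, c5, g4)
```

Notes an octave apart share the same (x, y) position and differ only in z, so distance on the cylinder's surface reflects both chroma similarity and height difference.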

From frequency to musical pitch
Sketch of a simple pitch detection algorithm:
Perform the FFT on a short segment of audio, typically around 10-20 milliseconds.
Select the bin with the highest peak.
Convert the bin index $k$ to a frequency $f$ in Hertz: $f = k \cdot (S_r / N)$, where $S_r$ is the sampling rate and $N$ is the FFT size.
Map the value in Hertz to a MIDI note number: $m = 69 + 12 \log_2(f/440)$.
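The steps above can be sketched as follows. To stay self-contained the example uses a naive $O(N^2)$ DFT in place of an FFT routine, which gives the same bin magnitudes. The sampling rate, frame size and function names are assumptions.

```python
import cmath
import math

def dft_magnitudes(x):
    """Magnitudes of DFT bins 0 .. N/2 - 1 (naive DFT, adequate for a sketch)."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n // 2)]

def detect_midi_note(frame, sr):
    """Return (MIDI note, bin frequency in Hz) for the strongest non-DC bin."""
    mags = dft_magnitudes(frame)
    k = max(range(1, len(mags)), key=lambda i: mags[i])  # skip the DC bin
    f = k * sr / len(frame)                              # f = k * (Sr / N)
    return round(69 + 12 * math.log2(f / 440.0)), f

sr = 16000
n_fft = 256  # 16 ms at 16 kHz
frame = [math.sin(2 * math.pi * 440.0 * t / sr) for t in range(n_fft)]
note, freq = detect_midi_note(frame, sr)
print(note, freq)
```

With these settings the bin spacing is 62.5 Hz, which is wider than a semitone for low notes, so this sketch only resolves pitch reliably in the upper range. Longer frames, zero-padding or peak interpolation are the usual remedies, and picking the highest peak will fail whenever an overtone is stronger than the fundamental.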

Query by Humming (QBH)
A user sings a melody [Musart QBH examples].
The computer searches a database of reference tracks for a track that contains the melody.
Monophonic pitch extraction is the first step.
Many more challenges: difficult queries, variations, tempo changes, partial matches, efficient indexing.
Commercial implementation: Midomi/SoundHound.
Academic search for classical music: Musipedia.

Chant analysis
Computational ethnomusicology.
Transition from oral to written transmission.
Study how diverse recitation traditions having their origin in primarily non-notated melodies later became codified.
Cantillion: joint work with Daniel Biro [Link].

Summary
There are many fundamental frequency estimation (sometimes also called pitch detection) algorithms.
It is important to distinguish between fundamental frequency, measured pitch and perceived pitch.
F0 estimation algorithms can roughly be categorized as time-domain, frequency-domain and perceptual.
Query-by-humming requires a monophonic pitch extraction step.
Chant analysis is another, more academic application.