Topic 10. Multi-pitch Analysis

Topic 10 Multi-pitch Analysis

What is pitch?
Common elements of music are pitch, rhythm, dynamics, and the sonic qualities of timbre and texture. Pitch is "an auditory perceptual attribute in terms of which sounds may be ordered from low to high" (Wikipedia). For a (quasi-)harmonic sound, e.g. a flute note, it is well defined by the fundamental frequency (F0). [Audio examples: Oboe C4, Oboe G4, Clarinet C4.] A mixture of (quasi-)harmonic sounds has multiple pitches (F0s).

Multi-pitch Analysis of Polyphonic Music
Given polyphonic music played by several harmonic instruments, estimate a pitch trajectory for each instrument.

Why is it important?
A fundamental problem in computer audition for harmonic sounds, with many potential applications: automatic music transcription, harmonic source separation, melody-based music search, chord recognition, and music education.

How difficult is it? Let's do a test with two chords.
Q1: How many pitches are there? (Chord 1: 2; Chord 2: 3)
Q2: What are their pitches? (Chord 1: C4/G4; Chord 2: C4/F4/A4)
Q3: Can you find a pitch in Chord 1 and a pitch in Chord 2 that are played by the same instrument? (Chord 1: Clarinet G4, Horn C4; Chord 2: Clarinet A4, Viola F4, Horn C4)

We humans are amazing! "In Rome, he (14 years old) heard Gregorio Allegri's Miserere once in performance in the Sistine Chapel. He wrote it out entirely from memory, only returning to correct minor errors." -- Gutman, Robert (2000), Mozart: A Cultural Biography. Can we make computers compete with Mozart?

Our Task
[Figure: spectrogram and ground-truth pitch trajectories.]

Subtasks in Multi-pitch Analysis
Three levels, following MIREX 2007-2015:
Level 1: Multi-pitch Estimation (MPE): estimate pitches and polyphony in each time frame.
Level 2: Note Tracking: track pitches within a note.
Level 3: Streaming (timbre tracking): estimate a pitch trajectory for each source (instrument) across multiple notes.

State of the Art
Level 1: Multi-pitch Estimation: Klapuri 03, Goto 04, Davy 06, Klapuri 06, Yeh 05, Emiya 07, Pertusa 08, Duan 10, etc.
Level 2: Note Tracking: Ryynanen 05, Kameoka 07, Poliner 07, Lagrange 07, Chang 08, Benetos 11, Cogliati 16, Ewert 17, etc.
Level 3: Streaming (timbre tracking): Vincent 06, Bay 12, Duan 14.

Level 1: Multi-pitch Estimation Estimate pitches in each single frame

Multi-pitch Estimation (MPE): why is it difficult?
[Figure: spectra of a clarinet C4 note and a C4 major chord.]
Overlapping harmonics: C4 (46.7%), E4 (33.3%), G4 (60%).
How to associate the 28 significant peaks to sources?
Instantaneous polyphony estimation.
Large hypothesis space.

Two Methods at Level 1
Iterative spectral subtraction [Klapuri, 2003]
Probabilistic modeling of peaks and the non-peak region [Duan et al., 2010]

Iterative Spectral Subtraction [Klapuri, 2003]

Bandwise F0 Estimation
[Figure: magnitude spectrum in band b vs. the original (noise-reduced) magnitude spectrum.]

Bandwise F0 Estimation
[Equation: the weight of F0 hypothesis n in band b is computed over its partials, with terms for the number of partials, a normalization factor, and a frequency offset.]

Integrate Weights Across Subbands
[Figure: bandwise weights for a 65 Hz piano note and a 470 Hz piano note.]
Inharmonicity of the higher harmonics should be considered.

Spectral Subtraction
Given the estimated predominant F0, we can locate all of its harmonics and subtract their energy from the mixture spectrum. How much energy should we subtract? All of it? Some harmonics are overlapped by harmonics of other F0s, so their observed energy is larger than the energy actually contributed by the predominant F0.

Spectral Smoothness
[Figure: illustrating the spectral-smoothness principle used to decide how much energy to subtract for each harmonic.]

Polyphony Estimation
That is, when do we stop the iterations? Stop if the energy of the harmonics of the newly estimated predominant F0 is smaller than a threshold. A sketch of the whole loop follows below.
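To make the loop concrete, here is a minimal Python sketch of the iterate-and-subtract scheme under simplifying assumptions: `estimate_predominant_f0` is a hypothetical callback standing in for the bandwise estimator, harmonics are simply zeroed out rather than smoothly subtracted, and the stopping threshold is illustrative rather than the paper's actual value.

```python
# Minimal sketch of iterative spectral subtraction (Klapuri-style).
import numpy as np

def iterative_f0_estimation(spectrum, freqs, estimate_predominant_f0,
                            n_harmonics=20, energy_threshold=0.05):
    """Return a list of F0 estimates; stop when the harmonic energy of the
    newest estimate falls below a fraction of the original spectral energy."""
    residual = spectrum.copy()
    total_energy = np.sum(spectrum ** 2)
    f0s = []
    while True:
        f0 = estimate_predominant_f0(residual, freqs)   # hypothetical callback
        harmonic_energy = 0.0
        for h in range(1, n_harmonics + 1):
            idx = np.argmin(np.abs(freqs - h * f0))     # nearest bin to harmonic h
            harmonic_energy += residual[idx] ** 2
            residual[idx] = 0.0                         # subtract (here: zero out) the harmonic
        if harmonic_energy < energy_threshold * total_energy:
            break                                       # polyphony reached: stop iterating
        f0s.append(f0)
    return f0s
```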

Error Rate
[Figure: error rate per iteration; more errors occur in later iterations.]

Discussions
Advantages: simple idea; fast algorithm; handles inharmonicity.
Disadvantages: spectra in later iterations are severely corrupted; spectral smoothness alone is not enough to determine the amount of energy to subtract; why bandwise estimation?

Probabilistic Modeling of Peaks [Duan et al., 2010]
A maximum likelihood estimation method: θ̂ = argmax_{θ ∈ Θ} p(O | θ), where θ̂ is the best pitch estimate (a set of pitches), O is the observed power spectrum, and θ is a pitch hypothesis (a set of pitches).
The power spectrum is represented as peaks and the non-peak region.

Peaks / Non-peak Region
Peaks: ideally correspond to harmonics.
Non-peak region: frequencies farther than a threshold from any peak.
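As an illustration only, the following sketch splits a magnitude spectrum into detected peaks and a non-peak region; the peak picker and the 50-cent margin are assumptions, not the settings of [Duan et al., 2010].

```python
# Sketch: split a magnitude spectrum into peaks and the non-peak region.
import numpy as np
from scipy.signal import find_peaks

def peaks_and_nonpeak_region(magnitude, freqs, margin_cents=50.0):
    idx, _ = find_peaks(magnitude, prominence=np.max(magnitude) * 0.01)
    peak_freqs, peak_amps = freqs[idx], magnitude[idx]
    # A frequency belongs to the non-peak region if it is farther than the
    # margin (in cents) from every detected peak.
    ratio = 2 ** (margin_cents / 1200.0)
    non_peak = np.ones_like(freqs, dtype=bool)
    for f in peak_freqs:
        non_peak &= ~((freqs > f / ratio) & (freqs < f * ratio))
    return peak_freqs, peak_amps, freqs[non_peak]
```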

Likelihood as Dual Parts
p(O | θ) = p(O_peak | θ) · p(O_non-peak | θ)
p(O_peak | θ): probability of observing the peaks {(f_k, a_k), k = 1, ..., K}.
p(O_non-peak | θ): probability of not having any harmonics in the non-peak region.
[Figure: a pitch hypothesis below the true pitch gives large p(O_peak | θ) but small p(O_non-peak | θ); a hypothesis above the true pitch gives small p(O_peak | θ) but large p(O_non-peak | θ).]

Likelihood as Dual Parts (continued)
[Figure: when the pitch hypothesis matches the true pitch, both p(O_peak | θ) and p(O_non-peak | θ) are large.]

Likelihood Models
p(O_peak | θ): probability of observing the peaks, modeled in terms of the frequency and amplitude of the k-th peak.
p(O_non-peak | θ): probability of not having any harmonics in the non-peak region, modeled in terms of the frequency of the h-th harmonic and whether the h-th harmonic of an F0 exists or not.
Both models are learned from training data.

Model Training
For polyphonic music: 3000 random chords of polyphony 1 to 6, mixed using note samples from 16 instruments with pitches ranging from C2 (65 Hz) to B6 (1976 Hz).
For multi-talker speech: 500 speech excerpts with 1-3 simultaneous talkers, mixed from single-talker speech; ground-truth pitches were obtained before mixing.

Greedy Search Algorithm
θ̂ = argmax_{θ ∈ Θ} p(O | θ): the parameter space is too big for an exhaustive search, so a greedy search algorithm is used (sketched below):
Initialize θ to the empty set.
For i = 1 to MaxPolyphony: add the pitch to θ that increases the likelihood the most.
Estimate the polyphony N and return the first N pitches of θ.
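A minimal sketch of the greedy search, assuming a hypothetical `log_likelihood(observation, pitch_set)` function that evaluates the peak/non-peak model above; the candidate pitch set and MaxPolyphony are inputs.

```python
# Greedy maximum-likelihood pitch search: add one pitch at a time, always
# picking the candidate that raises the likelihood most.
def greedy_pitch_search(observation, candidate_pitches, log_likelihood,
                        max_polyphony=6):
    estimate = []
    history = [log_likelihood(observation, estimate)]   # likelihood of the empty hypothesis
    for _ in range(max_polyphony):
        best_pitch, best_ll = None, -float("inf")
        for p in candidate_pitches:
            if p in estimate:
                continue
            ll = log_likelihood(observation, estimate + [p])
            if ll > best_ll:
                best_pitch, best_ll = p, ll
        if best_pitch is None:
            break
        estimate.append(best_pitch)
        history.append(best_ll)
    return estimate, history     # polyphony is chosen afterwards (next slide)
```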

Polyphony Estimation
The likelihood increases with the estimated polyphony, so the polyphony estimate is obtained by thresholding the likelihood increase as pitches are added from 1 to MaxPolyphony; the threshold T is set to 0.88 empirically (see the sketch below).
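One plausible reading of this rule, under the assumption that the polyphony is the smallest N whose cumulative likelihood gain reaches a fraction T of the total gain over all MaxPolyphony pitches; the paper's exact criterion may differ.

```python
# Polyphony selection from the likelihood history returned by the greedy search.
def estimate_polyphony(history, T=0.88):
    total_gain = history[-1] - history[0]
    if total_gain <= 0:
        return 0
    for n in range(1, len(history)):
        if (history[n] - history[0]) >= T * total_gain:
            return n           # smallest N whose cumulative gain reaches T of the total
    return len(history) - 1
```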

Experiments: Polyphony Estimation
6000 musical chords mixed using notes unseen in the training data (1000 for each polyphony).
[Figure: polyphony estimation results.]

Post-processing
Estimation in each single frame is not robust: there are insertion, deletion and substitution errors. Refine the estimates using neighboring frames and only keep consistent estimates (a sketch follows below).
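A toy sketch of such a consistency filter; the neighborhood size, pitch tolerance, and support count are illustrative assumptions, not the values used in the paper.

```python
# Keep a pitch estimate only if similar pitches appear in enough neighboring frames.
def keep_consistent_pitches(frames, window=2, min_support=3, tol_semitones=0.5):
    """frames: list of lists of pitch estimates (MIDI numbers), one list per frame."""
    refined = []
    for t, pitches in enumerate(frames):
        kept = []
        lo, hi = max(0, t - window), min(len(frames), t + window + 1)
        for p in pitches:
            support = sum(
                any(abs(p - q) <= tol_semitones for q in frames[u])
                for u in range(lo, hi)
            )
            if support >= min_support:   # support includes the current frame itself
                kept.append(p)
        refined.append(kept)
    return refined
```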

Discussions
Advantages: model parameters can be learned from training data.
Disadvantages: assumes conditional independence of peak amplitudes given the F0s; doesn't consider the relation between peak amplitudes, e.g., spectral smoothness.

Level 2: Note Tracking Estimate a pitch trajectory for each note

Two Methods at Level 2
Probabilistic modeling of the spectral-temporal content of a note of a source [Kameoka et al., 2007]
Classification-based piano note transcription [Poliner & Ellis, 2007]

Harmonic Temporal Structured Clustering (HTC) [Kameoka et al., 2007]
Jointly estimates the pitch, intensity, onset and duration of notes.
Uses a detailed parametric model for the spectral content of a note of a source, approximating the spectrogram with superimposed HTC source models.

HTC Source Model
[Equation: the model is parameterized by the pitch, the total energy of the source, the relative energy of the n-th harmonic, and a harmonic envelope over time.]

The Model in a Single Frame
[Figure: the HTC source model evaluated in a single frame.]

Harmonic Envelope
[Figure: the harmonic envelope over time and the onset time.]

Reconstruction Using HTC Models
[Equation: the spectrogram is approximated as a weighted sum of HTC source models, with an activation weight for source k.]

The Unknowns
Model parameters: pitch, onset time, harmonic width, harmonic envelope over time, duration, etc.
Latent variables: activation weights of the sources.
Estimated with the EM algorithm.

Discussions
Advantages: a very detailed model; jointly estimates pitch, onset, duration, etc.
Disadvantages: the model is very complicated.

Classification-based Piano Note Transcription [Poliner & Ellis, 2007]
Train 88 one-versus-all SVM classifiers, one for each piano key, from training audio frames, then perform multi-label classification on each frame of the test audio (see the sketch below).
Data: MIDI-synthesized audio + Yamaha Disklavier playback grand piano.
Feature: a portion of the magnitude spectrum.
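A minimal scikit-learn sketch of the one-versus-all setup; the kernel choice, feature handling, and training details are assumptions, not the original system's exact configuration. It assumes every key occurs at least once in the training labels.

```python
# One binary SVM per piano key, trained on spectral-frame features.
import numpy as np
from sklearn.svm import SVC

def train_key_classifiers(X, Y, n_keys=88):
    """X: (n_frames, n_features) magnitude-spectrum features.
       Y: (n_frames, n_keys) binary piano-roll labels."""
    classifiers = []
    for k in range(n_keys):
        clf = SVC(kernel="rbf", probability=True)   # probabilistic output feeds the HMM stage
        clf.fit(X, Y[:, k])
        classifiers.append(clf)
    return classifiers

def predict_posteriors(classifiers, X):
    """Return an (n_frames, n_keys) array of posteriors P(key k is on | frame)."""
    return np.column_stack([clf.predict_proba(X)[:, 1] for clf in classifiers])
```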

HMM Post-processing
88 HMMs, one for each key, each with 2 states: the key is on or off.
Transition probabilities: learned from training data.
Observation probabilities (state likelihoods): the probabilistic output of the SVMs.
The Viterbi algorithm refines the pitch estimates (a sketch follows below).
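A small sketch of the two-state Viterbi smoothing applied independently per key; the transition probabilities here are placeholders for values that would be learned from training piano rolls.

```python
# Two-state (off/on) Viterbi smoothing of per-frame SVM posteriors for one key.
import numpy as np

def viterbi_on_off(posterior, p_stay_off=0.99, p_stay_on=0.9):
    posterior = np.asarray(posterior, dtype=float)     # posterior[t] = P(on | frame t)
    trans = np.log(np.array([[p_stay_off, 1 - p_stay_off],
                             [1 - p_stay_on, p_stay_on]]))
    emit = np.log(np.stack([1 - posterior, posterior], axis=1) + 1e-12)
    T = len(posterior)
    delta = np.zeros((T, 2))
    back = np.zeros((T, 2), dtype=int)
    delta[0] = np.log([0.5, 0.5]) + emit[0]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + trans          # scores[i, j]: come from state i, go to j
        back[t] = np.argmax(scores, axis=0)
        delta[t] = np.max(scores, axis=0) + emit[t]
    # Backtrace the most likely on/off sequence.
    path = np.zeros(T, dtype=int)
    path[-1] = np.argmax(delta[-1])
    for t in range(T - 2, -1, -1):
        path[t] = back[t + 1, path[t + 1]]
    return path                                         # 1 where the key is estimated to be on
```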

HMM Post-processing Result
[Figure: the SVM probabilistic output (state likelihood) and the refined pitch estimates, overlaid with ground-truth pitches.]

Discussions
Advantages: the first classification-based transcription method; simple idea; easy to implement.
Disadvantages: the classification and post-processing of different piano keys are performed completely independently, which induces more octave errors.

Level 3: Multi-pitch Streaming Estimate a pitch trajectory for each harmonic source

A 2-stage System
Stage 1: estimate pitches in each single time frame [Duan et al., 2010].
Stage 2: connect pitch estimates across frames into pitch trajectories [Duan et al., 2014].

How to Stream Pitches?
Label pitches by pitch order in each frame, i.e. highest, second highest, third highest, ...?
Connect pitches by continuity? That only achieves note tracking.

Clustering Pitches by Timbre!
Humans use timbre to discriminate and track sound sources. "Timbre is that attribute of sensation in terms of which a listener can judge that two sounds having the same loudness and pitch are dissimilar." -- American Standards Association

How to Represent Timbre?
Harmonic structure [Duan et al., 2008]: calculated for each pitch from the mixture.
[Figure: harmonic-structure features of a violin and a clarinet; magnitude (dB) vs. frequency (Hz).]
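A hedged sketch of extracting a harmonic-structure feature for one pitch from the mixture spectrum; the number of harmonics, dB floor, and normalization are illustrative assumptions rather than the cited work's settings.

```python
# Harmonic-structure timbre feature: relative log-amplitudes at harmonic positions.
import numpy as np

def harmonic_structure(magnitude, freqs, f0, n_harmonics=13, floor_db=-50.0):
    feat = []
    for h in range(1, n_harmonics + 1):
        idx = np.argmin(np.abs(freqs - h * f0))          # nearest bin to harmonic h
        amp_db = 20 * np.log10(magnitude[idx] + 1e-12)
        feat.append(max(amp_db, floor_db))               # floor very weak harmonics
    feat = np.array(feat)
    return feat - feat.mean()                            # remove overall level, keep relative shape
```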

Timbre Feature for Talkers
Uniform Discrete Cepstrum (UDC): characterizes talkers and is calculated from the mixture spectrum via a discrete cosine transform.
[Figure: magnitude spectrum (dB) vs. frequency (Hz) and the resulting UDC.]

Clustering by Timbre Alone Is Not Enough
[Figure: ground-truth pitch trajectories vs. K-means clustering with harmonic-structure features; pitch (MIDI number) vs. time (seconds).]

Use Pitch Locality Constraints
Cannot-link: between simultaneous pitches (only valid for monophonic instruments).
Must-link: between pitch estimates that are close in both time and frequency.
(A sketch of building these constraints follows below.)
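An illustrative sketch of building must-link and cannot-link constraints from frame-level pitch estimates; the time and frequency thresholds are assumptions.

```python
# Build pitch-locality constraints from (frame_index, midi_pitch) estimates.
def build_constraints(estimates, max_frame_gap=3, max_semitone_gap=0.5):
    must_link, cannot_link = [], []
    for i in range(len(estimates)):
        for j in range(i + 1, len(estimates)):
            ti, pi = estimates[i]
            tj, pj = estimates[j]
            if ti == tj:
                cannot_link.append((i, j))   # simultaneous pitches: different (monophonic) sources
            elif abs(ti - tj) <= max_frame_gap and abs(pi - pj) <= max_semitone_gap:
                must_link.append((i, j))     # close in time and frequency: same source
    return must_link, cannot_link
```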

Constrained Clustering
Objective: minimize timbre inconsistency.
Constraints: pitch locality.
Inconsistent constraints arise from incorrect pitch estimates, interweaving pitch trajectories, etc.
The problem is heavily constrained: nearly every pitch estimate is involved in at least one constraint.
Algorithm: iteratively update the clustering such that the objective monotonically decreases and the set of satisfied constraints monotonically expands.

The Proposed Algorithm
Notation: f is the objective function; C is the set of all constraints; Π_n is the clustering in the n-th iteration; C_n is the set of constraints satisfied by Π_n.
1. n ← 0; start from an initial clustering (Π_0, C_0).
2. n ← n + 1; find a new clustering Π_n such that f(Π_{n-1}) > f(Π_n) and Π_n also satisfies C_{n-1}.
3. C_n ← {constraints satisfied by Π_n}, so C_{n-1} ⊆ C_n; go to step 2.
The algorithm converges to a local minimum (Π*, C*): f(Π_0) > f(Π_1) > ... > f(Π*) and C_0 ⊆ C_1 ⊆ ... ⊆ C*.
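A compact sketch of this iteration, assuming hypothetical `objective`, `satisfied`, and `propose_moves` helpers; `propose_moves` stands in for the swap-set moves described on the next slide, and `satisfied` returns the set of constraints a clustering satisfies.

```python
# Iterative constrained clustering: each accepted move lowers the objective
# while keeping every constraint the previous clustering already satisfied.
def constrained_clustering(init_labels, objective, satisfied, propose_moves):
    labels = init_labels
    kept = satisfied(labels)                   # constraints satisfied by the initial clustering
    while True:
        improved = False
        for cand in propose_moves(labels, kept):
            if objective(cand) < objective(labels) and kept <= satisfied(cand):
                labels, kept = cand, satisfied(cand)   # objective drops, satisfied set can only grow
                improved = True
                break
        if not improved:
            return labels, kept                # local minimum reached
```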

Find a New Clustering
The new clustering must (1) decrease the objective function and (2) still satisfy the already-satisfied constraints.
Swap set: a connected subgraph spanning two clusters, linked by already-satisfied constraints; swapping it can newly satisfy an additional must-link.
Try all swap sets to find one that decreases the objective.

Timbre Objective & Locality Constraints
[Figure: results on 10 quartets played by violin, clarinet, saxophone and bassoon, compared with the accuracy of the input pitch estimates and of random-guess clustering.]

Works with Different MPE Methods
[Figure: results on 60 duets, 40 trios, and 10 quartets for Duan 10 + Proposed, Klapuri 06 + Proposed, and Pertusa 08 + Proposed.]

Example on Music
[Figure: ground-truth pitch trajectories and our result, pitch (MIDI number) vs. time (seconds); audio examples of the original and separated violin (blue) and clarinet (green).]

Comparisons on Speech
[Figure: results on 400 2-talker and 3-talker speech excerpts for Wohlmayr et al. 11, Hu & Wang 12, and the proposed method.]

Example on Speech
[Figure: ground-truth pitch trajectories and our results, frequency (Hz) vs. time (seconds).]

Discussions
Advantages: able to stream pitches across notes; considers both timbre and pitch locality information.
Disadvantages: the algorithm is slow and complicated; the constraints are binary; it cannot deal with polyphonic instruments, e.g., piano and guitar.