Automatic music transcription


Sources:
* Klapuri, "Introduction to music transcription," 2006. www.cs.tut.fi/sgn/arg/klap/amt-intro.pdf
* Klapuri, Eronen, Astola, "Analysis of the Meter of Acoustic Musical Signals," IEEE TASLP, 2006.
* Klapuri, "Multiple fundamental frequency estimation by summing harmonic amplitudes," ISMIR 2006.
* Ryynänen, Klapuri, "Automatic transcription of melody, bass line, and chords in polyphonic music," Computer Music Journal, 2008.

1 Introduction to music transcription

Contents:
- Introduction to music transcription
- Rhythm analysis
- Multiple-F0 analysis
- Acoustic and musicological models
- Vocals separation and lyrics
- Application to music retrieval

Example: an excerpt from Song #034 in the RWC Popular Music database. [Figure, top to bottom: 1. time-domain signal, 2. spectrogram, 3. musical notation, 4. piano roll.]

Complete vs. partial transcription
- A complete transcription is sometimes impossible or irrelevant.
- A partial transcription covers, for example, only the melody, the bass line, the percussion, or the chords.

Applications and related areas
- music retrieval
- structured audio coding
- intelligent processing and effects
- equipment such as stage lighting and automatic accompaniment
- computer games
- music perception
Anything missing?

Perspectives on music transcription

Music transcription is a wide topic. It is useful to structure the problem by decomposing it into smaller and more tractable subproblems.

Acoustic and musicological models

Speech recognition systems depend on language models, e.g. the probabilities of different word sequences (N-gram models). Musicological information is equally important for transcription: e.g. the probabilities of tone sequences or tone combinations, and instrument models. [Figure: acoustic signal -> analysis guided by internal models -> result.]

2 Onset detection and meter analysis

Onset detection is the detection of the beginnings of sounds in an acoustic signal. Meter analysis, e.g. tapping a foot to music (beat tracking), may involve several time scales: detect the moments of musical stress in an audio signal and discover the underlying periodicities in them.

Applications:
- beat-synchronous feature extraction
- a temporal framework for audio editing
- synchronization of audio with audio, or of audio with video

Musical meter

Musical meter is a hierarchical structure: pulse sensations at several time scales.
- The tactus level is the most prominent (the "foot-tapping rate").
- The tatum is the time quantum, i.e. the fastest pulse.
- The measure pulse is related to the rate of harmonic change.

Measuring the degree of change in music

Moments of change are important for onset detection and meter analysis: changes in the intensity, pitch, or timbre of a sound. Moments of musical stress (accents) are caused by the beginnings of sound events, by sudden changes in loudness or timbre, and by harmonic changes. The basic idea is to analyse the periodicity of the change signal, which characterizes the temporal regularity of the moments of stress. Perceptual change should be estimated, so that the system detects what humans detect and ignores what humans ignore: a musically meaningful rhythmic parsing.

Some data reduction of the time-domain signal is needed, but the power envelope of the full-band signal is not sufficient. Hearing is frequency selective: the audibility of a change at a critical band is affected only by the spectral components within that band; components within a single critical band may mask each other, but this does not happen if the frequency separation is sufficiently large. Hence: measure the change independently at each critical band, and then combine the results.

Scheirer observed that the perceived rhythmic content of many types of music remains the same if only the power envelopes of a few subbands are preserved and then used to modulate a white noise signal; one band is not enough, and the observation applies to music with a strong beat.

Measuring degree of change in music: in practice

Processing chain: music signal -> filterbank -> bandwise power envelopes x_b(n) -> µ-law compression -> differentiation and half-wave rectification (the perceived change at each subband) -> combination across bands -> output v(n).

Filterbank: Fourier transforms are taken in successive ~20 ms time frames (50% overlap). In each frame n, the power x_b(n) is measured within b = 1, 2, ..., 36 triangular-response bandpass filters that are uniformly distributed on the Mel frequency scale between 50 Hz and 20 kHz:

    f_\mathrm{Mel} = 2595 \log_{10}\left(1 + \frac{f_\mathrm{Hz}}{700}\right)

How should the degree of change at a subband be measured? A plain differential is not right. For humans, the smallest detectable change in intensity, ΔI, is approximately proportional to the intensity I of the signal; the audible ratio ΔI/I is approximately constant, so the same absolute increase is more prominent in a quiet signal. It is therefore reasonable to normalize the differential of the power with the power itself:

    \frac{(d/dt)\, x_b(n)}{x_b(n)} = \frac{d}{dt} \ln x_b(n)

[Figure (piano onset): dashed line (d/dt) x_b(n), solid line (d/dt) ln x_b(n).]

A numerically robust way of calculating the logarithm is µ-law compression,

    y_b(n) = \frac{\ln(1 + \mu\, x_b(n))}{\ln(1 + \mu)}

where the constant µ determines the degree of compression (on the order of 10...10^4, relative to the scale of x_b(n)).

Differentiate and retain only the positive changes, with HWR(x) = max(x, 0):

    y_b'(n) = \mathrm{HWR}\{ y_b(n) - y_b(n-1) \}

Finally, sum across the channels to estimate the overall degree of change:

    v(n) = \sum_{b=1}^{36} y_b'(n)
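A minimal numpy sketch of this chain, assuming a mono signal x at sampling rate fs. The frame length, band count, Mel mapping, µ-law compression, and half-wave rectification follow the slides; the function name, the triangular-band construction, and the normalization of x_b(n) before compression are illustrative choices, not the reference implementation.

    import numpy as np

    def accent_signal(x, fs, n_bands=36, frame_ms=20, mu=100.0):
        """Degree-of-change ("accent") signal v(n): Mel filterbank power
        envelopes -> mu-law compression -> differentiate -> half-wave
        rectify -> sum across bands. A sketch, not the reference code."""
        frame = int(fs * frame_ms / 1000)
        hop = frame // 2                                  # 50% overlap
        n_frames = (len(x) - frame) // hop + 1
        # Triangular filters uniformly spaced on the Mel scale, 50 Hz..20 kHz
        mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
        imel = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
        edges = imel(np.linspace(mel(50.0), mel(min(20000.0, fs / 2)),
                                 n_bands + 2))
        freqs = np.fft.rfftfreq(frame, 1.0 / fs)
        fb = np.zeros((n_bands, len(freqs)))
        for b in range(n_bands):
            lo, c, hi = edges[b], edges[b + 1], edges[b + 2]
            fb[b] = np.clip(np.minimum((freqs - lo) / (c - lo),
                                       (hi - freqs) / (hi - c)), 0.0, None)
        # Bandwise power x_b(n) in successive windowed frames
        win = np.hanning(frame)
        xb = np.zeros((n_bands, n_frames))
        for n in range(n_frames):
            spec = np.abs(np.fft.rfft(win * x[n * hop:n * hop + frame])) ** 2
            xb[:, n] = fb @ spec
        # mu-law compression y_b(n) = ln(1 + mu*x_b(n)) / ln(1 + mu), with
        # x_b scaled to a unit maximum so mu has a consistent meaning
        yb = np.log1p(mu * xb / (xb.max() + 1e-12)) / np.log1p(mu)
        # Differentiate, half-wave rectify, sum across bands
        return np.maximum(np.diff(yb, axis=1), 0.0).sum(axis=0)   # v(n)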

Measured change signals

[Figure: accent signals v(n), the degree of accent as a function of time, for three example excerpts, computed as described above. Adaptation to the overall signal level would still be needed.]

Pulse strengths ("saliences")

A bank of comb filters is used for periodicity analysis. A comb filter with delay k and feedback gain a computes

    y(n) = (1 - a)\, x(n) + a\, y(n - k)

We used a = 0.5^{k/T}, where T is the half-time in samples (3 s), so that the filter's response decays to one half in time T. [Figure: magnitude response and impulse response of a comb filter with a = 0.9, k = 7.]

Metrical pulse saliences: the strengths of the different metrical pulses at time n are given by the resonator energies.
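The resonator itself follows directly from the difference equation; a small sketch, with the gain set from the half-time as above (function and variable names are mine):

    import numpy as np

    def comb_resonate(v, k, half_time):
        """y(n) = (1 - a) v(n) + a y(n - k): comb filter tuned to period k,
        with a = 0.5**(k / half_time) so the response decays to one half
        in `half_time` samples."""
        a = 0.5 ** (k / half_time)
        y = np.zeros(len(v))
        for n in range(len(v)):
            y[n] = (1.0 - a) * v[n] + (a * y[n - k] if n >= k else 0.0)
        return y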

Bank of comb filters

The time-varying energy of the output y_τ of the comb filter with delay τ is

    r(\tau, n) = \frac{1}{\tau} \sum_{i=n-\tau+1}^{n} \big[ y_\tau(i) \big]^2

[Figure: r(τ, n) for τ = 1, 2, ..., 100, for an impulse train with period 24 samples and for white noise.]

r(τ, n) can be further normalized to get rid of the trend over τ (the details are beyond the scope of this course).

Higher-level modeling

Meter: tatum, tactus, measure.
- Observed: the (normalized) comb filter energies r(τ, n).
- Prior probabilities (typical tempo values): a log-normal distribution.
- Temporal continuity constraints: p(next tempo | previous tempo).

Demonstrations

http://www.cs.tut.fi/~klap/iiro/meter/
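Running a bank of these resonators over the accent signal and measuring each output's energy gives r(τ, n); picking the strongest period then yields a crude tactus estimate. A toy sketch reusing comb_resonate from above; the normalization and the probabilistic meter model are omitted.

    import numpy as np

    def pulse_saliences(v, periods, half_time):
        """r(tau, n): short-time energy of each comb resonator's output,
        measured over its own period tau (roughly (1/tau) * sum of y^2)."""
        r = np.zeros((len(periods), len(v)))
        for i, k in enumerate(periods):
            y = comb_resonate(v, k, half_time)
            for n in range(len(v)):
                seg = y[max(0, n - k + 1):n + 1]
                r[i, n] = np.mean(seg ** 2)
        return r

    # Example: accent frames at 100 Hz, candidate periods 0.3..1.2 s,
    # half-time 3 s (= 300 frames):
    # r = pulse_saliences(v, np.arange(30, 120), half_time=300)
    # tactus_period = np.arange(30, 120)[np.argmax(r[:, -1])]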

3 Polyphonic pitch analysis

Introduction

Pitch information is an essential part of almost all Western music, but extracting it from recorded audio is hard: a spectrogram can be calculated straightforwardly, a piano roll is much trickier. Multiple-F0 estimation means F0 estimation in polyphonic signals; music involves a variety of sources, a wide pitch range, and the presence of drums. A number of completely different approaches have been proposed in the literature.

Musical sounds

Most Western instruments produce harmonic sounds. [Figure: a trumpet sound (260 Hz) in the time and frequency domains; the period in the time domain is 1/F0, and the spacing of the partials in the frequency domain is F0.]

How about just the autocorrelation function (ACF)?

ACF-based algorithms are among the most frequently used single-pitch estimators. Usually the lag of the maximum value of the ACF is taken as the period 1/F0. The short-time ACF r(τ) of a discrete time-domain signal x(n) is

    r(\tau) = \frac{1}{N} \sum_{n=0}^{N-1} x(n)\, x(n + \tau)

[Figure: signal x(n) (vowel [ae]) and its ACF.]
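As a baseline, the whole ACF pitch estimator fits in a few lines (a sketch; the lag search is restricted to a plausible F0 range so that the trivial maximum at lag 0 is skipped):

    import numpy as np

    def acf_pitch(x, fs, f_min=60.0, f_max=500.0):
        """Single-pitch estimation: take 1/F0 as the lag of the maximum of
        the short-time ACF r(tau) = (1/N) sum_n x(n) x(n + tau)."""
        n = len(x)
        r = np.correlate(x, x, mode="full")[n - 1:] / n   # lags 0, 1, ...
        lo, hi = int(fs / f_max), int(fs / f_min)
        tau = lo + int(np.argmax(r[lo:hi]))
        return fs / tau                                    # F0 in Hz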

Autocorrelation function

The short-time ACF within a time frame of length N,

    r(\tau) = \sum_{n=0}^{N-1} x(n)\, x(n + \tau),

can for real-valued signals be computed via the FFT as

    r(\tau) = \mathrm{IDFT}\{ |X(k)|^2 \} = \frac{2}{K} \sum_{k=0}^{K/2-1} |X(k)|^2 \cos\!\left( \frac{2\pi k \tau}{K} \right)

where IDFT is the inverse Fourier transform and X(k) is the DFT of x(n), zero-padded so that the FFT length K is twice the length of x. The latter identity holds only for real-valued (audio) signals.

From this frequency-domain interpretation we see at least three properties of the ACF that make it non-robust for the period analysis of polyphonic audio:
1. the entire spectrum is used (weighted with values between -1 and 1);
2. all integer multiples of f_s/τ are given the same (unity) weight;
3. squaring the spectrum emphasizes timbral properties (formants etc.).

In the following, a method is described that makes three basic modifications to the ACF to improve its robustness: sharper peaks (cf. a comb filter), harmonics weighted by a function g(τ, m) ≤ 1, and magnitudes in place of squared magnitudes.

A more reliable method*: summing harmonic amplitudes

The starting point is conceptually very simple:
1. The input signal is first spectrally flattened ("whitened") to suppress timbral information.
2. The salience (strength) of an F0 candidate is calculated as a weighted sum of the amplitudes of its harmonic partials,

    s(\tau) = \sum_{m=1}^{M} g(\tau, m)\, |Y(f_{\tau,m})|

where f_{τ,m} = m f_s/τ is the frequency of the m-th harmonic partial of the F0 candidate f_s/τ, f_s is the sampling rate, the function g(τ, m) gives the weight of partial m of period τ in the sum, and Y(f) is the short-time Fourier transform of the whitened time-domain signal.

The basic idea of harmonic summation is intuitively appealing: pitch perception is closely related to the time-domain periodicity of sounds, and the Fourier theorem states that a periodic signal can be represented with spectral components at integer multiples of the inverse of the period. The question of an optimal mapping from the Fourier spectrum to a pitch spectrum (or a piano roll) is closely related to these methods; here the function g(τ, m) is learned by brute-force optimization. [Figure: the optimized weights, shown for F0 candidates up to about 300 Hz.]

* Klapuri, A., "Multiple fundamental frequency estimation by summing harmonic amplitudes," 7th International Conference on Music Information Retrieval (ISMIR), Victoria, Canada, Oct. 2006.
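A sketch of the salience computation for one period candidate, using the practical max-over-bins form given on the next slide (see "Calculation of the F0 salience function" below). Y is assumed to be the magnitude spectrum of the whitened signal; g1 and g2 stand in for the optimized weight parameters, set here to values of the order reported in the ISMIR 2006 paper.

    import numpy as np

    def salience(Y, fs, K, tau, M=20, g1=52.0, g2=320.0, dtau=0.5):
        """s(tau) = sum_m g(tau, m) * max_{k in kappa(tau, m)} |Y(k)| for a
        single period candidate tau (in samples). Y: magnitude spectrum of
        the whitened signal, K: (zero-padded) DFT length."""
        s = 0.0
        for m in range(1, M + 1):
            k_lo = int(round(m * K / (tau + dtau / 2)))
            k_hi = min(int(round(m * K / (tau - dtau / 2))), len(Y) - 1)
            if k_lo > k_hi:
                break                                     # partial above Nyquist
            g = (fs / tau + g1) / (m * fs / tau + g2)     # weight g(tau, m)
            s += g * Y[k_lo:k_hi + 1].max()
        return s

Evaluating salience over a grid of τ values gives the salience function s(τ); its maximum indicates the predominant period.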

Proposed method: spectral whitening

One of the big challenges in F0 estimation is making a system robust across different sound sources. A way to achieve this is to suppress timbral information prior to the actual F0 estimation. Whitening:
1. Calculate the DFT X(k) of the input signal x(n).
2. Calculate the standard deviations σ_b (the square roots of the power) within subbands in the frequency domain: square and sum the frequency bins within each band, then take the square root.
3. Calculate the bandwise compression coefficients γ_b = σ_b^{ν-1}, where ν = 0.3 is a parameter determining the amount of spectral whitening.
4. The whitened spectrum Y(k) is obtained by weighting each subband with its compression coefficient and then recombining the subbands.

Proposed method: calculation of the F0 salience function

In practice the salience is calculated as

    s(\tau) = \sum_{m=1}^{M} g(\tau, m) \max_{k \in \kappa_{\tau,m}} |Y(k)|

where the set κ_{τ,m} defines a range of frequency bins in the vicinity of the m-th overtone of the F0 candidate f_s/τ,

    \kappa_{\tau,m} = \{ k : \lfloor m K / (\tau + \Delta\tau/2) \rceil \le k \le \lfloor m K / (\tau - \Delta\tau/2) \rceil \}

with ⌊·⌉ denoting rounding and Δτ the spacing between fundamental period candidates (Δτ = 1 or 0.5). The weight function, found by optimization (shown for F0 candidates up to about 300 Hz), is

    g(\tau, m) = \frac{f_s/\tau + g_1}{m\, f_s/\tau + g_2}

Proposed method: predominant F0 estimation

The maximum of the salience function s(τ) is a quite robust indicator of one of the correct F0s in a polyphonic audio signal (predominant-F0 estimation: find one, any, of the correct F0s). But the second- or third-highest peak is often due to the same sound, located at a τ that is half or twice the position of the highest peak. Multiple-F0 estimation accuracy can therefore be improved by an iterative estimation-and-cancellation scheme, in which each detected sound is cancelled from the mixture and s(τ) is updated accordingly before deciding the next F0.

Iterative estimation and cancellation

Step 1: The residual spectrum Y_R(k) is initialized to Y(k). A spectrum of detected sounds, Y_D(k), is initialized to zero.
Step 2: The fundamental period τ_0 is estimated using Y_R(k) to compute s(τ); the maximum of s(τ) determines τ_0.
Step 3: The harmonic partials of τ_0 are located at the bins mK/τ_0, m = 1, 2, ..., M. The spectrum of the time-domain window function is translated to those frequencies, weighted by g(τ_0, m), and added to Y_D(k).
Step 4: The residual spectrum is updated as Y_R(k) <- max(0, Y_R(k) - d·Y_D(k)), where d = 0.2 is a free parameter.
Step 5: Return to Step 2.
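Assuming the salience function from the sketch above (whitening done upstream), the estimation-and-cancellation loop can be condensed as follows. This is a simplified illustration of the control flow, not the reference algorithm: each detected partial is cancelled directly at its bin rather than by translating the window spectrum as in Step 3, and the residual is recomputed from the original spectrum each round so that no sound is subtracted twice.

    import numpy as np

    def estimate_f0s(Y, fs, K, taus, n_sounds=4, d=0.2, M=20):
        """Iterative estimation and cancellation (simplified sketch):
        detect the period maximizing s(tau) on the residual, accumulate
        its weighted partials into Y_d, update the residual, repeat."""
        Y_d = np.zeros_like(Y)                      # detected sounds
        f0s = []
        for _ in range(n_sounds):
            Y_r = np.maximum(0.0, Y - d * Y_d)      # residual spectrum
            sal = [salience(Y_r, fs, K, tau, M=M) for tau in taus]
            tau0 = taus[int(np.argmax(sal))]
            f0s.append(fs / tau0)
            for m in range(1, M + 1):               # cancel partials of tau0
                k = int(round(m * K / tau0))
                if k >= len(Y):
                    break
                g = (fs / tau0 + 52.0) / (m * fs / tau0 + 320.0)
                Y_d[k] += g * Y_r[k]
        return f0s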

Iterative estimation and cancellation: example

[Figures: the residual spectrum Y_R(k) after the first, second, third, and fourth iteration.]

[Figures: "F0-grams", piano rolls annotated with salience/confidence levels, for RWC-P #25 and RWC-P #95.]

Remarks

The principle of summing harmonic amplitudes is very simple, yet it suffices for predominant-F0 estimation in polyphonic signals, provided that the weights g(τ, m) are appropriate. Iterative detection and cancellation helps to remove the harmonics and subharmonics of already-detected sounds and to reveal the remaining sounds behind the most prominent ones. The method is reasonably accurate for a wide range of instruments and F0s.

4 Acoustic and musicological modeling

Why acoustic modeling of notes? The frame-wise F0 strengths must be processed further to obtain discrete notes (MIDI, a score): pitch quantization, onset and offset detection, and cleaning up frame-wise errors.

Acoustic modeling of notes:
1. Extract the frame-wise F0 salience (strength) and its differential (here without peak-picking or iterative cancellation).
2. Use training data to learn acoustic models for note events; the RWC Popular Music database provides 100 pieces with audio and time-aligned MIDI.

The examples in the following are from: Ryynänen, M., and Klapuri, A., "Automatic transcription of melody, bass line, and chords in polyphonic music," Computer Music Journal, 32(3), Fall 2008; and Ryynänen, Klapuri, WASPAA 2005.

Music transcription system

The transcription system is a combination of an acoustic model and a musicological model (HMMs). [Figure: the acoustic note-event model.] The musicological model provides musical key estimation and N-gram models for note sequences.

Transcription examples
- Complete polyphonic transcription: http://www.cs.tut.fi/sgn/arg/matti/demos/polytrans.html
- Transcription of melody, bass, and chords: http://www.cs.tut.fi/sgn/arg/matti/demos/mbctrans/

Case study: singing transcription

Ryynänen, Klapuri, "Modeling of note events for singing transcription," SAPA Workshop, 2004. Pipeline: acoustic signal -> feature extraction (pitch, voicing, accent, meter) -> probabilistic models -> discrete note sequence. The estimated pitch track has to be post-processed to obtain notes; a toy version of such a decoder is sketched below.
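The combination of the acoustic and musicological models is, at its core, a decoding problem over note sequences. The toy Viterbi decoder below illustrates the idea, assuming per-frame acoustic log-scores from the note models and note-to-note log transition probabilities from the key/N-gram model; all names are hypothetical, and a real system uses note-event HMMs with several states per note.

    import numpy as np

    def viterbi_notes(log_acoustic, log_trans):
        """Best note path given log_acoustic[n, q] (score of note q in
        frame n) and log_trans[q, q'] (note transition score)."""
        n_frames, n_notes = log_acoustic.shape
        delta = log_acoustic[0].copy()              # best score ending in q
        back = np.zeros((n_frames, n_notes), dtype=int)
        for n in range(1, n_frames):
            cand = delta[:, None] + log_trans       # cand[q, q']: via q to q'
            back[n] = np.argmax(cand, axis=0)
            delta = cand[back[n], np.arange(n_notes)] + log_acoustic[n]
        path = [int(np.argmax(delta))]              # backtrack from the end
        for n in range(n_frames - 1, 0, -1):
            path.append(int(back[n, path[-1]]))
        return path[::-1]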

Demo examples: "Brother, Can You Spare a Dime" and "Pieni tytön tylleröinen" (Finnish).