10420CS 573100 音樂資訊檢索 Music Information Retrieval Lecture 10 Harmonic/Percussive Separation Yi-Hsuan Yang Ph.D. http://www.citi.sinica.edu.tw/pages/yang/ yang@citi.sinica.edu.tw Music & Audio Computing Lab, Research Center for IT Innovation, Academia Sinica
Syllabus
W9: multiple-instrument separation => dictionary-based methods: nonnegative matrix factorization (NMF) and friends
W10: harmonic/percussive separation (HPSS) => median filtering and friends
W11: singing voice separation => low-rank-based methods: robust principal component analysis (RPCA) and friends
Motivation: Drum Transcription The drum track in popular music conveys information about tempo, rhythm, style, and possibly the structure of a song From Wikipedia
Motivation: Beat, Tempo, Rhythmic Pattern The drum track in popular music conveys information about tempo, rhythm, style, and possibly the structure of a song Transcription and separation of drum signals from polyphonic music, TASLP 2008 Techniques for machine understanding of live drum performances, TR 2012
Motivation: Drum Pattern Analysis https://youtu.be/lm_oz7p8twe?t=12m28s A corpus-based study of rhythm patterns, ISMIR 2012 Drum transcription via Classification of bar-level rhythmic patterns, ISMIR 2014
Motivation: HPSS as a Pre-Processing Step (a) original (b) harmonic (c) percussive Figure from [Mueller, FPM, Chapter 8, Springer 2015]
Supervised NMF Approach
ENST drum dataset: http://www.tsi.telecom-paristech.fr/aao/en/2010/02/19/enstdrums-an-extensive-audio-visual-database-for-drum-signalsprocessing/
IDMT-SMT-Drums dataset: http://www.idmt.fraunhofer.de/en/business_units/smt/drums.html
Nonnegative matrix partial cofactorization for spectral and temporal drum source separation, JSTSP 2011
Unsupervised Median-Filtering Approach
Simple DSP techniques; no machine learning (ML)
Intuition:
  stable harmonic (stationary) components form horizontal ridges on the spectrogram
  percussive components form vertical ridges with a broadband frequency response
Separation of a monaural audio signal into harmonic/percussive components by complementary diffusion on spectrogram, EUSIPCO 2008
Harmonic/percussive separation using median filtering, DAFx 2010
Filtering
Filtering: mean, max, or median over a sliding window of finite length
e.g. filter of length 3 (no padding, so the output is 2 samples shorter than the input)
input:  [0 0 1 0 1 0 1 1 0 1 1 1 0]
median: [  0 0 1 0 1 1 1 1 1 1 1  ]
max:    [  1 1 1 1 1 1 1 1 1 1 1  ]
Pooling: filter length = number of samples
e.g. [9 0 1 3 1 2 5] -> {mean=3, max=9, median=?}
(note: to calculate the median you need to sort the values)
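The length-3 example above can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the helper name `sliding_filter` is made up for illustration, and it assumes no padding at the edges, so a length-3 filter returns 11 values for the 13-sample input.

```python
from statistics import median

def sliding_filter(x, length, reduce_fn):
    """Apply reduce_fn to every full window of `length` samples (no padding)."""
    return [reduce_fn(x[i:i + length]) for i in range(len(x) - length + 1)]

x = [0, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1, 0]
med = sliding_filter(x, 3, median)  # median filtering
mx = sliding_filter(x, 3, max)      # max filtering
print("median:", med)
print("max:   ", mx)

# Pooling: a single window that spans all samples.
v = [9, 0, 1, 3, 1, 2, 5]
pooled = {"mean": sum(v) / len(v), "max": max(v), "median": median(v)}
print(pooled)
```

The median removes isolated spikes and fills isolated gaps while keeping longer runs intact, which is the property HPSS relies on.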
HPSS via Median Filtering
Ideal harmonic signal: violin
Ideal percussive signal: castanets
Figure from [Mueller, FPM, Chapter 8, Springer 2015]
HPSS via Median Filtering Figure from [Mueller, FPM, Chapter 8, Springer 2015]
HPSS via Median Filtering Figure from [Mueller, FPM, Chapter 8, Springer 2015]
HPSS via Median Filtering violin + castanets smaller filter length larger filter length Figure from [Mueller, FPM, Chapter 8, Springer 2015]
HPSS via Median Filtering violin + castanets binary mask soft mask Figure from [Mueller, FPM, Chapter 8, Springer 2015]
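To make the masking step concrete, here is a toy sketch in pure Python. It is not the code behind the figures. Assumptions: the spectrogram is a nested list `S[k][n]` (bin k, frame n), edges are padded by replication, the binary mask assigns a bin to the harmonic part when the harmonic-enhanced value is at least the percussive-enhanced value, and the soft mask is the ratio H / (H + P) with a small `eps` to avoid division by zero. All function names are hypothetical.

```python
from statistics import median

def median_filter_1d(values, length):
    """Median filter (odd length) with edge replication; output length == input length."""
    half = length // 2
    padded = [values[0]] * half + list(values) + [values[-1]] * half
    return [median(padded[i:i + length]) for i in range(len(values))]

def hpss_masks(S, length=3, eps=1e-10):
    """Return (binary, soft) harmonic masks for S[k][n] (bin k, frame n)."""
    n_bins, n_frames = len(S), len(S[0])
    # Harmonic-enhanced spectrogram: median along time, one row (bin) at a time.
    H = [median_filter_1d(row, length) for row in S]
    # Percussive-enhanced spectrogram: median along frequency, one column (frame) at a time.
    cols = [median_filter_1d([S[k][n] for k in range(n_bins)], length)
            for n in range(n_frames)]
    P = [[cols[n][k] for n in range(n_frames)] for k in range(n_bins)]
    binary_h = [[1.0 if H[k][n] >= P[k][n] else 0.0 for n in range(n_frames)]
                for k in range(n_bins)]
    soft_h = [[(H[k][n] + eps / 2) / (H[k][n] + P[k][n] + eps)
               for n in range(n_frames)] for k in range(n_bins)]
    return binary_h, soft_h

# Toy magnitude spectrogram (5 bins x 7 frames): a sustained tone in bin 2
# (horizontal ridge) plus a click at frame 3 (vertical ridge).
S = [[1.0 if (k == 2 or n == 3) else 0.0 for n in range(7)] for k in range(5)]
binary_h, soft_h = hpss_masks(S)

# Apply the binary mask; the percussive mask is its complement.
harmonic = [[S[k][n] * binary_h[k][n] for n in range(7)] for k in range(5)]
percussive = [[S[k][n] * (1.0 - binary_h[k][n]) for n in range(7)] for k in range(5)]
```

In this toy case the horizontal ridge ends up in `harmonic` and the vertical ridge in `percussive`. At the crossing point the soft mask is about 0.5, so a soft mask would split that bin between the two components instead of forcing a hard choice.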
Parameters
Window size, hop size
Reconstruction method
Filter length
Q1: Given sampling rate = 44.1 kHz, FFT window size = 4096 samples, and hop size = 1024 samples, what is the physical meaning of a vertical (percussive) median filter of length 17?
Q2: What is the physical meaning of a horizontal (harmonic) median filter of length 17?
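Implementation
http://bmcfee.github.io/librosa/generated/librosa.decompose.hpss.html
Answers to Q1 and Q2 on the previous page:
Q1: about 183 Hz (the filter spans 17 frequency bins)
Q2: about 0.395 second (the filter spans 17 time frames)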
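The two answers follow from the bin spacing and the frame spacing of the STFT. A short sketch of the arithmetic, assuming the span is counted as 17 bin spacings (or 17 hops); variable names are illustrative only:

```python
sr = 44100        # sampling rate in Hz
n_fft = 4096      # FFT window size in samples
hop = 1024        # hop size in samples
filter_len = 17   # median filter length (in bins or in frames)

bin_width_hz = sr / n_fft   # spacing between adjacent frequency bins
frame_step_s = hop / sr     # spacing between adjacent time frames

percussive_span_hz = filter_len * bin_width_hz  # vertical filter: extent in frequency
harmonic_span_s = filter_len * frame_step_s     # horizontal filter: extent in time

print(f"Q1: {percussive_span_hz:.0f} Hz")
print(f"Q2: {harmonic_span_s:.3f} s")
```

Changing the window or hop size changes the physical extent of the same filter length, so the filter length should be chosen together with the STFT parameters.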
Limitation Supervised Method?
Extension: Adding a Residual Component
http://www.audiolabs-erlangen.de/resources/2014-ISMIR-ExtHPSep/
Three components: harmonic, residual, percussive
Example: the harmonic component contains the violin, the percussive component contains the castanets, and the residual contains the applause
Extending harmonic-percussive separation of audio signals, ISMIR 2014
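One way to obtain a residual is to make the binary masks stricter, so that bins that are neither clearly harmonic nor clearly percussive fall into a third mask. The slide does not state the rule; the sketch below assumes a separation factor `beta` >= 1 in the spirit of the cited ISMIR 2014 paper. `H` and `P` stand for the harmonic- and percussive-enhanced (median-filtered) spectrograms; the function name and default values are hypothetical.

```python
def hrp_masks(H, P, beta=2.0, eps=1e-10):
    """Binary harmonic/residual/percussive masks from median-filtered spectrograms.

    Assumed rule: a bin is harmonic if H/P > beta, percussive if P/H >= beta,
    and residual otherwise.
    """
    mask_h, mask_r, mask_p = [], [], []
    for h_row, p_row in zip(H, P):
        mh = [1.0 if h / (p + eps) > beta else 0.0 for h, p in zip(h_row, p_row)]
        mp = [1.0 if p / (h + eps) >= beta else 0.0 for h, p in zip(h_row, p_row)]
        mr = [1.0 - (a + b) for a, b in zip(mh, mp)]  # whatever is left over
        mask_h.append(mh)
        mask_p.append(mp)
        mask_r.append(mr)
    return mask_h, mask_r, mask_p

# One frequency bin, three frames: clearly harmonic, clearly percussive, ambiguous.
H = [[4.0, 1.0, 1.0]]
P = [[1.0, 4.0, 1.0]]
mask_h, mask_r, mask_p = hrp_masks(H, P, beta=2.0)
print(mask_h, mask_r, mask_p)
```

With `beta` close to 1 the residual is nearly empty and the result approaches plain binary-mask HPSS; a larger `beta` moves more energy into the residual.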
Extension: Separating the Vocals
The vocal (V) component is smooth in time and non-smooth in frequency in the short-frame STFT domain, but not in the long-frame STFT domain
Extension: Separating the Vocals
Singing voice: an intermediate component between harmonic and percussive
Perform HPSS twice, on spectrograms with two different time-frequency resolutions
Singing voice enhancement in monaural music signals based on two-stage harmonic/percussive sound separation on multiple resolution spectrograms, TASLP 2014
Extension: Separating the Vocals
Extension: Smoothing
Frame-level prediction -> note-level prediction
Methods: median filtering, or hidden Markov model (HMM)
http://c4dm.eecs.qmul.ac.uk/ismir15-amt-tutorial/
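The median-filtering option can be sketched as follows: smooth a binary frame-level activation for one pitch, then read off note segments from the remaining runs. This is an illustrative sketch only; the 10 ms hop, the filter length of 3, and both function names are assumptions rather than values from the tutorial.

```python
from statistics import median

def median_smooth(frames, length):
    """Median filter (odd length) with edge replication."""
    half = length // 2
    padded = [frames[0]] * half + list(frames) + [frames[-1]] * half
    return [median(padded[i:i + length]) for i in range(len(frames))]

def frames_to_notes(active, hop_s):
    """Turn a binary frame sequence into (onset_s, offset_s) pairs."""
    notes, start = [], None
    for i, a in enumerate(active):
        if a and start is None:
            start = i                                   # note begins
        elif not a and start is not None:
            notes.append((start * hop_s, i * hop_s))    # note ends
            start = None
    if start is not None:                               # note still active at the end
        notes.append((start * hop_s, len(active) * hop_s))
    return notes

# Noisy frame-level activations for one pitch (1 = active).
raw = [0, 1, 0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 1, 0]
smooth = median_smooth(raw, 3)

raw_notes = frames_to_notes(raw, hop_s=0.01)
notes = frames_to_notes(smooth, hop_s=0.01)
print(len(raw_notes), "segments before smoothing,", len(notes), "after")
```

The median filter drops isolated one-frame detections and bridges one-frame gaps, so the fragmented frame-level output collapses into fewer, longer notes. An HMM plays a similar role but can also encode minimum durations and transition probabilities.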