Lecture 10 Harmonic/Percussive Separation

1 10420CS 573100 音樂資訊檢索 Music Information Retrieval
Lecture 10 Harmonic/Percussive Separation
Yi-Hsuan Yang Ph.D.
http://www.citi.
Music & Audio Computing Lab, Research Center for IT Innovation, Academia Sinica

2 Syllabus
W9: multiple instruments separation => dictionary-based methods: nonnegative matrix factorization (NMF) and friends
W10: harmonic/percussive separation (HPSS) => median filtering and friends
W11: singing voice separation => low-rank-based methods: robust principal component analysis (RPCA) and friends

3 Motivation: Drum Transcription
The drum track in popular music conveys information about tempo, rhythm, style, and possibly the structure of a song.
From Wikipedia

4 Motivation: Beat, Tempo, Rhythmic Pattern
The drum track in popular music conveys information about tempo, rhythm, style, and possibly the structure of a song.
Transcription and separation of drum signals from polyphonic music, TASLP 2008
Techniques for machine understanding of live drum performances, TR 2012

5 Motivation: Drum Pattern Analysis
https://youtu.be/lm_oz7p8twe?
A corpus-based study of rhythm patterns, ISMIR 2012
Drum transcription via classification of bar-level rhythmic patterns, ISMIR 2014

6 Motivation: HPSS as a Pre-Processing Step
[Figure panels: (a) original, (b) harmonic, (c) percussive. Figure from [Mueller, FPM, Chapter 8, Springer 2015]]

7 Supervised NMF Approach
ENST drum dataset http://www.tsi.telecom-paristech.
IDMT-SMT-Drums dataset
Nonnegative matrix partial co-factorization for spectral and temporal drum source separation, JSTSP 2011

8 Unsupervised Median-Filtering Approach
Simple DSP techniques; no ML
Intuition:
  stable harmonic or stationary components form horizontal ridges on the spectrogram
  percussive components form vertical ridges with a broadband frequency response
Separation of a monaural audio signal into harmonic/percussive components by complementary diffusion on spectrogram, EUSIPCO 2008
Harmonic/percussive separation using median filtering, DAFx 2010

9 Filtering
Filtering: mean, max, median; finite filter length
  e.g. filter of length 3: input: [ ] median: [ ] max: [ ]
Pooling: filter length = number of samples
  e.g. [ ] -> {mean=3, max=9, median=?} (note: to calculate the median you need to sort the values)
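
The difference between filtering (a sliding window of fixed length) and pooling (one window covering every sample) can be sketched in a few lines of Python. The input values below are hypothetical, chosen only for illustration; they are not the numbers from the slide. Padding the edges by repeating the first and last value is also an assumption, one of several common choices.

```python
from statistics import mean, median

def sliding_filter(values, length, reduce_fn):
    """Apply reduce_fn over a centered window of odd `length`.

    Edges are padded by repeating the first/last value (an assumed convention).
    """
    half = length // 2
    padded = [values[0]] * half + list(values) + [values[-1]] * half
    return [reduce_fn(padded[i:i + length]) for i in range(len(values))]

# Hypothetical input: a flat signal with one spike and a small step at the end.
signal = [1, 1, 9, 1, 1, 2, 2]

# Filtering: the output has the same length as the input.
median_filtered = sliding_filter(signal, 3, median)  # the isolated spike disappears
max_filtered = sliding_filter(signal, 3, max)        # the spike spreads to its neighbors

# Pooling: the filter length equals the number of samples, so one value comes out.
pooled = {"mean": mean(signal), "max": max(signal), "median": median(signal)}
```

The median filter removes the isolated spike while keeping the step, which is the property HPSS relies on.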

10 HPSS via Median Filtering
[Figure panels: ideal harmonic signal (violin), ideal percussive signal (castanets). Figure from [Mueller, FPM, Chapter 8, Springer 2015]]

11 HPSS via Median Filtering Figure from [Mueller, FPM, Chapter 8, Springer 2015]

12 HPSS via Median Filtering Figure from [Mueller, FPM, Chapter 8, Springer 2015]

13 HPSS via Median Filtering
[Figure: violin + castanets, separated with a smaller filter length and with a larger filter length. Figure from [Mueller, FPM, Chapter 8, Springer 2015]]

14 HPSS via Median Filtering
[Figure: violin + castanets, separated with a binary mask and with a soft mask. Figure from [Mueller, FPM, Chapter 8, Springer 2015]]
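
The procedure on these slides can be sketched on a toy magnitude spectrogram: median-filter along time to enhance horizontal ridges, median-filter along frequency to enhance vertical ridges, then compare the two results to build a binary or soft mask. This is a minimal pure-Python sketch under stated assumptions, not a reference implementation: the function names, the toy values, the edge padding, the tie-breaking rule (ties go to the harmonic side) and the small constant `eps` are all illustrative choices. A real system would run on an STFT and invert the masked spectrograms back to audio.

```python
from statistics import median

def median_filter_time(spec, length):
    """Median-filter every frequency row along time (edge values repeated as padding)."""
    half = length // 2
    out = []
    for row in spec:
        padded = [row[0]] * half + list(row) + [row[-1]] * half
        out.append([median(padded[t:t + length]) for t in range(len(row))])
    return out

def transpose(spec):
    return [list(col) for col in zip(*spec)]

def hpss(spec, length=3, soft=False, eps=1e-10):
    """Return (harmonic, percussive) magnitude estimates for spec[frequency][time]."""
    # Horizontal filter: keeps ridges along time, suppresses short clicks.
    harm = median_filter_time(spec, length)
    # Vertical filter: keeps ridges along frequency, suppresses sustained tones.
    perc = transpose(median_filter_time(transpose(spec), length))
    harmonic, percussive = [], []
    for s_row, h_row, p_row in zip(spec, harm, perc):
        if soft:
            # Soft mask: share of the harmonic-enhanced energy in each bin.
            mask = [(h + eps / 2) / (h + p + eps) for h, p in zip(h_row, p_row)]
        else:
            # Binary mask: each bin goes entirely to the larger estimate.
            mask = [1.0 if h >= p else 0.0 for h, p in zip(h_row, p_row)]
        harmonic.append([m * s for m, s in zip(mask, s_row)])
        percussive.append([(1.0 - m) * s for m, s in zip(mask, s_row)])
    return harmonic, percussive

# Toy spectrogram (hypothetical values): a sustained tone in frequency bin 2
# and a click in time frame 2.
spec = [
    [0, 0, 4, 0, 0],
    [0, 0, 4, 0, 0],
    [4, 4, 4, 4, 4],
    [0, 0, 4, 0, 0],
    [0, 0, 4, 0, 0],
]

harmonic_bin, percussive_bin = hpss(spec, soft=False)
harmonic_soft, percussive_soft = hpss(spec, soft=True)
```

With the binary mask the bin where the tone and the click overlap goes entirely to one component; with the soft mask its energy is split between the two, so the two estimates still add up to the original spectrogram.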

15 Parameters
Window size, hop size
Reconstruction method
Filter length
Q1: Given sampling rate = 44.1 kHz, FFT window size = 4096 samples, and hop size = 1024 samples, what's the physical meaning of using a vertical (percussive) median filter of length 17?
Q2: What's the physical meaning of using a horizontal (harmonic) median filter of length 17?

16 Implementation ompose.hpss.html (previous page) Q1: 183 Hz Q2: second
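
The answers to Q1 and Q2 can be checked with a short calculation. The sketch below assumes the filter length is converted by counting 17 bins (or frames) times the spacing between them, which reproduces the 183 Hz given for Q1; counting only the distance between the first and last bin centers (16 spacings) would give slightly smaller numbers. The variable names are illustrative.

```python
sr = 44100        # sampling rate in Hz
n_fft = 4096      # FFT window size in samples
hop = 1024        # hop size in samples
filter_len = 17   # median filter length in bins (vertical) or frames (horizontal)

bin_spacing_hz = sr / n_fft    # frequency distance between adjacent STFT bins
frame_spacing_s = hop / sr     # time distance between adjacent STFT frames

vertical_span_hz = filter_len * bin_spacing_hz     # Q1: frequency range covered by the filter
horizontal_span_s = filter_len * frame_spacing_s   # Q2: time range covered by the filter

print(f"Q1: about {vertical_span_hz:.0f} Hz, Q2: about {horizontal_span_s:.2f} s")
```

Under these assumptions the vertical filter covers about 183 Hz and the horizontal filter about 0.39 seconds.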

17 Limitation Supervised Method?

18 Extension: Adding a Residual Component
http://www.audiolabserlangen. -ISMIR-ExtHPSep/
The harmonic component contains the violin, the percussive component contains the castanets, and the residual contains the applause.
[Figure panels: Harmonic, Residual, Percussive]
Extending harmonic-percussive separation of audio signals, ISMIR 2014
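
One way to obtain a residual, sketched here under assumptions, is to demand a clear winner before assigning a bin: a separation factor `beta` requires the harmonic-enhanced value to exceed the percussive-enhanced value by that factor (or vice versa), and any bin that passes neither test goes to the residual. The inputs are assumed to be the two median-filtered spectrograms; the function name, the toy values, `beta = 2.0` and `eps` are illustrative and not taken from the slides.

```python
def hrp_masks(harm, perc, beta=2.0, eps=1e-10):
    """Binary harmonic/percussive/residual masks from median-filtered spectrograms.

    harm, perc: lists of rows holding the horizontally and vertically
    median-filtered magnitudes (same shape). beta >= 1 is the separation factor.
    """
    mask_h, mask_p, mask_r = [], [], []
    for h_row, p_row in zip(harm, perc):
        mh = [1.0 if h / (p + eps) > beta else 0.0 for h, p in zip(h_row, p_row)]
        mp = [1.0 if p / (h + eps) >= beta else 0.0 for h, p in zip(h_row, p_row)]
        # Bins claimed by neither mask form the residual.
        mr = [1.0 - a - b for a, b in zip(mh, mp)]
        mask_h.append(mh)
        mask_p.append(mp)
        mask_r.append(mr)
    return mask_h, mask_p, mask_r

# Hypothetical filtered values for three bins:
# clearly harmonic, ambiguous, clearly percussive.
harm = [[4.0, 3.0, 0.5]]
perc = [[0.5, 2.0, 4.0]]

mask_h, mask_p, mask_r = hrp_masks(harm, perc, beta=2.0)
```

A larger `beta` makes the harmonic and percussive components purer and pushes more energy, such as noise-like applause, into the residual.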

19 Extension: Separating the Vocals
The vocal (V) component is smooth in time and non-smooth in frequency in the short-framed (but not the long-framed) STFT domain.

20 Extension: Separating the Vocals
Singing voice: an intermediate component between harmonic and percussive
Perform HPSS twice, on spectrograms with two different time-frequency resolutions
Singing voice enhancement in monaural music signals based on two-stage harmonic/percussive sound separation on multiple resolution spectrograms, TASLP 2014

21 Extension: Separating the Vocals

22 Extension: Smoothing
Frame-level prediction -> note-level prediction
Methods: median filtering, or HMM
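
The median-filtering option can be sketched as follows: smooth a binary frame-level activity sequence, then group consecutive active frames into segments. The input sequence, the filter length of 3, and the function names are hypothetical and serve only as an illustration; the HMM alternative is not shown.

```python
from statistics import median

def smooth_predictions(frames, length=3):
    """Median-filter a binary frame-level sequence (edge values repeated as padding)."""
    half = length // 2
    padded = [frames[0]] * half + list(frames) + [frames[-1]] * half
    return [int(median(padded[i:i + length])) for i in range(len(frames))]

def frames_to_segments(frames):
    """Group consecutive active frames into (start, end) index pairs, end exclusive."""
    segments, start = [], None
    for i, active in enumerate(frames):
        if active and start is None:
            start = i
        elif not active and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(frames)))
    return segments

# Hypothetical noisy frame-level output: isolated detections and short dropouts.
raw = [0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0]

smoothed = smooth_predictions(raw, length=3)
raw_segments = frames_to_segments(raw)
note_segments = frames_to_segments(smoothed)
```

On this input the four fragmented raw segments collapse into a single segment after smoothing. A longer filter removes longer spurious events but also erases genuinely short notes.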
