Lecture 9 Source Separation

10420CS 573100 Music Information Retrieval
Lecture 9: Source Separation
Yi-Hsuan Yang, Ph.D.
http://www.citi.sinica.edu.tw/pages/yang/
yang@citi.sinica.edu.tw
Music & Audio Computing Lab, Research Center for IT Innovation, Academia Sinica

Reference

Why Source Separation?
Because we are obsessed with this topic:
- Complex and quaternionic principal component pursuit and its application to audio separation, SPL 2016
- Informed monaural source separation of music based on convolutional sparse coding, ICASSP 2015
- Vocal activity informed singing voice separation with the iKala dataset, ICASSP 2015
- Sparse modeling for artist identification: Exploiting phase information and vocal separation, ISMIR 2013
- Low-rank representation of both singing voice and music accompaniment via learned dictionaries, ISMIR 2013
- On sparse and low-rank matrix decomposition for singing voice separation, ACM MM 2012

Why Source Separation?
The two holy grails in MIR: automatic transcription and source separation
Figures from [Mueller, FPM, Chapter 8, Springer 2015]

Application: Instrument Equalization Figure from [Mueller, FPM, Chapter 8, Springer 2015]

Application: Instrument Equalization (a) original (b) harmonic (c) percussive Figure from [Mueller, FPM, Chapter 8, Springer 2015]

Application: Audio Editing Figure from [Mueller, FPM, Chapter 8, Springer 2015]

Types of Separation Problems: Type of Sources
- Separating multiple speakers (a.k.a. the cocktail party problem)
- W9: separating multiple instruments (e.g., piano, violin)
- W10: separating harmonic and percussive components
- W11: separating the singing voice from the accompaniment

Types of Separation Problems: Other Dimensions
- Number of sources vs. number of channels: overdetermined (more channels than sources) vs. underdetermined (fewer channels than sources); single-channel vs. multi-channel
- Amount of side information: blind source separation vs. guided source separation
- Online vs. offline

Why Is Source Separation Difficult?
Harmonic overlaps + underdetermined mixtures (figure: violin, clarinet)

Approach
Unsupervised: rule-based
Supervised: learn templates from clean sources

Approach
- W9: multiple-instrument separation => dictionary-based methods: nonnegative matrix factorization (NMF) and friends
- W10: harmonic/percussive separation => median filtering and friends
- W11: singing voice separation => low-rank-based methods: robust principal component analysis (RPCA) and friends

Nonnegative Matrix Factorization (NMF)
Factorize (decompose) a nonnegative matrix into the product of two nonnegative matrices
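In symbols (the names V, W, H and the sizes F, K, N are our notation, chosen to match the O(FKN) per-iteration cost quoted later): for audio, V is a magnitude spectrogram with F frequency bins and N frames, the K columns of W are spectral templates, and the rows of H are their activations over time, usually with K much smaller than F and N.

```latex
V \approx W H, \qquad
V \in \mathbb{R}_{\ge 0}^{F \times N}, \quad
W \in \mathbb{R}_{\ge 0}^{F \times K}, \quad
H \in \mathbb{R}_{\ge 0}^{K \times N}
```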

NMF: Basic Idea
Figure from [Mueller, FPM, Chapter 8, Springer 2015]

NMF: Basic Idea
From Cédric Févotte's slides

NMF for Music Audio
Figure from [Mueller, FPM, Chapter 8, Springer 2015]

NMF for Face Images

NMF: Algorithm
From Cédric Févotte's slides

NMF: Algorithm
Cost function: squared Euclidean distance between V and WH
Fix W, update H with an additive (gradient-descent) update:
- the learning rate is hard to set
- nonnegativity is hard to ensure

NMF: Algorithm
Cost function: squared Euclidean distance
Fix W, update H: multiplicative update
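One way to see how the additive update turns into a multiplicative one (the standard Lee and Seung argument; the notation V ≈ WH, with ⊙ and ⊘ for elementwise product and division, is ours): take a gradient step on H with a per-entry step size chosen so that the negative term cancels exactly.

```latex
\begin{aligned}
\nabla_H \tfrac{1}{2}\lVert V - WH \rVert_F^2 &= W^\top W H - W^\top V \\
H &\leftarrow H - \eta \odot \left(W^\top W H - W^\top V\right),
  \qquad \eta = H \oslash \left(W^\top W H\right) \\
\Rightarrow \quad H &\leftarrow H \odot \left(W^\top V\right) \oslash \left(W^\top W H\right) \\
W &\leftarrow W \odot \left(V H^\top\right) \oslash \left(W H H^\top\right)
\end{aligned}
```

Because every factor on the right-hand side is nonnegative, the updates keep W and H nonnegative without any learning rate to tune.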

NMF: Algorithm
Fix W, update H with the multiplicative update:
- easily preserves nonnegativity
- easy to implement
- fast (complexity O(FKN) per iteration)
- zeros remain zeros!
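A minimal NumPy sketch of the Euclidean multiplicative updates, assuming V is an F x N nonnegative matrix (the function name `nmf_euclidean` and the small `eps` that guards against division by zero are our own choices). It also makes the "zeros remain zeros" property easy to check: an entry that starts at 0 is only ever multiplied by a finite ratio.

```python
import numpy as np


def nmf_euclidean(V, K, n_iter=200, W=None, H=None, seed=0, eps=1e-12):
    """Factorize nonnegative V (F x N) as W (F x K) @ H (K x N) with the
    multiplicative updates for the squared Euclidean cost."""
    rng = np.random.default_rng(seed)
    F, N = V.shape
    # Random nonnegative initialization unless the caller supplies W or H.
    W = rng.random((F, K)) if W is None else W.astype(float).copy()
    H = rng.random((K, N)) if H is None else H.astype(float).copy()
    errors = []
    for _ in range(n_iter):
        # Fix W, update H: multiply elementwise by a nonnegative ratio.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        # Fix H, update W in the same way.
        W *= (V @ H.T) / (W @ H @ H.T + eps)
        errors.append(np.linalg.norm(V - W @ H) ** 2)
    return W, H, errors
```

For example, initializing H with zeros in some entries and calling `nmf_euclidean(V, 3, H=H0)` leaves those entries at exactly 0, which is what the harmonic and score-informed initializations in the following slides rely on.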

NMF: Algorithm Figure from [Mueller, FPM, Chapter 8, Springer 2015]

NMF for Music Audio Decomposition Figure from [Mueller, FPM, Chapter 8, Springer 2015]

NMF: Random Initialization
(panels: initial W, initial H, learned W, learned H)
Figure from [Mueller, FPM, Chapter 8, Springer 2015]

NMF: Harmonic Template Initialization
Zeros remain zeros!
Figure from [Mueller, FPM, Chapter 8, Springer 2015]
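A sketch of harmonic template initialization, assuming a magnitude STFT with `n_fft = 2048` at 22050 Hz (both hypothetical defaults). The helper `harmonic_templates` is hypothetical: each column of W is nonzero only in a few bins around the harmonics of one fundamental frequency, with an arbitrary 1/h decay. The small update loop shows that the zero entries survive the multiplicative updates.

```python
import numpy as np


def harmonic_templates(f0s_hz, sr=22050, n_fft=2048, n_harmonics=10, half_width=1):
    """Hypothetical helper: one template (column) per fundamental frequency.
    Each column is nonzero only in a few bins around the harmonics h * f0."""
    n_bins = n_fft // 2 + 1
    W = np.zeros((n_bins, len(f0s_hz)))
    for k, f0 in enumerate(f0s_hz):
        for h in range(1, n_harmonics + 1):
            center = int(round(h * f0 * n_fft / sr))  # bin of the h-th harmonic
            if center >= n_bins:
                break
            lo, hi = max(center - half_width, 0), min(center + half_width + 1, n_bins)
            W[lo:hi, k] = 1.0 / h  # simple 1/h decay, an arbitrary choice
    return W


def update_W_and_H(V, W, H, n_iter=50, eps=1e-12):
    """Euclidean multiplicative updates; entries that start at zero stay zero."""
    W, H = W.astype(float).copy(), H.astype(float).copy()
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

The template values themselves are free to adapt to the recording; only the sparsity pattern (which bins may carry energy for each pitch) is imposed.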

NMF: Score-Informed Initialization
Zeros remain zeros! (in both W and H)
Figure from [Mueller, FPM, Chapter 8, Springer 2015]
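A sketch of score-informed activation initialization. The note format `(pitch_index, onset_frame, offset_frame)` with an exclusive offset, and the `tolerance` margin for alignment errors, are assumptions for illustration; the helper name is hypothetical. Everything outside the widened note regions starts at zero and therefore stays zero under multiplicative updates.

```python
import numpy as np


def score_informed_activations(notes, n_pitches, n_frames, tolerance=2):
    """Hypothetical helper: initialize H (n_pitches x n_frames) from a score.

    notes: iterable of (pitch_index, onset_frame, offset_frame) tuples,
    assumed already aligned to the audio frames (offset is exclusive).
    Each note region is widened by `tolerance` frames on both sides.
    """
    H = np.zeros((n_pitches, n_frames))
    for pitch, onset, offset in notes:
        start = max(onset - tolerance, 0)
        stop = min(offset + tolerance, n_frames)
        H[pitch, start:stop] = 1.0  # allowed region: nonzero initial activation
    return H
```

The result can be passed as the initial H to any multiplicative-update NMF, together with harmonic templates for W.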

Dealing with Transients
In acoustics and audio, a transient is a high-amplitude, short-duration sound at the beginning of a waveform that occurs in phenomena such as musical sounds (e.g., at note onsets).

NMF: Score-Informed Initialization + Onset Figure from [Mueller, FPM, Chapter 8, Springer 2015]

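Unsupervised vs. Supervised NMF
Unsupervised: decompose the mixture matrix itself
Supervised: use pre-trained templates
Training phase (one dictionary per source, learned from clean examples; D is the chosen NMF cost):
$$\min_{W_a, H_a \ge 0} D(V_a \mid W_a H_a), \qquad \min_{W_b, H_b \ge 0} D(V_b \mid W_b H_b)$$
Testing phase (dictionaries fixed, only the activations are estimated from the mixture):
$$\min_{H \ge 0} D\big(V_{\text{mix}} \mid [W_a \; W_b]\, H\big)$$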
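A sketch of the supervised training and testing phases, assuming the Euclidean cost and two sources a and b (the function names and the dictionary sizes `K_a`, `K_b` are hypothetical). In the testing phase the stacked dictionary stays fixed and only H is fitted; each source estimate then uses its own templates and the matching rows of H.

```python
import numpy as np


def mu_nmf(V, K, n_iter=200, W=None, update_W=True, seed=0, eps=1e-12):
    """Euclidean multiplicative-update NMF. If update_W is False, W is kept
    fixed and only the activations H are estimated (K is then taken from W)."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], K)) if W is None else W.astype(float).copy()
    H = rng.random((W.shape[1], V.shape[1]))
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        if update_W:
            W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def supervised_separation(V_train_a, V_train_b, V_mix, K_a=4, K_b=4):
    # Training phase: learn one dictionary per source from clean examples.
    W_a, _ = mu_nmf(V_train_a, K_a)
    W_b, _ = mu_nmf(V_train_b, K_b, seed=1)
    # Testing phase: stack the dictionaries, keep them fixed, fit H on the mixture.
    W = np.hstack([W_a, W_b])
    _, H = mu_nmf(V_mix, K_a + K_b, W=W, update_W=False)
    # Each source's magnitude estimate uses only its own templates and activations.
    A = W_a @ H[:K_a]
    B = W_b @ H[K_a:]
    return A, B, W
```

A and B are magnitude estimates; turning them into audio still requires the reconstruction step (phase or masking) described in the following slides.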

NMF: Implementation
Matlab, Python
Or use an existing implementation:
- http://bmcfee.github.io/librosa/generated/librosa.decompose.decompose.html#librosa.decompose.decompose
- http://scikit-learn.org/stable/modules/generated/sklearn.decomposition.nmf.html#sklearn.decomposition.nmf
- https://www.csie.ntu.edu.tw/~cjlin/nmf/
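A short usage sketch of scikit-learn's `NMF` estimator on a stand-in matrix (random low-rank data in place of a real magnitude spectrogram; sizes are arbitrary). scikit-learn treats rows as samples, so passing V (F x N) directly gives W from `fit_transform` and H from `components_`.

```python
import numpy as np
from sklearn.decomposition import NMF

# Stand-in for a magnitude spectrogram: F = 64 frequency bins, N = 100 frames.
rng = np.random.default_rng(0)
V = rng.random((64, 4)) @ rng.random((4, 100))

# K = 4 templates; the default cost is the squared Frobenius (Euclidean) distance.
model = NMF(n_components=4, init="random", random_state=0, max_iter=1000)
W = model.fit_transform(V)  # F x K templates
H = model.components_       # K x N activations
```

To use the KL divergence instead, scikit-learn accepts `beta_loss="kullback-leibler"` together with `solver="mu"` (the multiplicative-update solver).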

Toolboxes for NMF-based Separation
- Flexible Audio Source Separation Toolbox (FASST), http://bass-db.gforge.inria.fr/fasst/: implemented in C++, Matlab, and Python; more sophisticated
- OpenBliSSART, http://openblissart.github.io/openblissart/: implemented in C++, can be run on GPUs

Parameters
- Window size, hop size
- Number of templates
- Normalization of the templates
- Cost function of NMF
- Reconstruction method

Reconstruction
NMF operates on the magnitude spectrogram, so we need to recover the time-domain signals from the separated magnitudes.

Reconstruction
1. Given a mixture y, compute its STFT Y.
2. Decompose the magnitude |Y| into two matrices A and B (the magnitude estimates of the two sources, which are also real-valued).
3. Make A (or B) complex by attaching the phase of Y.
4. Apply the inverse STFT (ISTFT).
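Step 3 in NumPy, as a minimal sketch (the helper name is hypothetical): a magnitude estimate A takes the phase of the complex mixture STFT Y, which is the same as A·cos(∠Y) + i·A·sin(∠Y). An STFT/ISTFT pair such as `scipy.signal.stft` and `scipy.signal.istft` could supply steps 1 and 4.

```python
import numpy as np


def attach_mixture_phase(magnitude, Y):
    """Give a real, nonnegative magnitude estimate (e.g. A) the phase of the
    complex mixture STFT Y so that it can be passed to an inverse STFT."""
    return magnitude * np.exp(1j * np.angle(Y))
```

If A + B equals |Y| exactly, the two complex estimates add back up to Y, so nothing is lost in the resynthesis.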

Reconstruction in Matlab
Tools: https://www.ee.columbia.edu/~dpwe/resources/matlab/sgram/ (myspecgram, ispecgram), plus the built-ins abs and angle
Split the mixture STFT into magnitude and phase: magY = abs(Y); phaseY = angle(Y);
They recombine as Y = magY.*cos(phaseY) + 1i*magY.*sin(phaseY); substituting A for magY gives the complex estimate of source A.

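Reconstruction: Wiener Filter (Binary)
Given the mixture STFT Y and the magnitude estimates A and B, define
$$M_A(f,n) = \begin{cases} 1 & \text{if } A(f,n) > B(f,n) \\ 0 & \text{otherwise} \end{cases}$$
Use M_A ⊙ Y (elementwise product) instead of A in the ISTFT; M_A is referred to as a binary mask.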

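Reconstruction: Wiener Filter (Soft)
Given the mixture STFT Y and the magnitude estimates A and B, define
$$M_A(f,n) = \frac{A(f,n)^c}{A(f,n)^c + B(f,n)^c}, \qquad c = 1 \text{ or } 2$$
Use M_A ⊙ Y (elementwise product) instead of A in the ISTFT; M_A is referred to as a soft mask.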
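Both masks in NumPy, as a sketch (function names are hypothetical). With c = 2 and magnitude estimates, the soft mask is the familiar power-ratio (Wiener-style) gain. Using 1 − M_A for the second source makes the two masked estimates add back up to the mixture.

```python
import numpy as np


def binary_mask(A, B):
    """1 where source A dominates the time-frequency bin, 0 elsewhere."""
    return (A > B).astype(float)


def soft_mask(A, B, c=2, eps=1e-12):
    """Ratio mask A^c / (A^c + B^c); eps avoids 0/0 in silent bins."""
    Ac, Bc = A ** c, B ** c
    return Ac / (Ac + Bc + eps)


def masked_estimates(Y, A, B, mask_fn=soft_mask, **kwargs):
    """Apply complementary masks to the complex mixture STFT Y.
    The two estimates sum to Y because the masks sum to 1."""
    M_A = mask_fn(A, B, **kwargs)
    return M_A * Y, (1.0 - M_A) * Y
```

Usage: `masked_estimates(Y, A, B, mask_fn=binary_mask)` gives the binary-mask version; the outputs go straight into the ISTFT.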

Evaluation
Source-to-distortion ratio (SDR), source-to-interference ratio (SIR), source-to-artifact ratio (SAR)
True sources: a, b; estimated sources: ae, be
- SDR(a): overall quality, i.e., how similar ae is to a
- SIR(a): how little of b leaks into ae (higher means less interference)
- SAR(a): how little of ae is similar to neither a nor b (higher means fewer artifacts)
SDR(b), SIR(b), and SAR(b) are computed in the same way.
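For reference, BSS_Eval decomposes an estimate such as ae into a part explained by the true source a (s_target), interference from the other source b (e_interf), and the remaining artifacts (e_artif), and then defines the three ratios in dB as below (the separate noise term of the full definition is omitted here). Higher is better for all three.

```latex
\begin{aligned}
a_e &= s_{\text{target}} + e_{\text{interf}} + e_{\text{artif}} \\
\mathrm{SDR} &= 10 \log_{10} \frac{\lVert s_{\text{target}} \rVert^2}{\lVert e_{\text{interf}} + e_{\text{artif}} \rVert^2} \\
\mathrm{SIR} &= 10 \log_{10} \frac{\lVert s_{\text{target}} \rVert^2}{\lVert e_{\text{interf}} \rVert^2} \\
\mathrm{SAR} &= 10 \log_{10} \frac{\lVert s_{\text{target}} + e_{\text{interf}} \rVert^2}{\lVert e_{\text{artif}} \rVert^2}
\end{aligned}
```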

Evaluation: BSS_Eval (Matlab)
http://bass-db.gforge.inria.fr/bss_eval/bss_eval_sources.m

Evaluation: mir_eval (Python)
http://labrosa.ee.columbia.edu/mir_eval/
http://craffel.github.io/mir_eval/#module-mir_eval.separation
mir_eval can be used for most MIR tasks (chord recognition, onset detection, segmentation, etc.).

Evaluation: Matching Lengths
ae can be slightly shorter than a because of the windowing in the STFT/ISTFT, so chop off the end of a until a and ae have the same length.
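A simplified NumPy sketch of SDR/SIR/SAR for the estimate of source a, including the length trimming described above. Assumption: the only distortion allowed is a scalar gain (plain projections onto the reference signals), whereas BSS_Eval and mir_eval allow time-invariant distortion filters, so their numbers will differ; for reportable results use `mir_eval.separation.bss_eval_sources(reference_sources, estimated_sources)`. The function name `simple_bss_eval` is hypothetical.

```python
import numpy as np


def simple_bss_eval(ref_a, ref_b, est_a, eps=1e-12):
    """Simplified SDR, SIR, SAR (in dB) for the estimate of source a."""
    # The estimate may be shorter because of windowing: trim all to a common length.
    n = min(len(ref_a), len(ref_b), len(est_a))
    a, b, e = (np.asarray(x[:n], dtype=float) for x in (ref_a, ref_b, est_a))

    # s_target: projection of the estimate onto the true source a.
    s_target = (e @ a) / (a @ a) * a
    # Projection onto span{a, b}; what it adds beyond s_target is interference.
    S = np.stack([a, b], axis=1)  # n x 2
    coeffs, *_ = np.linalg.lstsq(S, e, rcond=None)
    p_all = S @ coeffs
    e_interf = p_all - s_target
    # Whatever neither a nor b can explain counts as an artifact.
    e_artif = e - p_all

    def db(num, den):
        return 10 * np.log10((num @ num + eps) / (den @ den + eps))

    sdr = db(s_target, e_interf + e_artif)
    sir = db(s_target, e_interf)
    sar = db(s_target + e_interf, e_artif)
    return sdr, sir, sar
```

For instance, if a and b are orthogonal with equal energy and the estimate is a + 0.1·b, both SIR and SDR come out at 20 dB, and SAR is very large because there are no artifacts.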

Extension: Different Cost Functions*
β-divergence
Alternating direction method of multipliers for non-negative matrix factorization with the beta-divergence, ICASSP 2014
Nonnegative matrix factorization with the Itakura-Saito divergence: with application to music analysis, Neural Computation 2009
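A sketch of the β-divergence family as a plain NumPy function (the name and the `eps` guard are our own choices). It covers the cases on these slides: β = 2 is half the squared Euclidean distance, β = 1 the generalized KL divergence, and β = 0 the Itakura-Saito divergence, which is scale-invariant: d(λx | λy) = d(x | y).

```python
import numpy as np


def beta_divergence(X, Y, beta, eps=1e-12):
    """Sum of elementwise beta-divergences d_beta(x | y) between nonnegative arrays.

    beta = 2: half squared Euclidean distance
    beta = 1: generalized Kullback-Leibler divergence
    beta = 0: Itakura-Saito divergence
    """
    X = np.asarray(X, dtype=float) + eps  # eps avoids log(0) and division by zero
    Y = np.asarray(Y, dtype=float) + eps
    if beta == 1:
        d = X * np.log(X / Y) - X + Y
    elif beta == 0:
        d = X / Y - np.log(X / Y) - 1
    else:
        d = (X ** beta + (beta - 1) * Y ** beta - beta * X * Y ** (beta - 1)) / (beta * (beta - 1))
    return d.sum()
```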

Extension: Different Cost Functions*
Euclidean distance, KL divergence
Algorithms for non-negative matrix factorization, NIPS 2000
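The KL counterpart of the multiplicative updates, as given in the NIPS 2000 paper cited above, sketched in NumPy (function name, random initialization, and `eps` are our own choices): H ← H ⊙ Wᵀ(V ⊘ WH) ⊘ Wᵀ1 and W ← W ⊙ (V ⊘ WH)Hᵀ ⊘ 1Hᵀ.

```python
import numpy as np


def nmf_kl(V, K, n_iter=200, seed=0, eps=1e-12):
    """NMF under the generalized KL divergence with multiplicative updates."""
    rng = np.random.default_rng(seed)
    F, N = V.shape
    W = rng.random((F, K)) + eps
    H = rng.random((K, N)) + eps
    ones = np.ones_like(V)
    costs = []
    for _ in range(n_iter):
        # H <- H * (W^T (V / WH)) / (W^T 1)
        H *= (W.T @ (V / (W @ H + eps))) / (W.T @ ones + eps)
        # W <- W * ((V / WH) H^T) / (1 H^T)
        W *= ((V / (W @ H + eps)) @ H.T) / (ones @ H.T + eps)
        WH = W @ H + eps
        costs.append(np.sum(V * np.log((V + eps) / WH) - V + WH))
    return W, H, costs
```

Compared with the Euclidean version, the only change is that the ratio now weights the data by V ⊘ WH, which makes the fit less dominated by the loudest time-frequency bins.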

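Extension: Temporal Continuity & Sparsity
Temporal continuity: penalize the squared difference between the activations of adjacent frames
Sparsity: penalize the activations, usually implemented with the L1 norm
Monaural sound source separation by nonnegative matrix factorization with temporal continuity and sparseness criteria, TASLP 2007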
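A sketch of such a regularized objective, with hypothetical weights `alpha` and `lam`. The cited TASLP 2007 paper uses normalized versions of the continuity and sparseness terms; this unnormalized form only illustrates what each term measures.

```python
import numpy as np


def regularized_cost(V, W, H, alpha=0.1, lam=0.1):
    """Hypothetical objective: reconstruction + temporal continuity + sparsity."""
    reconstruction = np.sum((V - W @ H) ** 2)       # ||V - WH||^2
    continuity = np.sum(np.diff(H, axis=1) ** 2)    # sum of (h[k, t] - h[k, t-1])^2
    sparsity = np.sum(np.abs(H))                    # L1 norm of the activations
    total = reconstruction + alpha * continuity + lam * sparsity
    return total, (reconstruction, continuity, sparsity)
```

Activations that change smoothly from frame to frame keep the continuity term small, and activations that are zero most of the time keep the L1 term small.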

Extension: More Regularizers
http://scikit-learn.org/stable/modules/generated/sklearn.decomposition.nmf.html#sklearn.decomposition.nmf

Extension: Template Adaptation
Pre-train the templates offline, but update them online according to the target signal
Drum transcription using partially fixed non-negative matrix factorization with template adaptation, ISMIR 2015

Extension: Adding a Noise Dictionary
Add a noise dictionary to account for possible noise in the signal:
W = [W_p W_v W_g W_d W_n] (piano, violin, guitar, drum, noise)
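A sketch of the noise-dictionary idea under the Euclidean cost: the pre-trained instrument templates `W_fixed` stay fixed, and `K_noise` extra templates (a hypothetical parameter) are learned on the test signal to absorb whatever the instrument dictionaries cannot explain. Updating only some columns of W in this way is also the basic mechanism behind partially fixed NMF, though this is not the cited paper's exact algorithm.

```python
import numpy as np


def nmf_with_noise_dictionary(V, W_fixed, K_noise=2, n_iter=200, seed=0, eps=1e-12):
    """Keep pre-trained templates W_fixed (F x K) fixed and learn K_noise extra
    noise templates on V itself. All of H is updated; only noise columns of W."""
    rng = np.random.default_rng(seed)
    F, N = V.shape
    K_fixed = W_fixed.shape[1]
    W = np.hstack([W_fixed.astype(float), rng.random((F, K_noise))])
    H = rng.random((K_fixed + K_noise, N))
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        # Multiplicative update restricted to the noise columns of W.
        H_n = H[K_fixed:]
        W[:, K_fixed:] *= (V @ H_n.T) / (W @ H @ H_n.T + eps)
    return W, H
```

After fitting, `W[:, :K_fixed] @ H[:K_fixed]` is the part explained by the instruments and `W[:, K_fixed:] @ H[K_fixed:]` the part assigned to noise.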

Extension: Discriminative NMF
Instead of training the dictionaries (templates) for different instruments separately, train them jointly to reduce cross-talk
Discriminative NMF and its application to single-channel source separation, ICASSP 2014

Extension: User-guided Separation
User input
Interactive refinement of supervised and semi-supervised sound source separation estimates, ICASSP 2013

Extension: Complex NMF and Friends
Explicitly take phase into account, or work directly in the time domain
Complex NMF: A new sparse representation for acoustic signals, ICASSP 2009
Beyond NMF: time-domain audio source separation without phase reconstruction, ISMIR 2013
Informed monaural source separation of music based on convolutional sparse coding, ICASSP 2015
Multi-resolution signal decomposition with time-domain spectrogram factorization, ICASSP 2015
A score-informed shift-invariant extension of complex matrix factorization for improving the separation of overlapped partials in music recordings, ICASSP 2016

Extension: Time-domain Separation
Informed monaural source separation of music based on convolutional sparse coding, ICASSP 2015

Extension: Tensor Decomposition

Extension: Dictionaries for Pitch Estimation
Decompose the input as a linear combination of individual components:
- templates of instruments => source separation
- templates of notes => multi-pitch estimation
- templates of chords => chord recognition
Discriminative non-negative matrix factorization for multiple pitch estimation, ISMIR 2012

Extension: Voice Conversion

Extension: Audio Mosaicing
Given a target and a source recording, the goal of audio mosaicing is to generate a mosaic recording that conveys musical aspects (like melody and rhythm) of the target, using sound components taken from the source.
https://www.audiolabs-erlangen.de/resources/mir/2015-ISMIR-LetItBee/
Let it Bee - Towards NMF-Inspired Audio Mosaicing, ISMIR 2015

Extension: Dictionaries for Classification
Codebook
Music annotation and retrieval using unlabeled exemplars: correlation and sparse codes, SPL 2015
A systematic evaluation of the bag-of-frames representation for music information retrieval, TMM 2014