advanced spectral processing

advanced spectral processing
Jordi Janer, Music Technology Group, Universitat Pompeu Fabra, Barcelona
jordi.janer@upf.edu
CDSIM UPF May 2014
http://mtg.upf.edu/

Outline
1. Introduction to spectral processing
2. Decomposing sound signals

1- Introduction to spectral processing

Simple periodic waves (sine waves)
Characterized by:
- period T
- amplitude A
- phase φ
- fundamental frequency in cycles per second (Hz): F0 = 1/T
Equation: y(t) = A sin(2π F0 t + φ), so y(0) = A sin(φ)
[Figure: sine wave, amplitude vs. time (s), with T, A and y(0) marked]
(Many slides come from materials by Dan Jurafsky)

Simple periodic waves
- Frequency: 5 cycles in 0.5 seconds = 10 cycles/second = 10 Hz
- Amplitude: 1
- Phase: at time 0 seconds, y(0) = A sin(2π·10·0 + φ) = A sin(φ) = 0, so φ = πk with k an integer; take φ = 0
- Equation: y(t) = A sin(20πt)
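The worked example above can be checked numerically. The following is a minimal sketch, assuming NumPy and an arbitrary 1000 Hz sample rate; `sine_wave` and `count_peaks` are illustrative helper names, not part of the original slides. It builds y(t) = A sin(20πt) over 0.5 s and recovers the 10 Hz frequency by counting peaks.

```python
import numpy as np

def sine_wave(freq_hz, amplitude=1.0, phase=0.0, duration_s=0.5, sample_rate=1000):
    """Return (t, y) with y(t) = A sin(2*pi*f*t + phase)."""
    t = np.arange(int(duration_s * sample_rate)) / sample_rate
    return t, amplitude * np.sin(2 * np.pi * freq_hz * t + phase)

def count_peaks(y):
    """Count strict local maxima (one per cycle for a sine wave)."""
    return int(np.sum((y[1:-1] > y[:-2]) & (y[1:-1] > y[2:])))

duration_s = 0.5
t, y = sine_wave(10.0, amplitude=1.0, phase=0.0, duration_s=duration_s)
n_cycles = count_peaks(y)             # cycles observed in the 0.5 s excerpt
freq_estimate = n_cycles / duration_s  # cycles per second (Hz)
```

With phase 0 the first sample is y(0) = 0, matching the slide's phase argument.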

(more) Basic facts about sound waves
- f = c/λ, where c is the speed of sound and λ is the wavelength in meters
- c ≈ 345 m/s (about 34,500 cm/s) at 21 degrees Celsius at sea level
- Example: with λ = 10 m, the frequency is f = 345/10 = 34.5 Hz

Speech sound waves
A little piece from the waveform of a vowel
- Y axis: amplitude = amount of air pressure at that time point. Positive is compression, zero is normal air pressure, negative is rarefaction (expansion).
- X axis: time

Fundamental frequency
- The fundamental frequency (or F0) is the lowest frequency of a periodic (voiced) waveform produced by a particular instrument (our vocal folds are like a complicated instrument).
- It is also called the first harmonic; its integer multiples are called the second, third, etc. harmonics.
[Figure: fundamental frequency (first harmonic) and the 2nd to 7th harmonics]

Fundamental frequency
- In speech, see for example the waveform of a vowel.
- The fundamental frequency can be computed as the number of repetitions per second of the wave: the vowel above has 10 repetitions in 0.03875 s, so the frequency is 10/0.03875 ≈ 258 Hz.
- This is the speed at which the vocal folds move, hence the voicing speed.
- Each peak corresponds to an opening of the vocal folds.
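Counting repetitions by eye can be automated. The sketch below is one common option (autocorrelation), not necessarily the method behind the slide. It assumes NumPy, a synthetic three-harmonic test tone at 258 Hz, and a hypothetical helper name `estimate_f0_autocorr`.

```python
import numpy as np

def estimate_f0_autocorr(x, sample_rate, fmin=50.0, fmax=500.0):
    """Estimate F0 as sample_rate / (lag of the strongest autocorrelation peak)."""
    x = x - np.mean(x)
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]  # keep non-negative lags
    lag_min = int(sample_rate / fmax)  # shortest period considered
    lag_max = int(sample_rate / fmin)  # longest period considered
    lag = lag_min + int(np.argmax(ac[lag_min:lag_max + 1]))
    return sample_rate / lag

# The slide's arithmetic: 10 repetitions in 0.03875 s
f0_from_count = 10 / 0.03875

# Synthetic "voiced" signal: fundamental plus two weaker harmonics (assumed test data)
sample_rate = 16000
t = np.arange(640) / sample_rate  # 40 ms excerpt
x = sum((1.0 / h) * np.sin(2 * np.pi * h * 258.0 * t) for h in (1, 2, 3))
f0_est = estimate_f0_autocorr(x, sample_rate)
```

The estimate is quantized to integer lags, so it is only accurate to a few Hz at this sample rate.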

Pitch
- Pitch is defined as the perceived fundamental frequency of a sound.
- F0 and pitch are different concepts: F0 corresponds to a physically measurable frequency; pitch corresponds to a perceivable frequency.
- The relationship between pitch and F0 is not linear. Human pitch perception is most accurate between 100 Hz and 1000 Hz.
  - Linear in this range: at F0_1 = 200 Hz, if Pitch_2 = Pitch_1/2 then F0_2 ≈ 100 Hz
  - Logarithmic above 1000 Hz: at F0_1 = 5 kHz, if Pitch_2 = Pitch_1/2 then F0_2 < 2 kHz
- Still, in the literature F0 and pitch are often treated as the same.
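The slide does not name a specific perceptual scale. As an illustration only, the sketch below assumes the widely used mel-scale formula, mel = 2595 log10(1 + f/700), and asks which frequency corresponds to half the perceived pitch. The function names are hypothetical.

```python
import math

def hz_to_mel(f_hz):
    """Mel-scale approximation (an assumed stand-in for 'pitch' on the slide)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def half_pitch_frequency(f_hz):
    """Frequency whose mel value is half that of f_hz."""
    return mel_to_hz(hz_to_mel(f_hz) / 2.0)

low = half_pitch_frequency(200.0)    # close to 100 Hz: nearly linear region
high = half_pitch_frequency(5000.0)  # well below 2500 Hz: compressive region
```

Under this assumed scale, halving the pitch of a 200 Hz tone lands near 100 Hz, while halving the pitch of a 5 kHz tone lands below 2 kHz, in line with the two cases on the slide.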

F0 tracking
F0 can be computed using several techniques, and using tools like Praat.

Frequency analysis
Waves have different frequencies.
[Figure: two sine waves, 100 Hz and 1000 Hz, amplitude vs. time (s)]

Frequency analysis
Complex waves: adding a 100 Hz and a 1000 Hz wave together.
[Figure: the summed waveform, amplitude vs. time (s)]

Spectrum
Frequency components (100 and 1000 Hz) on the x-axis.
[Figure: amplitude vs. frequency in Hz, with peaks at 100 and 1000 Hz]
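The 100 Hz + 1000 Hz example can be reproduced with an FFT. This is a minimal sketch assuming NumPy, an 8 kHz sample rate and a 1-second signal (so that FFT bins fall on integer frequencies); these values are illustrative choices, not taken from the slides.

```python
import numpy as np

sample_rate = 8000
t = np.arange(sample_rate) / sample_rate  # 1 second of samples

# Complex wave: sum of a 100 Hz and a 1000 Hz sine, each with amplitude 1
x = np.sin(2 * np.pi * 100 * t) + np.sin(2 * np.pi * 1000 * t)

# Magnitude spectrum, scaled so a unit-amplitude sine gives a peak of about 1
magnitude = np.abs(np.fft.rfft(x)) / (len(x) / 2)
freqs = np.fft.rfftfreq(len(x), d=1.0 / sample_rate)

# Frequencies whose amplitude is clearly non-zero
peak_freqs = freqs[magnitude > 0.5]
```

Only two bins stand out, which is the two-line spectrum sketched on the slide.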

Fourier transform analysis
- Fourier analysis: any wave can be represented as the (infinite) sum of sine waves of different frequencies (each with its own amplitude and phase).
- For continuous signals: [equation not preserved in the transcript]
- For discrete signals: [equation not preserved in the transcript]
- When N is finite (and relatively short), we call the result the short-term spectrum (STFT).
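The formulas for the continuous and discrete cases are missing from this transcript. The block below gives the standard textbook definitions as an assumption; sign and normalization conventions vary, and the slide may have used a different form. The window w[n] and hop size h in the last line are illustrative symbols.

```latex
% Continuous-time Fourier transform
X(f) = \int_{-\infty}^{\infty} x(t)\, e^{-j 2\pi f t}\, dt

% Discrete Fourier transform of N samples
X[k] = \sum_{n=0}^{N-1} x[n]\, e^{-j 2\pi k n / N}, \qquad k = 0, \dots, N-1

% Short-time version: frame m, analysis window w[n] of length N, hop size h
X_m[k] = \sum_{n=0}^{N-1} w[n]\, x[n + m h]\, e^{-j 2\pi k n / N}
```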

Spectrum example
[Figure: magnitude (in dB) vs. frequency (Hz), 0 to 5000 Hz]
- Spectrum of one instant in an actual sound wave: many components across the frequency range
- Each frequency component of the wave is separated

Formants
- Formants are defined as the spectral peaks of the sound spectrum envelope.
- Formants are independent of F0, as they are defined over the envelope of the spectrum.
- They are created by the passage of the sound through the vocal tract.

Seeing formants: the spectrogram

Example
What about helium voice? http://www.phys.unsw.edu.au/jw/speechmodel.html

1. Introduction to acoustic signals
2. Spectral analysis
3. Applications of spectral processing

Spectrogram

Spectrogram
- Time-frequency representation
- Short-time windowing
- Fast Fourier Transform (FFT)
Available tools:
- Sonic Visualiser (for music analysis)
- Praat (for speech analysis)
Other resources:
- Live spectrogram: http://labrosa.ee.columbia.edu/expo/

Window size
Understanding time-frequency resolution
- Long windows: good frequency resolution
- Short windows: good temporal resolution
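The window-size trade-off can be seen on two nearby tones. The sketch below assumes NumPy, an 8 kHz sample rate, tones at 1000 Hz and 1020 Hz, and a Hann window; `count_spectral_peaks` is a hypothetical helper. Frequency resolution is roughly sample_rate / window length, so a 128-sample window (62.5 Hz per bin) cannot separate tones 20 Hz apart, while a 2048-sample window (about 3.9 Hz per bin) can.

```python
import numpy as np

def count_spectral_peaks(x, n_window, rel_threshold=0.5):
    """Count local maxima above rel_threshold * max in one Hann-windowed frame."""
    frame = x[:n_window] * np.hanning(n_window)
    mag = np.abs(np.fft.rfft(frame))
    inner = mag[1:-1]
    is_peak = (inner > mag[:-2]) & (inner > mag[2:]) & (inner > rel_threshold * mag.max())
    return int(np.sum(is_peak))

sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
two_near_tones = np.sin(2 * np.pi * 1000 * t) + np.sin(2 * np.pi * 1020 * t)

peaks_short = count_spectral_peaks(two_near_tones, 128)   # 16 ms window
peaks_long = count_spectral_peaks(two_near_tones, 2048)   # 256 ms window
```

The long window resolves both tones but averages over a quarter of a second, so it would smear a short event such as a noise burst.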

Observing test signals
- Two near tones
- Noise burst
- Chirp
- Pure tones
- Harmonic richness (square/saw)
- Low tone
Sonic Visualiser: http://mtg.upf.edu/~jjaner/teaching/cdsim2014/test_various_signals.wav

Applications of spectral processing
- technologies for the analysis of sound and music
- technologies for the transformation of sound and music
- technologies for the synthesis of sound and music

Analysis
Skore: automatic singing voice rating

Transforming signals
Approaches for spectral transformations:
- SMS: http://mtg.upf.edu/sms
- Phase vocoder
Basic transformations:
- Pitch transposition
- Harmonic/noise decomposition
- Time-stretching
(Matlab, internal MTG software)

Transforming signals
Basic transformations (audio demos): original, pitch transposition, harmonic/noise decomposition, time-stretching (50x)

Transformation
Time scaling:
- Detection of transients
- Repetition/removal of spectral frames
Demo: Fast Remixing (original, fast time-varying remix)
- Swing detection
- Tempo detection at 8th-note level
- Change swing factor
Demo: video
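Repetition and removal of spectral frames can be sketched as an index mapping. This is a simplified illustration, assuming NumPy and a spectrogram stored as one row per frame; `time_scale_frames` is a hypothetical name. It deliberately ignores phase continuity and the transient handling mentioned on the slide, both of which a real time-scaling system needs.

```python
import numpy as np

def time_scale_frames(frames, stretch):
    """Repeat (stretch > 1) or drop (stretch < 1) spectral frames.

    frames: array of shape (n_frames, n_bins).
    Returns the time-scaled frames and the source index used for each output frame.
    """
    n_in = frames.shape[0]
    n_out = int(round(n_in * stretch))
    # Output frame j is taken from input frame floor(j / stretch)
    src = np.minimum((np.arange(n_out) / stretch).astype(int), n_in - 1)
    return frames[src], src

frames = np.arange(4, dtype=float).reshape(4, 1)    # four dummy one-bin frames
slower, src_slower = time_scale_frames(frames, 2.0)  # each frame repeated twice
faster, src_faster = time_scale_frames(frames, 0.5)  # every other frame removed
```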

Synthesis
- Sample-based (violin): gesture modelling to provide a more realistic synthesis
- Voice-driven synthesis: voice analysis is used to control the synthesis of a violin sound

2- Decomposing sound signals
Signal decomposition and source separation

Source separation

The objective
- Music is distributed as mixdowns in various formats.
- Users aim to further manipulate music signals in multiple application contexts (karaoke, soloing, remixing, etc.).
* from multitrack originals

The problem
Music signals are complex:
- Variety of music styles and instrumentations
- Modern production techniques go beyond a linear combination of recorded acoustic sources (FX, digital synths, etc.)

Background I
Existing generic source separation (SS) approaches: spectral subtraction
- Intuitive
- Well studied (industrial interest)
- Good for speech/stationary noise reduction
- Less appropriate for music signals
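A minimal sketch of magnitude spectral subtraction, assuming NumPy, a magnitude spectrogram with one row per frame, and a noise estimate taken from frames known to contain only noise. The function name and the toy numbers are illustrative.

```python
import numpy as np

def spectral_subtraction(mag, noise_mag, floor=0.0):
    """Subtract an average noise magnitude from every frame, clipping at `floor`.

    mag: (n_frames, n_bins) magnitude spectrogram of the noisy signal.
    noise_mag: (n_bins,) average noise magnitude per frequency bin.
    """
    return np.maximum(mag - noise_mag[np.newaxis, :], floor)

# Toy spectrogram: two noise-only frames, then one frame with signal in bin 0
mag = np.array([[1.0, 2.0, 3.0],
                [1.0, 2.0, 3.0],
                [5.0, 2.0, 3.0]])
noise_estimate = mag[:2].mean(axis=0)  # stationary noise assumed
cleaned = spectral_subtraction(mag, noise_estimate)
```

The method relies on the noise being stationary, which is why it suits speech denoising better than music, where the "interference" is itself a changing musical signal.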

Background II
Existing music-specific approaches I: pan-frequency masks
- Assumes non-overlapping signals in time-frequency bins
- Stereo signals are required
- Amplitude ratio between L and R FFT bins
- 2D user interface
Examples:
- Good for simple excerpts
- Bad for complex mixes*
* Loses brightness, vocals less reduced due to reverb, flute is also removed, ...
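A pan mask can be sketched from the left/right amplitude ratio of each bin. This is an illustrative sketch assuming NumPy; the pan index definition (0 = hard left, 1 = hard right), the `width` tolerance and the function name are assumptions, not the formulation used in the slides.

```python
import numpy as np

def pan_mask(left, right, target_pan, width=0.1, eps=1e-12):
    """Select time-frequency bins whose pan position is near target_pan.

    left, right: complex STFT arrays of equal shape.
    Pan index per bin: |R| / (|L| + |R|), from 0 (hard left) to 1 (hard right).
    """
    l_mag, r_mag = np.abs(left), np.abs(right)
    pan = r_mag / (l_mag + r_mag + eps)
    return np.abs(pan - target_pan) <= width

# Three bins: one panned hard left, one centered, one panned hard right
left = np.array([1.0 + 0j, 0.5 + 0j, 0.0 + 0j])
right = np.array([0.0 + 0j, 0.5 + 0j, 1.0 + 0j])
center_mask = pan_mask(left, right, target_pan=0.5)
center_left = left * center_mask    # keep only center-panned content
center_right = right * center_mask
```

Because each bin is assigned wholly to one pan position, sources that overlap in the same bin (or are spread by reverb) leak through, consistent with the limitations listed above.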

Background III
Existing music-specific approaches II: Non-negative Matrix Factorization (NMF)
- Magnitude spectrogram V (non-negative)
- Decomposition as the matrix product of W (spectral bases) and H (gain activations over time): V ≈ WH
- Each spectrum frame is explained as a linear combination of R bases.
- Minimization problem that finds W and H: min over W, H of D(V, WH)
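The slide leaves the distance D unspecified. As one possible instance, the sketch below assumes the Euclidean distance and the classic multiplicative update rules, using NumPy; `nmf` and its parameters are illustrative names.

```python
import numpy as np

def nmf(V, rank, n_iter=200, seed=0, eps=1e-9):
    """Factor a non-negative matrix V (n_bins x n_frames) as W @ H.

    Assumes D is the Euclidean distance; multiplicative updates keep W and H
    non-negative because they only multiply by non-negative ratios.
    """
    rng = np.random.default_rng(seed)
    n_bins, n_frames = V.shape
    W = rng.random((n_bins, rank)) + eps    # spectral bases
    H = rng.random((rank, n_frames)) + eps  # activations over time
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy "spectrogram" built from R = 2 bases, then factored again
rng = np.random.default_rng(1)
V = rng.random((20, 2)) @ rng.random((2, 30))
W, H = nmf(V, rank=2)
relative_error = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
```

Other choices of D (for example Kullback-Leibler or Itakura-Saito divergences) lead to different update rules.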

NMF details
Non-negative Matrix Factorization I
[Figure: 3 spectral bases (W), 1 overlapping note; H: activation gains]

NMF details
Non-negative Matrix Factorization I
[Figure: 3 spectral bases (W), 2 overlapping notes; H: activation gains]

NMF challenges
- Predominant instrument separation (pitch/timbre analysis)
- Completeness of instrument removal (attack/sustain, residual/breathing noise, unvoiced consonants, ...)
- Percussive instrument separation (transient detection, wideband spectrum)
- Polyphonic instrument separation (blind and score-informed)

Vocals/Background separation
Music print decomposition: for a song containing a region without the target (e.g. vocals), a basis model is learnt from the user-selected music print.
[Figure: music print (without vocals); region with vocals]

Vocals/Background separation
Music print decomposition. Demos: original, mute
- Background excerpt → basis decomposition W H → W_bgd
- Input → basis decomposition [W_bgd, W_other] [H_bgd, H_other] → Wiener filtering (W_bgd, H_bgd)/(W M) → output mute

Vocals/Background separation
Music print decomposition. Demos: original, solo
- Background excerpt → basis decomposition W H → W_bgd
- Input → basis decomposition [W_bgd, W_other] [H_bgd, H_other] → Wiener filtering (W_other, H_other)/(W M) → output solo
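The mute/solo pipeline can be sketched end to end on magnitude spectrograms. The exact Wiener-filter expression on the slide is not recoverable from the transcript, so this sketch assumes the common ratio mask V_bgd / (V_bgd + V_other), Euclidean NMF with multiplicative updates, and NumPy. All function names are hypothetical.

```python
import numpy as np

def nmf_fit(V, W_fixed=None, n_free=2, n_iter=200, seed=0, eps=1e-9):
    """NMF of V as W @ H, optionally keeping the first columns of W fixed."""
    rng = np.random.default_rng(seed)
    n_bins, n_frames = V.shape
    n_fixed = 0 if W_fixed is None else W_fixed.shape[1]
    W_free = rng.random((n_bins, n_free)) + eps
    W = W_free if W_fixed is None else np.hstack([W_fixed, W_free])
    H = rng.random((W.shape[1], n_frames)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W_new = W * (V @ H.T) / (W @ H @ H.T + eps)
        W[:, n_fixed:] = W_new[:, n_fixed:]  # only the free bases are updated
    return W, H

def separate(V_mix, W_bgd, n_other=2, eps=1e-9):
    """Return (mute, solo): background and remaining magnitude estimates."""
    W, H = nmf_fit(V_mix, W_fixed=W_bgd, n_free=n_other)
    k = W_bgd.shape[1]
    V_bgd = W[:, :k] @ H[:k]     # part explained by the music-print bases
    V_other = W[:, k:] @ H[k:]   # part explained by the free bases
    mask_bgd = V_bgd / (V_bgd + V_other + eps)  # assumed Wiener-style soft mask
    return mask_bgd * V_mix, (1.0 - mask_bgd) * V_mix

# Toy data standing in for real spectrograms
rng = np.random.default_rng(1)
V_print = rng.random((16, 30))              # region without vocals
W_bgd, _ = nmf_fit(V_print, n_free=3)       # learn background bases
V_mix = rng.random((16, 20))                # region with vocals
mute, solo = separate(V_mix, W_bgd)
```

In practice the masks are applied to the complex STFT of the mixture and the result is inverted back to audio.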

Vocals/Background separation
Music print decomposition is not always possible:
- the accompaniment (music print) changes throughout the song
- the target is always present in some sections

Vocals/Background separation
Solution → predominant pitch detection, e.g. MELODIA (J. Salamon, MTG)
Separation → binary mask from pitch information
- Simplest approach: time-frequency mask with 1s at harmonic positions and 0s elsewhere
- Can be combined with a pan-frequency mask
Demos (original, mute, solo):
- Voice is properly removed/attenuated
- Bass guitar is comb-filtered, and horns are attenuated
- Soloing produces more artifacts
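The binary harmonic mask can be sketched as follows, assuming NumPy and a per-frame F0 track supplied by some pitch detector (0 marks unvoiced frames). `harmonic_mask` and `half_width_bins` are illustrative names; the pitch tracker itself is not reproduced here.

```python
import numpy as np

def harmonic_mask(f0_hz, sample_rate, n_fft, half_width_bins=1):
    """Boolean mask (n_frames, n_fft // 2 + 1): True at bins near harmonics of F0."""
    n_bins = n_fft // 2 + 1
    bin_hz = sample_rate / n_fft
    mask = np.zeros((len(f0_hz), n_bins), dtype=bool)
    for i, f0 in enumerate(f0_hz):
        if f0 <= 0:
            continue  # unvoiced frame: nothing to select
        h = 1
        while h * f0 < sample_rate / 2:
            center = int(round(h * f0 / bin_hz))
            lo = max(center - half_width_bins, 0)
            hi = min(center + half_width_bins, n_bins - 1)
            mask[i, lo:hi + 1] = True
            h += 1
    return mask

# Two frames: unvoiced, then a 1000 Hz pitch (100 Hz per bin with these settings)
mask = harmonic_mask([0.0, 1000.0], sample_rate=8000, n_fft=80, half_width_bins=0)
voiced_bins = np.flatnonzero(mask[1])
# solo = mask * X and mute = (~mask) * X for an STFT X of matching shape
```

Any other instrument with energy at the same harmonic positions is removed along with the voice, which is why the bass guitar ends up comb-filtered in the demo.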

Vocals/Background removal
Advanced separation approaches. Special treatment for vocals:
- Source/filter models
- Breathiness residual (noise added on formant envelope)
Demos: original, solo version without residual, solo version with residual

Vocals/Background removal
Advanced separation approaches. Special treatment for vocals:
- Breathiness residual (noise added on formant envelope)
- Unvoiced fricative modelling (/s/, /f/, /sh/): supervised basis from solo phoneme recordings
Demos: original, solo version where /s/ are missing, solo version where /s/ are present
[Figure: spectrogram of the fricative recording used to train the spectral basis]

Piano decomposition/retouch
- Using instrument-specific NMF dictionaries
- Piano model of 88 notes (W matrix is pre-learned)
- Retouch use case: amateur recording with errors. The user can select and correct individual notes after decomposition/separation.
Demos: original (reference), original (played with errors), separated notes, corrected remix

Score-informed separation
- Multiple sources in an orchestral recording
- Score data is used to initialize the activations matrix H
Video demo: isolated instruments (violin, cello, oboe, bassoon, flute)
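One way score data can initialize H is to allow activations only where the score says a note sounds. The sketch below assumes NumPy, note events given as (note index, onset, offset) in seconds, and Euclidean multiplicative updates; the function name and data layout are hypothetical. Entries initialized to zero stay zero under multiplicative updates, so the score acts as a constraint.

```python
import numpy as np

def score_to_activation_init(note_events, n_notes, n_frames, frames_per_second, seed=0):
    """Build an initial H with non-zero entries only inside scored note spans."""
    rng = np.random.default_rng(seed)
    H = np.zeros((n_notes, n_frames))
    for note, onset_s, offset_s in note_events:
        start = max(int(np.floor(onset_s * frames_per_second)), 0)
        end = min(int(np.ceil(offset_s * frames_per_second)), n_frames)
        H[note, start:end] = rng.random(end - start) + 0.1
    return H

# Two notes: note 0 in the first half second, note 1 in the second half
events = [(0, 0.0, 0.5), (1, 0.5, 1.0)]
H0 = score_to_activation_init(events, n_notes=2, n_frames=10, frames_per_second=10)

# One multiplicative update of H with toy W and V: zeros remain zeros
rng = np.random.default_rng(1)
W = rng.random((8, 2))
V = rng.random((8, 10))
H1 = H0 * (W.T @ V) / (W.T @ W @ H0 + 1e-9)
```

In practice the note spans are usually widened by a tolerance, since a performance never follows the score timing exactly.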

Other potential applications

Other potential applications
- Singer replacement. Demos: original, vocals mute, Vocaloid Clara, Vocaloid Clara mix
- Drums enhancement. Demos: original, drums +6 dB, drums -6 dB
- Step-remixer for drums: user-supervised transients (onset time and instrument). Demos: original, all drums, single instrument

Other potential applications (piano)
Mono-to-stereo upmixing
- Input: Mozart K331 recording (RWC dataset)
- Output: upmixing from mono; left/right hands are panned in stereo

Other potential applications (piano)
Automatic accompaniment
- Input: Mozart K331 recording (RWC dataset)
- Output: automatic object detection, string ensemble resynthesis
Demos: synth solo (Kontakt), mixture

Thanks!
Jordi Janer, Music Technology Group, Universitat Pompeu Fabra, Barcelona
jordi.janer@upf.edu
CDSIM UPF May 2014
http://mtg.upf.edu/~jjaner