Advanced spectral processing. Jordi Janer, Music Technology Group, Universitat Pompeu Fabra, Barcelona. jordi.janer@upf.edu. CDSIM UPF, May 2014. http://mtg.upf.edu/
Outline: 1. Introduction to spectral processing 2. Decomposing sound signals
1. Introduction to spectral processing
Simple periodic waves (sine waves). Characterized by: period T, amplitude A, phase φ. Fundamental frequency in cycles per second (Hz): F0 = 1/T. Equation: y(t) = A sin(2πF0 t + φ), so y(0) = A sin(φ). [Figure: a sine wave of amplitude A (±0.99) with period T marked, time axis 0 to 0.02 s.] (Many slides are based on materials by Dan Jurafsky.)
Simple periodic waves. Frequency: 5 cycles in 0.5 seconds = 10 cycles/second = 10 Hz. Amplitude: 1. Phase: at time 0 s, y(0) = A sin(2π·10·0 + φ) = sin(φ) = 0, so φ = πk, k ∈ ℤ; here φ = 0. Equation: y(t) = A sin(20πt), with A = 1.
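A minimal sketch of the sine-wave equation above, evaluating y(t) = A sin(2πF0 t + φ) directly; the function names are illustrative, and the default parameters reproduce the 10 Hz example (A = 1, φ = 0).

```python
import math

def sine_wave(t, amplitude=1.0, f0=10.0, phase=0.0):
    """Value of y(t) = A sin(2*pi*F0*t + phi) at time t (seconds)."""
    return amplitude * math.sin(2 * math.pi * f0 * t + phase)

def period(f0):
    """Period T = 1 / F0, in seconds."""
    return 1.0 / f0

# Slide example: 10 Hz, A = 1, phi = 0, i.e. y(t) = sin(20*pi*t).
T = period(10.0)
print(T)                  # period of the 10 Hz wave, 0.1 s
print(0.5 / T)            # number of cycles in 0.5 s
print(sine_wave(0.0))     # y(0) = A sin(phi), zero here because phi = 0
print(sine_wave(T / 4))   # a quarter period later the wave reaches its peak A
```

Changing `phase` to π/2 turns the sine into a cosine, which is why y(0) alone is enough to read off φ (up to the ambiguity φ = πk).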
(More) basic facts about sound waves: f = c/λ, where c is the speed of sound and λ is the wavelength in meters. c ≈ 34,500 cm/s (≈345 m/s) at 21 °C at sea level. Example: with λ = 10 m, f = 345/10 = 34.5 Hz.
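The relation f = c/λ can be checked in a couple of lines. This sketch uses the slide's value c = 345 m/s; the helper names are illustrative.

```python
SPEED_OF_SOUND = 345.0  # m/s, the slide's value for air at 21 degrees C, sea level

def frequency_from_wavelength(wavelength_m, c=SPEED_OF_SOUND):
    """f = c / lambda, in Hz."""
    return c / wavelength_m

def wavelength_from_frequency(freq_hz, c=SPEED_OF_SOUND):
    """lambda = c / f, in meters."""
    return c / freq_hz

print(frequency_from_wavelength(10.0))    # the slide example: lambda = 10 m
print(wavelength_from_frequency(1000.0))  # wavelength of a 1 kHz tone in air
```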
Speech sound waves: a little piece of the waveform of a vowel. Y axis: amplitude = amount of air pressure at that time point; positive is compression, zero is normal air pressure, negative is rarefaction (expansion). X axis: time.
Fundamental frequency. The fundamental frequency (F0) is the lowest frequency of a periodic (voiced) waveform produced by an instrument (our vocal folds act like a complicated instrument). It is also called the first harmonic, in contrast to its integer multiples, which are called the second, third, etc. harmonics. [Figure: the fundamental frequency (first harmonic) and the 2nd to 7th harmonics.]
Fundamental frequency in speech: see, for example, the waveform of a vowel. F0 can be computed as the number of repetitions per second of the wave: the vowel above has 10 repetitions in 0.03875 s, so F0 = 10/0.03875 ≈ 258 Hz. This is the rate at which the vocal folds vibrate (the voicing rate); each peak corresponds to an opening of the vocal folds.
Pitch. Pitch is defined as the perceived fundamental frequency of a sound. F0 and pitch are different concepts: F0 is a physically measurable frequency, while pitch is a perceived frequency. The relationship between pitch and F0 is not linear; human pitch perception is most accurate between 100 Hz and 1000 Hz. Linear in this range: at F0_1 = 200 Hz, if Pitch_2 = Pitch_1/2 then F0_2 ≈ 100 Hz. Logarithmic above 1000 Hz: at F0_1 = 5 kHz, if Pitch_2 = Pitch_1/2 then F0_2 < 2 kHz. Still, in the literature F0 and pitch are often treated as the same.
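The slide does not name a pitch formula. As an assumption, the sketch below uses one common approximation of perceived pitch, the mel scale m = 2595 log10(1 + f/700), to reproduce the two "half the pitch" examples: roughly linear near 200 Hz, strongly compressive near 5 kHz.

```python
import math

# Assumption: the mel scale is used as an approximation of perceived pitch;
# the slide itself does not specify a formula.
def hz_to_mel(f_hz):
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def half_pitch_frequency(f_hz):
    """Frequency whose mel value (approximate pitch) is half that of f_hz."""
    return mel_to_hz(hz_to_mel(f_hz) / 2.0)

print(round(half_pitch_frequency(200.0)))   # close to 100 Hz: near-linear region
print(round(half_pitch_frequency(5000.0)))  # well below 2 kHz: compressive region
```

Other pitch scales (Bark, ERB) would give different numbers; the qualitative point, that halving perceived pitch above 1 kHz takes much more than halving F0, holds for this approximation.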
F0 tracking: F0 can be computed using several techniques, and with tools such as Praat.
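One of the simplest F0 techniques is autocorrelation: a periodic frame correlates best with itself when shifted by one period. The sketch below is a bare-bones version of that idea (no voicing decision, no octave-error handling, integer lags only); it is not Praat's algorithm, and the 50 to 500 Hz search range is an assumed default.

```python
import math

def estimate_f0_autocorr(x, fs, f_min=50.0, f_max=500.0):
    """Basic autocorrelation F0 estimate (Hz) for a voiced frame x sampled at fs.

    Searches lags between fs/f_max and fs/f_min and returns fs / best_lag.
    """
    n = len(x)
    mean = sum(x) / n
    x = [v - mean for v in x]  # remove DC so it does not bias the correlation
    lag_min = int(fs / f_max)
    lag_max = min(int(fs / f_min), n - 1)
    best_lag, best_r = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        r = sum(x[i] * x[i + lag] for i in range(n - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return fs / best_lag

fs = 16000
# Synthetic voiced-like frame: 250 Hz fundamental plus 4 weaker harmonics.
frame = [sum(math.sin(2 * math.pi * h * 250.0 * n / fs) / h for h in range(1, 6))
         for n in range(1024)]
print(estimate_f0_autocorr(frame, fs))  # close to 250 Hz
```

The resolution is limited to fs/lag values; practical trackers interpolate around the peak and smooth the F0 contour over time.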
Frequency analysis. Waves have different frequencies. [Figure: a 100 Hz wave and a 1000 Hz wave, amplitude ±0.99, time axis 0 to 0.02 s.]
Frequency analysis. Complex waves: adding a 100 Hz and a 1000 Hz wave together. [Figure: the summed waveform, time axis 0 to 0.05 s.]
Spectrum: the frequency components (100 Hz and 1000 Hz) on the x-axis (frequency in Hz), with their amplitudes on the y-axis.
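Fourier transform analysis. Fourier analysis: a wave can be represented as a (possibly infinite) sum of sine waves of different frequencies, each with its own amplitude and phase. For continuous signals: $X(f) = \int_{-\infty}^{\infty} x(t)\, e^{-j 2\pi f t}\, dt$. For discrete signals: $X[k] = \sum_{n=0}^{N-1} x[n]\, e^{-j 2\pi k n / N}$, $k = 0, \dots, N-1$. When N is finite (and relatively short), the result is a short-term spectrum; computing it over successive windowed frames gives the short-time Fourier transform (STFT).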
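The discrete formula can be applied literally to the 100 Hz + 1000 Hz example. This sketch uses a direct O(N²) DFT for clarity (real tools use an FFT); the sampling rate of 4000 Hz and N = 400 are assumptions chosen so both tones fall exactly on bins spaced fs/N = 10 Hz apart.

```python
import cmath
import math

def dft(x):
    """Direct DFT: X[k] = sum_n x[n] * exp(-j*2*pi*k*n/N), O(N^2)."""
    n_samples = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_samples) for n in range(n_samples))
        for k in range(n_samples)
    ]

fs = 4000   # sampling rate in Hz (assumed for the example)
N = 400     # 0.1 s of signal -> bin spacing fs / N = 10 Hz
x = [math.sin(2 * math.pi * 100 * n / fs) + math.sin(2 * math.pi * 1000 * n / fs)
     for n in range(N)]
X = dft(x)
magnitudes = [abs(X[k]) for k in range(N // 2 + 1)]  # real input: bins 0..fs/2 suffice
peaks = sorted(range(len(magnitudes)), key=lambda k: magnitudes[k], reverse=True)[:2]
print(sorted(k * fs / N for k in peaks))  # [100.0, 1000.0]
```

Bin k corresponds to frequency k·fs/N, which is how the two peaks are mapped back to Hz.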
Spectrum example: the spectrum of one instant in an actual sound wave, with many components across the frequency range; each frequency component of the wave is separated. [Figure: magnitude in dB (0 to 40) vs. frequency in Hz (0 to 5000).]
Formants. Formants are defined as the peaks of the spectral envelope of a sound. Formants are independent of F0, since they are defined over the envelope of the spectrum. They are created by the passage of the sound through the vocal tract.
Seeing formants: the spectrogram
Example: what about helium voice? http://www.phys.unsw.edu.au/jw/speechmodel.html
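A rough way to reason about the helium question, assuming the simplest textbook vocal-tract model (not taken from the slides): a uniform tube about 17 cm long, closed at the glottis and open at the lips, resonates at F_n = (2n − 1)c/(4L). The formants scale with the speed of sound c, while F0 (set by the vocal folds) does not appear in the formula at all. The value c = 1000 m/s for helium is illustrative, roughly the speed of sound in pure helium.

```python
def tube_formants(length_m=0.17, c=345.0, count=3):
    """Resonances (Hz) of a uniform tube closed at one end: F_n = (2n - 1) c / (4 L)."""
    return [(2 * n - 1) * c / (4 * length_m) for n in range(1, count + 1)]

air = tube_formants(c=345.0)       # roughly 500, 1500, 2500 Hz
helium = tube_formants(c=1000.0)   # illustrative speed of sound in helium
for f_air, f_he in zip(air, helium):
    print(f"{f_air:7.1f} Hz in air -> {f_he:7.1f} Hz in helium")
```

In this model every formant shifts up by the same factor c_helium / c_air, which matches the intuition that the envelope, not F0, is what changes.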
1. Introduction to acoustic signals 2. Spectral analysis 3. Applications of spectral processing
Spectrogram
Spectrogram: a time-frequency representation computed with short-time windowing and the Fast Fourier Transform (FFT). Available tools: Sonic Visualiser (for music analysis), Praat (for speech analysis). Other resources: live spectrogram, http://labrosa.ee.columbia.edu/expo/
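The two steps named above (short-time windowing, then a Fourier transform per frame) can be sketched as follows. The Hann window, window and hop sizes are assumptions, and a direct DFT stands in for the FFT to keep the code dependency-free.

```python
import cmath
import math

def hann(size):
    """Periodic Hann window of the given length."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / size) for n in range(size)]

def magnitude_spectrum(frame):
    """|DFT| of a real frame for bins 0..N/2 (direct O(N^2) DFT; real tools use an FFT)."""
    n_samples = len(frame)
    return [abs(sum(frame[n] * cmath.exp(-2j * math.pi * k * n / n_samples)
                    for n in range(n_samples)))
            for k in range(n_samples // 2 + 1)]

def spectrogram(x, window_size=256, hop_size=128):
    """Short-time windowing + DFT: one magnitude spectrum per frame."""
    window = hann(window_size)
    frames = []
    for start in range(0, len(x) - window_size + 1, hop_size):
        frame = [x[start + n] * window[n] for n in range(window_size)]
        frames.append(magnitude_spectrum(frame))
    return frames  # frames[t][k]: frame t, bin k at k * fs / window_size Hz
```

For a signal sampled at fs, each row of the result is one instant of the spectrogram; plotting the rows side by side in dB gives the familiar image.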
Window size: understanding time-frequency resolution. Long windows: good frequency resolution. Short windows: good temporal resolution.
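The trade-off is plain arithmetic: an N-sample window at sampling rate fs gives bins fs/N Hz apart and lasts N/fs seconds, so the product of the two is always 1. The 44.1 kHz rate and window sizes below are example values.

```python
def resolution(window_size, fs):
    """Return (frequency bin spacing in Hz, window duration in s) for an N-point window."""
    return fs / window_size, window_size / fs

for n in (256, 1024, 4096):
    df, dt = resolution(n, 44100)
    print(f"N={n:5d}: bin spacing {df:7.2f} Hz, window {1000 * dt:6.1f} ms")
```

Quadrupling the window quarters the bin spacing (better for the "two near tones" test signal) but quadruples the duration (worse for the "noise burst" or transients).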
Observing test signals: two near tones, noise burst, chirp, pure tones, harmonic richness (square/saw), low tone. Open in Sonic Visualiser: http://mtg.upf.edu/~jjaner/teaching/cdsim2014/test_various_signals.wav
Applications of spectral processing: technologies for the analysis of sound and music; technologies for the transformation of sound and music; technologies for the synthesis of sound and music.
Analysis: Skore, automatic singing voice rating.
Transforming signals. Approaches for spectral transformations: SMS (http://mtg.upf.edu/sms), phase vocoder. Basic transformations: pitch transposition, harmonic/noise decomposition, time-stretching (internal MTG Matlab software).
Transforming signals. Basic transformations (audio examples): original, pitch transposition, harmonic/noise decomposition, time-stretching (50x).
Transformation. Time scaling: detection of transients; repetition/removal of spectral frames. Demo: fast remixing (original, fast time-varying remix). Swing: tempo detection at the 8th-note level, then change of the swing factor. Demo: video.
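The frame repetition/removal idea can be sketched as an index mapping: output frame j takes input frame ⌊j / factor⌋, so a factor above 1 repeats frames and a factor below 1 drops them. This is a simplified illustration; it does not protect transients or keep phases coherent, which a real time-scaling system must handle.

```python
def time_scale_frames(frames, factor):
    """Time-scale a sequence of spectral frames by repeating (factor > 1)
    or dropping (factor < 1) frames, using nearest-previous-frame mapping.

    Simplification: no transient protection and no phase handling.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    n_out = max(1, round(len(frames) * factor))
    return [frames[min(int(j / factor), len(frames) - 1)] for j in range(n_out)]

print(time_scale_frames(["a", "b", "c", "d"], 2.0))  # each frame repeated twice
print(time_scale_frames(["a", "b", "c", "d"], 0.5))  # every other frame removed
```

In a full system, frames detected as transients would be copied exactly once so attacks are neither smeared nor doubled.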
Synthesis. Sample-based (violin): gesture modelling to provide a more realistic synthesis. Voice-driven synthesis: voice analysis is used to control the synthesis of a violin sound.
2. Decomposing sound signals: signal decomposition and source separation
Source separation
The objective: music is distributed as mixdowns* in various formats. Users aim to further manipulate music signals in multiple application contexts (karaoke, soloing, remixing, etc.). (* from multitrack originals)
The problem: music signals are complex. There is a variety of music styles and instrumentations, and modern production techniques go beyond a linear combination of recorded acoustic sources (FX, digital synths, etc.).
Background I. Existing generic source separation (SS) approaches: spectral subtraction. Intuitive; well-studied (industrial interest); good for speech/stationary noise reduction; less appropriate for music signals.
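A minimal sketch of magnitude spectral subtraction, assuming some frames are known to contain only noise: their average magnitude spectrum is subtracted from every frame, with a small spectral floor to avoid negative magnitudes. The `alpha` (over-subtraction) and `floor` parameters are illustrative choices, not values from the slides.

```python
def estimate_noise(noise_frames):
    """Average magnitude spectrum over frames assumed to contain only noise."""
    n_bins = len(noise_frames[0])
    return [sum(f[k] for f in noise_frames) / len(noise_frames) for k in range(n_bins)]

def spectral_subtraction(frame, noise_mag, alpha=1.0, floor=0.01):
    """Subtract alpha * noise estimate bin by bin, never going below floor * |Y|.

    The cleaned magnitude is usually recombined with the noisy phase
    before the inverse FFT.
    """
    return [max(m - alpha * nm, floor * m) for m, nm in zip(frame, noise_mag)]

noise = estimate_noise([[1.0, 1.0, 1.0], [1.0, 3.0, 1.0]])
print(spectral_subtraction([5.0, 2.0, 0.5], noise))
```

Because the noise model is a single average spectrum, the method implicitly assumes stationary noise, which is why it suits speech denoising better than separating overlapping, time-varying musical sources.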
Background II. Existing music-specific approaches I: pan-frequency masks. Assumes non-overlapping signals in time-frequency bins; stereo signals are required; uses the amplitude ratio between L and R FFT bins; 2D user interface. Examples: good for simple excerpts, bad for complex mixes (loses brightness, vocals are less reduced due to reverb, the flute is also removed, etc.).
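The pan-frequency mask idea, sketched for one STFT frame: compute a per-bin pan position from the L and R magnitudes and keep the bins that fall in a chosen pan range. Defining the pan position as |R| / (|L| + |R|) is an assumption; the slide only states that the amplitude ratio between L and R bins is used.

```python
def pan_index(left_mag, right_mag, eps=1e-12):
    """Per-bin pan position in [0, 1]: 0 = hard left, 0.5 = centre, 1 = hard right."""
    return [r / (l + r + eps) for l, r in zip(left_mag, right_mag)]

def pan_mask(left_mag, right_mag, pan_lo, pan_hi):
    """Binary time-frequency mask: 1 where the bin's pan lies in [pan_lo, pan_hi]."""
    return [1.0 if pan_lo <= p <= pan_hi else 0.0 for p in pan_index(left_mag, right_mag)]

def apply_mask(spectrum, mask):
    """Element-wise product of a spectrum frame with a mask."""
    return [s * m for s, m in zip(spectrum, mask)]

# Centre-panned content (often the lead vocal) in a 3-bin toy frame:
centre = pan_mask([1.0, 1.0, 0.0], [1.0, 0.0, 1.0], 0.45, 0.55)
print(centre)  # only the first bin is centred
```

To remove rather than isolate the selected source, multiply by `1 - mask` instead. The binary decision per bin is exactly why the method breaks down when sources overlap in time and frequency.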
Background III. Existing music-specific approaches II: Non-negative Matrix Factorization (NMF). The magnitude spectrogram V (non-negative) is decomposed as the matrix product of W (spectral bases) and H (gain activations over time), $V \approx WH$. Each spectrum frame is explained as a linear combination of R bases. W and H are found by solving the minimization problem $\min_{W, H \ge 0} d(V, WH)$.
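The slide leaves the distance d generic. Assuming the squared Euclidean distance, a standard way to solve the minimization is with the Lee-Seung multiplicative updates, sketched below in plain Python (small matrices only; real systems use NumPy and often the KL or Itakura-Saito divergence instead).

```python
import random

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(m):
    return [list(row) for row in zip(*m)]

def nmf(V, rank, iterations=200, seed=0, eps=1e-9):
    """Factorize V (F x T, non-negative) as W (F x R) times H (R x T).

    Lee-Seung multiplicative updates for d(V, WH) = ||V - WH||^2;
    multiplying by non-negative ratios keeps W and H non-negative.
    """
    rng = random.Random(seed)
    F, T = len(V), len(V[0])
    W = [[rng.random() + eps for _ in range(rank)] for _ in range(F)]
    H = [[rng.random() + eps for _ in range(T)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H)
        Wt = transpose(W)
        num = matmul(Wt, V)
        den = matmul(matmul(Wt, W), H)
        H = [[H[r][t] * num[r][t] / (den[r][t] + eps) for t in range(T)]
             for r in range(rank)]
        # W <- W * (V H^T) / (W H H^T)
        Ht = transpose(H)
        num = matmul(V, Ht)
        den = matmul(W, matmul(H, Ht))
        W = [[W[f][r] * num[f][r] / (den[f][r] + eps) for r in range(rank)]
             for f in range(F)]
    return W, H
```

Here the columns of W play the role of the spectral bases and the rows of H their activations over time; with a magnitude spectrogram as V, R is the number of bases the model is allowed to use.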
NMF details: Non-negative Matrix Factorization. Example with 3 spectral bases (W) and 1 overlapping note; H: activation gains.
NMF details: Non-negative Matrix Factorization. Example with 3 spectral bases (W) and 2 overlapping notes; H: activation gains.
NMF challenges: predominant instrument separation (pitch/timbre analysis); completeness of instrument removal (attack/sustain, residual/breathing noise, unvoiced consonants, etc.); percussive instrument separation (transient detection, wideband spectrum); polyphonic instrument separation (blind and score-informed).
Vocals/background separation. Music-print decomposition: the song contains a region without the target (e.g. vocals), and a basis model is learnt from the user-selected music print. [Figure: the music print (without vocals) and a region with vocals.]
Vocals/background separation. Music-print decomposition (demos: original, mute). Background excerpt → basis decomposition (W, H) → $W_{bgd}$. Input → basis decomposition with $[W_{bgd}, W_{other}]$ and $[H_{bgd}, H_{other}]$ → Wiener filtering $(W_{bgd} H_{bgd}) / (W H)$ → output (mute).
Vocals/background separation. Music-print decomposition (demos: original, solo). Background excerpt → basis decomposition (W, H) → $W_{bgd}$. Input → basis decomposition with $[W_{bgd}, W_{other}]$ and $[H_{bgd}, H_{other}]$ → Wiener filtering $(W_{other} H_{other}) / (W H)$ → output (solo).
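Once the input has been decomposed into background and "other" parts, the Wiener filtering step reduces to two soft masks that sum to one in every time-frequency bin. This sketch assumes the four factor matrices are already given (the supervised decomposition itself is not shown) and applies the masks element-wise to the mixture spectrogram X.

```python
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def wiener_masks(W_bgd, H_bgd, W_other, H_other, eps=1e-12):
    """Soft masks (W_bgd H_bgd)/(W H) and (W_other H_other)/(W H),
    where W H = W_bgd H_bgd + W_other H_other."""
    V_bgd = matmul(W_bgd, H_bgd)
    V_other = matmul(W_other, H_other)
    mask_bgd, mask_other = [], []
    for row_b, row_o in zip(V_bgd, V_other):
        total = [b + o + eps for b, o in zip(row_b, row_o)]
        mask_bgd.append([b / t for b, t in zip(row_b, total)])
        mask_other.append([o / t for o, t in zip(row_o, total)])
    return mask_bgd, mask_other

def apply_mask(X, mask):
    """Element-wise mask on a mixture spectrogram X (magnitudes or complex STFT bins)."""
    return [[x * m for x, m in zip(rx, rm)] for rx, rm in zip(X, mask)]
```

`apply_mask(X, mask_bgd)` corresponds to the mute output and `apply_mask(X, mask_other)` to the solo output; because the masks are complementary, the two outputs add back up to the mixture.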
Vocals/background separation. Music-print decomposition is not always possible: the accompaniment (music print) may change throughout the song, or the target may be present in every section, leaving no target-free excerpt to learn from.
Vocals/background separation. Solution → predominant pitch detection, e.g. MELODIA (J. Salamon, MTG). Separation → binary mask from pitch information. Simplest approach: a time-frequency mask with 1s at harmonic positions and 0s elsewhere; it can be combined with the pan-frequency mask. Demos (original, mute, solo): the voice is properly removed/attenuated; the bass guitar is comb-filtered and the horns attenuated; soloing produces more artifacts.
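The "1s at harmonic positions, 0s elsewhere" mask can be built directly from an F0 track (for example one produced by a melody extractor). In this sketch, the number of harmonics and the ±`width_bins` tolerance around each harmonic are illustrative parameters, and unvoiced frames are marked with 0 or None.

```python
def harmonic_mask(f0_track, fs, n_fft, n_harmonics=20, width_bins=1):
    """Binary mask (frames x bins): 1 near multiples of each frame's F0, 0 elsewhere.

    f0_track: one F0 value in Hz per frame (0 or None for unvoiced frames).
    """
    n_bins = n_fft // 2 + 1
    mask = []
    for f0 in f0_track:
        row = [0.0] * n_bins
        if f0:
            for h in range(1, n_harmonics + 1):
                centre = round(h * f0 * n_fft / fs)  # bin of the h-th harmonic
                if centre >= n_bins:
                    break
                for k in range(max(0, centre - width_bins),
                               min(n_bins, centre + width_bins + 1)):
                    row[k] = 1.0
        mask.append(row)
    return mask
```

Multiplying the mixture STFT by the mask gives the solo version, and multiplying by `1 - mask` gives the mute version; combining with a pan-frequency mask is an element-wise product of the two masks. The hard 0/1 decision at every harmonic is also what comb-filters other instruments that share those bins, such as the bass guitar in the demo.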
Vocals/background removal. Advanced separation approaches: special treatment for vocals using source/filter models; breathiness residual (noise added on the formant envelope). Demos: original; solo version without residual; solo version with residual.
Vocals/background removal. Advanced separation approaches: special treatment for vocals. Breathiness residual (noise added on the formant envelope). Unvoiced fricative modelling (/s/, /f/, /sh/): supervised bases from solo phoneme recordings. Demos: original; solo version (/s/ missing); solo version (/s/ present). [Figure: spectrogram of the fricative recording used to train the spectral bases.]
Piano decomposition/retouch: using instrument-specific NMF dictionaries. Piano model of 88 notes (the W matrix is pre-learned). Retouch use case: an amateur recording with errors; the user can select and correct individual notes after decomposition/separation. Audio: original (reference), original (played with errors), separated notes, corrected remix.
Score-informed separation: multiple sources in an orchestral recording. Score data is used to initialize the activation matrix H. Video demo: isolated instruments (violin, cello, oboe, bassoon, flute).
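One way to turn score data into an initial H, sketched under assumptions (aligned note list, one row per pitch, a small frame tolerance for alignment errors): rows are non-zero only where the score says the note sounds. A useful property of the multiplicative NMF updates is that they multiply H element-wise, so entries initialized to zero stay zero and the score keeps constraining the decomposition.

```python
def score_informed_H(notes, n_frames, pitches, tolerance_frames=2, active=1.0):
    """Initial activation matrix H (one row per pitch) from aligned score data.

    notes: list of (pitch, onset_frame, offset_frame), offset exclusive.
    Entries are zero wherever the score says the pitch is silent, except for
    a tolerance of tolerance_frames around each note for alignment errors.
    """
    row_of = {p: i for i, p in enumerate(pitches)}
    H = [[0.0] * n_frames for _ in pitches]
    for pitch, onset, offset in notes:
        r = row_of[pitch]
        for t in range(max(0, onset - tolerance_frames),
                       min(n_frames, offset + tolerance_frames)):
            H[r][t] = active
    return H

H = score_informed_H([(60, 2, 4), (64, 0, 1)], 6, [60, 64], tolerance_frames=0)
for row in H:
    print(row)
```

The pitch numbers, frame indices and the `active` starting value are hypothetical; in practice the non-zero entries are often randomized so the updates can still adapt the gains.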
Other potential applications
Other potential applications. Singer replacement: original, vocals muted, Vocaloid Clara, Vocaloid Clara mix. Drums enhancement: original, drums +6 dB, drums −6 dB. Step-remixer for drums with user-supervised transients (onset time and instrument): original, all drums, single instrument.
Other potential applications (piano). Mono-to-stereo upmixing. Input: Mozart K331 recording (RWC dataset). Output: upmixing from mono, with the left/right hands panned in stereo.
Other potential applications (piano). Automatic accompaniment. Input: Mozart K331 recording (RWC dataset). Output: automatic object detection; string ensemble resynthesis; synth solo (Kontakt); mixture.
Thanks! Jordi Janer, Music Technology Group, Universitat Pompeu Fabra, Barcelona. jordi.janer@upf.edu. http://mtg.upf.edu/~jjaner