advanced spectral processing


1 Advanced spectral processing. Jordi Janer, Music Technology Group, Universitat Pompeu Fabra, Barcelona. upf.edu. CDSIM UPF, May 2014. http://mtg.upf.edu/

2 Outline. 1. Introduction to spectral processing. 2. Decomposing sound signals.

3 1- Introduction to spectral processing

4 Simple periodic waves (sine waves). Characterized by: period T, amplitude A, phase φ. Fundamental frequency in cycles per second, or Hz: F0 = 1/T. Waveform: y(t) = A sin(2π F0 t + φ), so y(0) = A sin(φ). [Figure: one sine wave against time (s), with A, T and y(0) marked.] (Many slides come from materials by Dan Jurafsky.)

5 Simple periodic waves. Frequency: 5 cycles in 0.5 seconds = 10 cycles/second = 10 Hz. Amplitude: 1. Phase: at time 0 seconds, y(0) = A sin(2π·10·0 + φ) = sin(φ) = 0, so φ = πk with k an integer; take φ = 0. Equation: y(t) = A sin(20πt).
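As a quick numerical check of this slide, the sketch below samples y(t) = A sin(2π F0 t + φ) with the slide's values (A = 1, F0 = 10 Hz, φ = 0) and counts the cycles in 0.5 s. The sampling rate `fs` and the helper names `sine_wave` and `count_peaks` are assumptions made for illustration; they are not part of the original material.

```python
import math

def sine_wave(amplitude, f0, phase, fs, duration):
    """Sample y(t) = A sin(2*pi*f0*t + phase) at fs samples per second."""
    n_samples = int(round(fs * duration))
    return [amplitude * math.sin(2 * math.pi * f0 * n / fs + phase)
            for n in range(n_samples)]

def count_peaks(samples):
    """Count local maxima; a sine wave has exactly one per cycle."""
    return sum(1 for i in range(1, len(samples) - 1)
               if samples[i - 1] < samples[i] >= samples[i + 1])

fs = 1000          # assumed sampling rate in Hz
duration = 0.5     # seconds, as on the slide
y = sine_wave(1.0, 10.0, 0.0, fs, duration)

cycles = count_peaks(y)            # number of cycles seen in 0.5 s
estimated_f0 = cycles / duration   # cycles per second (Hz)
print(cycles, estimated_f0)
```

Counting peaks rather than zero crossings keeps the count robust to floating-point rounding near y = 0.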

6 (More) basic facts about sound waves. f = c/λ, where c is the speed of sound and λ is the wavelength in meters. c ≈ 34,500 cm/s (≈ 345 m/s) at 21 degrees Celsius at sea level. Example: with λ = 10 m, the frequency is f = 345/10 = 34.5 Hz.

7 Speech sound waves. A little piece from the waveform of a vowel. Y axis: amplitude = amount of air pressure at that time point. Positive is compression, zero is normal air pressure, negative is rarefaction (expansion). X axis: time.


8 Fundamental frequency. The fundamental frequency (or F0) is the lowest frequency of a periodic (voiced) waveform produced by any particular instrument (our vocal folds are like a complicated instrument). It is also called the first harmonic, in comparison with its integer multiples, called the second, third, etc. harmonics. [Figure: fundamental frequency (first harmonic) followed by the 2nd to 7th harmonics.]

9 Fundamental frequency. In speech, see for example the waveform of a vowel. The fundamental frequency can be computed as the number of repetitions per second of the wave: the vowel above has 10 repetitions in … s, so the frequency is 10/… = 258 Hz. This is the rate at which the vocal folds open and close, hence the voicing rate. Each peak corresponds to an opening of the vocal folds.

10 Pitch. Pitch is defined as the perceived fundamental frequency of a sound. F0 and pitch are different concepts: F0 corresponds to a physically measurable frequency; pitch corresponds to a perceived frequency. The relationship between pitch and F0 is not linear, and human pitch perception is most accurate between 100 Hz and 1000 Hz. Linear in this range: at F0_1 = 200 Hz, if Pitch_2 = Pitch_1/2 then F0_2 = … Hz. Logarithmic above 1000 Hz: at F0_1 = 5 kHz, if Pitch_2 = Pitch_1/2 then F0_2 < 2 kHz. Still, in the literature F0 and pitch are often treated as the same.

11 F0 tracking. F0 can be computed using several techniques, for example with tools like Praat.

12 Frequency analysis. Waves have different frequencies. [Figure: two sine waves of different frequencies (Hz) plotted against time (s).]

13 Frequency analysis. Complex waves: adding a 100 Hz and a 1000 Hz wave together. [Figure: the summed waveform against time (s).]

14 Spectrum. Frequency components (100 and 1000 Hz) on the x-axis, amplitude on the y-axis. [Figure: spectrum with peaks at 100 Hz and 1000 Hz.]

15 Fourier transform analysis. Fourier analysis: any wave can be represented as the (possibly infinite) sum of sine waves of different frequencies, each with its own amplitude and phase. For continuous signals: X(f) = ∫ x(t) e^(−j2πft) dt. For discrete signals: X[k] = Σ_{n=0..N−1} x[n] e^(−j2πkn/N). When N is finite (and relatively short) we call the result the short-term spectrum; computed frame by frame, this is the short-time Fourier transform (STFT).
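The 100 Hz plus 1000 Hz example from the preceding slides can be checked with a direct discrete Fourier transform. The sketch below is a naive O(N²) DFT written for readability, not a fast FFT; the sampling rate, the signal length and the peak threshold are assumptions chosen so that both tones fall exactly on DFT bins.

```python
import cmath
import math

def dft_magnitudes(x):
    """Magnitude of X[k] = sum_n x[n] exp(-j 2 pi k n / N) for k = 0..N/2."""
    n_total = len(x)
    mags = []
    for k in range(n_total // 2 + 1):
        acc = sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_total)
                  for n in range(n_total))
        mags.append(abs(acc))
    return mags

fs = 4000        # assumed sampling rate (Hz); Nyquist is 2000 Hz
n_total = 400    # 0.1 s of signal, so bins are spaced fs / N = 10 Hz apart

# Complex wave: a 100 Hz sine plus a 1000 Hz sine, both with amplitude 1.
x = [math.sin(2 * math.pi * 100 * n / fs) + math.sin(2 * math.pi * 1000 * n / fs)
     for n in range(n_total)]

mags = dft_magnitudes(x)
threshold = 0.5 * max(mags)   # illustrative threshold for "a peak"
peak_freqs = [k * fs / n_total for k, m in enumerate(mags) if m > threshold]
print(peak_freqs)
```

Only two bins exceed the threshold, which is the two-line spectrum the slides describe. With tones that do not complete a whole number of cycles in the analysis window, energy would leak into neighbouring bins.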

16 Spectrum example. [Figure: magnitude (dB) against frequency (Hz).] Spectrum of one instant of an actual sound wave: many components across the frequency range. Each frequency component of the wave is shown separately.

17 Formants. Formants are defined as the spectral peaks of the sound spectrum envelope. Formants are independent of F0, as they are defined over the envelope of the spectrum. They are created by the passage of the sound through the vocal tract.

18 Seeing formants: the spectrogram

19 Example: what about helium voice? http://www.phys.…

20 1. Introduction to acoustic signals. 2. Spectral analysis. 3. Applications of spectral processing.

21 Spectrogram

22 Spectrogram. Time-frequency representation: short-time windowing plus Fast Fourier Transform (FFT). Available tools: Sonic Visualiser (for music analysis), Praat (for speech analysis). Other resources: live spectrogram at http://labrosa.ee.columbia.edu/expo/

23 Window size. Understanding time-frequency resolution: long windows give good frequency resolution; short windows give good temporal resolution.
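The trade-off can be put in numbers: for a window of N samples at sampling rate fs, the DFT bins are spaced fs/N Hz apart, while the window spans N/fs seconds. The sketch below compares two window sizes; the sampling rate and the sizes 256 and 4096 are assumptions for illustration, and the effect of the window shape is ignored.

```python
def window_resolution(window_size, fs):
    """Return (bin spacing in Hz, window duration in seconds)."""
    freq_res_hz = fs / window_size   # smaller value = finer frequency detail
    time_span_s = window_size / fs   # smaller value = finer time detail
    return freq_res_hz, time_span_s

fs = 44100  # assumed sampling rate (Hz)
short_win = window_resolution(256, fs)
long_win = window_resolution(4096, fs)

for name, (df, dt) in (("short", short_win), ("long", long_win)):
    print(f"{name} window: {df:.1f} Hz per bin, {dt * 1000:.1f} ms per frame")
```

The product of the two quantities is always 1, so improving one necessarily worsens the other.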

24 Observing test signals: two near tones, noise burst, chirp, pure tones, harmonic richness (square/saw), low tone. Sonic Visualiser: http://mtg.upf.edu/~jjaner/teaching/cdsim2014/test_various_signals.wav

25 Applications of spectral processing: technologies for the analysis of sound and music; technologies for the transformation of sound and music; technologies for the synthesis of sound and music.

26 Analysis. Skore: automatic singing voice rating.

27 Transforming signals. Approaches for spectral transformations: SMS (http://mtg.upf.edu/sms), phase vocoder. Basic transformations: pitch transposition, harmonic/noise decomposition, time-stretching (Matlab, internal MTG software).

28 Transforming signals. Basic transformations (audio examples): original, pitch transposition, harmonic/noise decomposition, time-stretching (50x).

29 Transformation. Time scaling: detection of transients; repetition/removal of spectral frames. Demo: fast remixing (original, fast time-varying remix). Swing detection: tempo detection at 8th-note level; change swing factor. Demo: video.

30 Synthesis. Sample-based (violin): gesture modelling to provide a more realistic synthesis. Voice-driven synthesis: voice analysis is used to control the synthesis of a violin sound.

31 2- Decomposing sound signals. Signal decomposition and source separation.

32 Source separation

33 The objective. Music is distributed as mixdowns (from multitrack originals) in various formats. Users aim to further manipulate music signals in multiple application contexts (karaoke, soloing, remixing, etc.).

34 The problem. Music signals are complex. Variety of music styles and instrumentations. Modern production techniques go beyond the linear combination of recorded acoustic sources (FX, digital synths, etc.).

35 Background I. Existing generic source separation (SS) approaches: spectral subtraction. Intuitive; well studied (industrial interest); good for speech/stationary noise reduction; less appropriate for music signals.
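A minimal sketch of magnitude spectral subtraction, assuming a noise-only stretch of the recording is available to estimate a stationary noise profile. The function names, the flooring at zero and the toy numbers are illustrative assumptions; practical systems usually add over-subtraction factors and smoothing that are not shown here.

```python
def estimate_noise(noise_frames):
    """Average the magnitude spectra of noise-only frames, bin by bin."""
    n_frames = len(noise_frames)
    return [sum(column) / n_frames for column in zip(*noise_frames)]

def spectral_subtraction(mix_mag, noise_mag, floor=0.0):
    """Subtract the noise profile from one magnitude frame, clamping at `floor`."""
    return [max(m - v, floor) for m, v in zip(mix_mag, noise_mag)]

# Toy magnitude spectra with three frequency bins per frame.
noise_profile = estimate_noise([[1.0, 1.0, 1.0],
                                [3.0, 1.0, 1.0]])
cleaned = spectral_subtraction([5.0, 1.0, 0.5], noise_profile)
print(noise_profile, cleaned)
```

Because the profile is a time average, the method suits stationary noise; music accompaniment changes from frame to frame, which is one reason the slide calls it less appropriate for music signals.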


36 Background II. Existing music-specific approaches I: pan-frequency masks. Assumes non-overlapping signals in time-frequency bins; stereo signals are required; uses the amplitude ratio between L and R FFT bins; 2D user interface. Examples: good for simple excerpts, bad for complex mixes (loses brightness, vocals less reduced due to reverb, flute is also removed, ...).
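One way to read "amplitude ratio between L and R FFT bins" is as a per-bin pan estimate that selects bins close to a target position. The sketch below is an illustration under that assumption: the pan index |L| / (|L| + |R|), the tolerance and the function name `pan_mask` are hypothetical choices, and the frequency dimension of the 2D interface is not modelled.

```python
def pan_mask(left_mag, right_mag, target_pan=0.5, tolerance=0.1):
    """Binary mask: 1 where a bin's pan position is near target_pan.

    Pan index is |L| / (|L| + |R|): 1.0 = fully left, 0.0 = fully right,
    0.5 = centre. Silent bins are never selected.
    """
    mask = []
    for l, r in zip(left_mag, right_mag):
        total = l + r
        if total == 0.0:
            mask.append(0)
            continue
        pan = l / total
        mask.append(1 if abs(pan - target_pan) <= tolerance else 0)
    return mask

# Four bins of one stereo frame: centre, hard left, hard right, silent.
left = [1.0, 2.0, 0.0, 0.0]
right = [1.0, 0.0, 3.0, 0.0]

mask = pan_mask(left, right)                               # selects the centre bin
left_without_centre = [l * (1 - m) for l, m in zip(left, mask)]
print(mask, left_without_centre)
```

The binary decision relies on each time-frequency bin belonging to a single source, which is the non-overlap assumption listed on the slide.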


37 Background III. Existing music-specific approaches II: Non-negative Matrix Factorization (NMF). Input: magnitude spectrogram V (non-negative). Decomposition as the matrix product of W (spectral bases) and H (gain activations over time). Each spectrum frame is explained as a linear combination of R bases. Minimization problem that finds W and H: min over W, H of d(V, WH). [Figure: matrices H, W and V.]
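The slide leaves the distance d and the optimizer unspecified. As one possible instance, the sketch below uses the Euclidean distance with the classic multiplicative update rules (Lee and Seung); this choice, the random initialization, the iteration count and the toy matrix are assumptions for illustration, written in plain Python so that it runs without external libraries.

```python
import random

def matmul(A, B):
    """Matrix product of two list-of-lists matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def nmf(V, R, n_iter=200, seed=0, eps=1e-9):
    """Factor non-negative V (F x T) into W (F x R) and H (R x T)."""
    rng = random.Random(seed)
    F, T = len(V), len(V[0])
    W = [[rng.random() + eps for _ in range(R)] for _ in range(F)]
    H = [[rng.random() + eps for _ in range(T)] for _ in range(R)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H), element-wise
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[H[r][t] * num[r][t] / (den[r][t] + eps) for t in range(T)]
             for r in range(R)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[W[f][r] * num[f][r] / (den[f][r] + eps) for r in range(R)]
             for f in range(F)]
    return W, H

def frobenius_error(V, W, H):
    """Euclidean distance d(V, WH)."""
    WH = matmul(W, H)
    return sum((v - wh) ** 2 for rv, rwh in zip(V, WH)
               for v, wh in zip(rv, rwh)) ** 0.5

# Toy "spectrogram": 4 frequency bins x 4 frames, built from 2 known bases.
W_true = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.0]]
H_true = [[1.0, 1.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.5]]
V = matmul(W_true, H_true)

err_before = frobenius_error(V, *nmf(V, 2, n_iter=0))  # random start only
W, H = nmf(V, 2)
err_after = frobenius_error(V, W, H)
print(err_before, err_after)
```

The updates only multiply by non-negative ratios, so W and H stay non-negative throughout, which is what allows columns of W to be read as spectra and rows of H as gains over time.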

38 NMF details. Non-negative Matrix Factorization I: 3 spectral bases in W, 1 overlapping note. H: activation gains.

39 NMF details. Non-negative Matrix Factorization I: 3 spectral bases in W, 2 overlapping notes. H: activation gains.

40 NMF challenges. Predominant instrument separation (pitch/timbre analysis). Completeness of instrument removal (attack/sustain, residual/breathing noise, unvoiced consonants, ...). Percussive instrument separation (transient detection, wideband spectrum). Polyphonic instrument separation (blind and score-informed).

41 Vocals/background separation. Music print decomposition: for a song containing a region without the target (e.g. vocals), a basis model is learnt from the user-selected music print. [Figure: music print (without vocals) and region with vocals.]

42 Vocals/background separation. Music print decomposition. Demos: original, mute. Background excerpt: basis decomposition W H gives W_bgd. Input: basis decomposition with [W_bgd, W_other] and [H_bgd, H_other], then Wiener filtering (W_bgd, H_bgd)/(W M), giving the mute output.

43 Vocals/background separation. Music print decomposition. Demos: original, solo. Background excerpt: basis decomposition W H gives W_bgd. Input: basis decomposition with [W_bgd, W_other] and [H_bgd, H_other], then Wiener filtering (W_other, H_other)/(W M), giving the solo output.
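The Wiener filtering step can be sketched as a soft mask: each time-frequency bin of the mixture is scaled by the share of the model that one group of bases explains. The sketch assumes the denominator on the slides is the full model, i.e. the background part W_bgd H_bgd plus the other part W_other H_other, and it uses magnitude ratios; both points are assumptions, as are the function names and toy values.

```python
def soft_masks(bgd_model, other_model, eps=1e-12):
    """Per-bin ratios bgd / (bgd + other) and other / (bgd + other)."""
    mask_bgd, mask_other = [], []
    for b_row, o_row in zip(bgd_model, other_model):
        mask_bgd.append([b / (b + o + eps) for b, o in zip(b_row, o_row)])
        mask_other.append([o / (b + o + eps) for b, o in zip(b_row, o_row)])
    return mask_bgd, mask_other

def apply_mask(mix, mask):
    """Scale each bin of the mixture spectrogram by its mask value."""
    return [[x * m for x, m in zip(x_row, m_row)]
            for x_row, m_row in zip(mix, mask)]

# Toy 2 x 2 magnitude models, standing in for W_bgd H_bgd and W_other H_other.
bgd_model = [[3.0, 0.0], [1.0, 1.0]]
other_model = [[1.0, 2.0], [1.0, 3.0]]
mix = [[4.0, 2.0], [2.0, 4.0]]

mask_bgd, mask_other = soft_masks(bgd_model, other_model)
mute_output = apply_mask(mix, mask_bgd)     # keeps the background, mutes the rest
solo_output = apply_mask(mix, mask_other)   # keeps everything not in the music print
print(mute_output, solo_output)
```

Since the two masks sum to one in every bin, the mute and solo outputs add back up to the original mixture.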

44 Vocals/background separation. Music print decomposition is not always possible: the accompaniment (music print) changes throughout the song; the target is always present in some sections.

45 Vocals/background separation. Solution → predominant pitch detection, e.g. MELODIA (J. Salamon, MTG). Separation → binary mask from pitch information: simplest approach; time-frequency mask with 1s at harmonic positions, 0s elsewhere; can be combined with the pan-frequency mask. Demos (original, mute, solo): voice is properly removed/attenuated; bass guitar is comb-filtered and horns attenuated; soloing produces more artifacts.
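A binary harmonic mask for a single frame can be built directly from the detected pitch. The sketch below assumes the F0 value is already available from a pitch tracker; the function name, the optional widening of each harmonic by a few bins and the toy parameters are illustrative assumptions.

```python
def harmonic_mask(f0, fs, n_fft, half_width_bins=0):
    """Return a 0/1 mask over FFT bins 0..n_fft/2 for one analysis frame.

    Bins nearest to each harmonic h * f0 below the Nyquist frequency are set
    to 1 (plus `half_width_bins` neighbours on each side); all others are 0.
    """
    n_bins = n_fft // 2 + 1
    mask = [0] * n_bins
    if f0 <= 0:                      # unvoiced frame: nothing to select
        return mask
    h = 1
    while h * f0 < fs / 2:
        centre = int(round(h * f0 * n_fft / fs))
        lo = max(0, centre - half_width_bins)
        hi = min(n_bins, centre + half_width_bins + 1)
        for b in range(lo, hi):
            mask[b] = 1
        h += 1
    return mask

# Toy frame: fs = 8000 Hz and n_fft = 16 give 500 Hz per bin.
solo_mask = harmonic_mask(1000.0, 8000, 16)   # keeps harmonics of the voice
mute_mask = [1 - m for m in solo_mask]        # removes them instead
print(solo_mask, mute_mask)
```

The mute mask acts like a comb filter at multiples of F0, so any other instrument with energy at those frequencies is notched as well, consistent with the bass guitar artifact noted in the demos.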

46 Vocals/background removal. Advanced separation approaches. Special treatment for vocals: source/filter models; breathiness residual (noise added on formant envelope). Demos: original; solo version without residual; solo version with residual.


47 Vocals/background removal. Advanced separation approaches. Special treatment for vocals: breathiness residual (noise added on formant envelope); unvoiced fricative modelling (/s/, /f/, /sh/, ...) with a supervised basis from solo phoneme recordings. Demos: original; solo version where /s/ are missing; solo version where /s/ are present. [Figure: spectrogram of the fricative recording used to train the spectral basis.]

48 Piano decomposition/retouch. Using instrument-specific NMF dictionaries: piano model of 88 notes (the W matrix is pre-learned). Retouch use case: amateur recording with errors; the user can select and correct individual notes after decomposition/separation. Demos: original (reference), original (played with errors), separated notes, corrected remix.

49 Score-informed separation. Multiple sources in an orchestral recording. Score data is used to initialize the activations matrix H. Video demo: isolated instruments (violin, cello, oboe, bassoon, flute).

50 Other potential applications

51 Other potential applications. Singer replacement: original, vocals mute, Vocaloid Clara, Vocaloid Clara mix. Drums enhancement: original, drums +6 dB, drums -6 dB. Step-remixer for drums with user-supervised transients (onset time and instrument): original, all drums, single instrument.

52 Other potential applications (piano). Mono-to-stereo upmixing. Input: Mozart K331 recording (RWC dataset). Output: upmixing from mono; left/right hands are panned in stereo.

53 Other potential applications (piano). Automatic accompaniment. Input: Mozart K331 recording (RWC dataset). Output: automatic object detection; string ensemble resynthesis; synth solo (Kontakt); mixture.

54 Thanks! Jordi Janer, Music Technology Group, Universitat Pompeu Fabra, Barcelona. upf.edu. http://mtg.upf.edu/~jjaner
