Analysis-and-manipulation approach to pitch and duration of musical instrument sounds without distorting timbral characteristics

Takehiro Abe, Katsutoshi Itoyama, Kazuyoshi Yoshii, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno
Department of Intelligence Science and Technology, Kyoto University, Japan
National Institute of Advanced Industrial Science and Technology (AIST)
Demonstration: http://wiie.kuis.kyoto-u.ac.jp/~abe/dafx-8/

Motivation

Conventional music listening
- Selecting from a limited playlist
- Only listening after pressing play
- Result: a passive and limited listening experience

Active music listening
- Selecting according to users' requirements
- Changing music to suit users' feelings
- Result: active and exploratory listening, in which the user can change instruments, volume, and timbre

Instrument equalizers have been developed
- Drumix [Yoshii 07] (drums only)
- Itoyama's EQ [Itoyama 08] (all instruments)

Our equalizer: replacing an arbitrary part with the user's favorite timbre

Demonstration (trial)
- Controls: equalize-part buttons, genre buttons
- Content: MIDI sound, synthesized piano sound, jazz sound (synthesis)
- The equalizer's sounds are synthesized from real sounds, except for the MIDI sounds.

Requirements for our equalizer
1. Sound separation from polyphonic audio, to extract the musical instrument sound that users want to replace. This is well studied.
2. Sound manipulation of separated sounds without timbral distortion, to play arbitrary phrases. The application of separated sounds is not well studied; this is our research target.
Timbral distortion here means the difference from the sound excited by a real instrument.

Objective
Synthesizing monotones excited by the same instrument from multiple musical instrument sounds

Our definition of timbral features

ASA's definition: the quality of a sound that distinguishes it from others of the same pitch and volume [ASA 60]

Our definition (a concrete definition based on [Grey 77]): the quality of a sound that consists of three features, excluding pitch and volume
1. The relative amplitudes of harmonic peaks
2. The inharmonic component
3. Temporal envelopes

We use the tonal model, which can analyze these features [Itoyama 08].

Manipulation of pitch and duration

Manipulating pitch or duration while leaving the timbral features unchanged is not appropriate.

Timbre has pitch dependency [Marozeau 03].
- Demo sounds: seed (440 Hz), ref. (880 Hz), phase vocoder (880 Hz), our method (880 Hz)
- We use a pitch-dependent feature function to model this dependency.

Attack, decay, and vibrato features are consistent within the same instrument.
- Demo sounds: seed, ref. (length), sinusoidal model (length 4), our method (length 4)
- Figure annotations for the sinusoidal model: attack segment, high frequency, vibrato feature
- We preserve the attack and decay segments and the vibrato feature.

Overview of our manipulation method
- Step 1, Analysis: separate the harmonic and inharmonic structures and extract the timbral features.
- Step 2, Manipulation: manipulate the pitch, the duration, and the energy of the inharmonic structure.
- Step 3, Synthesis: synthesize the harmonic and inharmonic signals and add them.

Analysis to obtain three features

The tonal model consists of a harmonic model and an inharmonic model.
- Harmonic model, spectral structure: the power of the harmonics is expressed as a Gaussian mixture model (Feature 1).
- Inharmonic model: represents the spectrogram of the inharmonic component (Feature 2).
- Harmonic model, temporal structure: the envelope over the duration is expressed as a nonparametric model (Feature 3).
[Figure: harmonic and inharmonic structures on time-frequency axes, with the three features marked]

Pitch manipulation
- Manipulate the spectral envelope by multiplying the pitch trajectory $\mu(r)$ by a desired ratio, giving $\mu'(r)$.
- Obtain the manipulated timbral features from the pitch-dependent feature function, giving $v'$ from $v$.

Pitch-dependent feature function

The function approximates how timbral features vary over pitch with a polynomial function. It is fitted to:
- the power of the harmonics ($v_n$)
- the ratio of harmonic energy to inharmonic energy ($w_H / w_I$)
[Figure: three panels plotted against fundamental frequency in Hz: power of the 1st harmonic, power of the 4th harmonic, and the ratio $w_H / w_I$]

Duration manipulation
- Manipulate the temporal envelope $E(r)$ by expanding or shrinking it between the onset ($r_{on}$) and the offset ($r_{off}$).
- Detection condition: $\left| \frac{dE(r)}{dr} \right| < \varepsilon$ and $E(r) > Th$
- The segments before the onset and after the offset are preserved; only the segment between them is expanded.
[Figure: temporal envelope $E(r)$ with $r_{on}$ and $r_{off}$ detected, outer segments preserved and middle segment expanded]

Preserving the vibrato
- The pitch trajectory $\mu(r)$ is analyzed and synthesized by a sinusoidal model.
[Figure: original pitch trajectory, analyzed, with preserved end segments and a synthesized, smoothed middle segment]

Synthesis from harmonics and inharmonics
- Harmonic signal ($s_H$): generated with a sinusoidal model.
- Inharmonic signal ($s_I$): generated from the inharmonic model, weighted by the inharmonic energy ($w_I'$).
- Output signal ($s$): obtained by adding these two signals.

Equations for the harmonic signal
- Harmonic signal: $s_H(t) = \sum_n A_n(t) \exp[j \phi_n(t)]$
- Instantaneous amplitude: $A_n(t) = w_H \, v'_n \, E(t)$
- Instantaneous phase: $\phi_n(t) = \phi_n(0) + \int_0^t n \mu'(\tau) \, d\tau$

A primed (') parameter is a manipulated parameter.
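
The three operations above (pitch-dependent feature function, duration manipulation, harmonic synthesis) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the slides say only that the feature function is a polynomial over pitch, so fitting in log2 of the fundamental frequency and the polynomial degree are assumptions; the envelope is assumed to be sampled at the audio rate; the pitch trajectory is assumed to be in Hz, so a factor of 2π is applied when integrating it into phase; the initial phase is taken as zero and only the real part of the complex sinusoid is kept. All function names are hypothetical.

```python
import numpy as np


def fit_feature_function(f0_hz, values, degree=2):
    """Fit a polynomial in log2(F0) to one timbral feature (e.g. v_n or w_H/w_I).

    Assumption: the log2 pitch axis and the degree are illustrative choices.
    """
    return np.polyfit(np.log2(f0_hz), values, degree)


def predict_feature(coeffs, f0_hz):
    """Evaluate the fitted pitch-dependent feature function at a target F0."""
    return np.polyval(coeffs, np.log2(f0_hz))


def stretch_envelope(envelope, onset, offset, ratio):
    """Preserve the attack (before onset) and decay (after offset) and
    resample only the segment between them to `ratio` times its length."""
    envelope = np.asarray(envelope, dtype=float)
    middle = envelope[onset:offset]
    new_len = max(2, int(round(len(middle) * ratio)))
    positions = np.linspace(0.0, len(middle) - 1, new_len)
    stretched = np.interp(positions, np.arange(len(middle)), middle)
    return np.concatenate([envelope[:onset], stretched, envelope[offset:]])


def synthesize_harmonics(w_h, v, envelope, f0_track, sample_rate):
    """Real part of s_H(t) = sum_n w_H * v_n * E(t) * exp(j * phi_n(t)).

    phi_n(t) is n times the running integral of the pitch trajectory;
    the 2*pi factor assumes f0_track is given in Hz.
    """
    v = np.asarray(v, dtype=float)
    envelope = np.asarray(envelope, dtype=float)
    f0_track = np.asarray(f0_track, dtype=float)
    phase = 2.0 * np.pi * np.cumsum(f0_track) / sample_rate
    n = np.arange(1, len(v) + 1)[:, None]          # harmonic indices 1..N
    amplitudes = w_h * v[:, None] * envelope[None, :]
    return np.sum(amplitudes * np.cos(n * phase[None, :]), axis=0)
```

A usage sketch with made-up numbers: an envelope with an 800-sample attack, a 1600-sample sustain, and an 800-sample decay is stretched by a factor of 4 in the sustain only, then used to synthesize three harmonics at 880 Hz.

```python
env = np.concatenate([np.linspace(0, 1, 800), np.ones(1600), np.linspace(1, 0, 800)])
env4 = stretch_envelope(env, onset=800, offset=2400, ratio=4.0)  # 800 + 6400 + 800 samples
signal = synthesize_harmonics(1.0, [0.6, 0.3, 0.1], env4, np.full(len(env4), 880.0), 16000)
```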

Symbols in the synthesis equations
- Harmonic energy: $w_H$
- Power of harmonics: $v'_n$
- Temporal envelope: $E(t)$
- Pitch: $\mu'(\tau)$

Evaluation in pitch manipulation

Baseline method: our method without the pitch-dependent feature function, which is equivalent to a sophisticated sinusoidal model

Criteria
- Spectral distance: evaluates the difference in the harmonic component.
- Mel-frequency cepstral coefficient (MFCC) distance: a quantitative auditory measurement that evaluates differences in both the harmonic and inharmonic components.

$D = \frac{1}{T} \sum_{f,t} \left( C_{\mathrm{real}}(f,t) - C_{\mathrm{syn}}(f,t) \right)^2$

where $C$ is the spectrum or the MFCC, "real" denotes the real sound, "syn" denotes the synthesized sound, and $T$ is the number of frames.

Conditions
- 32 instruments from RWC-MDB (forte, normal articulation)
- 3 individuals for each instrument
- 10-fold cross validation (10%:90% = [evaluation data]:[learning data])
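
The distance criterion above is simple enough to state as code. The following is a minimal sketch, assuming both feature matrices (spectrum or MFCC) are already computed, time-aligned, and stored with shape (features, frames); the function name is hypothetical and feature extraction is out of scope.

```python
import numpy as np


def frame_averaged_distance(c_real, c_syn):
    """D = sum over f and t of (C_real(f,t) - C_syn(f,t))^2, divided by T frames.

    Assumption: rows are frequency bins or MFCC coefficients, columns are frames.
    """
    c_real = np.asarray(c_real, dtype=float)
    c_syn = np.asarray(c_syn, dtype=float)
    if c_real.shape != c_syn.shape:
        raise ValueError("real and synthesized features must have the same shape")
    n_frames = c_real.shape[1]
    return float(np.sum((c_real - c_syn) ** 2) / n_frames)
```

Dividing by the number of frames makes the value comparable across sounds of different lengths, which matters when the same criterion is applied to many instruments.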

Quality in pitch manipulation

Average over the musical instruments
- Spectral distance: 64.7% reduced
- MFCC distance: 32.3% reduced
There was good improvement across the musical instruments as a whole. The fagot is used for the discussion below.
[Figure: spectral distance and MFCC distance per instrument, with the fagot highlighted for discussion]

Discussion on good improvement

The result of the fagot
- Baseline: distances increased with the absolute number of manipulated semitones.
- Ours: distances were stable.
The result demonstrates the validity of our method, which considers the pitch dependency of timbre.
[Figure: spectral distance and MFCC distance against manipulated semitones, from low pitch to high pitch; blue: our method, red: baseline method]

Discussion on poor improvement

There was poor improvement for instrument sounds that have a large inharmonic component in the attack segment.

The result of the mandolin
- Spectral distance: the relative amplitudes of the harmonic peaks of a synthesized sound are close to those of a real sound.
- MFCC distance: the distribution of the inharmonic component of a synthesized sound differs from that of a real sound.
The ratio $w_H / w_I$ alone is insufficient to capture the pitch dependency of the inharmonic component.
[Figure: spectral distance and MFCC distance against manipulated semitones, from low pitch to high pitch; blue: our method, red: baseline method]

Conclusion

Objective
Manipulating the pitch and duration of a musical instrument sound, using multiple instrument sounds, without distorting timbral characteristics

Approach
- We defined and analyzed timbral features.
- In pitch manipulation, we model the pitch dependency of timbre with a pitch-dependent feature function.
- In duration manipulation, we preserve the attack, the decay, and the vibrato.

Future work
- Incorporating other dependencies (e.g., volume)
- Evaluating our method for duration manipulation
- Applying our method to musical instrument parts separated from the polyphonic audio signals of commercial CD recordings

Pitch manipulation demo for piano
- Real sounds: seed (440 Hz), ref. (880 Hz)
- Synthesized sounds (880 Hz): phase vocoder, STRAIGHT, sinusoidal model, our method (note 2)

Duration manipulation demo for violin
- Real sound: seed, ref. (length)
- Synthesized sounds (length 4): sinusoidal model, our method

Pitch manipulation demo for trumpet
- Real sounds: seed (440 Hz), ref. (880 Hz)
- Synthesized sounds (880 Hz): phase vocoder, STRAIGHT, sinusoidal model, our method (note 2)

Notes
1. From MARSYAS.
2. Does not use the Ref. sound as learning data.