SKEW DETECTION AND COMPENSATION FOR INTERNET AUDIO APPLICATIONS. Orion Hodson, Colin Perkins, and Vicky Hardman

Department of Computer Science, University College London, Gower Street, London, WC1E 6BT, UK.

ABSTRACT

In long-lived audio streams, such as music broadcasts, small differences in clock rates lead to buffer underflow or overflow events in receiving applications that manifest themselves as audible interruptions. We present a low complexity algorithm for detecting clock skew in network audio applications that function with local clocks and in the absence of a synchronization mechanism. A companion algorithm to perform skew compensation is also presented. The compensation algorithm utilises the temporal redundancy inherent in audio streams to make inaudible playout adjustments. Both algorithms have been implemented in a simulator and in a network audio application. They perform effectively over the range of observed clock rate differences and beyond.

1. INTRODUCTION

When sampling an audio signal for digital transmission, the sample clock will differ from its nominal rate due to variations in the quartz crystal oscillators and the components that regulate their frequency. During the implementation of a network audio system [3] for workstations and personal computers we observed 0.5% variation between nominally similar clocks. This has undesirable effects: when the sender's clock is faster than the receiver's, samples accumulate, consuming memory and increasing the delay. As the memory available for the audio buffer is exhausted, interruptions occur as frames are dropped from the stream. Conversely, when the sender's clock is slower than that of the receiver, audio played out at the receiver is interrupted as the playout buffer runs dry.

We present two algorithms: one for detecting clock skew, and one for compensating for its effects. Due to the presence of loss and jitter in the packet arrival process, it is not sufficient to merely observe playout buffer occupancy to detect clock skew.
Instead, observation of the packet arrival process is necessary, making the detection of clock skew a non-trivial problem. Our compensation algorithm uses a novel low complexity pattern matching algorithm to identify periods where adjustments may be performed.

Existing research on Internet audio applications has focused on issues arising from the best effort nature of the network, such as loss concealment and forward error correction [9], and the calculation of suitable playout points in the absence of synchronization mechanisms [7, 10]. Work has also been undertaken on the detection of clock skew between hosts [6, 8], though this is the first work we are aware of that addresses the issues of transparent skew adjustment for network audio applications.

This paper is structured as follows: section 2 concerns the detection of clock skew in unsynchronized audio applications; section 3 describes the compensation algorithm to be applied when skew is perceived and its performance on voice, popular, and classical music streams; section 4 presents some preliminary results illustrating the performance of these algorithms. A section on implementation issues concludes the contributions of this work. Finally, we summarize and discuss potential future work.

Figure 1: Playout buffering in a network audio application (from [5]). (Diagram labels: Delay Adaptation Buffer, Playout Delay.)

2. SKEW DETECTION ALGORITHM

In systems with unsynchronized clocks, such as RTP [12] audio applications, an offset must be added to the timestamp of each arriving packet to map it from its source's time into local time. An additional delay, the playout delay, is added to compensate for arrival time variation (jitter) and for variations in host scheduling [4]. The playout delay and mapping offset are usually held constant over the duration of the packet stream to avoid interruption.

Detection of clock skew can be performed by maintaining a running estimate of the mapping offset, $\bar{m}$, and looking for divergence from the mapping offset assigned to the current packet stream, $m_{active}$.
Assuming the $i$-th packet has a timestamp indicating its source time, $t_i$, and the receiver records its arrival time, $a_i$, these variables are calculated using:

$$m_i = a_i - t_i \qquad (1)$$

$$\bar{m}_i = \alpha \bar{m}_{i-1} + (1 - \alpha) m_i \qquad (2)$$

with the initial $m_{active}$ being $m_0$. When $\bar{m}$ and $m_{active}$ have diverged sufficiently, compensating action must be taken. We denote the divergence as $\delta = m_{active} - \bar{m}$.

Each value of $m_i$ calculated includes a component attributable to the variable transit time between the sender and receiver. The responsiveness to these variations is determined by the parameter $\alpha$ in equation 2. There is a trade-off between being resilient to short term fluctuations and the delay introduced by the exponential weighting process. We have found by experimentation a value of $\alpha = 31/32$ to represent a good compromise with packet sizes of 20, 40, and 80 ms.

It is desirable to limit the number of compensating actions to keep the computational cost low and to reduce sensitivity to transient transit variations. Low and high water marks, $\delta_L$ and $\delta_H$, are therefore employed. The placement of the water marks is influenced by implementation issues that are deferred until section 5. We therefore arrive at the following formulation:

    SAMPLESTOCORRECT($m_{active}$, $\bar{m}$)
    1  $\delta \leftarrow m_{active} - \bar{m}$
    2  if $\delta < \delta_L$ or $\delta > \delta_H$ then
    3      return $\delta$
    4  return 0

The returned value is passed to the skew compensation algorithm as the upper limit on the number of samples to add or remove from the stream. A positive divergence value indicates that the source's clock is faster than the receiver's and $\delta$ samples should be deleted from the stream. Conversely, a negative divergence value indicates that the source's clock is relatively slow and $|\delta|$ samples need to be inserted. When the compensation algorithm, described in the next section, makes adjustments it changes the value of $m_{active}$ accordingly.

This algorithm suffices for symmetric distributions of transit variations. In reality, outliers exist in the distribution of transit times, particularly those arising from packet compression events [1] that adversely influence the mapping offset. An additional mechanism, derived from Ramjee et al.'s algorithm 4 [10], is therefore employed to identify periods of packet compression. During these periods the estimate of the mapping offset is not updated. Compression periods are typically of the order of 1 second in duration; a period short enough not to have repercussions at the observed skew rates.

The short-term memory of the offset estimate means that variations in the mean end-to-end delay are indistinguishable from clock skew. In contrast to previous work [6, 8], our algorithm is not a robust skew estimator. Our goal is to maintain the buffer occupancy within a constrained region. Both clock skew and slow evolutions of the mean transit time manifest themselves as changes in the buffer occupancy, and it is these that the compensation algorithm caters for.
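The detection steps described above (the per-packet offset of equation 1, the exponentially weighted estimate of equation 2, and the water-mark test) can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' implementation: the class name, the method names, the `compressed` flag, and the choice to express arrival times and timestamps in units of samples are all assumptions made for the example. How packet compression is identified is left to the caller.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SkewDetector:
    """Hypothetical helper: running mapping-offset estimate with water marks."""
    alpha: float                       # weighting parameter of equation 2
    low_water: float                   # delta_L, in samples (typically negative)
    high_water: float                  # delta_H, in samples (typically positive)
    m_active: Optional[float] = None   # offset applied to the current stream
    m_bar: Optional[float] = None      # running estimate of the mapping offset

    def update(self, arrival: float, timestamp: float,
               compressed: bool = False) -> float:
        """Process one packet; return the samples to correct, or 0 for none."""
        m = arrival - timestamp        # equation 1
        if self.m_bar is None:
            # First packet: the initial active offset is m_0.
            self.m_active = self.m_bar = m
        elif not compressed:
            # Equation 2; skipped while packet compression is suspected.
            self.m_bar = self.alpha * self.m_bar + (1.0 - self.alpha) * m
        delta = self.m_active - self.m_bar
        if delta < self.low_water or delta > self.high_water:
            return delta               # > 0: delete samples, < 0: insert samples
        return 0.0

    def adjusted(self, samples: float) -> None:
        """Record samples actually removed (positive) or inserted (negative)."""
        self.m_active -= samples
```

The state kept per stream is two numbers and each packet costs a constant amount of work. A caller would pass the value returned by `update` to the compensation stage as an upper limit, then report the adjustment actually made through `adjusted` so that the divergence shrinks accordingly.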
Our estimator comes at a reduced cost, $O(1)$ compared to $O(n)$, owing to the simpler goal.

3. COMPENSATION ALGORITHM

When the skew detection algorithm indicates that the number of samples in the receiver's playout buffer requires adjustment, a compensation mechanism is applied. We have experimented with several schemes, and the mechanism we present here represents the most successful. Potentially simpler schemes, such as regular (and irregular) sample insertion and deletion, introduce audible distortion that is attributable to phase discontinuity.

The approach we adopt utilizes the temporal redundancy inherent in audio signals to identify segments of audio that may be repeated or cut from the stream to adjust the playout buffer occupancy. A simple and low-cost heuristic is used to locate repetitive segments within a frame:

$$E(t) = \frac{1}{L} \sum_{j=0}^{L-1} \left| s(w + j) - s(t + j) \right| \qquad (3)$$

where $s(t)$ represents the amplitude of sample $t$ in the frame, $w$ is the position of the match window start, $L$ is the comparison window width, and $m$ is the position in the frame where the best match commences, that is, the value of $t$ for which $E(t)$ is smallest. When the smallest value of $E(t)$ for a frame is below a threshold, $T$, the portion of audio between the window and the match location is deemed acceptable for repeating or cropping. The parameters $T$ and $L$ are chosen to be 12 (using a 16-bit linear representation) and 8 samples respectively after aural testing. The match window and region searched do not overlap. The matching process is illustrated in figure 2.

Earlier work on repairing missing segments from packetized audio streams [2, 11, 13] scales the samples between the window and the search region. The heuristic we present selects only those segments with a relatively stationary signal gain. Whilst this reduces the number of segments available for cropping and repeating compared to the earlier work cited, no costly gain scaling is required to mask the insertion and deletion operations.

The location of the comparison window depends on whether a segment is to be inserted or deleted. When a segment is to be deleted the match window begins at the start of the frame.
Conversely, when samples are to be inserted the match window is placed at the end of the frame. The insertion or deletion is applied and the transition is masked using a linear blending function. Frames in the playout buffer after the adjustment point have their playout time shifted to maintain continuity. The mapping offset used, $m_{active}$, is updated and subsequent packets have the updated offset applied. The window locations and operations are depicted in figure 3.

4. RESULTS

To evaluate the effectiveness of the compensation algorithm we applied artificial clock skew rates of 0.5%, 2%, and 5% relative to the intended playback rate to three sample streams comprising voice, popular music, and classical music (described in appendix A). The skew compensated and original samples were then played through a high quality headset to 10 listeners. Skew compensation adjustments of 0.5% and 2% were not detected by any of the listeners. At 5%, compensating adjustments were apparent to all listeners of the classical music piece. However, at 5% none of the listeners noticed the adjustments to the pop music track and only one listener commented on the brevity of the pauses between speech segments in the voice track.

The compensating algorithm has difficulty with audio sources containing prolonged periods without stationary repetitive segments. Sources with rich and wide ranging harmonic content, such as classical music, exemplify the problem. In the periods without stationary repetitive segments the number of samples above or below the threshold increases. When a passage with a stationary period arises, many adjustments occur in close succession, introducing a warbling effect. Sources like voice, which have frequent low energy components and a high fraction of stationary repetitive segments, are able to have adjustments made more frequently. A comparison of the number of samples in excess when adjustments are made for a faster source is shown in figure 4.
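For reference, the segment matching and blending procedure of section 3, on which these results depend, can be sketched as follows. This is a minimal sketch under stated assumptions rather than the authors' code: it assumes the residual is a mean absolute difference over the comparison window, treats the `limit` argument as the upper bound supplied by the detector, and uses hypothetical function names throughout.

```python
from typing import Iterable, List, Optional, Sequence


def residual(frame: Sequence[float], w: int, t: int, width: int) -> float:
    """Mean absolute difference between the window at w and a candidate at t."""
    return sum(abs(frame[w + j] - frame[t + j]) for j in range(width)) / width


def best_match(frame: Sequence[float], w: int, width: int, threshold: float,
               candidates: Iterable[int]) -> Optional[int]:
    """Return the candidate with the smallest residual below threshold, if any."""
    best_t, best_e = None, threshold
    for t in candidates:
        e = residual(frame, w, t, width)
        if e < best_e:
            best_t, best_e = t, e
    return best_t


def splice(frame: Sequence[float], src: int, dst: int, width: int) -> List[float]:
    """Jump from src to dst, cross-fading linearly over width samples."""
    blended = [((width - j) * frame[src + j] + j * frame[dst + j]) / width
               for j in range(width)]
    return list(frame[:src]) + blended + list(frame[dst + width:])


def delete_samples(frame: Sequence[float], width: int, threshold: float,
                   limit: int) -> List[float]:
    """Remove at most `limit` samples; the match window is at the frame start."""
    last = min(limit, len(frame) - width)
    m = best_match(frame, 0, width, threshold, range(width, last + 1))
    return list(frame) if m is None else splice(frame, 0, m, width)


def insert_samples(frame: Sequence[float], width: int, threshold: float,
                   limit: int) -> List[float]:
    """Repeat at most `limit` samples; the match window is at the frame end."""
    w = len(frame) - width
    first = max(0, w - limit)
    m = best_match(frame, w, width, threshold, range(first, w - width + 1))
    return list(frame) if m is None else splice(frame, w, m, width)
```

Both operations return the frame unchanged when no candidate falls below the threshold, which corresponds to the periods without stationary repetitive segments in which no adjustment can be made. The candidate ranges start one window width away from the match window so that the window and the region searched never overlap.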
The distributions of the size of adjustments (not shown) are near identical for each audio stream type; the adjustment sizes are evenly distributed across the available range.

We have also implemented the algorithm in an RTP audio application [3] and monitored its behaviour with multicast sessions of terrestrial radio station programming and unicast tests between UCL and other institutions. We have observed skew rates of 0.5% and have not observed any ill effects arising from compensating actions.

Figure 2: Identification of best matching audio segment within an audio frame of the voice test sample. The audio signal is depicted in the top graph together with the match window and the best matching segment. The residual is depicted in the lower graph. (Axes: amplitude s(t) and residual E(t) against time t; the threshold is marked on the lower graph.)

Figure 3: Deletion and insertion of repeated segments in the playout buffer. The arrows indicate audio segments that are blended to conceal insertion or deletion operations. (Diagram labels: Original Buffer, Shortened Buffer, Expanded Buffer, Search Window, Best Match, Blended samples, Discarded.)

Figure 4: The cumulative distribution of the number of samples over those expected, from a faster clocked audio source, before an adjustment is possible. (Series: Voice, Pop Music, Classical Music; horizontal axis: Sample Surplus at Time of Adjustment.)

5. IMPLEMENTATION CONSIDERATIONS

We now consider the practical issues that implementing the detection and compensation algorithms presents. The key issue is the selection of the low and high water marks that are used to determine whether an adjustment is necessary.

The low water mark is used to trigger sample insertion when the source is deemed to have a relatively slow clock. It is desirable to trigger sample insertion before the application runs out of audio to play and has to invoke a last-chance loss concealment mechanism or play out silence. We have found that both concealment and silence substitution perform poorly when compared to the skew compensation algorithm presented. If it is not possible to perform skew compensation, loss concealment is potentially less damaging than silence substitution, though its performance is heavily dependent on the signal content. The low water mark is therefore determined by how many samples can be deficient before starvation effects occur.
Our present implementation uses a fraction of the playout delay incurred due to network effects (as opposed to the delay introduced to compensate for scheduling variation in the host system [4]), but this is an ad-hoc measure, and further work is needed to refine its performance. As a refinement, when it is determined that a source's clock is consistently slow, a receiver may add additional delay beyond the minimum necessary to ensure correct playout. This prevents interruption during periods in which the skew compensation algorithm cannot function due to the lack of stationary segments in the audio stream.

The criterion for the high water mark depends on how much delay and memory consumption the receiver is prepared to tolerate. The high water mark needs to be placed a good distance from the low; sufficiently far that oscillations do not occur between positive and negative playout adjustments. Our implementation presently uses a fixed value of 2ms, which is sufficiently large to handle observed network transit delay variations, and short enough that the additional playout delay which may be incurred is not an issue for interactive sessions.

6. SUMMARY AND FUTURE WORK

We have presented a low complexity algorithm to detect clock skew between remote audio applications, and an algorithm to compensate by adding or removing stationary audio segments. We have found these to work well in simulation and in a real application.

Our detection algorithm is affected by slow variations in end-to-end delay, such as those arising from demand driven variations in router queue lengths. These changes manifest themselves in a similar manner to clock skew, and our compensation algorithm also adapts to these changes. In practice, we have found the combination of the two algorithms to work well.

We believe our compensation algorithm could also be used to effect small changes in playout point, if needed. Applications are typically conservative and choose a large initial playout delay, since they have insufficient information about the transit time variation to be more aggressive. The compensation algorithm presented could make adjustments to the playout point for those streams which do not have natural breaks.

The use of stationary segments for the compensation is both a strength and a weakness of this approach. It enables inaudible adjustments, but at a cost of delay when suitable segments are not present in the audio stream. In future work we intend to compare the compensation scheme with a resampling algorithm that may conceal phase shifts better than individual sample insertion and deletion schemes.

7. ACKNOWLEDGEMENTS

We thank the members of our research group who participated in the listening tests used to assess the effectiveness of the compensation algorithm. We are indebted to Jim Gemmell, Mark Handley, and Isidor Kouvelas for feedback on this work, and to Mark Handley and Jitendra Padhye for accounts on their computing facilities. Funding from British Telecommunications plc (ML72254) and the European Commission Telematics for Research project (RE47) facilitated this work.

A. AUDIO SAMPLES

Two minute audio samples of the following pieces were used in evaluating the algorithms presented:

"You've got to build bypasses", excerpt from Hitch-Hiker's Guide to the Galaxy, Douglas Adams, BBC Worldwide Ltd, 1978.
"Imaginary Friends", from Dizzy Heights, The Lightning Seeds, Sony Music Entertainment, 1996.
Brandenburg Concerto No. 4 in B flat major, Johann Sebastian Bach, recording of the English Chamber Orchestra / Benjamin Britten, Decca Record Company, 1969.

B. REFERENCES

[1] Jean-Chrysostome Bolot. Characterizing end-to-end packet delay and loss in the Internet. Journal of High Speed Networks, 2(3):305-323, 1993.
[2] David J. Goodman, Gordon B. Lockhart, Ondria J. Wasem, and Wai-Choong Wong. Waveform substitution techniques for recovering missing speech segments in packet voice communications. IEEE Transactions on Acoustics, Speech, and Signal Processing, ASSP-34(6):1440-1448, December 1986.
[3] Orion Hodson and Colin Perkins. Robust audio tool (RAT) version 4. Software available online at http://www-mice.cs.ucl.ac.uk/multimedia/software/rat-4., December 1999.
[4] Isidor Kouvelas and Vicky Hardman. Overcoming workstation scheduling problems in a real-time audio tool. In Proc. of Usenix Winter Conference, Anaheim, California, January 1997.
[5] Isidor Kouvelas, Vicky Hardman, and Anna Watson. Lip synchronisation for use over the internet: Analysis and implementation. In Proceedings of the IEEE Conference on Global Communications (GLOBECOM), London, England, November 1996.
[6] Sue Moon, Paul Skelly, and Don Towsley. Estimation and removal of clock skew from network delay measurements. In Proceedings of the Conference on Computer Communications (IEEE Infocom), New York, March 1999.
[7] Sue B. Moon, Jim Kurose, and Don Towsley. Packet audio playout delay adjustment: performance bounds and algorithms. ACM/Springer Multimedia Systems, 5(1):17-28, January 1998.
[8] Vern Paxson. On calibrating measurements of packet transit times.
In Proceedings of the ACM Sigmetrics Conference on Measurement and Modeling of Computer Systems, pages 11-21, Madison, Wisconsin, June 1998.
[9] Colin Perkins, Orion Hodson, and Vicky Hardman. A survey of packet loss recovery techniques for streaming audio. IEEE Network, 12(5):40-48, September 1998.
[10] Ramachandran Ramjee, Jim Kurose, Don Towsley, and Henning Schulzrinne. Adaptive playout mechanisms for packetized audio applications in wide-area networks. In Proceedings of the Conference on Computer Communications (IEEE Infocom), pages 680-688, Toronto, Canada, June 1994. IEEE Computer Society Press, Los Alamitos, California.
[11] Henning Sanneck, Alexander Stenger, Khaled Ben Younes, and Bernd Girod. A new technique for audio packet loss concealment. In Jon Crowcroft and Henning Schulzrinne, editors, Proceedings of Global Internet, pages 48-52, London, England, November 1996. IEEE.
[12] Henning Schulzrinne, Steve Casner, Ron Frederick, and Van Jacobson. RTP: a transport protocol for real-time applications. Request for Comments (Proposed Standard) 1889, Internet Engineering Task Force, January 1996.
[13] Ondria J. Wasem, David J. Goodman, Charles A. Dvorak, and Howard G. Page. The effect of waveform substitution on the quality of PCM packet communications. IEEE Transactions on Acoustics, Speech, and Signal Processing, 36(3):342-348, March 1988.