Synchronous Capture of Image Sequences from Multiple Cameras. P. J. Narayanan, Peter Rander, Takeo Kanade CMU-RI-TR-95-25



Synchronous Capture of Image Sequences from Multiple Cameras
P. J. Narayanan, Peter Rander, Takeo Kanade
CMU-RI-TR-95-25
The Robotics Institute, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213
December 1995
(c) 1995 Carnegie Mellon University

Abstract

Several applications today need to digitally capture every frame of a video stream from a camera. These range from psychological studies to surveillance to video processing. Some applications also need to capture the frames from multiple video streams synchronously and to correlate them with one another. A video stream of color data, though, represents a sustained bandwidth of about 26 MBytes per second from the digitizing hardware to a secondary storage device without compression. This rate is well beyond the capabilities of most affordable systems today. We present a system that can synchronously capture every frame, or any user-specified subset of them, from multiple cameras at full resolution and store them on a regular secondary storage device. The outputs of the cameras are recorded on tape using conventional VCRs. The Vertical Interval Time Code (VITC) is inserted onto each stream before recording. Each tape is played back, repeatedly under computer control if necessary, off-line on an editing VCR to grab the frames using a commercial digitizer. The VITC data is used to directly identify the frames while the tape is played back. We believe this is the first system in the world that can capture every frame from multiple video streams synchronously, fully scalable in the number of streams and the duration of capture. Finally, the system is inexpensive, costing $500 per channel. We present the system, its components, and the process of verifying the capturing process. We then discuss a few computer vision research projects made feasible by such a system.

1 Introduction

Several applications today need to digitally capture every frame of a video stream from a camera. These range from image/video processing to psychological studies to surveillance. Many also need to capture the frames of multiple video streams synchronously and correlate the data from multiple viewing angles. A video stream from a camera following the NTSC standard produces 30 frames of video data per second, each of which is typically digitized into 480 rows of 640 pixels, with gray level values represented using an 8-bit number for monochrome images and three 8-bit values for color images. Thus, a video stream represents a sustained bandwidth of 26 MBytes per second in color (9 MBytes per second for monochrome) without compression. Though most frame grabbers digitize in real time, this rate is well beyond the throughput of most secondary storage devices where the captured frames are to be stored, even with the best lossless compression. Most image processing applications, such as the stereo computation needed in our application, cannot tolerate lossy compression, being interested in the minute variations of image structure.

There are a few commercially available high-end systems with the capacity to grab frames at video rate. They use expensive hardware and specialized disk systems to achieve the necessary bandwidth and throughput. For example, a system containing digitizers and a set of the MD1 family of digital image recorders from Datacube, configured to give approximately 10 minutes of recording time, costs approximately $25,000 per video stream, with additional streams costing nearly as much. An off-line solution to the problem can be achieved by recording each video stream onto a laser video-disc recorder after embedding its frames with unique time stamps. The high-end laser disc players can provide frame-accurate readout of each video stream while digitizing it off-line. However, such a recorder typically costs $15,000, and a multi-stream system can get prohibitively expensive since each stream requires a separate recorder. Moreover, the system is designed for visual reproduction and could employ a lossy compression prior to storing that might have negative side effects in image processing applications.

Frankel and Webb developed a scalable multi-camera interactive video capture system using an iWarp, a general purpose massively parallel processor [1]. The digitized camera streams are fed to the internal data pathway of the iWarp and stored in the local memory modules of the node processors. The primary memory capacity of the iWarp limited the number of frames that could be held in memory. An iWarp system with 256 MBytes of memory can capture about 33 seconds (1024 frames) of a single monochrome video stream or 2 seconds (32 frames) of 16 video streams. This system serves applications for which the above rates are sufficient and could be economical if an iWarp were already available.

We present a system that can capture every frame, or any user-specified subset of them, of many cameras at full resolution and store them on a regular secondary storage device. It stores the image data from each camera on an ordinary S-VHS tape (any format could be used). The Vertical Interval Time Code (VITC) is inserted onto each stream before recording. These tapes are played back off-line on an editing VCR and digitized using an off-the-shelf digitizer attached to a workstation. Each tape is played back on the VCR repeatedly under computer control until all the necessary frames are captured. The VITC is used to directly identify the frames as the tape is played. The time code is also used to correlate frames from multiple cameras, which are all synchronized to a common sync signal.
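For reference, the bandwidth figures quoted above follow directly from the NTSC digitization geometry; the short check below uses only the numbers already given (480 rows, 640 pixels, one or three 8-bit values per pixel, 30 frames per second).

    # Sustained bandwidth of one uncompressed NTSC stream at the quoted geometry.
    ROWS, COLS, FPS = 480, 640, 30
    mono  = ROWS * COLS * 1 * FPS      # one 8-bit value per pixel
    color = ROWS * COLS * 3 * FPS      # three 8-bit values per pixel
    print(f"monochrome: {mono  / 2**20:.1f} MBytes/s")   # about 9
    print(f"color:      {color / 2**20:.1f} MBytes/s")   # about 26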
In order to digitize each video tape automatically, we use a computer-controllable VCR, although a manually-controlled VCR can be used for interactive digitization. The strong points of our system are long recording capacity (limited only by tape length) and low cost per channel. The cost of the recording setup, a VCR plus the VITC unit, is $500 per channel, scalable to any number of channels. The digitizing setup for automated image capture costs $5000 for the Panasonic DS-850 VCR in addition to the cost of a commercial frame grabber. The weak points of our system are the following. One, the video data is stored on conventional tapes using commercial VCRs before digitizing, reducing the visual quality.

Two, the process could be time consuming for large numbers of video streams, since each tape needs to be digitized independently and separately.

In this paper, we present how we record the outputs of the cameras synchronously and how we digitize the tapes to recover the synchronized video streams. The VITC time code plays an important role in our system; we therefore briefly describe the VITC standard in the next section. Section 3 presents our recording setup and Section 4 presents the digitizing process. We describe our procedure to independently verify the completeness and synchronization of the process in Section 5. In Section 6 we sketch how the system is used in a few computer vision research projects. Throughout the paper we use numbers specific to the NTSC standard (525 lines, 30 frames per second), but our system is not conceptually limited to NTSC.

2 Vertical Interval Time Code

The Society of Motion Picture and Television Engineers (SMPTE) has defined two standards of time codes that uniquely identify the frames of a video stream: the Longitudinal Time Code (LTC) and the Vertical Interval Time Code (VITC) [6]. Both standards assign a number to each frame in the hours-minutes-seconds-frames format, while VITC also encodes the field number and includes 8 bits for error detection (using a Cyclic Redundancy Check, or CRC). The two standards differ in how the code is stored. LTC, consisting of eighty bits of time code for each frame, is stored on an audio track simultaneously with the video data. VITC, consisting of ninety bits of time code for each field, is stored on two horizontal scan lines during the blanking period of each field. VITC encodes the 1 bits in the data as short bright streaks and the 0 bits as dark ones. For each field, all ninety bits are stored on one scan line and repeated on the scan line two away from it for redundancy. VITC inserts two synchronization bits before each group of data instead of all at the end of the code as done by LTC.

Recording the LTC on an audio track by fanning the audio signal out to every recording device is a convenient method to correlate the frames of multiple video streams that are electronically synchronized with one another. However, LTC time codes cannot be reliably read at slow speeds. VITC does not have that drawback, being recorded as video data in each field; the time code can be read properly even in still modes. The extra sync bits and the CRC bits make VITC a more reliable time code standard. Another feature of VITC makes it attractive from our point of view: the time code can be acquired as bright and dark streaks by the frame grabber as part of the captured image. It can be interpreted directly while digitizing, or at a later time from the stored image. LTC would require additional hardware to convert the audio signal into something the computer can interpret, which would require extra synchronization between the video and LTC signals.

A line of VITC data consists of nine groups of 10 bits, each containing 2 sync bits and 8 bits of data. 32 user bits, preset by the user at start, identify the run and are repeated on each field unless the user changes them. The hours of the time code (ranging from 00 through 23) are stored as two decimal digits using 6 bits (2 for the tens digit + 4 for the units digit). Minutes and seconds (00 through 59) are each stored as two decimal digits using 7 bits (3 for the tens digit + 4 for the units). The frame number (00 through 29) is stored as two decimal digits using 6 bits (2 for the tens + 4 for the units). Single flag bits identify other information, such as whether the field is odd or even.
The last group of 8 data bits provides a cyclic redundancy check to validate the data.

Figure 1 shows the time code digitized as a series of bright and dark streaks at the top of a frame. Consecutive lines of time code data are from different fields of the same frame. The time code information is repeated in each field after a gap of one scan line. Each bit of the data, whether 1 or 0, has a fixed duration; the total time for the 90 bits of one time-code line is 50.286 microseconds.

(Footnote: NTSC video contains 30 frames, or images, per second, with each frame composed of two interlaced fields. Field 1 contains the odd lines of the frame, while field 2 contains the even lines.)
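To make the bit layout above concrete, the sketch below decodes one 90-bit VITC line into hours, minutes, seconds, and frames. It is our illustration, not the system's actual decoder: it assumes the bits have already been recovered from the bright and dark streaks, and the placement of the time digits and user bits within the nine groups follows the SMPTE layout as we summarize it here, so the group indices should be treated as an assumption.

    # Illustrative decoder for one 90-bit VITC line (nine 10-bit groups, each a
    # 2-bit sync prefix followed by 8 data bits, least significant bit first).
    # Group and flag positions are our reading of SMPTE 12M, not taken from the paper.
    SYNC = (1, 0)   # assumed value of the two synchronization bits

    def split_groups(bits):
        """Strip the sync bits and return the nine 8-bit data groups."""
        assert len(bits) == 90
        groups = []
        for g in range(9):
            chunk = bits[10 * g: 10 * g + 10]
            if tuple(chunk[:2]) != SYNC:
                raise ValueError(f"bad sync bits in group {g}")
            groups.append(chunk[2:])
        return groups

    def bcd(bits):
        """Value of a least-significant-bit-first binary coded digit."""
        return sum(b << i for i, b in enumerate(bits))

    def decode_vitc(bits):
        g = split_groups(bits)
        frames  = 10 * bcd(g[1][:2]) + bcd(g[0][:4])   # 2 tens bits + 4 units bits
        seconds = 10 * bcd(g[3][:3]) + bcd(g[2][:4])   # 3 tens bits + 4 units bits
        minutes = 10 * bcd(g[5][:3]) + bcd(g[4][:4])
        hours   = 10 * bcd(g[7][:2]) + bcd(g[6][:4])
        crc     = bcd(g[8])   # last group: 8-bit CRC, not verified in this sketch
        return hours, minutes, seconds, frames, crc

In the real system the bits themselves come from thresholding the digitized blanking lines, at roughly 7 pixels per bit, as described in the next section.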

Figure 1: Vertical blanking portion of a frame containing VITC data.

We make the VITC part of the digitized image by configuring the digitizer to grab the scan lines of the blanking period. This key feature enables us to grab all frames of the video stream reliably, as explained in later sections. We extract the time code from the white and black streaks of encoded VITC data. Each bit of the time code spans approximately 7 digitized pixels (an NTSC video frame is typically digitized to 640 pixels horizontally, so with 90 VITC bits per scan line the bit time is 640/90, or about 7 pixels per bit). The synchronization bits at the start of each group are used to identify the groups of time-code bits.

3 Synchronous Multi-stream Video Recording

Our approach to synchronously recording analog video from multiple cameras can be broken down into three steps: synchronize the video cameras themselves using a common sync signal so that they acquire image frames simultaneously, embed a time stamp within each frame in all video streams, and finally record the video streams to standard Video Cassette Recorders (VCRs). By synchronizing the individual cameras, the images in the video streams are acquired at the same "absolute" time. By recovering the unique time stamp embedded in every field of each video stream, we can easily correlate the frames from multiple streams. By storing the video for later use, we can perform slower, off-line digitization yet ensure the recovery of every frame of each stream.

A block diagram of our multi-camera synchronous recording system is shown in Figure 2. An external sync signal is supplied to each camera as well as to a time-code generator, synchronizing them to one another. The time code generator creates a unique time stamp in the Longitudinal Time Code (LTC) format, synchronized with the control signal. This time code is fed to each time-code inserter, which inserts the VITC into the video stream. Finally, standard VCRs record the video stream in real time onto tapes.

Figure 2: Block diagram of the synchronous, multi-channel video recording system (a sync generator and LTC time-code generator driving N cameras, N VITC inserters, and N VCRs).

4 Synchronous Multi-stream Digitization

A set of frames, one from each stream, recorded at the same time instant comprises a synchronous snap-shot of the multi-stream data. A number of such snap-shots, 30 in the NTSC system, are taken each second. The task of synchronous multi-stream digitization is to provide all the snap-shots the user requests. We can achieve this by synchronously processing the tapes, recorded as described in the previous section.

It is common in image processing to digitize images from video tapes playing in a VCR. The user pauses the VCR at each of the required frames and instructs the computer to grab the frame. (The process of pausing and grabbing the required frames given a manually selected starting point could be automated using a computer-controlled VCR.) This method has two drawbacks from our point of view. One, we have found that most editing VCRs are not frame-accurate: the frame number they think they are stopped at may be off by one or two frames from the real one, even on sophisticated VCRs. This makes it impossible to accurately correlate frames from two different tapes using the VCR to select time codes. The second reason concerns the resolution of the image output by the VCR in freeze or pause mode. Video "consumer" devices such as a monitor or a frame grabber combine alternate fields supplied to them into a frame. VCRs output the odd and even fields stored on tape alternately when playing, ensuring proper frame composition. Most VCRs, however, position the head over a single field when paused. That field is therefore output as both the odd and even fields of the video data, reducing the effective vertical resolution by a factor of two. Figure 3 shows the drop in the vertical resolution on a portion of the image digitized at play speed and while paused. Finally, most VCRs do not allow the user to select the field on which to freeze, making it impossible to capture every frame of even one video stream in full resolution.

Figure 3: An edge digitized (a) at play speed and (b) while paused. Notice the reduction in vertical resolution in (b).

We overcome the frame inaccuracy by directly interpreting the time code from the image, and we capture in full resolution by digitizing while the VCR is playing at normal speed. A block diagram of the setup is shown in Figure 4. We configure the frame grabber (a commercially available K2T V300 digitizer for SBus, developed at CMU) to grab the blanking period containing the VITC as part of the image; Figure 1 shows the VITC portion of a frame grabbed by the system. The black and white streaks of VITC data are interpreted by the computer on the fly to obtain the time code for each field as the tape is playing. If the current frame/field is one the user needs, the digitizer is instructed to freeze it so that it can be transferred from the digitizer's memory to the processor memory and/or secondary storage. The tape continues to play and the system captures as many unread frames from the user's list as possible. When the tape goes beyond the last frame of interest to the user, it is rewound to the starting frame of the user's interest under computer control. This continues until all of the desired frames/fields are captured.
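The capture loop just described can be summarized in a few lines. This is only a sketch of the control flow, not the authors' software: vcr and digitizer stand for hypothetical wrappers around the computer-controlled VCR and the frame grabber, read_vitc() is assumed to decode the VITC streaks in the grabbed blanking lines, and wanted is the user's list of time codes.

    # Sketch of the multi-pass digitization loop (hypothetical device wrappers).
    def capture(vcr, digitizer, wanted, first, last, store):
        """Grab every time code in `wanted`, replaying the tape as many times
        as the storage bandwidth requires."""
        remaining = set(wanted)
        while remaining:
            vcr.seek(first)                 # rewind to just before the first frame of interest
            vcr.play()                      # normal play speed, so full vertical resolution
            while remaining:
                frame = digitizer.grab()    # live frame, blanking lines included
                tc = read_vitc(frame)       # interpret the VITC streaks on the fly
                if tc in remaining:
                    digitizer.freeze()      # hold the frame in the digitizer's memory
                    store(tc, frame)        # transfer to main memory / disk (the slow step)
                    digitizer.unfreeze()
                    remaining.discard(tc)
                if tc >= last:              # past the last frame of interest: end this pass
                    break
            vcr.stop()

Frames missed on one pass because a transfer was still in progress are simply picked up on a later pass, which is what makes the required bandwidth "whatever the system can support", as the next paragraph notes.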

Figure 4: Block diagram of the analog-to-digital conversion system (VCR (Panasonic DS850), Digitizer (V300), and Computer (Sparc 20), connected by control and VIDEO+VITC links).

The ability to grab a specific frame by directly interpreting the time code from the image reduces the bandwidth requirement from 26 MBytes per second per channel to whatever the system can support. The lower the bandwidth, the higher the number of passes required to get all the necessary frames.

5 Process Verification

In order to guarantee the proper synchronization of the cameras, we must have a method of comparing the cameras to one another that is independent of the time codes inserted in their video streams. We accomplish this task by pointing the cameras at the display of a custom-built counter that is synchronized to the Sync Generator of Figure 2. This device counts the number of fields in the sync signal, then displays the number converted into seconds (up to 59), frames (up to 29), and fields (0 or 1, represented by an LED either on or off). Because the counter display changes synchronously with the master sync for the recording system, each field of video in each stream should record only a single state of the counter. If the cameras and the counter are not synchronized, they will record a combination of two states, so one test is to confirm that a single state is recorded in each field. The state in one video stream can also be compared to the state recorded in the other streams. If the systems are fully synchronized, then the counter display will be the same for all cameras, given a VITC time code to compare. Another way to perform this cross-camera check is to compare the difference between the VITC and the counter in one video stream to the difference in the others. Again, in a fully synchronized system, this offset should remain constant; this is our second test.

The first test above implicitly assumes that each field of the video stream represents the information accumulated from no more than one field time (1/60th second in NTSC). Real cameras, however, frequently have two modes of accumulation, frame and field. Frame accumulation mode (or just Frame mode) effectively keeps the camera's shutter open for two consecutive fields (one frame), while field accumulation mode (or just Field mode, the one implicitly assumed above) keeps the shutter open only for one field. If the camera is in Frame mode, then the field no longer contains a single state of the counter, but now contains the combination of two consecutive states. This condition results in two troublesome phenomena. First, recall that the counter displays the field by turning on and off a single LED, changing state once every field. The accumulation of two consecutive fields in Frame mode results in the LED appearing to be on all the time, so we no longer have field accuracy in the observations of our counter. Second, once every frame the counter will increment the displayed numbers representing the number of frames. One field out of every two, then, will see the counter state before and after the transition, resulting in an image combining the views of the two states. Fortunately, the second phenomenon is much easier to deal with, so we did not need a solution for it. The first problem, however, must be addressed to be able to guarantee field-accurate synchronization.

In Frame mode the shutter is open for a full frame (1/30th second in NTSC). In order to see an LED flashing, it should be off for at least that long during each cycle. By turning the LED on or off once each frame (half frame rate) we satisfy this condition. The physical sequence of the light, then, is ON-ON-OFF-OFF, with each step representing one field.

The image sequence of the LED, however, has the sequence ON-ON-ON-OFF, since the LED will appear off only when it is physically off for the previous two fields. As a result, a single LED is insufficient for unique field identification in Frame mode. Fortunately, we can achieve this goal by using 2 LEDs, with exactly one on at a time. The resulting image sequence pattern is (ON,ON)-(ON,OFF)-(ON,ON)-(OFF,ON). With this pattern, we can identify Field 1 by both LEDs on and Field 2 by only 1 LED on. The actual counter, then, contains 4 large 7-segment digit displays to show the seconds and the frames, 2 LEDs flashing at half frame rate each with exactly one turned on at a time, and 1 LED flashing at frame rate, as shown in Figure 5.

Figure 6 shows the timing of the counter display for the LEDs, and of the images of the LEDs collected in Field mode. LED1 is flashing at frame rate, on for one field and off for the next. The image of LED1 is represented by I-LED1, which can be used to distinguish the fields: on during one field, off during the other. Figure 7 shows the timing of the counter display for the LEDs, and of the images of the LEDs collected in Frame mode. I-LED1, again the image of LED1 flashing at frame rate, is seen to be on all the time, even though the LED is off during half the fields. LED2 and LED3 are flashing at half frame rate, with exactly one on at any time, and their images are shown as I-LED2 and I-LED3. Note the 4-step repeating pattern: on-on, on-off, on-on, off-on.

Figure 5: Graphical representation of the counter used in synchronization tests (seconds and frames on 7-segment displays; 1 LED flashing at frame rate; 2 LEDs flashing at half frame rate, exactly one on).

Figure 6: LED operation in Field integration mode (timing of LED1-LED3 and their images I-LED1 to I-LED3 over frames N to N+3).
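The effect of the two accumulation modes on the LEDs can be checked with a few lines of simulation. The sketch below is only an illustration of the argument above, using the simple model that an LED appears ON in a field's image if it was physically lit during any field integrated by the shutter (the current field in Field mode; the current and previous fields in Frame mode).

    # Simulate the imaged state of the three counter LEDs over eight fields.
    def imaged(phys, frame_mode):
        n = len(phys)
        return [phys[i] or (frame_mode and phys[(i - 1) % n]) for i in range(n)]

    fields = range(8)                            # even fields are Field 1, odd are Field 2
    led1 = [f % 2 == 0 for f in fields]          # frame-rate flasher (the field indicator)
    led2 = [(f // 2) % 2 == 0 for f in fields]   # half frame rate
    led3 = [(f // 2) % 2 == 1 for f in fields]   # half frame rate, complementary to LED2

    for name, phys in (("I-LED1", led1), ("I-LED2", led2), ("I-LED3", led3)):
        print(name, ["ON" if v else "off" for v in imaged(phys, frame_mode=True)])

With frame_mode=True, I-LED1 comes out ON in every field, so the frame-rate LED alone cannot mark fields, while the (I-LED2, I-LED3) pair repeats the 4-step pattern (ON,ON), (ON,off), (ON,ON), (off,ON) described above. With frame_mode=False the imaged and physical states coincide, and LED1 alone suffices.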

Figure 7: LED operation in Frame integration mode (timing of LED1-LED3 and their images I-LED1 to I-LED3 over frames N to N+3).

Figure 8 contains 4 consecutive fields (moving left to right in the figure) from each of two cameras watching the synchronization counter. The images in the top row of the figure come from a camera in Field mode, while the bottom images come from a camera in Frame mode. Each image contains two numbers: the 4-digit number from the counter as described above, and an 8-digit number, beginning with the letter T, representing the interpreted VITC time code. For the Field mode camera, the only difference between the fields of the same frame is the frame rate LED, which is on for Field 1 and off for Field 2. In the Frame mode camera, two differences are apparent. As desired, Field 1 has both of the half frame rate LEDs on, while Field 2 has only a single 15-Hz LED on. In addition, the numbers in the 7-segment displays are blurred during Field 1 because that field captures the transition between numbers. For the first image in the lower row, the transition is between 0928 and 0929. For the third image in that row, the transition is one of the worst: 0929 to 1000, which blurs all 4 digits. In both Field 2 images, though, the numbers are stable and therefore can be read correctly, allowing accurate determination of the synchronization. In this case, the system is fully synchronized.

Figure 8: Verification images. The top row contains 4 consecutive fields from a camera in Field mode, while the bottom row contains the corresponding fields from a camera in Frame mode.

In an actual cross-camera synchronization test, we do not require simultaneous views of the counter; rather, we compare its seconds and frames count to the seconds and frames of the VITC. The offset should remain constant so long as the system is fully synchronized, so our test is to compare this offset across all cameras. Since the self-synchronization of each camera (that is, video to VITC) has already been verified by the first test, this type of test is sufficient to guarantee synchronization. Figure 9 shows single fields from three different cameras during the same synchronization test. In this example, the counter is always 1 frame advanced relative to the VITC, verifying system synchronization.
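The cross-camera test itself amounts to checking that the counter-minus-VITC offset, expressed in frames, is identical for every camera. The bookkeeping is trivial; the sketch below uses made-up (seconds, frames) readings chosen to mimic the Figure 9 situation, where the counter is one frame ahead of the VITC everywhere.

    # Cross-camera offset check: (counter - VITC) in frames must match across cameras.
    FPS = 30   # NTSC frames per second

    def to_frames(seconds, frames):
        return seconds * FPS + frames

    def offset(counter, vitc):
        # modulo one minute, since the counter only displays seconds and frames
        return (to_frames(*counter) - to_frames(*vitc)) % (60 * FPS)

    readings = {                      # made-up example readings: (counter (s, f), VITC (s, f))
        "cam 1": ((12, 5), (12, 4)),
        "cam 2": ((12, 6), (12, 5)),
        "cam 3": ((13, 0), (12, 29)),
    }
    offsets = {cam: offset(ctr, vitc) for cam, (ctr, vitc) in readings.items()}
    print(offsets, "synchronized" if len(set(offsets.values())) == 1 else "NOT synchronized")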

Figure 9: Images used to validate system synchronization.

6 Applications as an Enabling Technology

In certain domains, applying our system does little more than ease constraints such as short recording length. More interesting, though, are the applications that were effectively infeasible without the technology represented by our system. One such application is Virtualized Reality [3][4]. The fundamental goal of Virtualized Reality is to immerse the user in a full 3D visual reconstruction of real events. Imagine, for example, being able to watch the NBA championships from any seat in the arena, or even from the court itself, or a medical student observing a Virtualized Reality replay of the latest surgical technique to repair damaged heart muscle. These are just two of the many potential applications of such a system. Virtualized Reality seeks to automatically construct models of the real world that allow accurate reconstruction of any view, even views for which no physical camera exists. Virtual Reality (VR), on the other hand, uses hand-made artificial worlds that rarely bear much resemblance to the real world. In both cases, the models of the world are used to construct synthetic images from a "soft" camera whose position is controlled by the user. Because Virtualized Reality relies on real images for reconstruction, the visual realism of the synthetic images is far superior to that of VR systems. In addition, Virtualized Reality easily processes dynamic scenes, producing natural, consistent views of objects in motion. VR, on the other hand, has great difficulty accurately portraying motion because of the need for dynamic models of the moving objects.

To achieve the accurate visual reproduction of real scenes, Virtualized Reality uses a large number of synchronized image sequences of the scenes. Given a single time instant, the system reconstructs the shape of the scene by applying a multi-baseline stereo algorithm [5]. With the image sequences and with the computed scene structure, Virtualized Reality uses simple computer graphics techniques to generate synthetic views of the scene. The scene structure is rendered as a 3D triangle mesh, while the images are texture mapped onto the mesh to give visual realism to the rendered scenes. Having many views of the scene allows the system to more accurately reproduce the realistic interaction of soft camera position with scene structure and lighting.

In order to work properly, the Virtualized Reality system requires synchronized image sequences sampled at up to frame rate for a large number of cameras. First, the stereo computation requires synchronized images of the scene: without proper synchronization, the correspondences among the images are meaningless, since the structure of the scene may have changed between the samples. Second, dynamic scenes at times require acquisition of every frame in the video streams to accurately capture the motion; with too few samples, scene motion appears more like a collection of random images than like the dynamic scene from which they came. Finally, the realism of Virtualized Reality images increases as the number of cameras increases, so it is desirable to have a large number of cameras; without enough cameras, much of the scene may be occluded, potentially leaving large gaps in the scene models. The Virtualizing Studio now contains 51 cameras mounted on a dome and pointed inside the dome, as shown in Figure 10. As discussed earlier, other methods of synchronous image sequence capture do exist, but even at $15,000 per channel, the recording system alone would cost more than $750,000. While such a recording system was technically feasible, the cost was too high to be practical, especially for a research project with no guarantee of success.

The development of a low-cost alternative, then, allowed the Virtualized Reality concept to come off the drawing board and into a real research lab.

Figure 10: The Virtualized Reality research platform, consisting of a dome with 51 cameras mounted on it and 51 VCRs that record the video from the cameras.

Several other research projects have also benefited from our recording system. One project at the University of Maryland uses 3D models of humans to track and to recognize human motion [2]. They merge information from multiple views of dynamic scenes to find the 3D body pose at each time instant. This process requires synchronized image sequences and capture of every frame in the video streams in order to perform properly. A group at Carnegie Mellon University seeks to develop assembly plans for robots to follow by observing a human perform the same action. Again, the system needs dense sampling of synchronized video streams in order to correlate the information from multiple views. This group has replicated the recording system in their own lab, using the digitizing hardware in our system as needed, an effective way to reduce costs in multi-project environments. A third project, jointly run at the University of Pittsburgh and Carnegie Mellon University, is developing methods to recognize and analyze human facial expression. Multiple cameras provide information on the shape, size, and pose of the person's head in addition to multiple perspectives of the expressions. Without synchronization, the expressions in the images would not correlate, and may even conflict. This group has also duplicated the recording hardware in its lab and uses our digitizing system as needed.

7 Conclusions

We presented a novel and economical system to synchronously capture every user-specified frame or field of an arbitrary number of video streams. Each stream is synchronized to a common signal and stored on a video tape with VITC time code inserted into each field. These tapes are digitized off-line, identifying the needed frames by interpreting the VITC time code on-line using a commercial frame grabber. The user-specified frames can be grabbed automatically using a computer-controlled VCR. The automatic digitizing system costs $7000 for the first video stream, of which $5000 is for the VCR. A system that relies on manual control of the VCR could use a low-cost VCR like those in the recording system, or could even re-use one of the recording channels for playback during digitization. Additional recording channels cost $500 each, for a VCR and the VITC inserting equipment. Finally, the system is enabling research into applications considered infeasible without low-cost synchronized recording.

8 References

[1] M. J. Frankel and J. A. Webb. Design, Implementation and Performance of a Scalable Multi-Camera Interactive Video Display System. Proceedings of Computer Architectures for Machine Perception, Como, Italy, September 1995.

[2] D. M. Gavrila and L. S. Davis. 3-D Model-based Tracking of Human Upper Body Movement: a Multi-View Approach. IEEE Symposium on Computer Vision, Coral Gables, USA, November 1995.

[3] T. Kanade, P. J. Narayanan, and P. W. Rander. Virtualized Reality: Concept and Early Results. IEEE Workshop on the Representation of Visual Scenes, Boston, June 1995.

[4] T. Kanade, P. J. Narayanan, and P. W. Rander. Virtualized Reality: Being Mobile in a Visual Scene. International Conference on Artificial Reality and Tele-Existence and Conference on Virtual Reality Software and Technology, Japan, November 1995.

[5] M. Okutomi and T. Kanade. A multiple-baseline stereo. IEEE Transactions on Pattern Analysis and Machine Intelligence, 15(4):353-363, 1993.

[6] Society of Motion Picture and Television Engineers. American National Standard for Television - Time and Control Code. SMPTE Journal, June 1986.