Region-based Temporally Consistent Video Post-processing

Xuan Dong, Tsinghua University, dongx1@mails.tsinghua.edu.cn
Boyan Bonev, UC Los Angeles, bonev@ucla.edu
Yu Zhu, Northwestern Polytechnical University, zhuyu1986@mail.nwpu.edu.cn
Alan L. Yuille, UC Los Angeles, yuille@stat.ucla.edu

Abstract

We study the problem of temporally consistent video post-processing. Previous post-processing algorithms usually either fail to keep high fidelity or fail to keep temporal consistency of the output videos. In this paper, we observe experimentally that many image/video enhancement algorithms enforce a spatially consistent prior on the enhancement. More precisely, within a local region, the enhancement is consistent, i.e., pixels with the same RGB values will get the same enhancement values. Using this prior, we segment each frame into several regions and temporally-spatially adjust the enhancement of the regions of different frames, taking into account fidelity, temporal consistency and spatial consistency. User study, objective measurement and visual quality comparisons are conducted. The experimental results demonstrate that our output videos can keep high fidelity and temporal consistency at the same time.

1. Introduction

The consumption of videos is increasing dramatically in video streaming and surveillance systems. This results in mass demand for video enhancement of exposure, color, contrast, etc. In computer vision, there exist many image enhancement algorithms such as exposure correction [22], color grading [4], etc. Their enhancement effects are very impressive, and they are used in many video applications and systems such as video editing software like Adobe Premiere (Pr), mobile phone apps like Instagram, etc. However, there are usually significant flickering artifacts when performing video enhancement, or image enhancement methods frame by frame for videos, due to the lack of built-in temporal consistency. Removing these artifacts is non-trivial because they have a profound effect on the visual quality.
In addition, in practical systems, we usually only have access to the input videos and the original enhancement videos (with flickering artifacts), and do not know or cannot access the enhancement algorithms. For example: 1) the enhancement algorithms of industrial software, like Pr and Instagram, are not known to the public; 2) for embedded/hardware enhancement algorithms, the device may not provide interfaces to revise the algorithms for temporal consistency; 3) in practical development of software or an application for video editing, several enhancement algorithms may be required which are all different, so designing a temporally consistent method for each separate algorithm would be time-consuming. In such cases, it is desirable to do temporally consistent enhancement as post-processing, by simply analyzing the input videos and the original enhancement videos. In this paper, we study the problem of temporally consistent video post-processing when the original input and enhancement videos are available. The goal is to keep both temporal consistency and fidelity of the output videos. 1) Temporal consistency means that for the same objects in different frames, the enhancement should be consistent. 2) Fidelity means that the final results should have similar effects as the original enhancement videos. In other words, the output frames should be similar to the original enhancement results at non-flickering frames, and the objects in flickering frames should be adjusted with reference to the corresponding objects in non-flickering frames. The challenges of the problem include: 1) the original enhancement methods are unknown and cannot be revised at all, 2) the motions in videos are complicated, and 3) the method should be able to remove flickering artifacts caused by different enhancement methods. We discover experimentally the spatially consistent enhancement (SCE) prior, which is valid for many leading image enhancement methods, including Pr auto color, auto level, auto contrast, exposure correction [22], and color grading [2] [4].
The prior is based on the observation that in a local region, image enhancement methods tend to
keep the enhancement values consistent for pixels with the same RGB values. Based on this prior, we propose a region-based temporally-spatially consistent adjustment method. The pipeline is shown in Fig. 1.

[Figure 1: pipeline diagram. The original input frames {I} and original enhancement frames {E} feed into the region-based temporally consistent post-processing: region segmentation, estimating the correspondence between regions, and adjusting the enhancement of regions; the output is the frames {O}.] Figure 1. Pipeline of our region-based temporally consistent video post-processing.

The inputs include the original input frames and the original enhancement frames. 1) Based on the prior, each frame is segmented into several regions. 2) Corresponding regions between different frames are estimated. 3) A Markov Random Field (MRF) optimization model is used to adjust the enhancement of the regions of all frames. The advantages of the proposed algorithm include that: 1) it can post-process flickering results of any enhancement method as long as it enforces the SCE prior, 2) our results can keep high fidelity, and 3) temporal consistency. The contributions of this paper are as follows: 1) The SCE prior is discovered and experimentally verified. 2) A region-based temporally-spatially consistent enhancement algorithm is proposed to post-process videos with flickering artifacts. Experimental results show that our proposed algorithm performs better than the frame-wise enhancement algorithms, including Pr auto color, auto level, auto contrast, exposure correction [22], and color grading [2][4], and the temporally consistent video adjustment algorithms including [4] [6] [8] [11], in user study, objective and visual quality comparisons.

2. Related Works

For energy function based image enhancement methods, in [16] [12], a temporal term is added into the original energy function for temporal consistency. In [15], properly designed filters are used to substitute for the energy minimization process, so as to accelerate optimization driven methods. [5] extends the 2-D image filter to a 3-D temporal-spatial filter.
The limitation of these methods is that they have to revise the original enhancement algorithm, so they cannot be directly used for this paper's problem. [4] [18] [14] [13] [1] [9] [3] first enhance videos frame by frame. Then, based on the characteristics of the known enhancement algorithms, they propose different methods to detect and remove the flickering frames. Their limitation is that they assume the enhancement is a global transformation. So, they cannot keep fidelity of the output videos for local enhancement algorithms. [21] does not use the global enhancement assumption. They first estimate correspondence between frames, and then temporally filter matched pixels. For unmatched pixels, a reflectance completion algorithm is proposed to blend those pixels with neighboring matched pixels. The limitation is that the reflectance completion algorithm is specifically designed for the problem of separating images into shading and reflectance layers, and cannot be directly used for other enhancement algorithms. There are some temporally consistent post-processing methods for unknown original enhancement algorithms, including [8], [19], [6], [11]. All of them only use the original enhancement videos with flickering artifacts for temporally consistent enhancement.

[Figure 2: two plots — average square root of the area of segmented regions, and average reconstruction quality (PSNR) of segmented regions, versus the segmentation threshold T, for exposure correction (ECCV 2012), color grading (SIGGRAPH 2013), color grading (SIGGRAPH 2006), and Premiere auto level, auto color, and auto contrast.] Figure 2. Average square root of the area and the reconstruction quality of segmented regions with different threshold T. The enhancement algorithms include Pr auto level, auto color, auto contrast, exposure correction [22], and color grading [2] [4].
But we propose to make use of both the original input and enhancement videos. They first propose different algorithms to find sparse correspondence between frames. For matched pixels, the pixel values are temporally filtered. For unmatched pixels, a global transformation is used according to the matched pixels. Because they are designed for global enhancement methods, their results will have low fidelity and cannot keep temporal consistency perfectly if the enhancement methods are local. In addition, in [6], the errors of enhancement are accumulated over time, and the key frame selection is not adaptive. In [11], the enhancement curve is estimated by a smooth piecewise-quadratic spline with 7 knots at (0, .2, .4, .6, .8, 1). When the estimation of the knots has some errors, the spline will enlarge the errors to the whole dynamic range. The comparisons of [8], [4], [6], [11] with our proposed algorithm are shown in Sec. 5.

3. Spatially Consistent Enhancement Prior

In this section, first, we mathematically describe the SCE prior, i.e., within a local region, the enhancement of pixels is consistent. Second, since the regions that enforce the prior vary a lot in size, shape, and location for different images and different enhancement algorithms, we propose a region segmentation method to get the regions with the SCE property. Third, the prior is experimentally verified by the segmentation results.
Figure 3. Example of region-based reconstruction. Left to right: input image I, original enhancement result E using exposure correction [22], superpixel segmentation result [1], region merging result, reconstruction result R, and absolute difference between E and R (enlarged 5 times).

3.1. Description of the SCE prior

For an original enhancement method F to enhance an input image I, if a segmented region i is given, the original enhancement result can be written as E_i(x) = F(x, I(x)), x ∈ i, where E_i is the original enhancement result of region i and x is a pixel belonging to region i. The SCE prior is based on the observation that at local regions within the same object/scene, many image enhancement methods tend to keep the enhancement values the same or very similar for pixels with the same RGB values. In other words, within a local region i, there will exist an enhancement curve α_i to reconstruct the region i. α_i's independent variable is only the intensity of the pixels, and the reconstruction results should be very similar to the original enhancement results, i.e., E_i(x) ≈ α_i(I(x)), x ∈ i. We define

R_i(x) = α_i(I(x)), x ∈ i,   (1)

where R_i is the reconstruction result of region i using α_i. Borrowing the concept of reconstruction quality from video coding, we use Peak Signal-to-Noise Ratio (PSNR) to measure the similarity between R_i and E_i, i.e.,

RQ_i = PSNR(R_i, E_i),   (2)

where RQ_i is the reconstruction quality of region i. According to the prior, the reconstruction quality RQ_i should be very high for good enhancement.

3.2. Region segmentation

Since the regions that enforce the SCE prior vary a lot in size, shape, and location for different images and different enhancement algorithms, we propose a region segmentation method to find these regions. First, for a given region i, we verify whether the region enforces the SCE prior. To do so, we use standard histogram matching [20] to estimate the curve α_i for region i. In histogram matching, the histograms H_I and H_E of I and E_i within region i are computed. Then, using H_I and H_E, the cumulative distribution functions C_I and C_E are computed.
Next, for each gray level G_1 ∈ [0, 255], we find the gray level G_2 for which C_I(G_1) = C_E(G_2), and this is the result of the histogram matching function: M(G_1) = G_2. The RGB channels are computed respectively to form the reconstruction function α_i. If the enhancement is consistent within the region i, histogram matching can get a correct estimation of the true enhancement curve. Thus, using Eq. (1) and Eq. (2), we can get a high RQ_i. Here, we set a threshold T. If RQ_i > T, the region is seen as enforcing the SCE prior. Second, we propose to merge neighboring regions to see how large the region can be with RQ_i > T. To begin with, we segment the input images into a set of superpixels using the SLIC algorithm [1] because of its simplicity and speed. Then, we try to merge each pair of neighboring superpixels. For neighboring superpixels i and j, we estimate the enhancement curve of their merged region i ∪ j using histogram matching, reconstruct the enhancement result using Eq. (1), and compute the reconstruction quality using Eq. (2). If RQ_{i∪j} > T, these two superpixels will be merged together. Otherwise, they will not be merged. The merging is done iteratively until none of the neighboring regions can be merged together. An example is shown in Fig. 3. The superpixel segmentation result is obtained by segmenting the original input image using SLIC [1]. The segmentation figure shows that most of the regions cover a large area and the reconstruction result looks very similar to the original enhancement result. Even if we enlarge the difference between the original enhancement result and the reconstruction result 5 times, the difference is still small.

3.3. Verification of the SCE prior

We collect a set including 300 input and original enhancement result pairs from image and video results. The images are from published papers' experimental results and search engines. The videos are from the most popular movies. The enhancement methods we test include Pr auto-color, auto-level, auto-contrast, exposure correction [22], and color grading [2] and [4]. The images/videos are resized to 640×480.
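The per-region curve estimation and reconstruction-quality check described in Sec. 3.2 can be sketched as follows. This is a minimal single-channel sketch in NumPy (the paper applies it per RGB channel); the function names and implementation details are ours, not the authors'.

```python
import numpy as np

def estimate_curve(I, E):
    """Estimate the enhancement curve M (M(G1) = G2) by matching the CDF
    of the input region I to that of the enhanced region E, i.e. standard
    histogram matching over gray levels 0..255."""
    hist_I = np.bincount(I.ravel(), minlength=256).astype(np.float64)
    hist_E = np.bincount(E.ravel(), minlength=256).astype(np.float64)
    C_I = np.cumsum(hist_I) / hist_I.sum()
    C_E = np.cumsum(hist_E) / hist_E.sum()
    # For each gray level G1, find the smallest G2 with C_E(G2) >= C_I(G1).
    return np.searchsorted(C_E, C_I, side="left").clip(0, 255).astype(np.uint8)

def reconstruction_quality(I, E):
    """RQ = PSNR between the curve-based reconstruction R and E (Eq. 2)."""
    curve = estimate_curve(I, E)
    R = curve[I]                           # R(x) = alpha(I(x)), Eq. (1)
    mse = np.mean((R.astype(np.float64) - E.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(255.0 ** 2 / mse)
```

A region would then be accepted as enforcing the SCE prior when `reconstruction_quality` exceeds the threshold T; in the merging step the same check is run on the union of two neighboring superpixels.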
To verify how good the prior is, we use the proposed segmentation method with different T to segment the images. Then, we compute the average area and reconstruction quality over all regions of the images. Figure 2 shows the result. We can see that with the increase of T, the average area of each region decreases and the average reconstruction quality increases, due to the increase of the reconstruction quality requirement. The average area and PSNR of the Pr enhancement algorithms are larger than those of the other three algorithms when T is small. The reason is that the Pr algorithms are more like global adjustment methods while the other three algorithms are relatively more local. Although the average area varies, all the algorithms have a high average area (more than 60×60) for most T. This gives very strong evidence for the SCE prior. At the same time, all of the algorithms can get reconstruction quality higher than 37 dB. This is a relatively high reconstruction quality and the visible loss is not big.

4. Region-based Temporally Spatially Consistent Adjustment

We segment each frame into several regions and estimate the original enhancement curves of the regions using the SCE prior, estimate correspondence relationships between regions in different frames, optimize the original enhancement curves of the regions to get temporally consistent enhancement curves, and reconstruct temporally consistent frames using the optimized curves. The motivations of our algorithm are that 1) according to the SCE prior, each segmented region can be reconstructed with an enhancement curve, and the reconstruction can keep high fidelity; in addition, 2) correspondence relationships of regions in different frames are easier to find because they only need correspondence of sparse pixels instead of dense correspondence. This reduces the requirement on motion estimation accuracy.

4.1. Region segmentation

For each frame, we propose to segment the frame into different regions with the principles that 1) there should exist an enhancement curve for each region to reconstruct the region with high reconstruction quality, and 2) the region's area should be as large as possible so as to reduce the requirement on motion estimation. We use the segmentation method in Sec. 3 because it has been verified to be good for different enhancement algorithms. The set of segmented regions of all frames is denoted as Ω_R, and Ω_R = {i_t : t = 1..M, i = 1..N(t)}, where i_t is the region i of frame t, M is the total frame number, and N(t) is the total region number in frame t. After getting Ω_R, the enhancement curve of each region is estimated using histogram matching.

4.2.
Estimating correspondence between regions

We first compute dense correspondence of pixels between neighboring frames using SIFT Flow [17] because of its accuracy. Then, we link the corresponding pixels over different frames to get the motion of a scene point over time, and call it a motion path. Any two pixels along the same motion path are seen as corresponding pixels. Although SIFT Flow is designed for dense correspondence estimation, due to incorrect estimations and occlusions, the motion estimation results of some pixels are outliers. To keep the accuracy of the estimated motion paths, we propose to measure the confidence of the pixels along the motion paths. It is measured by the distance in the SIFT field. If a pixel has a confidence value larger than a threshold T_SIFT, it is detected as an outlier and the motion path will stop at this pixel. As a result, the corresponding pixels between two frames become sparse. To avoid cumulative errors during linking motion vectors frame by frame, for each frame, as shown in Fig. 4, we estimate its correspondence with not only the neighboring frames, but also the frames whose intervals are k. When two frames have large time intervals, the linking can skip k−1 frames.

Figure 4. Optical flow estimation among all frames.

Figure 5. Temporal-spatial belief propagation. The proposed objective function takes into account the temporal term, spatial term, and data term.

After the estimation of sparse pixel correspondence between frames, we estimate corresponding regions in different frames. If two regions in different frames have corresponding pixels, they are marked as corresponding regions. Otherwise, they are marked as un-corresponding regions.

χ(i_{t1}, j_{t2}) = { 1, if i_{t1}, j_{t2} have corresponding pixels; 0, otherwise }   (3)

χ defines whether the region i of frame t1 and the region j of frame t2 are corresponding regions.
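Given region label maps and the sparse matches that survived the T_SIFT confidence test, building the indicator χ of Eq. (3) (and, as a by-product, the corresponding-pixel counts CP used later by the temporal weights) amounts to a single scan over the matches. The sketch below uses illustrative input conventions of our own (integer label maps, a list of matched pixel pairs), not the authors' data structures.

```python
import numpy as np

def corresponding_regions(labels_t1, labels_t2, matches):
    """Build the indicator chi of Eq. (3): two regions in frames t1 and t2
    correspond iff at least one surviving motion-path pixel links them.

    labels_t1, labels_t2 : integer region-label maps of the two frames.
    matches : iterable of ((y1, x1), (y2, x2)) sparse pixel correspondences.
    Returns the set of corresponding region pairs and the matched-pixel
    counts CP(i_t1, j_t2) per pair.
    """
    chi = set()
    counts = {}
    for (y1, x1), (y2, x2) in matches:
        pair = (labels_t1[y1, x1], labels_t2[y2, x2])
        chi.add(pair)
        counts[pair] = counts.get(pair, 0) + 1
    return chi, counts
```

Running this over all frame pairs (neighboring frames and frames k apart) yields the correspondence sets C(·) used in the optimization of Sec. 4.3.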
In addition, for each region, we can find the set of its corresponding regions in all frames, i.e., C(i1_{t1}) = {i_t : χ(i1_{t1}, i_t) = 1, i_t ∈ Ω_R}, where i1_{t1} is the region i1 of frame t1, C(i1_{t1}) defines the set of its corresponding regions in different frames, and Ω_R is the set of all of the regions in all frames.

4.3. Region-based temporally spatially consistent optimization

We propose a region-based temporally spatially consistent optimization method to adjust the enhancement curves of the regions. Our goals are that 1) for regions in non-flickering frames, the adjusted enhancement curves should be the original enhancement curves α, and 2) for regions in flickering frames, the
adjusted enhancement curves should be one of the curves of the regions in non-flickering frames. To achieve this goal, for each region, we let it pick one enhancement curve from its corresponding regions and itself, i.e., the solution space for each region is the curves of its corresponding regions and itself: u(i_t) ∈ C(i_t). No matter whether the region belongs to flickering or non-flickering frames, the desired curve is within the solution space. The number of corresponding regions for each region depends on the video contents. In our experiments, each region has about several hundred corresponding regions on average. The solution space is not big, and it helps avoid unnatural enhancement curves. We define the solution space for all regions of all frames as U, and U = {u(i_t) : i_t ∈ Ω_R, u(i_t) ∈ C(i_t)}, where u(i_t) is the picked corresponding region of i_t. The adjustment of the enhancement curves of the regions is modeled as an MRF problem, as shown in Fig. 5. The nodes of the MRF are the regions of the different frames. The optimization problem is to select one of the corresponding regions for each region so that the enhancement curves of the selected regions give the minimum energy cost under the MRF constraints. After getting the optimized corresponding region relationships u*(i_t) for all regions in all frames, we get the optimized enhancement curve α*_{i_t} of any region i_t as α_{u*(i_t)}, and use them to reconstruct each frame using Eq. (1). The optimal solution U* of the MRF is obtained by U* = argmin_U E(U), and the objective function E is defined as:

E(U) = Σ_{i_t ∈ Ω_R} [E_data(u(i_t)) + λ_1 E_temporal(u(i_t)) + λ_2 E_spatial(u(i_t))],   (4)

where the variable u(i_t) is the picked corresponding region of i_t, the data term E_data aims at keeping fidelity of regions in non-flickering frames, and the temporal term E_temporal aims at keeping temporal consistency of regions in flickering frames. Although neighboring regions have different enhancement curves, we propose the spatial term E_spatial to keep the difference of their enhancement consistent, so as to avoid spatially inconsistent enhancement.
λ_1 and λ_2 are the weights of the temporal and spatial terms, respectively. In detail,

E_data(u(i_t)) = ||α_{u(i_t)} − α_{i_t}||²_2,   (5)

where α_{i_t} is the original enhancement curve of region i_t. By making the optimized curves as similar as possible to the original enhancement curves, the data term keeps the optimized enhancement similar to the original enhancement for non-flickering frames, so as to keep high fidelity.

E_temporal(u(i_t)) = Σ_{i1_{t1} ∈ C(i_t)} w^temporal_{i_t, i1_{t1}} ||α_{u(i_t)} − α_{i1_{t1}}||²_2,   (6)

where i1_{t1} is a corresponding region of region i_t, and w^temporal_{i_t, i1_{t1}} is the temporal weight between regions i_t and i1_{t1}: w^temporal_{i_t, i1_{t1}} = CP(i_t, i1_{t1}) / Σ_{i'_{t'} ∈ C(i_t)} CP(i_t, i'_{t'}), where CP(i_t, i1_{t1}) is the number of corresponding pixels between regions i_t and i1_{t1}. In the temporal term, we make each pair of corresponding regions similar to each other to keep temporally consistent enhancement. This achieves temporally consistent adjustment of the curves of regions in flickering frames.

E_spatial(u(i_t)) = Σ_{j_t ∈ Ω_N(i_t)} w^spatial_{i_t, j_t} ||β_{u(i_t)} − β_{u(j_t)}||²_2,   (7)

where region j_t is a neighboring region of region i_t, Ω_N(i_t) is the set of neighboring regions of region i_t, β_t is the fitted global curve of frame t, and the spatial weight w^spatial_{i_t, j_t} = RA(j_t) / Σ_{j1_t ∈ Ω_N(i_t)} RA(j1_t), where RA(j_t) is the area of region j_t. For each frame, we fit a global curve β to measure the enhancement. For neighboring regions, we keep the frame curves of the selected regions as similar as possible so as to keep the enhancement difference of neighboring regions consistent, even if the frame curve itself may have a low reconstruction quality. We use Loopy Belief Propagation [7] to estimate U* for the MRF. After getting the optimized corresponding region relationships u*(i_t) for all regions in all frames, we get the optimized enhancement curve α*_{i_t} of any region i_t as α_{u*(i_t)}, and use them to reconstruct each frame using Eq. (1).

5. Experimental Results

There are 6 original enhancement algorithms, including Pr auto color, auto level, auto contrast, exposure correction [22], and color grading [2] [4].
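To make the energy of Eq. (4) concrete, the sketch below evaluates the data, temporal, and spatial terms for a candidate labeling, with curves stored as 256-vectors. For brevity it minimizes by coordinate descent (ICM) rather than the Loopy Belief Propagation solver [7] the paper actually uses; all data structures here are our own illustrative conventions, not the authors' implementation.

```python
import numpy as np

def energy(U, alpha, beta, cand, temporal_w, spatial_w, lam1=1.0, lam2=1.0):
    """Evaluate Eq. (4) for a labeling U.
    U[i]          : chosen candidate index for region i
    alpha[i]      : original 256-vector enhancement curve of region i
    beta[i]       : fitted global curve of region i's frame
    cand[i]       : region ids i may copy its curve from (cand[i][0] == i)
    temporal_w[i] : {j: w} correspondence weights of region i (Eq. 6)
    spatial_w[i]  : {j: w} weights of spatially adjacent regions (Eq. 7)
    """
    E = 0.0
    for i, k in U.items():
        a_u = alpha[cand[i][k]]
        E += np.sum((a_u - alpha[i]) ** 2)                 # data term, Eq. (5)
        for j, w in temporal_w[i].items():                 # temporal term, Eq. (6)
            E += lam1 * w * np.sum((a_u - alpha[j]) ** 2)
        b_u = beta[cand[i][k]]
        for j, w in spatial_w[i].items():                  # spatial term, Eq. (7)
            E += lam2 * w * np.sum((b_u - beta[cand[j][U[j]]]) ** 2)
    return E

def icm(alpha, beta, cand, temporal_w, spatial_w, lam1=1.0, lam2=1.0, iters=5):
    """Coordinate-descent stand-in for the loopy BP solver of [7]."""
    U = {i: 0 for i in cand}          # start from each region's own curve
    for _ in range(iters):
        for i in cand:
            U[i] = min(range(len(cand[i])),
                       key=lambda k: energy({**U, i: k}, alpha, beta, cand,
                                            temporal_w, spatial_w, lam1, lam2))
    return U
```

In a toy setup with one flickering region whose corresponding regions agree on a different curve, the temporal term outweighs the data term and the flickering region copies a non-flickering curve, which is the intended behavior of the adjustment.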
In the original enhancement of color grading [4], we only use the image enhancement part of [4] to produce the original enhancement results. Besides the original enhancement results of the 6 algorithms, we also compare with the results of the temporally consistent enhancement algorithms, including the methods in [6], [4], [8], and [11]. Here, we use both the image and video enhancement parts of [4] to produce the results of that method. For the algorithm in [6], since the key frame selection method is flexible, in our experiments we have two choices: 1) single key frame, where the first frame of the video is chosen as the key frame, and 2) multiple key frames, where the first frame
of every 30 frames is chosen as a key frame. There are 10 input videos for each original enhancement algorithm, i.e., 60 videos in total.

[Figure 6: bar charts of the pairwise user study results for Pr auto level, Pr auto color, Pr auto contrast, exposure correction, and the two color grading methods.] Figure 6. User study results. Pairwise comparison of ours against the original enhancement, i.e., Pr auto color, auto level, auto contrast, exposure correction [22], color grading [2], color grading [4], and the video post-processing algorithms, i.e., [6], [4], [8], [11]. Each color bar shows the average percentage of the favored video.

Figure 7. Comparison with the algorithm in [8] in exposure correction enhancement [22]. (a) Two example frames from a video. Top to bottom: original input frames, original enhancement results, enhancement results of the algorithm in [8], and our results. First column: a non-flickering frame of the video. Second column: a flickering frame of the video. The region in the red box shows unwanted enhancement. (b) Objective results of the same video. The video is in the supplementary material.

Figure 8. Comparison with the algorithm in [11] in Pr auto level. (a) Two example frames from a video. Top to bottom: original input frames, original enhancement results, enhancement results of the algorithm in [11], and our results. First column: a non-flickering frame of the video. Second column: a flickering frame of the video. The region in the red box shows unwanted enhancement. In the 3rd and 4th rows, we also show the enlarged difference between the reference frame and the enhancement frame of the red-box region (enlarged 5 times). (b) Objective results of the same video.
The input videos are from movie clips and everyday life videos taken by our friends. Please see the enhancement videos of the different enhancement algorithms and our algorithm in the supplementary material.

5.1. User study

We invited 12 volunteers (7 males and 5 females) to perform pairwise comparisons between our result and the original enhancement result as well as the results of the temporally consistent enhancement algorithms. For each pairwise comparison, the subject had three options: better, worse, or no preference. Subjects were allowed to view each video clip pair back and forth for the comparison. To avoid subjective bias, the order of the pairs, and the video order within each pair, were randomized and unknown to each subject. The user study was conducted in the same settings (room, light, and monitor). The user study results are summarized in Fig. 6. Each color bar is the averaged percentage of the favored video over all 12 subjects. From the results, we can see that the subjects prefer our results to the results of the original enhancement and the temporally consistent enhancement algorithms in [6], [4], [8], and [11].

5.2. Visual quality and objective comparisons

In order to objectively measure the performance of different temporally consistent algorithms, we select a small
data set of 30 sequences. Within each selected sequence, there are some objects which exist in most of the frames. We select one non-flickering frame as the reference frame, and in the reference frame we manually mark out those objects that exist in most frames. Then we align each frame to the reference frame and compute the difference between the reference frame and the aligned frame within the marked region, quantified by mean squared error (MSE). The mean of the MSE indicates the performance in fidelity and the variance of the MSE indicates the performance in temporal consistency. For each video, the ratio of the mean and variance of the MSE between the comparison method and our method is computed.

Figure 9. Comparison with the algorithm in [4] in color grading enhancement [4]. (a) Two example frames from a video. Top to bottom: original input frames, original enhancement results, enhancement results of the algorithm in [4], and our results. First column: a non-flickering frame of the video. Second column: a flickering frame of the video. The region in the red box shows unwanted enhancement. (b) Objective results of the same video.

Figure 10. Comparison with the algorithm in [4] in Pr auto color enhancement. (a) Two example frames from a video. Top to bottom: original input frames, original enhancement results, enhancement results of the algorithm in [4], and our results. First column: a non-flickering frame of the video. Second column: a flickering frame of the video. The region in the red box shows unwanted enhancement. (b) Objective results of the same video.

Table 1. The average ratios of the mean of the MSE, i.e., μ/μ_our, and the variance of the MSE, i.e., σ²/σ²_our, between the comparison methods and our method over all videos. The comparison methods include the temporally consistent enhancement algorithms in [8], [11], [4], and [6] with single key frame and multiple key frames.

              [8]     [11]    [4]     [6] single   [6] multiple
μ/μ_our       1.14    1.23    1.53    7.7          3.5
σ²/σ²_our     3.8     7.43    38.8    8            4

The average ratios of the mean of the MSE, i.e., μ/μ_our, and the variance of the MSE, i.e., σ²/σ²_our, over all videos are shown in Table 1. As shown, the average ratios of the mean and variance of the MSE between the comparison algorithms and our method are always higher than 1, which indicates that our results have better fidelity and temporal consistency. The videos and more comparisons are shown in the supplementary material. Fig. 7 shows the comparison of our algorithm and the algorithm in [8]. The big discontinuities of the original enhancement in the objective result indicate that there are many flickering artifacts. The algorithm in [8] does not remove them perfectly, because it estimates a global adjustment for the original result, but the exposure correction [22] is a local algorithm. Our method performs well because the reconstruction is region-based and does not require the original enhancement algorithm to be global. Fig. 8 shows the comparison of our algorithm and the algorithm in [11]. The algorithm in [11] does not remove flickering artifacts perfectly, since their adjustment curve is a spline with 7 knots, and when the knots have some errors, the errors are enlarged to the whole dynamic range. Although the difference is not very large, for videos the difference can be easily noticed due to temporal changes of the
same objects in a very short time. The comparison with the algorithm in [4] is shown in Figs. 9 and 10. Their algorithm fails to remove the long-term flickering artifacts due to their assumption that in the flickering periods the changes of the original enhancement curves are always big. Our method can perform well in these cases because the correspondence between different frames is used for temporal consistency. We compare with the algorithm in [6] with single key frame and multiple key frames in Figs. 12 and 11. Results of the algorithm in [6] with a single key frame have big accumulated errors, since each frame only considers the correspondence with its previous frame, and errors of one frame will affect all the following frames. The algorithm in [6] with multiple key frames can reduce the accumulated errors. But how to select good key frames adaptively is not well solved, and the simple method in our experiments will sometimes choose a flickering frame as a key frame.

Figure 11. Comparison with the algorithm in [6] with the multiple key frames method in color grading enhancement [2]. (a) Two example frames from a video. Top to bottom: original input frames, original enhancement results, enhancement results of the algorithm in [6] with multiple key frames, and our results. First column: a non-flickering frame of the video. Second column: a flickering frame of the video. The region in the red box shows unwanted enhancement. (b) Objective results of the same video.

Figure 12 (a). Two example frames from a video. Top to bottom: original input frames, original enhancement results, enhancement results of the algorithm in [6] with single key frame, and our results. First column: a non-flickering frame of the video. Second column: a flickering frame of the video. The region in the red box shows unwanted enhancement. (b) Objective results of the same video.

6.
Conclusions

In this paper, the SCE prior is discovered and experimentally verified, i.e., the enhancement of many leading algorithms is consistent within a local region. And a region-based post-processing algorithm for temporal consistency is proposed, taking into account fidelity, temporal consistency and spatial consistency. User study, objective and visual quality comparisons demonstrate that we can keep both the fidelity and temporal consistency of the output videos.

Figure 12. Comparison with the algorithm in [6] with the single key frame method in Pr auto contrast enhancement.

7. Acknowledgments

The work is supported by ONR N00014-12-1-0883 and NIH 5R01EY022247-03.

References

[1] R. Achanta, A. Shaji, K. Smith, A. Lucchi, P. Fua, and S. Susstrunk. SLIC superpixels compared to state-of-the-art superpixel methods. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(11):2274–2282, 2012.
[2] S. Bae, S. Paris, and F. Durand. Two-scale tone management for photographic look. ACM Trans. on Graph., 25(3):637–645, 2006.
[3] R. Boitard, K. Bouatouch, R. Cozot, D. Thoreau, and A. Gruson. Temporal coherency for video tone mapping. Proc. SPIE 8499, Applications of Digital Image Processing.
[4] N. Bonneel, K. Sunkavalli, S. Paris, and H. Pfister. Example-based video color grading. ACM Trans. on Graph., 32(4):1–11, 2013.
[5] Y. Chang, S. Saito, and M. Nakajima. Example-based color transformation of image and video using basic color categories. IEEE Transactions on Image Processing, 16(2):329–336, 2007.
[6] Z. Farbman and D. Lischinski. Tonal stabilization of video. ACM Trans. on Graph., 30(4):1–9, 2011.
[7] P. F. Felzenszwalb and D. P. Huttenlocher. Efficient belief propagation for early vision. CVPR, 2004.
[8] M. Grundmann, C. McClanahan, S. B. Kang, and I. Essa. Post-processing approach for radiometric self-calibration of video. Int. Conf. Computational Photography, 2013.
[9] B. Guthier, S. Kopf, M. Eble, and W. Effelsberg. Flicker reduction in tone mapped high dynamic range video. Proceedings of the IS&T/SPIE Electronic Imaging (EI) on Color Imaging XVI: Displaying, Processing, Hardcopy, and Applications, 2011.
[10] Y. HaCohen, E. Shechtman, D. B. Goldman, and D. Lischinski. Non-rigid dense correspondence with applications for image enhancement. ACM Trans. Graph., 30, 2011.
[11] Y. HaCohen, E. Shechtman, D. B. Goldman, and D. Lischinski. Optimizing color consistency in photo collections. ACM SIGGRAPH, 2013.
[12] N. K. Kalantari, E. Shechtman, C. Barnes, S. Darabi, D. B. Goldman, and P. Sen. Patch-based high dynamic range video. ACM Trans. on Graph., 32(6):1–8, 2013.
[13] S. B. Kang, M. Uyttendaele, S. Winder, and R. Szeliski. High dynamic range video. ACM Trans. on Graph., 22(3):319–325, 2003.
[14] C. Kiser, E. Reinhard, M. Tocci, and N. Tocci. Real time automated tone mapping system for HDR video. IEEE International Conference on Image Processing, pages 2749–2752, 2012.
[15] M. Lang, O. Wang, T. Aydin, A. Smolic, and M. Gross. Practical temporal consistency for image-based graphics applications. ACM Trans. on Graph., 31(4):1–8, 2012.
[16] C. Lee and C. Kim. Gradient domain tone mapping of high dynamic range videos.
IEEE International Conference on Image Processing, 3:461–464, 2007.
[17] C. Liu, J. Yuen, A. Torralba, J. Sivic, and W. Freeman. SIFT flow: Dense correspondence across different scenes. Proceedings of the European Conference on Computer Vision, pages 28–42, 2008.
[18] R. Mantiuk, S. Daly, and L. Kerofsky. Display adaptive tone mapping. ACM Trans. on Graph., 27(3):1–10, 2008.
[19] T. Oskam, A. Hornung, R. W. Sumner, and M. Gross. Fast and stable color balancing for images and augmented reality. International Conference on 3D Imaging, Modeling, Processing, Visualization & Transmission, pages 49–56, 2012.
[20] D. Shapira, S. Avidan, and Y. Hel-Or. Multiple histogram matching. IEEE International Conference on Image Processing.
[21] G. Ye, E. Garces, Y. Liu, Q. Dai, and D. Gutierrez. Intrinsic video and applications. ACM Trans. on Graph.
[22] L. Yuan and J. Sun. Automatic exposure correction of consumer photographs. Proceedings of the 12th European Conference on Computer Vision, pages 771–785, 2012.