IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 23, NO. 2, FEBRUARY 2014

Compressed-Domain Video Retargeting

Jiangyang Zhang, Student Member, IEEE, Shangwen Li, Student Member, IEEE, and C.-C. Jay Kuo, Fellow, IEEE

Abstract: In this paper, we present a compressed-domain video retargeting solution that operates without compromising the resizing quality. Existing video retargeting methods operate in the spatial (or pixel) domain. Such a solution is not practical on mobile devices because of its large memory requirement. In the proposed solution, each component of the retargeting system is designed to exploit the low-level compressed-domain features extracted from the coded bit stream. For example, motion information is obtained directly from motion vectors. An efficient column-shape mesh deformation is employed to avoid the difficulty of sophisticated quad-shape mesh deformation in the compressed domain. The proposed solution achieves visual quality comparable to (or slightly better than) that of several state-of-the-art pixel-domain retargeting methods at lower computational and memory costs, making content-aware video resizing both scalable and practical in real-world applications.

Index Terms: Video retargeting, compressed domain processing, content-aware, cropping, column mesh, warping.

I. INTRODUCTION

THE INCREASING demand to display video contents on devices with different resolutions and aspect ratios calls for new solutions to video resizing. Traditional resizing techniques are incapable of meeting this requirement as they either discard important information (e.g., cropping) or introduce visual artifacts by over-squeezing the content (e.g., homogeneous rescaling). The goal of content-aware video resizing (or video retargeting) is to change the aspect ratio and resolution of videos while preserving the visually important content and avoiding noticeable artifacts. Recently, many pixel-domain solutions have been proposed for video retargeting, which conduct resizing either through nonuniform warping [1]–[4] or by iteratively removing unimportant contents [5], [6].
Despite their promising results and real-time performance in the spatial domain [2], [4], such solutions are still impractical as they only operate on raw video data. Since most real-world video contents are stored and transmitted only in compressed format, spatial-domain retargeting techniques inevitably incur the additional overhead of decompression and recompression. In this work, we attempt to address this issue with a novel framework that performs content-aware resizing directly on the compressed video bitstream. Processing video contents directly in the compressed domain has many advantages in terms of speed, storage efficiency and quality. First, the computational time is significantly reduced as a few time-consuming modules (such as motion estimation) can be effectively avoided. Second, the data rate is much lower in the compressed domain, leading to significant memory savings. Lastly, extra video quality degradation can be avoided as the quantization and transform steps are not performed in the final re-encoding stage.

Manuscript received April 5, 2013; revised September 9, 2013 and November 19, 2013; accepted December 4, 2013. Date of publication December 11, 2013; date of current version January 9, 2014. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Charles Creusere. The authors are with the Ming Hsieh Department of Electrical Engineering, University of Southern California, Los Angeles, CA 90089 USA (e-mail: jiangyaz@usc.edu; shangwel@usc.edu; cckuo@sipi.usc.edu). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TIP.2013.2294541

One of the main contributions of this work is the formulation of the retargeting problem using compressed-domain features and operations. Performing video retargeting in the compressed domain is fundamentally different from doing so in the pixel domain, since the former is limited by many constraints.
For example, all pixel-level information (e.g., color, gradient, saliency), which has been extensively used for spatial-domain retargeting, is unavailable in the compressed domain. Instead, what we have is the block-level Discrete Cosine Transform (DCT) coefficients, which are not directly correlated with pixel-level features. This constraint requires us to take a very different path to the solution. Specifically, our solution computes the visual importance map using compressed-domain features obtained from the compressed video bitstream. The motion information is obtained directly from motion vectors in the coded file, so there is no need to perform expensive computation to extract this information again (e.g., optical flow as conducted in previous spatial-domain methods [2]–[4]). In addition, existing warping-based approaches [2]–[4], [7] often adopt quad-shape mesh deformation. However, such geometrical modification is difficult to perform in the compressed domain. To overcome this limitation, we develop a novel column mesh deformation that is compatible with compressed-domain operations without compromising the quality of the resizing results.

After video retargeting, the change in frame size and aspect ratio means that motion vectors and prediction modes of the output macroblocks have to be recomputed. Another main contribution of this work is the development of new mode decision and motion estimation modules in the re-encoding stage that save computation. By exploiting block correspondences before and after retargeting, we provide a fast yet effective method to estimate motion vectors and prediction modes of the retargeted video. This avoids the time-consuming mode decision and motion estimation procedures of a standard video encoder. The proposed fast solution can achieve a speed-up factor of 90 (compared with full motion search) and 30 (compared with fast motion search) in the encoding stage. For visual quality performance evaluation, we report a subjective user test in which 56 subjects compare our retargeting results with those of state-of-the-art spatial-domain video retargeting methods. Although the exemplary video coding standard used in this work is H.264/AVC, the proposed techniques can be applied to other video coding standards in a similar fashion.

The rest of this paper is organized as follows. Related previous work is reviewed in Section II. The system overview is presented in Section III. The three key components of the system, namely partial decoding, compressed-domain resizing and re-encoding, are detailed in Sections IV, V-E and VI, respectively. Experimental results are shown in Section VII. Finally, concluding remarks are given in Section VIII.

1057-7149 © 2013 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

II. RELATED WORK

A. Image Retargeting

Content-aware image resizing techniques can be generally classified into two categories: the discrete and the continuous approaches [8]. In the discrete approach, content-aware resizing of an image is achieved by identifying and removing unimportant image contents. The cropping-based method [9] identifies the most prominent components in an image through saliency-based measures and cuts out a rectangular region as the retargeting result. The seam carving method [10] resizes an image by successively removing paths of pixels with the least amount of energy. Based on the observation that no single retargeting operator performs well on all images, the multi-operator method [11], which combines three different operators (namely, scaling, cropping and seam carving), was proposed. In the continuous approach, the retargeting problem [1], [12] is formulated as nonlinear warping in which shapes of salient regions should be well preserved while those of non-salient regions are allowed to be squeezed or stretched.

Homogeneous image scaling and cropping in the compressed domain has been studied in the literature [13], [14]. By exploiting the distributive property of the unitary orthogonal transform, image resizing methods that achieve DCT-domain down-sizing/up-scaling by a factor of two were proposed in [13].
In [14], transform-domain image resizing is further extended to allow resizing with an arbitrary ratio. Most recently, content-aware image resizing in the compressed domain has been studied. For example, Fang et al. [15] proposed a JPEG image retargeting scheme guided by a saliency map computed using DCT coefficients. Although compressed-domain features are used in this method, the seam removal procedure still requires full decoding of the JPEG image. In addition, as the overhead of compression and decompression is much lower for images than for videos, the advantage of retargeting in the compressed domain is less obvious for images.

B. Video Retargeting

Video retargeting differs from image retargeting in that temporal coherency and object motions are additional factors to consider. Most video retargeting methods extend image-based retargeting methods by adding constraints that force temporally adjacent regions to undergo similar transforms [1], [2], [5]. Based on the seam carving image retargeting method [10], video retargeting is formulated as a graph-cut problem in a three-dimensional spatial-temporal cube [5]. In [6], the removed seams are allowed to be unconnected, and it was shown that discontinuous seams outperform continuous ones under certain scenarios. In [3], cropping is further introduced into the video retargeting framework as it can outperform warping, especially when the video is over-populated with visually important contents. The issue of scalability is addressed in [4], which optimizes saliency-based resizing and temporal coherency separately to reduce both computational and memory requirements.

Most recently, a video retargeting solution that utilizes compressed-domain features was proposed in [16]. Based on the spatial-domain seam carving approach [10], this method uses compressed-domain features (e.g., DCT coefficients, motion vectors) to compute the optimum seams for removal. However, this method does not work fully in the compressed domain, as the resizing step still operates in the pixel domain and requires full decoding of each video frame.
To the best of our knowledge, there is no existing solution to video retargeting that operates entirely in the compressed domain. Our work provides a novel and practical solution for compressed-domain video retargeting that is applicable to the great majority of today's video.

III. SYSTEM OVERVIEW

The proposed system takes the H.264/AVC encoded video bitstream as the input, conducts retargeting directly on partially decoded DCT coefficients, and outputs the H.264/AVC-compliant bitstream of the retargeted video. The proposed system consists of three separate stages (or modules) as shown in Fig. 1. They are: 1) the partial decoding stage; 2) the compressed-domain video resizing stage; and 3) the re-encoding stage. In the partial decoding stage, we decode the video bitstream partially to reconstruct the non-inverse-quantized DCT coefficients of each frame. In the resizing stage, we perform video analysis and content-aware resizing directly based on compressed-domain features and operations. In the last stage, the resized output is re-encoded into the output bitstream. Details of each module will be described in the following sections.

Fig. 1. The block diagram of the proposed compressed-domain video retargeting system that consists of three stages (or modules): 1) the partial decoding stage, 2) the compressed-domain video resizing stage, and 3) the re-encoding stage.

IV. PARTIAL DECODING

In this stage, we partially decode the input bitstream into a form of compressed data that can be further utilized in the resizing stage. Although we try to limit the number of operations conducted in this stage, a certain amount of decoding is still required. The input bitstream is first entropy decoded into residual DCT blocks, which are then used to compute reconstructed DCT blocks of the original video frame. Since our system operates directly in the DCT domain, no inverse DCT is required. To further reduce the overhead of the decoding stage, we avoid the inverse quantization step as well. Therefore, the outputs of the partial decoding stage are non-inverse-quantized DCT block coefficients.

To compute reconstructed DCT coefficients from the residual data, we make use of the transform-domain prediction techniques proposed in [17], [18]. As the H.264/AVC standard supports both intra and inter prediction modes, two types of prediction are conducted here. For the inter-prediction mode, we use the macroblock-wise inverse motion compensation (MBIMC) scheme proposed by Porwal et al. [17]. In this scheme, the predicted DCT block of the current frame is estimated using DCT coefficients of nine spatially adjacent blocks in the previous frame. Although originally proposed for the MPEG standard, this method can be easily applied to the inter prediction mode of H.264/AVC as well. For the intra-prediction case, different situations have to be considered as H.264/AVC supports nine intra-prediction modes for 4 × 4 sub-blocks and four intra-prediction modes for 16 × 16 macroblocks. Here, we use the method in [18] to compute the DCT coefficients of intra-predicted blocks. By combining the intra/inter-predicted DCT coefficients and the partially decoded residual DCT coefficients, we obtain reconstructed DCT coefficients, which will be utilized in the resizing stage.

Another important task performed in this stage is extracting motion vectors from the input bitstream. The motion vectors are temporarily stored, and they will be processed and utilized in both the resizing and the re-encoding stages.

Fig. 2. The block diagram of the compressed-domain video resizing stage. This module takes reconstructed DCT coefficients as input and outputs DCT coefficients of the retargeted video.

V. COMPRESSED DOMAIN VIDEO RESIZING

In this stage, we perform content-aware resizing using compressed-domain features and operations in multiple steps.
The input video is first segmented into different scenes and each scene is processed separately. Then, we analyze the importance of each scene using three different measures: saliency, motion and texture. Guided by the importance map, the input video is partially resized through optimum cropping, followed by the column-mesh-based warping procedure to reach its desired size. Finally, we compute DCT coefficients of the retargeted result. The block diagram of this procedure is depicted in Fig. 2. All steps will be detailed below.

Fig. 3. Top: the manual scene segmentation result for a 1200-frame segment of the Big Buck Bunny sequence. Bottom: the percentage of block change $T_c$ of each frame. The sharp peaks in the $T_c$ curve closely match the manual segmentation result.

A. Scene Change Detection

We use a pair-wise macroblock comparison method to detect scene changes between consecutive frames. For each video frame, we compare the DCT coefficients of each block with the average coefficient values of the same block in previous frames. The content difference for block $k$ is computed as

$$\eta_k = \frac{1}{N^2} \sum_{i=1}^{N \times N} \frac{\left| c_k(i) - c_k^T(i) \right|}{\max\{ c_k(i),\, c_k^T(i) \}}$$

where $N$ is the size of the block, $c_k(i)$ is the $i$-th DCT coefficient of block $k$, and $c_k^T(i)$ is the average of the corresponding DCT coefficient of block $k$ over the previous $T$ frames (in our case, $T = 5$). If this difference exceeds a preset threshold, we claim that there is a change for block $k$. We use $D_k$ to denote this change:

$$D_k = \begin{cases} 1, & \eta_k > \eta_{thre} \\ 0, & \text{otherwise} \end{cases}$$

Let $K$ be the total number of blocks in a frame. When the percentage of changed blocks in the current frame exceeds a preset threshold $\tau$, namely

$$T_c = \frac{\sum_{k=1}^{K} D_k}{K} > \tau,$$

a scene change is detected, and the entire data of this scene will be loaded for further processing. Fig. 3 illustrates the scene change detection results for a segment of 1200 frames of the Big Buck Bunny sequence. The sharp peaks in the $T_c$ curve indicate the occurrences of scene changes.

B. Visual Importance Analysis

The visual importance map is used to guide the retargeting process, since an ideal retargeted result preserves the important content as much as possible while allowing the unimportant content to undergo more deformation. Once a scene change is detected, data of the entire scene are processed together for visual importance analysis. The challenge of conducting analysis in the compressed domain is that the original pixel values are unknown and we need to rely on compressed-domain features (e.g., DC and AC coefficients, motion vectors, etc.) only.
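The scene change test of Section V-A can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the function names, the flat list representation of a block, and the default values of `eta_thre` and `tau` are hypothetical (the text does not give the threshold values), and coefficient magnitudes are used in the denominator so that the ratio stays well defined for negative or zero coefficients.

```python
def block_difference(coeffs, avg_coeffs):
    """Normalized difference eta_k between a block and its temporal average.

    coeffs and avg_coeffs are flat lists holding the N*N DCT coefficients of
    one block and their averages over the previous T frames.
    """
    total = 0.0
    for c, c_avg in zip(coeffs, avg_coeffs):
        denom = max(abs(c), abs(c_avg))  # assumption: magnitudes in the max
        if denom > 0:                    # skip coefficient pairs that are both zero
            total += abs(c - c_avg) / denom
    return total / len(coeffs)


def is_scene_change(frame_blocks, avg_blocks, eta_thre=0.5, tau=0.5):
    """Return (scene_change_detected, T_c) for one frame.

    eta_thre and tau are hypothetical threshold values chosen for illustration.
    """
    changed = sum(
        1
        for cur, avg in zip(frame_blocks, avg_blocks)
        if block_difference(cur, avg) > eta_thre  # D_k = 1
    )
    t_c = changed / len(frame_blocks)             # fraction of changed blocks
    return t_c > tau, t_c
```

In a full system the running averages in `avg_blocks` would be updated after every frame and reset whenever a scene change is reported.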
In this work, we perform content analysis based on three features: saliency, texture and motion. Each analysis generates a single map, and the three maps are eventually combined to form the final visual importance map.

1) Saliency Map: Saliency is used to detect the region of interest in images, and has been widely used to guide both image and video retargeting [2]–[4], [7], [12]. While most saliency detection methods operate in the pixel domain [19], [20], there is recent work on compressed-domain saliency detection for images [15] and videos [21]. In the proposed system, we adopt the spectral-residual visual attention model [19] for saliency map computation. In [19], the saliency map of an image is calculated using the spectral residual signal, derived from analyzing the log spectrum of the image. Although this saliency detection method operates in the pixel domain, we can modify it so that it can be used in the DCT domain as well. Instead of downsampling the input image to a smaller size as done in [19], we directly use the DC coefficient of each DCT block and apply the same saliency detection algorithm to this DC-based image. For the 4 × 4 DCT transform, the DC coefficients of the entire image form an image equivalent to the original image downsampled by a factor of 4 × 4. For improved temporal coherency, the visual saliency map is temporally filtered with its neighboring $T$ frames (in our system, $T = 5$).

2) Texture Map: Fine structures, such as textures and edges, need special treatment in video retargeting. One of the limitations of seam carving [5] is that noticeable artifacts occur when the removed seams pass through edge regions. In addition, the effect of texture regularity on the retargeting result was studied in [22], and it was observed that stochastic textures are less susceptible to large deformation than regular textures. The saliency map generated using [19] contains a limited amount of texture information, as only DC coefficients were used for computation.
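The DC-based input to the saliency step and the temporal filtering described above might look roughly as follows. This is a hedged sketch, not the authors' code: `dc_image` and `temporal_mean` are hypothetical helper names, the nested-list layout of the block grid is an assumption, and a plain mean over a centered window is assumed because the text does not specify the filter. The spectral-residual saliency computation of [19] itself is not shown.

```python
def dc_image(block_grid):
    """Build the DC-based image from a 2-D grid of 4x4 DCT blocks.

    block_grid[r][c] is a 4x4 nested list of coefficients; only the DC term
    block[0][0] is kept, which gives one sample per block.
    """
    return [[block[0][0] for block in row] for row in block_grid]


def temporal_mean(maps, index, T=5):
    """Average the map at `index` with its neighbours in a T-frame window.

    Assumption: a centered window with a plain mean, truncated at the ends
    of the scene.
    """
    half = T // 2
    lo = max(0, index - half)
    hi = min(len(maps), index + half + 1)
    window = maps[lo:hi]
    rows, cols = len(window[0]), len(window[0][0])
    return [
        [sum(m[r][c] for m in window) / len(window) for c in range(cols)]
        for r in range(rows)
    ]
```

The same temporal filter could be reused for the other per-frame maps, since the text applies a $T = 5$ window in more than one place.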
Extracting textures and edges normally requires pixel-level processing, yet it is possible to obtain some level of texture and edge information through frequency analysis of DCT coefficients, as textures and edges correspond to mid-to-high frequency components in the DCT domain. In the proposed system, we use different frequency components of DCT coefficients to generate the feature vector for each block. To classify each block into one of three categories (texture, edge and smooth region), we compute the distance between the feature vector of a given block and a group of preset feature vectors, obtained by training on a set of test sequences from the public database [23]. The likelihood of a block belonging to a particular category ($k$ = tex, edge, smooth) is given by:

$$L_k = \frac{e^{1 + d_k^{-1}}}{\sum_{k} e^{1 + d_k^{-1}}}$$

where $d_k$ is the Euclidean distance between the feature vector and the preset centroid vector for category $k$. In this work, we pay special attention to the texture map $L_{tex}$. The texture degree of a block also provides a measure of the reliability of motion vectors. It is well known that low-textured regions tend to yield larger encoding matching errors [24]. For each macroblock, we can compute a confidence score for the corresponding motion vector using its texture degree, which will be elaborated in the next section.

3) Motion Map: The saliency map generated by [19] mainly captures the visually attractive parts in an individual frame, but fails to consider the motion information, which is another critical factor for important content detection in video. For example, a fast moving object might be non-salient in a single frame, but important content in a video sequence. Most spatial-domain video retargeting methods [2]–[4], [7] have incorporated motion detection techniques based on the SIFT feature [25] or optical flow [26]. However, these motion detection methods are not applicable in the compressed domain. Our goal here is to detect moving regions in the video sequence using the motion vectors embedded in the video bitstream. There are two challenges in using motion vectors directly for moving object detection: 1) It is common that the video includes various types of camera motion (e.g., zoom, pan, tilt), which need to be excluded to reflect true object motion. 2) Some motion vectors are unreliable as they do not agree with the true motion.

Fig. 4. Illustration of motion map generation using motion vectors from the Coastguard sequence. From left to right: one video frame, the original motion vector map, the compensated object motion, and the final motion map (after applying temporal filtering on the compensated motion map).

In the proposed system, the camera motion is estimated using a four-parameter global motion model [27]. In this model, the relationship between pixels of consecutive frames can be written as:

$$\begin{pmatrix} \bar{x} \\ \bar{y} \end{pmatrix} = \begin{pmatrix} z & r \\ -r & z \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} + \begin{pmatrix} p_r \\ p_d \end{pmatrix}, \qquad (1)$$

where $x$ and $y$ are coordinates in the current frame, $\bar{x} = x - mv_x$ and $\bar{y} = y - mv_y$ are coordinates in the previous frame, and $z$, $r$, $p_r$ and $p_d$ are the four unknown camera parameters representing zoom, rotate, pan right and pan down, respectively. To estimate the four camera parameters, we can rewrite Eq. (1) as an over-determined linear system [27] and compute the least-squares estimator of the four camera parameters:

$$X_{LS} = [\, z \;\; r \;\; p_r \;\; p_d \,]^T = (H^T H)^{-1} H^T Y, \qquad (2)$$

where $Y$ is the observation column vector and $H$ is the spatial location matrix.
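As a rough illustration of Eq. (2), the following pure-Python sketch stacks two rows of $H$ and two entries of $Y$ per block and solves the normal equations. It is an unweighted sketch under stated assumptions: the function names are hypothetical, the block coordinates are taken to be block centers, and the row layout follows the sign convention $\bar{x} = zx + ry + p_r$, $\bar{y} = -rx + zy + p_d$, which is an assumed reading of Eq. (1).

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def estimate_camera(positions, motion_vectors):
    """Least-squares estimate of [z, r, p_r, p_d] from block motion vectors.

    positions: list of (x, y) block coordinates in the current frame.
    motion_vectors: list of (mv_x, mv_y), one per block.
    """
    H, Y = [], []
    for (x, y), (mvx, mvy) in zip(positions, motion_vectors):
        H.append([x, y, 1.0, 0.0])    # x_bar = z*x + r*y + p_r
        Y.append(x - mvx)
        H.append([y, -x, 0.0, 1.0])   # y_bar = -r*x + z*y + p_d
        Y.append(y - mvy)
    # Normal equations: (H^T H) X = H^T Y
    hth = [[sum(row[i] * row[j] for row in H) for j in range(4)] for i in range(4)]
    hty = [sum(row[i] * yv for row, yv in zip(H, Y)) for i in range(4)]
    return solve_linear(hth, hty)
```

Solving the 4 × 4 normal equations directly keeps the sketch dependency-free; a production implementation would more likely call a numerical library's least-squares routine.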
As done in [24], we weight the rows of $Y$ and $H$ using the confidence measure computed from the texture map so that unreliable motion vectors have minimal impact on the camera motion estimation result. The estimation process given above assumes that object motion does not fit the camera model in Eq. (1) and therefore appears as outliers in the least-squares estimation. The estimated object motion is then computed as

$$MV_{obj} = \begin{bmatrix} mv_x(1) & mv_y(1) \\ mv_x(2) & mv_y(2) \\ \vdots & \vdots \end{bmatrix} = Y - H X_{LS},$$

where $mv_x(i)$ and $mv_y(i)$ are the compensated motion vector components of block $i$. The final motion map is computed using the magnitude of the compensated motion vector and applying a temporal filter over the neighboring $T$ frames (in our case, $T = 5$). We show the motion vectors after motion compensation and the final motion map for the Coastguard sequence in Fig. 4. The original motion vectors include both the object motion (ship) and the camera motion (right pan). After motion compensation, we eliminate the camera motion from the original motion vectors, leaving only the object motion. The two camera parameters in Eq. (2), $p_r$ and $p_d$, will be used in the mesh deformation stage as described in Section V-D.

4) Visual Importance Map: For each video frame, the final visual importance map, denoted by $I$, is computed by combining all three maps (see Fig. 5) generated from the above analysis:

$$I = I_s \cdot I_t \cdot I_m, \qquad (3)$$

where $I_s$, $I_t$ and $I_m$ are the saliency, texture and motion maps, respectively. The values of all three maps are normalized to the range [0.10, 1.00]. Although there are other ways (e.g., weighted sum) to fuse the three maps into the final importance map, we find that the multiplication-based fusion generates a more satisfying result.

C. Optimum Cropping

The importance of incorporating cropping into video retargeting was extensively discussed in [3]. When the video sequence is densely populated with salient contents, it is difficult to preserve all salient contents while maintaining temporal coherence, because the retargeting result would be close to uniform scaling in this scenario.
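Returning to the fusion rule in Eq. (3), a minimal sketch of the normalization and element-wise product is given below. The names `normalize` and `fuse` are hypothetical, each map is assumed to be flattened into a list of per-block values, and linear min-max rescaling is an assumption, since the text only states the target range [0.10, 1.00].

```python
def normalize(values, lo=0.10, hi=1.00):
    """Rescale a flattened map linearly to the range [lo, hi].

    Assumption: min-max rescaling; a constant map is mapped to `hi`.
    """
    vmin, vmax = min(values), max(values)
    if vmax == vmin:
        return [hi] * len(values)
    scale = (hi - lo) / (vmax - vmin)
    return [lo + (v - vmin) * scale for v in values]


def fuse(saliency, texture, motion):
    """Element-wise product of the three normalized maps, as in Eq. (3)."""
    s, t, m = normalize(saliency), normalize(texture), normalize(motion)
    return [a * b * c for a, b, c in zip(s, t, m)]
```

Because the lower bound is 0.10 rather than 0, a block that scores lowest on one map is attenuated but not erased: its fused importance can still be raised by the other two maps.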
Instead of performing nonuniform warping on the entire frame, a better solution is to allow some of the non-salient regions to be discarded. In the proposed system, we partially resize the video through cropping first, and then perform warp-based deformation to resize the video to its desired size. We define the cropping factor as the percentage of block columns to discard during cropping. The optimum cropping factor, which balances the amount of cropping and warping, needs to be determined first. Since the importance map is computed at the DCT block level, we perform cropping by discarding unimportant columns of DCT blocks (of size 4 × 4 in our case). In the following, we assume the resizing is conducted along the horizontal direction. The same process can be applied to resizing in the vertical direction.

Fig. 5. Illustration of the visual importance analysis procedure: frames of the entire scene are analyzed and three maps (saliency, texture and motion) are generated. Fused together, they form the final visual importance map.

Fig. 6. Left: the average column importance curve for different cropping factors in the cropping range. Each point in this curve corresponds to the best window of a given length. The optimum cropping factor and its corresponding window maximize the average column importance within the cropping window. Right: the optimum cropping window and the average visual importance for each column. Columns marked in blue represent the region that falls inside the cropping window, while columns marked in red would be discarded after cropping.

To compute the optimum cropping factor, we first determine the minimum and the maximum cropping factors, which are precomputed values based on the desired resizing factor. For example, if the resizing factor is 0.50 (resized to half width), the minimum cropping factor is set to 0.50 + (1.0 − 0.50) × 20% = 0.60, while the maximum cropping factor is 0.50 + (1.0 − 0.50) × 80% = 0.90. Within this cropping range, we compute the average visual importance value for all possible cropping options. Our goal is to find a rectangular region in the input video that contains the maximum average visual importance value.
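The cropping range in the example above, and the search for the window with the highest average column importance, can be sketched as follows. This is an illustrative sketch with hypothetical function names. It assumes that the per-column importance values have already been averaged over the rows and frames of the scene, and it interprets the cropping factor as the fraction of the original width that is retained, which is the reading suggested by the worked example (0.60 to 0.90 for a resizing factor of 0.50).

```python
def cropping_range(resize_factor, low=0.20, high=0.80):
    """Minimum and maximum cropping factors for a given resizing factor."""
    span = 1.0 - resize_factor
    return resize_factor + span * low, resize_factor + span * high


def best_window(column_importance, length):
    """Start index and mean importance of the best window of `length` columns."""
    def mean_at(start):
        return sum(column_importance[start:start + length]) / length

    starts = range(len(column_importance) - length + 1)
    best = max(starts, key=mean_at)  # first maximum wins on ties
    return best, mean_at(best)


def optimum_crop(column_importance, resize_factor):
    """Try every window length in the cropping range and keep the best one.

    Returns (start, length, mean_importance) in units of block columns.
    """
    n = len(column_importance)
    f_min, f_max = cropping_range(resize_factor)
    best = None
    for length in range(max(1, round(f_min * n)), round(f_max * n) + 1):
        start, mean = best_window(column_importance, length)
        if best is None or mean > best[2]:
            best = (start, length, mean)
    return best
```

For ten block columns and a resizing factor of 0.50, the candidate window lengths are 6 to 9 columns, so the search space is small enough that recomputing each window sum from scratch is acceptable.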
Since we are only dealing with a limited number of options, we use exhaustive search to compute the best cropping window of a given window length. After computing the best cropping window for each window length (see Fig. 6), we look for the optimum window length that contains the maximum average visual importance value. Once we have determined the optimum cropping window of the given video scene, we discard the columns of DCT blocks that fall outside of this window. This cropping window is applied to all the frames of the same scene.

D. Column Mesh Deformation

After partially resizing through cropping, the video is further resized to its desired size through nonuniform warping. For spatial domain retargeting methods [2]–[4], [7], a quad-shape mesh is often used to guide the warping operation. However, conducting a quad-to-quad deformation is a challenging task in the compressed domain, since it is difficult to compute the corresponding DCT coefficients of a block after it is warped to an arbitrary-shape quadrilateral. Instead of adopting the conventional quad-shape mesh, we use a column-shape mesh to guide the warping process. For mesh warping, we extend the formulation in [22] and adjust it to our proposed column-mesh structure. In addition, two new energy terms are added to preserve motion and temporal coherency.

Consider a column mesh, represented by $M = \{V, C\}$, as shown in Fig. 7, where $C = \{c_1, c_2, \ldots, c_n\}$ denotes the set of columns, and $V = \{v_0, v_1, v_2, \ldots, v_n\}$ is the set of vertex positions, with $v_i$ representing the horizontal coordinate of vertex $i$. The width between consecutive mesh vertices is set to 4, which is the same as the length of a transform block. We place such a mesh on each input video frame. We take the initial vertex positions of each frame, $V$, as the input and solve for their new positions $V_{new} = \{\tilde{v}_0, \tilde{v}_1, \tilde{v}_2, \ldots, \tilde{v}_n\}$ by minimizing an objective function as described below.

1) Shape Deformation: During the resizing process, we want to preserve the shape of columns with high saliency while allowing columns of lower saliency to be squeezed or stretched more.
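The intent of the shape deformation term can be illustrated with a deliberately simplified toy problem. The sketch below is not the objective used in the paper: it assumes every column simply wants to keep its original width, weights each column's squared width change by its importance, and enforces only the target total width, which gives a closed-form solution through a Lagrange multiplier. The function name and the sample values are hypothetical.

```python
from typing import List


def importance_weighted_widths(importance: List[float], original_width: float, target_total: float) -> List[float]:
    """Minimise sum_i I_i * (w_i - original_width)^2 subject to sum_i w_i = target_total.

    Setting the derivative of the Lagrangian to zero gives
    w_i = original_width - c / I_i, with c chosen so the widths sum to target_total.
    Assumes every importance value is strictly positive.
    """
    inverse = [1.0 / value for value in importance]
    excess = original_width * len(importance) - target_total  # total width to remove
    c = excess / sum(inverse)
    return [original_width - c * inv for inv in inverse]


if __name__ == "__main__":
    # Four columns of width 4; the last two are four times as important.
    print(importance_weighted_widths([1.0, 1.0, 4.0, 4.0], 4.0, 12.0))
```

In this toy setting the two low-importance columns absorb most of the squeeze. Nothing in it prevents a width from becoming negative, which is the kind of vertex flip that the vertex order term of the full objective is meant to guard against.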
For each frame, we measure the amount of shape deformation of column $i$ (with $\tilde{v}_i$ denoting the new position of vertex $i$) as

$$D(c_i) = \big( (\tilde{v}_i - \tilde{v}_{i-1}) - l_i\,(v_i - v_{i-1}) \big)^2, \quad i = 1, 2, 3, \ldots, n,$$

where $l_i$ is the optimum scaling factor for $c_i$ and is updated at each iteration as

$$l_i^{(k)} = \frac{\bar{l}_i}{\tilde{v}_i^{(k)} - \tilde{v}_{i-1}^{(k)}},$$

where $\bar{l}_i$ is the original width of column $i$ before deformation and $\tilde{v}_i^{(k)}$ is the vertex position of column $i$ at iteration $k$. The shape deformation energy of all columns is given by

$$E_d = \sum_{c_i \in C} I(c_i)\, D(c_i), \tag{4}$$

where $I(c_i)$ is the average visual importance of column $c_i$ at frame $t$.

ZHANG et al.: COMPRESSED-DOMAIN VIDEO RETARGETING 803

Fig. 7. The column mesh used for compressed-domain video resizing. The mesh $M = \{V, C\}$ includes a set of vertices $V = \{v_0, v_1, v_2, \ldots, v_n\}$ and a set of columns $C = \{c_1, c_2, \ldots, c_n\}$.

2) Vertex Order Preservation: It is possible that some vertices may flip over each other after mesh deformation, leading to unwanted artifacts. To avoid this, we preserve their relative positions with respect to their immediate neighboring vertices by maintaining their relative barycentric coordinates. For vertices $v_i$, we minimize

$$E_v = \sum_i \Big( \tilde{v}_i - \sum_{v_j \in N(v_i)} m_{ij}\, \tilde{v}_j \Big)^2, \tag{5}$$

where $m_{ij}$ is the barycentric coordinate of $v_j$ with respect to $v_i$, and $N(v_i)$ represents the set of neighboring vertices of $v_i$.

3) Temporal Coherency: To preserve temporal coherency and avoid jittering artifacts, we enforce the temporal smoothness of vertex positions across neighboring frames. Specifically, we try to minimize

$$E_c = \sum_i \big( \tilde{v}_i^{\,t+1} - \tilde{v}_i^{\,t} \big)^2, \tag{6}$$

where $\tilde{v}_i^{\,t}$ are the vertex positions of the previous frame.

4) Motion Preservation: As noted in [3], simply enforcing per-pixel smoothing along the temporal dimension, which does not take object or camera motion into account, yields poor resizing results. Under this scenario, an object that moves from the left to the right of the frame may be resized differently throughout the whole scene. For example, as shown in Fig. 8 (top), the scene consists of camera panning from right to left, and the tree size changes across the frames without motion preservation.

Fig. 8. The impact of using motion preservation in the column mesh deformation. Top: resizing results without considering motion preservation and the corresponding column vertex movement paths. The tree is resized inconsistently at different frames. Bottom: resizing results that consider motion preservation and the corresponding column vertex movement paths. The tree size undergoes a more consistent transformation throughout the entire video sequence.

To account for object motion and camera motion, we exploit the camera parameters estimated in Section V-B. Specifically, we utilize the camera right panning parameter, $p_r$, since we resize the input video along the horizontal direction. Similarly, the down panning parameter, $p_d$, will be used if we perform resizing along the vertical direction. To achieve motion-aware resizing, we minimize the following energy:

$$E_m = \sum_i \big( (\tilde{v}_i^{\,t+1} - \tilde{v}_{i-1}^{\,t+1}) - (\tilde{u}_i - \tilde{u}_{i-1}) \big)^2, \tag{7}$$

where $u_i = v_i^{t+1} - p_r^{t}$, and $p_r^{t}$ is the right panning parameter of frame $t$. $\tilde{u}_i$ represents the corresponding position of $u_i$ after mesh deformation. Since $u_i$ may not align with any of the $v_i$, we represent it with a linear combination of the column mesh vertices in its immediate vicinity as

$$\tilde{u}_i = \sum_{v_j \in N(u_i)} m_{ij}\, \tilde{v}_j^{\,t},$$

where $m_{ij}$ are the barycentric coordinates of $u_i$ w.r.t. the column vertices $v_j$ of its immediate vicinity $N(u_i)$. Note that we only consider columns whose corresponding positions in the previous frames are still within the frame boundary. For other columns, temporal coherency is preserved by the temporal coherency energy defined in Eq. (6). As shown in Fig. 8 (bottom), after incorporating the energy function for motion preservation, the tree size is preserved throughout all frames.

5) Joint Optimization for Column Mesh Deformation: Combining all energy terms in Eqs. (4)–(7), we solve for the deformed column mesh by minimizing the following objective function:

$$E = E_d + \alpha E_v + \beta E_c + \gamma E_m, \tag{8}$$

subject to the boundary constraint. The weighting coefficients are empirically set to $\alpha = 1.0$, $\beta = 50.0$ and $\gamma = 10.0$ in our experiments. We can represent the objective function in Eq. (8) and its constraint in matrix format as

$$\underbrace{\|DV - l(V)\|^2}_{\text{Eq. (4)}} + \underbrace{\|CV\|^2}_{\text{Eq. (5)}} + \underbrace{\|TV - S\|^2}_{\text{Eq. (6)}} + \underbrace{\|MV - N\|^2}_{\text{Eq. (7)}} + \|PV - Q\|^2, \tag{9}$$

where the last term, $\|PV - Q\|^2$, denotes the position constraint imposed by the target video size. The matrix expression given above can be rewritten as

$$\min_V \|AV - b(V)\|^2, \tag{10}$$

where

$$A = \begin{bmatrix} D \\ C \\ T \\ M \\ P \end{bmatrix}, \qquad b(V) = \begin{bmatrix} l(V) \\ 0 \\ S \\ N \\ Q \end{bmatrix}.$$

The nonlinear least-squares optimization problem in Eq. (10) can be solved through an iterative Gauss-Newton method. The vertex positions are initialized with a homogeneous resizing condition and updated iteratively via

$$V^{(k)} = (A^T A)^{-1} A^T b(V^{(k-1)}) = H\, b(V^{(k-1)}), \tag{11}$$

where $V^{(k)}$ is the vector of vertex positions after the $k$-th iteration. As $H$ depends only on $A$, it can be precomputed and stays fixed during the iteration process.

E. Transform Domain Block Resizing

Once the deformed mesh is computed, we use this information to resize the video frame in the compressed domain. We employ the DCT domain resizing method in [14], which supports resizing with arbitrary factors. In [14], each block in the resized frame is downsized from a rectangular region (called the supporting area) in the original frame, as illustrated in Fig. 9. The resizing is conducted in two separate steps: 1) extracting the supporting area from the original frame, and 2) downsizing the supporting area to a square-size output block. In the proposed system, the block resizing task is conducted at the transform block level. With the information of the deformed mesh, we first compute the supporting area of each output macroblock through reverse mapping. Specifically, for every vertex $v_i$ in the retargeted frame, we compute its corresponding coordinates in the original frame through interpolation. Then, every $N \times N$ block ($N = 4$ in our case) in the resized frame is reverse-mapped to an $N' \times N$ block in the original frame.
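The reverse mapping can be sketched with piecewise-linear interpolation between mesh vertices. This is an illustrative sketch only: the function names are hypothetical, the interpolation is assumed to be linear within each mesh column, the deformed vertex positions are assumed to be strictly increasing, and the target width is assumed to be a multiple of the block size.

```python
from bisect import bisect_right
from typing import List, Tuple


def reverse_map(x: float, deformed: List[float], original: List[float]) -> float:
    """Map a horizontal coordinate in the retargeted frame back to the original frame."""
    # Index of the right-hand vertex of the mesh column containing x, clamped to valid columns.
    i = min(max(bisect_right(deformed, x), 1), len(deformed) - 1)
    left, right = deformed[i - 1], deformed[i]
    t = (x - left) / (right - left)  # relative position inside the deformed column
    return original[i - 1] + t * (original[i] - original[i - 1])


def supporting_areas(deformed: List[float], original: List[float], block: int = 4) -> List[Tuple[float, float]]:
    """Return the horizontal extent in the original frame of every output block."""
    width = int(round(deformed[-1]))
    return [
        (reverse_map(x0, deformed, original), reverse_map(x0 + block, deformed, original))
        for x0 in range(0, width, block)
    ]


if __name__ == "__main__":
    # Hypothetical mesh: four columns of width 4, with the two middle columns squeezed to width 2.
    original = [0, 4, 8, 12, 16]
    deformed = [0, 4, 6, 8, 12]
    print(supporting_areas(deformed, original))
```

In this example the middle output block draws on two original columns, which is the situation where a supporting area spans more than one transform block.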
The height of the supporting area equals that of the output block, as we only consider resizing along the horizontal direction. It should be noted that the supporting area may cover multiple transform blocks in the original frame, and some blocks may be only partially covered by the supporting area.

Fig. 9. Illustration of the supporting area: each macroblock of the output video frame is resized from its corresponding supporting area in the original frame.

Following [14], the supporting area is resized to the output block via

$$\hat{B} = \begin{bmatrix} I_N & 0_{N \times N} \end{bmatrix} \Big( \sum_{B_i \in S(\hat{B})} M_L\, B_i\, M_R \Big) \begin{bmatrix} I_N \\ 0_{N \times N} \end{bmatrix}, \tag{12}$$

where $I_N$ is an identity matrix of size $N \times N$, $B_i$ are the DCT coefficients of the blocks that are covered by the supporting area of $\hat{B}$, $M_L$ and $M_R$ are the DCT transforms of shifting matrices, and $S(\hat{B})$ denotes the supporting area of block $\hat{B}$.

VI. RE-ENCODING

In the last stage, we re-encode the block DCT coefficients of the retargeted result to an H.264/AVC-compliant bitstream. The re-encoding step is essentially the reverse of the process in the partial decoding stage. It should be noted that quantization is not required here, since we did not perform inverse quantization in the partial decoding stage. The encoder forms a prediction of each macroblock based on previously coded data, either from the current frame using intra prediction [18] or from other frames that have already been coded [17]. The prediction is then subtracted from the current macroblock to form a residual, which is exported to the output bitstream. However, since each macroblock is modified during the retargeting process, there are two additional issues to be addressed: 1) macroblock type selection, and 2) motion vector re-estimation. We describe our solutions below.

A. Macroblock Type Selection

For selecting the macroblock types, we employ the MTSS scheme [28], which was originally proposed for a compressed-domain video downsizing system. This scheme can be modified to fit our retargeting framework. For the H.264/AVC video coding standard, a macroblock in a P frame can be intra-coded, forward predicted, or skipped.
A predicted B-frame macroblock can be intra-coded, forward, backward, or bi-directionally predicted, or skipped. We use the area proportion of each macroblock w.r.t. the supporting area to determine the prediction mode of output macroblocks. A retargeted macroblock is intra-coded if and only if more than 50% of its supporting area covers macroblocks that are originally intra-coded. For B frames, a retargeted macroblock is forward predicted if more than 70% of its supporting area covers original macroblocks that have forward prediction.

Similarly, it is backward predicted if 70% of its supporting area covers original macroblocks that have backward prediction. In the case where the supporting area covers both forward and backward predicted macroblocks while both are lower than 70%, the prediction type that has the higher percentage determines the retargeted macroblock type. In case of a tie (50% backward, 50% forward), the macroblock is bi-directionally predicted.

Fig. 10. Motion vector of a retargeted block, $mv$, is estimated using the motion vectors in its supporting area: $mv_1$, $mv_2$ and $mv_3$.

B. Motion Vector Refinement

The conventional way to generate an H.264/AVC bitstream of a resized video requires decompressing it and then applying a spatial-domain motion estimation technique to recompute motion vectors in the pixel domain. However, recomputing motion vectors is a computationally intensive procedure, and it typically takes 60% or more of the workload of a video encoder [29]. To remedy this, we propose a motion vector refinement technique that works directly in the compressed domain and re-estimates the new motion vector using the motion vectors of the original macroblocks. The proposed refinement technique is intended only for inter-frame coding, as intra frames are coded independently and do not contain any motion information.

Consider the case of resizing a supporting area of size $N' \times N$ ($N' = L_1 + L_2 + L_3$) to the output macroblock of size $N \times N$, as shown in Fig. 10. The supporting area covers three macroblocks in the original frame with motion vectors $mv_1$, $mv_2$ and $mv_3$, respectively. The motion vector $mv$ of the resized macroblock is estimated as

$$mv = \frac{\sum_i mv_i\, L_i}{\sum_i L_i}, \tag{13}$$

where $mv_i$ is the motion vector of original macroblock $i$ and $L_i$ is the length of macroblock $i$ in the supporting area. It should be noted that intra macroblocks are considered as blocks with zero motion vectors. The effectiveness of our motion vector refinement scheme is validated through experiments in the next section.

VII. EXPERIMENTAL RESULTS

In this section, we demonstrate the effectiveness of the proposed compressed-domain video retargeting solution by comparing its results with previous spatial domain video retargeting methods [2], [4], [5]. We begin by evaluating the scene change detection method in our proposed system, followed by a visual comparison of our retargeting results with other spatial-domain techniques. The effectiveness of our motion vector refinement technique is then validated, followed by a computational complexity analysis of the proposed system versus spatial-domain retargeting methods. Finally, we present subjective quality evaluation results obtained from 56 subjects.

A. Scene Change Detection Evaluation

In Table I, we evaluate the performance of our scene change detection algorithm proposed in Section V-A. The two test sequences used in this experiment, big buck bunny and elephants dream, contain various types of scene changes and camera motions. In our experiment, abrupt and gradual scene changes are evaluated separately. The performance of a scene-change-detection algorithm is measured in terms of recall and precision rate, which are defined as

$$\text{Recall} = \frac{N_c}{N_c + N_m} \times 100\%, \qquad \text{Precision} = \frac{N_c}{N_c + N_f} \times 100\%,$$

where $N_c$, $N_m$ and $N_f$ represent the number of correct, missed and false detections, respectively.

TABLE I
SCENE CHANGE DETECTION RESULTS ON TWO TEST SEQUENCES: BIG BUCK BUNNY AND ELEPHANTS DREAM

Our proposed scene change detection algorithm performs relatively well on the big buck bunny sequence, with a recall rate of 96.97% and a precision rate of 94.81%. The algorithm performs better in detecting abrupt changes than gradual changes. While none of the abrupt changes are missed by our method, the number of missed gradual changes is relatively high (4 out of 5 were missed). On the other hand, our method has relatively lower recall and precision rates on the elephants dream sequence. This is mainly due to the existence of non-static background and fast camera motions in the video content. This implies that our detection method is not robust enough to perform well on all types of video sequences and that there is still room for further improvement.
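As a small worked example of the recall and precision definitions, the helper below evaluates them for given detection counts. The function name and the counts in the usage lines are hypothetical and are not taken from Table I.

```python
from typing import Tuple


def recall_precision(correct: int, missed: int, false_alarms: int) -> Tuple[float, float]:
    """Return (recall, precision) in percent from scene change detection counts."""
    if correct + missed == 0 or correct + false_alarms == 0:
        raise ValueError("recall and precision are undefined when a denominator is zero")
    recall = 100.0 * correct / (correct + missed)
    precision = 100.0 * correct / (correct + false_alarms)
    return recall, precision


if __name__ == "__main__":
    # Hypothetical counts: 8 correct detections, 2 missed changes, 2 false detections.
    print(recall_precision(correct=8, missed=2, false_alarms=2))
```

A detector that misses many changes lowers recall only, while spurious detections lower precision only, which is why the two rates are reported side by side.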

Fig. 11. Performance comparison of the proposed solution versus the seam carving method [5] for sequences rat and roadski. From left to right: the original video sequence, the result of seam carving [5] and our result. The seam carving method is incapable of preserving the shape of prominent edges, as can be observed from the distortions on the curb-line in the rat sequence and the road-lines of the roadski sequence.

Fig. 12. Performance comparison of the proposed solution versus the pixel-warp method [2] for sequence waterski. From left to right: the original video sequence, the result of [2] and our result. The pixel-warp method over-squeezes the water wave region of the waterski sequence, leading to noticeable artifacts. In contrast, our method incorporates cropping into the whole procedure and performs better in preserving the original content.

B. Visual Quality Comparison

We show the retargeting results of the proposed solution and of three state-of-the-art spatial domain methods [2], [4], [5] in Figs. 11–13 for visual comparison. The performance comparison with the seam carving method [5] is given in Fig. 11. The seam carving method resizes a video by continuously removing seams. In some cases, it is incapable of preserving prominent edges. As shown in Fig. 11, while seam carving introduces noticeable artifacts to the edge regions in both sequences (rat and roadski), the results of our solution contain fewer visual artifacts, as the shape of prominent edges is better preserved. We compare the performance of the proposed solution with that of the pixel-warp retargeting method [2] in Fig. 12. Our method differs from the pixel-warp retargeting method in that we have incorporated cropping in the resizing procedure, thereby avoiding over-squeezing the original video content. When the change in the aspect ratio is significant, as shown in the example of Fig. 12, the method based entirely on warping [2] over-squeezes the relatively unimportant video content, leading to noticeable visual distortion.
Finally, we compare the performance of our solution with the method proposed by Wang et al. [4], which is a state-of-the-art method with optimized computational efficiency. Similar to our method, the method in [4] incorporates both cropping and warping. For most test sequences, our method achieves performance comparable to that of [4]. One example is shown in Fig. 13. On the other hand, since our method operates directly in the compressed domain, it has an advantage in terms of computational and memory cost savings. This will be analyzed in Sec. VII-D.

Fig. 13. Performance comparison of the proposed solution versus the approach by Wang et al. [4] for sequences big buck bunny, car and building. From left to right: the original video sequence, the result of [4] and our result. Our method achieves comparable results in terms of visual quality, yet it has a lower computational cost and memory consumption.

C. Effectiveness of Motion Vector Refinement

In the re-encoding stage, to avoid the computationally expensive procedure of motion search, we proposed a motion vector refinement scheme that computes the new motion vector of each retargeted macroblock using the original motion vectors. We evaluate the effectiveness of this approach by comparing it with full motion search in terms of encoding PSNR. In the experimental setup, the retargeted sequence is encoded into an H.264/AVC (baseline profile) bitstream using the JM reference software [30]. All test sequences are encoded at a bit rate of 2 Mbps and a frame rate of 15 fps. The results of six different test sequences are listed in Table II.

TABLE II
PERFORMANCE COMPARISON OF RE-ENCODING USING THE PROPOSED MOTION VECTOR REFINEMENT APPROACH VERSUS FULL SEARCH

As listed in Table II, the motion vectors generated by our refinement scheme offer encoding PSNR values comparable to those of full search, which can be viewed as the upper bound. On average, the PSNR value of the proposed scheme is about 1 dB lower than that of the full search. For sequences with less movement, such as big buck bunny and rat, the PSNR difference can be as low as 0.60 dB. However, for sequences with a significantly moving background (such as building), the motion vectors estimated using our approach may become less reliable, leading to a relatively larger difference (3.53 dB for the building sequence). In all, the proposed solution achieves fast and accurate re-estimation of the motion vectors for the output target video while significantly reducing the complexity relative to full search.

D. Computational Complexity Analysis

In Table III, we further compare the computational complexity of the proposed solution with spatial domain video retargeting algorithms. The experiments were conducted on a segment of the big buck bunny sequence (158 frames, size: 672 × 384) encoded using the H.264/AVC baseline profile coding standard. It should be noted that the computational complexity for each frame may be different, as different prediction modes are used for each frame. Table III shows the per-frame total operation cost averaged over all frames. In the encoding stage, the EPZS approach [31] is used for fast motion search. In the decoding stage, spatial-domain methods demand the entire full decoding process, including inverse DCT, inverse quantization, motion compensation and intra prediction.
Our solution operates directly in the compressed domain, thereby avoiding both inverse DCT and inverse quantization, leading to 13.72% savings in the total operation cost (see Table III). For the encoding stage, with the proposed motion vector re-estimation scheme, our proposed system results in 30.17% and 99.92% savings in the total operation cost as compared with the fast [31] and full motion search approaches, respectively. In Table IV, we show the computational complexity analysis for two other test sequences: roadski (99 frames, size: 540 × 280) and building (104 frames, size: 720 × 376). Experimental results on these two sequences also demonstrate that our proposed system leads to significant cost savings in both the encoding and decoding stages.
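The encoding-stage saving comes from replacing motion search with the length-weighted average of Eq. (13). A minimal sketch of that average follows, assuming each covered macroblock contributes one motion vector and one overlap length, and that intra macroblocks are passed as `None` and treated as zero vectors, as stated in the text. The function name and the sample vectors are hypothetical, and any additional rescaling of the vector by the resizing ratio is not modeled here.

```python
from typing import List, Optional, Tuple

Vector = Tuple[float, float]


def refine_motion_vector(covered: List[Tuple[Optional[Vector], float]]) -> Vector:
    """Length-weighted average of the motion vectors inside a supporting area.

    `covered` holds (motion_vector, overlap_length) pairs; a motion vector of
    None marks an intra macroblock and counts as (0, 0).
    """
    total = sum(length for _, length in covered)
    if total <= 0:
        raise ValueError("supporting area must have positive length")
    sum_x = sum_y = 0.0
    for mv, length in covered:
        if mv is not None:  # intra blocks add nothing to the numerator
            sum_x += mv[0] * length
            sum_y += mv[1] * length
    return sum_x / total, sum_y / total


if __name__ == "__main__":
    # Hypothetical supporting area covering three macroblocks with overlap lengths 2, 4 and 2.
    area = [((4.0, 0.0), 2.0), ((8.0, 2.0), 4.0), (None, 2.0)]
    print(refine_motion_vector(area))
```

The cost is a handful of multiplications per output macroblock, independent of any search window, which is consistent with the large gap to full motion search reported above.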

TABLE III
COMPLEXITY ANALYSIS FOR RETARGETING THE BIG BUCK BUNNY SEQUENCE FOR DCT DOMAIN VERSUS THE SPATIAL DOMAIN

TABLE IV
COMPLEXITY ANALYSIS FOR RETARGETING THE ROADSKI AND BUILDING SEQUENCES FOR DCT DOMAIN VERSUS THE SPATIAL DOMAIN

Fig. 14. Pairwise comparison results of 56 user study participants, which show that users have a preference for the visual quality of our solution over the other three benchmarking methods proposed in [2], [4], [5].

E. Subject Visual Quality Test

Lastly, we report the subject test results on the visual quality of the proposed retargeting solution via a user study conducted on 56 participants (27 female and 29 male, aged between 22 and 54). The experiments were conducted in a typical laboratory environment. We used 6 different videos in the experiment and retargeted each video to 50% width using the methods in [2], [4], [5] and our method. In the subject test, we presented the output video sequences obtained by two retargeting methods side by side to the observer, who was then asked to choose the better of the two. In each pair, one is our own result while the other is the result from one of the state-of-the-art spatial domain methods in [2], [4], [5]. The entire user study consists of 6 × 4 = 24 video pairs, and we received 24 × 56 = 1344 answers overall. It took on average 15-20 minutes for each participant to complete the user study. To minimize user bias, we randomized the order of test pairs and hid all technical details from the participants. We show the results of the user study in Fig. 14. Our results were favored in 61.1% (821 of 1344) of the comparisons with Rubinstein et al. [5], in 59.3% (797 of 1344) of the comparisons with Krähenbühl et al. [2], and in 57.9% (778 of 1344) of the comparisons with Wang et al. [4]. The study results show that users have a stronger preference for the visual quality of our solution over the other three benchmarking methods.

VIII. CONCLUSION

In this paper, we proposed a practical video retargeting system that operates directly on DCT coefficients and motion vectors in the compressed domain. This solution avoids the computationally expensive process of decompression, processing, and recompression. As the system uses the DCT coefficients directly for processing, only partial decoding of video streams is needed. The proposed solution achieves comparable (or slightly better) visual quality relative to several state-of-the-art spatial domain video retargeting methods, yet it significantly reduces the computational and storage costs. Although the proposed system uses the latest H.264/AVC coding standard as an example, the general methodology is applicable to other video coding standards as well.

This study can be extended along the following directions:

1) The scene change detection method in our proposed system may not be robust enough for scenes that contain fast camera motions and large moving foregrounds. More sophisticated compressed-domain scene change detection algorithms (such as [32]) can be adopted to account for more dynamic scenarios.

2) The column-mesh structure in our proposed solution only squeezes or stretches video content along one direction. A more sophisticated mesh structure (e.g., axis-aligned deformation [33]) can be utilized to allow homogeneous scaling of important objects.

3) For motion vector refinement, we assigned a zero motion vector to intra-coded blocks. More reliable estimation of motion vectors for intra blocks is left as future work.

REFERENCES

[1] L. Wolf, M. Guttmann, and D. Cohen-Or, "Non-homogeneous content-driven video-retargeting," in Proc. 11th IEEE ICCV, Oct. 2007, pp. 1–6.
[2] P. Krähenbühl, M. Lang, A. Hornung, and M. Gross, "A system for retargeting of streaming video," ACM Trans. Graph., vol. 28, no. 5, pp. 126:1–126:10, Dec. 2009.
[3] Y.-S. Wang, H.-C. Lin, O. Sorkine, and T.-Y. Lee, "Motion-based video retargeting with optimized crop-and-warp," ACM Trans. Graph., vol. 29, no. 4, pp. 90:1–90:9, Jul. 2010.
[4] Y.-S. Wang, J.-H. Hsiao, and T.-Y. Lee, "Scalable and coherent video resizing with per-frame optimization," ACM Trans. Graph., vol. 30, no. 4, pp. 88:1–88:8, Aug. 2011.
[5] M. Rubinstein, A. Shamir, and S. Avidan, "Improved seam carving for video retargeting," ACM Trans. Graph., vol. 27, no. 3, pp. 1–9, Aug. 2008.
[6] M. Grundmann, V. Kwatra, M. Han, and I. Essa, "Discontinuous seam-carving for video retargeting," in Proc. IEEE CVPR, Jun. 2010, pp. 569–576.
[7] Y.-S. Wang, H. Fu, O. Sorkine, T.-Y. Lee, and H.-P. Seidel, "Motion-aware temporal coherence for video resizing," ACM Trans. Graph., vol. 28, no. 5, pp. 127:1–127:10, 2009.
[8] A. Shamir and O. Sorkine, "Visual media retargeting," in Proc. ACM SIGGRAPH ASIA, New York, NY, USA, 2009, pp. 11:1–11:13.
[9] L.-Q. Chen, X. Xie, X. Fan, W.-Y. Ma, H.-J. Zhang, and H.-Q. Zhou, "A visual attention model for adapting images on small displays," Multimedia Syst., vol. 9, no. 4, pp. 353–364, 2003.
[10] S. Avidan and A. Shamir, "Seam carving for content-aware image resizing," in ACM SIGGRAPH, New York, NY, USA, 2007, pp. 1–10.
[11] M. Rubinstein, A. Shamir, and S. Avidan, "Multi-operator media retargeting," ACM Trans. Graph., vol. 28, no. 3, pp. 1–23, Jul. 2009.
[12] Y.-S. Wang, C.-L. Tai, O. Sorkine, and T.-Y. Lee, "Optimized scale-and-stretch for image resizing," in ACM SIGGRAPH Asia, New York, NY, USA, 2008, pp. 1–8.
[13] R. Dugad and N. Ahuja, "A fast scheme for image size change in the compressed domain," IEEE Trans. Circuits Syst. Video Technol., vol. 11, no. 4, pp. 461–474, Apr. 2001.
[14] H. Shu and L.-P. Chau, "An efficient arbitrary downsizing algorithm for video transcoding," IEEE Trans. Circuits Syst. Video Technol., vol. 14, no. 6, pp. 887–891, Jun. 2004.
[15] Y. Fang, Z. Chen, W. Lin, and C.-W. Lin, "Saliency detection in the compressed domain for adaptive image retargeting," IEEE Trans. Image Process., vol. 21, no. 9, pp. 3888–3901, Sep. 2012.
[16] H.-M. Nam, K.-Y. Byun, J.-Y. Jeong, K.-S. Choi, and S.-J. Ko, "Low complexity content-aware video retargeting for mobile devices," IEEE Trans. Consum. Electron., vol. 56, no. 1, pp. 182–189, Feb. 2010.
[17] S. Porwal and J. Mukhopadhyay, "A fast DCT domain based video downscaling system," in Proc. IEEE ICASSP, vol. 2. May 2006, pp. 1–2.
[18] C. Chen, P.-H. Wu, and H. Chen, "Transform-domain intra prediction for H.264," in Proc. IEEE ISCAS, vol. 2. May 2005, pp. 1497–1500.
[19] X. Hou and L. Zhang, "Saliency detection: A spectral residual approach," in Proc. IEEE CVPR, Jun. 2007, pp. 1–8.
[20] L. Itti, C. Koch, and E. Niebur, "A model of saliency-based visual attention for rapid scene analysis," IEEE Trans. Pattern Anal. Mach. Intell., vol. 20, no. 11, pp. 1254–1259, Nov. 1998.
[21] Y. Fang, W. Lin, Z. Chen, C.-M. Tsai, and C.-W. Lin, "Video saliency detection in the compressed domain," in Proc. 20th ACM Int. Conf. Multimedia, 2012, pp. 697–700.
[22] J. Zhang and C.-C. Kuo, "Region-adaptive texture-aware image resizing," in Proc. IEEE ICASSP, Mar. 2012, pp. 837–840.
[23] (2013). Video Trace Library [Online]. Available: http://trace.eas.asu.edu/yuv/
[24] R. Wang, H.-J. Zhang, and Y.-Q. Zhang, "A confidence measure based moving object extraction system built for compressed domain," in Proc. IEEE ISCAS, vol. 5. May 2000, pp. 21–24.
[25] D. G. Lowe, "Distinctive image features from scale-invariant keypoints," Int. J. Comput. Vis., vol. 60, no. 2, pp. 91–110, Nov. 2004.
[26] M. Werlberger, W. Trobin, T. Pock, A. Wedel, D. Cremers, and H. Bischof, "Anisotropic Huber-L1 optical flow," in Proc. BMVC, Sep. 2009, pp. 1–11.
[27] R. Wang and T. Huang, "Fast camera motion analysis in MPEG domain," in Proc. ICIP, vol. 3. 1999, pp. 691–694.
[28] M. Hashemi, L. Winger, and S. Panchanathan, "Macroblock type selection for compressed domain down-sampling of MPEG video," in Proc. IEEE Can. Conf. Electr. Comput. Eng., vol. 1. May 1999, pp. 35–38.
[29] B. Shen, I. Sethi, and B. Vasudev, "Adaptive motion-vector resampling for compressed video downscaling," IEEE Trans. Circuits Syst. Video Technol., vol. 9, no. 6, pp. 929–936, Sep. 1999.
[30] (2013). H.264/AVC JM Reference Software [Online]. Available: http://iphome.hhi.de/suehring/tml/
[31] A. M. Tourapis, "Enhanced predictive zonal search for single and multiple frame motion estimation," Proc. SPIE, vol. 4671, pp. 1069–1079, Jan. 2002.
[32] J. Meng, Y. Juan, and S.-F. Chang, "Scene change detection in an MPEG-compressed video sequence," Proc. SPIE, vol. 2419, pp. 14–25, Apr. 1995.
[33] D. Panozzo, O. Weber, and O. Sorkine, "Robust image retargeting via axis-aligned deformation," Comput. Graph. Forum, vol. 31, no. 2, pp. 229–236, Jul. 2012.

Jiangyang Zhang received the B.S. degree in telecommunications engineering from Zhejiang University, China, in 2008. He is currently pursuing the Ph.D. degree with the University of Southern California, Los Angeles. He joined the Media Communications Laboratory in 2010.
His main research interest lies in image processing.

Shangwen Li received the B.S. and M.S. degrees in electrical engineering from Zhejiang University, Hangzhou, China, in 2008 and 2011, respectively. Since 2012, he has been with the Media Communications Laboratory, University of Southern California, Los Angeles. His research interests include computer vision, machine learning, and video coding.

C.-C. Jay Kuo (F'99) received the B.S. degree from National Taiwan University, Taipei, Taiwan, in 1980, and the M.S. and Ph.D. degrees from the Massachusetts Institute of Technology, Cambridge, in 1985 and 1987, respectively, all in electrical engineering. He is currently the Director of the Media Communications Laboratory and a Professor of electrical engineering, computer science, and mathematics with the University of Southern California, Los Angeles, and the President of the Asia-Pacific Signal and Information Processing Association. His current research interests include digital image/video analysis, multimedia data compression and information forensics, and security. He is the co-author of over 210 journal papers, 850 conference papers, and ten books. He is a fellow of the American Association for the Advancement of Science and the International Society for Optical Engineers.