The UCD community has made this article openly available. Please share how this access benefits you. Your story matters!


Provided by the author(s) and University College Dublin Library in accordance with publisher policies. Please cite the published version when available.

Title: Dynamic Complexity Scaling for Real-Time H.264/AVC Video Encoding
Author(s): Ivanov, Yuri; Bleakley, Chris J.
Publication date: 2007-09-28
Conference details: MULTIMEDIA '07, Proceedings of the 15th International Conference on Multimedia, Augsburg, Germany, 23-28 September, 2007
Publisher: Association for Computing Machinery
Item record/more information: http://hdl.handle.net/10197/7104
Publisher's statement: 2007 ACM. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in Proceedings of the 15th International Conference on Multimedia, {VOL#, ISS#, (2007)} http://dx.doi.org/10.1145/1291233.1291444.
Publisher's version (DOI): 10.1145/1291233.1291444
Downloaded: 2019-02-26T22:01:23Z

Some rights reserved. For more information, please see the item record link above.

Dynamic Complexity Scaling for Real-Time H.264/AVC Video Encoding

Yuri V. Ivanov
School of Computer Science and Informatics, University College Dublin, Belfield, Dublin 4, Ireland
+(353) 1 716 29 15
yury.ivanov@ucd.ie

C. J. Bleakley
School of Computer Science and Informatics, University College Dublin, Belfield, Dublin 4, Ireland
+(353) 1 716 29 15
chris.bleakley@ucd.ie

ABSTRACT
The H.264 video encoding standard can achieve high coding efficiency at the expense of high computational complexity. Typically, real-time software implementation requires omission of most optional encoding tools, leading to significantly reduced coding efficiency. This paper proposes a novel method for real-time H.264 encoding based on dynamic control of the encoding parameters to meet real-time constraints while minimizing coding efficiency loss. Experimental results show that the method provides up to 19% lower bit rate than conventional real-time encoding using fixed parameters with the same visual quality. The method allows real-time 30fps QCIF encoding on a Pentium IV with similar coding efficiency to full search baseline profile encoding.

Categories and Subject Descriptors
C.3 [Special-Purpose and Application-Based Systems]: real-time and embedded systems.

General Terms
Algorithms, Design.

Keywords
H.264, fast mode decision, complexity scaling, real-time video encoding.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. MM'07, September 23-28, 2007, Augsburg, Bavaria, Germany. Copyright 2007 ACM 978-1-59593-701-8/07/0009...$5.00.

1. INTRODUCTION
The H.264 standard [9] developed by the Joint Video Team (JVT) provides better coding efficiency than MPEG-4 and H.263 at low bit rates [17] but at the cost of significantly increased computational complexity. This makes it difficult to use software implementations of the encoder in practical real-time multimedia applications. Videophone conferencing, for example, requires every frame to be encoded within 1/30 of a second to maintain a high quality frame rate and to provide low end-to-end delay.

A large number of algorithms have been proposed by researchers to reduce H.264 computational complexity. These algorithms are focused on new methods to reduce the complexity of the most computationally complex components of the video encoder, i.e., Motion Estimation (ME), Mode Decision (MD) and Discrete Cosine Transform (DCT) coding. In almost all cases, these algorithms are aimed at reducing total encoding time, rather than at meeting real-time constraints.

As in other video coding standards, H.264/AVC exploits the spatial, temporal and statistical redundancies of the source video. Since the amount of redundancy varies between frames, the computational complexity of, for example, mode decision can vary significantly between consecutive frames. This effect is particularly important when using fast encoding schemes in which MD and ME search are terminated early. Utilization of encoding tools such as Variable Block Sizes in MD, B-frames and multiple reference frames in ME can further increase variation in the processor workload and therefore create much greater variation in frame encoding complexity.

To the authors' knowledge, few papers consider the problem of dynamic complexity scaling in real-time H.264 video encoding, including utilization of frame buffers and maintaining constant processor workload. In contrast, most software solutions simply reduce the MD and ME search size globally, so as to meet real-time constraints in the worst case. This leads to a significant reduction in coding efficiency.

The problem of real-time video encoding with dynamically varying computational complexity has a lot of similarities with the real-time rate-control problem [8]. That is, maintaining constant frame complexity in real-time encoding is similar to maintaining constant bit rate.
Thus, this paper investigates the use of rate-control (RC) techniques for complexity control in real-time video encoding. The mode decision algorithm with dynamically variable complexity, proposed in this paper, is based on an MD class fast search algorithm and operates in real-time on an average Pentium IV PC. The technique for dynamic complexity scaling, proposed in the paper, is general and can be applied across a range of similar algorithms, such as [3].

The paper is organized as follows. Section 2 reviews related work in the field. Section 3 provides an analysis of frame complexity prediction methods and proposes a complexity prediction model. The proposed MD algorithm with dynamic complexity scaling is described in Section 4. Experimental results are presented in Section 5 and discussed in Section 6. Finally, Section 7 concludes the paper.

2. RELATED WORK
Since the H.264 standard was adopted and its complexity was studied and analyzed [5][17], various methods have been proposed to reduce its computational complexity. In H.264, the most computationally complex components are Mode Decision and Motion Estimation [7]. Generally speaking, work done by researchers in the field can be divided into two categories.

Algorithms in the first category, such as [3][6][11-14][16], reduce the total execution time of H.264. Some of these methods allow for several fixed complexity settings. The conventional mode decision technique can be significantly improved by so-called early termination (ET) and forward SKIP prediction techniques [6][11][13][14]. These techniques assume that some block modes can be eliminated from the mode search without loss and that correct SKIP decisions may be made at the start of the MD process. The key to the success of these techniques is utilization of fast and efficient decision metrics. Further Mode Decision complexity reduction involves more sophisticated approaches, such as [3][12], whereby all macroblocks (MBs) in the frame are classified according to certain features of the video signal. Different MD search parameters are used for each class. For example, in [3], the authors propose Mode Group (MG) classification that utilizes overlapped MGs based on a measure of the residual error with predefined empirical thresholds. The method provides 52% total complexity reduction, but only for P frames with Rate-Distortion Optimization (RDO) on. A feature-based approach with a risk minimizing Mode Decision was proposed in [12], but it is less effective, reducing total encoding time by only 20-30%.

In the second category, there are algorithms that are practical implementations of H.264 on particular hardware platforms, such as [4][15][18]. Unlike algorithms in the previous category, they are designed to operate under real-time conditions. The authors claim that they can provide real-time video encoding on the selected hardware platform.
For example, [18] discusses a real-time H.264 implementation on a TMS320C6416 DSP. The authors do not concentrate only on the Mode Decision technique (or any other technique) by itself. Instead, they optimize different parts of the H.264 encoder to achieve the real-time goal. In [15], for instance, several techniques such as Intra prediction optimization, SAD optimization for Motion Estimation and DCT optimization are proposed. The resultant encoder was tested for its performance on Pocket PCs and smartphones. These papers mainly discuss hardware-related issues of particular H.264 algorithm implementations (such as code optimizing techniques) and do not provide a deep discussion about how the proposed techniques can be utilized on other hardware platforms, if they can be utilized at all.

Finally, none of these papers investigate the dynamic complexity scaling problem, i.e. maintaining defined complexity when the processor's workload varies, or the design of a real-time encoder working on a PC. We believe that the solution can be found among existing Rate Control techniques since the RC problem has many similarities with dynamic complexity scaling.

3. ANALYSIS
Rate control algorithms allocate a bit rate quota for encoding each video frame. The quota must be met with minimum loss in encoding efficiency. In the computational complexity control case, the problem can be formulated as:

$$\min(\Delta R),\ \min(\Delta D) \quad \text{such that} \quad T_{actual} < T_{quota} \qquad (1)$$

where $\Delta R$ is bit rate gain, $\Delta D$ is distortion, $T_{actual}$ is the actual encoding time and $T_{quota}$ is the time quota, which depends on frame rate.

As for RC algorithms [8], complexity scaling can be applied at the frame layer or the macroblock layer. At the frame layer, complexity scaling is applied to all MBs in a given frame in the same way. Thus, a single complexity setting is used for the whole frame. At the macroblock layer, complexity scaling is applied to each MB in the frame individually and there is a complexity setting for each MB. The choice of layer depends on the variation in complexity between different MBs when encoded using the same parameters.
3.1 Methods of Complexity Prediction
Current studies of RC algorithms in [8] show that solutions can be divided into two categories: those that operate without a buffer, and those that use a scene-content complexity estimation approach and require a buffer. Based on this, we propose the following schemes for dynamic complexity control.

In the first category, we consider fixed worst-case frame scheduling and truncated time scheduling. In fixed worst-case scheduling the time quota $T^{WC}_{quota}$ is the maximum frame encoding time as measured for the most computationally complex frame (e.g. in a high motion video sequence):

$$T^{WC}_{quota} = T^{WC}_{frame\_limit} \qquad (2)$$

The encoding parameters in worst-case scheduling are fixed such that the actual encoding time $T_{actual}$ is less than $T^{WC}_{quota}$ in all cases. The unused processing time can be calculated as:

$$T^{WC}_{unused} = \sum_{i=0}^{N} \left( T^{WC}_{quota} - T_i \right) \qquad (3)$$

where $N$ is the total number of frames in the video sequence and $T_i$ is the actual processing time for the particular frame. This approach has a large disadvantage: since the actual encoding time $T_{actual}$ depends on the temporal-spatial complexity of the source video material, it is impossible to take advantage of the unused processing time for video frames requiring less than maximum computational complexity.

Truncated time scheduling assumes that the encoding parameters can be adjusted dynamically to ensure that the encoding time $T_{actual}$ is less than $T_{quota}$. All MBs in the frame are processed with fixed encoding complexity settings as in worst-case scheduling, but if the quota is exceeded then the rest of the MBs are skipped. The advantage of this approach is that it allows a reduction in the unused processing time compared to the worst case: $T^{TRUNC}_{unused} < T^{WC}_{unused}$. Thus, higher encoding settings than in worst-case scheduling may be used, leading to greater average coding efficiency. The disadvantage is that the number of skipped MBs in some frames leads to high visual quality degradation for fast changing frames.

In the second category, we propose a scheduling scheme based on scene complexity estimation. In this case the MB encoding parameters are adjusted such that the predicted encoding time is on average equal to the time quota. The encoding time is predicted based on an estimate of scene complexity:

$$T_{predicted} = f(C_{frame}) \qquad (4)$$

where $T_{predicted}$ is the predicted frame encoding time (or predicted $T_{actual}$) and $C_{frame}$ is the estimated scene frame complexity. A study of previous RC algorithms [8] reveals that scene complexity can be estimated in a number of ways, e.g., by frame energy, number of allocated bits and by utilization of visual metrics like PSNR, MAD or SAD. The function $f$ can be calculated adaptively, based on the $C_{frame}$, $T_{predicted}$ and $T_{actual}$ obtained for the previous frames.

This scene complexity estimation approach is combined with a complexity control scheme that adjusts complexity dynamically. This ensures that $T_{actual} \le T_{quota}$ and minimizes unused processing time. The disadvantage here is that large variations in $C_{frame}$ produce large variations in $T_{predicted}$ and may eventually result in the predicted time exceeding the time quota. A frame buffer allows for errors in prediction: excess coding time in one frame is simply subtracted from the quota in the next. The buffer size must be small for low delay applications, where a higher frame rate is desirable (i.e. 30 fps) and can be larger for applications that allow greater delay [8]. Small buffer sizes may result in buffer overflow for large errors in $T_{predicted}$. In order to avoid this, an efficient MB complexity scaling scheme is required in addition to accurate scene complexity prediction.
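The two bookkeeping ideas in this subsection, the unused time of Eq. (3) and the subtraction of excess coding time from the next frame's quota, can be illustrated with a minimal Python sketch. The function names and the millisecond figures are illustrative assumptions, not values from the experiments.

```python
def unused_time_worst_case(quota_wc, frame_times):
    """Eq. (3): processing time left unused under a fixed worst-case quota."""
    return sum(quota_wc - t for t in frame_times)


def quotas_with_carry_over(frame_limit, frame_times):
    """Per-frame quotas when excess time in one frame is subtracted
    from the quota of the next frame (hypothetical helper)."""
    quotas = []
    excess = 0.0
    for t in frame_times:
        quota = frame_limit - excess
        quotas.append(quota)
        # Only time spent beyond the quota is carried into the next frame.
        excess = max(0.0, t - quota)
    return quotas


# Illustrative frame encoding times in milliseconds (assumed values).
times_ms = [20, 40, 30, 25]
unused_ms = unused_time_worst_case(40, times_ms)
carried = quotas_with_carry_over(33, times_ms)
```

With a worst-case quota of 40 ms, most of the budget for the 20 ms and 25 ms frames is idle, which is the disadvantage noted above for worst-case scheduling.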
It can be concluded that of all of these schemes, scene complexity estimation is the most advantageous as it provides maximum flexibility in minimizing coding efficiency loss. Hence, scene complexity estimation is applied in this work and compared with worst-case scheduling.

3.2 Proposed Model
Experiments with various video sequences indicate that when a fast Mode Decision algorithm is used, the encoding time for each MB varies greatly. In fact, frame encoding time can be determined from the distribution of Mode Decision classes (or MD classes) across all MBs in the frame. Thus, the predicted encoding time for the current frame $T_{FastMD\_predicted}$ can be estimated as:

$$T_{FastMD\_predicted} = \sum_{i=1}^{5} n_i t_i + t_0, \quad t_0 = const \qquad (5)$$

where $n_i$ is the number of MBs that belong to MD class $i$ and $t_i$ is the average encoding time for that MD class. For this model $t_0$ can be determined experimentally across several high and low motion QCIF video sequences.

For fast Mode Decision algorithms that employ early termination schemes such as [6], $t_i$ calculated per MB for the same MD class can vary greatly, depending on the stage of the MD process at which early termination occurred. To extend the complexity prediction model to these cases, a scheme for adaptive calculation of $t_i$ is proposed. The adaptive scheme involves storing the encoding time statistics for previously encoded frames and re-calculating $t_i$ for all MD classes every Nth frame. The following methods of updating $t_i$ were assessed experimentally for the proposed model as in Eq. (5) with N = 5:

1. Use the mean time for each MD class, calculated across all MBs in the previous frames:

$$t_i = mean\{T_{i\_0}, T_{i\_1}, \ldots, T_{i\_N-1}\} \qquad (6)$$

This method is the simplest and most straightforward solution.

2. Use the mean time for MBs that did not have their class changed as a part of a scaling algorithm (i.e. were not demoted, see below):

$$t_i = mean\{T_{i\_not\_demoted\_0}, \ldots, T_{i\_not\_demoted\_N-1}\} \qquad (7)$$

As demotion happens for MBs with the lowest previous J (see Eq. (9)) within a class and since lower J values indicate higher scene complexity, demoted MBs are harder to encode.

3. Use the maximum time measured for each MD class calculated across all N frames for MBs that were not early terminated:

$$t_i = max\{T_{i\_notET\_0}, \ldots, T_{i\_notET\_N-1}\} \qquad (8)$$

Since early termination significantly reduces MD time, calculation of $T_{FastMD\_predicted}$ excludes early terminated MBs in order to provide a consistent estimate.

The experimental results are provided in Table 1. From the experimental results, it can be clearly seen that Maximum $t_i$ is very inaccurate. The other two methods provide accurate prediction with an average difference between predicted and actual times of about 7%. Since the mean not demoted method has a higher Pearson correlation coefficient, it is deemed to provide the most accurate prediction for fast MD with early termination, so the final frame complexity prediction model chosen adopts Eq. (5) with adaptive $t_i$ calculation as in Eq. (7).

Once frame complexity is predicted, macroblock layer complexity scaling can be performed. The goal is the adjustment of computational complexity for each MB in the frame in order that the time quota is met and coding efficiency degradation is minimized.
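As a rough illustration of the prediction model, the sketch below evaluates Eq. (5) from per-class MB counts and refreshes the per-class times with the mean not demoted rule of Eq. (7). The data layout (dictionaries keyed by class number, tuples of timing samples) and all numeric values are assumptions made for the example. Only the 99-MB frame size follows from the QCIF format (176x144 pixels, 16x16 macroblocks).

```python
from statistics import mean


def predict_frame_time(class_counts, class_times, t0):
    """Eq. (5): sum of n_i * t_i over the MD classes plus the constant t_0."""
    return sum(n * class_times[c] for c, n in class_counts.items()) + t0


def update_class_times(class_times, samples):
    """Eq. (7): per-class mean MB time over MBs that were not demoted.

    samples holds (md_class, mb_time, was_demoted) tuples gathered over
    the last N frames; a class with no usable sample keeps its old value.
    """
    updated = dict(class_times)
    for c in class_times:
        kept = [t for cls, t, demoted in samples if cls == c and not demoted]
        if kept:
            updated[c] = mean(kept)
    return updated


# Assumed per-class times (ms per MB) and class distribution of one QCIF frame.
times = {1: 0.5, 2: 0.25, 3: 0.125, 4: 0.0625, 5: 0.03125}
counts = {1: 19, 2: 20, 3: 20, 4: 20, 5: 20}  # 99 MBs in total
predicted = predict_frame_time(counts, times, t0=1.0)

# Re-estimate t_i from stored statistics; the demoted sample is ignored.
stats = [(1, 1.0, False), (1, 2.0, False), (1, 9.0, True)]
times = update_class_times(times, stats)
```

In an encoder the update would run every Nth frame (N = 5 in the experiments above), with the statistics buffer cleared afterwards. That scheduling is left out of the sketch.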

Table 1. Estimation of adaptive frame prediction models

                            Prediction error, %       Pearson correlation between
Video sequence    Method    min     max     avg.      predicted and actual MD time
Carphone, QCIF    1         0.02    33.04    6.95     0.751
                  2         0.13    33.75    7.73     0.771
                  3         8.21    85.61   23.35     0.496
Hall, QCIF        1         0.02    46.70    3.95     0.949
                  2         0.21    43.90    5.46     0.967
                  3         4.18    88.62   22.94     0.625
Akiyo, QCIF       1         0.01    28.8     2.31     0.969
                  2         0.02    29.13    3.42     0.983
                  3         2.67    66.93   14.41     0.768

Since reducing complexity affects both bit rate and distortion, the need arises to unify both quantities into a single metric which is representative of the overall coding efficiency. At present, the rate-distortion model is adopted in H.264 Mode Decision [9] for making optimal decisions where both bit rate and distortion are important:

$$\min\{J\}, \quad \text{where} \quad J = D + \lambda R \qquad (9)$$

where $D$ is a distortion measure (usually Sum of Absolute Differences) and $R$ represents bit rate. During video encoding, the Lagrange rate-distortion function $J$ is minimized for a particular value of the Lagrange multiplier, $\lambda$.

Based on this, we introduce a coding efficiency metric, $\Delta W$, which is dependent on visual quality loss, $\Delta D$, and bit rate increase, $\Delta R$, relative to that achieved by the full complexity encoder. We define $\Delta W$ as:

$$\Delta W = \Delta R + \mu \Delta D \qquad (10)$$

where $\Delta R$ is a percentage, $\Delta D$ is the PSNR loss in dB and $\mu$ is a constant relating bit rate loss and distortion increase. Thus, for any given computational complexity point $C$, the most efficient encoder can be identified as the one providing minimum $\Delta W$. The constant $\mu$ can be interpreted as the percentage increase in bit rate equivalent to a 1 dB loss in PSNR. Previous work [1] determined that, for the frame sizes under investigation, a 10% decrease in bit rate is roughly equivalent to a loss of 0.5 dB in PSNR. In our work, $\mu$ was determined experimentally [7] and was set to 13.

Assuming that the encoder parameter configurations form a discrete set, the problem of selecting the optimal encoder configuration for any given complexity point can be solved by Pareto analysis [2]. In this work, the efficiency of the encoder was assessed across a range of parameter configurations.
These results were projected on to a graph relating coding efficiency to computational complexity. The optimum encoder parameters were then identified as those points $(C, \Delta W)$ which form the Convex Hull of the Individual Minima (CHIM) on the Pareto curve, shown as the blue line in Figure 1. Parameter configurations corresponding to points inside the CHIM are suboptimal. In this work, the MD classes only contain parameter configurations which are on the CHIM.

The low complexity encoding algorithm considered in this work allocates an MD class to each MB based on an analysis of the MB's statistical properties. MBs likely to have low motion are allocated to an MD class with a narrow search range in ME. This reduces encoding time without loss of coding efficiency. MD class allocation based on MB statistics is used for initial frame encoding time prediction. If the prediction exceeds the quota, then some macroblocks must have their MD classes demoted to reduce total encoding time. In contrast, if the predicted time is less than the quota, then some MBs may have their MD classes promoted to improve coding efficiency.

Scaling complexity down for a particular MB is referred to as MD class demotion. For example, a macroblock initially allocated to class B can be demoted to class C. This reduces computational complexity by $\Delta C$ and reduces coding efficiency by $\Delta W$, as defined in Eq. (10). For a given reduction in computational complexity of $\Delta C$, it is clear that the best demotion class is on the convex hull of the Pareto curve.

Figure 1. MB demotion example: Pareto curve of the W-metric against complexity C (%), with MD classes A, B and C marked on the convex hull.

It can be observed in Figure 1 that the gradient $\Delta W / \Delta C$ of the convex hull of the Pareto curve is lower between more computationally complex MD classes (i.e. $\Delta W_{AB} / \Delta C_{AB} < \Delta W_{BC} / \Delta C_{BC}$). That is, there is less coding efficiency loss for a given reduction in computational complexity. Hence, demotion should start with macroblocks that are in the more computationally complex MD classes. In some cases, it may not be necessary to demote all MBs within a given class. Since demotion generally leads to a bit rate and distortion increase, demotion should start with MBs that have the lowest previous J within the class. In this way, the most effectively coded MBs are demoted first, minimizing coding efficiency loss.
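The demotion rules above can be made concrete with a short Python sketch. It evaluates the efficiency metric of Eq. (10) with the value 13 for the constant, compares hull gradients between neighbouring classes, and orders macroblocks for demotion. The class operating points and the macroblock records are invented for illustration and are not read off Figure 1.

```python
def efficiency_loss(delta_rate_pct, delta_psnr_db, mu=13.0):
    """Eq. (10): bit rate increase (%) plus mu times PSNR loss (dB)."""
    return delta_rate_pct + mu * delta_psnr_db


def hull_gradient(p_hi, p_lo):
    """Efficiency loss per unit of complexity saved between two
    (complexity %, loss) points, p_hi being the more complex class."""
    (c_hi, w_hi), (c_lo, w_lo) = p_hi, p_lo
    return (w_lo - w_hi) / (c_hi - c_lo)


def demotion_order(macroblocks, class_rank):
    """Most complex class first; within a class, lowest previous J first."""
    return sorted(macroblocks,
                  key=lambda mb: (class_rank[mb["md_class"]], mb["j_prev"]))


# Hypothetical operating points for classes A, B and C.
points = {"A": (100.0, 0.0), "B": (70.0, 3.0), "C": (50.0, 9.0)}
grad_ab = hull_gradient(points["A"], points["B"])
grad_bc = hull_gradient(points["B"], points["C"])

# Hypothetical macroblocks with their class and previous-frame J value.
mbs = [
    {"id": 0, "md_class": "B", "j_prev": 50},
    {"id": 1, "md_class": "A", "j_prev": 80},
    {"id": 2, "md_class": "A", "j_prev": 30},
]
order = [mb["id"] for mb in demotion_order(mbs, {"A": 0, "B": 1, "C": 2})]
```

For these assumed points the gradient from A to B is smaller than from B to C, so class A macroblocks are demoted first, and the class A macroblock with the lower previous J comes before the other.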

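Scaling complexity up for a particular MB is referred to as MD class promotion. For example, a macroblock initially allocated to class C can be promoted to class B. This may improve MB coding efficiency, but also increases computational complexity. Promotion should start with the macroblocks that are allocated to less computationally complex MD classes, i.e. class E or D, and continue until the highest complexity class is reached (class A). Since promotion generally reduces bit rate and distortion, it should start with the MBs that have the highest previous J within the given class, so that the less effectively coded MBs are promoted first.

The proposed frame complexity estimation model was tested and optimized for the IPPP GOP structure. For B-frames the average encoding time $t_i$ for each MD class is generally higher than for P-frames due to increased computational complexity. To overcome this, it is proposed to utilize different sets of $t_i$ values for P- and B-frames, i.e. $\{t^P_0, t^P_1, \ldots, t^P_i\}$ and $\{t^B_0, t^B_1, \ldots, t^B_i\}$. Alternatively, a scaling coefficient $\gamma$ can be applied to the $t^P_i$ values when dealing with B-frames. The particular value for $\gamma$ is determined experimentally.

4. ALGORITHM
The fast Mode Decision algorithm used in these experiments is shown in Figure 2. It consists of two parts: MD class selection and fast Mode Decision with Early Termination. The MD class selection algorithm chooses the macroblock class based on three metrics: FD [16], $J_{prev}$ and SAD8x8 [6]. The Mode Decision algorithm with Early Termination utilizes J values from the previous frame (i.e. $J_{prev}$) in order to omit unnecessary MD computations [6]. As is typical of most fast MD algorithms, the algorithm's computational complexity is dependent on frame content. In order to achieve good coding efficiency, frames with high motion typically require more processing than frames with low motion.

Figure 2. Fast MD algorithm with MD classes. (Flowchart: for each MB, FD, $J_{prev}$, SAD8x8 and the predicted SKIP cost are calculated and compared against thresholds to assign one of MB classes I to V; the complexity settings are then set according to the class of the MB and MD with early termination is performed, until the last MB is reached.)

The algorithm for real-time dynamic complexity scaling in the H.264 encoder is shown in Figure 3. The algorithm uses a one-frame buffer. The time quota $T_{quota}$ is re-calculated after each frame is processed as:

$$T_{quota} = T_{frame\_limit} + \min(N \cdot T_{frame\_limit} - T_{total},\ T_{frame\_limit}) \qquad (11)$$

where $N$ is the number of frames processed so far and $T_{total}$ is their accumulated actual encoding time. In order to synchronize the buffer, the last frame encoding time $T_{last\_frame}$ cannot exceed the frame encoding time limit $T_{frame\_limit}$. The value of $T_{frame\_limit}$ is selected in order to allow real-time encoding at the desired frame rate.

5. EXPERIMENTAL RESULTS
The MD algorithm for real-time complexity scaling was implemented in the JM [10] reference encoder and experimentally tested. No assembly language or other manual optimizations were applied. Several QCIF video sequences of 300 frames each were encoded with an IPPP GOP structure. Reference encoding used all seven VBS modes, the CABAC entropy coder and RDO off.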
he algorthm for real-tme dynamc complexty scalng n the H.264 encoder s shown n Fgure 3. he algorthm uses a oneframe buffer. he tme quota quota s re-calculated after each frame s processed as: quota = + mn( N, ) (11) frame_lmt frame_lmt total frame_lmt In order to synchronze the buffer, the last frame encodng tme last_frame cannot exceed the frame encodng tme lmt frame_lmt. he value of frame_lmt s selected n order to allow real-tme encodng at the desred frame rate. 5. EXPERIMENAL RESULS he MD algorthm for real-tme complexty scalng was mplemented n the JM [10] reference encoder and expermentally tested. assembly language or other manual optmzatons were appled. Several QCIF vdeo sequences of 300 frames each were encoded wth an IPPP GOP structure. Reference encodng used all seven VBS modes, CABAC entropy coder and RDO off. Set complexty settngs accordng to class of MB Perform MD wth early termnaton Last MB? End Fgure 2. Fast MD algorthm wth MD classes In the experments, QP was set to 28. he algorthm was tested under condtons, where the value of frame_lmt was set to allow real-tme encodng at 15, 20 and 30fps on a reference 3GHz Pentum IV PC wth 1GB RAM. he obtaned results for bt rate gan, qualty degradaton and complexty reducton versus nonreal tme JM runnng at full search baselne profle on the same PC are shown n ables 2 4. he mnus ( ) sgn denotes mprovement for the new method. Usng Eqs. (12) (14) we calculated the percentage of unused tme unused, predcton error error and trm tme trm: trm,% = (12) error unused n = 300 n= 1 n = 300 actual _ n 1 n= 1 quota _ n FastMD _ predcted _ n actual _ n,% = (13) n = 300 n= 1 total FastMD _ predcted _ wthout _ demoton _ n actual _ n,% = (14) total

For the proposed algorithm running on the reference PC at 30fps, the results obtained include the Pearson correlation between predicted and actual time. All results are given in Tables 2-5.

Table 2. Bit rate change, % for the proposed method vs. ref. JM

Video sequence    15fps    20fps    30fps
Carphone           0.05     1.04    15.25
Table tennis       2.48     3.85    11.16
Coastguard         0.26     0.87     6.48
News               0.84     0.76     5.90
Salesman          -0.5     -1.50     5.60
Grandmother       -1.23    -3.08    -1.82
Mother            -0.7     -1.00     1.68
Hall               0.03    -1.73    -4.41
Akiyo             -1.15    -1.80    -1.85
mean               0.01    -0.28     4.22

Table 3. PSNR change, dB for the proposed method vs. ref. JM

Video sequence    15fps    20fps    30fps
Carphone           0.11     0.28     0.49
Table tennis       0.24     0.24     0.46
Coastguard         0.06     0.08     0.13
News               0.09     0.12     0.38
Salesman           0.03     0.07     0.24
Grandmother        0.02     0.19     0.18
Mother             0.06     0.17     0.32
Hall               0.01     0.03     0.20
Akiyo              0.01     0.02     0.16
mean               0.07     0.13     0.28

Table 4. Total encoding time reduction, % for the proposed method vs. JM

Video sequence    15fps    20fps    30fps
Carphone          43.30    52.42    73.27
Table tennis      47.20    52.10    72.95
Coastguard        46.73    54.64    73.10
News              40.75    48.03    71.27
Salesman          35.42    48.70    71.25
Grandmother       32.97    48.42    70.54
Mother            32.77    47.91    70.18
Hall              40.92    47.33    70.97
Akiyo             38.77    47.41    70.64
mean              39.87    49.66    71.57

Figure 3. Real-time complexity scalable fast MD algorithm. (Flowchart: an MD class is assigned to every MB of the frame and $T_{FastMD\_predicted}$ is calculated using Eq. (5). While the prediction exceeds $T_{quota}$, an MB chosen from the highest MD class present is demoted and the prediction is re-calculated, until the highest class is SKIP. While the prediction is below $T_{quota}$, an MB chosen from the lowest MD class present is promoted and the prediction is re-calculated, until the lowest class is class A; the last promoted MB is demoted again if the prediction then exceeds the quota. Fast MD is then performed and $T_{actual}$ is measured. $T_{total}$ is increased by $T_{actual}$, $T_{last\_frame}$ is set to $N_{frame} \cdot T_{frame\_limit} - T_{total}$ and limited to $T_{frame\_limit}$, and $T_{quota} = T_{frame\_limit} + T_{last\_frame}$. If the encoded frame is the Nth frame, $t_i$ is updated as mean not demoted.)
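The demotion branch of Figure 3 and the quota update of Eq. (11) can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the JM implementation: the class labels A to E, the per-class times, single-step demotion to the next cheaper class and the omission of the promotion branch are all choices made for the example.

```python
CLASSES = ["A", "B", "C", "D", "E"]  # assumed labels, A being the most complex


def predict(mbs, t, t0=0.0):
    """Eq. (5) written as a sum over macroblocks of their class time."""
    return t0 + sum(t[mb["cls"]] for mb in mbs)


def scale_to_quota(mbs, t, quota, t0=0.0):
    """Demote one MB at a time until the predicted time fits the quota.

    The MB is taken from the most complex class present and, within that
    class, is the one with the lowest previous J.
    """
    while predict(mbs, t, t0) > quota:
        candidates = [mb for mb in mbs if mb["cls"] != CLASSES[-1]]
        if not candidates:
            break  # every MB is already in the cheapest class
        mb = min(candidates,
                 key=lambda m: (CLASSES.index(m["cls"]), m["j_prev"]))
        mb["cls"] = CLASSES[CLASSES.index(mb["cls"]) + 1]
    return mbs


def next_quota(frame_limit, frames_done, total_time):
    """Eq. (11): frame limit plus the time saved so far, capped at one limit."""
    saved = frames_done * frame_limit - total_time
    return frame_limit + min(saved, frame_limit)


# Assumed per-class MB times (ms) and a toy three-MB frame.
t = {"A": 5.0, "B": 3.0, "C": 2.0, "D": 1.0, "E": 0.5}
frame = [
    {"cls": "A", "j_prev": 10},
    {"cls": "A", "j_prev": 20},
    {"cls": "B", "j_prev": 5},
]
scale_to_quota(frame, t, quota=10.0)
final_classes = [mb["cls"] for mb in frame]
```

A fuller version would add the symmetric promotion loop (cheapest class first, highest previous J first) and call `next_quota` after measuring each frame's actual encoding time.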

Table 5. $T_{unused}$, $T_{error}$, $T_{trim}$, % and Pearson correlation, r, between {$T_{FastMD\_predicted}$} and {$T_{actual}$} for the proposed algorithm at 30fps

Video sequence    T_unused    T_error    T_trim    r
Carphone            3.81        3.39      74.95    0.581
Table tennis        5.75        5.55      50.59    0.546
Coastguard          4.12        3.88      98.27    0.509
News                6.17        5.88      21.11    0.557
Salesman            6.11        6.28      18.04    0.422
Grandmother         2.72        2.56      13.74    0.667
Mother              3.40        3.13      17.00    0.643
Hall                5.68        6.40       5.75    0.630
Akiyo               2.85        3.64      -2.92    0.738

Worst-case scheduling was tested under the same simulation conditions. The H.264 complexity settings were selected as shown in Table 6, so that the encoding time for the worst sequence allows encoding at 15, 20 and 30fps. The standard non-real-time JM with full search baseline profile achieves only 8fps on the reference PC.

Table 6. H.264 settings for the worst-case scheduling

Worst-case scheduling at    30fps                      20fps                            15fps            8fps
VBS modes                   P16x16, all Intra modes    P16x16, P8x8, all Intra modes    All VBS modes    All VBS modes
Search range                1                          4                                6                8

Hadamard transform: off, on (the assignment of these two values to the four columns is not recoverable from the source text).

In order to compare the efficiency of both algorithms at the 30fps point, the differences in bit rate and PSNR for the proposed algorithm were calculated relative to worst-case scheduling. The results are given in Table 7. In Table 7, the minus sign (-) indicates improvement for the method.

Table 7. Comparison of the proposed algorithm vs. worst-case scheduling

Video sequence    Bit rate, %    PSNR, dB
Carphone             2.26          0.04
Table tennis         8.10          0.09
Coastguard           6.84          0.07
News                11.10          0.06
Salesman            12.58          0.06
Grandmother         11.15          0.08
Mother               6.89          0.18
Hall                18.80          0.1
Akiyo               15.58          0.19
mean                 9.89          0.07

6. DISCUSSION

From the experimental results in Tables 2-5, it can be concluded that the proposed algorithm provides dynamic complexity scaling for the selected sequences, making optimal real-time H.264 video encoding possible on a range of processors. It can be observed from Table 5 that unused time $T_{unused}$ and prediction error $T_{error}$ for the algorithm are low: $T_{unused}$ ranges from 2.72% to 6.17% and $T_{error}$ from 2.56% to 6.40% across the sequences.
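The quantities reported in Table 5 can be computed from per-frame timing logs along the following lines. This is a sketch under stated assumptions: unused time is read as one minus the ratio of summed actual time to summed quota, prediction error and trim time are read as the summed difference between an estimate and the actual time divided by the total actual encoding time, and results are scaled to percent. The function names and the sample numbers are hypothetical.

```python
from math import sqrt


def unused_pct(actual, quota):
    """Share of the allotted time that was not used, in percent."""
    return 100.0 * (1.0 - sum(actual) / sum(quota))


def excess_pct(estimate, actual):
    """Summed (estimate - actual) as a percentage of total actual time.

    With the predicted times as `estimate` this gives the prediction error;
    with the predicted times without demotion it gives the trim time.
    """
    return 100.0 * sum(e - a for e, a in zip(estimate, actual)) / sum(actual)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical per-frame times in milliseconds for four frames.
quota = [33.0, 36.0, 35.0, 34.0]
predicted = [32.0, 35.0, 33.0, 33.5]
no_demotion = [60.0, 70.0, 55.0, 64.0]
actual = [31.0, 34.5, 32.0, 33.0]

print(round(unused_pct(actual, quota), 2))
print(round(excess_pct(predicted, actual), 2))
print(round(excess_pct(no_demotion, actual), 2))
print(round(pearson_r(predicted, actual), 3))
```

A negative trim value, as reported for Akiyo, arises under this reading whenever the actual time exceeds the prediction without demotion, which is consistent with macroblocks being promoted rather than demoted.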
Trim time $T_{trim}$ shows the percentage of complexity that was scaled down by the algorithm from the original complexity. The minus sign for trim time (i.e. Akiyo) means that complexity was scaled up, not down, thus many MBs in that sequence were promoted. From Table 5, it can be observed that, depending on the motion in the source video, the results can be roughly divided into three groups: low ($T_{trim}$ < 10%), medium (10% ≤ $T_{trim}$ < 50%) and high motion sequences ($T_{trim}$ ≥ 50%).

For low motion sequences (i.e. Akiyo, Hall), the algorithm provides the best complexity scaling results. Prediction is quite accurate, and the Pearson correlation is quite high (0.63-0.73). These results show an insignificant PSNR drop in visual quality and even a bit rate reduction compared to the full search non-real-time JM.

For the medium motion sequences (e.g. News, Salesman, Mother), the MB complexity scaling scheme results in around 1.65-6% of bit rate gain, which is due to the increased number of Intra macroblocks. The demotion of MD class C to class D is the most likely reason. Frame complexity prediction is accurate, with the Pearson correlation in the range 0.55-0.65.

Sequences that are high motion (e.g. Carphone, Tennis) are the most difficult for the algorithm to handle. Complexity reduction $T_{trim}$ of 50-98% from the non-scalable fast MD algorithm comes at the cost of a 0.5dB quality degradation and a bit rate increase of 10-15%. Due to high spatial and motion complexity, there are few MBs which are not demoted, resulting in inaccuracy in the adaptive t calculation. This results in poor frame complexity prediction (the Pearson correlation is only 0.5). In fact, demotion of the original MD class A into class D seems to be necessary for complexity reduction, which results in a bit rate increase. Notably, results for Coastguard are much better than for Carphone and Tennis, despite having the highest trim time (98%). This can be explained by the fact that Coastguard has the highest number of Intra blocks in the original JM encoding, which results in a comparatively low bit rate increase (only 8%) and insignificant quality degradation (0.13dB).
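The three-way grouping by trim time described above can be checked against the Table 5 values with a few lines of Python. The sketch assumes the medium band includes its lower bound (10%) and the high band starts at 50%; the function and variable names are hypothetical.

```python
# Trim time at 30fps, in percent, as listed in Table 5.
TRIM_PCT = {
    "Carphone": 74.95,
    "Table tennis": 50.59,
    "Coastguard": 98.27,
    "News": 21.11,
    "Salesman": 18.04,
    "Grandmother": 13.74,
    "Mother": 17.00,
    "Hall": 5.75,
    "Akiyo": -2.92,
}


def motion_group(trim_pct):
    """Map a trim time to the low / medium / high motion group."""
    if trim_pct < 10.0:
        return "low"
    if trim_pct < 50.0:
        return "medium"
    return "high"


groups = {}
for name, trim in TRIM_PCT.items():
    groups.setdefault(motion_group(trim), []).append(name)

for label in ("low", "medium", "high"):
    print(label, groups[label])
```

With these thresholds Hall and Akiyo fall into the low group, Carphone, Table tennis and Coastguard into the high group, and the remaining four sequences into the medium group, which matches the sequences named in the discussion.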
Based on these results, it can be concluded that the complexity scaling model needs to be improved to deal with high motion cases, where the demotion rate is high. A possible solution is utilization of multiple metrics for t for different demotions.

When comparing the proposed method with the worst-case scheduling approach, it can be concluded from Table 7 that the average bit rate reduction achieved for the method is almost 10% compared to worst-case scheduling, with almost 19% maximum (for the Hall video sequence). PSNR is also better for the method for all sequences except Tennis. However, the difference in PSNR achieved is negligible. The results for previously published low complexity H.264 algorithms clearly indicate that none of the previously proposed methods achieve similar results.

The bit rate gain and PSNR degradation were plotted against target processor performance for both methods. The performance of the reference PC that provides 30fps is equal to 33msec/frame.

Figure 4. Bit rate gain for both methods (bit rate, % vs. performance, msec/frame; curves: worst case, method)

Figure 5. Quality degradation for both methods (PSNR, dB vs. performance, msec/frame; curves: worst case, method)

In Figure 4, the bit rate curve for the proposed algorithm (solid line) lies much lower than for the standard worst-case scheduling method (dashed line), which means that the proposed algorithm provides better encoding efficiency. It can be observed that the lower the frame rate, the closer the resultant bit rate curves; thus, the proposed algorithm becomes less effective relative to worst-case scheduling. Very high bit rate gain for the proposed algorithm at 35fps indicates that the processor's capabilities are the limiting factor and, in order to save processing time, the algorithm uses only Intra prediction and SKIPs. Therefore, the most efficient operating point for the algorithm is located on the knee of the curve, which is around 30fps.

7. CONCLUSIONS

In this paper, various issues relating to dynamic real-time complexity scaling in H.264, such as complexity prediction methods, MB complexity scaling and time scheduling algorithms, were investigated. The proposed real-time complexity scalable MD algorithm significantly outperforms the standard worst-case scheduling approach, both in terms of bit rate reduction (10% on average) and frame rate. It requires roughly 30% of the computational complexity of full search with the baseline profile at the same coding efficiency and allows efficient real-time video encoding on the reference Pentium IV PC.

8. REFERENCES

[1] Bjontegaard, G. Calculation of average PSNR differences between RD curves.
Document VCEG-M33, ITU-T VCEG Meeting, Austin, 2001.
[2] Das, I. On characterizing the knee of the Pareto curve based on Normal-Boundary Intersection. Structural and Multidisciplinary Optimization, 18, 3 (1999), 107-115.
[3] Feng, B., Zhu, G., and Liu, W. Fast Adaptive Inter Mode Decision Method for P Slices in H.264. In Proc. of IEEE 3rd Int. Conf. on Consumer Communications and Networking (CCNC 06), 2 (2006), 745-748.
[4] Hsu, K. W., Xiang Li, and Chopra, R. An IC design for real-time motion estimation for H.264 digital video. In Proc. of 48th Symp. on Circuits and Syst., 2 (August 7-10, 2005), 1489-1493.
[5] Implementation Studies Group of ISO/IEC. Main Results of the AVC Complexity Analysis. Document ISO/IEC JTC1/SC29/WG11, Klagenfurt, 2002.
[6] Ivanov, Y. V., and Bleakley, C. J. Skip Prediction and Early Termination for Fast Mode Decision in H.264/AVC. In Proc. of Int. Conf. on Digital Communications (ICDT) (August 2006).
[7] Ivanov, Y. V., and Bleakley, C. J. Survey and Pareto Analysis Method for Coding Efficiency Assessment of Low Complexity H.264 Algorithms. In Proc. of 10th Irish Machine Vision and Image Processing Conf. (IMVIP) (2006), 172-179.
[8] Jiang, M., and Ling, N. Low-Delay Rate Control for Real-time H.264/AVC Video Coding. IEEE Trans. Multimedia, 8, 3 (June 2006), 467-477.
[9] Joint Video Team (JVT) of ISO/IEC MPEG and ITU-T VCEG. Draft ITU-T Recommendation and Final Draft International Standard of Joint Video Specification (ITU-T Rec. H.264 | ISO/IEC 14496-10 AVC). Document JVT-G050d35.doc, 7th Meeting: Pattaya, Thailand, March 2003.
[10] JVT reference software JM 9.5, on the Web: http://iphome.hhi.de/suehring/tml.
[11] Kannangara, C. S., Richardson, I. E. G., Bystrom, M., Solera, J. R., Zhao, Y., MacLennan, A., and Cooney, R. Low-Complexity Skip Prediction for H.264 Through Lagrangian Cost Estimation. IEEE Trans. Circuits Syst. Video Technol., 16, 2 (2006), 202-208.
[12] Kim, C., and Jay Kuo, C. C. A Feature-based Approach to Fast H.264 Intra/Inter Mode Decision. In Proc. of IEEE Int. Symp. Circuits and Systems (ISCAS 05), 1 (May 23-26, 2005), 308-311.
[13] Kim, Y., Choe, Y., and Choi, Y. Fast Mode Decision Algorithm using AZCB Prediction. In Proc. of Int. Conf. on Consumer Electronics (ICCE 06), Digest of Technical Papers (Jan. 7-11, 2006), 33-34.
[14] Li, G. L., Chen, M. J., Li, H. J., and Hsu, C. T. Efficient Motion Search and Mode Prediction Algorithms for Motion Estimation in H.264/AVC. In Proc. of IEEE Int. Symp. Circuits and Systems (ISCAS 05), 6 (May 23-26, 2005), 5481-5484.
[15] Rao, G. N., Prasad, RSV, Chandra, D. J., and Narayanan, S. Real-Time Software Implementation of H.264 Baseline Profile Video Encoder for Mobile and Handheld Devices. In Proc. of IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP 06), 5 (May 14-19, 2006), 457-460.
[16] Wang, H., and Zhu, Z. Fast Mode Decision and Reduction of the Reference Frames for H.264 Encoder. In Proc. of Int. Conf. on Control and Automation (ICCA 05), 2 (June 26-29, 2005), 1040-1043.
[17] Wiegand, T., Sullivan, G. J., Bjontegaard, G., and Luthra, A. Overview of the H.264/AVC Video Coding Standard. IEEE Trans. on Circuits Syst. Video Technol., 13, 7 (2003), 560-576.
[18] Zhe, W., and Canhui, C. Realization and optimization of DSP based H.264 encoder. In Proc. of IEEE International Symposium on Circuits and Syst. (ISCAS 06) (May 21-24, 2006).