COMPLEXITY-DISTORTION ANALYSIS OF H.264/JVT DECODERS ON MOBILE DEVICES. Alan Ray, Hayder Radha. Michigan State University

ABSTRACT

Operational complexity-distortion curves for H.264/JVT decoding are generated and analyzed for low-complexity mobile devices under a variety of bitrate constraints. The focus of our study is on achieving optimum complexity-distortion operational points by evaluating different combinations of Group of Picture (GoP) types and varying the Quantization Parameter (QP) and entropy encoder (arithmetic or universal) to meet the desired rate constraints. Using a 400 MHz Intel PXA255 platform (found in popular iPaq devices), complexity-distortion curves are developed for common GoP structures and a wide range of QP values. The curves, based on extensive operational experimentation, indicate that under typical conditions for mobile platforms (low to mid computational complexity), I or I&P-frame combinations outperform the more compressed streams that include B-frames. In the [...]-[...] dB range, the optimum complexity-distortion I or I&P structures outperformed B-frame structures by up to 21% in complexity for the equivalent distortion level and under the same bitrate constraints. Further, under the same complexity and bitrate constraints, selecting the optimum GoP structure achieves as much as a 10 dB PSNR improvement.

1. INTRODUCTION

The emerging H.264/JVT video standard [1] has received a great deal of attention due to its coding efficiency when compared with previous standards such as baseline MPEG-4, MPEG-2, and H.263. The coding-efficiency advantages of H.264, however, come at the expense of higher computational complexity. For example, the study in [2] showed that H.264 decoders could exhibit more than double the complexity of H.263 decoders. Furthermore, previous studies have shown that fractional-pixel motion-compensation interpolation and the loop filtering consume a significant amount of computational power in emerging H.264 decoders [2,3].

Since these operations are part of the baseline (required) part of H.264, there is a need to evaluate new ways of minimizing both complexity and distortion for H.264 decoders on low-complexity devices. In particular, new wireless handhelds have both complexity and bitrate constraints, yet the range of these constraints differs from that of traditional systems (e.g., powerful PCs that are networked over the best-effort Internet). Under common operational scenarios, a low-complexity wireless device may have significantly greater complexity/power constraints than bitrate limitations (e.g., over a wireless access LAN).

In this paper, we evaluate the feasibility of achieving optimum complexity-distortion operational points for low-complexity H.264 decoders by adapting the picture types (I, P, and B) under different bitrate constraints. We show that, over a wide range of QP values, adapting different Group of Picture (GoP) structures provides the H.264 coding system the option of realizing optimum complexity-distortion points while adhering to a certain rate requirement. Based on extensive operational complexity-distortion analysis, we show that for low- to mid-complexity constraints, and under the same bitrate constraint, an all I-frame or hybrid I and P frame H.264 structure could provide the optimum complexity-distortion curve compared with compressed H.264 streams that include one or more B-frames in their GoP structure.

The remainder of this paper is organized as follows. Section 2 describes the overall experimental set-up for our study. Section 3 describes our results and the implications for the H.264 decoder, and Section 4 summarizes the analysis and future questions of interest.

2. EXPERIMENT SETUP

Our study uses three main components, which are discussed below: a mobile device as a testbed, an extensive selection of encoded video clips, and an optimized version of a publicly available H.264 decoder.

2.1 Experimental Platform

We used a standard HP iPaq 5550 with 128MB RAM and a 400 MHz Intel PXA255 processor running Microsoft PocketPC 2003 as our test platform. During the experiments, all extraneous background processes were terminated, including wireless communication. We eliminated network performance issues by storing encoded sequences in files and downloading them for each experiment. No additional effort was made to prioritize the decoding thread, so the operating system handled memory management and scheduling in its normal manner. A simple DOS shell was used on the iPaq in lieu of developing a graphical user interface for the decoder, further simplifying the operating system calls. Shell output was limited to non-timed portions of the tests.

2.2 Encoder and Sequence Selection

The standard JVT reference decoder version JM 6.1e [4] was used, with slight modifications to the interface to facilitate data gathering. Table 1 shows the standard encoder settings. Three video test sequences with different characteristics were selected: Akiyo for its high compression rates, Foreman for its higher motion and panning, and Mobile because of its coding difficulty.

Four different GoP structures were tested; in each of them every twelfth frame was an I frame to refresh the sequence. The first structure was an all I sequence. The second was an all P sequence. The third alternated between B and P frames (I-B-P-B-P...). The final sequence used two B frames between P frames (I-B-B-P-B-B-P...). In the results, they are referred to as I, P, BP, and BBP sequences respectively. For a given sequence, the same QP value was used for I, P, and B frames. Each sequence was encoded and timed using both entropy modes: arithmetic (CABAC) and context-based adaptive variable-length coding (CAVLC).
The CAVLC entropy coding mode used in the current H.264 standard is different from the universal variable length coding mode used in earlier draft versions.

Parameter                      Value
QP Value                       0-51; each trial used a constant QP for all frames
Frame Rates                    5, 10, 15 fps
Format                         QCIF (176x144)
I-Frame Frequency              Every 12 frames
Hadamard Transform             On
Max. Search Range              16
Num. Reference Frames          1
Forced Intra-macroblocks       None
Block Search Restrictions      None
Slices                         Unused
SP Frames                      Unused
Entropy Coding                 CAVLC & CABAC
Loop Filter Parameters         Default

Table 1: Encoder Options

                     Complexity @ [...] dB              Complexity @ [...] dB
Sequence             I or P    BP or BBP   % Diff.      I or P    BP or BBP   % Diff.
Akiyo (CABAC)        0.385     0.44        14.2%        0.44      0.49        11.4%
Foreman (CABAC)      0.[...]   0.4         21.4%        0.55      0.58        5.4%
Mobile (CABAC)       0.31      0.34        9.7%         0.52      0.55        5.8%
Akiyo (CAVLC)        0.32      0.36        11.1%        0.41      0.[...]     9.8%
Foreman (CAVLC)      0.36      0.[...]     10.0%        0.67      0.73        9.0%
Mobile (CAVLC)       0.5       0.42        3.7%         0.82      0.91        11.0%

Table 2: Relative complexity of the best I or P vs. the best BP or BBP sequence at [...] and [...] dB, shown for 10 fps

2.3 Decoder Optimizations

The JM 6.1e decoder [4] was used as the baseline code for the decoder. Many modifications were made to the decoder to streamline the code and improve performance. The changes included using circular buffers instead of memory copying, streamlining bit-oriented processing, and reducing calls to the most frequent functions. The results presented later for CAVLC entropy decoding reflect significant improvements to the reference software's algorithm. The primary improvements involved additional caching and optimization of frequently called functions. Despite the generally greater complexity of CAVLC when compared to CABAC, the CAVLC decoding has been improved an additional 10-15% compared to the CAVLC algorithms in the reference software.
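The idea of using circular buffers instead of memory copying can be illustrated with a short sketch. This is not the JM 6.1e code (which is written in C); the class name `FrameRing`, its methods, and the buffer sizes are hypothetical and only show how a fixed ring of frame buffers lets the decoder reuse storage rather than shift decoded frames around.

```python
class FrameRing:
    """Fixed ring of frame buffers (hypothetical sketch, not the JM structure)."""

    def __init__(self, capacity, frame_size):
        # Allocate every buffer once; decoding never reallocates or copies them.
        self._slots = [bytearray(frame_size) for _ in range(capacity)]
        self._head = 0   # index of the slot that receives the next frame
        self._count = 0  # number of frames decoded so far, capped at capacity

    def next_slot(self):
        """Return the buffer to decode into; the oldest frame is overwritten."""
        slot = self._slots[self._head]
        self._head = (self._head + 1) % len(self._slots)
        self._count = min(self._count + 1, len(self._slots))
        return slot

    def reference(self, age=1):
        """Return the frame decoded `age` frames ago (1 = most recent)."""
        if not 1 <= age <= self._count:
            raise IndexError("no such reference frame")
        return self._slots[(self._head - age) % len(self._slots)]


# Assumed frame size: QCIF (176x144) with 4:2:0 chroma, one byte per sample.
QCIF_FRAME_BYTES = 176 * 144 * 3 // 2
ring = FrameRing(capacity=2, frame_size=QCIF_FRAME_BYTES)
```

Advancing an index replaces the per-frame copy, so the cost of keeping a reference frame available no longer depends on the frame size.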

[Figure 1: 15fps CABAC Complexity-Distortion Curves, PSNR (dB): (a) Akiyo <1Kpf, (b) Akiyo <2Kpf, (c) Foreman <100Kbps, (d) Foreman <200Kbps]

Changes did not affect the numerical accuracy of the decoder. Due to the lack of Windows CE profiling tools, a limited number of timing statements were introduced into the decoder, but the frequency of these was negligible compared to the overall decoder complexity. Our configuration used the encoder-generated parameters for the leaky bucket parameters, but no additional rate control was implemented. Decoder timings ignored the program initialization time and simply timed from the beginning of the first frame to the end of the last. The code was built with Intel's compiler for WinCE and Microsoft's Embedded Visual C++ 4.0 linker. The compiler flags were optimized for speed; file size was ignored.

2.4 Performance Metrics

The aforementioned four different GoP structures (I, P, BP, BBP) were compared for the three different video clips (Akiyo, Foreman, Mobile) using two different entropy codings (CAVLC, CABAC). Different frame rates (5, 10, and 15 fps) were also tested to explore whether the more temporally related streams made a significant difference in the complexity-distortion curves. Each video clip/sequence pairing was timed for each quantization parameter (0-51). However, results were only thoroughly examined for quantization (QP) values that provided the more practical range of [...]-[...] dB. Since no rate control was used, bitrates are based on the total number of bits generated for the given QP value used. Results are shown for single consecutive runs, as experiments indicate that identical runs tend to vary by only 1-2%. Certain specific sequences seem to have unusual complexities and break the smooth complexity curves that characterize most of the data.

Once timing data had been gathered for a given sequence and set of GoP structures, various bit rate limits were selected and plotted. All complexity data has been normalized as a fraction of the time it takes to decode a QP 0 all-[...]-frame sequence of the video sequence in question.
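The normalization and rate-limit selection described above can be sketched as follows. The `Trial` record, the function name, the GoP label strings, and every number in the demo data are hypothetical placeholders, not measurements from this study; the sketch only assumes that each timed run yields a decode time, an average frame size, and a PSNR.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    """One timed decode of one encoded clip (all fields are illustrative)."""
    gop: str               # label of the GoP structure, e.g. "all-P"
    qp: int                # constant quantization parameter used for the clip
    decode_seconds: float  # time from the first frame to the end of the last
    bits_per_frame: float  # average coded frame size (no rate control is used)
    psnr_db: float         # distortion measure


def operational_curves(trials, reference_seconds, max_bits_per_frame):
    """Group trials under the rate limit into (complexity, PSNR) points per GoP."""
    curves = {}
    for t in trials:
        if t.bits_per_frame >= max_bits_per_frame:
            continue  # this QP generates too many bits for the chosen limit
        # Complexity is expressed as a fraction of the reference decode time.
        complexity = t.decode_seconds / reference_seconds
        curves.setdefault(t.gop, []).append((complexity, t.psnr_db))
    for points in curves.values():
        points.sort()  # ascending complexity, ready for plotting
    return curves


# Hypothetical numbers, for illustration only.
demo = [
    Trial("all-P", 28, 4.0, 1500.0, 36.0),
    Trial("all-P", 20, 5.0, 2500.0, 40.0),  # exceeds the 2000-bit limit below
    Trial("B-and-P", 28, 4.5, 1200.0, 36.5),
]
demo_curves = operational_curves(demo, reference_seconds=10.0,
                                 max_bits_per_frame=2000.0)
```

Raising `max_bits_per_frame` admits lower-QP trials, which is how a looser rate limit extends each curve toward higher PSNR.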

[Figure 2: 15fps Mobile CABAC Complexity-Distortion Curves, PSNR (dB): (a) <4K/frame, (b) <6K/frame]

3. RESULTS & ANALYSIS

3.1 Arithmetic Coding (CABAC)

Regardless of the frame rate selected, the I, P, BP, and BBP sequences performed similarly for a given video clip. The BBP (two B frames between every P frame) sequences were always slightly more computationally complex for a given distortion than the BP sequence. Likewise, the P sequence was less complex than the BP sequence. The performance of the all I sequence varied greatly: it did very poorly on the highly compressible Akiyo, compared similarly with the P GoP for portions of Foreman, and varied widely for Mobile.

Table 2 shows the performance gain from selecting a larger, less compressed stream at a low quality setting ([...] dB) and a high quality setting ([...] dB). Despite the higher bandwidth required for the I or P sequences, a significant performance advantage is seen in terms of complexity-distortion optimization under a given maximum bitrate.

Figure 1b shows the CABAC-encoded Akiyo complexity-PSNR graphs for all four sequences under 2K per frame (15 frames per second). While the BP and BBP sequences achieve better compression and thus provide higher quality pictures at low bitrates, the P sequence runs 10-14% faster at a given distortion level compared to the [...] GoP, and closer to [...]% compared to the [...] and [...] GoPs. (The I GoP's best achievable distortion is approximately 34 dB, due to the bandwidth limitation.) More importantly, for a given complexity constraint, selecting the P GoP structure achieves as much as a 10-15 dB PSNR improvement over the [...] GoP, as well as the [...] and [...] structures (e.g., a complexity of 0.37 is [...] dB for a [...] GoP, but [...] dB for the P GoP).

In general, it is clear that the I GoP decodes more efficiently (in terms of optimum complexity-distortion) but requires higher bitrates relative to the P, BP, and BBP GoP structures, as seen in the Foreman figures. In the Akiyo example, the highly correlated frames are coded extremely efficiently so that the P GoP represents the optimum curve.
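As a worked check on the relative-complexity table, the percentage differences can be recomputed from the normalized complexities. The sketch below assumes the percentage is taken relative to the less complex stream; that convention reproduces the rows used here, but it is an assumption, and other rows may have been derived from unrounded measurements. The function name and row labels are hypothetical.

```python
def percent_more_complex(lower, higher):
    """Extra decoding complexity of `higher` relative to `lower`, in percent."""
    if lower <= 0:
        raise ValueError("complexity must be positive")
    return 100.0 * (higher - lower) / lower


# Normalized complexity pairs taken from table rows that survive intact:
# (less compressed stream, stream containing B-frames).
pairs = {
    "Akiyo, arithmetic coding, second operating point": (0.44, 0.49),
    "Mobile, arithmetic coding, first operating point": (0.31, 0.34),
    "Mobile, arithmetic coding, second operating point": (0.52, 0.55),
    "Mobile, CAVLC, second operating point": (0.82, 0.91),
}
diffs = {name: round(percent_more_complex(low, high), 1)
         for name, (low, high) in pairs.items()}
```

Under this convention the four pairs give 11.4, 9.7, 5.8, and 11.0 percent, which agree with the corresponding table entries.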
Figure 1a shows results for Akiyo using a maximum bitrate of [...] Kbps. Here we see that the rate is too low to support a high-quality pure I-frame sequence, but that the high compressibility of Akiyo allows the P GoP to almost match the distortion of the BP and BBP sequences at a much lower complexity.

Figures 1c-d show the CABAC Foreman sequence with the same 1K and 2K per frame limits. Here, the I GoP is similar to the P GoP at high distortions. As the distortion decreases, the I GoP improves to offer a significant performance advantage over the other sequences. This option is attractive when computational power is at a premium and the bandwidth may be more flexible. For example, in Figure 1d, the I GoP shows as much as a 4 dB improvement over the P GoP, and as much as 7 dB over the B frames for a fixed complexity constraint. (The BP and BBP GoPs in the Foreman figures deviate from the expectation that lower distortion causes greater complexity. The effect is greatly enhanced because of the focus on the [...]-[...] dB range.)

CABAC-encoded Mobile, shown in Figure 2, shows the I GoP running 15 to 20% faster than the alternative options for a given distortion, and a 3-5 dB PSNR improvement for a given complexity. Once again, the I GoP is limited by the lack of bandwidth. In addition, for the first time, the [...] GoP's performance is competitive with the [...] GoP.

[Figure 3: 15fps CAVLC Complexity-Distortion Curves, PSNR (dB): (a) Foreman <1Kpf, (b) Foreman <2Kpf, (c) Mobile <4Kpf, (d) Mobile <6Kpf]

The steep distortion improvement slope generated by an increase in bandwidth shows that, in terms of complexity, higher bitrate limits are preferable to more compressed GoP structures. As pointed out by Horowitz et al., much of the interpolation and loop filtering complexity is fixed as long as a numerically correct decoder is required [3]. These experimental results indicate that the extra parsing is much less complicated than additional motion prediction. The general trend shown in these figures continues as bitrate limits are increased: a much lower complexity option using I or P GoPs is available for a small increase in bandwidth.

3.2 Context-based (CAVLC)

As mentioned previously, the context-based adaptive variable length entropy coding mode is a relatively new addition to the standard. Much of the previous literature examines the earlier universal variable length coding mode (UVLC). Horowitz et al., for example, primarily examine the UVLC complexity while footnoting that experimentation suggested CAVLC was roughly twice as complex as UVLC [3]. The scope of our research focuses on the operational complexity-distortion curves of CAVLC and on comparing them with the CABAC curves.

Figure 3 presents the Foreman and Mobile sequences with the same rate limits as shown in Figures 1c-d and Figure 2. While the [...] GoP still performs the best for a given distortion (up to 20% faster) or a given complexity (up to 5 dB), the [...] GoP performance is significantly degraded. Not only does the [...] GoP's minimum distortion shrink due to the increased bandwidth used by CAVLC, but its complexity increases much more steeply as a function of distortion. In fact, all four GoPs increase more rapidly in complexity as distortion decreases (compared to CABAC sequences).
However, only the [...] GoP radically changes its performance relationship to the other GoPs. Figure 3b suggests that at lower distortion (>32 dB), the [...] and [...] GoPs' complexity is similar to the [...] GoP. Figures 3c-d, showing the performance of the CAVLC Mobile sequence under two different bitrate constraints, show another change in the [...] GoP performance. Under CABAC, the [...] GoP's complexity was similar to the [...] or [...] GoPs for a given distortion. For CAVLC, the [...] GoP is the slowest for a given complexity. At [...] dB, the [...] GoP is roughly [...]% more complex than the [...] or [...] GoPs.

Compared to the CABAC curves (without normalization), the CAVLC curves have approximately 10% greater complexity at high distortions (around [...] dB). The complexity difference grows, largely based upon bandwidth. For the Mobile sequences at low distortion (and high bitrate), the CAVLC sequences are approximately 50% more complex than the CABAC sequences of the same distortion. The significant increase in the GoP complexity as distortion decreases, especially for the [...] GoP, indicates that the CAVLC entropy decoding mode is, in the current implementation, significantly more complex for a given distortion than the equivalent CABAC sequence. This complexity most likely reflects a combination of two factors: the capabilities and weaknesses of the iPaq architecture and software, and the specific implementation of the CAVLC algorithm in the reference software.

Overall, the operational complexity-distortion curves are very similar for the CAVLC sequences when compared to the CABAC ones in terms of optimal GoP structures. As shown in Table 2 and Figure 3, the curves are very similar despite the slightly higher bitrates required for CAVLC-encoded sequences. For this implementation, the CAVLC sequences generally require greater complexity to decode compared to the same sequence encoded using CABAC.

4. CONCLUSIONS

In this paper, we explored the complexity-distortion curves for a mobile H.264/JVT decoding environment, including the impact of rate limits upon the complexity curves.
We showed that simpler sequences potentially achieve equal distortion with lower complexity than more compressed sequences. This suggests an efficient real-time encoding for mobile devices may use less computing power and compensate with faster network service. The highly compressible sequences (e.g., Akiyo) benefit greatly from P-frame compression, while more difficult sequences vary in their optimal GoP structures. Alternatively, sequences with quantization parameters for I-frames that are much smaller than P or B frame parameters may lead to better distortion rates without increasing the complexity. These results are also highly dependent upon the mobile network implementation, as sufficient network processing will erase the computational savings. Our study suggests that maximizing the network bandwidth could provide a viable approach to achieving high video quality for mobile platforms while maintaining the low-complexity constraints of these platforms.

5. REFERENCES

[1] "Draft ITU-T Recommendation and Final Draft International Standard of Joint Video Specification (ITU-T Rec. H.264 | ISO/IEC 14496-10 AVC)," Joint Video Team (JVT), Mar. 2003, Doc. JVT-G050.
[2] V. Lappalainen, A. Hallapuro, and T.D. Hämäläinen, "Complexity of Optimized H.26L Video Decoder Implementation," IEEE CSVT, vol. 13, pp. 717-725, July 2003.
[3] M. Horowitz, A. Joch, F. Kossentini, and A. Hallapuro, "H.264/AVC Baseline Profile Decoder Complexity Analysis," IEEE CSVT, vol. 13, pp. 704-716, July 2003.
[4] JVT Reference Software version JM 6.1e via H.264/AVC Software Coordination webpage. Available: http://bs.hhi.de/~suehring/tml/