A HIGH THROUGHPUT CABAC ALGORITHM USING SYNTAX ELEMENT PARTITIONING. Vivienne Sze, Anantha P. Chandrakasan. 2009 ICIP, Cairo, Egypt

Motivation: There is high demand for video on mobile devices (digital cameras, DVCs, Palm Pre, iPod, PSP, iPhone, video conferencing). Compression reduces storage and transmission requirements, but battery capacity is limited by size, weight, and cost. Low-power video coding is therefore needed that still achieves the performance required for real-time HD.

Low Power Video Coding: [Figure: energy per operation and delay versus supply voltage (VDD); die photo of an H.264/AVC Baseline decoder, 3.3 mm x 3.3 mm, with 176 I/O pads, separate core and memory controller domains, and SRAM (V. Sze et al., JSSC, Nov. 2009).] Parallelism and voltage scaling have been shown to be effective for power reduction (> 10x). However, certain algorithms are inherently serial, e.g. Context Adaptive Binary Arithmetic Coding (CABAC), which H.264/AVC High Profile uses for entropy coding.

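Arithmetic Coding Example: Pr(A) = 0.6, Pr(B) = 0.4; symbol sequence A-B-B. Each symbol narrows the interval [offset, offset + range), with A taking the lower 60% of the current range and B the upper 40%:
Start: range 1, offset 0
After A: range 0.6, offset 0, interval [0, 0.6)
After B: range 0.24, offset 0.36, interval [0.36, 0.6)
After B: range 0.096, offset 0.504, interval [0.504, 0.6)
Output binary bitstream: .1001 (binary fraction = 0.5625, which lies inside the final interval). Binary arithmetic coding operates on binary symbols (bins); a binarizer maps syntax elements to bins. The range is updated after every bin.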
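The interval arithmetic of this example can be reproduced exactly with rational numbers. The sketch below is an idealized, infinite-precision arithmetic coder for the two-symbol toy model. It is not the integer range/offset engine that CABAC actually uses, and the function names are illustrative. It also finds the shortest binary fraction that falls inside the final interval.

```python
import math
from fractions import Fraction

# Probabilities from the example; A takes the lower part of each interval.
P_A = Fraction(3, 5)  # Pr(A) = 0.6
P_B = Fraction(2, 5)  # Pr(B) = 0.4


def encode_interval(symbols):
    """Return (offset, range) of the final interval after coding `symbols`."""
    offset, rng = Fraction(0), Fraction(1)
    for s in symbols:
        if s == "A":
            rng *= P_A                # keep the lower sub-interval
        elif s == "B":
            offset += rng * P_A       # skip over A's sub-interval
            rng *= P_B
        else:
            raise ValueError(f"unknown symbol {s!r}")
    return offset, rng


def shortest_binary_fraction(low, high):
    """Fewest bits b1..bn such that low <= 0.b1..bn (binary) < high."""
    n = 1
    while True:
        k = math.ceil(low * 2**n)     # smallest n-bit fraction >= low
        if Fraction(k, 2**n) < high:
            return format(k, f"0{n}b")
        n += 1


offset, rng = encode_interval("ABB")
print(float(offset), float(rng))                       # 0.504 0.096
print(shortest_binary_fraction(offset, offset + rng))  # 1001
```

Using Fraction avoids floating-point drift in the range updates. A hardware engine instead keeps a fixed-width range and renormalizes, emitting bits as it goes.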

Context-Adaptive: A context is a probability model that provides an adaptive estimate of each bin's probability. The context state is updated after every bin. The context can also be switched for every bin, which creates bin-to-bin (and therefore cycle-to-cycle) dependencies. [Figure: CABAC encoder. Syntax elements pass through the binarizer to produce bins, which the arithmetic coding engine codes into bits (e.g. 0010). The context modeler performs context selection and probability adaptation and supplies Pr(0) and Pr(1) to the engine.] Encoder: syntax elements → bins → bits.
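The per-bin adaptation can be sketched with a hypothetical floating-point context model. H.264/AVC CABAC actually uses a table of 64 probability states plus a most-probable-symbol flag. The exponential-decay update below, with the assumed constants ALPHA and P_MIN, only approximates that behavior. ContextModel and adaptive_cost are illustrative names, not part of any standard API. The sketch shows that each bin must see the state left by the previous bin.

```python
import math


class ContextModel:
    """Hypothetical, simplified context model: an exponentially decaying
    estimate of Pr(bin = 1). Not the H.264/AVC state machine (assumption)."""

    ALPHA = 0.95     # assumed adaptation rate, chosen to roughly mimic CABAC
    P_MIN = 0.01875  # assumed clamp so neither bin value gets probability 0

    def __init__(self, p_one=0.5):
        self.p_one = p_one

    def cost_bits(self, bin_val):
        """Ideal code length, in bits, of bin_val under the current estimate."""
        p = self.p_one if bin_val else 1.0 - self.p_one
        return -math.log2(p)

    def update(self, bin_val):
        """Move the estimate toward the observed bin (called after every bin)."""
        p = self.ALPHA * self.p_one + (1 - self.ALPHA) * bin_val
        self.p_one = min(max(p, self.P_MIN), 1 - self.P_MIN)


def adaptive_cost(bins):
    """Total ideal bits for a bin string coded with a single adaptive context."""
    ctx = ContextModel()
    total = 0.0
    for b in bins:
        total += ctx.cost_bits(b)  # uses the state left by the previous bin
        ctx.update(b)              # serial dependency: the next bin waits for this
    return total


print(adaptive_cost([1] * 20))  # below 20 bits: the context learns that 1 is likely
```

The first bin costs exactly 1 bit because the estimate starts at 0.5. Every later 1 costs less, because the estimate has already moved toward 1.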

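CABAC Challenges: Decoder: bits → bins → syntax elements. [Figure: CABAC decoder. Bits (e.g. 0010) enter the arithmetic decoding engine, which outputs bins to the de-binarizer to recover syntax elements. The context modeler performs context selection and probability adaptation and feeds Pr(0) and Pr(1) back to the engine.] Data dependencies make CABAC difficult to parallelize. The contexts and the range are updated after every bin, so the decoder requires data feedback: each decoded bin affects how the next bin is decoded. Context modeling and interval division are tied to bins (not bits), so the number of cycles is proportional to the number of bins.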
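The feedback loop can be seen in an idealized decoder for the same two-symbol toy model used in the arithmetic coding example. This is a sketch only. It uses fixed probabilities, whereas CABAC would also update the context after each bin, which adds a second serial dependency. The name decode is illustrative.

```python
from fractions import Fraction

P_A = Fraction(3, 5)  # Pr(A), same toy model as the arithmetic coding example
P_B = 1 - P_A


def decode(value, n_symbols):
    """Idealized arithmetic decoder for the two-symbol toy model.

    Each iteration needs the interval produced by the previous decision,
    so the loop cannot simply be split across parallel workers.
    """
    low, rng = Fraction(0), Fraction(1)
    out = []
    for _ in range(n_symbols):
        split = low + rng * P_A   # interval division for this symbol
        if value < split:         # decision depends on the updated low/rng
            out.append("A")
            rng *= P_A
        else:
            out.append("B")
            low = split
            rng *= P_B
    return "".join(out)


print(decode(Fraction(0b1001, 16), 3))  # ABB  (bitstream .1001 = 9/16)
```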

Real-time H.264 CABAC Requirements:
Level | Max Frame Rate (fps) | Max Bins per Picture (Mbins) | Max Bit Rate (Mbits/s) | Peak Bin Rate (Mbins/s)
4.0 | 30 | 9.2 | 25 | 275
5.1 | 26.7 | 17.6 | 300 | 2107
Max Bin Rate = (Max Bins per Picture) x (Max Frame Rate). For real-time decoding, each frame must be decoded within the interframe time interval. Frequency requirements reach the multi-GHz range, so parallelism is needed to lower the frequency to an acceptable range.
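The link between bin rate and clock frequency can be made concrete with a small calculation. It assumes, hypothetically, that one engine decodes one bin per cycle and that work splits evenly across parallel engines, which is an idealization. The peak bin rates are the Level 4.0 and 5.1 values quoted above, and required_clock_mhz is an illustrative helper.

```python
def required_clock_mhz(peak_bin_rate_mbins, bins_per_cycle=1.0, parallel_engines=1):
    """Clock frequency (MHz) needed to sustain a peak bin rate (Mbins/s).

    Assumes each engine decodes `bins_per_cycle` bins every cycle and that
    the work splits evenly across `parallel_engines` (an idealization).
    """
    return peak_bin_rate_mbins / (bins_per_cycle * parallel_engines)


# Peak bin rates for Levels 4.0 and 5.1 (Mbins/s).
PEAK_BIN_RATE = {"4.0": 275, "5.1": 2107}

for level, rate in PEAK_BIN_RATE.items():
    serial = required_clock_mhz(rate)
    three_way = required_clock_mhz(rate, parallel_engines=3)
    print(f"Level {level}: {serial:.0f} MHz serial, {three_way:.0f} MHz with 3 engines")
# Level 4.0: 275 MHz serial, 92 MHz with 3 engines
# Level 5.1: 2107 MHz serial, 702 MHz with 3 engines
```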

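H.264/AVC CABAC Parallelism: Bin-level parallelism requires speculation. Frame-level parallelism requires buffering of frames and is limited by the latency requirement. Slice-level parallelism incurs a coding efficiency penalty. [Figure: a frame of macroblocks divided into Slice 0 and Slice 1.] Can we do better by changing the algorithm?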

Entropy Slices: Proposed by Sharp in 2008 [VCEG-AI32]. Only entropy coding is made independent across slices, so entropy slices avoid the reduced-prediction penalty of H.264/AVC slices. Their remaining coding penalty comes from reduced context training and the start code prefix and header. [Figure: coding efficiency penalty (0 to 20%) versus number of slices per frame (0 to 60) for H.264/AVC slices and entropy slices, broken down into reduced prediction, reduced training, and start code prefix and header.] Can we further reduce the coding penalty?

Syntax Element Parallelism: Place syntax elements into different groups, assign the groups to different partitions, and process the partitions in parallel. Syntax elements are allocated to partitions based on their bin distribution, to balance the workload. [Figure: average distribution of bins among the five groups, macroblock info (MBINFO), prediction mode (PRED), coded block pattern (CBP), significance map (SIGMAP), and coefficient level (COEFF), for 720p sequences at QP=27. The pie chart shows shares of 33%, 16%, 12%, 17%, and 22%.]

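Reduce Cycle Count: [Figure: cycles needed for an H.264/AVC slice, in which each macroblock (MB0, MB1, MB2, ...) is processed serially after the slice header, compared with syntax element partitions MBINFO, PRED, CBP, SIGMAP, and COEFF, each beginning with a start code and processed concurrently.] Because the partitions are processed in parallel, the cycle count is set by the largest partition rather than by the total number of bins in the slice.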
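A rough upper bound on the speed-up follows from the bin distribution. The sketch below assumes, hypothetically, one bin per cycle per engine and ignores start code, header, and FIFO overheads. Under those assumptions, serial decoding takes as many cycles as there are bins, while parallel decoding takes as many cycles as the busiest partition. The shares are the five QP=27 percentages from the syntax element parallelism slide. cycles is an illustrative helper.

```python
def cycles(partition_bins):
    """Idealized cycle counts (one bin per cycle per engine; overheads ignored)."""
    serial = sum(partition_bins)    # one engine: every bin in sequence
    parallel = max(partition_bins)  # one engine per partition, run concurrently
    return serial, parallel


# Per-group bin shares for 720p at QP=27 (percent of bins), one group per partition.
shares = [33, 16, 12, 17, 22]
serial, parallel = cycles(shares)
print(f"speed-up bound: {serial / parallel:.2f}x")  # speed-up bound: 3.03x
```

The largest group caps the gain. This is why the allocation of groups to partitions matters more than the number of partitions.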

Context Training for Coding Efficiency: Coding efficiency depends on the accuracy of the bin probability estimate, and a better estimate is achieved with more bins (context training). Syntax element partitioning does not reduce the number of bins used to train each context, which improves coding efficiency. Entropy slices, by contrast, divide the training data among slices (e.g. BigShips, QP=27, IPPP):
Entropy Slices per Frame [MB/slice] | Total Coding Penalty | Coding Penalty due to Reduced Training
1 [3600] | 0.00% | 0.00%
2 [1800] | 0.30% | 0.20%
3 [1200] | 0.61% | 0.41%
4 [900] | 0.88% | 0.57%
6 [600] | 1.47% | 0.95%
8 [450] | 1.93% | 1.20%
18 [200] | 4.13% | 2.38%
36 [100] | 7.36% | 3.87%
72 [50] | 12.21% | 5.50%

Area Cost (ASIC): The entire CABAC does not have to be replicated. Context selection and the context memory are not replicated. The area increase comes from the replicated arithmetic decoders and from the control logic and FIFOs between the engines.

Experimental Results: Validated with JM12.0 under common test conditions across the 720p sequences BigShips, City, Crew, Night, and ShuttleStart, for approximately the same speed-up (~2.4 to 2.7x). Area cost: H.264/AVC slices 3x, entropy slices 3x, syntax element partitioning (SEP) 1.5x.
Prediction Structure | H.264/AVC Slices BD-rate (%) | Speed-up | Entropy Slices BD-rate (%) | Speed-up | SEP BD-rate (%) | Speed-up
I-only | 0.87 | 2.43 | 0.25 | 2.43 | 0.06 | 2.60
IPPP | 1.44 | 2.42 | 0.55 | 2.44 | 0.32 | 2.72
IBBP | 1.71 | 2.46 | 0.69 | 2.47 | 0.37 | 2.76
Result: a 2 to 4x reduction in coding penalty.

Adaptive Bin Allocation (Varying QP): To reduce start code overhead, assign multiple groups to each partition and reduce the number of partitions (5 → 3). Because the bin distribution changes with QP, the groups are combined adaptively. [Figure: bin distribution at low QP (QP=22): SIGMAP 50%, COEFF 23%, PRED 11%, CBP 11%, MBINFO 5%. At high QP (QP=37): MBINFO 33%, PRED 32%, CBP 22%, SIGMAP 8%, COEFF 5%.] Partition assigned to each group in each mode:
Mode | MBINFO | PRED | CBP | SIGMAP | COEFF
Low QP | 0 | 0 | 0 | 1 | 2
High QP | 0 | 1 | 2 | 2 | 2
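Why each mode suits its QP range can be checked by summing bins per partition. The sketch below uses the two mode assignments and the QP=22 and QP=37 bin distributions from this slide. The selection rule (pick the mode whose busiest partition is smallest) and the idealized speed-up metric are assumptions for illustration. The slide does not say how the encoder chooses or signals the mode. All names are hypothetical.

```python
GROUPS = ["MBINFO", "PRED", "CBP", "SIGMAP", "COEFF"]

# Partition index of each group, per mode (order matches GROUPS).
MODES = {
    "low_qp": [0, 0, 0, 1, 2],
    "high_qp": [0, 1, 2, 2, 2],
}

# Bin distributions (percent) from the QP=22 and QP=37 charts.
DIST_QP22 = {"MBINFO": 5, "PRED": 11, "CBP": 11, "SIGMAP": 50, "COEFF": 23}
DIST_QP37 = {"MBINFO": 33, "PRED": 32, "CBP": 22, "SIGMAP": 8, "COEFF": 5}


def partition_loads(dist, mode):
    """Bins (percent) landing in each of the three partitions for a mode."""
    loads = [0] * 3
    for group, part in zip(GROUPS, MODES[mode]):
        loads[part] += dist[group]
    return loads


def best_mode(dist):
    """Assumed rule: choose the mode whose busiest partition is smallest."""
    return min(MODES, key=lambda m: max(partition_loads(dist, m)))


for name, dist in [("QP=22", DIST_QP22), ("QP=37", DIST_QP37)]:
    mode = best_mode(dist)
    speedup = sum(dist.values()) / max(partition_loads(dist, mode))
    print(name, mode, f"{speedup:.2f}x")
# QP=22 low_qp 2.00x
# QP=37 high_qp 2.86x
```

Applying the wrong mode is costly in this model. The low-QP mode on the QP=37 distribution puts 87% of the bins in partition 0.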

Throughput Increase: [Figure: throughput increase (1x to 3x) versus quantization parameter (QP = 22, 27, 32, 37) for BigShips, City, Crew, Night, and ShuttleStart (left to right); legend: Low QP, Switch to High QP, High QP.]

Additional Parallelism: Syntax element partitioning can be combined with slice-level parallelism. [Figure: coding efficiency penalty (0 to 16%) versus throughput increase (0 to 30x) for (1) H.264/AVC slices (8 slices), (2) entropy slices (8 slices), and (3) entropy slices (4 slices) + syntax element partitioning, with a 6x point annotated.]

Conclusions: A new CABAC algorithm for the next-generation standard increases concurrency by processing the bins of different syntax elements in parallel. It achieves a throughput increase of up to 3x without sacrificing coding efficiency, power, or delay, and at minimal area cost. It can be combined with other approaches for improved coding efficiency and throughput/power. Acknowledgements: funding from Texas Instruments and NSERC.