A High-Performance Parallel CAVLC Encoder on a Fine-Grained Many-core System

Zhibin Xiao and Bevan M. Baas
VLSI Computation Lab, ECE Department, University of California, Davis

Outline
- Introduction to H.264 CAVLC Encoder
- Features of Target Fine-Grained Many-Core System
- The Proposed Parallel CAVLC Encoder
- Results and Performance Analysis
- Summary

Advanced Video Processing
- Video applications are everywhere: high-definition video, real-time video conferencing, portable handsets

Introduction to H.264/AVC Standard
- Drafted in May 2003 by the JVT, formed by the ITU and ISO MPEG organizations
- Targets range from high-definition TV to low-resolution mobile video
- Huge computational complexity, with more data dependencies and irregular processing
[Figure: H.264/AVC encoder block diagram showing video input, coder control, transform/quantizer, dequantizer/inverse transform, motion-compensated intra/inter predictor, motion estimator, and entropy coding, with control data, quantized transform coefficients, and motion data feeding the entropy coder]

Introduction of H.264 CAVLC Encoder
- Context-adaptive variable-length coding (CAVLC)
  - Adopted in the H.264 baseline profile
  - Reverse zigzag-scanned run-length coding and adaptive coding table selection
  - Up to 27 4x4 or 2x2 blocks within a macroblock, coded in order
- Less processing regularity
  - Serial at the pixel level
  - A SIMD approach is not feasible in this case
  - Task-level parallelism is available
[Figure: CAVLC processing order within a 16x16 macroblock]
(a) Luma: block -1 first, then
   0  1  4  5
   2  3  6  7
   8  9 12 13
  10 11 14 15
(b) Chroma Cb/Cr: blocks 16 and 17 first, then
  18 19 22 23
  20 21 24 25
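The luma ordering in the figure follows a regular bit pattern, and the count of 27 blocks can be checked by simple addition. The following is a minimal sketch, assuming 4:2:0 chroma and an Intra 16x16 macroblock (the case that produces the extra luma DC block); the function names are illustrative and do not come from the slides.

```python
def luma_block_position(idx):
    """Return (x, y) of luma 4x4 block `idx` (0..15), in units of 4x4 blocks.

    Bits 0 and 2 of the index select the column, bits 1 and 3 the row,
    which yields the nested 2x2 ordering shown in the figure.
    """
    x = (idx & 1) + 2 * ((idx >> 2) & 1)
    y = ((idx >> 1) & 1) + 2 * ((idx >> 3) & 1)
    return x, y


def luma_order_grid():
    """Rebuild the 4x4 grid of block indices as drawn in panel (a)."""
    grid = [[None] * 4 for _ in range(4)]
    for idx in range(16):
        x, y = luma_block_position(idx)
        grid[y][x] = idx
    return grid


def max_blocks_per_macroblock():
    """1 luma DC + 16 luma 4x4 + 2 chroma DC (2x2) + 8 chroma AC blocks."""
    return 1 + 16 + 2 + 8


for row in luma_order_grid():
    print(" ".join(f"{v:2d}" for v in row))
print("blocks per macroblock:", max_blocks_per_macroblock())
```

The printed grid should match panel (a), and the total matches the "up to 27" figure quoted on the slide.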

Introduction of H.264 CAVLC Encoder
- Five parameters of each 4x4 block are coded separately: coeff_token, Sign_trail, Levels, Total_zeros, Run_before
- CAVLC data-flow graph
  - Serial scanning phase
  - Parameter coding phase
[Figure: CAVLC data-flow graph. Serial scanning phase: input residual data, data receiver, zigzag, predict nC, CAVLC scanning. Parameter coding phase: coeff_token, sign_trail, levels, total_zeros, and run_before encoders feeding the VLC packer, which emits the encoded bitstream]
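To make the five parameters concrete, the sketch below performs the scanning phase on one 4x4 block and extracts the values that the parameter encoders would then map to codewords. It is an illustration under assumptions, not the authors' implementation: the block is given as 16 values in raster order, the standard H.264 frame-mode zigzag order is used, and the function and key names are hypothetical. The variable-length tables themselves are omitted.

```python
# Raster indices of a 4x4 block visited in H.264 frame-mode zigzag order.
ZIGZAG_4X4 = [0, 1, 4, 8, 5, 2, 3, 6, 9, 12, 13, 10, 7, 11, 14, 15]


def cavlc_parameters(block):
    """Extract the five CAVLC parameters from a 4x4 block (16 raster values)."""
    scanned = [block[i] for i in ZIGZAG_4X4]
    nonzero_pos = [i for i, c in enumerate(scanned) if c != 0]
    total_coeff = len(nonzero_pos)
    if total_coeff == 0:
        return {"total_coeff": 0, "trailing_ones": 0, "sign_trail": [],
                "levels": [], "total_zeros": 0, "run_before": []}

    # CAVLC works from the highest-frequency coefficient backwards.
    rev = nonzero_pos[::-1]

    # Up to three consecutive +/-1 values at the high-frequency end.
    sign_trail = []
    for pos in rev[:3]:
        if abs(scanned[pos]) != 1:
            break
        sign_trail.append(0 if scanned[pos] > 0 else 1)  # 0 = '+', 1 = '-'
    trailing_ones = len(sign_trail)

    # Remaining non-zero coefficients, still in reverse scan order.
    levels = [scanned[p] for p in rev[trailing_ones:]]

    # Zeros located before the last non-zero coefficient in scan order.
    total_zeros = nonzero_pos[-1] + 1 - total_coeff

    # Zeros immediately preceding each coefficient, in reverse scan order.
    # A real encoder stops emitting runs once no zeros are left and never
    # codes the run of the final (lowest-frequency) coefficient.
    run_before = []
    for k, pos in enumerate(rev):
        prev = rev[k + 1] if k + 1 < len(rev) else -1
        run_before.append(pos - prev - 1)

    return {"total_coeff": total_coeff, "trailing_ones": trailing_ones,
            "sign_trail": sign_trail, "levels": levels,
            "total_zeros": total_zeros, "run_before": run_before}


example = [0, 3, -1, 0,
           0, -1, 1, 0,
           1, 0, 0, 0,
           0, 0, 0, 0]
print(cavlc_parameters(example))
```

coeff_token jointly codes `total_coeff` and `trailing_ones`; the other four fields correspond to the Sign_trail, Levels, Total_zeros, and Run_before encoders in the data-flow graph. Because each field is derived from the same scanned block, the encoders can run as separate tasks once scanning is done.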

Target Many-core System Architecture
- Key features
  - 164 enhanced programmable processors
  - 3 dedicated-purpose processors
  - 3 shared memories
  - Long-distance circuit-switched communication network
  - Dynamic Voltage and Frequency Scaling (DVFS)
[Figure: chip diagram showing a tile (oscillator, DVFS, core, communication), the Motion Estimation, FFT, and Viterbi Decoder processors, and the 16 KB shared memories]

Project motivation and mapping methodology
- Fine-grained many-core system for DSP applications
  - Energy efficient
  - Scalable performance
  - Highly flexible
- Mapping methodology
  - Sequential C code -> parallel C code -> fine-grained assembly-level code

Parallel CAVLC: Memory Optimization
- Coeff_Token table selection
  - Encodes the number of non-zero coefficients (nnz) in the current 4x4 block
  - The table index depends on the top and left 4x4 blocks
  - A row of nnz values of previous blocks has to be stored in the big shared memory
  - 720p HDTV: 324-word memory for nnz
- Table elimination and compression
  - Levels are encoded at runtime
  - Reduces table memory for coeff_token, total_zeros and run_before by more than 75%
  - Width compression
  - Zero-value reduction
[Figure: a 720p frame drawn as 80 x 45 macroblocks, each with its 4x4 blocks numbered 0 to 15 in CAVLC order]
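The dependence on the top and left neighbours can be sketched as follows. The averaging rule and the table thresholds are the standard H.264 ones; the line-buffer class is a hypothetical illustration for luma blocks only, and the slides do not describe the actual memory layout. One layout that would account for 324 words is 4 x 80 = 320 top-row entries plus 4 left-column entries, but that is an assumption.

```python
def predict_nc(n_left, n_top):
    """Predict nC from the nnz of the left and top 4x4 blocks (None = unavailable)."""
    if n_left is not None and n_top is not None:
        return (n_left + n_top + 1) >> 1
    if n_left is not None:
        return n_left
    if n_top is not None:
        return n_top
    return 0


def coeff_token_table(nc):
    """Select the coeff_token table: 0, 1, 2 are VLC tables, 3 is the 6-bit fixed-length code."""
    if nc < 2:
        return 0
    if nc < 4:
        return 1
    if nc < 8:
        return 2
    return 3


class NnzLineBuffer:
    """Hypothetical nnz store: one entry per 4x4 column of the frame plus a left column."""

    def __init__(self, mb_width):
        self.top = [None] * (4 * mb_width)  # nnz of the block above each column
        self.left = [None] * 4              # nnz of the block to the left of each row

    def words(self):
        return len(self.top) + len(self.left)

    def start_row(self):
        """Call at the start of each macroblock row: no left neighbour at the frame edge."""
        self.left = [None] * 4

    def neighbours(self, mb_x, bx, by):
        """Return (left nnz, top nnz) for block (bx, by) of macroblock column mb_x."""
        return self.left[by], self.top[4 * mb_x + bx]

    def store(self, mb_x, bx, by, nnz):
        """Record the nnz of the block just coded so later blocks can use it."""
        self.left[by] = nnz
        self.top[4 * mb_x + bx] = nnz


buf = NnzLineBuffer(mb_width=80)   # 1280 / 16 macroblocks per row
print(buf.words())                 # 324 under the assumed layout
buf.store(0, 0, 0, nnz=3)
left, top = buf.neighbours(0, 1, 0)
print(coeff_token_table(predict_nc(left, top)))
```

Overwriting entries in place works because blocks are visited in the CAVLC order, so each entry holds the most recent block in that row or column by the time a neighbour reads it.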

CAVLC Partition and Dataflow mapping
- A 20-processor mapping
- No long-distance links
- 8 routing processors
- Every encoding block can fit into a fine-grained core (128-word instruction and data memory)
[Figure: 20-processor mapping from data_in to data_out with Data Receiver, Zigzag Reorder, CAVLC Scanning, Luma nC Predicting, Chroma nC Predicting, Coeff_token, Sign_trail, Levels P1, Levels P2, Total_zeros, Run_before, VLC Binary Packing, eight Routers, and the 16 KB Shared Memory]

Mapping and Throughput Optimization
- 15-processor mapping
  - 4 long-distance links
  - Reduces routing processors by 5
- Throughput optimization
  - Readjust workload
  - Code optimization
[Figure: 15-processor mapping from data_in to data_out with Data Receiver, Zigzag Reorder, CAVLC Scanning, Luma nC Predicting, Chroma nC Predicting, Coeff_token, Sign_trail, Levels P1, Levels P2, Total_zeros, Run_before, VLC Binary Packing, Router1, Router2, Router3, and the 16 KB Shared Memory. Values annotated on the mapping: 550 760 172 132 245 34 53 11 19 19 29 34 555 1203 1548. Values annotated after throughput optimization: 377 600 172 44 153 34 53 160 11 44 29 34 426 463 420]
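In a task-level pipeline like this one, steady-state throughput is bounded by the slowest stage, which is why readjusting the workload helps. The sketch below applies that idea to the two sets of 15 values from the mapping figure. It assumes, without confirmation from the slides, that each value is a per-processor cost in cycles per unit of work and that the first set precedes the optimization; the function names are illustrative.

```python
# Values as listed on the slide; unit and processor assignment are assumptions.
FIRST_SET = [550, 760, 172, 132, 245, 34, 53, 11, 19, 19, 29, 34, 555, 1203, 1548]
SECOND_SET = [377, 600, 172, 44, 153, 34, 53, 160, 11, 44, 29, 34, 426, 463, 420]


def bottleneck(stage_costs):
    """The slowest stage limits how often the pipeline can accept new work."""
    return max(stage_costs)


def pipeline_speedup(before, after):
    """Throughput gain when only the bottleneck stage cost changes the rate."""
    return bottleneck(before) / bottleneck(after)


print(bottleneck(FIRST_SET), bottleneck(SECOND_SET))
print(round(pipeline_speedup(FIRST_SET, SECOND_SET), 2))
```

Under these assumptions the largest value drops from 1548 to 600, so balancing alone would raise the pipeline rate by roughly 2.6 times even though several individual stages become more expensive.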

Parallel CAVLC Encoder Performance
- Throughput
  - Five QCIF video test sequences with varying Quantization Parameter (10-40)
  - Scaled performance can achieve 30 fps 720p HDTV (1280x720) processing
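The scaling from QCIF measurements to a 720p figure can be reproduced with macroblock counts. This is a sketch under one assumption not stated on the slide: that per-macroblock cost is the same at both resolutions, so frame rate scales inversely with the number of macroblocks. The 1.07 GHz clock is the one quoted elsewhere in the deck; the function names are hypothetical.

```python
def macroblocks_per_frame(width, height, mb_size=16):
    """Number of 16x16 macroblocks, rounding partial macroblocks up."""
    cols = -(-width // mb_size)
    rows = -(-height // mb_size)
    return cols * rows


def scale_fps(fps, src_size, dst_size):
    """Convert a frame rate measured at src_size to the equivalent at dst_size."""
    return fps * macroblocks_per_frame(*src_size) / macroblocks_per_frame(*dst_size)


def cycle_budget_per_macroblock(clock_hz, size, fps):
    """Cycles available per macroblock to sustain `fps` at the given frame size."""
    return clock_hz / (macroblocks_per_frame(*size) * fps)


QCIF = (176, 144)
HD720 = (1280, 720)

print(macroblocks_per_frame(*QCIF), macroblocks_per_frame(*HD720))
# QCIF frame rate that corresponds to 30 fps at 720p:
print(round(scale_fps(30.0, HD720, QCIF), 1))
# Cycles per macroblock at 1.07 GHz for 30 fps 720p:
print(int(cycle_budget_per_macroblock(1.07e9, HD720, 30)))
```

With 99 macroblocks per QCIF frame and 3600 per 720p frame, 30 fps at 720p corresponds to about 1091 QCIF frames per second, or a budget of roughly 9900 cycles per macroblock at 1.07 GHz.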

Performance Comparison with General CPU
- Performance comparison against Intel Core 2 Duo, Intel Pentium 4 and Pentium 4 HT
- Throughput 4.86-6.83 times better
- Scaled area 20.2 times smaller

Performance Comparison: traditional DSPs
- Performance estimation on DSPs
  - CAVLC takes 18.2% of the computation time of an H.264 baseline encoder
- 1.0-6.15 times higher throughput and 6.2 times smaller area compared to the TI C642 DSP
  - Scaled to 65nm
  - More demanding test for our design

Platform | Target App. | Processor Type | Tech. | Area (mm^2) | Freq. (MHz) | Scaled Area to 65nm (mm^2) | Scaled Freq. to 65nm (MHz) | Test Sequence | CAVLC Performance (fps 720p)
TI C642 | CIF 24fps | 8-way VLIW | 130nm CMOS | 72 | 600 | 18 | 1200 | 50 frames IPPP...P QP=25 | 28
ADSP BF561 | CIF 30fps | Dual-core DSP | 130nm CMOS | N/A | 600 | N/A | 1200 | N/A | 36
TI C641 | QCIF 24.5fps | 8-way VLIW | 130nm CMOS | 72 | 600 | 18 | 1200 | 100 frames IPPP...P QP=28 | 7.4
This work (AsAP) | 720p HDTV 30fps | Array (15 cores) | 65nm CMOS | 2.89 * | 1070 | 2.89 * | 1070 | 2 frames IP QP=20 | 36.0-41.3
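The scaled columns in the table are consistent with a simple technology-scaling rule: area shrinking with the square of the feature size and frequency rising in proportion to it. The slides do not state the rule used, so the sketch below is an assumption that happens to reproduce the tabulated numbers; the function names are illustrative.

```python
def scale_area(area_mm2, from_nm, to_nm):
    """Area scales with the square of the feature-size ratio (assumed rule)."""
    return area_mm2 * (to_nm / from_nm) ** 2


def scale_frequency(freq_mhz, from_nm, to_nm):
    """Frequency scales inversely with feature size (assumed rule)."""
    return freq_mhz * from_nm / to_nm


dsp_area_65nm = scale_area(72, 130, 65)         # TI DSP area from the table
dsp_freq_65nm = scale_frequency(600, 130, 65)   # TI DSP clock from the table
asap_area = 2.89                                # 15-core array, already at 65nm

print(dsp_area_65nm, dsp_freq_65nm)
print(round(dsp_area_65nm / asap_area, 1))      # area advantage of the array
```

Under this rule the 72 mm^2, 600 MHz DSP becomes 18 mm^2 at 1200 MHz, and 18 / 2.89 gives the 6.2 times area figure quoted above.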

Processor Activity Analysis & Power
- Processor activity type
  - Execution
  - Stalls on input or output
- Analysis
  - Data receiving stall on output
  - 7%-65% active time for most processors
  - Bottleneck: zigzag reorder and CAVLC scanning, over 94% active time
- Power estimation
  - One processor: 59 mW at 1.07 GHz, 1.3 V, 65nm, 100% active
  - Nearly zero leakage when a processor is idle
  - 323 mW at 1.07 GHz, 1.3 V, for 15 processors plus memory
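Because idle processors leak almost nothing, array power can be approximated as the sum of each processor's active fraction times the 59 mW full-activity figure. The sketch below is an illustrative model only: the per-processor activities are not listed on the slide, the memory contribution is unknown and defaults to zero here, and the function name is hypothetical.

```python
P_ACTIVE_MW = 59.0   # one processor, 100% active, 1.07 GHz, 1.3 V (from the slide)
REPORTED_MW = 323.0  # 15 processors plus memory (from the slide)


def array_power_mw(activities, p_active_mw=P_ACTIVE_MW, memory_mw=0.0):
    """Activity-weighted power; idle time is assumed to cost nothing."""
    return sum(a * p_active_mw for a in activities) + memory_mw


# Upper bound: all 15 processors busy all the time.
worst_case = array_power_mw([1.0] * 15)
print(worst_case)

# Largest average activity consistent with the reported total, if memory drew no power.
print(round(REPORTED_MW / worst_case, 3))
```

The worst case is 885 mW, so the reported 323 mW implies an average activity of at most about 36%, which is in line with most processors being active 7%-65% of the time.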

Summary
- Fine-grained many-core system
  - Energy efficient, scalable and flexible
  - Exploits task-level parallelism
- The proposed parallel CAVLC encoder
  - 15 processors plus a 324-word memory, 720p HDTV at 30 fps
  - 4.86-6.83 times higher scaled throughput than the latest general-purpose processors
  - 1.0-6.15 times higher scaled throughput and 6.2 times smaller area compared with traditional DSPs
- Future work
  - Further power reduction using DVFS
  - A complete parallel H.264 baseline encoder

Acknowledgments
- Intellasys Inc.
- SRC GRC Grant 1598 and CSR Grant 1659
- ST Microelectronics
- NSF Grant 0430090 and CAREER Award 0546907
- Intel and S Machines Corporation
- UC Micro
- UCD Faculty Research Grant

The End Thank You!