Design Challenge of a QuadHDTV Video Decoder

Youn-Long Lin
Department of Computer Science, National Tsing Hua University
MPSOC 2007, Japan

More Pixels YLLIN NTHU-CS 2

NHK Proposes UHD TV Broadcast
Super HiVision: 7680x4320 pixels at 60 fps (XHDTV)
Baseband signal is 24 Gbps. Using MPEG-2 encoding chips, the signal was compressed to 25 Mbps for transmission.
HDTV signals at present are 1.5 Gbps for baseband and 2 Mbps for compressed signals.
High-performance compression/decompression and transmission/storage are needed to take 24 Gbps down to 3 Mbps.
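The baseband figures on this slide can be cross-checked with simple arithmetic. The sketch below is not from the slides: it assumes 12 bits per pixel (8-bit 4:2:0 sampling), a Super HiVision frame of 7680x4320 at 60 fps, and 1920x1080 at 60 fps for HDTV. The function name `raw_bit_rate` is hypothetical.

```python
def raw_bit_rate(width, height, fps, bits_per_pixel=12):
    """Uncompressed video bit rate in bits per second.

    bits_per_pixel=12 is an assumption (8-bit luma plus 4:2:0 chroma).
    """
    return width * height * fps * bits_per_pixel


# Assumed formats; the frame rates are not confirmed by the slide text.
uhd = raw_bit_rate(7680, 4320, 60)
hd = raw_bit_rate(1920, 1080, 60)

print(f"Super HiVision baseband: {uhd / 1e9:.1f} Gbps")
print(f"HDTV baseband:           {hd / 1e9:.2f} Gbps")
print(f"Ratio UHD/HD:            {uhd // hd}x")
```

Under these assumptions the results land close to the 24 Gbps and 1.5 Gbps quoted above, which is why the 16x larger frame needs a correspondingly faster decoder.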

[Figure: relative frame sizes of UHD TV (7680x4320), QFHD TV (3840x2160), HDTV (1920x1080) and SDTV]

Video Coding Technology Trend
[Chart: H.264 share, labelled 5% and 69%]

Features of Video Coding Standards
Standard | MPEG-1 | MPEG-2 | MPEG-4 | H.264
MB size | 16*16 | 16*16 (frame) | 16*16 | 16*16
Block size | 8*8 | 8*8 | 16*16, 8*8 | 16*16, 16*8, 8*16, 8*8, 8*4, 4*8, 4*4
Transform | DCT | DCT | DCT/Wavelet | 4*4 int transform
Entropy coding | VLC | VLC | VLC | VLC, CAVLC and CABAC
ME, MC | Yes | Yes | Yes | 41 MVs per MB
Pixel accuracy | ½ pel | ½ pel | ¼ pel | ¼ pel
Reference frames | One frame | One frame | One frame | Multiple (5) frames
Picture type | I, P, B | I, P, B | I, P, B | I, P, B
Transmission rate | Up to 1.5 Mbps | 2-15 Mbps | 64 kbps-2 Mbps | 64 kbps-15 Mbps

Not all H.264/AVC systems are equal
Relative Computational Complexity
#Ref Frames \ Search Range | 8 | 16 | 32
(row label lost) | 5.9 | 24.6 | 55.7
1 | 1 | 2.54 | 8.87
Source: "Video Coding with H.264/AVC: Tools, Performance and Complexity", J. Ostermann et al., IEEE CAS Mag., Q1 2004.

Quality vs Bit-rate vs Decoding Throughput
Decoding Capability of a 600 MHz CPU
QP | 16 | 21 | 26
Bit Rate (Kbps) | 1723 | 74 | 37
Fps | 44 | 55 | 65
Source: "H.264/AVC Baseline Profile Decoder Complexity Analysis", M. Horowitz, IEEE T-CSVT, July 2003.

Our Target
Single-chip decoder for QFHD (3840x2160)
H.264/AVC High Profile video: CABAD, 8x8 transform
Commodity DDR external memory
Platform-based design

Performance
Resolution | Size | Clock Frequency | Application
SQCIF (128 x 96) | 1.0 | 0.4 MHz | Video phone
QCIF (176 x 144) | 2.0 | 0.8 MHz |
CIF (352 x 288) | 8.3 | 3 MHz | Mobile TV
D1 (720 x 480) | 28.1 | 10 MHz | Car TV, Surveillance
720HD (1280 x 720) | 75.0 | 30 MHz | Home theater
1080HD (1920 x 1088) | 170.0 | 62 MHz |
QFHD (3840 x 2160) | 675.0 | 249 MHz | Digital signage, Medical video, Satellite image, Space exploration
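The Size column appears to be the macroblock count of each format relative to SQCIF. The sketch below recomputes those ratios and adds a clock estimate. The helper names, the 30 fps frame rate and the 256 cycles-per-macroblock budget are assumptions for illustration, not values stated on the slide.

```python
import math

# Frame sizes as listed in the Performance table.
FORMATS = {
    "SQCIF": (128, 96),
    "QCIF": (176, 144),
    "CIF": (352, 288),
    "D1": (720, 480),
    "720HD": (1280, 720),
    "1080HD": (1920, 1088),
    "QFHD": (3840, 2160),
}


def macroblocks(width, height, mb=16):
    """Number of 16x16 macroblocks needed to cover one frame."""
    return math.ceil(width / mb) * math.ceil(height / mb)


def required_clock_mhz(width, height, fps, cycles_per_mb):
    """Clock needed to decode fps frames per second at a fixed cycles/MB budget."""
    return macroblocks(width, height) * fps * cycles_per_mb / 1e6


base = macroblocks(*FORMATS["SQCIF"])
for name, (w, h) in FORMATS.items():
    n = macroblocks(w, h)
    # 30 fps and 256 cycles/MB are hypothetical parameters.
    mhz = required_clock_mhz(w, h, fps=30, cycles_per_mb=256)
    print(f"{name:7s} {n:6d} MBs  {n / base:6.1f}x SQCIF  ~{mhz:6.1f} MHz")
```

Because the clock requirement scales linearly with the macroblock count, a QFHD frame costs roughly four times as much as a 1080HD frame at the same frame rate.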

Essential Issues
Memory: tradeoff between the size of internal memory and the bandwidth of external access
Massive parallelism
Macroblock decoding scheduling

NTHU H.264 Decoder Architecture
[Block diagram: CPU, Display, Memory Controller and Ethernet on the AHB; the H.264 video decoder connects through the MAU & AMBA Interface and contains Translator, Parser, CAVLD/CABAD, IQ & IT, MVG, IPRED, INTERP, BSG and DF. Signals shown: coeff, mvdinfo, residual, mv & ridx, recon, bs, para & predinfo]

Memory

Memory size vs. bandwidth in ME
Conditions: Full HD 30 fps, # of rf = 1, SRV = SRH = 64
Level A: 24 Bytes, 19658 MB/s
Level B: 12 Bytes, 15 MB/s
Level C: 4977 Bytes, 317 MB/s
Level D: 124,929 Bytes, 62 MB/s
[Plot: Memory Size (Bytes) against Memory Bandwidth (MB/s) for Levels A to D]
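The Level D bandwidth can be sanity-checked under one common reading of the data-reuse levels: at Level D every reference pixel is fetched from external memory exactly once per frame. The sketch below assumes that reading, 8-bit luma samples only, a 1920x1088 coded frame and 30 fps. None of these assumptions are spelled out on the slide, and the function name is hypothetical.

```python
def level_d_bandwidth_mb_per_s(width, height, fps, num_ref_frames=1):
    """External read bandwidth in MB/s when each 8-bit reference pixel
    is loaded exactly once per coded frame (assumed meaning of Level D)."""
    return width * height * fps * num_ref_frames / 1e6


# Assumed coded frame size and frame rate for "Full HD".
bw = level_d_bandwidth_mb_per_s(1920, 1088, 30)
print(f"Level D bandwidth: {bw:.1f} MB/s")
```

Under these assumptions the result falls between 62 and 63 MB/s, in line with the Level D entry. The lower levels trade smaller on-chip buffers for repeated fetches of the same pixels.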

IME block diagram
[Block diagram: reference-frame memories (rf mem) and CB mem with their address generators (AG), rf router, rf reg array, four CMB reg and comparator pairs, MVGen units, MV AG and MV mem]

Memory size vs. bandwidth in ME
[Plot: Memory Size (Bytes) against Memory Bandwidth (MB/s) for Levels A to D, with the proposed design ("ours") added]

Reference-data Pre-fetch System
No redundant fetching: collect several MBs' motion vectors and read the same place with a single operation.
Minimize the number of burst initials: on average 2 burst initials per MB (1 for luma, 1 for chroma).
A burst is a group of sequential reads (burst read).
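The idea of collecting several macroblocks' motion vectors before touching memory can be illustrated in software. The sketch below is not the hardware described in the slides: the function names are hypothetical, the 5-pixel interpolation margin assumes a 6-tap sub-pel filter, and frame-edge clipping is ignored.

```python
from collections import defaultdict


def reference_region(mb_x, mb_y, mv, mb=16, taps=6):
    """Rectangle (x0, y0, x1, y1), end exclusive, of reference pixels one MB needs.

    The margin of taps - 1 pixels is an assumption (6-tap interpolation).
    """
    pad = taps - 1
    x0 = mb_x * mb + mv[0] - pad // 2
    y0 = mb_y * mb + mv[1] - pad // 2
    return (x0, y0, x0 + mb + pad, y0 + mb + pad)


def merge_fetches(regions):
    """Merge overlapping horizontal spans row by row so no pixel is read twice."""
    rows = defaultdict(list)
    for x0, y0, x1, y1 in regions:
        for y in range(y0, y1):
            rows[y].append((x0, x1))
    merged = {}
    for y, spans in rows.items():
        spans.sort()
        out = [list(spans[0])]
        for start, end in spans[1:]:
            if start <= out[-1][1]:          # overlaps or touches the previous span
                out[-1][1] = max(out[-1][1], end)
            else:
                out.append([start, end])
        merged[y] = [tuple(span) for span in out]
    return merged


def pixel_count(merged):
    """Total pixels fetched after merging."""
    return sum(end - start for spans in merged.values() for start, end in spans)


# Four horizontally adjacent MBs sharing the same (hypothetical) motion vector.
regions = [reference_region(x, 1, mv=(3, -1)) for x in range(4)]
naive = sum((x1 - x0) * (y1 - y0) for x0, y0, x1, y1 in regions)
merged = merge_fetches(regions)
print("pixels fetched per MB independently:", naive)
print("pixels fetched after merging:       ", pixel_count(merged))
print("spans per row after merging:        ", len(merged[13]))
```

In this example the four overlapping regions collapse into one span per row, so each row can be served by a single sequential read rather than four.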

Reference-data Pre-fetch System (Cont)
[Block diagram: CABAC feeds the Translator and Motion Vector Generator; MB motion vectors and region information are stored in the Reference Region & Index Register; a Region Analyzer / Searcher and OES manager drive the MAU Interface Buffer, which receives region data from SDRAM and passes it with the MB information to Interp]

Massive Parallelism

RLD/IQ/IDCT Timing Diagram
[Timing diagram: coeflag_mem read, coeff_mem read, IQ stage 1, IQ stage 2, IDCT stage 1, IDCT stage 2 and residual_mem write over time t, covering luma AC, DC and chroma AC coefficients]

DF Timing Diagram

Dual Pipelined Edge Filter
Stage 1: Read pixels
Stage 2: Strong filter (Bs=4) / left delta calculation; R21 delta calculation
Stage 3: Left weak filter (Bs<4); right delta calculation; R21 filter
Stage 4: Right weak filter (Bs<4)
Stage 5: Write pixels
[Figure: pixel groups L0-L3, M0-M3 and R0-R3 advancing through the five stages]
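The "delta calculation" and "weak filter" stages correspond to the Bs<4 edge filter of H.264. As a rough software illustration, not the pipelined hardware itself, the sketch below computes the delta for the two luma pixels nearest an edge in the form used by the standard. The clipping threshold `tc` is taken as an input here because the table lookup that derives it is not reproduced.

```python
def clip3(lo, hi, value):
    """Clamp value to the inclusive range [lo, hi]."""
    return max(lo, min(hi, value))


def weak_filter_p0_q0(p1, p0, q0, q1, tc):
    """Bs<4 filtering of the two pixels adjacent to the edge.

    p1, p0 lie on one side of the edge and q0, q1 on the other.
    tc is assumed to be already derived from Bs and the quantizer.
    """
    delta = clip3(-tc, tc, (((q0 - p0) << 2) + (p1 - q1) + 4) >> 3)
    new_p0 = clip3(0, 255, p0 + delta)
    new_q0 = clip3(0, 255, q0 - delta)
    return new_p0, new_q0, delta


# A small step across the edge: 62 | 70.
print(weak_filter_p0_q0(p1=60, p0=62, q0=70, q1=72, tc=2))
print(weak_filter_p0_q0(p1=60, p0=62, q0=70, q1=72, tc=4))
```

Since the delta depends only on four input pixels, it can be computed one stage ahead of the filter that applies it, which matches the stage split listed above.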

System-Level Optimization Cyclic-Queue-Based IP Interface

Sequential Decoder Timing Diagram (I Frame)
[Timing diagram: rows for PARSER, CABAD, IQ/IT, BSG, IPRED and DF over time. Header information decode, then initial context table and condition offset, then MB 0, MB 1 and MB 2 decoded one after another]

Elastic Pipeline Decoder Timing Diagram (I Frame)
[Timing diagram: rows for PARSER, CABAD, IQ/IT, BSG, IPRED and DF over time. Header information decode, then initial context table and condition offset, then the decodes of MB 0 to MB 6 overlapping across the units]

ASAP Decode with Cyclic Queue Timing Diagram (I Frame)
[Timing diagram: rows for PARSER, CABAD, IQ/IT, BSG, IPRED and DF over time. Header information decode, then initial context table and condition offset, then the decodes of MB 0 to MB 8 started as soon as each unit is free]
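A cyclic queue between two units lets a fast producer run ahead while the consumer is busy. The sketch below is a toy software model of that effect, not the decoder's actual interface: the class, the `simulate` function and the per-macroblock latencies are all hypothetical.

```python
class CyclicQueue:
    """Fixed-capacity FIFO implemented as a ring buffer."""

    def __init__(self, capacity):
        self.buf = [None] * capacity
        self.head = 0    # index of the next item to read
        self.count = 0   # number of items currently stored

    def push(self, item):
        if self.count == len(self.buf):
            return False                     # full: the producer must stall
        self.buf[(self.head + self.count) % len(self.buf)] = item
        self.count += 1
        return True

    def pop(self):
        if self.count == 0:
            return None                      # empty: the consumer must wait
        item = self.buf[self.head]
        self.head = (self.head + 1) % len(self.buf)
        self.count -= 1
        return item


def simulate(prod, cons, depth):
    """Cycles needed for two stages linked by a queue of the given depth.

    prod[i] and cons[i] are the cycles each stage spends on macroblock i.
    """
    q = CyclicQueue(depth)
    n = len(prod)
    p_idx, p_left = 0, prod[0]
    c_item, c_left = None, 0
    done = cycles = 0
    while done < n:
        cycles += 1
        # Producer: finish the current MB, then hand it over when there is room.
        if p_idx < n:
            if p_left > 0:
                p_left -= 1
            if p_left == 0 and q.push(p_idx):
                p_idx += 1
                p_left = prod[p_idx] if p_idx < n else 0
        # Consumer: work on the current MB, or fetch the next one when idle.
        if c_item is not None:
            c_left -= 1
            if c_left == 0:
                done += 1
                c_item = None
        else:
            c_item = q.pop()
            if c_item is not None:
                c_left = cons[c_item]
    return cycles


# Hypothetical workload: the producer is fast on early MBs and slow on later ones.
prod = [1, 1, 1, 1, 6, 6, 6, 6]
cons = [6, 6, 6, 6, 1, 1, 1, 1]
for depth in (1, 2, 4):
    print(f"queue depth {depth}: {simulate(prod, cons, depth)} cycles")
```

With a deeper queue the producer banks finished macroblocks during its fast phase, so the consumer is not starved later. The cost is the extra buffer storage, which is the SRAM-versus-cycles tradeoff the scheduling comparison examines.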

Comparison of Different Scheduling Methods
[Bar chart: SRAM usage (KB), turnaround cycles and processing cycles (Cycles/MB) for Sequential, Elastic Pipeline, ASAP Ping-Pong and ASAP Cyclic-queue scheduling]
Test pattern: pedestrian. Resolution: 720*480. QP: 28. GOP: III. Frame #: 3

Verification Environment
[Figure: directory layout of the H264 verification environment for easy bug tracing. Entries shown: filelist, tbench, fpga_lib, asic_lib, rtl_sim, gate_sim, syn, nlint, vn, netlist, jm11.0. Sub IPs shown: mfu, amba_wrap, top, lm_wrap, main_ctrl, mvg, bsg, parser, mem, cabad, idct, ipred, interp, df, hd_amba. Sub-IP entries shown: def, filelist, tbench, rtl, rtl_sim, gate_sim, syn, vn, nlint, xilinx_mem, altera_mem, artisan_mem]

A Multimedia SOC Platform
[Block diagram: CPU and Accelerator (FPGA) with USB (PHY) on a daughter board; High-Speed Bus with static memory controller (ROM/Flash, SRAM), SDRAM Controller (4-CH), VIC, USB 2.0, JPEG Codec, DMA, SRAM, Capture and Display Controller; APB Bridge to the Peripheral Bus with PWM, WDT, TIMER, DAI, SSI, SD, SM, UART, GPIO and I2C. External devices: Audio Codec (I2S), Flash memory with SSI, Video-In (CCIR601), TV/LCD, Flash Card, Button, LED]

Summary
Super-high-definition video capture, delivery and display are on the horizon.
Massive parallelism is essential for making consumer applications possible.
The tradeoff among memory usage, bandwidth and logic has a profound impact on overall system performance.
System design should be adaptable to content and quality variation.