Frame Processing Time Deviations in Video Processors

Tensilica White Paper, May 2008

Executive Summary

Chips are increasingly made with processor designs licensed as semiconductor IP (intellectual property). Video processing is no exception. The nature of video coding is such that some frames of video take longer to process than others. Wide variations in processing time make correct operation of the chip and the final system unpredictable. A video processor that minimizes the variations in processing time for each frame enables a more reliable, less expensive, and lower-power system design. The Tensilica Diamond 388VDO processor excels at minimizing deviations in frame processing time.

How Video Is Compressed

Whether video is processed in a hardware block, a general-purpose processor, or an optimized DSP processor, each frame takes a different amount of time to process. This is because each frame of video is different and is compressed in variable ways by the encoder for best efficiency.

Most video coding standards process video as a sequence of square macroblocks. One important technique for video compression is identifying and coding macroblocks in each video frame that are identical or similar to their neighbors. For example, the sky at the top left corner of the image in Figure 1 is almost identical from one macroblock to its neighbor. Another important technique is identifying objects that have moved from a nearby location in a previous frame of video, such as the light post on the bridge.

Figure 1: An example frame showing the Vincent Thomas Bridge between Long Beach and San Pedro, California.

These techniques are known as intra-frame and inter-frame prediction, respectively. When using prediction, a video encoder must encode the location in the current or a previous frame from which to predict each macroblock, together with the imperfections between the prediction and the actual captured pixels. Fortunately the imperfections, known as residuals, tend to be much smaller data values than the actual captured pixels, so less data must be encoded than would be the case without prediction.

Naturally, video frames with many finely detailed macroblocks will be less accurately predicted from their neighbors, and video frames with many moving objects will be less accurately predicted from previous video frames. Therefore, the marathon frame in Figure 2(a) requires more prediction data and is predicted less accurately, so it needs more data to encode the residuals than the sky frame in Figure 2(b).

Figure 2: (a) A frame from the 2006 New York City marathon and (b) a frame of clear sky.

The Implications of Prediction Types

In video coding, some video frames are entirely intra-predicted. This avoids dependencies on previous frames, which stops accumulated random or imprecision errors from propagating from one frame to the next. Because fully intra-predicted video frames rely only on prediction from neighbors in the same frame, the opportunity to make more accurate predictions from previous frames is lost. As a result, the predictions in intra-predicted frames are less accurate, the residual values are greater, and the bitstream data for the compressed frame is greater than for inter-predicted video frames. Intra-predicted frames are typically coded with two or more times as many bits as inter-coded frames.

Because of the difference in the amount of data required to encode frames, codecs almost invariably take longer on intra-predicted (I) frames than on inter-predicted (P) or bidirectionally inter-predicted (B) frames. Some video codecs take dramatically longer to process I frames while others exhibit more invariant processing times for the different frame types.
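To make the idea of residuals concrete, the following is a minimal sketch of prediction from a neighbor, not the method of any particular codec. It assumes a hypothetical row of eight pixel values from a smooth region and predicts each pixel from the one to its left; real codecs predict whole blocks with several selectable modes, and the function names and the starting prediction of 128 are illustrative assumptions.

```python
# Hypothetical row of 8 pixel values from a smooth region such as clear sky.
ROW = [200, 201, 201, 202, 203, 203, 204, 204]

# Assumed prediction for the first pixel, which has no left neighbor.
START_PREDICTION = 128


def left_neighbor_residuals(samples):
    """Predict each sample from its left neighbor and return the residuals."""
    residuals = []
    prediction = START_PREDICTION
    for sample in samples:
        residuals.append(sample - prediction)
        prediction = sample  # the next sample is predicted from this one
    return residuals


def reconstruct(residuals):
    """Invert the prediction: rebuild the samples from the residuals."""
    samples = []
    prediction = START_PREDICTION
    for residual in residuals:
        sample = prediction + residual
        samples.append(sample)
        prediction = sample
    return samples


residuals = left_neighbor_residuals(ROW)
print(residuals)               # [72, 1, 0, 1, 1, 0, 1, 0]
print(reconstruct(residuals))  # the original row
```

After the first pixel, every residual is 0 or 1, whereas the raw values are all around 200. A busier row, such as one from the marathon frame, would leave larger residuals and therefore more data to encode and decode.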

Video Device Design

Video entertainment devices such as Blu-ray/DVD players, set-top boxes, and portable media players are designed to buffer several frames of video before they are displayed. While the video processor decodes a difficult video frame, the output interface can therefore still display video frames from the buffer at the correct frame rate. The buffer is then refilled as the decoder finishes easy frames early. A video processor with little deviation in processing time for different frames allows the video chip and system to be designed with less frame buffer memory and therefore at lower cost.

[Figure: block diagram (bitstream, video decoder, frame buffer memory, display control, LCD display) above a timing diagram in which the decoder processes frames 0 through 7 (frame 0 is typically an I frame) and, after a display latency delay, frames are displayed at a regular interval.]

In real-time video systems, such as those for two-way conferencing or automotive cameras, processing latency is critical. Such systems cannot tolerate the display latency introduced by buffering multiple frames and must run the video codec at a clock speed high enough to handle the worst possible video frame. A video processor with little deviation in processing time for different frames allows the video chip and system to be designed with a lower clock speed and therefore at lower cost, lower power consumption, and higher reliability.

Many chip makers selecting their first video codec IP core do not think to ask about deviation in frame processing time. Few IP core vendors care to measure or specify it. However, chip makers with deep expertise in video processing always ask this important question and use it as a key decision criterion in selecting their video codec core.

Calculating the Effect of Frame Processing Time Deviation on Clock Rate

The frame processing time for many video processors deviates significantly between intra- and inter-predicted frames. In Figure 3, for example, it is easy to identify the regularly occurring I frames by their spikes in processing time. The higher the spikes stand above the average, the greater the accommodations that must be made in the system design to handle the difficult frames.
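One way to put numbers on how far the spikes stand above the average is sketched below. It reports the standard deviation as a fraction of the mean (the form in which this paper later quotes a figure for Diamond 388VDO) and the peak-to-mean ratio. The function name, the use of the population standard deviation, and the cycle counts are assumptions made for illustration, not measurements.

```python
import statistics


def deviation_metrics(cycles_per_frame):
    """Summarize frame-time deviation for a list of per-frame cycle counts."""
    mean = statistics.fmean(cycles_per_frame)
    std = statistics.pstdev(cycles_per_frame)  # population standard deviation
    return {
        "std_over_mean": std / mean,
        "peak_over_mean": max(cycles_per_frame) / mean,
    }


# Hypothetical sequence: one expensive I frame followed by four cheap P frames.
cycles = [40_000_000, 10_000_000, 10_000_000, 10_000_000, 10_000_000]

metrics = deviation_metrics(cycles)
print(f"std / mean:  {metrics['std_over_mean']:.2f}")   # 0.75
print(f"peak / mean: {metrics['peak_over_mean']:.2f}")  # 2.50
```

With no output buffering, the peak-to-mean ratio is the factor by which the clock must exceed what the average frame alone would need.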

Figure 3: Video frame processing time on a processor with typical deviation (vertical axis: million cycles to process, 0 to 60; horizontal axis: frame, 1 to 81).

The clock speed needed to perform video processing, $F$, is calculated as

$$F = R \cdot \max_{f=0}^{N-B} \left( \frac{1}{B} \sum_{i=f}^{f+B-1} P_i \right)$$

where $R$ is the frame rate, $N$ is the total number of frames processed, $B$ is the number of display output frame buffers, and $P_i$ is the processing time in cycles for frame $i$. This is the clock rate required to process the worst-case window of video frames, where the size of the window is the number of display output frame buffers. Because I frames are generally a small portion of the frames in a video sequence and are almost invariably surrounded by faster-to-process P and B frames, the frequency required to process video decreases dramatically with an output display buffer that is just two frames deep.

In display-latency-critical video processors, such as those for video conferencing and safety-critical applications, display frame buffering cannot be tolerated. As a result, $B = 1$ and the frequency required is simply that for the worst-case frame of video:

$$F = R \cdot \max_{f=0}^{N-1} \left( P_f \right)$$

It also follows from the two equations above that the clock rate required for processing a sequence of video frames is lower if the worst-case frame requires less processing time.
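The clock-rate calculation above can be sketched in a few lines of Python. This is a minimal illustration, assuming the formula averages the cycle counts over every window of B consecutive frames and scales the worst window by the frame rate. The function name and the cycle counts are hypothetical.

```python
def required_clock_hz(cycles_per_frame, frame_rate, num_buffers):
    """Clock rate needed for the worst window of `num_buffers` frames.

    cycles_per_frame: processing time P_i in cycles for each frame i
    frame_rate:       R, in frames per second
    num_buffers:      B, the number of display output frame buffers
    """
    n = len(cycles_per_frame)
    if not 1 <= num_buffers <= n:
        raise ValueError("num_buffers must be between 1 and the frame count")
    # Sum of cycles over each window of B consecutive frames, f = 0 .. N-B.
    worst_window = max(
        sum(cycles_per_frame[f:f + num_buffers])
        for f in range(n - num_buffers + 1)
    )
    # B frames must be finished in B / R seconds.
    return frame_rate * worst_window / num_buffers


# Hypothetical sequence at 30 frames/s: one 50M-cycle I frame, then 10M-cycle P frames.
cycles = [50_000_000] + [10_000_000] * 5

for b in (1, 2, 3):
    mhz = required_clock_hz(cycles, 30, b) / 1e6
    print(f"B = {b}: {mhz:.0f} MHz")
# B = 1: 1500 MHz
# B = 2: 900 MHz
# B = 3: 700 MHz
```

In this made-up sequence, going from one buffer to two cuts the required clock by 40 percent, because the single expensive I frame is averaged with a cheap neighbor. Lowering the I-frame cost itself reduces the required clock for every value of B.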

Frame Processing Time Deviation for Diamond 388VDO

The Tensilica Diamond 388VDO video DSP processor and the software codecs that run on it have remarkably low deviation in frame processing time. Figure 4 shows an example of the frame processing times possible with the processor when decoding H.264 video. Compared with Figure 3, both the deviation in frame processing time and the worst-case spikes are lower for Diamond 388VDO than for typical video processors.

Figure 4: Video frame processing time on Tensilica Diamond 388VDO (vertical axis: million cycles to process, 0 to 60; horizontal axis: frame, 1 to 81).

An H.264 Baseline Profile video stream of a movie trailer sequence, with a ratio of 60 P frames per I frame, decodes on Diamond 388VDO with a standard deviation of only 22% of the average frame processing time. This pegs Tensilica's Diamond 388VDO as a truly steady video codec processor.

Key Underlying Design Details

Diamond 388VDO achieves this reliability as a benefit of using the Tensilica Xtensa processor and Tensilica's industry-leading tools for simulation, code profiling, and instruction set development. Xtensa is a uniquely configurable embedded processor architecture that allows highly application-specific performance optimizations. Determining the optimal configuration of an Xtensa processor is possible because the tools provide fast and accurate processor simulation, detailed code profiling, and a clear illustration of performance bottlenecks and of how appropriate extensions to the processor can remove them. See the Xtensa processor development tool kit product brief for more information.

The Diamond 388VDO video processor is actually built from two Xtensa cores and a specialized DMA controller, as shown in Figure 5. Separating data processing into two processors allows one, the Stream core, to manage the system and handle the compressed bitstream while the other, the Pixel core, simultaneously handles the heavy-duty DSP processing functions. The DMA block moves data between external frame buffer memory and the internal scratchpad memories of the two cores. Efficient use of the DMA controller by the Stream core for prefetching keeps the Pixel core busy, avoiding processor stalls and giving comparatively invariant frame processing time.

Figure 5: Block diagram of Diamond 388VDO (Xtensa Stream Core, multi-issue; Xtensa Pixel Core, SIMD; 5-channel DMA; Xtensa PIF interconnect with Port 0 and Port 1).

Because I frames achieve less compression than P frames, they require more residual data to be decoded from the bitstream. For video processors with separate stream and SIMD cores, this extra entropy decoding makes the stream core the bottleneck that limits the overall throughput of the processor on I frames. This accounts for the I-frame spikes in processing time on most processors. The instruction set extensions that accelerate entropy decoding in the Diamond 388VDO Stream core are why the processor is less susceptible than other video processors to long I-frame processing times.

Summary

The design of Diamond 388VDO and its codec software keeps frame processing time deviation low, which makes the processor dependable for the many chip designs in which reliable performance is critical. This is a key factor in the decision of leading mobile entertainment and real-time video SOC designers to choose Diamond 388VDO as the video processor for their chips.

Tensilica, Inc. 3255-6 Scott Blvd. Santa Clara, CA 95054

Phone: (408) 986-8000 FAX: (408) 986-8919