How to Manage Video Frame-Processing Time Deviations in ASIC and SOC Video Processors

Some video frames take longer to process than others because of the nature of digital video compression. These wide variations in video-processing time make correct operation of an ASIC or SOC video-processing system unpredictable. A video processor that minimizes processing-time variations for each video frame enables the design of a more reliable and less expensive system that also consumes less power.

Whether video is processed in a hardware block, a general-purpose processor, or an optimized DSP processor, each frame takes a different amount of time to process. This is because each frame of video is different and is compressed in variable ways by the encoder for best efficiency. Most digital video coding standards process video as a sequence of square macroblocks, and an important video-compression technique is identifying and coding macroblocks in each video frame that are identical or similar to their neighbors. Finding nearly identical macroblocks can reduce video-coding time. For example, the sky in the top-left corner of the image in Figure 1 is almost identical from one macroblock to the adjacent macroblocks.

Figure 1: An example frame showing the Vincent Thomas Bridge between Long Beach and San Pedro, California.
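As a rough illustration of how an encoder scores this kind of similarity, the sketch below computes the sum of absolute differences (SAD) between a macroblock and a neighboring candidate block in the same luma plane. The 16x16 block size and the function name are illustrative assumptions, not details taken from this paper.

```c
#include <stdint.h>
#include <stdlib.h>

#define MB_SIZE 16  /* macroblock dimension assumed for illustration */

/*
 * Sum of absolute differences between the macroblock at (mx, my) and a
 * candidate block at (cx, cy) in the same luma plane. A small SAD means
 * the candidate is a good predictor for the macroblock.
 */
uint32_t mb_sad(const uint8_t *luma, int stride,
                int mx, int my, int cx, int cy)
{
    uint32_t sad = 0;
    for (int y = 0; y < MB_SIZE; y++) {
        const uint8_t *cur  = luma + (my + y) * stride + mx;
        const uint8_t *cand = luma + (cy + y) * stride + cx;
        for (int x = 0; x < MB_SIZE; x++)
            sad += (uint32_t)abs((int)cur[x] - (int)cand[x]);
    }
    return sad;
}
```

For a flat region such as the sky, most pixel differences are near zero and the SAD is tiny, so the neighboring block is a good predictor; for finely detailed texture the SAD is large and prediction saves far less.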

Another important technique for achieving compression is identifying objects that have moved from a nearby location in a previous frame of video, such as the light post on the bridge. Finding similar macroblocks within one frame and finding similar macroblocks between successive video frames are essential operations in techniques called intra-frame and inter-frame prediction, respectively. Using prediction, a video encoder encodes the location in the current or a previous frame from which to predict each macroblock. The encoder also encodes the imperfections between the prediction and the actual captured pixels. These imperfections, called residuals, tend to need much smaller data values than unencoded pixels, so less data must be encoded than would be the case without prediction. This data reduction leads to compression.

Video frames with many finely detailed macroblocks will be less accurately predicted from their neighbors, and video frames with many moving objects will be less accurately predicted from frame to frame. The marathon frame in Figure 2(a) requires more data for prediction and yields less accurate prediction than the clear sky in Figure 2(b). Therefore, a video coder will require more bits to encode the prediction imperfections of the frame in Figure 2(a) than it will for the frame in Figure 2(b).

Figure 2: A frame from the 2006 New York City marathon (a) and a frame of clear sky (b).

The consequences of intra-frame video prediction

Some video frames are entirely intra-predicted, which avoids dependencies on previous frames and prevents accumulated random errors or errors caused by algorithmic imprecision from propagating from one frame to the next. Because fully intra-predicted video frames rely only on prediction from neighboring macroblocks within the same frame, the opportunity to increase prediction accuracy using information from previous frames is lost. As a result, intra-frame predictions are less accurate and the residual values are larger, which causes the bitstream data for the compressed frames to become larger as well. Intra-predicted frames typically require at least twice as many bits to encode compared to inter-coded frames.
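To make the prediction-and-residual scheme above concrete, here is a minimal decoder-side sketch, assuming a fixed 16x16 block size and invented function names: the bitstream carries a motion vector plus the residual values, and the decoder adds the residual back onto the block predicted from the previous frame.

```c
#include <stdint.h>

#define MB_SIZE 16  /* block size assumed for illustration */

/* Clamp a reconstructed value to the 8-bit pixel range. */
static uint8_t clamp_pixel(int v)
{
    if (v < 0)   return 0;
    if (v > 255) return 255;
    return (uint8_t)v;
}

/*
 * Reconstruct one inter-predicted macroblock:
 *   reconstructed = reference block shifted by the motion vector + residual.
 * 'ref' is the previous decoded frame, 'residual' holds the decoded
 * prediction error for this macroblock, and (mvx, mvy) is the motion vector.
 */
void reconstruct_mb(uint8_t *dst, int dst_stride,
                    const uint8_t *ref, int ref_stride,
                    int mb_x, int mb_y, int mvx, int mvy,
                    const int16_t *residual)
{
    for (int y = 0; y < MB_SIZE; y++) {
        const uint8_t *pred = ref + (mb_y + y + mvy) * ref_stride + (mb_x + mvx);
        uint8_t *out = dst + (mb_y + y) * dst_stride + mb_x;
        for (int x = 0; x < MB_SIZE; x++)
            out[x] = clamp_pixel(pred[x] + residual[y * MB_SIZE + x]);
    }
}
```

When the prediction is good, most residual values are near zero and cost few bits to entropy-code; when prediction is poor, as in the marathon frame, the residuals are larger and both the bitstream and the processing time grow.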

Because the amount of data required to encode video frames differs from frame to frame, video codecs almost invariably need more time to encode intra-predicted (I) frames than inter-predicted (P) or bi-predicted (B) frames. Some video codecs need substantially more time to process I frames, while others exhibit more consistent processing times for the different frame types.

Video Device Design

Video entertainment devices such as Blu-ray/DVD players, set-top boxes, and portable media players buffer several frames of video before displaying them. By doing so, the output interface can still display video frames from the output buffer at the correct frame rate while the video processor decodes a difficult video frame. The output buffer is then refilled as the video decoder finishes easy-to-decode frames early. Figure 3 illustrates this process. A video processor with little deviation in processing time for different video frames allows the video chip and system to be designed with a smaller, less costly frame-buffer memory.

Figure 3: A frame buffer evens out the display of decoded video frames (block diagram: bitstream → video decoder → frame buffer memory → display control → LCD display; decoded frames enter the buffer at irregular intervals and are displayed at a regular interval after a display latency delay).
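The smoothing behavior sketched in Figure 3 can be modeled in a few lines of code. The simulation below is a simplified sketch under assumed names and units: the decoder runs flat out at a fixed clock rate, the display removes one frame per display interval once a given number of frames has been buffered, and the model ignores the decoder stalling when the buffer is full.

```c
/*
 * Producer/consumer model of Figure 3: the decoder produces frames back to
 * back at 'clock_hz', the display consumes one frame every 1/fps seconds,
 * and the display starts only after 'depth' frames have been decoded.
 * Returns 1 if every frame is decoded before the display needs it, else 0.
 * (Simplification: the decoder never stalls on a full buffer, so the model
 * is slightly optimistic about the required clock rate.)
 */
int buffer_never_underflows(const double cycles[], int n,
                            double clock_hz, double fps, int depth)
{
    double start_delay = 0.0;   /* display latency introduced by buffering */
    for (int i = 0; i < depth && i < n; i++)
        start_delay += cycles[i] / clock_hz;

    double decode_done = 0.0;   /* time at which frame i finishes decoding */
    for (int i = 0; i < n; i++) {
        decode_done += cycles[i] / clock_hz;
        double display_time = start_delay + i / fps;  /* frame i is shown here */
        if (decode_done > display_time)
            return 0;                                 /* buffer underflow */
    }
    return 1;
}
```

Increasing the buffer depth lets a slow I frame borrow decode time from the fast P frames around it, which is why a deeper output buffer permits a lower clock rate, at the cost of added display latency and frame-buffer memory.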

When latency is critical

Processing latency is critical for real-time video systems, which are used for applications such as two-way videoconferencing and automotive cameras. These systems cannot tolerate the display latency introduced by buffering multiple frames and must run the video codec fast enough to handle the worst-case video frame. A video processor that exhibits little deviation in processing time for different video frames allows the video chip and system to be designed with a lower clock speed and therefore at lower cost, lower power consumption, and higher reliability.

Many chip design teams selecting their first video codec IP core do not think to ask about deviation in frame-processing time. It's an obscure but important specification, and few IP core vendors measure or specify this characteristic. However, ASIC and SOC designers with deep expertise in video processing always ask this question and use it as a key decision criterion in selecting their video codec core.

The effect of frame-processing time deviation on clock rate

The frame-processing time for many video processors varies, with significant deviation between intra- and inter-predicted frames. In Figure 4, for example, it is easy to identify the regularly occurring I frames by the spikes in processing time. The tallest spikes stand out well above the average. In this situation, the system design must accommodate the difficult frames or there will be problems with the displayed frame rate.

Figure 4: Video frame processing time on a processor with typical deviation (y-axis: millions of cycles to process; x-axis: frame number).

The clock speed needed to perform video processing, F, is calculated as

F = R \cdot \max_{f=0}^{N-B} \left( \frac{1}{B} \sum_{i=f}^{f+B-1} P_i \right)

where R is the frame rate, N is the total number of frames processed, B is the number of frames that can be stored in the output display buffer, and P_i is the processing time in cycles for frame i. This formula computes the clock rate required to process the worst-case window of video frames, where the size of the window is the number of frames that the output display buffer can hold. I frames generally constitute only a small fraction of the frames in a video sequence and are almost invariably surrounded by faster-to-process P and B frames, so the clock frequency needed to process video decreases dramatically just by increasing the output display buffer size from one to two frames.

The latency caused by display-frame buffering cannot be tolerated by video processors for which display latency is critical, such as in video conferencing, security, and safety applications. As a result, B = 1 in the above equation and the required clock frequency is simply that for the worst-case frame of video:

F = R \cdot \max_{f=0}^{N-1} P_f

Note that the clock rate required to process a sequence of video frames is lower if the worst-case frame requires less processing time.
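The two formulas above translate directly into code. The sketch below is a minimal C version under assumed parameter names (it is not from the paper): it slides a window of B frames across a per-frame cycle trace and returns the clock rate needed to keep the worst window on schedule; with a depth of one it reduces to the worst-case-frame formula.

```c
#include <stddef.h>

/*
 * Required clock frequency in Hz for a given per-frame cycle trace.
 *   frame_rate - R, frames per second
 *   cycles[]   - P_i, processing cycles for each of the N frames
 *   n          - N, number of frames in the trace
 *   depth      - B, frames held in the output display buffer (>= 1)
 *
 * Implements F = R * max over f in [0, N-B] of (1/B) * sum_{i=f}^{f+B-1} P_i.
 * With depth == 1 this reduces to F = R * max_f P_f, the worst-case frame.
 */
double required_clock_hz(double frame_rate,
                         const double cycles[], size_t n, size_t depth)
{
    if (depth == 0 || depth > n)
        return 0.0;                     /* not enough frames for one window */

    double window = 0.0;                /* sum of the first 'depth' frames  */
    for (size_t i = 0; i < depth; i++)
        window += cycles[i];

    double worst = window;
    /* Slide the window across the trace and track the largest sum. */
    for (size_t f = 1; f + depth <= n; f++) {
        window += cycles[f + depth - 1] - cycles[f - 1];
        if (window > worst)
            worst = window;
    }
    return frame_rate * worst / (double)depth;
}
```

For example, required_clock_hz(30.0, trace, n, 1) gives the clock needed with no display buffering, while a depth of 2 typically gives a much lower figure because each I-frame spike is averaged with a neighboring P frame.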

Frame-processing time deviation for the 388VDO Video Engine

Tensilica's 388VDO video DSP processor and the software video codecs that run on it have remarkably low deviation in frame-processing time. Figure 5 shows the frame-processing times achieved with this video processor when decoding H.264 video. Compare Figure 5 with Figure 4: both the deviation in frame-processing time and the worst-case spikes are lower for the 388VDO Video Engine than for typical video processors.

Figure 5: H.264 video-frame processing time for Tensilica's 388VDO Video Engine (y-axis: millions of cycles to process; x-axis: frame number).

Decoding an H.264 Baseline Profile bitstream of a movie-trailer sequence with a ratio of 60 P frames per I frame, the 388VDO Video Engine exhibits a standard deviation of only 22% relative to the average frame-processing time.

Key underlying video design details

The result illustrated in Figure 5 demonstrates that the 388VDO Video Engine makes it easier for ASIC and SOC design teams to develop reliable video features for their designs. The 388VDO Video Engine is built from two differently configured Xtensa configurable RISC processor cores and a specialized DMA controller, as shown in Figure 6. Distributing the video-processing tasks to two processor cores allows one processor, the Stream processor, to manage the system and handle the compressed bitstream, while the other processor, the Pixel processor, handles the heavy-duty, macroblock-related DSP functions. The DMA block moves data between an external frame buffer and the 388VDO Video Engine's internal scratchpad memories. Efficient use of the DMA controller by the Stream processor to prefetch data allows the Pixel processor to remain busy, which results in comparatively invariant frame-processing times.
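The prefetching arrangement described above is, in spirit, a classic double-buffering (ping-pong) pattern. The sketch below shows that pattern in generic C; dma_copy_start(), dma_wait(), and the buffer sizes are hypothetical placeholders, not the 388VDO's actual DMA API.

```c
#include <stdint.h>
#include <stddef.h>

#define MB_BYTES 384  /* one 4:2:0 macroblock of pixel data, assumed size */

/* Hypothetical DMA primitives standing in for a real controller's API. */
extern int  dma_copy_start(void *dst, const void *src, size_t len);
extern void dma_wait(int channel);

extern void process_macroblock(uint8_t *mb_data);  /* pixel-side work */

/*
 * Ping-pong prefetch: while the Pixel processor works on the macroblock in
 * one scratchpad buffer, the DMA controller fills the other buffer with the
 * next macroblock from the external frame buffer, so the pixel pipeline
 * rarely waits on memory.
 */
void decode_macroblock_row(const uint8_t *ext_frame, int mb_count)
{
    static uint8_t scratch[2][MB_BYTES];  /* two on-chip scratchpad buffers */
    int chan = dma_copy_start(scratch[0], ext_frame, MB_BYTES);

    for (int mb = 0; mb < mb_count; mb++) {
        int cur = mb & 1;
        dma_wait(chan);                           /* current block is ready  */

        if (mb + 1 < mb_count)                    /* prefetch the next block */
            chan = dma_copy_start(scratch[cur ^ 1],
                                  ext_frame + (size_t)(mb + 1) * MB_BYTES,
                                  MB_BYTES);

        process_macroblock(scratch[cur]);         /* overlaps with the DMA   */
    }
}
```

The design choice is the same one the paper describes: as long as the transfer of the next block finishes before the current block is processed, memory latency disappears from the frame-processing time.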

Figure 6: Block diagram of the 388VDO Video Engine.

Because I frames are less compressible than P frames, compressed I frames carry more residual data that must be decoded from the bitstream. For video processors with separate stream-processing and SIMD cores, the extra entropy decoding performed by the stream core limits the overall throughput of the video processor when decompressing I frames. This characteristic accounts for the additional processing time in the I-frame spikes exhibited by most video processors. The instruction-set configuration extensions made to the 388VDO Video Engine's Stream processor core accelerate entropy decoding, which is why this video processor handles I frames more consistently than other video processors.
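One way to see why entropy decoding sets the I-frame pace is a simple pipeline model: when the stream core entropy-decodes one macroblock while the pixel core reconstructs the previous one, the per-macroblock time is roughly whichever stage takes longer. The sketch below expresses that rule of thumb; the function and its parameters are illustrative assumptions, not measurements of the 388VDO.

```c
/*
 * Two-stage pipeline model: the stream core entropy-decodes macroblock n+1
 * while the pixel core reconstructs macroblock n, so each macroblock costs
 * roughly the larger of the two stage times and the frame cost is that
 * bottleneck time multiplied by the macroblock count.
 */
double frame_cycles(double stream_cycles_per_mb,
                    double pixel_cycles_per_mb,
                    int macroblocks_per_frame)
{
    double per_mb = stream_cycles_per_mb > pixel_cycles_per_mb
                        ? stream_cycles_per_mb
                        : pixel_cycles_per_mb;
    return per_mb * macroblocks_per_frame;
}
```

On P frames the pixel work usually dominates, but on I frames the extra residual data can push the stream stage's entropy-decoding time past the pixel stage's time and slow the whole frame; accelerating entropy decoding pulls I-frame times back toward the pixel-limited rate, flattening the spikes.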

Conclusion

The design of the 388VDO Video Engine and its codec software keeps frame-processing time deviation low. This attribute makes the video processor dependable for many ASIC and SOC designs that have critical timing requirements for video processing. Consistent video-frame processing performance is a key factor in the design of video-processing systems and should play a role in your design team's decision criteria when selecting a video-processing IP core.

Note: If you would like video-processing help or advice on your next ASIC or SOC design, contact Tensilica for a consultation.

US Sales Offices:

Santa Clara, CA office: 3255-6 Scott Blvd., Santa Clara, CA 95054; Tel: 408-986-8000; Fax: 408-986-8919

San Diego, CA office: 1902 Wright Place, Suite 200, Carlsbad, CA 92008; Tel: 760-918-5654; Fax: 760-918-5505

Boston, MA office: 25 Mall Road, Suite 300, Burlington, MA 01803; Tel: 781-238-6702 x8352; Fax: 781-820-7128

International Sales Offices:

Yokohama office (Japan): Xte Shin-Yokohama Building 2F, 3-12-4 Shin-Yokohama, Kohoku-ku, Yokohama 222-0033, Japan; Tel: 045-477-3373 (+81-45-477-3373); Fax: 045-477-3375 (+81-45-477-3375)

UK office (Europe HQ): Asmec Centre, Eagle House, The Ring, Bracknell, Berkshire RG12 1HB; Tel: +44 1344 38 20 41; Fax: +44 1344 30 31 92

Israel: Amos Technologies, Moshe Stein, moshe@amost.co.il

Beijing office (China HQ): Room 1109, B Building, Bo Tai Guo Ji, 122th Building of Nan Hu Dong Yuan, Wang Jing, Chao Yang District, Beijing, PRC, Postcode: 100102; Tel: (86)-10-84714323; Fax: (86)-10-84724103

Taiwan office: 7F-6, No. 16, JiHe Road, ShihLin Dist., Taipei 111, Taiwan ROC; Tel: 886-2-2772-2269; Fax: 886-2-66104328

Seoul, Korea office: 27th FL., Korea World Trade Center, 159-1, Samsung-dong, Kangnam-gu, Seoul 135-729, Korea; Tel: 82-2-6007-2745; Fax: 82-2-6007-2746