How to Manage Video Frame-Processing Time Deviations in ASIC and SOC Video Processors

Some video frames take longer to process than others because of the nature of digital video compression. These wide variations in video-processing time make correct operation of an ASIC or SOC video-processing system unpredictable. A video processor that minimizes processing-time variations for each video frame enables the design of a more reliable and less expensive system that also consumes less power.

Whether video is processed in a hardware block, a general-purpose processor, or an optimized DSP processor, each frame takes a different amount of time to process. This is because each frame of video is different and is compressed in variable ways by the encoder for best efficiency. Most digital video coding standards process video as a sequence of square macroblocks, and an important video-compression technique is identifying and coding macroblocks in each video frame that are identical or similar to their neighbors. Finding nearly identical macroblocks can reduce video-coding time. For example, the sky in the top-left corner of the image in Figure 1 is almost identical from one macroblock to the adjacent macroblocks.

Figure 1: An example frame showing the Vincent Thomas Bridge between Long Beach and San Pedro, California.
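As a rough illustration of how an encoder scores this kind of similarity, the sketch below computes the sum of absolute differences (SAD) between a macroblock and a neighboring candidate block in the same luma plane. The 16x16 block size and the function name are illustrative assumptions, not details taken from this paper.

```c
#include <stdint.h>
#include <stdlib.h>

#define MB_SIZE 16  /* macroblock dimension assumed for illustration */

/*
 * Sum of absolute differences between the macroblock at (mx, my) and a
 * candidate block at (cx, cy) in the same luma plane. A small SAD means
 * the candidate is a good predictor for the macroblock.
 */
uint32_t mb_sad(const uint8_t *luma, int stride,
                int mx, int my, int cx, int cy)
{
    uint32_t sad = 0;
    for (int y = 0; y < MB_SIZE; y++) {
        const uint8_t *cur  = luma + (my + y) * stride + mx;
        const uint8_t *cand = luma + (cy + y) * stride + cx;
        for (int x = 0; x < MB_SIZE; x++)
            sad += (uint32_t)abs((int)cur[x] - (int)cand[x]);
    }
    return sad;
}
```

For a flat region such as the sky, most pixel differences are near zero and the SAD is tiny, so the neighboring block is a good predictor; for finely detailed texture the SAD is large and prediction saves far less.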

Another important technique for achieving compression is identifying objects that have moved from a nearby location in a previous frame of video, such as the light post on the bridge. Finding similar macroblocks within one frame and finding similar macroblocks between successive video frames are essential operations in techniques called intra-frame and inter-frame prediction, respectively. Using prediction, a video encoder encodes the location in the current or a previous frame from which to predict each macroblock. The encoder also encodes the imperfections between the prediction and the actual captured pixels. These imperfections, called residuals, tend to need much smaller data values than unencoded pixels, so less data must be encoded than would be the case without prediction. This data reduction leads to compression.

Video frames with many finely detailed macroblocks will be less accurately predicted from their neighbors, and video frames with many moving objects will be less accurately predicted from frame to frame. The marathon frame in Figure 2(a) requires more data for prediction and yields less accurate prediction than the clear sky in Figure 2(b). Therefore, a video coder will require more bits to encode the prediction imperfections of the frame in Figure 2(a) than it will for the frame in Figure 2(b).

Figure 2: A frame from the 2006 New York City marathon (a) and a frame of clear sky (b).

The consequences of intra-frame video prediction

Some video frames are entirely intra-predicted, which avoids dependencies on previous frames and prevents accumulated random errors or errors caused by algorithmic imprecision from propagating from one frame to the next. Because fully intra-predicted video frames rely only on prediction from neighboring macroblocks within the same frame, the opportunity to increase prediction accuracy using information from previous frames is lost. As a result, intra-frame predictions are less accurate and the residual values are larger, which causes the bitstream data for the compressed frames to become larger as well. Intra-predicted frames typically require at least twice as many bits to encode compared to inter-coded frames.
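To make the prediction-and-residual scheme above concrete, here is a minimal decoder-side sketch, assuming a fixed 16x16 block size and invented function names: the bitstream carries a motion vector plus the residual values, and the decoder adds the residual back onto the block predicted from the previous frame.

```c
#include <stdint.h>

#define MB_SIZE 16  /* block size assumed for illustration */

/* Clamp a reconstructed value to the 8-bit pixel range. */
static uint8_t clamp_pixel(int v)
{
    if (v < 0)   return 0;
    if (v > 255) return 255;
    return (uint8_t)v;
}

/*
 * Reconstruct one inter-predicted macroblock:
 *   reconstructed = reference block shifted by the motion vector + residual.
 * 'ref' is the previous decoded frame, 'residual' holds the decoded
 * prediction error for this macroblock, and (mvx, mvy) is the motion vector.
 */
void reconstruct_mb(uint8_t *dst, int dst_stride,
                    const uint8_t *ref, int ref_stride,
                    int mb_x, int mb_y, int mvx, int mvy,
                    const int16_t *residual)
{
    for (int y = 0; y < MB_SIZE; y++) {
        const uint8_t *pred = ref + (mb_y + y + mvy) * ref_stride + (mb_x + mvx);
        uint8_t *out = dst + (mb_y + y) * dst_stride + mb_x;
        for (int x = 0; x < MB_SIZE; x++)
            out[x] = clamp_pixel(pred[x] + residual[y * MB_SIZE + x]);
    }
}
```

When the prediction is good, most residual values are near zero and cost few bits to entropy-code; when prediction is poor, as in the marathon frame, the residuals are larger and both the bitstream and the processing time grow.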

Because the amount of data required to encode video frames differs from frame to frame, video codecs almost invariably need more time to encode intra-predicted (I) frames than inter-predicted (P) or bi-predicted (B) frames. Some video codecs need substantially more time to process I frames, while others exhibit more consistent processing times for the different frame types.

Video Device Design

Video entertainment devices such as Blu-ray/DVD players, set-top boxes, and portable media players buffer several frames of video before displaying them. By doing so, the output interface can still display video frames from the output buffer at the correct frame rate while the video processor decodes a difficult video frame. The output buffer is then refilled as the video decoder finishes easy-to-decode frames early. Figure 3 illustrates this process. A video processor with little deviation in processing time for different video frames allows the video chip and system to be designed with a smaller, less costly frame-buffer memory.

Figure 3: A frame buffer evens out the display of decoded video frames (block diagram: bitstream → video decoder → frame buffer memory → display control → LCD display; decoded frames enter the buffer at irregular intervals and are displayed at a regular interval after a display latency delay).
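The smoothing behavior sketched in Figure 3 can be modeled in a few lines of code. The simulation below is a simplified sketch under assumed names and units: the decoder runs flat out at a fixed clock rate, the display removes one frame per display interval once a given number of frames has been buffered, and the model ignores the decoder stalling when the buffer is full.

```c
/*
 * Producer/consumer model of Figure 3: the decoder produces frames back to
 * back at 'clock_hz', the display consumes one frame every 1/fps seconds,
 * and the display starts only after 'depth' frames have been decoded.
 * Returns 1 if every frame is decoded before the display needs it, else 0.
 * (Simplification: the decoder never stalls on a full buffer, so the model
 * is slightly optimistic about the required clock rate.)
 */
int buffer_never_underflows(const double cycles[], int n,
                            double clock_hz, double fps, int depth)
{
    double start_delay = 0.0;   /* display latency introduced by buffering */
    for (int i = 0; i < depth && i < n; i++)
        start_delay += cycles[i] / clock_hz;

    double decode_done = 0.0;   /* time at which frame i finishes decoding */
    for (int i = 0; i < n; i++) {
        decode_done += cycles[i] / clock_hz;
        double display_time = start_delay + i / fps;  /* frame i is shown here */
        if (decode_done > display_time)
            return 0;                                 /* buffer underflow */
    }
    return 1;
}
```

Increasing the buffer depth lets a slow I frame borrow decode time from the fast P frames around it, which is why a deeper output buffer permits a lower clock rate, at the cost of added display latency and frame-buffer memory.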

When latency is critical

Processing latency is critical for real-time video systems, which are used for applications such as two-way videoconferencing and automotive cameras. These systems cannot tolerate the display latency introduced by buffering multiple frames and must run the video codec fast enough to handle the worst-case video frame. A video processor that exhibits little deviation in processing time for different video frames allows the video chip and system to be designed with a lower clock speed and therefore at lower cost, lower power consumption, and higher reliability.

Many chip design teams selecting their first video codec IP core do not think to ask about deviation in frame-processing time. It's an obscure but important specification, and few IP core vendors measure or specify this characteristic. However, ASIC and SOC designers with deep expertise in video processing always ask this question and use it as a key decision criterion in selecting their video codec core.

The effect of frame-processing time deviation on clock rate

The frame-processing time for many video processors varies, with significant deviation between intra- and inter-predicted frames. In Figure 4, for example, it is easy to identify the regularly occurring I frames by the spikes in processing time. The tallest spikes stand out well above the average. In this situation, the system design must accommodate the difficult frames or there will be problems with the displayed frame rate.

Figure 4: Video frame processing time on a processor with typical deviation (y-axis: millions of cycles to process; x-axis: frame number).

The clock speed needed to perform video processing, F, is calculated as

F = R \cdot \max_{f=0}^{N-B} \left( \frac{1}{B} \sum_{i=f}^{f+B-1} P_i \right)

where R is the frame rate, N is the total number of frames processed, B is the number of frames that can be stored in the output display buffer, and P_i is the processing time in cycles for frame i. This formula computes the clock rate required to process the worst-case window of video frames, where the size of the window is the number of frames that the output display buffer can hold. I frames generally constitute only a small fraction of the frames in a video sequence and are almost invariably surrounded by faster-to-process P and B frames, so the clock frequency needed to process video decreases dramatically just by increasing the output display buffer size from one to two frames.

The latency caused by display-frame buffering cannot be tolerated by video processors for which display latency is critical, such as in video conferencing, security, and safety applications. As a result, B = 1 in the above equation and the required clock frequency is simply that for the worst-case frame of video:

F = R \cdot \max_{f=0}^{N-1} P_f

Note that the clock rate required to process a sequence of video frames is lower if the worst-case frame requires less processing time.
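The two formulas above translate directly into code. The sketch below is a minimal C version under assumed parameter names (it is not from the paper): it slides a window of B frames across a per-frame cycle trace and returns the clock rate needed to keep the worst window on schedule; with a depth of one it reduces to the worst-case-frame formula.

```c
#include <stddef.h>

/*
 * Required clock frequency in Hz for a given per-frame cycle trace.
 *   frame_rate - R, frames per second
 *   cycles[]   - P_i, processing cycles for each of the N frames
 *   n          - N, number of frames in the trace
 *   depth      - B, frames held in the output display buffer (>= 1)
 *
 * Implements F = R * max over f in [0, N-B] of (1/B) * sum_{i=f}^{f+B-1} P_i.
 * With depth == 1 this reduces to F = R * max_f P_f, the worst-case frame.
 */
double required_clock_hz(double frame_rate,
                         const double cycles[], size_t n, size_t depth)
{
    if (depth == 0 || depth > n)
        return 0.0;                     /* not enough frames for one window */

    double window = 0.0;                /* sum of the first 'depth' frames  */
    for (size_t i = 0; i < depth; i++)
        window += cycles[i];

    double worst = window;
    /* Slide the window across the trace and track the largest sum. */
    for (size_t f = 1; f + depth <= n; f++) {
        window += cycles[f + depth - 1] - cycles[f - 1];
        if (window > worst)
            worst = window;
    }
    return frame_rate * worst / (double)depth;
}
```

For example, required_clock_hz(30.0, trace, n, 1) gives the clock needed with no display buffering, while a depth of 2 typically gives a much lower figure because each I-frame spike is averaged with a neighboring P frame.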

Frame-processing time deviation for the 388VDO Video Engine

Tensilica's 388VDO video DSP processor and the software video codecs that run on it have remarkably low deviation in frame-processing time. Figure 5 shows the frame-processing times achieved with this video processor when decoding H.264 video. Compare Figure 5 with Figure 4: both the deviation in frame-processing time and the worst-case spikes are lower for the 388VDO Video Engine than for typical video processors.

Figure 5: H.264 video-frame processing time for Tensilica's 388VDO Video Engine (y-axis: millions of cycles to process; x-axis: frame number).

Decoding an H.264 Baseline Profile bitstream of a movie-trailer sequence with a ratio of 60 P frames per I frame, the 388VDO Video Engine exhibits a standard deviation of only 22% relative to the average frame-processing time.

Key underlying video design details

The result illustrated in Figure 5 demonstrates that the 388VDO Video Engine makes it easier for ASIC and SOC design teams to develop reliable video features for their designs. The 388VDO Video Engine is built from two differently configured Xtensa configurable RISC processor cores and a specialized DMA controller, as shown in Figure 6. Distributing the video-processing tasks to two processor cores allows one processor, the Stream processor, to manage the system and handle the compressed bitstream, while the other processor, the Pixel processor, handles the heavy-duty, macroblock-related DSP functions. The DMA block moves data between an external frame buffer and the 388VDO Video Engine's internal scratchpad memories. Efficient use of the DMA controller by the Stream processor to prefetch data allows the Pixel processor to remain busy, which results in comparatively invariant frame-processing times.
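The prefetching arrangement described above is, in spirit, a classic double-buffering (ping-pong) pattern. The sketch below shows that pattern in generic C; dma_copy_start(), dma_wait(), and the buffer sizes are hypothetical placeholders, not the 388VDO's actual DMA API.

```c
#include <stdint.h>
#include <stddef.h>

#define MB_BYTES 384  /* one 4:2:0 macroblock of pixel data, assumed size */

/* Hypothetical DMA primitives standing in for a real controller's API. */
extern int  dma_copy_start(void *dst, const void *src, size_t len);
extern void dma_wait(int channel);

extern void process_macroblock(uint8_t *mb_data);  /* pixel-side work */

/*
 * Ping-pong prefetch: while the Pixel processor works on the macroblock in
 * one scratchpad buffer, the DMA controller fills the other buffer with the
 * next macroblock from the external frame buffer, so the pixel pipeline
 * rarely waits on memory.
 */
void decode_macroblock_row(const uint8_t *ext_frame, int mb_count)
{
    static uint8_t scratch[2][MB_BYTES];  /* two on-chip scratchpad buffers */
    int chan = dma_copy_start(scratch[0], ext_frame, MB_BYTES);

    for (int mb = 0; mb < mb_count; mb++) {
        int cur = mb & 1;
        dma_wait(chan);                           /* current block is ready  */

        if (mb + 1 < mb_count)                    /* prefetch the next block */
            chan = dma_copy_start(scratch[cur ^ 1],
                                  ext_frame + (size_t)(mb + 1) * MB_BYTES,
                                  MB_BYTES);

        process_macroblock(scratch[cur]);         /* overlaps with the DMA   */
    }
}
```

The design choice is the same one the paper describes: as long as the transfer of the next block finishes before the current block is processed, memory latency disappears from the frame-processing time.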

Figure 6: Block diagram of the 388VDO Video Engine.

Because I frames are less compressible than P frames, compressed I frames carry more residual data that must be decoded from the bitstream. For video processors with separate stream-processing and SIMD cores, the extra entropy decoding performed by the stream core limits the overall throughput of the video processor when decompressing I frames. This characteristic accounts for the additional processing time in the I-frame spikes exhibited by most video processors. The instruction-set configuration extensions made to the 388VDO Video Engine's Stream processor core accelerate entropy decoding, which is why this video processor handles I frames more consistently than other video processors.
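One way to see why entropy decoding sets the I-frame pace is a simple pipeline model: when the stream core entropy-decodes one macroblock while the pixel core reconstructs the previous one, the per-macroblock time is roughly whichever stage takes longer. The sketch below expresses that rule of thumb; the function and its parameters are illustrative assumptions, not measurements of the 388VDO.

```c
/*
 * Two-stage pipeline model: the stream core entropy-decodes macroblock n+1
 * while the pixel core reconstructs macroblock n, so each macroblock costs
 * roughly the larger of the two stage times and the frame cost is that
 * bottleneck time multiplied by the macroblock count.
 */
double frame_cycles(double stream_cycles_per_mb,
                    double pixel_cycles_per_mb,
                    int macroblocks_per_frame)
{
    double per_mb = stream_cycles_per_mb > pixel_cycles_per_mb
                        ? stream_cycles_per_mb
                        : pixel_cycles_per_mb;
    return per_mb * macroblocks_per_frame;
}
```

On P frames the pixel work usually dominates, but on I frames the extra residual data can push the stream stage's entropy-decoding time past the pixel stage's time and slow the whole frame; accelerating entropy decoding pulls I-frame times back toward the pixel-limited rate, flattening the spikes.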

Conclusion

The design of the 388VDO Video Engine and its codec software keeps frame-processing time deviation low. This attribute makes the video processor dependable for many ASIC and SOC designs that have critical timing requirements for video processing. Consistent video-frame processing performance is a key factor in the design of video-processing systems and should play a role in your design team's decision criteria when selecting a video-processing IP core.

Note: If you would like video-processing help or advice on your next ASIC or SOC design, contact Tensilica for a consultation.

US Sales Offices:

Santa Clara, CA office: 3255-6 Scott Blvd., Santa Clara, CA 95054; Tel: 408-986-8000; Fax: 408-986-8919

San Diego, CA office: 1902 Wright Place, Suite 200, Carlsbad, CA 92008; Tel: 760-918-5654; Fax: 760-918-5505

Boston, MA office: 25 Mall Road, Suite 300, Burlington, MA 01803; Tel: 781-238-6702 x8352; Fax: 781-820-7128

International Sales Offices:

Yokohama office (Japan): Xte Shin-Yokohama Building 2F, 3-12-4 Shin-Yokohama, Kohoku-ku, Yokohama 222-0033, Japan; Tel: 045-477-3373 (+81-45-477-3373); Fax: 045-477-3375 (+81-45-477-3375)

UK office (Europe HQ): Asmec Centre, Eagle House, The Ring, Bracknell, Berkshire RG12 1HB; Tel: +44 1344 38 20 41; Fax: +44 1344 30 31 92

Israel: Amos Technologies, Moshe Stein, moshe@amost.co.il

Beijing office (China HQ): Room 1109, B Building, Bo Tai Guo Ji, 122th Building of Nan Hu Dong Yuan, Wang Jing, Chao Yang District, Beijing, PRC, Postcode: 100102; Tel: (86)-10-84714323; Fax: (86)-10-84724103

Taiwan office: 7F-6, No. 16, JiHe Road, ShihLin Dist., Taipei 111, Taiwan ROC; Tel: 886-2-2772-2269; Fax: 886-2-66104328

Seoul, Korea office: 27th FL., Korea World Trade Center, 159-1, Samsung-dong, Kangnam-gu, Seoul 135-729, Korea; Tel: 82-2-6007-2745; Fax: 82-2-6007-2746