WHITE PAPER

Taos - A Revolutionary Zero Latency, Multi-Channel, High-Definition H.264 Video Codec Architecture

Author
Kishan Jainandunsing, PhD
VP Marketing

Abstract
Taos is a truly revolutionary H.264/MPEG-4 AVC (Part 10) codec architecture, which provides video processing functionality highly optimized for real-time video feedback systems, such as video surveillance, video conferencing, video telephony, wireless video networking and electronic newsgathering applications. Its zero latency, true multi-channel and true HD capabilities meet the most difficult-to-satisfy requirements in these applications.

Introduction
The Taos H.264 video codec architecture addresses crucial requirements for latency, channel density, video resolution and video quality in real-time video applications, such as video surveillance, video conferencing, wireless video networking and electronic newsgathering. The Taos architecture implements unique features, such as zero latency, high channel density and HD video quality, which solve the most demanding requirements in these applications. In addition, Taos addresses equally important system-level issues, such as noise filtering, optimal network bandwidth usage, and error resiliency and concealment. Taos builds upon first-generation low-delay, multi-channel and HD codecs from W&W Communications. As such, Taos is a tried and proven video codec architecture for practical solutions in real-time video feedback systems.

Latency Defined
Simply put, latency is defined here as the time lapse between writing the first pixel at the source and producing the first pixel at the decoder output. Latency-sensitive video applications require that the time lapse between source and decoded video be extremely small. How small depends on the application, but as a guideline the range is between several milliseconds and less than 33ms. Zero latency refers here to latency below 10ms.

True Multi-Channel Defined
True multi-channel is defined here as independently encodable and decodable video streams. Each video stream is encoded with its own set of encoding parameters. Changing parameters for one stream does not affect other streams and can be done dynamically during the encoding process. Similarly, decoding one stream does not affect the decoding of other streams, including error propagation and concealment.
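As a rough illustration of the latency definition above, simple arithmetic shows why frame-buffered codecs cannot reach the sub-10ms regime. The numbers below are generic arithmetic, not measurements of any particular device.

```python
# Illustrative latency arithmetic for the definitions above. A codec
# that buffers whole frames pays at least one frame period at the
# encoder and another at the decoder before the first decoded pixel
# appears; these are structural floors, not device measurements.

def frame_period_ms(fps: float) -> float:
    """Time to capture one complete frame, in milliseconds."""
    return 1000.0 / fps

def frame_buffered_floor_ms(fps: float) -> float:
    """Lower bound on encode-decode latency when both the encoder and
    the decoder buffer one full frame before producing output."""
    return 2.0 * frame_period_ms(fps)

if __name__ == "__main__":
    # At 30 fps the structural floor is already ~66.7 ms, far above
    # the sub-10 ms "zero latency" threshold used in this paper.
    print(round(frame_period_ms(30), 1))          # 33.3
    print(round(frame_buffered_floor_ms(30), 1))  # 66.7
```

Even before any multi-pass processing is added, a frame-buffered pipeline at 30 frames/second already exceeds the 33ms guideline at the encoder side alone.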
True HD Defined
The HD, or High-Definition, moniker is used in the video industry for resolutions of 1280x720 and upwards (see Figure 1). The term true HD refers here to 1920x1088 resolution at 60 frames per second in progressive scan mode. This represents the highest resolution defined at present for HDTV.

Figure 1. Video resolution chart.

The Taos Architecture
A high-level block diagram representation of the Taos architecture is shown in Figure 2. At the heart of the architecture is the multi-stream, zero latency, high-definition H.264 codec. The I/O subsystem supports eight physical video ports, which can be configured as all inputs, all outputs, or any combination of inputs and outputs. Each video port supports multiplexed video streams, which allows Taos to support up to 32 independent video streams simultaneously in encode or decode mode.

Figure 2. Taos high-level block diagram representation.

Dual DDR controllers support external DDR-2 memory and provide sufficient memory bandwidth and storage capacity to support 32 independent video streams at up to 1920x1088 resolution. A video pre-processor subsystem supports several functions, such as de-multiplexing of input video streams, frame rate adaptation, content-adaptive noise filtering, duplication and downscaling. A video post-processor subsystem provides several functions, such as multiplexing several decoded streams onto a single video display port and on-screen display (OSD) support. An I2C master interface allows video peripheral circuits to be controlled, such as PAL/NTSC encoders and decoders, HDMI receivers and transmitters, and CMOS and CCD sensors. A flash memory interface controller supports flash devices over a serial interface for storage of Taos configuration settings. A 32-bit/66MHz PCI bus and a 32-bit generic host bus provide communication with an external host processor for network connectivity, audio, driver, operating system and application software support. A high-performance, multi-channel DMA controller handles high-speed transfers of encoded streams between the codec and external host processor memory. The DMA engine supports scatter/gather data transfers, significantly reducing overhead on the host CPU.

Zero Latency Encode-Decode
In mainstream implementations the encoding process starts only when a complete frame of video is present, introducing at least 33ms of latency at the encoder and another 33ms at the decoder. Together with multi-pass motion estimation, multi-pass rate control and frame-based source filtering, traditional implementations can easily exhibit in excess of 200ms encode-decode latency. In contrast, Taos implements fine-grain pipelining at the macro-block level, advanced bit rate prediction and in-loop source filtering. The encoding process starts as soon as the first lines of video in a frame are available, so the encoder does not need to wait for an entire frame before it starts encoding. This comes with the extra benefit that very little memory is needed for buffering.

Figure 3. Taos zero latency vs. high latency encoding-decoding: a) frame-based pipelining, high latency implementation; b) fine-grain pipelining, zero latency implementation.

In addition, Taos performs single-pass motion estimation, single-pass rate control and in-loop content-adaptive motion-compensated temporal filtering. This, in combination with the macro-block level fine-grain pipelining, results in sub 2ms encode-decode latency
for 1920x1088 video and sub 4ms latency for D1 video. Operation in Baseline, Main or High Profile does not affect latency or video quality.

Zero latency can drastically simplify system design in applications where the added latency of other parts of the system, such as transmitters and receivers, is negligible. In these cases complicated A/V time stamping and synchronization schemes are not needed, as the low latency of the video stream with respect to the audio stream provides inherent synchronization between the two. An example is electronic newsgathering (ENG), where the compressed video is transmitted over a short-wave radio link from the cameras in the field to a nearby satellite uplink truck. The negligible latency between captured and decoded video negates the need for complex A/V synchronization systems. In another ENG example, camera panning/zooming and video feeds from different camera angles need to be interpreted in real time by the production crew. Zero or near-zero latency in the video feeds provides inherent synchronization among all the different feeds and with the panning/zooming actions of the cameras. Sub 33ms latency is very desirable in these applications.

Figure 4. Implications of latency in live video broadcast applications: a) high latency implementation; b) Taos zero latency implementation.

Video conferencing and video telephony are highly latency-sensitive applications. With noticeable delay a conversation becomes impossible unless a walkie-talkie-like protocol is strictly followed, which makes the conversation unnatural and cumbersome. With Taos zero latency, a video conferencing or video telephony session can progress spontaneously and naturally, without the need for awkward and artificial communication protocols between the participants. Sub 33ms latency is necessary in these applications.

Figure 5. Implications of latency in video conferencing applications: a) high latency implementation (unnatural and non-spontaneous conversations); b) Taos zero latency implementation (natural and spontaneous conversations).

Mission-critical surveillance applications are another example of applications highly sensitive to latency. When securing valuables, such as money in a bank, priceless artifacts in a museum, or merchandise in a store, it is important that the area or building where an intrusion occurs is instantly secured. Another latency-sensitive surveillance application is multiple-camera tracking, where video feeds from several cameras are stitched together chronologically into a single feed that tracks one or more moving objects of interest. Too much latency in the video feeds makes stitching them together a complicated task and renders the application useless for rapid-response action. To track objects moving at normal to high speed, sub 10ms latency is required.

Figure 6. Implications of latency in mission critical video surveillance applications: a) high latency implementation; b) Taos zero latency implementation; c) zero latency multiple-camera tracking (encode, decode, back-end analytics).

An emerging application with high sensitivity to latency is wireless video networking in the home. This application has recently gained a lot of interest from CE manufacturers and aims to eliminate the HDMI cable between the HDTV set and a video source, such as a set-top box, DVD player/recorder or game box. A similar compelling case exists in the computer industry, where the link between a laptop or desktop and a flat panel monitor is being replaced by a wireless connection.

July 2007
Figure 7. Implications of latency in wireless video networking applications: a) high latency implementation; b) Taos zero latency implementation.

In these applications user interaction with the remote control, game pad, keyboard or mouse should result in instant screen updates. Since transmission at multi-gigabit-per-second rates over a highly unpredictable RF link is impractical, video compression is required, and sub 10ms latency requirements are not exaggerated in these applications.

Multi-channel Encoding
Taos input video ports support temporally and spatially multiplexed streams. Through temporal multiplexing, a single stream is created by time-division multiplexing of the frames of individual streams. In this case the resolution and frame size of all streams must be the same; only the frame rate may differ between streams. In this mode a single input port can support multiplexing of 32 separate video streams. Alternatively, the 32 streams can be distributed across the eight ports.

Figure 8. Temporally multiplexed video streams: a) same resolution and frame size, same frame rates; b) same resolution and frame size, different frame rates.

In case of spatial multiplexing a single stream is created by multiplexing the frames of several streams into single frames. In this case the frames may be of different resolution and size, but they must have the same frame rate. In this mode a single input port can support multiplexing of up to 16 CIF streams, 4 D1 streams, one 720p stream, or one 1080i/p stream. The aggregate of streams across all 8 ports must always sum to no more than 32 streams.

Figure 9. Spatially multiplexed video streams.

In both temporal and spatial multiplexing the individuality of each stream is kept completely intact, and each stream can be fully recovered through de-multiplexing by the Taos pre-processor subsystem. Typical multi-channel applications are found in video conferencing and video surveillance.
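The per-port multiplexing rules above can be sketched as a simple configuration check. The capacity weights and limits are taken from the figures quoted in the text (16 CIF, 4 D1, or one 720p/1080 frame per port, 32 streams in aggregate); the helper functions are hypothetical illustrations, not a device API.

```python
# Sketch of the multiplexing constraints described above. Weights and
# limits come from the quoted port capacities; temporal_mux_ok and
# spatial_mux_ok are illustrative helpers, not part of a real driver.

MAX_STREAMS = 32                    # aggregate limit across all 8 ports
PORT_FRAME_BUDGET = 16              # one port tiles up to 16 CIF-sized frames
WEIGHT = {"CIF": 1, "D1": 4, "720p": 16, "1080i/p": 16}

def temporal_mux_ok(streams) -> bool:
    """Temporal multiplexing: frames are time-division multiplexed,
    so all streams must share one resolution; frame rates may differ."""
    return len({s["res"] for s in streams}) == 1 and len(streams) <= MAX_STREAMS

def spatial_mux_ok(streams) -> bool:
    """Spatial multiplexing: frames are tiled into a single frame, so
    all streams must share one frame rate and fit the frame budget."""
    same_rate = len({s["fps"] for s in streams}) == 1
    load = sum(WEIGHT[s["res"]] for s in streams)
    return same_rate and load <= PORT_FRAME_BUDGET

cams = [{"res": "D1", "fps": 30}] * 4
print(spatial_mux_ok(cams))                                # True: 4 D1 fill a port
print(spatial_mux_ok(cams + [{"res": "CIF", "fps": 30}]))  # False: over budget
print(temporal_mux_ok([{"res": "CIF", "fps": f} for f in (30, 15, 10)]))  # True
```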
In video conferencing each camera can capture a single participant or a distinct group of participants. Usually a separate video stream is also allocated for presentations. In video surveillance applications multiple cameras monitor multiple areas. Digital video recorders (DVRs), video servers and multi-sensor camera configurations all benefit from Taos high channel densities.

Multi-channel Decoding
The Taos output video ports support multiplexing of decoded video streams. Several modes are supported, such as picture-by-picture (PxP), picture-in-picture (PiP) and picture-on-picture (PoP). Multiplexing of up to 16 video streams per port is possible, with a maximum of 32 streams in aggregate over 8 video output ports. In case of PxP (tiling), an integer number of multiplexed streams must fit within the output resolution. For instance, 4 D1 or 16 CIF frames fit in a 1920x1088 frame.

Stream Duplication and Scaling
A video input port can duplicate its stream, scale it down and compress it simultaneously with the original stream. For instance, a D1 or 720p30 stream can be copied, scaled down and frame-rate-adapted to CIF or QCIF at 15 frames/second, and compressed. In a video conferencing application a remote participant can receive this stream on a mobile phone. In a video surveillance application a security guard can receive this stream on a mobile phone, while the higher resolution stream is recorded to hard disk for later analysis. In a home video networking application the scaled-down stream can be received on a parent's mobile phone to monitor what the kids are watching.
HD Encoding and Decoding
Taos has the horsepower to encode or decode HD video up to 1080p60, which satisfies the most demanding applications. The quality of the video lies within 2 to 5% of the theoretical performance delivered by the JVT (Joint Video Team) JM (Joint Model) H.264 reference codec.

Figure 10. Taos HD video quality compared to JVT JM results.

The ability to process HD-quality video has become a must for OEMs of enterprise video conferencing equipment. Falling prices of HD displays and the ubiquity of broadband connections are making this possible. Early HD video conferencing systems also showed a marked increase in usability due to the life-like experience provided by an HD video feed. Continuously increasing storage capacity, video processing horsepower and image sensor resolutions against continuously falling prices are causing the video surveillance industry to shift from analog and hybrid systems to fully digital systems, and from CIF and VGA resolutions to D1 and HD resolutions. Another trend driving the need for higher resolution surveillance video is analytics, where high-powered back-end servers crunch through the video to perform object recognition. Back-end analytics needs to be run post facto on recorded video streams; digitally zooming in on a face in the crowd must retain sufficient image detail to perform such analyses. In most practical cases the range of zoom factors to be expected leaves only HD video usable in such applications.

Figure 11. Facial recognition using HD quality video.

The trend towards everything HD in the home is making support for HD in wireless video networking a hard requirement. Here, however, the requirement is to support nothing less than 1080p60 video. In electronic newsgathering the trend is also towards HD.
With the February 2009 FCC deadline to switch off all analog broadcast looming, ENG systems are being converted to handle video formats from 720p30 up to 1080p60.

Frame Rates and Resolutions
Each video port may operate at different frame rates and resolutions, completely independent of the other ports. The earlier mentioned ability to handle up to 32 streams can be distributed across the video ports. The relationship between frame rate and number of streams at a given resolution is shown in the table below for 1080, 720, D1 and CIF resolutions, where n is the number of streams.

  Resolution   Frame Rate (n = number of streams)
  1080i/p      60/n,   n <= 32
  720p         120/n,  n <= 32
  D1           300/n,  n <= 32
  CIF          1200/n, n <= 32

An example of multi-stream, multi-port distribution for different frame rates and resolutions is given in Figure 12. Two conditions apply:
1. The total number of frames per second cannot exceed the equivalent of one 1080p60 stream, or the equivalent of 1200 CIF frames/second.
2. The total number of streams cannot exceed 32.
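Read as a single shared processing budget, the table above reduces to one line of arithmetic per resolution. The sketch below restates the quoted numbers; it is not a model of the device itself.

```python
# The frame-rate table above restated as arithmetic: each resolution
# has a fixed total frame budget that n streams divide among
# themselves (numbers taken directly from the table).

TOTAL_FPS = {"1080i/p": 60, "720p": 120, "D1": 300, "CIF": 1200}

def per_stream_fps(res: str, n: int) -> float:
    """Frame rate available to each of n equal streams at resolution res."""
    if not 1 <= n <= 32:
        raise ValueError("the number of streams must be between 1 and 32")
    return TOTAL_FPS[res] / n

print(per_stream_fps("D1", 10))      # 30.0 -> ten D1 cameras at 30 fps
print(per_stream_fps("1080i/p", 2))  # 30.0 -> two 1080p30 feeds
print(per_stream_fps("CIF", 32))     # 37.5 -> even 32 CIF streams exceed 30 fps
```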
In this example the conversion factor used between CIF and the other resolutions is according to the table below.

  Resolution   Conversion Factor
  CIF          1
  D1           4
  720p         8
  1080i/p      16

Figure 12. Example of frame rate and resolution distribution.

Frame rates and resolutions can be changed dynamically, as long as the maximum processing capacity provided by Taos is not exceeded. Dynamic control of resolution and frame rate is an important feature in multi-channel applications, such as video surveillance and video conferencing. When a particular camera detects increased activity, the system can increase the resolution and frame rate for that particular video stream at the expense of other video streams and redirect Taos resources appropriately.

Figure 13. Dynamic resolution and frame rate changes.

Error Resiliency and Concealment
Taos provides a series of powerful error resiliency features. Among these are variable GoP (Group of Pictures) size, I-frame forcing, macro-block intra-refresh and multiple slices.

Variable GoP size can be used to make transmission of the compressed video more robust under noisy channel conditions. I-frame forcing can be used for reasonably noise-free transmission channels, which permit very long or infinite GoP sizes. The few times packets are corrupted or dropped, the decoder requests the encoder to transmit an I-frame, so that the decoder can recover from the problem.

Macro-block intra-refresh allows an I-frame to be distributed across multiple frames, smoothing out the bit rate peaks caused by I-frame forcing and making I-frames more robust under noisy channel conditions: an error occurring in an I-frame slice then corrupts only the slice in which it occurred, not the entire I-frame. Multiple slices are another method to contain and recover from errors quickly. By dividing frames into multiple slices, an error in a slice does not propagate across the slice's boundary and is thus contained. Multiple slices and macro-block intra-refresh both limit the impact of channel bit errors on the decoded video.

Figure 14. Various error resiliency techniques supported by Taos.

On the decode side the decoder can either freeze on the frame immediately preceding the corrupted frame, or conceal corrupt macro-blocks by substituting them with skipped macro-blocks. These error resiliency and concealment features are very important in real-time video feedback applications: the tolerance for errors is very low and recovery must happen fast.

Taos supports the implementation of H.241 protocols on a host processor for communication between the encoder and decoder. Through these, the decoder can signal the encoder to change the GoP size, force an I-frame, or change the macro-block intra-refresh and multiple-slice parameters.
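As an illustration of the macro-block intra-refresh idea described above, a scheduler can cycle an intra-coded band of macro-block rows through the frame so that a full refresh completes every few frames. This is a generic sketch of the technique, not Taos's actual scheduler.

```python
# Generic macro-block intra-refresh scheduling sketch (not Taos's own
# scheduler): instead of sending one large I-frame, a different band
# of macro-block rows is intra-coded in each frame, so the bit rate
# stays flat and a full refresh completes every `period` frames.

def intra_rows_for_frame(frame_idx: int, mb_rows: int, period: int):
    """Macro-block rows to intra-code in frame number frame_idx."""
    band = -(-mb_rows // period)               # ceil(mb_rows / period)
    start = (frame_idx % period) * band
    return list(range(start, min(start + band, mb_rows)))

# A 1920x1088 frame has 1088 / 16 = 68 macro-block rows; refresh it
# over a 4-frame cycle:
for f in range(4):
    rows = intra_rows_for_frame(f, mb_rows=68, period=4)
    print(rows[0], rows[-1])    # bands 0-16, 17-33, 34-50, 51-67
```

After one full period every macro-block row has been intra-coded, so a corrupted reference is fully repaired without the bit rate spike of a monolithic I-frame.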
Figure 15. H.241 protocol support for error resiliency by Taos.

Bit Rate Control
Taos implements constant bit rate (CBR) control for network transmission applications as well as variable bit rate (VBR) control for storage applications. Bit rate control does not affect zero latency. In case of VBR, the variance of the bit rate can be set so as not to exceed the available bandwidth of the storage interface.

Motion Information
Access to motion information is important in video surveillance and video conferencing applications. In a video surveillance application this information can be used to detect an intruder or a hazardous situation. In video conferencing applications the information can be used to automatically switch focus to a participant. Through Taos flexibility in reallocating video-processing resources dynamically, video frame rates and resolutions can be increased instantly for camera feeds in which motion has been detected.

Taos provides raw motion information in two ways: by providing motion vector statistics (average, minimum, maximum and variance) across definable regions, and by providing complete motion vector maps and SAD (Sum of Absolute Differences) information for entire frames. Both methods are highly compute-intensive; Taos therefore off-loads the external host CPU from performing such calculations. Instead, the host may run OEM-specific algorithms on the raw motion information to interpret whether or not motion is occurring, what relevance the motion has and what action to undertake.

Noise Filtering
Taos implements in-loop, content-adaptive motion-compensated temporal filtering (CA-MCTF). This reduces noise levels in the source video with filter strengths adaptively changing based on the content. Subjective quality greatly improves by leaving fine detailed features in the video unaffected while removing random noise.
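To make the content-adaptive idea concrete, here is a toy temporal filter in the same spirit: small frame-to-frame differences are treated as noise and averaged away, while large differences are treated as real motion or detail and passed through. This is a deliberately simplified illustration; the actual in-loop CA-MCTF in Taos is motion-compensated and far more sophisticated.

```python
# Toy content-adaptive temporal filter (illustration only; not the
# Taos CA-MCTF design). Pixels that barely change between frames are
# treated as sensor noise and blended toward the previous frame;
# pixels that change a lot are treated as genuine motion or detail
# and left untouched, preserving sharpness.

def filter_frame(prev, cur, noise_thresh=8, blend=0.5):
    """Blend small (noisy) changes, pass large (real) changes through."""
    out = []
    for p, c in zip(prev, cur):
        if abs(c - p) <= noise_thresh:         # likely random noise
            out.append(round(blend * c + (1 - blend) * p))
        else:                                  # likely real motion/detail
            out.append(c)
    return out

prev = [100, 100, 100, 50]
cur  = [103,  97, 100, 200]   # jitter on the first three pixels, a real edge last
print(filter_frame(prev, cur))  # jitter is halved, the edge is preserved
```

Damping the random jitter before encoding is what allows the bit rate savings quoted below: noise is expensive to encode because it defeats temporal prediction.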
Sharpness and clarity of the video are maintained as much as possible, while encoder bit rates are reduced by up to 45%. The single-pass, in-loop operation of the filter maintains zero encode-decode latency.

Figure 16. CA-MCTF for noise filtering: bit rate of the noisy source (100%) vs. the CA-MCTF-filtered source (55%).

Network Efficiency
The Taos encoder takes the maximum transmission unit (MTU) size into consideration. Slices can be defined as a function of the number of bytes that optimally fits in the MTU. This avoids fragmentation and segmentation, so network bandwidth is not wasted unnecessarily but optimally used, without the need for expensive over-provisioning.

Programmability and Time-To-Revenue
Taos strikes a good balance between programmability and hardwired functionality. Its rich register set provides extensive control over many of the video processing and system interface functions. Thus, developers do not have to take on the arduous, time-consuming and expensive task of application software porting and
programming of video compression algorithms, as is the case with integrated host CPU and programmable DSP architectures. This in turn means low-risk development and quick time-to-revenue for OEMs.

Low Power Dissipation and Cost
Taos is designed with low power dissipation in mind. Total power dissipation in single-channel 1080p60 mode is below 500mW, or below 25mW in single-channel CIF mode at 30 frames/second. This addresses the most stringent power dissipation requirements of outdoor camera specifications. At the same time the Taos architecture has been designed with low cost in mind. This is achieved through a combination of efficient logic implementation, a 90nm silicon process and high channel densities. The result is the most competitive cost per channel in the industry.

Conclusions
Taos is a truly revolutionary H.264 codec architecture, which provides video processing functionality highly optimized for real-time video feedback systems, such as in video surveillance, video conferencing, video telephony, wireless video networking and electronic newsgathering applications. Its zero latency, true multi-channel and true HD capabilities meet the most difficult-to-satisfy requirements in these applications. The zero latency capabilities especially address the most fundamental problem in these real-time feedback systems. By removing the latency normally introduced by video compression-decompression systems, Taos opens up many more opportunities for H.264 video coding beyond the target applications mentioned here, such as in automotive and robotics. And by increasing channel density beyond any existing solution on the market today, Taos promises to bring down cost drastically in its target markets.

For More Information
Taos builds on the legacy of the W&W Communications WW10K and WW20K H.264 HD codec chipsets. These encoder-decoder chipsets are capable of the same low latencies. For more information on Taos and these chipsets, contact W&W Communications at www.wwcoms.com or write an email to info@wwcoms.com.

W&W Communications, Inc. reserves the right to make changes to its products and product specifications at any time without notice. W&W Communications is a trademark of W&W Communications, Inc. All other trademarks and registered trademarks are property of their respective holders. Copyright 2001-2006 W&W Communications, Inc. All rights reserved.

USA & International: W&W Communications, Inc., 2903 Bunker Hill Lane, Suite 107, Santa Clara, CA 95054, USA. Tel: +1.408.481.0264, Fax: +1.408.213.2951, Email: info@wwcoms.com
Europe: W&W Communications, Inc., Gran Via 6, 4, Madrid, 28013, Spain. Tel: +34.91.524.7467, Fax: +34.91.524.7499
China: Beijing WWComs Info Technology Ltd., Shangdi DongLu #5-1, JingMeng GaoKe Bldg. A, Suite 201, Beijing, China 100085. Tel: +86.10.6296.8780, Fax: +86.10.6296.5943

www.wwcoms.com
ww-wp-taos-r-1