Altera's 28-nm FPGAs Optimized for Broadcast Video Applications

WP-01163-1.0 White Paper

This paper describes how Altera's 40-nm and 28-nm FPGAs are tailored to help deliver highly integrated HD studio equipment products. The paper provides an analysis of the performance requirements, resource utilization, and power consumption characteristics for the format conversion of multiple video channels. This is a common function for broadcast applications ranging from video capture cards to multiviewers, video walls, and A/V switchers. The paper also describes the architectural enhancements featured in Altera's 28-nm FPGAs that specifically improve their capability for broadcast applications.

Introduction

Increasing industry demand to deliver HD video channels requires studio equipment providers to deliver integrated products that provide the required bandwidth and processing power, while minimizing cost and power. Although some studio equipment providers resort to full custom ASICs, time-to-market pressure and development expense often rule out this option. Application-specific standard products (ASSPs) provide an alternative in some applications, but they can be inflexible and cannot provide high integration relative to shifting market demands. Against this backdrop, Altera offers its latest generation of 40-nm and 28-nm FPGAs, tailored to give studio equipment developers higher integration and customization than ASSP-based systems, while avoiding the lengthy development times and costs of full custom ASICs.

Up/Down Cross Conversion (UDX) Requirements

The process of converting video prior to storage, encoding, or display can be described as up/down cross conversion (UDX). Figure 1 shows a simplified block diagram of a 2-channel UDX design developed by Altera. This design has extensive functionality in addition to simple format conversion, and therefore overestimates the gate resources required for most applications. This design is used to analyze the fitness, performance, and power characteristics of Altera FPGAs implemented in studio equipment products.

The 2-channel UDX design ingests video over serial digital interface (SDI) or digital visual interface (DVI). This design can handle two SD-SDI, HD-SDI, or 3G-SDI progressive or interlaced input streams up to 1080p60, such as NTSC, PAL, 720p, 1080i, and 1080p. The Active Format Description (AFD) Extractor extracts the AFD code from the channels to support dynamic clipping, scaling, and padding for bidirectional format conversion between 4:3 and 16:9 aspect ratios. Next, the input switch performs 4:2:2 to 4:4:4 chroma sampling conversion as required, and allows selection of two of the three input streams for input to the two video processing channels.

101 Innovation Drive, San Jose, CA 95134, www.altera.com. © 2011 Altera Corporation. All rights reserved. ALTERA, ARRIA, CYCLONE, HARDCOPY, MAX, MEGACORE, NIOS, QUARTUS and STRATIX are Reg. U.S. Pat. & Tm. Off. and/or trademarks of Altera Corporation in the U.S. and other countries. All other trademarks and service marks are the property of their respective holders as described at www.altera.com/common/legal.html. Altera warrants performance of its semiconductor products to current specifications in accordance with Altera's standard warranty, but reserves the right to make changes to any products and services at any time without notice. Altera assumes no responsibility or liability arising out of the application or use of any information, product, or service described herein except as expressly agreed to in writing by Altera. Altera customers are advised to obtain the latest version of device specifications before relying on any published information and before placing orders for products or services. April 2011, Altera Corporation.

Figure 1. 2-channel Up/Down Cross Conversion (UDX) design developed by Altera. [Block diagram; labels: SDI, DVI, AFD Extractor, Input Switch, Nios II Processor, MA Deinterlacer, Scaler, Frame Buffer, Frame Rate Conversion, 16-port multi-port front end, DDR3 memory controller, Background, Color Space Converter, On Screen Display, Alpha Blending Mixer, Interlacer, Output Switch, AFD Inserter, HDMI. A marker indicates additional functions not depicted.]

Within a video processing channel, a motion-adaptive (MA) deinterlacer deinterlaces the video input in 4:2:2 mode, double-buffering it in external RAM and producing one output frame for each input field. Following that, the video frames are scaled to the desired resolution and buffered in external memory for frame rate conversion. The converted image is then mixed with the second channel and logos before it is displayed over a user-selectable output such as SDI, DVI, or HDMI.

The UDX design has been successfully implemented and demonstrated in hardware.

Calculating Resource and Memory Requirements

The memory bandwidth requirements for Altera's UDX design are determined by the deinterlacing stage and associated frame buffering. The per-channel device resource requirements for the UDX design are shown in Table 1.

Table 1. UDX Design Device Resource Requirements (Per Video Channel)

Resource                   Minimum FPGA      External RAM
Logic elements (LEs)       45K               N/A
Internal RAM (Mbits)       2.6               N/A
DSP (18x18 multipliers)    110               N/A
Transceiver channels       1 (SDI or DVI)    N/A
External RAM (Mbytes)      N/A               13.22

1080p Memory Bandwidth

The memory bandwidth requirements are defined by the maximum resolution video the channel must handle. Since the design handles resolutions up to 1080p, the following calculation gives the memory bandwidth required to buffer 1080p video:

  Pixels per 1080p frame = width x height = 1920 x 1080 = 2,073,600 pixels
  2,073,600 pixels x 60 FPS x 2 color planes x 10 bits per sample = 2.48832 Gbps
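The raw-bandwidth arithmetic above can be reproduced with a short Python sketch. The function name and its default parameters are illustrative assumptions, not part of the UDX design; the defaults simply mirror the 4:2:2, 10-bit case described in the text.

```python
def raw_video_bandwidth_bps(width, height, fps, color_planes=2, bits_per_sample=10):
    """Raw (unpacked) bandwidth in bits per second for one video stream.

    Hypothetical helper: the defaults model the paper's 4:2:2 case,
    two color planes at 10 bits each, that is 20 bits per pixel.
    """
    pixels_per_frame = width * height
    return pixels_per_frame * fps * color_planes * bits_per_sample


# 1080p60 as used in the paper
bandwidth = raw_video_bandwidth_bps(1920, 1080, 60)
print(f"{bandwidth / 1e9:.5f} Gbps")  # 2.48832 Gbps
```

Keeping the result as an integer number of bits per second avoids rounding until the value is displayed.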

Therefore, the minimum memory bandwidth required to write 1080p video is 2.48832 Gbps. However, the design must also account for the maximum word size, which is determined by the width of the memory interface. For the target FPGAs, a 64-bit memory interface is assumed, which yields a 256-bit word. To avoid splitting pixels across words, twelve 20-bit pixels are packed into each 256-bit word per read or write, leaving 16 bits unused:

  12 pixels x 20 bits = 240 bits

Thus, the actual bandwidth required to read or write 1080p video without splitting pixels on a 64-bit memory interface is:

  2.48832 Gbps x (256/240) = 2.654208 Gbps

Motion-Adaptive Deinterlacing Algorithm

The motion-adaptive deinterlacing algorithm requires one write at 1080i, plus either four reads at 1080i or two reads at 1080p:

  1 write @ 1080i = 0.5 x 2.654208 Gbps = 1.327104 Gbps
  4 reads @ 1080i or 2 reads @ 1080p = 2 x 2.654208 Gbps = 5.308416 Gbps
  Total = 6.63552 Gbps

If the deinterlacer includes the motion bleed feature, the motion values of the current frame must be compared with stored values. The motion-adaptive deinterlacing algorithm therefore also requires one write and one read of video motion values. Assuming 10-bit motion values, the minimum bandwidth required for each read or write is:

  1920 x 1080 x (60/2) FPS x 10 bits = 0.622 Gbps (0.62208 Gbps unrounded)

At 10 bits per motion value, a total of 25 motion values (250 bits) fit into a single 256-bit word. To avoid splitting motion values across 256-bit words, the bandwidth required becomes:

  0.622 Gbps x (256/250) = 0.637 Gbps (0.63700992 Gbps unrounded)

So, the memory bandwidth required for a single channel of motion-adaptive deinterlacing, using the unrounded motion-value figure, is:

  6.63552 Gbps + (2 x 0.637 Gbps) = 7.90953984 Gbps

Similarly, the bandwidth required for a frame buffer is calculated by adding the memory requirements for writing and reading one 1080p frame:

  2.48832 Gbps x (256/240) x 2 = 5.308 Gbps (5.308416 Gbps unrounded)

Hence, the total memory bandwidth required per UDX channel equals the sum of the memory bandwidth requirements of the deinterlacer and the frame buffer:

  7.90953984 Gbps + 5.308416 Gbps = 13.21795584 Gbps, or ~13.22 Gbps
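The per-channel calculation can be checked end to end with the following Python sketch. It is a minimal model under the paper's stated assumptions (256-bit memory words, 20-bit pixels, 10-bit motion values); the function and variable names are hypothetical. Exact fractions are used so that the result matches the unrounded figures in the text.

```python
from fractions import Fraction

WORD_BITS = 256  # memory word width assumed in the paper


def packing_overhead(item_bits, word_bits=WORD_BITS):
    """Ratio of bits transferred to bits used when items are never split across words."""
    items_per_word = word_bits // item_bits
    return Fraction(word_bits, items_per_word * item_bits)


def udx_channel_bandwidth_bps():
    """Return (deinterlacer, frame buffer, total) bandwidth in bits per second."""
    raw_1080p = 1920 * 1080 * 60 * 20            # 20 bits per pixel (4:2:2, 10-bit)
    packed_1080p = raw_1080p * packing_overhead(20)  # x 256/240

    # Deinterlacer video traffic: one write at 1080i (half of 1080p)
    # plus reads equivalent to two 1080p streams.
    video = packed_1080p / 2 + 2 * packed_1080p

    # Motion values: 10 bits per pixel at 30 frames per second,
    # one write and one read, packed 25 values per word.
    motion = 1920 * 1080 * 30 * 10 * packing_overhead(10)  # x 256/250
    deinterlacer = video + 2 * motion

    # Frame buffer: one write and one read of a 1080p stream.
    frame_buffer = 2 * packed_1080p
    return deinterlacer, frame_buffer, deinterlacer + frame_buffer


deint, fbuf, total = udx_channel_bandwidth_bps()
print(f"deinterlacer: {float(deint) / 1e9:.8f} Gbps")
print(f"frame buffer: {float(fbuf) / 1e9:.6f} Gbps")
print(f"total:        {float(total) / 1e9:.8f} Gbps")
```

Deriving the overhead factor from the item width, rather than hard-coding 256/240 and 256/250, makes it straightforward to explore other word or sample widths.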

Implementing the UDX Design in 40-nm and 28-nm FPGAs

Consider a simple two-channel UDX design, common to capture cards, such as the one shown in Figure 2.

Figure 2. PCIe Capture Card with Two-Channel UDX. [Block diagram; labels: FPGA, SW CODECs, PCIe, SD/HD/DL x2 (appears twice), SD/HD/3G Up/Down/X Conversion (10-bit), MA deinterlacing, Polyphase Scaling, Aspect Ratio Conversion, Keyer, DisplayPort Monitoring.]

The memory bandwidth requirement for the two-channel UDX design is calculated as follows:

  2 channels x 13.22 Gbps = 26.44 Gbps

Table 2 outlines the resources required for a 2-channel PCIe capture card, including a DisplayPort output for monitoring and a PCIe interface to transfer the video data to the host and access software codecs.

Table 2. FPGA Required for 2-Channel PCIe Capture Card

Resource Type             per Channel      Two Format Conversion   DisplayPort and PCIe Interface   Total Capture Card
Logic elements (LEs)      45K              90K                     12K                              102K
Internal RAM (Mbits)      2.6              5.2                     0.3                              5.5
DSP (18x18 multipliers)   110              220                     N/A                              220
Transceiver channels      2 (SDI or DVI)   4 (2 input, 2 output)   4 (DisplayPort) plus 4 (PCIe     12 or 16
                                                                   Gen2x4) or 8 (PCIe Gen1x8)

Table 3 shows the target 40-nm and 28-nm FPGAs that are the best fit for the capture card design, as well as the relevant device resource counts. The maximum memory bandwidth is quoted for symmetric interfaces (that is, at least two interfaces of the same width and speed). FPGAs can sometimes support higher memory bandwidth with additional interfaces of different data widths and/or speeds, but since this situation is often not desirable or practical, only the maximum bandwidth with symmetric interfaces is shown. Both FPGA options easily meet the memory bandwidth requirement of 26.44 Gbps, as indicated by Table 3.
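The totals in Table 2 follow from scaling the per-channel figures and adding the DisplayPort and PCIe overhead. The Python sketch below reproduces that arithmetic; the dictionary keys and function name are hypothetical, the numbers are taken from Table 2 and the 13.22 Gbps per-channel bandwidth, and the N/A DSP entry for the interfaces is treated as zero, which is an assumption.

```python
# Per-channel requirements (Table 2) and the per-channel memory bandwidth.
PER_CHANNEL = {"les_k": 45, "ram_mbits": 2.6, "dsp_18x18": 110,
               "transceivers": 2, "mem_bw_gbps": 13.22}

# DisplayPort and PCIe interface overhead (Table 2); DSP "N/A" modeled as 0.
INTERFACE = {"les_k": 12, "ram_mbits": 0.3, "dsp_18x18": 0}

DISPLAYPORT_LANES = 4


def capture_card_requirements(channels, pcie_lanes):
    """Total FPGA requirements for `channels` UDX channels plus DisplayPort and PCIe."""
    return {
        "les_k": channels * PER_CHANNEL["les_k"] + INTERFACE["les_k"],
        "ram_mbits": round(channels * PER_CHANNEL["ram_mbits"] + INTERFACE["ram_mbits"], 1),
        "dsp_18x18": channels * PER_CHANNEL["dsp_18x18"] + INTERFACE["dsp_18x18"],
        "transceivers": channels * PER_CHANNEL["transceivers"] + DISPLAYPORT_LANES + pcie_lanes,
        "mem_bw_gbps": round(channels * PER_CHANNEL["mem_bw_gbps"], 2),
    }


# Two-channel card with PCIe Gen2 x4 (4 lanes) or Gen1 x8 (8 lanes)
for lanes in (4, 8):
    print(lanes, capture_card_requirements(channels=2, pcie_lanes=lanes))
```

With four PCIe lanes the sketch yields 12 transceiver channels, and with eight lanes it yields 16, matching the "12 or 16" entry in Table 2.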

Table 3 also indicates the nature of memory interface support for the specified target devices. Altera's 40-nm FPGAs offer external memory interfaces via soft memory controllers, implemented in the user-programmable logic and memory portions of the device. These soft controllers have been demonstrated and tested with the UDX design in actual hardware, and they have proven to deliver the required efficiency and resulting bandwidth. In the 28-nm Arria V FPGA, the memory interface is implemented in a hard memory controller. This hard memory controller is based on the proven soft memory controller, and is designed to provide even higher efficiency, along with easy, built-in timing closure.

Table 3. FPGA and Total Power Consumption

FPGA Resource                                     Arria II GX (40nm)             Arria V (28nm)
Target device                                     2AGX190                        5AGXA3
Logic elements (LEs)                              190K                           150K
Total memory (Mbits)                              9.9                            10.4
Max 18x18 multipliers                             656                            792
Max transceiver channels                          16                             12
Max memory bandwidth with symmetric interfaces    51.2 Gbps (soft controller)    136.4 Gbps (hard controller)
PCIe hard IP support                              Up to Gen1x8                   Up to Gen2x4
Capture card total power consumption              10.8 watts                     5.8 watts

The last row in Table 3 indicates the total power consumption for the capture card design as implemented in each device. This power is calculated using the PowerPlay Early Power Estimator (EPE) tool. Both FPGA options provide the lowest total power at their respective process nodes, delivering significant benefits for the increasingly power-sensitive end markets in the broadcast space.

For more information about the EPE tool, visit the PowerPlay Early Power Estimators (EPE) and Power Analyzer website.

A larger design based on the UDX design can better demonstrate the full integration capabilities of the most advanced FPGAs. One example is the 16-input, 8-channel A/V switcher shown in Figure 3.

Figure 3. 16-input AV Switcher with 8-Channel UDX. [Block diagram; labels: PCIe, Clip/Still Store, FPGA, 16 inputs SDI/DVI, SD/HD/3G Up/Down/X, PGM Bus, PRV Bus, Key, OSD (logo/text), SDI/DVI outputs, Downscale (SD resolution), Image Mixer for Multi-Viewer, DisplayPort.]

The design shown in Figure 3 requires only a single advanced FPGA to implement. An ASSP-based implementation, by contrast, would require multiple ASSPs, along with the associated additional board space, power consumption, and design complexity. The first step in implementing this design in a single FPGA is to calculate the memory bandwidth required for the 8 channels of UDX as follows:

  8 channels x 13.22 Gbps = 105.76 Gbps

Table 4 outlines the resources required for a 16-input, 8-channel switcher, including a DisplayPort output for monitoring and a PCIe interface to transfer the video data to the host and obtain clips and still images.

Table 4. Required FPGA for 16-input, 8-Channel A/V Switcher

FPGA Resource             per Channel      per 8 Channels           DisplayPort and PCIe Interface   16-Input, 8-Channel AV Switcher
Logic elements (LEs)      45K              360K                     12K                              373K
Internal RAM (Mbits)      2.6              20.8                     0.3                              21.1
DSP (18x18 multipliers)   110              880                      N/A                              880
Transceiver channels      2 (SDI or DVI)   24 (16 input, 8 output)  4 DisplayPort plus 8 PCIe        36
                                                                    (2 Gen2x4, Gen2x8)

Table 5 shows the target 40-nm and 28-nm FPGAs that are the best fit for the 16-input, 8-channel A/V switcher design, as well as their relevant device resource counts. As described, only symmetric interfaces are used to determine the maximum memory bandwidth, and both options easily meet the memory bandwidth requirement of 105.76 Gbps.

Table 5. FPGA Device and Total Power Consumption for 16-Input, 8-Channel A/V Switcher

FPGA Resource                                     Stratix IV GX (40nm)            Arria V (28nm)
Target device                                     EP4SGX530                       5AGXB7
Logic elements (LEs)                              531.2K                          500K
Total memory (Mbits)                              27.3                            23.7
Max 18x18 multipliers                             1040                            2278
Max transceiver channels                          48                              36
Max memory bandwidth with symmetric interfaces    136.4 Gbps (soft controller)    136.4 Gbps (hard controller)
PCIe hard IP support                              Up to Gen2x8                    Up to Gen2x4
A/V switcher total power consumption              22.4 watts                      15 watts

In addition to implementing this complex design in a single chip, the FPGA options deliver the lowest total power of any FPGA implementation at their respective process node, thus providing the most attractive solution at every product generation. Designers also benefit from an easy migration path to next-generation FPGAs, since the underlying technology of the UDX design and associated memory controller architecture is consistent across FPGA generations.

28-nm FPGA Optimizations for Broadcast Applications

In addition to providing consistency at the algorithm and implementation level, Altera also made specific architectural enhancements in its 28-nm FPGAs to better meet the needs of broadcast applications.

Optimized Video Embedded Memory Blocks

Altera configured its embedded memory blocks to efficiently and precisely accommodate 10-bit video data. Accordingly, Altera offers embedded memory blocks in its 28-nm devices that can be configured with widths in increments of 10 (that is, x10, x20, and x40) without wasting bits. Altera's broadcast-focused optimization contrasts with older FPGA architectures in which the embedded memory blocks are arranged in 18- and 36-bit widths, which results in inefficiencies, wasted memory, and the use of larger devices to obtain the required memory resources.
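The effect of block width on 10-bit video storage can be illustrated with a small Python sketch. It models only the widths named in the text (x18 and x36 for older architectures; x10, x20, and x40 for the 28-nm devices) and assumes each sample is stored in the narrowest configuration that holds it. Real devices may offer other width modes, so this is a simplified illustration and not a description of any specific memory block.

```python
def wasted_bits(data_bits, supported_widths):
    """Unused bits per stored word when data is placed in the narrowest fitting width.

    Hypothetical helper: assumes one data item per memory word and no
    packing of several items into a wider word.
    """
    fitting = [w for w in sorted(supported_widths) if w >= data_bits]
    if not fitting:
        raise ValueError(f"no supported width holds {data_bits} bits")
    return fitting[0] - data_bits


OLDER_WIDTHS = (18, 36)        # widths named in the text for older architectures
VIDEO_WIDTHS = (10, 20, 40)    # widths named in the text for the 28-nm devices

for bits in (10, 20):          # one 10-bit sample, or one 20-bit 4:2:2 pixel
    old = wasted_bits(bits, OLDER_WIDTHS)
    new = wasted_bits(bits, VIDEO_WIDTHS)
    print(f"{bits}-bit data: {old} bits unused in x18/x36, {new} bits unused in x10/x20/x40")
```

Under these assumptions, a 10-bit sample leaves 8 of 18 bits unused and a 20-bit pixel leaves 16 of 36 bits unused in the older widths, while the x10 and x20 configurations leave none.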

Variable-Precision DSP Blocks

Another broadcast-focused optimization is the introduction of variable-precision DSP blocks. These blocks can implement multipliers of various precisions, including 9x9, 18x18, and 27x27. In addition, designers can cascade the variable-precision DSP blocks to efficiently implement higher precision multipliers. For example, the UDX design requires multiplications of up to 10x16 (10-bit data by coefficients of up to 16 bits). Each variable-precision DSP block can implement two multipliers of 18x18 precision, which covers the 10x16 maximum precision required by the UDX design. In older FPGA architectures, a 10x16 multiplication may require a full DSP block, and older DSP blocks cannot be decomposed into lower precisions, which results in an inefficient implementation that uses more FPGA resources than necessary.

Lowest Power Transceivers

Another important optimization is the reduction of transceiver power. Many broadcast applications require an increasing number of video channels, and therefore more transceiver channels. The benefits of higher integration are severely diminished if the resulting design consumes so much power that it incurs additional cooling costs or produces a less competitive product. Altera is continuing its trend of transceiver power reduction by reducing the power-per-channel of its transceivers at the 28-nm node. This reduction allows designers to integrate more transceiver channels into a single device, while maintaining or reducing their thermal budget.

Figure 4 shows the historical trend of power-per-transceiver across three generations of FPGAs, and demonstrates Altera's commitment and ability to reduce transceiver power. This commitment reflects a decade of internal transceiver expertise that is unmatched in the industry. The significant reduction in transceiver power contributes to Altera's ability to provide the lowest total power FPGAs.

Figure 4. Historical Trend of Transceiver Power-Per-Channel in FPGAs. [Bar chart; panel titles: Competitive FPGAs, Altera FPGAs; horizontal axis labels: 65nm, 40nm, 28nm and Stratix II GX, Stratix IV GX, Stratix V / Arria V; vertical axis: Transceiver Power Per Channel (Total PMA in mW), 0 to 300; series: 3 Gbps, 6 Gbps.]

Conclusion

The bandwidth and power challenges faced by broadcast-equipment developers can be met with today's FPGAs. Equipment developers leveraging FPGAs can benefit from highly integrated, hardware-accelerated video processing and vendor-provided IP frameworks. These frameworks provide common video building blocks while enabling designers to focus on proprietary functions. The most comprehensive FPGA offerings combine low-power approaches and proven video processing techniques to minimize risk, while providing a clear roadmap to even more advanced FPGAs with broadcast-specific architecture enhancements and optimizations for even lower power.

Further Information

Meeting the Low Power Imperative at 28nm: http://www.altera.com/literature/wp/wp-01158-low-power-28nm.pdf
Reducing Power Consumption and Increasing Bandwidth on 28-nm FPGAs: http://www.altera.com/literature/wp/wp-01148-stxv-power-consumption.pdf

Acknowledgements

Girish Malipeddi, Senior Technical Marketing Manager, Altera Corporation.
Martin S. Won, Senior Member of Technical Staff, Altera Corporation.
