USING FUSION SYSTEM ARCHITECTURE FOR BROADCAST VIDEO
Edward Callway, AMD

USING PC COMPONENTS FOR BROADCAST VIDEO
Video processing from pure analog to digital compute
PC Design for video
Parallel GPU computing (Graphics Processing Unit)
PC GPU/CPU/Memory architecture
Operating systems and parallel processing software
Roll your own or off the shelf?
Other design considerations
Broadcast applications
Using Fusion System Architecture for Broadcast Video June 2011

VIDEO PROCESSING SPEEDS & FEEDS

1940 ANALOG RASTER PROCESSING
[Diagram: Signal, not Pixels; Raster Scan Input; Clamp; Gain & Offset; Peak High Frequency; Raster Scan Output]
Video was pushed through the pipe by the raster scan
Processing was analog, in-line, real-time, controlled by knobs
Delay was <1 us, no audio sync issues

1970 DIGITAL RASTER PROCESSING
[Diagram: Pixels; Video HW Control; Raster Scan Input; Detect; Low Pass Filter; Non Linear Function; Modify Pixels; Raster Scan Output; Pixel or Line Delay]
Idea of analyze then modify introduced
Still raster push, delay up to a few lines
Gradual transition from analog to digital
All fixed function, not programmable

2000 DIGITAL FRAME BUFFER PROCESSING
[Diagram: Pixels; CPU Control; Video HW Control; Input Images; Image & Control Frame Buffer; Output Images; Fast Pixel Hardware; Analyze; Compute Controls; Modify Pixels; Slow VSYNC Register Access; CPU Decisions; CPU Memory]
Fixed function pixel HW has high bandwidth
CPU doesn't touch pixels
Frames, not raster
Adds many frame delays!

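The Analyze / Compute Controls / Modify Pixels split on this slide can be illustrated with a toy example. The sketch below is not taken from the presentation: the auto-gain task, the function names, and the 128 target level are all hypothetical, chosen only to show how a small set of control values sits between a full-frame analysis pass and a full-frame modify pass.

```python
# Toy three-stage frame pipeline: analyze -> compute controls -> modify pixels.
# The auto-gain task and all names here are illustrative assumptions.

def analyze(frame):
    """Analyze: reduce the whole frame to one statistic (mean pixel level)."""
    total = sum(sum(row) for row in frame)
    count = sum(len(row) for row in frame)
    return total / count

def compute_controls(mean_level, target_level=128.0):
    """Compute controls: turn the statistic into a single gain value.
    This is the small, once-per-frame decision step."""
    if mean_level == 0:
        return 1.0
    return target_level / mean_level

def modify_pixels(frame, gain):
    """Modify pixels: apply the control value to every pixel, clipped to 8 bits."""
    return [[min(255, round(p * gain)) for p in row] for row in frame]

frame = [[32, 64], [96, 64]]            # tiny 2x2 "image", mean level 64
gain = compute_controls(analyze(frame)) # 128 / 64 = 2.0
out = modify_pixels(frame, gain)
print(gain, out)
```

Only `compute_controls` handles a handful of values per frame; the other two stages touch every pixel, which is why the slide assigns them to high-bandwidth pixel hardware.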
VIDEO COMPUTE LOAD
[Diagram: Input Images; Image & Control Frame Buffer; Output Images; Analyze; Compute Controls; Modify Pixels]

Images R or W   Video Processing Stage                     Pixel Operations
1               Write new source image as input            0
5               Read multiple source images and analyze    100
1               Read previous motion map                   0
1               Calculate and write new motion map         100
1               Calculate and write processed image        100
1               Read processed image to output             0
10              Total images accessed per output image

Memory bandwidth: 4,977 Mbytes/s
Operations per pixel: 300
Moperations per output image: 622
Moperations/s: 37,325

Example Conditions: 1920x1080, 60 fps, 32 bpp, 5 frame history, motion map

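The totals on this slide follow directly from the example conditions. The short calculation below reproduces them; it assumes that every image access, including the motion map, moves a full 1920x1080 image at 32 bpp, and that "Mbytes" means 10^6 bytes. Those assumptions are not stated on the slide, but they match its figures.

```python
# Reproduce the slide's load estimate from its stated example conditions.
WIDTH, HEIGHT = 1920, 1080
FPS = 60
BYTES_PER_PIXEL = 4  # 32 bpp

# (images read or written, operations per pixel) for each stage in the table
STAGES = [
    (1, 0),    # write new source image as input
    (5, 100),  # read multiple source images and analyze
    (1, 0),    # read previous motion map
    (1, 100),  # calculate and write new motion map
    (1, 100),  # calculate and write processed image
    (1, 0),    # read processed image to output
]

pixels_per_image = WIDTH * HEIGHT
images_per_output = sum(images for images, _ in STAGES)
ops_per_pixel = sum(ops for _, ops in STAGES)

# Assumption: every access moves one full-size image; 1 Mbyte = 1e6 bytes.
bandwidth_mbytes_s = (
    pixels_per_image * BYTES_PER_PIXEL * images_per_output * FPS / 1e6
)
mops_per_image = pixels_per_image * ops_per_pixel / 1e6
mops_per_second = mops_per_image * FPS

print(f"images per output image: {images_per_output}")
print(f"memory bandwidth:        {bandwidth_mbytes_s:,.0f} Mbytes/s")
print(f"operations per pixel:    {ops_per_pixel}")
print(f"Moperations per image:   {mops_per_image:,.0f}")
print(f"Moperations per second:  {mops_per_second:,.0f}")
```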
MAP COMPUTE TO COMPONENTS
[Diagram: Input Images; Image & Control Frame Buffer; Output Images; Analyze; Compute Controls; Modify Pixels]

                 Required     Memory          CPU         GPU                FPGA
                 Video Load   128 bit DDR3    ~$200 CPU   ~$200 High speed   FPGA
                 Example      @ 1333 MHz                  graphics card
Bandwidth GB/s   5            21.3            21.3        128                User designed
Operations G/s   37           -               100         2,000              1,000

Typical midrange memory and compute components
All are suitable for video tasks
GPU handles ~25 video tasks, CPU ~3 for similar pricing
FPGAs very capable, but require severe roll your own work

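One plausible reading of the "~25" and "~3" figures is that each component can host as many copies of the example video load as its scarcer resource allows. The sketch below checks that reading against the table values. The `video_tasks` and `ddr3_bandwidth_gb_s` helpers are hypothetical names, and treating "1333 MHz" as 1333 million transfers per second is an assumption.

```python
# Headroom estimate: how many copies of the example video load fit on a part?
# Numbers come from the slide's table; the min() rule is an assumed reading.

VIDEO_LOAD = {"bandwidth_gb_s": 5.0, "operations_g_s": 37.0}

COMPONENTS = {
    "CPU": {"bandwidth_gb_s": 21.3, "operations_g_s": 100.0},
    "GPU": {"bandwidth_gb_s": 128.0, "operations_g_s": 2000.0},
}

def ddr3_bandwidth_gb_s(bus_bits, mega_transfers_per_s):
    """Peak bandwidth of a memory bus: bytes per transfer x transfers per second."""
    return (bus_bits / 8) * mega_transfers_per_s / 1000

def video_tasks(component):
    """Copies of the video load that fit, limited by the scarcer resource."""
    return min(component[key] / VIDEO_LOAD[key] for key in VIDEO_LOAD)

print(f"128 bit DDR3-1333: {ddr3_bandwidth_gb_s(128, 1333):.1f} GB/s")
for name, part in COMPONENTS.items():
    print(f"{name}: about {video_tasks(part):.1f} video tasks")
```

Under this reading the GPU is limited by memory bandwidth (128 / 5) and the CPU by operations (100 / 37), which lands close to the slide's rounded figures.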
GPU DESIGN 101
[Diagram: Input Images; Image & Control Frame Buffer; Output Images; Analyze; Compute Controls; Modify Pixels]
GPUs grew powerful 3D from PC gaming and CAD
High frame rates & resolution, anti-aliasing, lots of polygons, tessellation
Modern OS designs use 3D acceleration for enhanced UI experience
Originally register-based, hard-wired 3D graphics engines
Shaders added hundreds to thousands of programmable compute units
Gradual shift from pure SIMD to flexible parallel compute
Stream Computing is using the GPU for general purpose compute tasks
Special memory and wide buses provide high bandwidth

UPDATING THE PC FOR BETTER VIDEO COMPUTE

THE PC AND VIDEO = ONLY BIG BOXES?
Offline render farms well known for professional work
Next step is reducing the power and size envelope of a PC
Real time video processing happens today in consumer PCs
3D Stereo HD video decode acceleration in UVD block
Cadence detect, de-interlacing, dynamic gamma, scaling, color space conversion, color and detail enhancements, random/mosquito/block noise reduction algorithms run on laptop GPUs

INSIDE A CLASSIC PC
Multicore CPU and memory controller
Northbridge containing a GPU + display
Southbridge for more I/O
Optional external GPU
High speed buses eat power, excrete EMI
Big, warm

THE AMD FUSION APU
Combines CPU and GPU into one APU (Accelerated Processing Unit)
Saves power and board space, increases memory efficiency
Shrinks core PC components to a single chip below 10 watts
Enhanced high performance GPU and multicore CPU

2011 OPENCL + FUSION DISTRIBUTED PROCESSING
[Diagram: Pixels; CPU Control; GPU Control; Multicore CPU Engines (Analyze, Compute Controls, Modify Pixels, Other Tasks); Input Images; Combined Memory; Output Images; Multicore GPU Engines (Analyze, Compute Controls, Modify Pixels)]
Fusion shares the memory
OpenCL shares the tasks

PARALLEL PROCESSING FOR EVERYONE
[Diagram: Analyze; Compute Controls; Modify Pixels; Other Tasks; Combined Memory]
Multicore CPUs and GPUs need easy parallel programming
Microsoft has released DX11 DirectCompute, enabling general purpose parallel programming on a GPU, or GPGPU
The Khronos Group has released OpenCL, which spreads a parallel compute task across CPU and GPU cores
OpenCL can utilize all the processing power in a platform
One slide can't cover two languages, dig in and try them!

OPERATING SYSTEM CHOICES
Older equipment had no OS, just interrupt handlers
Today's connected devices require a standard OS
Do you want to write a USB or network stack from scratch?
Windows, Mac OS, Linux suitable for Fully Open devices
Walled Garden devices like TVs and STBs need a full feature OS with restricted access
Fully Closed devices like cameras need complete configurability to add and remove modules

APPLICATIONS

DIGITAL BROADCAST
Better Compression is More Money!
Bandwidth is never free: air, satellite, net
The more channels you cram in a pipe, the more money you make
But if quality drops too low, customers complain or walk
MPEG-2 still rolling out, even H.264 needs help
Many rack mount compression solutions
  Video preprocessors to remove noise
  Better compressors with more delay
  Compress 32 different ways, dynamically pick the best
  Statistical multiplexing to spread the load across channels
Individual processors inflexible, don't like to talk to each other

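As a rough illustration of statistical multiplexing, the sketch below divides a fixed pipe among channels in proportion to how hard each one currently is to compress. It is a deliberately simplified model, not a description of any product on the slide: the `allocate_bitrate` function, the complexity scores, and the per-channel floor are all hypothetical.

```python
# Simplified statistical multiplexing: share a fixed pipe among channels in
# proportion to their current complexity, with a guaranteed minimum each.
# All names and numbers here are illustrative assumptions.

def allocate_bitrate(complexities, total_kbps, floor_kbps):
    """Return one bitrate per channel; the rates sum to total_kbps."""
    channel_count = len(complexities)
    spare = total_kbps - floor_kbps * channel_count
    if spare < 0:
        raise ValueError("pipe is too small for the per-channel floor")
    total_complexity = sum(complexities)
    if total_complexity == 0:
        # Nothing to distinguish the channels, so split the pipe evenly.
        return [total_kbps / channel_count] * channel_count
    return [floor_kbps + spare * c / total_complexity for c in complexities]

# Three channels: a talking head, a drama, and a sports feed (made-up scores).
rates = allocate_bitrate([1.0, 3.0, 4.0], total_kbps=19000, floor_kbps=1000)
print(rates)
```

Re-running the allocation every few frames lets a momentarily easy channel lend bits to a hard one, which is the sense in which the load is spread across channels.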
MID SIZE
Flexible Compression is More Money!
[Diagram: Raw Videos; Combined Memory; Output Streams; Multicore CPU Engines (Compress Vectors x4, Prepare Bitstreams x4, Pick Best, Other Tasks); Multicore GPU Engines (Clean Video x4, Find Motion x4, Compute Vectors x4)]
Broadcast always changing
New CODECs, STBs, PMPs, phones
New business models
Don't want to rebuild the headend
APUs and FPGAs both flexible, reconfigurable
Downloading new configuration is sweet!
But look around the room...
How many can program in C?
How many can program FPGAs?
APUs provide complete flexibility

3D CAMERA CORRECTION
Most digital 3D is shot with some combination of 2 lenses and 2 sensors
Studio lenses and sensors are very good, but they rarely match
Lenses differ in focus, zoom strength + centering, aberrations, dust specks
Sensors have differing overall gain and weak pixels
Cameras recalibrated for each day's shoot, parameters saved
3D movies are not real time, left and right views matched in post production

SMALL SIZE
Real Time 3D Camera Correction
[Diagram: Analyze; Compute Controls; Modify Pixels; Other Tasks; Combined Memory]
Live news is...live! No time for post production.
Even harder to make ENG stereo images match
Lighter chassis + rough handling = more L/R differences
One camera person, not a whole crew for support
Cameras must continuously auto-calibrate and correct
Shake & defect removal, geometric & lighting differences are ideal tasks for on-board Fusion processor

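To make the "lighting differences" case concrete, the sketch below matches one view to the other with a single gain and offset derived from each view's mean and spread. This is only an assumed, minimal form of correction and is not the method used in any camera discussed here; `match_controls` and `correct` are hypothetical names, and real correction would also have to handle geometry and per-pixel defects.

```python
# Minimal left/right level matching: find the gain and offset that give the
# target view the same mean and spread as the reference view.
# Function names and sample values are illustrative assumptions.
from statistics import mean, pstdev

def match_controls(reference, target):
    """Analyze both views and compute one (gain, offset) control pair."""
    target_spread = pstdev(target)
    gain = pstdev(reference) / target_spread if target_spread else 1.0
    offset = mean(reference) - gain * mean(target)
    return gain, offset

def correct(target, gain, offset):
    """Apply the control pair to every pixel of the target view."""
    return [gain * p + offset for p in target]

left = [10.0, 20.0, 30.0]   # reference view samples
right = [6.0, 11.0, 16.0]   # same scene seen with lower gain and an offset
gain, offset = match_controls(left, right)
print(gain, offset, correct(right, gain, offset))
```

Because the controls are recomputed from the incoming frames, the same loop can run continuously, which is the auto-calibration behavior the slide asks for.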
JUMBO SIZE
Movie Render Farms
Individual CG effects on smaller clusters
Growing market for whole movie effects
  Convert 2D movies to 3D
  Upgrade movies for large format theaters
  Upgrade TV shows to Blu-ray
  Re-render entire catalog for new targets
  Restore old movies
Work starts after movie is finished, but before release
Time is money, a few days can mean millions!
Driving cloud compute farms for movies

INSIDE A MOVIE RENDER FARM
Components designed for GPU render farms
AMD FirePro V7800P graphics compute engine
  Passively cooled for server environments
Dell M610X Blade Server
  Supports multiple AMD GPUs
Microsoft RemoteFX
  Remote hosting and visualization
http://www.amd.com/us/press-releases/pages/firepro-v7800p-2011may16.aspx
[Images: AMD FirePro V7800P; Dell M610X Blade Server]

WORKING WITH PC VENDORS
Know your product lifetime requirements
One-off project? Hundreds over many years? Mission critical?
Avoid dying technology like serial ports and VGA
Do the research
Use Workstation and Embedded components and vendors
Speed, feature, price, reliability, support, 24/7 operation tradeoffs
AMD has Workstation and Embedded resources
For critical custom SW designs, set up escrow as protection

SUMMARY
PC components have the performance for live video
Fusion APU small size/low power envelope enables portable use
OpenCL & DirectCompute open up parallel programming
Packetized high speed serial buses replacing older interfaces
Use workstation components for real work
Engage the vendors about your requirements!

QUESTIONS
Disclaimer & Attribution
The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors. The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. There is no obligation to update or otherwise correct or revise this information. However, we reserve the right to revise this information and to make changes from time to time to the content hereof without obligation to notify any person of such revisions or changes.
NO REPRESENTATIONS OR WARRANTIES ARE MADE WITH RESPECT TO THE CONTENTS HEREOF AND NO RESPONSIBILITY IS ASSUMED FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION. ALL IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE ARE EXPRESSLY DISCLAIMED. IN NO EVENT WILL ANY LIABILITY TO ANY PERSON BE INCURRED FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
AMD, the AMD arrow logo, and combinations thereof are trademarks of Advanced Micro Devices, Inc. All other names used in this presentation are for informational purposes only and may be trademarks of their respective owners. OpenCL is a trademark of Apple Inc. used with permission by Khronos.
© 2011 Advanced Micro Devices, Inc. All rights reserved.