GPUs for High Performance Signal Processing in Infrared Camera System
Stefan Olsson, PhD, Senior Company Specialist-Video Processing, Project Manager at FLIR
2015-05-28
- Instruments: Automation/Process Monitoring, Building Diagnostics, Electrical/Mechanical Inspection, Optical Gas Imaging, Range/Science/R&D, Extech brand
- OEM & Emerging: Automotive Night Vision, Personal Vision Systems, Intelligent Traffic Systems, Mobile Accessories, Camera Cores and Detectors
- Surveillance: Border/Force Protection Solutions, Airborne, Maritime, Land Surveillance, Vehicle Systems, Man Portable
- Maritime: Commercial Maritime, Navigation/Night Vision, Raymarine brand
- Detection: Chemical, Biological, Radiological, Nuclear, Explosives
- Security: Facility Security, Lorex brand
- FLIR Systems, Inc, Portland, OR, USA: Worldwide HQ, Airborne
- FLIR Systems, Boston, Boston, MA, USA: Airborne, Maritime, Handheld
- Pittsburgh, PA, USA: Tactical Systems, Optics
- FLIR Systems AB, Stockholm, Sweden: Thermography, Land and Maritime Systems, Polytech Airborne Systems
- Canada Operations, Montreal, Canada: Radars
- FLIR Systems, Indigo, Santa Barbara, CA, USA: Commercial Imaging, Detectors
Key figures 2013
- Revenue: 1 496 MUSD
- Employees: 2 962
- R&D: 148 MUSD (9.9% of revenue)
- Net earnings: 177 MUSD
Wide product range: FLIR ONE, Ranger HDC
Tough environments
Image processing and infrared systems
Infrared radiation: 16000 gray levels -> 256 levels -> 100 levels
Using the simplest algorithm, the operator may miss 99% of the available information!
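The numbers on this slide can be read as a dynamic-range problem: about 16000 gray levels have to fit into 256. The sketch below is only an illustration under stated assumptions. It assumes the "simplest algorithm" is a linear rescale of the full range (the slide does not say) and compares it with plain histogram equalization as one common alternative. The synthetic scene and the function names are hypothetical.

```python
from collections import Counter

def linear_map(pixels, in_max=16000, out_levels=256):
    """Scale the full input range linearly onto the output range."""
    return [p * (out_levels - 1) // in_max for p in pixels]

def equalize(pixels, out_levels=256):
    """Histogram equalization: assign output levels by cumulative pixel count."""
    hist = Counter(pixels)
    total = len(pixels)
    mapping, seen = {}, 0
    for value in sorted(hist):
        seen += hist[value]
        mapping[value] = seen * (out_levels - 1) // total
    return [mapping[p] for p in pixels]

# Hypothetical scene: a cool background and a small warm object,
# each with fine detail spread over only 40 input levels.
scene = [2000 + (i % 40) for i in range(4000)] + [9000 + (i % 40) for i in range(400)]

linear_levels = len(set(linear_map(scene)))
equalized_levels = len(set(equalize(scene)))
print("distinct output levels, linear:", linear_levels)
print("distinct output levels, equalized:", equalized_levels)
```

In this toy scene the linear rescale spends almost all output levels on input values that never occur, so most of the fine detail collapses into a handful of gray levels, while the equalized mapping keeps far more of it distinguishable.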
New product requirements
- Moving advanced IP and video analytics inside the sensor
- Simpler system solution
- Less demand for high-bandwidth video
- Processing data close to the sensor will improve quality
We need SWaP-optimized (size, weight and power) solutions that can handle these requirements.
[Figure label: Signal processing box]
The turbulence problem
[Figure panels: No turbulence; Turbulence; Only warping; Only blur]
Merlin-ASX anti-turbulence filter
[Figure panels: Noisy & turbulent; De-noised & stabilized]
Algorithm requirements
[Chart: GFLOPS @ 30 Hz, axis 0 to 300 GFLOPS]
~10000 floating point operations per pixel. That is roughly 100x more calculations than we have ever done in the FPGA video chain.
Technological trends
- Information and communication technologies now consume ~10% of total electricity
- All of these devices contain more and more GPUs
- Take advantage of this trend
Choosing the right technology

              FPGA       GPU        Multicore CPU
Peak GFLOPS   Excellent  Excellent  Poor
GFLOPS/W      Excellent  Good       Poor
Maturity      Excellent  Good       Excellent
Productivity  Poor       Good       Excellent
Algorithm requirements
Requirements with ultra high settings @ 30 Hz

Rows  Cols  param1  param2  param3  FPS [Hz]
1280  720   8       4       13      30

[Chart: GFLOPS @ 30 Hz, axis 0 to 300 GFLOPS]
Possible chip: peak bandwidth 14.7 GB/s, peak performance 325 GFLOPS
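As a rough cross-check of the compute requirement, the frame size and rate on this slide can be combined with the ~10000 floating point operations per pixel quoted on the earlier requirements slide. This is a back-of-the-envelope sketch: it assumes the per-pixel figure applies to the ultra high settings and treats "~10000" as exact, neither of which the slides confirm.

```python
ROWS, COLS, FPS = 1280, 720, 30   # frame geometry and rate from the slide
FLOP_PER_PIXEL = 10_000           # "~10000" from the earlier slide, taken as exact (assumption)
CHIP_PEAK_GFLOPS = 325.0          # peak performance of the candidate chip

def required_gflops(rows, cols, fps, flop_per_pixel):
    """Sustained GFLOPS needed to process every pixel of every frame."""
    return rows * cols * fps * flop_per_pixel / 1e9

need = required_gflops(ROWS, COLS, FPS, FLOP_PER_PIXEL)
utilization = need / CHIP_PEAK_GFLOPS
print(f"required: {need:.2f} GFLOPS ({utilization:.0%} of peak)")
# prints "required: 276.48 GFLOPS (85% of peak)"
```

Under these assumptions the requirement stays below the chip's peak compute figure, which agrees with the chart axis topping out at 300 GFLOPS.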
Algorithm requirements
Requirements with ultra high settings @ 30 Hz

Rows  Cols  param1  param2  param3  FPS [Hz]
1280  720   8       4       13      30

[Chart: GB/s utilization, axis 0% to 400%]
Possible chip: peak bandwidth 14.7 GB/s, peak performance 325 GFLOPS
Our application is more bandwidth bound than FLOPS bound!
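One way to reason about "bandwidth bound versus FLOPS bound" is a roofline-style estimate. The slides do not mention this model; it is swapped in here as an illustration. It uses only the two peak figures from the slide, and the arithmetic intensities in the loop are hypothetical values, not measurements of the anti-turbulence filter.

```python
PEAK_GFLOPS = 325.0   # peak performance from the slide
PEAK_GBPS = 14.7      # peak memory bandwidth from the slide

def ridge_point():
    """Arithmetic intensity (FLOP per byte) above which compute is the limit."""
    return PEAK_GFLOPS / PEAK_GBPS

def attainable_gflops(flop_per_byte):
    """Upper bound on throughput for a kernel with the given intensity."""
    return min(PEAK_GFLOPS, PEAK_GBPS * flop_per_byte)

def bound(flop_per_byte):
    """Name the limiting resource for the given intensity."""
    return "bandwidth" if flop_per_byte < ridge_point() else "compute"

print(f"ridge point: {ridge_point():.1f} FLOP/byte")
for intensity in (5.0, 20.0, 40.0):   # hypothetical intensities
    print(intensity, bound(intensity), round(attainable_gflops(intensity), 1))
```

With these peak figures the ridge point is about 22 FLOP per byte, so a kernel that moves more than one byte per 22 floating point operations is limited by memory bandwidth before it can reach 325 GFLOPS.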
Algorithm requirements
Requirements with medium settings @ 30 Hz

Rows  Cols  param1  param2  param3  FPS [Hz]
1280  720   8       7       7       30

[Chart: GB/s @ 30 Hz, axis 0 to 16 GB/s]
Possible chip: peak bandwidth 14.7 GB/s, peak performance 325 GFLOPS
With medium settings: 100.5% utilization -> almost theoretically feasible
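The 100.5% figure can be turned back into an absolute bandwidth and a per-pixel memory traffic estimate. This sketch assumes the utilization is measured against the 14.7 GB/s peak and that 1 GB means 1e9 bytes. The slides state neither explicitly, so the derived numbers are indicative only.

```python
ROWS, COLS, FPS = 1280, 720, 30   # frame geometry and rate from the slide
PEAK_GBPS = 14.7                  # peak memory bandwidth from the slide
UTILIZATION = 1.005               # 100.5 % from the slide

required_gbps = PEAK_GBPS * UTILIZATION          # bandwidth the algorithm needs
pixels_per_second = ROWS * COLS * FPS
bytes_per_pixel = required_gbps * 1e9 / pixels_per_second
headroom_gbps = PEAK_GBPS - required_gbps        # negative means over budget

print(f"required bandwidth: {required_gbps:.2f} GB/s")
print(f"memory traffic per pixel: {bytes_per_pixel:.0f} bytes")
print(f"headroom: {headroom_gbps:.3f} GB/s")
```

Under these assumptions each output pixel costs a little over 500 bytes of memory traffic per frame, and the requirement slightly exceeds the peak. Since sustained bandwidth is normally below the peak figure, "almost theoretically feasible" still implies that memory traffic would have to be reduced in practice.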
Algorithm implementation workflow: FPGA
[Diagram labels: Invent algorithm; Code VHDL algorithm and testbench; Write algorithm description; Simulate; Deploy]
Algorithm implementation workflow: GPU
[Diagram labels: Fast prototype; Embedded target; Cross compile]
Summary
Pros
- For this project we probably saw a 5-10x acceleration in developing the initial, roughly optimized CUDA code compared with hand-coded VHDL for FPGA.
- Creating an algorithm testbench with MATLAB and common PC tools is an advantage compared with the usual VHDL development environment. This is especially true for complex algorithms and video analytics.
Challenges
- Special competence (an experienced GPGPU programmer) is needed to write optimized code for demanding and complex projects, such as the embedded anti-turbulence filter (maybe better tools can improve this).
- There is often a need for special adaptations and customizations; we cannot buy off-the-shelf solutions.
- Military and government programs may span 10 years or more (sourcing).
- In most of our products we require rugged, military-specified components. At the same time we require modular (COTS) solutions for the other products.
- The size, power and cost of GPGPU technology are still a big issue.
Thank you!