An Algorithm to Silicon ESL Design Methodology
Nitin Chawla, Harvinder Singh & Pascal Urard
STMicroelectronics
SOC Design Challenges: Increased Complexity
[Chart: #Gates / Die (conservative numbers), #Gates per Designer per year, and Men / Years per Die, by year and technology node]
Need to improve design productivity.
DAC 29: User Track Nitin Chawla STMicroelectronics - 29
SOC Design Challenges: System to Implementation Barrier
5. Understand specification
J. Gerlach
Need an ESL Vision: Extending High Level Synthesis
Single digital models for algorithmic and architecture exploration
Executable specs
RF / analog coarse model
Chain analysis
Encapsulate C/C++ IPs
Refine spec
C chain: Stream -> block1 -> block2 -> block3 -> block4 -> block5 -> Output
HLS, formal proof
RTL chain: Stream -> block1 -> block2 -> block3 -> block4 -> block5 -> Output
Design Space Exploration: Beyond Standalone HLS
Raising the level of productivity: algorithmic + architectural design space exploration.
[Figure: design space over area / power / parameters, with the best manual solution marked]
Current HLS allows better exploration but is limited to local minima.
ESL Design Flow: Step 1
[Figure: the ESL flow diagram from the "Need an ESL Vision" slide, repeated for this step]
Template class for N stage DIF Streaming FFT
[Figure: serial input -> N-th stage (buffer depth v = 2^(N-1)) -> (N-1)-th stage (v = 2^(N-2)) -> ... -> n-th stage (v = 2^(n-1)) -> ... -> 2nd stage -> 1st stage -> serial output. Each stage has a feedback buffer and a butterfly with twiddle factor W.]
N objects of the stage class.
Stage class parameters: <stage number N>, <stage input precision>, <stage output precision>, <stage computation precision>
Stage class ports: serial input, serial output
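The stage template described on this slide can be sketched in plain C++ as follows. This is a minimal sketch under stated assumptions, not the authors' code: the names Stage, StreamingFFT and fft_frame are hypothetical, std::complex<double> stands in for the fixed-point precision parameters listed on the slide, and the stage structure is the textbook radix-2 single-path delay feedback arrangement with a buffer of depth v = 2^(n-1).

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<double>;  // assumption: double replaces fixed-point types

// One radix-2 DIF stage with a feedback buffer of depth v = 2^(n-1).
template <int n>
class Stage {
public:
    static constexpr std::size_t v = std::size_t(1) << (n - 1);

    // Consume one serial input sample and produce one serial output sample.
    cplx step(cplx in) {
        cplx out;
        if (count_ < v) {
            // First half of a 2v block: store the input and drain the
            // difference terms left in the buffer by the previous block.
            out = buf_[count_];
            buf_[count_] = in;
        } else {
            // Second half: butterfly against the sample stored v steps ago.
            const std::size_t j = count_ - v;
            const cplx a = buf_[j];
            const double pi = std::acos(-1.0);
            const cplx w = std::polar(1.0, -pi * double(j) / double(v));
            out = a + in;            // sum term goes out immediately
            buf_[j] = (a - in) * w;  // difference term waits in the buffer
        }
        count_ = (count_ + 1) % (2 * v);
        return out;
    }

private:
    std::array<cplx, v> buf_{};
    std::size_t count_ = 0;
};

// N stage objects chained from the N-th stage down to the 1st stage.
template <int N>
class StreamingFFT {
public:
    cplx step(cplx in) { return rest_.step(stage_.step(in)); }

private:
    Stage<N> stage_;
    StreamingFFT<N - 1> rest_;
};

template <>
class StreamingFFT<0> {
public:
    cplx step(cplx in) { return in; }
};

// Hypothetical helper: push one frame of 2^N samples through the pipeline,
// flush it with zeros, and undo the bit-reversed output order.
template <int N>
std::vector<cplx> fft_frame(const std::vector<cplx>& x) {
    const std::size_t P = std::size_t(1) << N;
    assert(x.size() == P);
    StreamingFFT<N> fft;
    std::vector<cplx> X(P);
    for (std::size_t t = 0; t < 2 * P - 1; ++t) {
        const cplx y = fft.step(t < P ? x[t] : cplx(0.0, 0.0));
        if (t >= P - 1) {  // pipeline latency is P - 1 samples
            const std::size_t i = t - (P - 1);
            std::size_t r = 0;
            for (int b = 0; b < N; ++b)
                if (i & (std::size_t(1) << b)) r |= std::size_t(1) << (N - 1 - b);
            X[r] = y;
        }
    }
    return X;
}
```

Calling fft_frame<3>(x) on an 8-sample frame returns the spectrum in natural order. In an HLS setting the twiddle factors would normally come from a table and the sample type from a fixed-point template parameter; both are left out here to keep the sketch short.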
ESL Design Flow: Step 2
[Figure: the ESL flow diagram from the "Need an ESL Vision" slide, repeated for this step]
Model Based Design
The system model creates the executable specification.
The system model is the center of the development process and enables specification capture, block level partitioning and assembly, continuous test and verification, block level reuse, and a common design environment.
Examples: Simulink (MathWorks), ADS/SystemVue (Agilent)
Single Source Model Based Design: Simulink S-function Encapsulation
HLS C++: void block(ac_channel<type_a> &input, ac_channel<type_b> &output)
Interface wrapper with MATLAB-supported native datatypes: void block_wrapper(double input[n], double output[n])
MATLAB environment: S-function structure definition (def), then legacy_code('sfcn_cmex_generate', def); legacy_code('compile', def);
Simulink environment: Simulink source -> sfn_block (S-function, Simulink block) -> Simulink sink
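The wrapper idea on this slide can be sketched as below. The two signatures follow the slide; everything else is an assumption made for illustration: Channel<T> is a small stand-in for ac_channel<T> so that the sketch compiles without the HLS headers, type_a and type_b are arbitrary integer types, the frame length n is fixed at 4, and the block body (squaring each sample) is hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>

// Stand-in for ac_channel<T> (assumption): a simple FIFO with the
// write / read / available calls the wrapper needs.
template <typename T>
class Channel {
public:
    void write(const T& value) { q_.push_back(value); }
    T read() {
        T value = q_.front();
        q_.pop_front();
        return value;
    }
    bool available(std::size_t count) const { return q_.size() >= count; }

private:
    std::deque<T> q_;
};

using type_a = std::int16_t;  // hypothetical input type
using type_b = std::int32_t;  // hypothetical output type
constexpr std::size_t n = 4;  // hypothetical frame length

// Hypothetical HLS block: squares every sample it finds on its input.
void block(Channel<type_a>& input, Channel<type_b>& output) {
    while (input.available(1)) {
        const type_b sample = input.read();
        output.write(sample * sample);
    }
}

// Wrapper exposing only native double arrays, as on the slide:
// convert doubles to the block's input type, run the block, convert back.
void block_wrapper(double input[n], double output[n]) {
    Channel<type_a> in;
    Channel<type_b> out;
    for (std::size_t i = 0; i < n; ++i) in.write(static_cast<type_a>(input[i]));
    block(in, out);
    for (std::size_t i = 0; i < n; ++i) output[i] = static_cast<double>(out.read());
}
```

Because the same block function is called from the wrapper and handed to HLS, the Simulink model and the synthesized RTL share a single source. The static_cast truncates toward zero; a real wrapper would apply whatever rounding and saturation the fixed-point types define.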
Numerical Refinement for Noise Budgets: SQNR vs I/P Signal PAPR for an FFT
[Plot: SQNR against input signal PAPR]
No noise modulation based on I/P signal PAPR.
Sharp fall in SQNR due to clipping for I/P PAPR < 6 dB.
PAPR: Peak to Average Power Ratio
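The two quantities on this slide can be computed with a few lines of C++. This is a sketch using the usual textbook definitions (PAPR as peak power over mean power, SQNR as reference power over error power), not the measurement setup behind the plot. The function names and the signed fixed-point format with round-to-nearest and saturation are assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Peak-to-average power ratio of a real signal, in dB.
// The result is NaN for an all-zero signal.
double papr_db(const std::vector<double>& x) {
    assert(!x.empty());
    double peak = 0.0;
    double sum = 0.0;
    for (double s : x) {
        peak = std::max(peak, s * s);
        sum += s * s;
    }
    return 10.0 * std::log10(peak / (sum / double(x.size())));
}

// Quantize to a signed fixed-point grid with total_bits bits, frac_bits of
// them fractional: round to nearest, then saturate (this models clipping).
double quantize(double x, int total_bits, int frac_bits) {
    const double scale = std::ldexp(1.0, frac_bits);
    const double hi = std::ldexp(1.0, total_bits - 1) - 1.0;
    const double lo = -std::ldexp(1.0, total_bits - 1);
    const double q = std::min(hi, std::max(lo, std::round(x * scale)));
    return q / scale;
}

// Signal-to-quantization-noise ratio in dB between a reference signal and
// its quantized version. Returns +infinity when the two are identical.
double sqnr_db(const std::vector<double>& ref, const std::vector<double>& test) {
    assert(ref.size() == test.size());
    double signal = 0.0;
    double noise = 0.0;
    for (std::size_t i = 0; i < ref.size(); ++i) {
        const double e = ref[i] - test[i];
        signal += ref[i] * ref[i];
        noise += e * e;
    }
    return 10.0 * std::log10(signal / noise);
}
```

Sweeping the input scaling, quantizing each sample with quantize, and plotting sqnr_db against papr_db would give a curve of the kind the slide describes, with SQNR dropping once samples start to saturate.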
ESL Design Flow: Step 3
[Figure: the ESL flow diagram from the "Need an ESL Vision" slide, repeated for this step]
HLS Explorations: Area, Throughput Tradeoff
[Figure: area against throughput]
1 sample / n cycles: II = n, Area =~ a/n
1 sample / cycle: II = 1, Area = a
n samples / cycle: II = 1, unroll n (main loop)
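The design points on this slide follow a simple first-order scaling, which the sketch below writes out as a cost model. It is an illustration only: the struct and function names are hypothetical, a is the area of the fully pipelined (II = 1) datapath, and the linear area growth for the unrolled case is an assumption that the slide does not state numerically. Real HLS results also depend on memories, control logic and resource sharing.

```cpp
#include <cassert>

// One point in the area / throughput plane.
struct DesignPoint {
    double samples_per_cycle;
    double area;
};

// First-order estimate (assumption): throughput and datapath area both
// scale with unroll / II relative to the II = 1, unroll = 1 design of area a.
DesignPoint estimate(double a, int ii, int unroll) {
    assert(ii >= 1 && unroll >= 1);
    const double factor = double(unroll) / double(ii);
    return {factor, a * factor};
}
```

With a = 100, estimate(100.0, 4, 1) gives 0.25 samples per cycle at area 25, estimate(100.0, 1, 1) gives 1 sample per cycle at area 100, and estimate(100.0, 1, 4) gives 4 samples per cycle at area 400.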
Unfolded Architecture: Stage Implementation
Parallelism achieved by loop unrolling.
[Figure: M parallel inputs -> butterfly -> M parallel outputs, with a multi-banked buffer of M x (2^(n-1)/M)]
n-th stage of radix-2 SDF FFT unfolded by M
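The multi-banked buffer on this slide can be sketched as follows. The interleaved mapping (word j in bank j mod M, row j / M) is an assumption chosen because it lets M consecutive samples be accessed in the same cycle without bank conflicts; the slide does not say which mapping was used. BankedBuffer and its members are hypothetical names, and DEPTH plays the role of 2^(n-1).

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Buffer of DEPTH words split into M banks of DEPTH / M words each.
// Assumed mapping: word j lives in bank j % M at row j / M.
template <typename T, std::size_t DEPTH, std::size_t M>
class BankedBuffer {
    static_assert(M > 0 && DEPTH % M == 0, "depth must be a multiple of M");

public:
    static std::size_t bank(std::size_t j) { return j % M; }
    static std::size_t row(std::size_t j) { return j / M; }

    // Read M consecutive words starting at an M-aligned index.
    // Each word comes from a different bank, so one port per bank suffices.
    std::array<T, M> read_group(std::size_t base) const {
        assert(base % M == 0 && base + M <= DEPTH);
        std::array<T, M> out{};
        for (std::size_t k = 0; k < M; ++k)
            out[k] = banks_[bank(base + k)][row(base + k)];
        return out;
    }

    // Write M consecutive words starting at an M-aligned index.
    void write_group(std::size_t base, const std::array<T, M>& in) {
        assert(base % M == 0 && base + M <= DEPTH);
        for (std::size_t k = 0; k < M; ++k)
            banks_[bank(base + k)][row(base + k)] = in[k];
    }

private:
    std::array<std::array<T, DEPTH / M>, M> banks_{};
};
```

In an unrolled stage, the loop over k is what loop unrolling turns into M parallel memory accesses, one per bank.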
Multimillion Gate GS/s Frequency Domain Processor
[Block diagram: FFT, Filter, Channel Shifter, Shifter and IFFT blocks on N and N/2 wide paths, with an Interleaver and an Add block; inputs: frequency mask, channel shift value]
Unfolded by (4) systolic 248 pt FFT/IFFT.
Interblock FIFO communication.
Physical Prototyping at ESL Level
In ASIC technologies of 65nm and below, path delays are wire dominated.
Most signal processing applications use a lot of compiler-generated memory cuts.
Memory architecture choices at the ESL level are simply made on the basis of bandwidth and ports.
But memory cuts create routing blockages and wire detours.
In the end it is the silicon area post P&R that matters.
Physical Prototyping: Memory Architecture Exploration
Before: 4 RAMs (width 4X) with skewed aspect ratio; narrow and deep routing channels; huge routing congestion.
After: (width 4X) RAMs replaced with 4 (width X) RAMs; new routing channels are created.
Memory area increases, but core utilization improves by 3% and post P&R area improves by 2%.
Design Productivity vs Manual RTL
[Chart: design productivity compared with manual RTL]
Behavioral IP reuse further improves design productivity.
Conclusion
ESL synthesis can successfully build production-worthy, multi-million-gate complex application engines from untimed C/C++ algorithmic models.
Key benefits of ESL synthesis:
  Increased design productivity and faster time to market
  Flexibility and scalability to try alternative architectures
  Better QoR vs hand-coded design due to enhanced design space exploration