TKK S-88.134 ASIC-PIIRIEN SUUNNITTELU (ASIC Circuit Design)
Design Flow 3.2.2005
RTL Design 10.2.2005
Implementation 7.4.2005

Contents
1. Terminology
2. RTL to Parts flow
3. Logic synthesis
4. Static Timing Analysis and Formal verification
5. Design For Test
6. Floor-planning
7. Physical Synthesis
8. Clock tree synthesis
9. Placement & Routing
10. Manufacturing
Terminology
  ASIC Vendor
    Company that performs layout, creates masks, manufactures and tests chips, and handles logistics.
    E.g. Toshiba, NEC, IBM, Motorola, ST
  Fabless ASIC Vendor
    Company that performs layout, possibly creates masks, and handles logistics, but subcontracts manufacturing and testing.
    E.g. E-silicon
  Fab (fabrication plant)
    Company that manufactures chips and possibly also creates masks and tests the chips.
    E.g. UMC, TSMC
  Traditional flow
    Designer provides a synthesized netlist to the ASIC vendor
    ASIC vendor performs layout and provides back-annotation to the designer
    Designer performs timing analysis
  Customer Owned Tool (COT) flow
    Today, big design houses (old customers of ASIC vendors) may have their own layout tools in order to have more control over the layout process (time, price, ...)
    In the COT flow, design houses perform the layout and provide the layout results (GDSII database) for manufacturing.
RTL to Parts flow
1. Logic synthesis
2. Pre-layout Static Timing Analysis
3. Test structure insertion
4. Test pattern generation
5. Floorplan
6. Physical synthesis
7. Clock tree insertion
8. Routing
9. Post-layout Static Timing Analysis
10. Manufacturing

Logic synthesis
  The logic synthesis phase contains
1. Design constraints creation
2. RTL synthesis, i.e. converting RTL HDL code into a netlist
3. JTAG insertion
4. IO pad and hard macro (RAMs, CTS buffers) insertion
  A reasonable block size for synthesis is < 50-100 Kgates. Bigger blocks may take too long to complete, making iterations too slow (> 4 hrs).
  Big designs are synthesized bottom-up: first the sub-blocks, then connecting them together.
Logic synthesis
1. Constraining
  Defining IO delays
  Target clock frequency
  Operating conditions (process, voltage, temperature: PVT)
    BCCOM, WCMIL, WCIND
  Area / power targets
  Max fanout
  IO drive capability / load
  Net delay estimation: wire load model

Logic synthesis
1. Constraining example
  [Figure: constraining example]
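The constraint items listed above can be sketched as a small Python helper that renders them as SDC-style commands. This is only an illustration and not the example from the original slide: the class `BlockConstraints`, the function `to_sdc`, the port name `clk` and all numeric values are assumptions, and the units depend on the target library.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BlockConstraints:
    """Hypothetical container for the constraint items listed above."""
    clock_port: str      # name of the clock input port (assumed)
    clock_period: float  # target clock period, in library time units
    input_delay: float   # external delay before data reaches the inputs
    output_delay: float  # external delay after data leaves the outputs
    max_fanout: int      # maximum fanout allowed in the design
    output_load: float   # load seen by the outputs, in library units


def to_sdc(c: BlockConstraints) -> List[str]:
    """Render the constraints as SDC-style command lines."""
    return [
        f"create_clock -name {c.clock_port} -period {c.clock_period} "
        f"[get_ports {c.clock_port}]",
        f"set_input_delay {c.input_delay} -clock {c.clock_port} [all_inputs]",
        f"set_output_delay {c.output_delay} -clock {c.clock_port} [all_outputs]",
        f"set_max_fanout {c.max_fanout} [current_design]",
        f"set_load {c.output_load} [all_outputs]",
    ]


# Illustrative values only: 10-unit clock period, 2-unit IO delays.
constraints = BlockConstraints("clk", 10.0, 2.0, 2.0, 16, 0.05)
for line in to_sdc(constraints):
    print(line)
```

Operating conditions and wire load models are selected with tool- and library-specific commands, so they are left out of this sketch.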
Logic synthesis
2. RTL synthesis
  Analysis: checking the syntax
  Elaboration: converting HDL into generic gates
  Mapping into the target technology
  Scan flip-flops can be inserted automatically
  Top-down for small designs
  Bottom-up for big designs, with timing budgeting
Logic synthesis
2. RTL synthesis: top-down versus bottom-up methodology
  Top-down: RTL + global constraints -> top-level synthesis -> STA -> constraints met? -> next step
  Bottom-up: RTL + local constraints -> SubA, SubB, SubC synthesis -> STA for each sub-block -> constraints met? -> top-level synthesis of SubA, SubB, SubC with global constraints -> STA -> constraints met? -> next step

Logic synthesis
2. RTL synthesis
  Checking if constraints were met
    Timing
    Area
    Fanout
    Testability
    Used cells
  Optimization
    Flattening, grouping
    Changing constraints
    In Place Optimization (IPO) after place and route
      Scaling
      Buffering
Logic synthesis
2. RTL synthesis
  [Figure: timing report example]

Logic synthesis
3. BScan insertion
  Boundary scan controller insertion
  BScan cell insertion
4. IO pad insertion and hard macro insertion
  Often done by hand in the top-level VHDL code
  Can be done by synthesis tools (script)
Static Timing Analysis
  Checks that the timing criteria are met
  Orders of magnitude faster than simulation
  No need for simulation vectors
  Capacity of millions of gates

How STA works
  [Figure: MY_DESIGN with input A, clock CLK, flip-flops FF1 and FF2, output Z, and timing paths 1 to 4]
  Design is broken down into sets of timing paths
  Delays and slews on each path are propagated and computed
  Path delays are checked to see if timing constraints are met
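The three steps above can be sketched in Python as a worst-case arrival-time propagation over a timing graph, followed by a setup check at each flip-flop input. This is a minimal illustration, assuming a hypothetical netlist loosely modelled on MY_DESIGN; all node names and delays (in picoseconds) are invented, and slew propagation and hold checks are left out.

```python
from collections import defaultdict


def arrival_times(edges, start_points):
    """Propagate the latest arrival time from the start points through a DAG.

    edges: list of (source, destination, delay) tuples.
    start_points: dict mapping each path start point to its launch time.
    """
    graph = defaultdict(list)
    indegree = defaultdict(int)
    nodes = set(start_points)
    for src, dst, delay in edges:
        graph[src].append((dst, delay))
        indegree[dst] += 1
        nodes.update((src, dst))

    arrival = dict(start_points)
    ready = [n for n in nodes if indegree[n] == 0]
    while ready:  # visit nodes in topological order
        node = ready.pop()
        for dst, delay in graph[node]:
            candidate = arrival.get(node, 0) + delay
            arrival[dst] = max(arrival.get(dst, candidate), candidate)
            indegree[dst] -= 1
            if indegree[dst] == 0:
                ready.append(dst)
    return arrival


def setup_slacks(arrival, endpoints, clock_period, setup_time):
    """Slack = required time - arrival time; negative slack is a violation."""
    required = clock_period - setup_time
    return {ep: required - arrival[ep] for ep in endpoints}


# Hypothetical delays in ps; A launches after a 2000 ps input delay,
# FF1/Q launches after a 200 ps clock-to-Q delay.
edges = [
    ("A", "g1", 400), ("g1", "FF1/D", 600),                            # input to FF1
    ("FF1/Q", "g2", 500), ("g2", "g3", 700), ("g3", "FF2/D", 300),     # FF1 to FF2
]
arrival = arrival_times(edges, {"A": 2000, "FF1/Q": 200})
slacks = setup_slacks(arrival, ["FF1/D", "FF2/D"], clock_period=4000, setup_time=100)
for endpoint, slack in slacks.items():
    print(endpoint, "slack", slack, "ps", "VIOLATED" if slack < 0 else "met")
```

No simulation vectors are involved: every path is checked once from the graph alone, which is why the approach scales to very large designs.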
STA output
  Textual or graphical reports
  Whether or not the design meets its target frequency
  Types of constraints violated
    Setup/hold, clock gating glitches
    Min period, max transition, etc.
  How many paths are violated
  Violation magnitudes
  Complete, traced signal paths
  Forward annotation and constraint information for P&R (SDF, SDC)

Formal verification
  Equivalence checking between two models: proves mathematically that two designs have the same functionality
  Orders of magnitude faster than simulation
  No need for simulation vectors
  Capacity of millions of gates
  No timing verification
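The idea of equivalence checking can be illustrated with a small Python sketch that compares two versions of one logic cone over every input combination. Exhaustive enumeration is used here only because it is easy to read; it is an assumption of this sketch and does not scale, which is why real tools rely on more efficient Boolean reasoning. The functions `ref_cone`, `impl_cone` and `buggy_cone` are hypothetical.

```python
from itertools import product


def equivalent(ref, impl, num_inputs):
    """Return (True, None) if both functions agree on every input,
    otherwise (False, first_counterexample)."""
    for bits in product((False, True), repeat=num_inputs):
        if ref(*bits) != impl(*bits):
            return False, bits
    return True, None


def ref_cone(a, b, c):
    """Reference (RTL) form of the cone."""
    return (a and b) or (a and c)


def impl_cone(a, b, c):
    """Implementation (netlist) form after logic optimization."""
    return a and (b or c)


def buggy_cone(a, b, c):
    """Implementation with a dropped term, for comparison."""
    return a and b


print(equivalent(ref_cone, impl_cone, 3))
print(equivalent(ref_cone, buggy_cone, 3))
```

The check covers functionality only; nothing in it says whether either version meets timing.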
How formal verification works
  Formal verification tools translate the designs into Boolean equations
  End points of logic cones (compare points, CP) are primary outputs, registers, and black-box (BB) inputs
  Compare points are then mapped between the two designs
  The tool compares the two sets of equations, verifying the logic that drives each cone
  [Figure: reference design and implementation design with their compare points]

DFT: Test structure insertion
  Insertion of scan flip-flops, normally done in the synthesis phase
  Insertion of RAM BIST
  Insertion of logic BIST
  Scan chain insertion
    Definition of scan inputs and outputs
    Definition of the number of scan chains and their maximum lengths
  Insertion of test logic to bypass non-scan-testable logic (clock dividers, PLLs, etc.)
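The choice of scan chain count and maximum length mentioned above can be sketched as a small planning calculation. The function `plan_scan_chains` and the numbers used are hypothetical; the sketch only assumes that chains should be balanced, since the longest chain sets the number of shift cycles per pattern.

```python
import math


def plan_scan_chains(num_flops, max_chain_length):
    """Return balanced chain lengths that respect the maximum length.

    The chain count is the smallest one that fits all flip-flops, and the
    resulting lengths differ by at most one.
    """
    if num_flops < 0 or max_chain_length <= 0:
        raise ValueError("need num_flops >= 0 and max_chain_length > 0")
    num_chains = math.ceil(num_flops / max_chain_length)
    if num_chains == 0:
        return []
    base, extra = divmod(num_flops, num_chains)
    return [base + 1] * extra + [base] * (num_chains - extra)


# Illustrative numbers: 10 000 scan flip-flops, at most 3 000 per chain.
chains = plan_scan_chains(10_000, 3_000)
print(len(chains), "chains, longest has", max(chains), "flip-flops")
```

In practice the chain count is also limited by the number of pins available for scan inputs and outputs, which this sketch does not model.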
DFT: Test pattern generation
  ATPG generation, fault coverage target > 95%
  IDDQ patterns for quiescent current leakage measurement
  Functional pattern generation from simulation cases
  BSCAN pattern generation
  Test vectors can be simulated to verify operation
    Test vector generation tool provides the test benches
    Parallel simulation: no shifting, fast
    Serial simulation: very slow (weeks)
  Fault simulation
    Measures the coverage of a given stimulus: the tool reports the fault coverage achieved with the applied stimulus.

Floorplan
  Floorplan defines sub-block placement on the die
  Floorplan defines
    Chip boundaries
    IO placement
    Sub-block size, shape, orientation and placement
    Hard macro placement
    Power / ground grids
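Returning to the fault simulation step in the test pattern generation slide, the way a tool arrives at a fault coverage figure can be sketched with a single stuck-at fault model on a tiny circuit. The circuit z = (a AND b) OR c, the node names and the patterns are all hypothetical; a fault counts as detected when some pattern makes the faulty output differ from the fault-free output.

```python
NODES = ("a", "b", "c", "n1", "z")  # n1 is the internal AND output


def simulate(a, b, c, fault=None):
    """Evaluate z = (a AND b) OR c.

    fault is None for the fault-free circuit, or (node, stuck_value)
    to force one node to a constant 0 or 1.
    """
    def value(node, v):
        return fault[1] if fault and fault[0] == node else v

    a, b, c = value("a", a), value("b", b), value("c", c)
    n1 = value("n1", a & b)
    return value("z", n1 | c)


def fault_coverage(patterns):
    """Fraction of single stuck-at faults detected by the given patterns."""
    faults = [(node, stuck) for node in NODES for stuck in (0, 1)]
    detected = {
        f for f in faults for p in patterns
        if simulate(*p, fault=f) != simulate(*p)
    }
    return len(detected) / len(faults)


for patterns in ([(1, 1, 0)], [(1, 1, 0), (0, 1, 0), (1, 0, 0), (0, 0, 1)]):
    print(len(patterns), "pattern(s): coverage", fault_coverage(patterns))
```

Adding patterns raises the coverage until every modelled fault is detected, which is the trade-off behind a coverage target such as > 95%.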
Physical Synthesis
  Integrating synthesis and placement into one tool avoids iterations between synthesis and P&R.
  Optimizes the logic according to the actual placement
  Takes the floorplan as input
  Places the cells based on the floorplan, estimates the routing, and sizes the cells to meet the timing requirements.
  Output is a netlist with placement information but no detailed routing information.
  Can be done top-down for small designs (< 1 Mgates) and bottom-up for big designs.

Physical synthesis
1. Front-end timing is becoming unreliable
  With traditional flows, all nets with the same fanout have the same estimated interconnect delay during front-end design
  [Figure: delay versus fanout]
2. Placement can change timing dramatically
  After placement, it is obvious that nets with the same fanout will not have the same interconnect delay
  [Figure: logical view versus physical view]
3. Detailed routing has only a minor effect when good global routing is done to model interconnect
  [Figure: after placement versus after routing]
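The contrast between points 1 and 2 can be sketched in Python by estimating the wire length of two nets in both ways. The fanout table `WIRE_LOAD_TABLE`, the pin coordinates and the use of the half-perimeter of the pin bounding box as the placement-based estimate are all assumptions of this sketch, not values from any real library.

```python
# Hypothetical wire load model: fanout -> estimated wire length (um).
WIRE_LOAD_TABLE = {1: 10, 2: 22, 3: 36, 4: 52}


def wlm_length(fanout):
    """Front-end estimate: depends only on the fanout of the net."""
    return WIRE_LOAD_TABLE[min(fanout, max(WIRE_LOAD_TABLE))]


def hpwl(pins):
    """Placement-based estimate: half-perimeter of the pin bounding box."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


# Each net has one driver followed by two sinks, so the fanout is 2.
nets = {
    "local_net": [(0, 0), (5, 3), (4, 8)],
    "cross_chip_net": [(0, 0), (900, 40), (350, 700)],
}
for name, pins in nets.items():
    fanout = len(pins) - 1
    print(name, "wire load model:", wlm_length(fanout), "um,",
          "after placement:", hpwl(pins), "um")
```

Both nets get the same front-end estimate because they share a fanout, while the placement-based estimates differ by two orders of magnitude.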
Clock Tree Synthesis
  Inserting the clock tree
  Guarantees
    Setup and hold times for FFs
    Small clock skew so that the logic operates correctly
    Larger clock skew where needed to reduce simultaneous switching, which would otherwise draw too much power
  The clock tree can consume a huge amount of power

Placement & Routing
  Possible changes in placement after CTS
  Routing
    Connecting cells with each other and to the IOs
    The bigger the design, the more difficult it is to meet timing
    Routing cannot overcome bad placement, synthesis or RTL problems
  Things to be taken into account
    Parallel wires, distance, capacitance
    Congestion
    Antenna effects
    Power / GND routing (IR drop)
    Obstructions
Chip finishing
  Modify silicon area to meet manufacturing requirements
  GDSII generation
  Layout vs schematic (LVS) check
  Possible optimizations to get better yield

Manufacturing
1. Mask generation
2. Creating the die, layer after layer
  The first layers form the transistors
  Metal layers create the interconnects
  A chip can have 3 to 8 layers
  If the design contains a bug, it may be possible to correct it by changing only the metal layers
3. Testing the chip
4. Packaging and shipment to customer
Technology choices
  Ga, analog and full-custom technologies for very niche products
  Standard cell
    Predefined cells (and, and2, and3, or ...) are used on the chip as the design needs
    Hard macros as needed (RAMs, high-speed IOs, etc.)
    Very good utilization, performance, power, etc.
  Gate array
    Cells, including hard macros, pre-exist on the silicon and are connected as the design needs
    Cheaper NRE, but slower and with a higher part cost than standard cell (SC).