System IC Design: Timing Issues and DFT
Hung-Chih Chiang
Wireless Information Transmission System Lab., Institute of Communications Engineering, National Sun Yat-sen University
Outline
SoC Timing Issues
- Timing terminologies
- Synchronous vs. asynchronous design
- Interfaces and timing closure
- Clock issues
- Reset
Design for Testability (DFT)
- SoC test plan
- Scan, ATPG, DFT design rules
- Embedded memory test, embedded core test
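SoC Clock Issues
[Figure: SoC block diagram. The microprocessor has data and instruction caches and connects to memory, a memory controller and a high-speed bus carrying a high-speed I/O controller and HS IP. A bus bridge leads to a peripheral bus with a timer, interrupt controller, GPIO, UART and LS I/O. Separate Clock/OSC sources are shown.]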
SoC Clock Domains
[Figure: an OSC feeds a clock generator that produces clocks CLK1 to CLK6 for separate clock domains.]
Timing Terminologies
- Cell timing specification: setup time, hold time, release time, width, period and skew
- Maximum clock frequency, timing closure
- Cell delay and wire delay
- Environment and process variations
- Simulation cases: best case, typical case, worst case and pseudo worst case
- Factors affecting cell delay and wire delay: loading and driving capacity
Basic Cell Timings and Delays
[Figure: a flip-flop with SN/RN inputs and a gate with inputs IA, IB and output O. Annotations show interconnection delay, cell delay, setup, hold, recovery, width, skew and period.]
Simulation Cases
- Best case: highest operating voltage, lowest temperature, fast process; e.g. 0.25 µm @ 2.75 V, 0 °C, fast process
- Typical case: standard operating voltage, room temperature, typical process; e.g. 0.25 µm @ 2.5 V, 25 °C, typical process
- Pseudo worst case: lowest operating voltage, highest temperature, typical process; e.g. 0.25 µm @ 2.25 V, 125 °C, typical process
- Worst case: lowest operating voltage, highest temperature, slow process; e.g. 0.25 µm @ 2.25 V, 125 °C, slow process
Linear Delay Model
Loading and Cell Delay
$t_{typical} = t_{intrinsic} + K_{load} \cdot C_{load}$ (Databook)
Non-Linear Delay Model (Table Lookup)
$t_{typical} = F(t_{rf}, C_{load})$ (EDA Timing Model), where $t_{rf}$ is the input rise/fall transition time
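The table-lookup model can be sketched in Python. This minimal illustration assumes bilinear interpolation between the four nearest table entries. Real EDA libraries define their own table formats and interpolation rules, and the table values below are hypothetical.

```python
from bisect import bisect_right


def _bracket(axis, x):
    """Index i with axis[i] <= x <= axis[i + 1]; clamped so outside values extrapolate."""
    i = bisect_right(axis, x) - 1
    return max(0, min(i, len(axis) - 2))


def nldm_delay(slews, loads, table, t_rf, c_load):
    """Bilinear interpolation of delay = F(t_rf, C_load) in a 2-D lookup table.

    slews : ascending input transition times t_rf (ns), one per table row
    loads : ascending output loads C_load (pF), one per table column
    table : table[i][j] = delay (ns) at slews[i], loads[j]
    """
    i = _bracket(slews, t_rf)
    j = _bracket(loads, c_load)
    u = (t_rf - slews[i]) / (slews[i + 1] - slews[i])
    v = (c_load - loads[j]) / (loads[j + 1] - loads[j])
    return ((1 - u) * (1 - v) * table[i][j]
            + (1 - u) * v * table[i][j + 1]
            + u * (1 - v) * table[i + 1][j]
            + u * v * table[i + 1][j + 1])


# Hypothetical 2 x 3 table, for illustration only (not from any real library).
SLEWS = [0.1, 0.5]           # ns
LOADS = [0.01, 0.05, 0.10]   # pF
TABLE = [[0.05, 0.12, 0.20],
         [0.09, 0.17, 0.26]]

print(nldm_delay(SLEWS, LOADS, TABLE, t_rf=0.3, c_load=0.03))
```

Unlike the linear model, the delay here also depends on how fast the input switches, not only on the load.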
Cell Datasheet: NAND2 (1)
Cell Description
The NAND2 cell provides the logical NAND of two inputs (A, B). The output (Y) is represented by the logical equation: $Y = \overline{A \cdot B}$
Logic Symbol: inputs A, B; output Y
Function Table
A | B | Y
0 | x | 1
x | 0 | 1
1 | 1 | 0
Cell Size
Drive Strength | Height (µm) | Width (µm)
NAND2XL | 8.0 | 3.2
NAND2X1 | 8.0 | 3.2
NAND2X2 | 8.0 | 4.8
NAND2X4 | 8.0 | 6.4
Cell Datasheet: NAND2 (2)
AC Power (µW/MHz)
Pin | XL | X1 | X2 | X4
A | 0.0134 | 0.0270 | 0.0510 | 0.0950
B | 0.0158 | 0.0336 | 0.0647 | 0.1215
Pin Capacitance (pF)
Pin | XL | X1 | X2 | X4
A | 0.0023 | 0.0069 | 0.0132 | 0.0250
B | 0.0023 | 0.0067 | 0.0139 | 0.0248
Delays @ 25 °C, 2.5 V, Typical Process
Description | Intrinsic Delay (ns) XL / X1 / X2 / X4 | K_load (ns/pF) XL / X1 / X2 / X4
A → Y | 0.059 / 0.038 / 0.034 / 0.035 | 11.020 / 3.067 / 1.450 / 0.780
A → Y | 0.051 / 0.038 / 0.036 / 0.035 | 9.388 / 3.151 / 1.545 / 0.782
B → Y | 0.072 / 0.047 / 0.047 / 0.044 | 11.016 / 3.069 / 1.449 / 0.780
B → Y | 0.059 / 0.046 / 0.043 / 0.041 | 9.383 / 3.150 / 1.545 / 0.783
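As a worked example of the linear model, the sketch below applies $t = t_{intrinsic} + K_{load} \cdot C_{load}$ to the first A → Y row of the NAND2 table. It assumes the load is the A-pin capacitance of the driven gates (the fanout) plus an optional, hypothetical wire capacitance `c_wire_pf`. The helper names are illustrative.

```python
# NAND2 values from the datasheet (25 °C, 2.5 V, typical process),
# first A -> Y row; which output edge that row describes is not stated.
INTRINSIC_NS = {"XL": 0.059, "X1": 0.038, "X2": 0.034, "X4": 0.035}
K_LOAD_NS_PER_PF = {"XL": 11.020, "X1": 3.067, "X2": 1.450, "X4": 0.780}
PIN_A_CAP_PF = {"XL": 0.0023, "X1": 0.0069, "X2": 0.0132, "X4": 0.0250}


def linear_delay(t_intrinsic, k_load, c_load):
    """Linear (databook) model: t = t_intrinsic + K_load * C_load."""
    return t_intrinsic + k_load * c_load


def nand2_a_to_y(drive, fanout, fanout_drive="X1", c_wire_pf=0.0):
    """Delay (ns) of a NAND2 driving `fanout` NAND2 A-pins plus an assumed wire load."""
    c_load = fanout * PIN_A_CAP_PF[fanout_drive] + c_wire_pf
    return linear_delay(INTRINSIC_NS[drive], K_LOAD_NS_PER_PF[drive], c_load)


for drive in ("X1", "X4"):
    print(drive, round(nand2_a_to_y(drive, fanout=4), 4))
```

With a fanout of four NAND2X1 A pins (4 × 0.0069 pF = 0.0276 pF), the X1 driver gives about 0.123 ns and the X4 driver about 0.057 ns. The intrinsic delays are nearly equal, so the X4's much smaller K_load accounts for the difference.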
Cell Datasheet: DFF (1)
Cell Description
The DFF cell is a positive-edge triggered static D-type flip-flop.
Logic Symbol: inputs D, CK; outputs Q, QN
Function Table
D | CK | Q[n+1] | QN[n+1]
0 | rising edge | 0 | 1
1 | rising edge | 1 | 0
x | no rising edge | Q[n] | QN[n]
Cell Size
Drive Strength | Height (µm) | Width (µm)
DFFXL | 8.0 | 14.4
DFFX1 | 8.0 | 14.4
DFFX2 | 8.0 | 15.2
DFFX4 | 8.0 | 21.6
Cell Datasheet: DFF (2)
AC Power (mW/MHz) XL / X1 / X2 / X4 | Pin Capacitance (pF) XL / X1 / X2 / X4
0.0604 / 0.0856 / 0.1163 / 0.1828 | 0.0023 / 0.0032 / 0.0037 / 0.0058
0.0712 / 0.0984 / 0.1308 / 0.2013 | 0.0024 / 0.0046 / 0.0065 / 0.0107
0.0738 / 0.1311 / 0.2143 / 0.3997 |
Delays @ 25 °C, 2.5 V, Typical Process
Intrinsic Delay (ns) XL / X1 / X2 / X4 | K_load (ns/pF) XL / X1 / X2 / X4
0.515 / 0.346 / 0.297 / 0.262 | 11.027 / 3.003 / 1.530 / 0.732
0.573 / 0.289 / 0.254 / 0.226 | 6.625 / 3.228 / 1.614 / 0.787
0.635 / 0.388 / 0.344 / 0.308 | 11.008 / 2.997 / 1.460 / 0.730
0.686 / 0.474 / 0.409 / 0.357 | 6.310 / 3.217 / 1.566 / 0.783
Cell Datasheet: DFF (3)
Timing Constraints @ 25 °C, 2.5 V, Typical Process
Requirement | Interval (ns) XL / X1 / X2 / X4
setup | 0.09 / 0.14 / 0.16 / 0.15
setup | 0.28 / 0.20 / 0.24 / 0.20
hold | -0.07 / -0.09 / -0.10 / -0.09
hold | -0.10 / -0.02 / -0.04 / -0.02
Minpwh | 0.50 / 0.50 / 0.50 / 0.50
Minpwl | 0.60 / 0.60 / 0.60 / 0.60
Synchronous vs. Asynchronous Design
Synchronous design
- Flip-flop based (clock based)
- Easy timing handling
- DFT compliant
Asynchronous design
- Latch based
- Timing ambiguity causes problems
- Not DFT compliant
Flip-Flop (Clock) Based Design
[Figure: combinational logic between flip-flops.]
- Poor HDL coding of combinational logic can produce unintentional latches.
- Avoid using flip-flops with an enable input.
- Use positive-edge-triggered flip-flops in module RTL code if the flip-flops in the cell library are triggered at the positive clock edge.
Flip-Flop Clock Edge
Suppose negative-edge-triggered flip-flops are required in a design but the cell library contains only positive-edge-triggered flip-flops. In that case, invert the clock phase first and then write the RTL code using positive-edge-triggered flip-flops. This avoids inverters being inserted at the clock input of every module during logic synthesis.
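Clock-Based Timing (Single Clock Source)
[Figure: a launching flip-flop, combinational logic and a capturing flip-flop on one clock, with delays d1 and d2 along the path.]
$t_{hold} < d_1 + d_2 < T - t_{setup}$
This assumes all clocks arrive at the same time.
Multi-cycle paths and asynchronous signals must be identified during logic synthesis!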
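The single-clock constraint can be checked numerically with the sketch below. It assumes d1 is the launching flip-flop's clock-to-output delay and d2 is the combinational delay. A real static timing analyzer uses maximum delays for the setup check and minimum delays for the hold check; this sketch uses one value for both.

```python
def check_path(d1, d2, period, t_setup, t_hold):
    """Zero-skew check of t_hold < d1 + d2 < T - t_setup (all values in ns).

    Assumed meaning: d1 = launching flip-flop clock-to-output delay,
    d2 = combinational logic delay between the two flip-flops.
    """
    arrival = d1 + d2
    setup_slack = (period - t_setup) - arrival
    hold_slack = arrival - t_hold
    return {"setup_ok": setup_slack > 0, "hold_ok": hold_slack > 0,
            "setup_slack": setup_slack, "hold_slack": hold_slack}


def f_max_mhz(d1, d2_max, t_setup):
    """Highest clock frequency (MHz) that still satisfies the setup side."""
    return 1000.0 / (d1 + d2_max + t_setup)


# DFFX1 setup/hold taken from the first setup and hold rows of the DFF table
# (0.14 ns / -0.09 ns); d1 = 0.4 ns and d2 = 3.5 ns are hypothetical.
result = check_path(d1=0.4, d2=3.5, period=5.0, t_setup=0.14, t_hold=-0.09)
print(result)
print(round(f_max_mhz(0.4, 3.5, 0.14), 1))   # 247.5
```

At a 5 ns period this path has about 0.96 ns of setup slack. The same path would run at up to roughly 247.5 MHz.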
Problem of Latch: Possible D/E Race
[Figure: a latch with data input D and enable E, with D changing near the falling edge of E.]
The design must ensure enough hold time for D to remain stable after the falling edge of E.
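Problem of Latch: Timing Ambiguity in D/E
[Figure: D, E and latch output waveforms for two arrival times of D relative to E, each annotated with the output and the setup requirement.]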
Interfaces and Timing Closure
- A proper design of block interfaces makes timing closure a local problem.
- A major timing issue in deep-submicron technology is wire delay. The delay caused by wire load capacitance and RC can be much larger than the intrinsic cell delay.
- Timing-driven APR helps deal with this problem by taking the wire load model into account.
- Physical synthesis goes a step further toward timing closure by combining synthesis and timing-driven placement.
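A wire can be approximated as a ladder of RC sections, with its delay estimated by the Elmore formula. This standard first-order estimate is not a method taken from these slides, but it shows why wire delay grows so quickly with length. The per-micron resistance and capacitance values are hypothetical.

```python
def elmore_wire_delay(r_per_um, c_per_um, length_um, segments=10, c_load=0.0):
    """Elmore delay of a uniform wire split into `segments` RC sections.

    Each section has R = r*L/N and C = c*L/N. The Elmore delay sums, over the
    series resistors, R times all capacitance downstream of it (incl. c_load).
    Units: ohm/um and fF/um give a delay in fs.
    """
    r_seg = r_per_um * length_um / segments
    c_seg = c_per_um * length_um / segments
    delay = 0.0
    for i in range(segments):
        downstream_c = (segments - i) * c_seg + c_load
        delay += r_seg * downstream_c
    return delay


# Hypothetical wire parameters: 0.08 ohm/um and 0.2 fF/um.
for length in (1000, 2000, 4000):
    print(length, "um:", round(elmore_wire_delay(0.08, 0.2, length) / 1000, 2), "ps")
```

For a fixed number of sections, doubling the length doubles both R and C, so the wire RC delay quadruples. Cell delay in the linear model grows only linearly with load.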
Macro Interfaces
[Figure: Macro A driving Macro B, with registers at the inputs and outputs of both.]
Both inputs and outputs should be registered. This gives a full clock cycle to propagate the outputs of one macro to the inputs of another.
Sub-block Interfaces
[Figure: a macro containing sub-blocks A and B, next to Macro A.]
- Any block that is synthesized as a unit should have its own outputs registered.
- Any block that is floor-planned as a unit should have its own inputs and outputs registered.
Example: Interface Specification
[Figure: waveform over one clock period T showing a "Don't care / Valid / Don't care" input window, with 3 ns intervals marked.]
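Example: Registered vs. Unregistered Inputs
[Figure: two input paths, through combinational logic Comb. 1 and Comb. 2, into registers.]
Comb. 1 path: $d_1 + t_{setup} < 3\,\text{ns}$?
Comb. 2 path: $d_2 + t_{setup} < 3\,\text{ns}$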
Clocking Issues
- Clock skew and clock tree
- Divided clocks
- Asynchronous clock interface
- Clock gating
- Synchronization
- Hard IP
- Other considerations
Clock Skew
[Figure: FF0 and FF1 connected through combinational logic. The clock reaches FF0/CK and FF1/CK at different times; the difference is the skew.]
Clock Skew May Cause Errors
[Figure: waveforms of "in", FF0/Q and FF1/Q for FF0 feeding FF1, comparing the correct sequence with the erroneous one caused by clock skew.]
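Clock skew can be folded into the single-clock constraint. The sketch below defines skew as the clock arrival time at the capturing flip-flop (FF1) minus the arrival time at the launching flip-flop (FF0). With that definition, positive skew relaxes setup but tightens hold. The delay values are hypothetical, except the DFFX1 setup and hold times taken from the first rows of the DFF timing table.

```python
def skew_check(d1, d2, period, t_setup, t_hold, skew):
    """Setup/hold check for FF0 -> FF1 with clock skew (all values in ns).

    skew = clock arrival at FF1 (capture) minus clock arrival at FF0 (launch).
    setup: d1 + d2 + t_setup < T + skew
    hold:  d1 + d2 > t_hold + skew
    """
    arrival = d1 + d2
    setup_ok = arrival + t_setup < period + skew
    hold_ok = arrival > t_hold + skew
    return setup_ok, hold_ok


# FF0 drives FF1 directly (d2 = 0); d1 = 0.35 ns is hypothetical,
# t_setup = 0.14 ns and t_hold = -0.09 ns are the DFFX1 table values.
for skew in (0.0, 0.2, 0.6):
    print(skew, skew_check(0.35, 0.0, 5.0, 0.14, -0.09, skew))
```

With FF0 feeding FF1 directly, the hold check fails once the skew exceeds d1 - t_hold = 0.44 ns. FF1 then captures the value FF0 has just launched on the same edge, so the data skips a stage.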
Clock Tree
[Figure: a clock tree built from big and small buffers.]
- Insert the clock tree during APR.
- The clock tree can significantly increase power consumption.
Clock Tree Example
[Figure: clock tree layout annotated "Match + Skew".]
Clock Tree Example
[Figure: clock tree layout with its start point marked.]
Divided Clocks
[Figure: a clock generator drives Module B with Ck0 (f Hz, skew t0), Module A with Ck1 (f/2 Hz, skew t1) and Module C with Ck2 (f/4 Hz, skew t2). Waveforms of Ck0, Ck1 and Ck2 are shown.]
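An Alternative Design Approach for a Divided Clock Domain
[Figure: the clock generator distributes a single clock Ck (with skew) to modules A, B and C. Enable signals En1 and En2 select the active cycles of modules A and C. Waveforms of Ck, En1 and En2 are shown.]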
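Asynchronous Clock Interface
[Figure: signals X and Y leave Block 1 (Ck1 domain) and reach flip-flops 1a and 1b in the Ck2 domain through paths with delays d_a and d_b. The captured values a and b feed combinational logic that produces Z.]
Dangerous design! Random logic errors may occur because of the difference between the delays d_a and d_b.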
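Asynchronous Errors
[Figure: two timing cases (a) and (b) of signals X, Y and Z sampled by flip-flops 1a and 1b. An intended change of ab, e.g. from 01 to 10, can be captured as an intermediate value 11 or 00.]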
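The error mechanism above can be enumerated with a small model. Assume each bit reaches its capturing flip-flop with a different delay (d_a ≠ d_b). Then every bit that changes may be sampled either before or after its transition, so any mix of old and new bit values can be captured.

```python
from itertools import product


def possible_captures(old, new):
    """All words that independently sampled bits may capture during old -> new.

    old, new: bit strings of equal length, e.g. "01" -> "10" for ab.
    A bit that does not change is always captured correctly; a bit that
    changes may be sampled before or after its own transition.
    """
    choices = [(o,) if o == n else (o, n) for o, n in zip(old, new)]
    return {"".join(bits) for bits in product(*choices)}


print(sorted(possible_captures("01", "10")))   # ['00', '01', '10', '11']
print(sorted(possible_captures("01", "11")))   # ['01', '11']
```

Only the bits that actually change can take a wrong value. With a single changing bit, the captured word is either the old or the new value.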
Clock Synchronization
[Figure: an asynchronous input "in" is sampled by a flip-flop before entering Block A; clocks Ck1 and Ck2.]
- Not all asynchronous inputs need to be synchronized!
- A single flip-flop may not be good enough for clock synchronization.
ASIC Flip-Flop
[Figure: transmission-gate master-slave flip-flop with gates T1 to T4 controlled by CK and CKN.]
- CK = L: T1 & T4 on; T2 & T3 off
- CK = H: T1 & T4 off; T2 & T3 on
Metastability
[Figure: X, asynchronous to the clock, is sampled by a flip-flop whose output Y is synchronous to the clock. Waveforms of X and Y are shown.]
Standard Asynchronous Interface
[Figure: the asynchronous input "in" passes through two flip-flops before entering Block A; clocks Ck1 and Ck2.]
Two-stage flip-flops reduce the probability of metastability.
Dual Flip-Flop Synchronization
[Figure: signals X, Y and Z along a two flip-flop synchronizer between clock domains Ck1 and Ck2 (Block A), with their waveforms.]
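The benefit of a second flip-flop can be estimated with the widely used synchronizer MTBF approximation $\mathrm{MTBF} = e^{t_r/\tau} / (T_W \cdot f_{clk} \cdot f_{data})$. Here $t_r$ is the time allowed for a metastable output to resolve. The sketch uses hypothetical values of $\tau$ and $T_W$; they are technology dependent and not given in these slides. It also approximates the second stage as adding one clock period minus the setup time to the resolution time.

```python
import math


def synchronizer_mtbf(t_r_ns, tau_ns, t_w_ns, f_clk_hz, f_data_hz):
    """MTBF in seconds: exp(t_r / tau) / (T_w * f_clk * f_data)."""
    return math.exp(t_r_ns / tau_ns) / (t_w_ns * 1e-9 * f_clk_hz * f_data_hz)


# Hypothetical technology and system parameters, for illustration only.
TAU_NS = 0.1          # resolution time constant of the flip-flop
T_W_NS = 0.1          # metastability window
F_CLK = 200e6         # receiving clock, so T = 5 ns
F_DATA = 10e6         # average rate of changes on the asynchronous input
T_NS, T_SETUP_NS = 5.0, 0.14

# Single flip-flop driving logic directly: assume only 0.5 ns is left to resolve.
t_r_single = 0.5
# Dual flip-flop: roughly one more period (minus setup) before the value is used.
t_r_dual = t_r_single + (T_NS - T_SETUP_NS)

single = synchronizer_mtbf(t_r_single, TAU_NS, T_W_NS, F_CLK, F_DATA)
dual = synchronizer_mtbf(t_r_dual, TAU_NS, T_W_NS, F_CLK, F_DATA)
print(f"single: {single:.3g} s, dual: {dual:.3g} s")
```

Because $t_r$ sits in an exponent, the extra period makes the dual flip-flop MTBF larger by a factor of about $e^{(T - t_{setup})/\tau}$. This is why a single flip-flop may not be good enough.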
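Peak Power Reduction
[Figure, left: one clock generator drives a common clock Ck to modules A, B and C, connected through synchronous interfaces. Right: the clock generator drives separate clocks Cka, Ckb and Ckc, connected through asynchronous interfaces.]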
Clock Gating
[Figure: a clock enable gates the clocks of modules A, B and C locally.]
Not recommended! It makes synthesis and APR more difficult.
Clock Gating for Low-Power Design
[Figure: clock enables applied inside the clock generator, which drives gated clocks to modules A, B and C.]
Clock Delays for Hard Blocks
[Figure: a clock generator drives Ck1 and Ck2, each with its own skew.]
Take the insertion delays of hard macros into account.
Clock Planning Guidelines
- Keep the system clock generation and control logic separate from all function blocks of the system.
- Document clock domain information:
  - frequencies, PLL
  - interface timing (input and output)
  - skew requirements among clocks
- Use the standard synchronization interface for asynchronous inputs.
- Compensate for the insertion delays of hard macros.
- Bypass clock gating and the PLL in test mode.
Chip Reset Issues
- Synchronous or asynchronous?
- External or internal power-on reset?
- Voltage detector for power-down reset?
- Hard reset and soft reset?
- Each module individually resettable for debugging purposes?
Synchronous Reset
[Figure: reset combined into the D input of each flip-flop.]
- Easy to synthesize, since reset is treated as a logic signal.
- Reset slightly affects data timing.
- At least one active clock edge is needed for reset to take place. This can become a problem at power-on.
Verilog/VHDL for Synchronous Reset
// Sync. reset (Verilog)
always @(posedge Clk)
  if (~Rst_n) begin
    A <= ...;   // reset values
    B <= ...;
  end else begin
    ...;        // normal operation
    ...;
  end

-- Sync. reset (VHDL)
library IEEE;
use IEEE.std_logic_1164.all;
...
process (clk)
begin
  if rising_edge(clk) then
    if Rst_n = '0' then
      A <= ...;
      B <= ...;
    else
      ...;
      ...;
    end if;
  end if;
end process;
Asynchronous Reset
[Figure: reset driving the RN (or SN) input of each flip-flop directly.]
- No clock is required during the reset period.
- Reset does not affect data timing.
- Like the clock, a reset tree is usually required during APR.
Verilog/VHDL for Asynchronous Reset
// Async. reset (Verilog)
always @(posedge Clk or negedge Rst_n)
  if (~Rst_n) begin
    A <= ...;   // reset values
    B <= ...;
  end else begin
    ...;        // normal operation
    ...;
  end

-- Async. reset (VHDL)
library IEEE;
use IEEE.std_logic_1164.all;
...
process (clk, Rst_n)
begin
  if Rst_n = '0' then
    A <= ...;
    B <= ...;
  elsif rising_edge(clk) then
    ...;
    ...;
  end if;
end process;
Synchronous or Asynchronous Reset?
- If properly designed, both synchronous and asynchronous reset schemes can work in most application systems.
- Synchronous reset requires additional latency, while asynchronous reset is more sensitive to system noise.
- Reset must be de-asserted synchronously so that all state machine flip-flops start at the same active clock edge.
- All flip-flops/latches should be reset to a predefined state ("0" or "1"). This avoids ambiguous voltage levels at the outputs of sequential elements.
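A reset synchronizer is a common way to combine asynchronous assertion with the synchronous de-assertion required above. The cycle-level Python model below assumes a hypothetical two flip-flop structure. Both flops are cleared directly by the external Rst_n. After Rst_n rises, a constant 1 is clocked through them, so the internal reset is released only on a clock edge.

```python
class ResetSynchronizer:
    """Cycle-level model of an async-assert / sync-deassert reset synchronizer.

    Hypothetical structure: two flip-flops, both cleared asynchronously by the
    external rst_n; on each clock edge ff1 samples a constant 1 and ff2 samples
    ff1. The internal reset_n is the output of ff2.
    """

    def __init__(self):
        self._rst_n = 0
        self.ff1 = 0
        self.ff2 = 0

    def set_rst_n(self, rst_n):
        """Drive the external reset; assertion (0) acts immediately, no clock needed."""
        self._rst_n = rst_n
        if rst_n == 0:
            self.ff1 = 0
            self.ff2 = 0

    def clock_edge(self):
        """One active clock edge; returns the internal reset_n after the edge."""
        if self._rst_n == 1:
            # De-assertion ripples through both stages, one stage per edge.
            self.ff1, self.ff2 = 1, self.ff1
        return self.ff2

    @property
    def reset_n(self):
        return self.ff2


sync = ResetSynchronizer()
sync.set_rst_n(0)
print(sync.reset_n)          # 0: asserted without any clock
sync.set_rst_n(1)
print(sync.clock_edge())     # 0: first edge after release
print(sync.clock_edge())     # 1: released on the second edge
```

Every flip-flop fed from the synchronized reset therefore leaves reset on the same clock edge. The second stage also plays the same metastability-filtering role as the dual flip-flop synchronizer.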
Glitch Removal
[Figure: circuits combining A1 (active low) with B1 into C1, and A2 (active high) with B2 into C2, with waveforms.]
Process, temperature and voltage dependent!
Synchronous Reset Architecture
[Figure: external Rst_n distributed through a buffer tree as the internal Reset_n.]
Timing adjusted by synthesis tools.
Asynchronous Reset Architecture
[Figure: external Rst_n passes through flip-flop logic and a buffer tree to produce the internal Reset_n.]
Reset timing budget: almost one full clock.
Reset Buffer Tree with Registers: Synchronous Reset Buffer Tree
[Figure: Rst_n to Reset_n through a buffer tree with registers inserted; budget annotated as one full clock.]
Timing adjusted by synthesis tools.
Reset Buffer Tree with Registers: Asynchronous Reset Buffer Tree
[Figure: Rst_n to Reset_n through a buffer tree with registers inserted; each register stage is annotated with a budget of one full clock.]
Synchronous Reset for Multiple Clock Domains
[Figure: Reset_n is synchronized separately to Clk1 and Clk2, producing Rst1_n and Rst2_n, each distributed through its own buffer tree.]
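Asynchronous Reset for Multiple Clock Domains
[Figure: Reset_n feeds a separate asynchronous reset synchronizer in each of the Clk1 and Clk2 domains, producing Rst1_n and Rst2_n, each distributed through its own buffer tree.]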
Sequential Reset Releasing (Synchronous)
[Figure: Reset_n, Rst1_n (Clk1 domain) and Rst2_n (Clk2 domain) with per-domain buffer trees, arranged so that the domain resets are released in sequence.]
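Sequential Reset Releasing (Asynchronous)
[Figure: Reset_n, Rst1_n (Clk1 domain) and Rst2_n (Clk2 domain) with per-domain buffer trees and asynchronous reset synchronizers, arranged so that the domain resets are released in sequence.]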
Design for Testability
- IC testing vs. verification: manufacturing defects vs. functional defects
- Importance of IC testing: cost of RMA (return merchandise authorization)
- Test phases: wafer test (chip probing), final test (packaged IC testing)
- Test principle: different kinds of blocks require different test strategies; use a test controller at the top level as a sequencer
An Example of SoC Test Plan
[Figure: test pins T_D, T_M, T_I and T_O feed a Test I/O Control block. According to the test mode, it connects to the embedded core test, memory BIST, scan and analog macros.]
Test modes: processor test, RAM BIST, ROM checksum, SCAN/ATPG, functional test, analog macros, ...
- Parallel test: reduces test cost, but watch out for the maximum test power consumption.
- Sequential test: fewer test pins.
SCAN Chain
[Figure: Mux-D scan cells with Scan_en and Reset_n (RN/SN), chained around combinational logic.]
ATPG
Advantage of ATPG scan test over functional test: much shorter patterns with higher fault coverage.
Test steps:
1. Reset the whole chip, then release reset.
2. Enable scan mode and read out the initial register values.
3. Shift in a test vector.
4. Disable scan mode and run one clock.
5. Enable scan mode, shift out the flip-flop contents and check the results.
6. Repeat steps 3 to 5 until all test vectors are finished.
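Steps 3 to 5 can be modelled in a few lines. This sketch assumes a single scan chain and a Mux-D style shift that moves every bit one place toward scan-out per clock. The 3-bit combinational circuit and its stuck-at-0 fault are hypothetical. Steps 1, 2 and 6 are left out.

```python
def shift(chain, scan_in):
    """One scan-mode clock: each bit moves one place toward scan-out."""
    return [scan_in] + chain[:-1], chain[-1]


def scan_test(comb_logic, vector, chain):
    """Apply one ATPG vector: shift in (step 3), capture (step 4), shift out (step 5).

    chain[0] sits next to scan-in and chain[-1] next to scan-out.
    comb_logic maps the current chain contents to the values seen at the
    flip-flops' functional D inputs.
    """
    for bit in reversed(vector):        # step 3: last bit enters first
        chain, _ = shift(chain, bit)
    chain = comb_logic(chain)           # step 4: one functional clock captures
    out = []
    for _ in range(len(chain)):         # step 5: shift the response out
        chain, bit = shift(chain, 0)
        out.append(bit)
    return out[::-1]                    # back in chain order


def good_logic(q):                      # hypothetical 3-bit circuit
    a, b, c = q
    return [a ^ b, b & c, a | c]


def faulty_logic(q):                    # same circuit, middle output stuck at 0
    a, b, c = q
    return [a ^ b, 0, a | c]


for pattern in ([1, 1, 1], [1, 0, 1]):
    print(pattern, scan_test(good_logic, pattern, [0, 0, 0]),
          scan_test(faulty_logic, pattern, [0, 0, 0]))
```

The first pattern exposes the fault. The second produces the same response from both circuits, which is why ATPG selects patterns for fault coverage.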
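IEEE 1149.1 Boundary Scan
- Chains the ICs on a PCB for board-level test.
- Enables connectivity testing without sending test vectors to the cores of the ICs.
Boundary Scan Cell
[Figure: a cell with Test/Normal and Shift/Load controls, data input DI, data output DO, scan input SI and scan output SO.]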
Boundary Scan Architecture
[Figure: boundary scan cells around each core, chained from TDI to TDO and controlled by a JTAG test circuit driven by TMS and TCK.]
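The JTAG test circuit is sequenced by the 16-state TAP controller defined in IEEE 1149.1. The controller changes state on each rising TCK edge according to TMS. The table below transcribes that state diagram; the dictionary and helper names are illustrative.

```python
# IEEE 1149.1 TAP controller: (next state if TMS = 0, next state if TMS = 1).
TAP_NEXT = {
    "Test-Logic-Reset": ("Run-Test/Idle", "Test-Logic-Reset"),
    "Run-Test/Idle":    ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-DR-Scan":   ("Capture-DR", "Select-IR-Scan"),
    "Capture-DR":       ("Shift-DR", "Exit1-DR"),
    "Shift-DR":         ("Shift-DR", "Exit1-DR"),
    "Exit1-DR":         ("Pause-DR", "Update-DR"),
    "Pause-DR":         ("Pause-DR", "Exit2-DR"),
    "Exit2-DR":         ("Shift-DR", "Update-DR"),
    "Update-DR":        ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-IR-Scan":   ("Capture-IR", "Test-Logic-Reset"),
    "Capture-IR":       ("Shift-IR", "Exit1-IR"),
    "Shift-IR":         ("Shift-IR", "Exit1-IR"),
    "Exit1-IR":         ("Pause-IR", "Update-IR"),
    "Pause-IR":         ("Pause-IR", "Exit2-IR"),
    "Exit2-IR":         ("Shift-IR", "Update-IR"),
    "Update-IR":        ("Run-Test/Idle", "Select-DR-Scan"),
}


def walk(state, tms_bits):
    """Apply a sequence of TMS values, one per rising TCK edge."""
    for tms in tms_bits:
        state = TAP_NEXT[state][tms]
    return state


print(walk("Test-Logic-Reset", [0, 1, 0, 0]))   # Shift-DR
print(walk("Pause-IR", [1] * 5))                # Test-Logic-Reset
```

Holding TMS high for five TCK cycles returns the controller to Test-Logic-Reset from any state. From reset, the TMS sequence 0, 1, 0, 0 reaches Shift-DR.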
DFT Guidelines - 1
Avoid internally gated clocks or derived clocks.
[Figure: a clock gated through combinational logic (gated clock) and a flip-flop output used as a clock (derived clock).]
DFT Guidelines - 2
Provide test control for uncontrollable signals.
[Figure: a multiplexer controlled by Test_en bypasses the PLL clock; a clock gate is enabled by Gate_en or Test_en.]
DFT Guidelines - 3
Avoid feeding the data path with clocks.
[Figure: a clock signal driving the data or RN input of a flip-flop.]
DFT Guidelines - 4
Avoid using flip-flops with an enable input (synthesis).
[Figure: an element with data inputs A, B and enable E, and its synthesized equivalent.]
Enable is not controllable!
DFT Guidelines - 5
Avoid internally generated asynchronous reset signals.
[Figure: flip-flop outputs used as the RN (asynchronous reset) inputs of other flip-flops.]
DFT Guidelines - 6
Avoid using latches during logic synthesis.
[Figure: a latch with enable E, plus a scan path and a normal path selected by scan_enable.]
A latch cannot be inserted into a scan chain because of its uncontrollable enable input.
DFT Guidelines - 7
Avoid combinational feedback.
[Figure: the output of a combinational circuit fed back to its own input.]
Race/unstable! ATPG is not applicable!
Embedded Memory Test Schemes
Direct access
- Simple circuits; flexible test patterns; memory can be tested at higher than normal operating frequency.
- Extra test pins needed.
Embedded processor program
- Minimal hardware cost.
- Slow test speed; test patterns fixed within the processor code.
Built-in self-test (BIST)
- Highly efficient.
- Highest hardware cost among the three test schemes.
Direct Access Memory Test
[Figure: Test_In, Test_Out and Test_Ctrl pins connected through multiplexers to the embedded memory.]
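Typical Memory BIST Architecture
[Figure: a pattern generator, controlled by Test_ctrl, drives the memory module's A/Di/CE and Clk inputs in place of the functional signals. A compressor collects Do and reports Done and Result.]
- ROM: read only; compress with a linear feedback shift register or a checksum.
- RAM: read/write patterns.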
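A ROM compressor based on a linear feedback shift register is often built as a multiple-input signature register (MISR). This minimal sketch assumes an 8-bit Galois-style register with the feedback polynomial x^8 + x^4 + x^3 + x^2 + 1. The polynomial, seed and ROM contents are hypothetical choices, not values from these slides.

```python
def misr_signature(words, poly=0x1D, width=8, seed=0):
    """Compress a stream of width-bit words into one signature (Galois-style MISR).

    Each step shifts the register left, XORs in the feedback taps `poly` when
    the bit shifted out is 1, then XORs in the next data word in parallel.
    poly=0x1D encodes the low terms of x^8 + x^4 + x^3 + x^2 + 1 (hypothetical choice).
    """
    mask = (1 << width) - 1
    state = seed & mask
    for word in words:
        msb = (state >> (width - 1)) & 1
        state = ((state << 1) & mask) ^ (poly if msb else 0)
        state ^= word & mask
    return state


# Hypothetical ROM image; the golden signature would be computed once from it.
ROM_IMAGE = [0x3C, 0xA5, 0x0F, 0xF0, 0x81, 0x7E]
golden = misr_signature(ROM_IMAGE)

read_back = list(ROM_IMAGE)
read_back[2] ^= 0x04          # one bit flipped during read-out
print(hex(golden), misr_signature(read_back) == golden)   # comparison is False
```

The register update is linear and invertible, so flipping any single bit of the read-out stream always changes the final signature. Some multi-bit errors can alias to the golden signature.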
Shared RAM BIST Circuits
[Figure: one pattern generator and one compressor, controlled by Test_ctrl and reporting Done and Result, are shared by Memory Module 1 and Memory Module 2. Each module has its own A/Di/CE/Clk inputs and Do output.]
RAM BIST Algorithm: March14C+
1) Lowest → highest address: write 01010101 (0x55).
2) Lowest → highest address: read 0x55, write 0xAA, read 0xAA.
3) Lowest → highest address: read 0xAA, write 0x55, read 0x55.
4) Highest → lowest address: read 0x55, write 0xAA, read 0xAA.
5) Highest → lowest address: read 0xAA, write 0x55, read 0x55.
6) Highest → lowest address: read 0x55.
5 writes + 9 reads per address:
- Writes and reads both 0 and 1 at every bit
- High-speed read → write → read
- High-speed address increasing and decreasing
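The six March elements can be written directly as loops over a memory model. This sketch assumes a byte-wide memory represented as a Python list and a hypothetical stuck-at fault model. It also counts operations to confirm the 5 writes and 9 reads per address.

```python
def march14c_plus(mem):
    """Run the six March14C+ elements on a byte-wide memory model.

    `mem` must support len(), mem[addr] reads and mem[addr] = value writes.
    Returns (passed, total_reads, total_writes).
    """
    n = len(mem)
    up, down = range(n), range(n - 1, -1, -1)
    reads = writes = 0
    ok = True

    def rd(addr, expect):
        nonlocal reads, ok
        reads += 1
        if mem[addr] != expect:
            ok = False

    def wr(addr, value):
        nonlocal writes
        writes += 1
        mem[addr] = value

    for a in up:                                  # 1) up: write 0x55
        wr(a, 0x55)
    for first, second, order in ((0x55, 0xAA, up), (0xAA, 0x55, up),
                                 (0x55, 0xAA, down), (0xAA, 0x55, down)):
        for a in order:                           # 2)-5) read, write, read
            rd(a, first)
            wr(a, second)
            rd(a, second)
    for a in down:                                # 6) down: read 0x55
        rd(a, 0x55)
    return ok, reads, writes


class StuckBitMemory(list):
    """Hypothetical faulty RAM: one bit of one address is stuck at `value`."""

    def __init__(self, size, addr, bit, value):
        super().__init__([0] * size)
        self.addr, self.bit, self.value = addr, bit, value

    def __setitem__(self, addr, v):
        if addr == self.addr:
            v = (v | (1 << self.bit)) if self.value else (v & ~(1 << self.bit))
        super().__setitem__(addr, v)


print(march14c_plus([0] * 16))                        # (True, 144, 80)
print(march14c_plus(StuckBitMemory(16, 3, 0, 0))[0])  # False
```

For 16 addresses the test performs 16 × 9 = 144 reads and 16 × 5 = 80 writes. Because every bit is written and read as both 0 and 1, any stuck bit makes some read mismatch.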
Scan Chain and Memory Block Interface
[Figure: a memory module between blocks of combinational logic, with an ATPG_test control signal at the memory interface.]
SoC Core Test Issues
- Shared test pin design and number of test pins
- Cores with different configurations, such as scan cell type, scan chain length and test frequency
- Total test power consumption for parallel test
- Total test cost (time)
IEEE 1500 Embedded Core Test
[Figure: cores 1 to k, each enclosed in a P1500 core test wrapper with functional inputs/outputs, TAM-in/TAM-out ports and serial WSI/WSO ports. A test access mechanism (TAM) runs from a TAM source to a TAM sink, and a test controller drives the wrapper instruction register (WIR).]
Summary
- Flip-flop based design is recommended. Use latches only in well-controlled situations.
- A proper design of block interfaces makes timing closure a local problem.
- Clock domains require special care.
- A global reset signal is recommended.
- A proper SoC test plan is important to reduce RMA costs.
- DFT rules must be followed to ensure the testability of designs.